Punjabi University recognizes that, to meet the challenges of the 21st century, no language can develop and survive without technical development. With this in view, the University, which counts the development of the Punjabi language among its primary goals, established the Advanced Centre for Technical Development of Punjabi Language, Literature & Culture on International Mother Language Day, 21 February 2004. With the establishment of this Centre by the Vice-Chancellor, S. Swaran Singh Boparai, Punjabi University became the first university in the world to set up such a centre.
The Centre is led by Dr. Gurpreet Singh Lehal, Professor in the Department of Computer Science and Engineering. Dr. Lehal is a pioneer of software development for Punjabi and has to his credit the first Gurmukhi Optical Character Recognition system, a Punjabi word processor, a Punjabi spell checker and Gurmukhi-Shahmukhi transliteration software, to name a few.
In the short period since its establishment, the Centre has made remarkable achievements. Its staff and research students have developed many path-breaking technologies that simplify the use of Punjabi on computers and help bring the Punjabi community together by breaking script and language barriers. All of these resources are available online, free of charge, at the Centre’s website (www.learnpunjabi.org). A brief account of these online resources follows:
1. Gurmukhi-Shahmukhi Transliteration
A unique feature of Punjabi is that it is written in two mutually incomprehensible scripts. In India, Punjabi is written in the Gurmukhi script, while in Pakistan it is written in the Shahmukhi (Urdu) script. This has created a script wedge: the majority of Punjabi speakers in Pakistan cannot read Gurmukhi, and the majority of Punjabi speakers in India cannot read Shahmukhi. The foremost achievement of the Centre has been the development of high-accuracy software that converts Gurmukhi text into Shahmukhi, and Shahmukhi text into Gurmukhi, with a single mouse click. The transliteration software transcends the script barrier and brings together Eastern and Western Punjab, as well as Punjabi speakers scattered around the world. A medium-sized book written in Gurmukhi can be converted to Shahmukhi in a few minutes with a word accuracy of more than 98%, a task that could take weeks by hand. Similarly, Shahmukhi text can be converted to Gurmukhi in a matter of seconds, saving a great deal of time and manual effort. The software can even transliterate complete websites from one script to the other. For example, the website www.wichaar.com, which is in Shahmukhi, can be converted in full and read in Gurmukhi using this software.
In addition, software has been developed that can transliterate any Gurmukhi document to the Roman or Devanagari script and vice versa. The Centre has also developed a high-accuracy Urdu/Hindi transliteration system, which can convert any Hindi website to Urdu and any Urdu website to Hindi. All of this software is available for free use on the Centre’s website.
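To give a feel for the simplest part of script conversion, the following Python sketch transliterates Gurmukhi to Devanagari by exploiting the fact that the two Unicode blocks are laid out in parallel (Gurmukhi at U+0A00, Devanagari at U+0900). It is only a naive illustration, not the Centre's method: the function name, the choice of ranges and the single exception rule (tippi mapped to anusvara) are assumptions made here, and a production system needs many more rules, especially for Shahmukhi, where no such parallel layout exists.

```python
# Naive Gurmukhi -> Devanagari transliteration by code-point offset.
# Assumption: only the parallel part of the two Unicode blocks is shifted;
# everything else passes through unchanged.

OFFSET = 0x0A00 - 0x0900          # distance between the two blocks

SHARED_RANGES = [
    (0x0A01, 0x0A4D),             # signs, letters, vowel signs, virama
    (0x0A66, 0x0A6F),             # digits 0-9
]

# Gurmukhi-specific signs need explicit rules; this one is an assumption.
EXCEPTIONS = {0x0A70: 0x0902}     # tippi -> Devanagari anusvara


def gurmukhi_to_devanagari(text):
    """Return text with Gurmukhi code points shifted into the Devanagari block."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp in EXCEPTIONS:
            out.append(chr(EXCEPTIONS[cp]))
        elif any(lo <= cp <= hi for lo, hi in SHARED_RANGES):
            out.append(chr(cp - OFFSET))
        else:
            out.append(ch)        # Latin text, spaces, unmapped signs
    return "".join(out)


# The word "Punjabi" in Gurmukhi, written as escapes to avoid font issues.
punjabi_in_gurmukhi = "\u0a2a\u0a70\u0a1c\u0a3e\u0a2c\u0a40"
print(gurmukhi_to_devanagari(punjabi_in_gurmukhi))
```

Characters that have no counterpart in the target block, such as addak (U+0A71), are left untouched by this sketch, which is one reason a rule-based or statistical layer is needed for high accuracy.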
2. Punjabi Teaching
Recognizing that the younger generation, particularly among the Punjabi diaspora, is becoming detached from its mother tongue and common heritage, the Centre provides an online teaching program for the Punjabi language. The program, developed with the latest multimedia, has the following modules:
· Punjabi alphabet formation : Animations of all the Gurmukhi letters are provided, along with their sounds, tones, pictures and descriptions.
· Punjabi word formation : Common Punjabi words are presented with their animated formation, pictures and sounds.
· Interactive quizzes : Interactive quizzes test the user’s knowledge of the Punjabi alphabet.
· Multimedia content : Games such as crossword puzzles, hangman and recognising a word from its pronunciation, together with tongue twisters, folk tales, rhymes and talking stories, make learning easy and interesting.
· Pictorial vocabulary : A pictorial vocabulary of common Punjabi words, grouped by everyday categories such as body parts, buildings, birds, animals and numbers, is also provided.
· Punjabi grammar : Eight lessons explaining Punjabi grammar, followed by a multiple-choice quiz, have been developed.
Besides the above modules, more advanced courses are being added to the site including 21 video lectures by eminent Punjabi scholars. An interactive audio Punjabi teaching course divided into 20 chapters is also being developed, incorporating all the basic elements required to learn the language.
3. Legacy to Unicode Font Converter
More than 500 ASCII-based legacy fonts are currently available for Punjabi. Having so many fonts makes text recognition difficult, because each font uses a different keyboard mapping. With Unicode being the accepted standard for processing Punjabi across all hardware and software platforms, it becomes necessary to convert Punjabi text encoded in legacy fonts to Unicode. For the first time, an intelligent legacy-to-Unicode font converter has been developed, which uses statistical techniques to detect the font of any Punjabi text automatically and convert the text to Unicode. The utility is also very useful for converting PDF files to Unicode and for converting text encoded in an unknown font. The font converter has also been integrated with other utilities, such as the Gurmukhi-Shahmukhi transliteration utility, so that they can process Punjabi text encoded in any of the popular legacy fonts. This is the first time such a utility has been developed for any Indian language.
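The idea of detecting a font statistically can be sketched as follows: decode the legacy text under every candidate font table, score each result against statistics of genuine Unicode Punjabi, and keep the best-scoring decoding. The Python sketch below is a toy under stated assumptions. The two fonts ("FontA", "FontB") and their mapping tables are invented for illustration and do not correspond to any real legacy font, and the bigram-overlap score stands in for whatever statistical model the Centre actually uses, which the text does not describe.

```python
# Toy statistical font detection for legacy-encoded Punjabi text.
# FontA / FontB and their tables are hypothetical, not real font mappings.

PA, TIPPI, JA, AA, BA, II = "\u0a2a", "\u0a70", "\u0a1c", "\u0a3e", "\u0a2c", "\u0a40"

FONT_TABLES = {
    "FontA": {"p": PA, "M": TIPPI, "j": JA, "w": AA, "b": BA, "I": II},
    "FontB": {"p": II, "M": PA, "j": AA, "w": JA, "b": TIPPI, "I": BA},
}

# A real system would learn statistics from a large Unicode corpus;
# here the "corpus" is the single word "Punjabi" in Gurmukhi.
REFERENCE = PA + TIPPI + JA + AA + BA + II


def bigrams(text):
    """Return the list of adjacent character pairs in text."""
    return list(zip(text, text[1:]))


KNOWN_BIGRAMS = set(bigrams(REFERENCE))


def decode(legacy_text, table):
    """Map each legacy character to Unicode, leaving unknown ones as-is."""
    return "".join(table.get(ch, ch) for ch in legacy_text)


def plausibility(unicode_text):
    """Fraction of bigrams that also occur in the reference corpus."""
    pairs = bigrams(unicode_text)
    if not pairs:
        return 0.0
    return sum(pair in KNOWN_BIGRAMS for pair in pairs) / len(pairs)


def detect_and_convert(legacy_text):
    """Pick the font whose decoding looks most like Punjabi; return both."""
    best = max(
        FONT_TABLES,
        key=lambda name: plausibility(decode(legacy_text, FONT_TABLES[name])),
    )
    return best, decode(legacy_text, FONT_TABLES[best])


font, converted = detect_and_convert("pMjwbI")
print(font, converted)
```

With only two fonts and one reference word the decision is trivial; the same decode-score-select loop scales to hundreds of fonts once the score is based on a realistic language model.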
4. Punjabi Typing Tool
Typing Punjabi, particularly as Unicode text, is much more complex than typing English. To simplify it, a Gurmukhi Unicode typing pad has been developed. The user can type with a Phonetic, Remington or on-screen keyboard, as preferred. The pad also provides an easy interface for converting existing text files in popular Gurmukhi fonts such as Asees, Satluj and Anmol Lipi into Unicode. In addition, the typing pad can send e-mails in Shahmukhi text, using a powerful built-in transliteration module.
5. Online Dictionaries
Dictionaries are basic resources for any language. The Centre has made the following dictionaries available for free online use:
English-Punjabi Topic Dictionary : A pictorial topic dictionary for Punjabi learners has been developed, organized into more than eighty categories such as adjectives, nouns, food, fruits, animals and months. The dictionary has around 3000 entries. Each word has an associated picture, Roman transliteration and English equivalent, along with its pronunciation in Punjabi.
Multimedia Punjabi-English Dictionary : The Punjabi-English paper dictionary developed by Punjabi University, Patiala has been converted into electronic form. The Gurmukhi words are presented along with their English meanings, equivalent Shahmukhi transliteration and pronunciation. Besides Gurmukhi words, the user can also search for Shahmukhi and English words. Fuzzy search tools are provided for easy and flexible searching.
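Fuzzy searching of a dictionary can be illustrated with a short Python sketch built on the standard library's `difflib.get_close_matches`, which ranks headwords by string similarity. This is only a stand-in for the dictionary's own fuzzy tools, whose design is not described here; the three entries, their Romanized glosses and the function name `fuzzy_lookup` are illustrative assumptions.

```python
import difflib

# Toy English -> Romanized Punjabi entries (illustrative only).
ENTRIES = {
    "water": "paani",
    "mother": "maa",
    "house": "ghar",
}


def fuzzy_lookup(query, limit=3, cutoff=0.6):
    """Return (headword, gloss) pairs whose headword resembles the query."""
    matches = difflib.get_close_matches(
        query.lower(), list(ENTRIES), n=limit, cutoff=cutoff
    )
    return [(word, ENTRIES[word]) for word in matches]


# A misspelled query still finds the intended headword.
print(fuzzy_lookup("watr"))
```

Lowering `cutoff` makes the search more forgiving at the cost of more unrelated suggestions.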
6. Grammar Checker
The first and only grammar checker for Punjabi has been developed at the Centre by Dr Mandeep Singh Gill under the guidance of Dr Gurpreet Singh Lehal. It is available online on the Centre’s website for checking simple Punjabi sentences for grammatical errors, and it is useful for Punjabi learners.
7. Part-of-Speech Tagging in Punjabi
A part-of-speech (POS) tagger labels each word in a sentence with its part of speech. This is a necessary module in natural language processing software such as machine translation and grammar checking. A rule-based POS tagger for Punjabi has been developed at the Centre and is available for free online use. The tagger is being enhanced by the addition of an HMM-based statistical POS tagger.
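For readers unfamiliar with HMM-based tagging, the Python sketch below shows the standard Viterbi algorithm choosing the most probable tag sequence for a sentence. It is a generic textbook illustration rather than the Centre's tagger: the two-tag tagset, all probabilities and the Romanized example tokens ("munda", "khedda") are made-up assumptions, whereas a real tagger estimates these values from an annotated corpus.

```python
# Viterbi decoding for a tiny hidden Markov model POS tagger.
# All numbers below are invented for illustration.

TAGS = ["N", "V"]                                  # noun, verb
START = {"N": 0.7, "V": 0.3}                       # P(first tag)
TRANS = {                                          # P(next tag | tag)
    "N": {"N": 0.3, "V": 0.7},
    "V": {"N": 0.6, "V": 0.4},
}
EMIT = {                                           # P(word | tag)
    "N": {"munda": 0.8, "khedda": 0.2},
    "V": {"munda": 0.1, "khedda": 0.9},
}
UNKNOWN = 1e-6                                     # floor for unseen words


def viterbi(words):
    """Return the most probable tag sequence for a list of words."""
    if not words:
        return []
    # Probability of the best path ending in each tag after the first word.
    best = {t: START[t] * EMIT[t].get(words[0], UNKNOWN) for t in TAGS}
    back = []                                      # back-pointers per step
    for word in words[1:]:
        step, pointers = {}, {}
        for t in TAGS:
            prev = max(TAGS, key=lambda p: best[p] * TRANS[p][t])
            step[t] = best[prev] * TRANS[prev][t] * EMIT[t].get(word, UNKNOWN)
            pointers[t] = prev
        best = step
        back.append(pointers)
    # Follow back-pointers from the best final tag.
    tag = max(TAGS, key=lambda t: best[t])
    path = [tag]
    for pointers in reversed(back):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]


print(viterbi(["munda", "khedda"]))
```

The sketch multiplies raw probabilities for readability; on long sentences a practical implementation would sum log-probabilities to avoid numerical underflow.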
8. Morphological Analyzer
A morphological analyzer is an essential, basic tool for building any natural language processing application. Given a word, it returns the root word and word class, along with other grammatical information that depends on the word class. A Punjabi morphological analyzer has been developed at the Centre for online use. It is also used by PARAS (Punjabi Assistant for Reading and Speaking), another landmark software package being developed at the Centre, to display the meaning of any Punjabi word. PARAS takes Punjabi text or a Punjabi website as input and makes available the meaning of every word in it. The user simply moves the mouse over a word, and its meaning is displayed in a text box. There is no need to type the word for dictionary lookup, and the user can also see the meanings of all possible inflections of a word, even those that are not present in the dictionary.
9. Machine Translation
The Centre has also been working on machine translation, and two high-quality systems, one translating Hindi text to Punjabi and the other Punjabi text to Hindi, have already been developed. These systems are freely available at the Centre’s website.
10. Search Engine
A Google-based search engine for Punjabi has been developed by the Centre. It lets the user type Gurmukhi search terms easily in different keyboard layouts, and queries can be searched in Gurmukhi as well as in Hindi and Shahmukhi (Urdu) documents. Fuzzy and flexible search options are also provided, which match misspellings and synonyms of the search terms.
11. Optical Character Recognition (OCR)
OCR software enables a computer to convert images of characters into editable text. Its advantage is that a printed document can be entered into the computer without retyping: the OCR reads and converts the document automatically. For the first time, a high-accuracy multi-font Gurmukhi OCR has been developed at the Centre; it can recognize Gurmukhi text printed in any of the common Gurmukhi fonts with more than 97% recognition accuracy at the character level. Over time, the OCR has been enhanced to recognize both Gurmukhi and English text in the same document, making it the first bilingual OCR for any Indian language. The OCR is available in both online and offline modes. It can also run in batch mode, which is very useful for converting multi-page PDF documents into Unicode documents in a single run.
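Character-level recognition accuracy is commonly computed from the edit distance between the OCR output and a hand-corrected reference text. The Python sketch below shows that calculation as one plausible reading of the figure quoted above; the text does not say exactly how the accuracy was measured, and the function names are assumptions made here.

```python
# Character-level accuracy of OCR output against a reference transcription.
# Assumption: accuracy = 1 - (edit distance / reference length).


def levenshtein(reference, hypothesis):
    """Minimum number of insertions, deletions and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_ch in enumerate(reference, start=1):
        current = [i]
        for j, hyp_ch in enumerate(hypothesis, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (ref_ch != hyp_ch),  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference, hypothesis):
    """Return accuracy in [0, 1]; an empty reference needs an empty output."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    errors = levenshtein(reference, hypothesis)
    return max(0.0, 1.0 - errors / len(reference))


# One wrong character out of four gives 75% character accuracy.
print(character_accuracy("abcd", "abxd"))
```

The sketch counts Unicode code points; for Gurmukhi, where a visible character may consist of a consonant plus vowel signs, an evaluation may instead count whole character clusters.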
The Centre is also developing the first OCR for the Shahmukhi script, which will greatly ease the digitization of Shahmukhi texts. Together, these systems will save a great deal of time and money in converting Gurmukhi and Shahmukhi documents from paper to digital form.
12. Punjabi Text Summarization
The first automatic Punjabi summarization system has also been developed at the Centre. The software can summarize any Punjabi document, reducing it to as little as one-tenth of its original size while retaining the most important points. The software is available for free use at the Centre’s website.
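A common baseline for this kind of summarization is extractive: score each sentence, then keep only the top-scoring fraction in their original order. The Python sketch below implements that baseline with plain word-frequency scores. It is a generic illustration under stated assumptions, not the Centre's system; the scoring formula, the `ratio` parameter and the sentence splitter (which treats the danda "।" and Western punctuation as sentence ends) are choices made here for brevity.

```python
import re
from collections import Counter


def summarize(text, ratio=0.1):
    """Keep roughly `ratio` of the sentences, chosen by average word frequency."""
    # Split on the danda (U+0964) as well as ., ! and ?
    sentences = [s.strip() for s in re.split("[\u0964.!?]+", text) if s.strip()]
    if not sentences:
        return []
    frequency = Counter(word for s in sentences for word in s.split())

    def score(sentence):
        words = sentence.split()
        return sum(frequency[w] for w in words) / len(words)

    keep = max(1, round(len(sentences) * ratio))   # always keep one sentence
    ranked = sorted(
        range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True
    )
    chosen = sorted(ranked[:keep])                 # restore document order
    return [sentences[i] for i in chosen]


sample = "a b c. a b. x y z. a a b."
print(summarize(sample, ratio=0.5))
```

Because raw frequency favours common function words, a practical system would at least remove stop words and would typically add language-specific features before ranking sentences.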
13. Speech Technology
Research on a Punjabi text-to-speech (TTS) synthesis system, which can speak Punjabi text aloud, has been carried out at the Centre. A preliminary syllable-based TTS system has already been developed, and it is being refined to improve the output quality before it is made available online for free use.
In addition, as discussed above, the Centre is developing PARAS (Punjabi Assistant for Reading and Speaking), which can clearly pronounce any word from a text or a website.