URDU on the MAC

by

Kamal Abdali



Mac OS X is capable of editing and word processing in Urdu. In a few simple steps, you can enable your Mac to handle Urdu documents.

This page could just as well have been titled "Urdu and Persian on the Mac", because the information given here can also be used to compose Persian (Farsi) documents. The keyboard, fonts, and explanations below apply equally to Persian, but the explanations are illustrated with Urdu words. The keyboard and fonts also suffice for Punjabi (Pakistani-style, written in the Arabic script). But the keyboard does not have all the letters of the Sindhi and Pushto alphabets.

Please email your enquiries, comments, criticisms, and suggestions to me at k.abdali@acm.org


Installing Urdu keyboard layout

[Note that we are talking about an Urdu keyboard layout, not an Urdu keyboard. We will use this layout in order to type Urdu characters using the standard English keyboard that came with your Mac.]

  1. Click here to download the zip archive file UrduPhonetic.zip.
  2. Double click on this zip archive to extract two files from it: the keyboard layout file UrduPhonetic.keylayout and the icon file UrduPhonetic.icns.
  3. Move both these files to the folder /Library/Keyboard Layouts.
  4. Log out, then log in again. This will let the system install the Urdu keyboard.
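
If you prefer to script steps 2 and 3, the following is a minimal Python sketch of the same procedure. It assumes that the archive was saved in your Downloads folder (adjust the path otherwise); the function name install_layout is ours and is not part of any Apple tool. Writing to /Library usually requires administrator rights, so the copy may fail when run from an ordinary account.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path

# Assumption: the browser saved the archive in ~/Downloads.
DEFAULT_ZIP = Path.home() / "Downloads" / "UrduPhonetic.zip"
# Destination folder named in step 3.
DEFAULT_DEST = Path("/Library/Keyboard Layouts")
# The two files that step 2 says the archive contains.
WANTED = ("UrduPhonetic.keylayout", "UrduPhonetic.icns")


def install_layout(zip_path=DEFAULT_ZIP, dest_dir=DEFAULT_DEST):
    """Extract the two layout files from the archive and copy them to dest_dir."""
    dest_dir = Path(dest_dir)
    installed = []
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(tmp)
        for name in WANTED:
            # rglob also copes with an archive that wraps the files in a folder.
            matches = list(Path(tmp).rglob(name))
            if not matches:
                raise FileNotFoundError(f"{name} not found in {zip_path}")
            target = dest_dir / name
            shutil.copy2(matches[0], target)
            installed.append(target)
    return installed
```

Calling install_layout() with no arguments uses the two default paths above. You still have to log out and log in again afterwards, as in step 4.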

Activating Urdu Input

  1. Pull down the Apple Menu in the menu bar (the apple icon on the top left of the screen), select System Preferences, then select International.
  2. Click on the Input Menu tab.
  3. A long list will appear, with the headings On, Name, Input Method, and Script. Scroll the list down to find the item with the name UrduPhonetic (with Input Method Keyboard and Script Unicode). Click the box to its left so it shows a check sign in it.
  4. Scroll down all the way. At the bottom of the list, there is an item Show input menu in menu bar. Click the box to its left so it shows a check sign in it.
  5. Scroll up all the way. Towards the top of the list, you will see two items, one named Character Palette and another named Keyboard Viewer. Click the boxes on their left so each shows a check sign.
  6. Close International.


[Screenshots: System Preferences, International pane; Keyboard Viewer]


If a US flag was not previously visible at the top right of the screen (on the menu bar), it should now be displayed. This is the Keyboard menu. If you click on this flag, a menu will appear underneath it with icons and names representing all the active keyboards. A keyboard named UrduPhonetic, with an icon somewhat like the flag of Pakistan, should now appear in it. If you click on it, the keyboard icon at the top right of the screen will turn into the Pakistani flag, and any keys pressed on the Mac keyboard will produce Urdu characters. You can switch between keyboards by clicking on the flags (the keyboard icons).


Installing Urdu Fonts

On a Mac, it is best to use Naskh fonts (which are typically used in Sindhi, Arabic and Persian publications), not the Nastaleeq font (which is used in Pakistani newspapers). Although Nastaleeq fonts are available for the Mac, only some commercial word processors can make use of them to compose Urdu documents.

Mac OS comes with only one font, Geeza Pro, which can be used for Urdu, but the characters do not look particularly nice. Two free fonts for preparing pleasant-looking Urdu documents can be downloaded from SIL International.

  1. Visit the Arabic Script Unicode Fonts page on the SIL International Web site by clicking here.
  2. Scroll down to the Freeware License section.
  3. Download the two fonts Scheherazade Regular (AAT) and Lateef Regular (AAT). OpenType (OT) versions are also available, and some software applications use those instead. If you will only be using TextEdit, as explained below, you do not need the OT fonts.
  4. The downloaded files have the suffix .zip. Click on each to unzip them, getting new files with the suffix .ttf.
  5. Move the two ttf files to /Library/Fonts.


[Screenshot: SIL fonts]

Your Mac is now ready for handling documents in Urdu.


The UrduPhonetic Keyboard

This keyboard has been designed to closely resemble the phonetic keyboard of InPage, a popular commercial desktop publishing application for Urdu that runs under Windows.

The advantage of a phonetic keyboard is that you easily remember most of the keys (e.g., the key "b" for the Urdu letter "bay", "p" for "pay", "k" for "kaaf", "g" for "gaaf", etc.). As we do not have enough keys on the standard computer keyboards to assign to all Urdu characters, we need to use shifted keys for some (e.g., "shift-k" for "khay", "shift-g" for "ghain", etc.)

The Keyboard menu has an item Show Keyboard Viewer. If you select this item, then the system will display a small picture of the keyboard on the screen. In this picture you can see what character each key corresponds to. The characters will change appropriately if you press the shift key (or another modifier key) or select a different keyboard from the Keyboard menu.

The keyboard viewer picture is quite small, and the characters on the keys are hard to see. So below are larger pictures of the UrduPhonetic Keyboard showing the characters corresponding to shifted and unshifted keys. (If you like, you can save these pictures by right-clicking or CTRL-clicking on the pictures and selecting the Save Image As... item from the menu that pops up. You can then print the keyboard pictures for reference.)


UrduPhonetic Keyboard with unshifted keys

[Image: UrduPhonetic keyboard, unshifted keys]


UrduPhonetic Keyboard with shifted keys

[Image: UrduPhonetic keyboard, shifted keys]


Even though we have tried to make the keyboard layout as phonetic as possible, the mismatch between the Urdu alphabet and the available keys on a Western keyboard has forced us to make some unintuitive mappings between letters and keys. But with a little practice you should be able to type most letters from memory.

The above two pictures contain all the information that there is to give about the UrduPhonetic key assignments. But for quick reference here are two tables of some useful key bindings.


Keys for similarly sounding letters

[Table image: keys for similarly sounding letters]


Some unobvious key bindings

[Table image: some unobvious key bindings]


Preparing Urdu Documents

EMAIL MESSAGES

An email message is usually a very simple document. If you compose your email on a Mac, and your recipient is also going to read it on a Mac, then you can try writing your messages in Urdu. In fact, the messages are readable on any non-Mac machine that has been configured with system options for right-to-left languages and on which appropriate fonts have been installed. An occasional character in these messages might be undecipherable, and might be replaced with its Unicode icon (or some gibberish, in the worst case).

Both Gmail and Yahoo Mail work admirably when the UrduPhonetic input is turned on and message composition is in Rich Text. The cursor behavior is a bit erratic; the cursor sometimes shows up at the right end of the line instead of at the left, next to the last word typed. But you can ignore all that, since your text is still set correctly. The cursor behavior will eventually be fixed by Google and Yahoo anyway.

Gmail allows specifying Urdu as the application language. In that case all the menus, titles, warnings, etc., will be translated to Urdu. But you do not need this extreme setting in order just to read and write Urdu email messages. While keeping English as the system working language, you can type Urdu text by selecting the UrduPhonetic input. Of course, you can also intersperse texts in Urdu and English by simply switching back and forth between UrduPhonetic and English keyboards.

TRADITIONAL DOCUMENTS

For the Mac, there is a free office suite, NeoOffice, which includes a powerful word processor and can be used to edit Urdu documents. NeoOffice, with an intuitive Mac look and feel, is a version of the multiplatform open source software OpenOffice. If you want to install and use NeoOffice, refer to its home page. We will not describe its use.

The Mac's built-in application TextEdit is good enough for simple documents. TextEdit is considered a text editor rather than a word processor. Yet it can be used for composing documents with multilingual text, embedded graphics, and other advanced features typically found only in large, expensive software applications.

In Plain Text mode, TextEdit allows only a single font and a single paragraph justification style for the entire document. In Rich Text mode, you can mix various font families, font sizes, font styles (e.g., bold, outlined, shadowed), and justifications (e.g., centered text, or text justified at left or right or both sides). You need to use Rich Text since Plain Text does not work well with Urdu.

To start a new Urdu document, select the menu item File | New, then Format | Text | Writing Direction | Right to Left. Set the Keyboard menu (Top Right) to Urdu Phonetic. Choose fonts, font styles, size, colors, etc., as is usual with most word processors. The default formatting is Plain Text. To switch to the Rich Text Format, do Format | Make Rich Text. Now you can apply a different justification to each paragraph, and a different formatting style to each selection.

Here is an image of the Mac screen during the editing of an Urdu document using TextEdit.


[Screenshot: TextEdit in Action]


WEB PAGES

Skip this section if you are not interested in creating Web pages with Urdu content.

Modern web browsers are quite good at interpreting and displaying multilingual text from its Unicode character encoding. Of course, the browser needs to be told that it should expect Unicode material in the web document (usually, an html file) that it is being asked to display. Unicode text can be stored in several encodings, of which UTF-8 is the one most widely used on the Web. UTF-8 is a variable-length encoding: each ASCII character takes one byte, while each Urdu letter in the basic Arabic block takes two bytes. So to display Urdu text, you have to specify in your web document that its character set is UTF-8, as explained next.

The particular character set that a web document contains is specified by the meta statement. Near the beginning of your html file you will find some code that looks like this:

   <meta content="text/html; charset=ISO-8859-1" http-equiv="content-type">

(This is just an example. Your character set might have a name different from "ISO-8859-1".)
You have to change the character set name to "UTF-8", by replacing the above meta statement by:

   <meta content="text/html; charset=UTF-8" http-equiv="content-type">

Any Unicode inserted after this meta statement will be displayed as the character that the code represents. The Unicode for Urdu and Persian can be found in the Unicode Arabic page. A table which gives the standard Unicode as well as its html representation, called html numeric character reference, is given here.
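
To see why the declared character set matters, here is a small Python sketch (our illustration, not part of any browser) that encodes the word "Urdu" as UTF-8 and then decodes the same bytes twice: once correctly, and once as ISO-8859-1, the way a browser would if the meta statement were left unchanged.

```python
# The word "Urdu" in Urdu: alif, re, daal, vaao.
urdu = "\u0627\u0631\u062F\u0648"

# What a UTF-8 html file actually contains on disk.
raw = urdu.encode("utf-8")
print(len(urdu), "letters ->", len(raw), "bytes:", raw.hex(" "))

# A browser told that the page is UTF-8 recovers the four letters.
print(raw.decode("utf-8") == urdu)

# A browser told that the page is ISO-8859-1 turns every byte into a
# separate Western character, which is the familiar gibberish.
garbled = raw.decode("iso-8859-1")
print(len(garbled), "characters after the wrong decoding")
```

The bytes in the file are identical in both cases; only the declared character set decides whether the reader sees Urdu or gibberish.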

In that table, you will notice that the html representations of the Urdu letters Alif, Re, Daal, and Vaao are, respectively, &#1575;, &#1585;, &#1583;, and &#1608; (each reference ends with a semicolon). So suppose in your html document you insert the following:

   <center>
   <font size="+3">
   &#1575;&#1585;&#1583;&#1608;
   </font>
   </center>

The result will be the word "Urdu" (in Urdu) displayed in large letters and centered in a line, as follows:

اردو

Typing numerical codes in this way is clearly tiresome, and practical only for displaying just a few characters. Fortunately, you don't have to enter character codes manually if you use the UrduPhonetic keyboard layout. The characters typed on this keyboard are automatically converted to their Unicode version and placed in the input. All you have to do is to switch to UrduPhonetic on the keyboard menu at the point in your html file where you desire to insert Urdu text.
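
If you do need numeric character references for a handful of characters, you need not look them up by hand. The following Python sketch generates them; the function name to_numeric_refs is ours. It leaves ASCII characters alone, so it does not escape html-special characters such as "&" or "<".

```python
import html


def to_numeric_refs(text):
    """Replace every non-ASCII character by its decimal reference &#NNNN;."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


urdu = "\u0627\u0631\u062F\u0648"   # alif, re, daal, vaao
refs = to_numeric_refs(urdu)
print(refs)                          # &#1575;&#1585;&#1583;&#1608;

# html.unescape performs the reverse conversion, as a browser would.
print(html.unescape(refs) == urdu)   # True
```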

A caveat is in order here. To prepare html files, you are likely to use some special editor different from TextEdit. We have seen that TextEdit (in Rich Text mode) processes Urdu letters correctly, displaying the right form of each letter and connecting the letters appropriately. Other editors may not do all this. In particular, your typed letters might be displayed in their isolated forms from left to right in the order of their entry, and without being connected together. Or, they might appear garbled in even more annoying ways! Nevertheless, the typed text will be correctly input as if you manually entered the corresponding numerical codes.
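
The reason the input is still correct is that a file stores the letters in the order in which they were typed; joining them and laying them out from right to left is done only at display time. A quick Python check of what is actually stored (a sketch using the standard unicodedata module) looks like this:

```python
import unicodedata

# The word "Urdu" as typed: alif, re, daal, vaao.
typed = "\u0627\u0631\u062F\u0648"

for ch in typed:
    # Each line shows the code point, its Unicode name, and its
    # bidirectional class ("AL" marks a right-to-left Arabic letter).
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.bidirectional(ch))
```

However strangely an editor draws the word, the file is fine as long as this listing shows the letters in typing order.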

Of course, the readers of your Web page will be able to see the Urdu text correctly only if their system has been configured for multi-lingual processing and has the Urdu fonts installed. In addition, it might be necessary for your readers to set the viewing option of their web browser for "Unicode (UTF-8)" character encoding.


Problems and Hints

INSTALLATION DIFFICULTIES

Most of the reported installation problems turned out to have a simple reason: during download or extraction, the file extensions got changed. Often a .txt extension was appended to one or more file names.

So first please make sure that your Mac shows extensions in file names. For this, move into Finder (for example, by clicking in a Finder window, or on the Finder icon in the Dock, or at a point on the screen which is not occupied by an application window). Then on the Menu bar (the one with the Apple icon at the left), click on Finder, then on Preferences, then on the Advanced tab. Now look at the Show all file extensions item. If the check box on its left does not have a check mark, then click on it so that a check mark appears there. Finally, close the Advanced window.

Now you can check whether the extensions of the UrduPhonetic files are correct. The downloaded file (UrduPhonetic.zip) and the files that your unzipper extracts (UrduPhonetic.keylayout and UrduPhonetic.icns) should have exactly those names. Change their extensions if necessary, ignoring the Finder's complaint that this could render your files dysfunctional.
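
The same repair can be sketched in Python. The function below (fix_extensions is our name for it) looks in a folder for any of the three files carrying a stray .txt extension and renames it to the expected name. It assumes that the stray extension was appended to the full correct name, which is the case most often reported.

```python
from pathlib import Path

# The exact file names given in the text.
EXPECTED = (
    "UrduPhonetic.zip",
    "UrduPhonetic.keylayout",
    "UrduPhonetic.icns",
)


def fix_extensions(folder):
    """Rename e.g. UrduPhonetic.keylayout.txt back to UrduPhonetic.keylayout."""
    folder = Path(folder)
    renamed = []
    for name in EXPECTED:
        stray = folder / (name + ".txt")
        good = folder / name
        # Never overwrite a correctly named file that is already there.
        if stray.exists() and not good.exists():
            stray.rename(good)
            renamed.append(good)
    return renamed
```

For example, fix_extensions(Path.home() / "Downloads") would check the Downloads folder and return the list of files it renamed.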

Another problem some people have encountered is that during editing Urdu letters show up isolated rather than connected together in the normal way. This is usually due to TextEdit being in plain text mode rather than the rich text mode which Urdu editing requires.

To fix this problem, start TextEdit, and on the Menu bar click on TextEdit, then on Preferences, and then on the New Document tab. If the Rich Text radio button is not active, click on it. Now close Preferences, and quit TextEdit. When you restart TextEdit, it will use Rich Text as the default for new documents.

DIACRITICAL MARKS

In Urdu, short vowels (e`raab) are denoted by diacritical marks that are placed above, below, or to the left of the letter involved. Although usually omitted, they are occasionally needed to remove ambiguity or to show the correct pronunciation of a word. In particular, the tashdeed sign is always helpful to the reader of the text.

While composing text, you should type such a mark after typing the letter to which it belongs. The most frequently used marks are: zabar (shift->), zer (shift-<), pesh (shift-P), tashdeed (shift-_), and madd (shift-A). Alif with madd can be typed directly as shift-+. The "jazm" mark, which in Urdu printing looks like a tiny "daal" above a letter, is not available on the UrduPhonetic keyboard. An unattractive alternative is the "sukun" mark of Arabic orthography that looks like a little circle.
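
The rule "letter first, mark second" reflects how the marks are stored: in Unicode they are combining characters that attach to the letter before them. The Python sketch below illustrates this. The code points are our assumption (the standard Arabic-block marks); the .keylayout file itself is the authority on what each key produces.

```python
import unicodedata

# Assumed code points for the marks named above.
MARKS = {
    "zabar": "\u064E",     # ARABIC FATHA
    "zer": "\u0650",       # ARABIC KASRA
    "pesh": "\u064F",      # ARABIC DAMMA
    "tashdeed": "\u0651",  # ARABIC SHADDA
    "madd": "\u0653",      # ARABIC MADDAH ABOVE
    "sukun": "\u0652",     # ARABIC SUKUN
}

for label, mark in MARKS.items():
    # Category "Mn" (nonspacing mark) means the mark has no width of its
    # own and is drawn on the preceding letter.
    print(label, unicodedata.name(mark), unicodedata.category(mark))

# "muhabbat": meem, baRi hay, bay, tashdeed typed AFTER the bay, then tay.
muhabbat = "\u0645\u062D\u0628" + MARKS["tashdeed"] + "\u062A"
print(muhabbat)

# Alif followed by madd is canonically equivalent to the single
# character "alif with madd" (U+0622).
alif_madd = unicodedata.normalize("NFC", "\u0627" + MARKS["madd"])
print(alif_madd == "\u0622")
```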

YE

The I and Y keys correspond, respectively, to the maaroof and majhool "ye" forms, popularly referred to as "choTi ye" ی and "baRi ye" ے , respectively. (See the note below about maaroof and majhool sounds.) Thus "galee" گلی (meaning lane) is to be typed as G, L, I, and "taaray" تارے (meaning stars) is to be typed as T, A, R, Y.

The form entered by Y does not connect to the next letter. So even a majhool "ye" letter that occurs in the middle of a word should be typed as I. For example, "bayTay" بیٹے (meaning sons) has to be typed B, I, shift-T, Y; even though both "ye" letters occurring in this word are pronounced with the majhool sound, the first one has to be entered as I.

In Arabic, the letter "ye" has two dots underneath. In Urdu, the two dots are shown only if "ye" appears in the middle of a word, not when it is the final letter of a word or when it stands alone (e.g., in an alphabet table). If needed, the "ye" with two dots ي can be typed as shift-5.
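
The examples in this section can be replayed with a toy model of the keyboard. The table below is a hypothetical subset of the layout, reconstructed from the examples in the text; the code points are our assumption, and the real .keylayout file is the authority.

```python
# Hypothetical subset of the UrduPhonetic layout (assumed code points).
KEYMAP = {
    "B": "\u0628",        # bay
    "G": "\u06AF",        # gaaf
    "L": "\u0644",        # laam
    "T": "\u062A",        # tay
    "A": "\u0627",        # alif
    "R": "\u0631",        # ray
    "I": "\u06CC",        # choTi ye: connects to the next letter
    "Y": "\u06D2",        # baRi ye: does not connect to the next letter
    "shift-T": "\u0679",  # retroflex Tay
    "shift-5": "\u064A",  # ye with two dots
}


def type_keys(keys):
    """Return the text produced by pressing the given keys in order."""
    return "".join(KEYMAP[key] for key in keys)


print(type_keys(["G", "L", "I"]))             # galee
print(type_keys(["T", "A", "R", "Y"]))        # taaray
print(type_keys(["B", "I", "shift-T", "Y"]))  # bayTay
```

Note that the medial "ye" of "bayTay" comes from the I key, exactly as the rule above requires.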

NOON GHUNNA

Noon Ghunna, which appears as the letter Noon but without a dot, is entered as shift-N. Thus "maaN" ماں (mother) is typed as M, A, shift-N. Noon Ghunna adds a nasal quality to the sound of the vowel preceding it.

In the freewheeling, inconsistent way of Urdu orthography, Noon Ghunna is used only at the end of a word. In the middle of a word, even where Noon Ghunna would be appropriate, Urdu just uses the ordinary Noon. Examples: "saaNp" سانپ (snake) has to be entered as S, A, N, P; or "patang" پتنگ (kite) has to be entered as P, T, N, G. This inconsistency is forced by the circumstance that in the middle of a word, Noon is written as a shosha with a dot above. Without a dot, such a shosha would be visually quite confusing.

In some very old books, especially Urdu instructional primers, Noon Ghunna was indicated by a tiny inverted "v" (circumflex accent) placed above the Noon. This worked both in the middle and at the end of a word. But such efforts for consistency have long been done away with. In contrast, Hindi takes the rational approach of signifying the nasal modification by always placing a special mark above the affected letter.

HAMZA

The main forms of this letter are 1) independent ء , entered as shift-4, 2) hamza above "alif" أ , entered as the minus key (-), 3) hamza above "vaao" ؤ , entered as the equal key (=), and 4) hamza in the middle of a word ئ , entered as U.

If the hamza is the last letter of a word, use the independent hamza form (shift-4). The words in which this happens are generally derived from Arabic. Examples: "ziaa" ضیاء (meaning light) is entered by typing shift-J, I, A, shift-4; "zakaa" ذكاء (intelligence) is entered by typing shift-Z, K, A, shift-4. Sometimes such words are part of a combination such as "ziaa uddin" ضیأالدّین . It is then traditional to use the "hamza above alif" form. The above combination is entered as shift-J, I, -, A, L, D, shift-_, I, N. (This L is written but is silent, and the "daal" is pronounced with a tashdeed.)

It is important to understand that "hamza above alif" أ means two different things in Urdu and Arabic orthographies. In Urdu, it is a compactly written combination of two letters, the vowel alif followed by the consonant hamza. In Arabic, it stands for the consonant hamza alone (operated with the short vowel zabar), and is equivalent to Urdu's alif with a zabar اَ .

If the hamza occurs in the middle of a word, use the key U for it. When typed, it is displayed as a hamza over the letter "ye" ئ. But as soon as the next letter is typed, the ye disappears, and the correct combination of hamza and the next letter is displayed. Examples: "ghaael" گھائل (wounded) entered by typing G, H, A, U, L; "chaae" چائے (tea) entered by typing C, A, U, Y.

However, even in the middle of a word if a hamza precedes a "vaao" then the two should be typed as the single "hamza above vaao" key (equal sign). Example: "gaaoN" گاؤں (village) should be entered by typing G, A, =, shift-N, and not G, A, U, W, shift-N which would appear as گائوں.
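
The two spellings of "gaaoN" look similar but are different strings, which matters when text is searched or sorted. The Python sketch below compares them; the code points are our assumption about what the keys produce.

```python
import unicodedata

# G, A, =, shift-N: "hamza above vaao" as one character (recommended).
single = "\u06AF\u0627\u0624\u06BA"
# G, A, U, W, shift-N: "hamza on ye" followed by a separate vaao.
split = "\u06AF\u0627\u0626\u0648\u06BA"

print(len(single), len(split))       # 4 5
print(unicodedata.name("\u0624"))    # ARABIC LETTER WAW WITH HAMZA ABOVE
print(unicodedata.name("\u0626"))    # ARABIC LETTER YEH WITH HAMZA ABOVE

# Unicode normalization does not reconcile the two spellings, so a
# search for one form will not find the other.
print(unicodedata.normalize("NFC", split) == single)   # False
```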

HAY

"BaRi Hay" ح (humorously called "Halvay Vaali Hay") is entered by typing shift-H. Thus "muhabbat" محبّت (love) is entered by typing M, shift-H, B, (optionally shift-_ for tashdeed), T.

"Dochashmi Hay" ھ is entered by typing unshifted H. In modern Urdu orthography, this letter is used only in combination with some consonant (which precedes it), and its purpose is to modify that consonant's sound to make it an "aspirated letter".

"ChoTi Hay" ہ , entered by typing the letter O, is pronounced separately by itself rather than being just used to "aspirate" another consonant. For example, the "Hay" sound is pronounced independently in the word "kahaa" كہا (said); so this word is typed with a "ChoTi Hay", as K, O, A. This is in contrast to the word "khaa" كھا (Eat!) where the "Hay" is used to aspirate the "k" sound; so this word is spelled with a "Dochashmi Hay", as K, H, A.

In the word "majhool" مجہول even though "h" follows "j", no aspiration takes place since the two letters belong to different syllables ("maj-hool") and are pronounced independently. This word should therefore be typed as M, J, O, W, L, and not M, J, H, W, L which would appear as مجھول ! In general, "Dochashmi Hay" should not be used in any Urdu word that is derived from Arabic or Persian, since these languages do not have aspirated letters. Aspirated letters can occur only in the words of Indic origin.

There is an exception to the rule that "ChoTi Hay" must be pronounced with an "h" sound. At the end of a word, "choTi Hay" is pronounced as an A or E, not as H; for example, the word تكیہ typed as T, K, I, O is pronounced as "takya" (pillow).

An exception to that exception occurs sometimes, and the terminal "Hay" is actually pronounced as H, not A or E. For example, the word "shah" شہ (meaning check [of chess]) is typed X, O. The word "shaah" شاہ (meaning king), typed as X, A, O, is another example where a terminal "ChoTi Hay" is pronounced with an "h" sound.

However, the oddities of Urdu orthography do not end here. Sometimes in the words ending in a pronounced ChoTi Hay, the Hay is written twice! For example, the word "kah" (meaning say!) is often written as كہہ , entered by K, O, O; or "sah" (meaning bear!, from the verb "sahna") as سہہ , entered by S, O, O; or "faqeeh" (an expert of "fiq-h", i.e. jurisprudence) as فقیہہ , entered by F, Q, I, O, O. But you will find that this spelling variation, intended presumably to avoid the ChoTi Hay being wrongly pronounced as A or E, is practiced rather inconsistently.
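
Since typing H where O is required is an easy mistake to make, a small check can help when proofreading words of Arabic or Persian origin. The Python sketch below reports where a "Dochashmi Hay" occurs in a word. The code points are our assumption about what the H and O keys produce, and the function name is hypothetical.

```python
DOCHASHMI_HAY = "\u06BE"  # assumed output of the H key
CHOTI_HAY = "\u06C1"      # assumed output of the O key


def dochashmi_positions(word):
    """Return the positions (counting from 0) of every Dochashmi Hay in word."""
    return [i for i, ch in enumerate(word) if ch == DOCHASHMI_HAY]


# "majhool" typed correctly as M, J, O, W, L ...
majhool_right = "\u0645\u062C" + CHOTI_HAY + "\u0648\u0644"
# ... and incorrectly as M, J, H, W, L.
majhool_wrong = "\u0645\u062C" + DOCHASHMI_HAY + "\u0648\u0644"

print(dochashmi_positions(majhool_right))  # []
print(dochashmi_positions(majhool_wrong))  # [2]
```

A non-empty result is only a hint to look again: in words of Indic origin, "Dochashmi Hay" is of course correct.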

PUNCTUATION

The end of an Urdu declarative sentence is marked with a small dash rather than a period. On the UrduPhonetic keyboard, the period key generates this dash. Other punctuation symbols such as question mark, exclamation, comma, semicolon, parentheses, brackets, double and single quotation marks, etc., are entered with the usual keys. Punctuation symbols are appropriately reversed or inverted to match the right-to-left flow of text.
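
Unicode handles the reversal in two ways: paired symbols such as parentheses are flagged as "mirrored" and are flipped at display time, while symbols such as the question mark have separate Arabic-script characters. The Python sketch below shows both. Which of these characters the UrduPhonetic keys actually produce is our assumption; check the .keylayout file if it matters.

```python
import unicodedata

# Separate Arabic-script punctuation characters.
for ch in ("\u06D4", "\u060C", "\u061B", "\u061F"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# Paired symbols are flipped automatically in right-to-left text (1 = yes);
# the Latin question mark is not (0 = no).
for ch in "()[]?":
    print(ch, unicodedata.mirrored(ch))
```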

INTER-WORD SPACES. (AND A LAMENT!)

People accustomed to Nastaleeq word processors will discover that TextEdit uses spaces and other punctuation to separate the words in a document. This is the correct and rational behavior, shared by every non-Nastaleeq word processor in the world. Nastaleeq word processors stand alone in suppressing inter-word spaces. The user, of course, still has to type spaces to signify ends of words, but those spaces are removed and the words follow each other in a continuous stream.

Just imagine reading this English page without spaces between words. Deciphering such a character stream requires, in essence, that you already know what you are trying to learn!

The defenders of the Nastaleeq practice might argue that, unlike the situation with English, the ends of Urdu words are often recognizable. But "often" is not "always". Here is an example: In the sample Urdu text above that has been typeset in TextEdit, you can distinguish the words because the beginning and end of each word are clearly displayed. Now take a close look at the first line of the sample: you will find that of the 19 words there, 10 are made up of parts that could themselves be thought of as words. For the same text edited with any popular Nastaleeq word processor, you would be able to skip over those unintended words only because you already know the intended words, not because the text display is of any help!
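
The ambiguity can be made concrete with a toy experiment. The Python sketch below lists every way of cutting a space-free string into words from a small vocabulary; the English example and the vocabulary are ours, chosen only for illustration.

```python
def segmentations(text, words):
    """Return every way to split text into a sequence of known words."""
    if not text:
        return [[]]
    results = []
    for i in range(1, len(text) + 1):
        head = text[:i]
        if head in words:
            # Keep this word and split the remainder in every possible way.
            for rest in segmentations(text[i:], words):
                results.append([head] + rest)
    return results


vocabulary = {"no", "now", "here", "where", "nowhere"}
for split in segmentations("nowhere", vocabulary):
    print(" ".join(split))
```

With spaces, each of the three readings is written differently and no guessing is needed; without them, only a reader who already knows the intended words can choose among the candidates.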

When computer typesetting of Nastaleeq was first introduced for Urdu in the 1980s, inter-word spaces were actually used. The practice of suppressing them is more recent. This unwise retrogression, justified in the name of "tradition and esthetics", is an unnecessary obstacle to anyone trying to learn Urdu. The Nastaleeq script already suffers from too many complexities, obscurities, irregularities, and inconsistencies. It makes no sense to invent more barriers to the accessibility of Urdu. The practice simply prolongs the time it takes students to master the language. It is also hindering the development of optical character recognition and other important electronic processing technologies for Urdu.

Exercise for the reader: Find out what ghatrabood is, and enjoy the story.


Notes for Persian

The Urdu alphabet contains the following additional letters that do not exist in Persian:

  1. The "retroflex" letters "Tay", "Daal" and "Ray", each of which has a tiny "toey" mark above. These letters are typed with the keys shift-T, shift-D, and shift-R, respectively.
  2. "Aspirated letters", formed by combining certain consonants with "Dochashmi Hay". (Aspirated letters are really combinations, and do not count as letters in the Urdu alphabet.)
    The distinction between "ChoTi Hay" (key O) and "Dochashmi Hay" (key H) described in the section above is irrelevant to Persian, in which both these forms of "Hay" are used interchangeably.
  3. "Noon Ghunna" (key shift-N).
  4. "BaRi Ye" (key Y). In Persian, this variant of Ye is still used for calligraphic effect, but is not common in computer word processing.

Urdu and Persian differ markedly in the pronunciation of vowels. For example, in Persian the short vowels zer and pesh have majhool sounds and the long vowels vaao and ye have maaroof sounds. In Urdu, the same vowels do double duty to represent both maaroof and majhool sounds. These differences do not affect writing unless special marks are used to distinguish maaroof and majhool sounds.

There are minor variations in the uses of "hamza" between Urdu and Persian. These are all adequately accounted for by the keyboard and fonts we have discussed.

The standard ("educated person's") pronunciation of consonants is generally identical in Urdu and Persian, and often different from Arabic. Some of the similarities and differences are as follows:

  1. The consonants contain some groups of separate letters that are pronounced with the same sound in Urdu and Persian. They are listed in the first table at the end of the section The UrduPhonetic Keyboard. The letters of different groups are in different rows. In Arabic, the letters within each group have distinct pronunciations.
  2. When occurring as a consonant, the letter "vaao" (key W) is pronounced like "v" in Urdu and Persian, but like "w" in Arabic.
  3. The letter "qaaf" (key Q) has the same pronunciation in Urdu and Arabic but a different one in Persian. In Persian, the letters "qaaf" and "ghain" (key Shift-G) are pronounced alike, with the same sound as that of "ghain" in Arabic and Urdu.

Since we have called our keyboard phonetic, we wanted to relate the pronunciation of the alphabet letters with the keys being used to enter them. The tedious details given above will perhaps help you in remembering the keys. As you can see, it is hard to phonetically map the Urdu or Persian alphabet to a Latin-based keyboard!


Note on Maaroof and Majhool Vowels

There is an old classification of certain vowel sounds as maaroof (literally, well-known) or majhool (literally, unknown or unfamiliar). The difference between these can be illustrated with English words as follows:

  1. Short vowel mark Zer:       bill (maaroof), bell (majhool)
  2. Short vowel mark Pesh:     bull (maaroof), * (majhool)
  3. Long vowel letter Vaao:     pool (maaroof), pole (majhool)
  4. Long vowel letter Ye:        peel (maaroof), pale (majhool).

* The majhool pesh is difficult to illustrate in English because the letter "O", which is closest to that vowel, is pronounced in several different ways. But this is the vowel sound found in the Persian words "gol" (meaning flower) and "sokhan" (utterance).


Acknowledgement

UrduPhonetic was designed with the aid of Ukelele, a keyboard layout editor for MacOS. I thank John Brownie, the author of Ukelele, for developing this melodious software and for making it available under a freeware license.

I also wish to thank
     Amal Ahmed, Aaron Jakes, Shebab Javed, Karan Misra, Knut S. Vikor, and Muhammad Yusaf
for reporting problems and for offering suggestions to make this page more informative and useful.


