Orthography
The writing system used by the Arabic language is an abjad consisting of 28 letters. It is a cursive script written from right to left and top to bottom. All letters have between two and four forms depending on their position in a word so as to be able to connect with the letters surrounding them:
- The initial form is used at the start of a word and connects with the letter after it.
- The medial form is used in the middle of a word and connects with the letters before and after it.
- The final form is used at the end of a word and connects with the letter preceding it.
- The isolated form is used when the letter connects to neither the letter before it nor the letter after it.
The letters ʾalif (ا), dāl (د), dhāl (ذ), rāʾ (ر), zāy (ز), and wāw (و) are known as non-connecting letters. They have only two forms and do not connect to the following letter in the word. Any letter that follows them must begin a new connection, taking its initial form (or its isolated form if it is the last letter of the word).
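The joining rules above can be sketched in Python. This is a minimal illustration, not a text-shaping engine: the function name `positional_forms` is hypothetical, and the sketch assumes the input is a single word made up only of the 28 base letters, with no diacritics, hamza or spaces.

```python
# Letters that never connect to the letter after them.
NON_CONNECTING = set("ادذرزو")


def positional_forms(word: str) -> list[str]:
    """Return the positional form name for each letter of an unvocalized word."""
    forms = []
    for i, letter in enumerate(word):
        # A letter joins backwards only if the previous letter can connect forwards.
        joins_prev = i > 0 and word[i - 1] not in NON_CONNECTING
        # A letter joins forwards only if it is a connecting letter and not word-final.
        joins_next = i < len(word) - 1 and letter not in NON_CONNECTING
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_prev:
            forms.append("final")
        elif joins_next:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms


print(positional_forms("كتاب"))  # ['initial', 'medial', 'final', 'isolated']
```

In كتاب the ʾalif takes its final form and forces the last letter into its isolated form, as described for non-connecting letters.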
Final | Medial | Initial | Isolated | IPA | Name | Romanization |
---|---|---|---|---|---|---|
ـا | ـا | ا | ا | | أَلِف [ʔalif] ʾalif | ʾ / ʔ, ā |
ـب | ـبـ | بـ | ب | /b/ | بَاء [baːʔ] bāʾ | b |
ـت | ـتـ | تـ | ت | /t/ | تَاء [taːʔ] tāʾ | t |
ـث | ـثـ | ثـ | ث | /θ/ | ثَاء [θaːʔ] thāʾ | ṯ, th |
ـج | ـجـ | جـ | ج | /d͡ʒ/ | جِيم [d͡ʒiːm] jīm | j |
ـح | ـحـ | حـ | ح | /ħ/ | حَاء [ħaːʔ] ḥāʾ | ḥ |
ـخ | ـخـ | خـ | خ | /x/ | خَاء [xaːʔ] khāʾ | ḵ, kh |
ـد | ـد | د | د | /d/ | دَال [daːl] dāl | d |
ـذ | ـذ | ذ | ذ | /ð/ | ذَال [ðaːl] dhāl | ḏ, dh |
ـر | ـر | ر | ر | /r/ | رَاء [raːʔ] rāʾ | r |
ـز | ـز | ز | ز | /z/ | زَاي [zaːj] zāy | z |
ـس | ـسـ | سـ | س | /s/ | سِين [siːn] sīn | s |
ـش | ـشـ | شـ | ش | /ʃ/ | شِين [ʃiːn] shīn | š, sh |
ـص | ـصـ | صـ | ص | /sˤ/ | صَاد [sˤaːd] ṣād | ṣ |
ـض | ـضـ | ضـ | ض | /dˤ/ | ضَاد [dˤaːd] ḍād | ḍ |
ـط | ـطـ | طـ | ط | /tˤ/ | طَاء [tˤaːʔ] ṭāʾ | ṭ |
ـظ | ـظـ | ظـ | ظ | /ðˤ/ | ظَاء [ðˤaːʔ] ẓāʾ | ẓ |
ـع | ـعـ | عـ | ع | /ʕ/ | عَيْن [ʕajn] ʿayn | ʿ, ʕ |
ـغ | ـغـ | غـ | غ | /ɣ/ | غَيْن [ɣajn] ghayn | ḡ, gh |
ـف | ـفـ | فـ | ف | /f/ | فَاء [faːʔ] fāʾ | f |
ـق | ـقـ | قـ | ق | /q/ | قَاف [qaːf] qāf | q |
ـك | ـكـ | كـ | ك | /k/ | كَاف [kaːf] kāf | k |
ـل | ـلـ | لـ | ل | /l/ | لَام [laːm] lām | l |
ـم | ـمـ | مـ | م | /m/ | مِيم [miːm] mīm | m |
ـن | ـنـ | نـ | ن | /n/ | نُون [nuːn] nūn | n |
ـه | ـهـ | هـ | ه | /h/ | هَاء [haːʔ] hāʾ | h |
ـو | ـو | و | و | /w/, /uː/ | وَاو [waːw] wāw | w, ū |
ـي | ـيـ | يـ | ي | /j/, /iː/ | يَاء [jaːʔ] yāʾ | y, ī |
All letters except for ʾalif (ا), wāw (و) and yāʾ (ي) denote a single consonant sound. The letters wāw (و) and yāʾ (ي) can denote either the consonants /w/ and /j/ or the long vowels /uː/ and /iː/, respectively. The letter ʾalif (ا) does not have its own pronunciation. Instead, it can represent a few different sounds depending on how it is used.
Note: Spelling Variants of ʾalif (ا)
The letter ʾalif (ا) has two additional spelling variants:
- Dagger ʾAlif (الألف الخنجرية, al-ʾAlif al-Khanjariyyah) is a short diacritic (ـٰ) used to indicate that the consonant on which it is placed is followed by the long vowel /aː/. This is used only in a few words such as الله (Allāh) in the Quran. Outside of the Quran, it is rarely used.
- ʾAlif Maqṣūrah (الأَلِف المَقْصُورَة) occurs only in word-final position and looks like a dotless yāʾ (ى). It is pronounced as the long vowel /aː/ and is used only in specific situations.
Note: تَاء مَرْبُوطَة (tāʾ marbūṭa)
There is a special variant of the letter tāʾ known as تَاء مَرْبُوطَة (tāʾ marbūṭa). It can only occur at the end of words. It is written as ﺔ when linked to the previous letter and as ة when unlinked. The pronunciation of tāʾ marbūṭa varies:
- In full form, it is pronounced as /t/ (e.g. سَيَّارَةٌ, sayyāratun).
- In pause form, it is pronounced as /h/ (e.g. سَيَّارَةْ, sayyārah). However, if the word is the first word of an إضافة (ʾiḍāfah, a genitive construct), then it is again pronounced as /t/.
Short vowels are almost never written in practice. Speakers have to rely on contextual cues to determine which vowels to pronounce. However, there is a system of diacritics, known as vocalization, which is used to indicate short vowels in specific contexts.
Vocalization
The vocalization diacritics mark the short vowel that follows a letter, or the absence of any vowel:
Diacritic | IPA | Name |
---|---|---|
ـَ | /a/ | فَتْحَة fatḥah |
ـِ | /i/ | كَسْرَة kasrah |
ـُ | /u/ | ضَمَّة ḍammah |
ــْـ | ∅ (no vowel) | سُكُون sukūn |
When such a diacritic is placed on any letter except ʾalif (ا), wāw (و) and yāʾ (ي), it indicates that the consonant should be pronounced followed by the respective short vowel. When placed on wāw (و) or yāʾ (ي), the diacritic indicates /w/ or /j/ + the short vowel, respectively.
In vocalized texts, a long vowel is written as ʾalif (ا), wāw (و) or yāʾ (ي) without a diacritic, preceded by a letter that carries the matching short-vowel diacritic:
- The long vowel /aː/ is indicated by ا and a fatḥah on the letter before ا.
- The long vowel /uː/ is indicated by و and a ḍammah on the letter before و.
- The long vowel /iː/ is indicated by ي and a kasrah on the letter before ي.
The sukūn is a special diacritic which indicates that there is no vowel after the letter.
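As an illustration of how these diacritics combine, here is a rough Python sketch that turns a fully vocalized word into a broad IPA string. The function name `transcribe` and its tables are hypothetical. The sketch assumes every consonant carries a fatḥah, kasrah, ḍammah or sukūn (a bare word-final consonant is also accepted), and it does not handle hamza, shaddah, tanwīn or any other use of ʾalif.

```python
FATHAH, DAMMAH, KASRAH, SUKUN = "\u064E", "\u064F", "\u0650", "\u0652"
SHORT_VOWELS = {FATHAH: "a", DAMMAH: "u", KASRAH: "i"}
# The letter that lengthens each short vowel when it carries no diacritic itself.
LENGTHENERS = {"ا": "a", "و": "u", "ي": "i"}
CONSONANTS = {
    "ب": "b", "ت": "t", "ث": "θ", "ج": "d͡ʒ", "ح": "ħ", "خ": "x", "د": "d",
    "ذ": "ð", "ر": "r", "ز": "z", "س": "s", "ش": "ʃ", "ص": "sˤ", "ض": "dˤ",
    "ط": "tˤ", "ظ": "ðˤ", "ع": "ʕ", "غ": "ɣ", "ف": "f", "ق": "q", "ك": "k",
    "ل": "l", "م": "m", "ن": "n", "ه": "h", "و": "w", "ي": "j",
}


def transcribe(word: str) -> str:
    """Transcribe a vocalized word (letters followed by at most one diacritic)."""
    out = []
    prev_vowel = None  # short vowel written on the previous letter, if any
    i = 0
    while i < len(word):
        letter = word[i]
        has_mark = i + 1 < len(word) and word[i + 1] in (FATHAH, DAMMAH, KASRAH, SUKUN)
        mark = word[i + 1] if has_mark else None
        i += 2 if has_mark else 1
        # A bare ا / و / ي after the matching short vowel lengthens that vowel.
        if mark is None and prev_vowel is not None and LENGTHENERS.get(letter) == prev_vowel:
            out.append("ː")
            prev_vowel = None
            continue
        out.append(CONSONANTS[letter])  # raises KeyError for unsupported letters
        prev_vowel = SHORT_VOWELS.get(mark)  # None for sukūn or no diacritic
        if prev_vowel is not None:
            out.append(prev_vowel)
    return "".join(out)


kitab = "ك" + KASRAH + "ت" + FATHAH + "ا" + "ب"
print(transcribe(kitab))  # kitaːb
```

A yāʾ or wāw that carries a sukūn is treated as the consonant /j/ or /w/, so a word such as بَيْت comes out as `bajt` rather than with a long vowel.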
Note: Use of Vocalization
Vocalization is rarely used in practice. Fully vocalized text is found mainly in the Quran and in material for teaching the language. Elsewhere, individual diacritics are occasionally added to resolve ambiguity.
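Because the same word may appear with or without diacritics, it can be useful to reduce vocalized text to its bare letters before comparing strings. The following Python sketch does this under the assumption that the diacritics are encoded as the Unicode combining marks U+064B to U+0652 plus the dagger ʾalif U+0670; the function name `strip_vocalization` is hypothetical.

```python
DAGGER_ALIF = "\u0670"


def is_vocalization_mark(ch: str) -> bool:
    """True for tanwīn, short-vowel marks, shaddah, sukūn and dagger ʾalif."""
    return "\u064B" <= ch <= "\u0652" or ch == DAGGER_ALIF


def strip_vocalization(text: str) -> str:
    """Remove vocalization marks and keep the base letters unchanged."""
    return "".join(ch for ch in text if not is_vocalization_mark(ch))


vocalized = "ك" + "\u0650" + "ت" + "\u064E" + "ا" + "ب"
print(strip_vocalization(vocalized) == "كتاب")  # True
```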
Tanwīn
Three additional diacritics known as تَنوِين (tanwīn) may occur on word-final letters. They indicate that the letter should be pronounced followed by a particular short vowel + /n/. The use of tanwīn is governed by the grammatical rules of nunation.
Diacritic | IPA | Name |
---|---|---|
ـً | /an/ | فَتْحَتَيْنِ fatḥatayn |
ـٍ | /in/ | كَسْرَتَيْنِ kasratayn |
ـٌ | /un/ | ضَمَّتَيْنِ ḍammatayn |
Hamza
The symbol ء is known as هَمْزة (hamza) and is used to indicate a glottal stop /ʔ/. It can appear either on its own like a separate letter or as a diacritic on a “seat”: one of ʾalif (ا), wāw (و) and yāʾ (ي). When used as seats for hamza, the letters ʾalif (ا), wāw (و) and yāʾ (ي) themselves do not denote any sound. Furthermore, yāʾ (ي) loses its two dots when acting as a seat for hamza.
When hamza is the first consonant in a word, it is always written either above or below an ʾalif seat:
- It is written above ʾalif (أ) when the following vowel is /a/ (أَ) or /u/ (أُ) - these are pronounced /ʔa/ and /ʔu/, respectively.
- It is written below ʾalif (إ) when the following vowel is /i/ (إِ) - this is pronounced /ʔi/.
It is also common to omit a word-initial hamza symbol and only write the ʾalif seat.
Note: Weak Hamza (هَمْزَة الوَصْل, hamzat al-waṣl)
In certain situations, the glottal stop /ʔ/ at a word-initial position is pronounced only if the word is at the beginning of a sentence and it is otherwise dropped completely. Such a glottal stop /ʔ/ is known as a weak hamza (هَمْزَة الوَصْل, hamzat al-waṣl) and is indicated by an ʾalif seat without a hamza symbol. In vocalized texts, it is indicated by ٱ (an ʾalif with a waṣlah sign).
A medial hamza is written on the line as an unlinked letter when:
- denoting /ʔa(ː)/ preceded by an ʾalif denoting the long vowel /aː/ - قِرَاءَة (qirāʾa).
- denoting /ʔa(ː)/ or /ʔu(ː)/ preceded by a wāw denoting the long vowel /uː/ or by a wāw without a subsequent vowel - مُرُوءَة (murūʾa), ضَوْءُهُ (ḍawʾuhu).
In all other situations, a medial hamza is written on one of the three seats. The seat is determined by the vowels surrounding the hamza with precedence according to the order /i(ː)/ > /u(ː)/ > /a(ː)/.
- If there is an /i(ː)/ either preceding or following /ʔ/, then hamza takes yāʾ as a seat - سُئِلَ (suʾila), بِئْر (biʾr).
- If there is no /i(ː)/ surrounding the /ʔ/ but there is /u(ː)/ either before or after it, then hamza takes wāw as a seat - سُؤَال (suʾāl), رُؤُوس (ruʾūs).
- If there is neither /i(ː)/ nor /u(ː)/ surrounding the /ʔ/, but there is /a(ː)/ either before or after it, then hamza takes ʾalif as a seat - سَأَلَ (saʾala), رَأْس (raʾs).
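The medial-hamza rules above can be written as a small decision function. This Python sketch only restates the rules given in this section. The name `medial_hamza_seat` and the way the context is encoded are assumptions made for illustration: `before` is the sound just before the hamza (a short or long vowel, `"w"` for a wāw without a following vowel, or `""` for any other vowelless consonant), and `after` is the vowel following the hamza (`""` if there is none).

```python
def medial_hamza_seat(before: str, after: str) -> str:
    """Return 'line', 'yāʾ', 'wāw' or 'ʾalif' for a word-medial hamza."""
    after_quality = after[:1]  # 'aː' -> 'a', '' -> ''
    # Cases where hamza is written on the line as an unlinked letter.
    if before == "aː" and after_quality == "a":
        return "line"
    if before in ("uː", "w") and after_quality in ("a", "u"):
        return "line"
    # Otherwise the surrounding vowels decide, with precedence i > u > a.
    qualities = {before[:1], after_quality}
    for vowel, seat in (("i", "yāʾ"), ("u", "wāw"), ("a", "ʾalif")):
        if vowel in qualities:
            return seat
    raise ValueError("no vowel next to the hamza; the rules above do not apply")


print(medial_hamza_seat("u", "i"))   # yāʾ, as in suʾila
print(medial_hamza_seat("u", "aː"))  # wāw, as in suʾāl
print(medial_hamza_seat("a", ""))    # ʾalif, as in raʾs
```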
A final hamza is written on the line as an unlinked letter when preceded by any long vowel or any consonant - عِبْء (ʿibʾ), سَمَاء (samāʾ), هُدُوء (hudūʾ), شَيْء (shayʾ). It is written on a seat when preceded by a short vowel. The vowel determines which seat to take:
- /a/ → ʾalif, such as in قَرَأَ (qaraʾa);
- /u/ → wāw, such as in تَكَافُؤ (takāfuʾ);
- /i/ → yāʾ, such as in شَاطِئ (shāṭiʾ).
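The rule for a final hamza is simpler and fits in a lookup table. In this Python sketch the name `final_hamza_seat` is hypothetical, and `before` is assumed to be the sound immediately preceding the hamza, encoded as a short vowel (`"a"`, `"i"`, `"u"`), a long vowel (for example `"aː"`) or `""` for a consonant.

```python
SEAT_AFTER_SHORT_VOWEL = {"a": "ʾalif", "u": "wāw", "i": "yāʾ"}


def final_hamza_seat(before: str) -> str:
    """Return the seat of a word-final hamza, or 'line' if it stands alone."""
    # Only a preceding short vowel gives the hamza a seat; long vowels and
    # consonants leave it on the line as an unlinked letter.
    return SEAT_AFTER_SHORT_VOWEL.get(before, "line")


print(final_hamza_seat("a"))   # ʾalif, as in qaraʾa
print(final_hamza_seat("aː"))  # line, as in samāʾ
```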
Note: ألِف مدَّة (ʾalif madda)
A special symbol (آ) known as ألِف مدَّة (ʾalif madda) is used to denote the combination /ʔaː/ of a glottal stop /ʔ/ followed by the long vowel /aː/.
Gemination
Gemination or doubling of a consonant is indicated not by writing the letter for it twice but rather by placing a special diacritic ــّـ called شَدَّة (shaddah) on top of it.
Diacritic | Name (Arabic Script) | Name (Romanized) | Meaning |
---|---|---|---|
ــّـ | شَدَّة | shaddah | Doubling of the consonant. |
A shaddah may be combined with another diacritic for indicating a short vowel.
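To make the effect of shaddah explicit, the following Python sketch rewrites each letter carrying a shaddah as the same letter written twice, the first copy with a sukūn and the second with whatever vowel diacritic accompanied the shaddah. The function name `expand_shaddah` is hypothetical. The sketch assumes diacritics are encoded as Unicode combining marks in the range U+064B to U+0652, and it accepts the shaddah either before or after the vowel mark.

```python
SHADDAH, SUKUN = "\u0651", "\u0652"


def is_mark(ch: str) -> bool:
    """True for tanwīn, short-vowel marks, shaddah and sukūn."""
    return "\u064B" <= ch <= "\u0652"


def expand_shaddah(text: str) -> str:
    """Spell out geminated consonants as two letters instead of letter + shaddah."""
    out = []
    i = 0
    while i < len(text):
        base = text[i]
        i += 1
        # Collect every diacritic attached to this base letter.
        marks = []
        while i < len(text) and is_mark(text[i]):
            marks.append(text[i])
            i += 1
        if SHADDAH in marks:
            marks.remove(SHADDAH)
            out.append(base + SUKUN + base)  # first copy is vowelless
        else:
            out.append(base)
        out.extend(marks)  # remaining marks belong to the (second) letter
    return "".join(out)


FATHAH = "\u064E"
shaddah_word = "ش" + FATHAH + "د" + SHADDAH + FATHAH + "ة"
print(expand_shaddah(shaddah_word) == "ش" + FATHAH + "د" + SUKUN + "د" + FATHAH + "ة")  # True
```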
Ligatures
Ligatures are quite common in written Arabic. Although there are many optional ligatures, the ligature between ل and ا is compulsory.
Final | Medial | Initial | Isolated | Letters | IPA |
---|---|---|---|---|---|
ﻼ | ﻼ | ﻻ | ﻻ | ل + ا | /laː/ |
Phonology
Consonants
Modern Standard Arabic has 28 consonant phonemes.
Manner | Voicing | Labial | Dental | Denti-Alveolar (Plain) | Denti-Alveolar (Emphatic) | Post-Alveolar / Palatal | Velar | Uvular | Pharyngeal | Glottal |
---|---|---|---|---|---|---|---|---|---|---|
Nasal | | m م | | n ن | | | | | | |
Plosive | Voiceless | | | t ت | tˤ ط | | k ك | q ق | | ʔ ء |
Plosive | Voiced | b ب | | d د | dˤ ض | d͡ʒ ج | | | | |
Fricative | Voiceless | f ف | θ ث | s س | sˤ ص | ʃ ش | x خ | | ħ ح | h هـ |
Fricative | Voiced | | ð ذ | z ز | ðˤ ظ | | ɣ غ | | ʕ ع | |
Trill | | | | r ر | | | | | | |
Approximant | | | | l ل | | j ي | w و | | | |
The consonants /sˤ/, /dˤ/, /tˤ/, /ðˤ/ are known as emphatic consonants.
Vowels
Arabic has only six vowel phonemes and they manifest as three pairs of long and short sounds.
Height | Front | Central | Back |
---|---|---|---|
Close | /i(ː)/ | | /u(ː)/ |
Mid | | | |
Open | | /a(ː)/ | |
The phoneme /a/ is realized as [æ] when preceded by most consonants:
- the labial consonants /m/, /b/ and /f/;
- the non-emphatic coronal consonants /θ/, /ð/, /n/, /t/, /d/, /s/, /z/, /l/, /ʃ/ and /d͡ʒ/ (but not /r/);
- the glottal consonants /h/ and /ʔ/;
- /j/, /k/ and /w/.
The phoneme /a/ is realized as [ɑ] when preceded by the emphatic consonants /sˤ/, /dˤ/, /tˤ/ and /ðˤ/.
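The two environments listed above can be encoded as a lookup. In this Python sketch the name `realize_a` is hypothetical, and consonants are assumed to be given as the IPA strings used in this section. For consonants that the lists above do not mention (such as /r/ or /q/) the function returns `None` instead of guessing.

```python
from typing import Optional

# Consonants after which /a/ is realized as [æ], as listed above.
FRONTING = {
    "m", "b", "f",
    "θ", "ð", "n", "t", "d", "s", "z", "l", "ʃ", "d͡ʒ",
    "h", "ʔ",
    "j", "k", "w",
}
# Emphatic consonants, after which /a/ is realized as [ɑ].
EMPHATIC = {"sˤ", "dˤ", "tˤ", "ðˤ"}


def realize_a(preceding: str) -> Optional[str]:
    """Return the allophone of /a/ after the given consonant, if it is specified."""
    if preceding in EMPHATIC:
        return "ɑ"
    if preceding in FRONTING:
        return "æ"
    return None  # not covered by the description above


print(realize_a("k"))   # æ
print(realize_a("tˤ"))  # ɑ
```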
Note: Full Form and Pause Form
There are two styles of pronouncing final short vowels of words in Modern Standard Arabic:
- When a word is pronounced in full form (وَصْل, waṣl), its final short vowel (if any) is always pronounced (e.g. كَتَبَ, kataba).
- When a word is pronounced in pause form (وَقْف, waqf), its final short vowel (if any) is not pronounced (e.g. كَتَبْ, katab).
Using full form for every word is rare. It is usually restricted to Quranic recitation, poetry and highly formal speeches, and is also used when teaching the language. In formal contexts, pause form is used for the final word of each sentence and wherever there is a natural break for breathing. All other words are in full form. In informal contexts, pause form is used for nearly all words.
Pause form also affects the pronunciation of tāʾ marbūṭa.
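The basic effect of pause form on a final short vowel can be sketched as follows. The function name `pause_form` is hypothetical, a word is assumed to be given as a list of phoneme strings with long vowels written as one element (for example `"aː"`), and the special treatment of tāʾ marbūṭa is deliberately left out.

```python
SHORT_VOWELS = {"a", "i", "u"}


def pause_form(phonemes: list[str]) -> list[str]:
    """Drop a word-final short vowel; leave everything else unchanged."""
    if phonemes and phonemes[-1] in SHORT_VOWELS:
        return phonemes[:-1]
    return list(phonemes)


print(pause_form(["k", "a", "t", "a", "b", "a"]))  # ['k', 'a', 't', 'a', 'b']
```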
Syllabification
Syllable structure in Modern Standard Arabic is fairly restricted because syllables are limited to one of five forms:
- CV (light)
- CVV (heavy)
- CVC (heavy)
- CVVC (super-heavy)
- CVCC (super-heavy)
Syllables can never begin with a vowel. Even though it might sometimes seem like a syllable starts with a vowel, such syllables actually start with a glottal stop.
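Since every syllable begins with exactly one consonant, a word can be split into syllables by starting a new syllable at the consonant before each vowel. The Python sketch below illustrates this. The names `syllabify` and `shape` are hypothetical, and the input is assumed to be a list of phoneme strings in which a long vowel is a single element such as `"aː"`. The sketch does not check that codas stay within the five permitted shapes.

```python
VOWELS = {"a", "i", "u", "aː", "iː", "uː"}


def syllabify(phonemes: list[str]) -> list[list[str]]:
    """Split a word so that each syllable starts with the consonant before a vowel."""
    nuclei = [i for i, p in enumerate(phonemes) if p in VOWELS]
    if not nuclei:
        raise ValueError("a word needs at least one vowel")
    for i in nuclei:
        if i == 0 or phonemes[i - 1] in VOWELS:
            raise ValueError("every syllable must begin with a consonant")
    if nuclei[0] != 1:
        raise ValueError("a word cannot begin with a consonant cluster")
    # Each syllable runs from its onset consonant up to the next onset.
    bounds = [i - 1 for i in nuclei] + [len(phonemes)]
    return [phonemes[start:end] for start, end in zip(bounds, bounds[1:])]


def shape(syllable: list[str]) -> str:
    """Describe a syllable as a C/V pattern, writing long vowels as VV."""
    return "".join(
        ("VV" if p.endswith("ː") else "V") if p in VOWELS else "C"
        for p in syllable
    )


syllables = syllabify(["k", "i", "t", "aː", "b"])
print([shape(s) for s in syllables])  # ['CV', 'CVVC']
```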
Accent
Modern Standard Arabic has a stress accent system, similar to the one in English, where an accented (or stressed) syllable is pronounced slightly longer, more clearly and with a bit more volume. For the most part, stress is predictable and follows certain rules, although there are some variations between dialects. These rules are given below in order of decreasing precedence:
- If the final syllable is CVVC or CVCC, then the stress falls on it (e.g. كِتَاب ki-TĀB, مُسْتَقِلّ mus-ta-QILL).
- If the second-to-last syllable is CVV or CVC, then the stress falls on it.
- If the second-to-last syllable is CV, then the stress falls on the third-to-last syllable (e.g. كَتَبَ KA-ta-ba).
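The three stress rules can be applied in order as in the following Python sketch. The function name `stressed_syllable` is hypothetical, and a word is assumed to be given as a list of syllable shapes such as `["CV", "CVVC"]`. Two cases are not covered by the rules above and are handled by assumption: a word that is too short to have a third-to-last syllable is stressed on its first syllable, and a super-heavy second-to-last syllable is treated like a heavy one.

```python
def stressed_syllable(shapes: list[str]) -> int:
    """Return the 0-based index of the stressed syllable."""
    if not shapes:
        raise ValueError("a word needs at least one syllable")
    last = len(shapes) - 1
    # Rule 1: a super-heavy final syllable takes the stress.
    if shapes[last] in ("CVVC", "CVCC"):
        return last
    if last == 0:
        return 0  # assumption: a monosyllable is stressed on its only syllable
    # Rule 2: a heavy second-to-last syllable takes the stress
    # (assumption: a super-heavy one is treated the same way).
    if shapes[last - 1] != "CV":
        return last - 1
    # Rule 3: otherwise the third-to-last syllable, or the first one
    # if the word has only two syllables (assumption).
    return max(last - 2, 0)


print(stressed_syllable(["CV", "CVVC"]))         # 1  (ki-TĀB)
print(stressed_syllable(["CVC", "CV", "CVCC"]))  # 2  (mus-ta-QILL)
print(stressed_syllable(["CV", "CV", "CV"]))     # 0  (KA-ta-ba)
```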