Orthography#

The [[Writing Systems|writing system]] used by the [[index|Arabic]] language is an [[Writing Systems#Abjads|abjad]] consisting of 28 letters. It is a cursive script written from right to left and top to bottom. All letters have between two and four forms depending on their position in a word so as to be able to connect with the letters surrounding them:
- The isolated form is used when the letter connects to neither the letter before it nor the letter after it.
- The initial form is used at the start of a word and connects with the letter after it.
- The medial form is used in the middle of a word and connects with the letters before and after it.
- The final form is used at the end of a word and connects with the letter preceding it.

The letters ʾalif (ا), dāl (د), dhāl (ذ), rāʾ (ر), zāy (ز), and wāw (و) are known as non-connecting letters. They have only two forms (isolated and final) and do not connect to a subsequent letter in the word. Any letter that follows them must begin a new connection, taking its initial form (or its isolated form if it is the last letter of the word).
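The connection rules above can be sketched as a small routine. This is only an illustration, assuming the input is an unvocalized word made up of the 28 basic letters (no hamza seats, tāʾ marbūṭa or diacritics); the names `NON_CONNECTING` and `positional_forms` are made up for this example.

```python
# Letters that never connect to the following letter (the six listed above).
NON_CONNECTING = set("ادذرزو")

def positional_forms(word: str) -> list[str]:
    """Label each letter of an unvocalized word with its positional form."""
    forms = []
    for i, letter in enumerate(word):
        # A letter joins the previous one only if that letter connects forward.
        joins_prev = i > 0 and word[i - 1] not in NON_CONNECTING
        # It joins the next one only if there is one and it connects forward itself.
        joins_next = i < len(word) - 1 and letter not in NON_CONNECTING
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_prev:
            forms.append("final")
        elif joins_next:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms
```

For كتاب this yields initial, medial, final, isolated: the ʾalif takes its final form and forces the following ب into its isolated form.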

The Arabic Script

| Final | Medial | Initial | Isolated | IPA | Name | Romanization |
|---|---|---|---|---|---|---|
| ـا | | | ا | | أَلِف [ʔalif] ʾalif | ʾ / ʔ, ā |
| ـب | ـبـ | بـ | ب | /b/ | بَاء [baːʔ] bāʾ | b |
| ـت | ـتـ | تـ | ت | /t/ | تَاء [taːʔ] tāʾ | t |
| ـث | ـثـ | ثـ | ث | /θ/ | ثَاء [θaːʔ] thāʾ | ṯ / th |
| ـج | ـجـ | جـ | ج | /d͡ʒ/ | جِيم [d͡ʒiːm] jīm | j |
| ـح | ـحـ | حـ | ح | /ħ/ | حَاء [ħaːʔ] ḥāʾ | |
| ـخ | ـخـ | خـ | خ | /x/ | خَاء [xaːʔ] khāʾ | ḵ / kh |
| ـد | | | د | /d/ | دَال [daːl] dāl | d |
| ـذ | | | ذ | /ð/ | ذَال [ðaːl] dhāl | ḏ / dh |
| ـر | | | ر | /r/ | رَاء [raːʔ] rāʾ | r |
| ـز | | | ز | /z/ | زَاي [zaːj] zāy | z |
| ـس | ـسـ | سـ | س | /s/ | سِين [siːn] sīn | s |
| ـش | ـشـ | شـ | ش | /ʃ/ | شِين [ʃiːn] shīn | š / sh |
| ـص | ـصـ | صـ | ص | /sˤ/ | صَاد [sˤaːd] ṣād | |
| ـض | ـضـ | ضـ | ض | /dˤ/ | ضَاد [dˤaːd] ḍād | |
| ـط | ـطـ | طـ | ط | /tˤ/ | طَاء [tˤaːʔ] ṭāʾ | |
| ـظ | ـظـ | ظـ | ظ | /ðˤ/ | ظَاء [ðˤaːʔ] ẓāʾ | |
| ـع | ـعـ | عـ | ع | /ʕ/ | عَيْن [ʕajn] ʿayn | ʻ / ʕ |
| ـغ | ـغـ | غـ | غ | /ɣ/ | غَيْن [ɣajn] ghayn | ḡ / gh |
| ـف | ـفـ | فـ | ف | /f/ | فَاء [faːʔ] fāʾ | f |
| ـق | ـقـ | قـ | ق | /q/ | قَاف [qaːf] qāf | q |
| ـك | ـكـ | كـ | ك | /k/ | كَاف [kaːf] kāf | k |
| ـل | ـلـ | لـ | ل | /l/ | لَام [laːm] lām | l |
| ـم | ـمـ | مـ | م | /m/ | مِيم [miːm] mīm | m |
| ـن | ـنـ | نـ | ن | /n/ | نُون [nuːn] nūn | n |
| ـه | ـهـ | هـ | ه | /h/ | هَاء [haːʔ] hāʾ | h |
| ـو | | | و | /w/, /uː/ | وَاو [waːw] wāw | w, ū |
| ـي | ـيـ | يـ | ي | /j/, /iː/ | يَاء [jaːʔ] yāʾ | y, ī |

All letters except for ʾalif (ا), wāw (و) and yāʾ (ي) denote a single [[Orthography and Phonology#Consonants|consonant]] sound. The letters wāw (و) and yāʾ (ي) can denote either the [[Orthography and Phonology#Consonants|consonants]] /w/ and /j/ or the long [[Orthography and Phonology#Vowels|vowels]] /uː/ and /iː/, respectively. The letter ʾalif (ا) does not have its own pronunciation. Instead, it can represent a few different sounds depending on how it is used.

Note: Spelling Variants of ʾalif (ا)

The letter ʾalif (ا) has two additional spelling variants:

  • Dagger ʾAlif (الألف الخنجرية, al-ʾAlif al-Khanjariyyah) is a short diacritic (ـٰ) used to indicate that the [[Orthography and Phonology#Consonants|consonant]] on which it is placed is followed by the [[Orthography and Phonology#Vowels|long vowel]] /aː/. It occurs in only a few words, such as الله (Allāh), and is rarely written outside the Quran.
  • ʾAlif Maqṣūrah (الأَلِف المَقْصُورَة) occurs only in word-final position and looks like a dotless yāʾ (ى). It is pronounced as the [[Orthography and Phonology#Vowels|long vowel]] /aː/ and is used only in specific situations.

Note: تَاء مَرْبُوطَة (tāʾ marbūṭa)

There is a special variant of the letter tāʾ known as تَاء مَرْبُوطَة (tāʾ marbūṭa). It can only occur at the end of words. It is written as ﺔ when linked to the previous letter and as ة when unlinked. The pronunciation of tāʾ marbūṭa varies:

  • In [[Orthography and Phonology#Full Form and Pause Form|full form]], it is pronounced as /t/ (e.g. سَيَّارَةٌ, sayyāratun).
  • In [[Orthography and Phonology#Full Form and Pause Form|pause form]], it is pronounced as /h/ (e.g. سَيَّارَةْ sayyārah). However, if the word is the first word of an [[TODO|إضافة]], then it is again pronounced as /t/.

Short [[Orthography and Phonology#Vowels|vowels]] are almost never written in practice. Readers have to rely on contextual cues to determine which vowels to pronounce. However, there is a system of diacritics, known as [[Orthography and Phonology#Vocalization|vocalization]], which is used to indicate short [[Orthography and Phonology#Vowels|vowels]] in specific contexts.

Vocalization#

A system of diacritics known as vocalization is used to show the pronunciation of [[Orthography and Phonology#Vowels|short vowels]]:

| Diacritic | IPA | Name |
|---|---|---|
| ـَ | /a/ | فَتْحَة fatḥah |
| ـِ | /i/ | كَسْرَة kasrah |
| ـُ | /u/ | ضَمَّة ḍammah |
| ــْـ | ∅ (no vowel) | سُكُون sukūn |

When such a diacritic is placed on any letter except ʾalif (ا), wāw (و) and yāʾ (ي), it indicates that the [[Orthography and Phonology#Consonants|consonant]] should be pronounced followed by the respective [[Orthography and Phonology#Vowels|short vowel]]. When placed on wāw (و) or yāʾ (ي), the diacritic indicates /w/ or /j/ + the [[Orthography and Phonology#Vowels|short vowel]], respectively.

In vocalized texts, [[Orthography and Phonology#Vowels|long vowels]] are indicated by ʾalif (ا), wāw (و) or yāʾ (ي) written without a diacritic, combined with the matching short-vowel diacritic on the preceding letter:
- The [[Orthography and Phonology#Vowels|long vowel]] /aː/ is indicated by ا and a fatḥah on the letter before ا;
- The [[Orthography and Phonology#Vowels|long vowel]] /uː/ is indicated by و and a ḍammah on the letter before و;
- The [[Orthography and Phonology#Vowels|long vowel]] /iː/ is indicated by ي and a kasrah on the letter before ي.

The sukūn is a special diacritic which indicates that there is no [[Orthography and Phonology#Vowels|vowel]] after the letter.
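The reading rules for short vowels, long vowels and sukūn can be sketched as a toy romanizer. This is a rough illustration, not a full transliteration scheme: the consonant table is deliberately tiny, the function name `romanize` is made up here, and hamza, shaddah and tanwīn are not handled.

```python
# Short-vowel diacritics and sukūn (Unicode Arabic block).
SHORT_VOWELS = {"\u064E": "a", "\u0650": "i", "\u064F": "u"}  # fatḥah, kasrah, ḍammah
SUKUN = "\u0652"
# A short vowel followed by its matching bare letter (ا, و, ي) spells a long vowel.
LONG_VOWELS = {("a", "\u0627"): "\u0101",   # ā
               ("u", "\u0648"): "\u016b",   # ū
               ("i", "\u064A"): "\u012b"}   # ī
# Assumption: a small demo table (k, t, b, n, r, m, w, y), not the full alphabet.
CONSONANTS = {"\u0643": "k", "\u062A": "t", "\u0628": "b", "\u0646": "n",
              "\u0631": "r", "\u0645": "m", "\u0648": "w", "\u064A": "y"}

def romanize(word: str) -> str:
    """Romanize a fully vocalized word built from the demo letters."""
    out = []
    i = 0
    while i < len(word):
        ch = word[i]
        if ch in SHORT_VOWELS:
            vowel = SHORT_VOWELS[ch]
            nxt = word[i + 1] if i + 1 < len(word) else ""
            after = word[i + 2] if i + 2 < len(word) else ""
            # The next letter lengthens the vowel only if it carries no diacritic itself.
            if (vowel, nxt) in LONG_VOWELS and after not in SHORT_VOWELS and after != SUKUN:
                out.append(LONG_VOWELS[(vowel, nxt)])
                i += 2
                continue
            out.append(vowel)
        elif ch != SUKUN:
            out.append(CONSONANTS[ch])  # raises KeyError for letters outside the demo table
        i += 1
    return "".join(out)
```

With this sketch, كِتَاب comes out as kitāb (the bare ا lengthens the fatḥah), while يَوْم comes out as yawm because the و carries a sukūn and is therefore read as the consonant /w/.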

Note: Use of Vocalization

Vocalization is rarely used in practice. It appears mainly in the Quran and in language teaching, and occasionally to resolve ambiguity.

Tanwīn#

Three additional diacritics known as تَنوِين (tanwīn) may occur on word-final letters. They indicate that the letter should be pronounced followed by a particular [[Orthography and Phonology#Vowels|short vowel]] + /n/. The use of tanwīn is governed by the grammatical rules of [[TODO|nunation]].

| Diacritic | IPA | Name |
|---|---|---|
| ـً | /an/ | فَتْحَتَيْنِ fatḥatayn |
| ـٍ | /in/ | كَسْرَتَيْنِ kasratayn |
| ـٌ | /un/ | ضَمَّتَيْنِ ḍammatayn |

Hamza#

The symbol ء is known as هَمْزة (hamza) and is used to indicate a [[TODO|glottal stop]] /ʔ/. It can appear either on its own like a separate letter or as a diacritic on a "seat": one of ʾalif (ا), wāw (و) and yāʾ (ي). When used as seats for hamza, the letters ʾalif (ا), wāw (و) and yāʾ (ي) themselves do not denote any sound. Furthermore, yāʾ (ي) loses its two dots when acting as a seat for hamza.

When hamza is the first [[Orthography and Phonology#Consonants|consonant]] in a word, it is always written either above or below an ʾalif seat:
- It is written above ʾalif (أ) when the following vowel is /a/ (أَ) or /u/ (أُ) - these are pronounced /ʔa/ and /ʔu/, respectively.
- It is written below ʾalif (إ) when the following vowel is /i/ (إِ) - this is pronounced /ʔi/.

It is also common to omit a word-initial hamza symbol and only write the ʾalif seat.

Note: Weak Hamza (هَمْزَة الوَصْل, hamzat al-waṣl)

In certain situations, the [[TODO|glottal stop]] /ʔ/ at a word-initial position is pronounced only if the word is at the beginning of a sentence and it is otherwise dropped completely. Such a [[TODO|glottal stop]] /ʔ/ is known as a weak hamza (هَمْزَة الوَصْل, hamzat al-waṣl) and is indicated by an ʾalif seat without a hamza symbol. In [[Orthography and Phonology#Vocalization|vocalized]] texts, it is indicated by ٱ (an ʾalif with a waṣlah sign).

A medial hamza is written on the line as an unlinked letter when:
- denoting /ʔa(ː)/ preceded by an ʾalif denoting the [[Orthography and Phonology#Vowels|long vowel]] /aː/ - قِرَاءَة (qirāʾa).
- denoting /ʔa(ː)/ or /ʔu(ː)/ preceded by a wāw denoting the [[Orthography and Phonology#Vowels|long vowel]] /uː/ or by a wāw without a subsequent [[Orthography and Phonology#Vowels|vowel]] - مُرُوءَة (murūʾa), ضَوْءُهُ (ḍawʾuhu).

In all other situations, a medial hamza is written on one of the three seats. The seat is determined by the [[Orthography and Phonology#Vowels|vowels]] surrounding the hamza with precedence according to the order /i(ː)/ > /u(ː)/ > /a(ː)/.
- If there is an /i(ː)/ either preceding or following /ʔ/, then hamza takes yāʾ as a seat - سُئِلَ (suʾila), بِئْر (biʾr).
- If there is no /i(ː)/ surrounding the /ʔ/ but there is /u(ː)/ either before or after it, then hamza takes wāw as a seat - سُؤَال (suʾāl), رُؤُوس (ruʾūs).
- If there is neither /i(ː)/ nor /u(ː)/ surrounding the /ʔ/, but there is /a(ː)/ either before or after it, then hamza takes ʾalif as a seat - سَأَلَ (saʾala), رَأْس (raʾs).
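The precedence /i(ː)/ > /u(ː)/ > /a(ː)/ can be sketched as a lookup over the vowels on either side of the hamza. This is a minimal sketch: the function name and the plain-ASCII seat labels are made up for this example, and the caller is assumed to have already ruled out the on-the-line cases listed earlier.

```python
# Map each vowel to its quality, ignoring length (ā, ī, ū given as escapes).
QUALITY = {"a": "a", "\u0101": "a", "i": "i", "\u012b": "i", "u": "u", "\u016b": "u"}

def medial_hamza_seat(before: str, after: str) -> str:
    """Pick the seat from the vowels around the hamza; "" means no vowel (sukūn)."""
    around = {QUALITY[v] for v in (before, after) if v}
    # Check the qualities in order of precedence: i, then u, then a.
    for quality, seat in (("i", "ya"), ("u", "waw"), ("a", "alif")):
        if quality in around:
            return seat
    raise ValueError("no vowel on either side of the hamza")
```

For example, `medial_hamza_seat("u", "i")` returns `"ya"` (as in سُئِلَ), and `medial_hamza_seat("a", "")` returns `"alif"` (as in رَأْس).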

A final hamza is written on the line as an unlinked letter when preceded by any [[Orthography and Phonology#Vowels|long vowel]] or any [[Orthography and Phonology#Consonants|consonant]] - عِبْء (ʿibʾ), سَمَاء (samāʾ), هُدُوء (hudūʾ), شَيْء (shayʾ). It is written on a seat when preceded by a short [[Orthography and Phonology#Vowels|vowel]]. The vowel determines which seat to take:
- /a/ -> ʾalif, such as in قَرَأَ (qaraʾa);
- /u/ -> wāw, such as in تَكَافُؤ (takāfuʾ);
- /i/ -> yāʾ, such as in شَاطِئ (shāṭiʾ).
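The rule for a final hamza reduces to a single lookup on the preceding sound. The sketch below assumes the caller passes that sound as a romanized symbol; the names and the ASCII labels are illustrative only.

```python
# A preceding short vowel selects the seat; anything else puts the hamza on the line.
FINAL_SEATS = {"a": "alif", "u": "waw", "i": "ya"}

def final_hamza_form(preceding: str) -> str:
    """Return the seat for a word-final hamza, or "line" if it is written unlinked."""
    # Long vowels and consonants are not keys of FINAL_SEATS, so they fall through to "line".
    return FINAL_SEATS.get(preceding, "line")
```

So `final_hamza_form("i")` gives `"ya"` (as in شَاطِئ), while a preceding long vowel or consonant gives `"line"` (as in سَمَاء or عِبْء).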

Note: ألِف مدَّة (ʾalif madda)

A special symbol (آ) known as ألِف مدَّة (ʾalif madda) is used to denote the combination /ʔaː/ of a [[TODO|glottal stop]] /ʔ/ followed by the [[Orthography and Phonology#Vowels|long vowel]] /aː/.

Gemination#

Gemination or doubling of a consonant is indicated not by writing the letter for it twice but rather by placing a special diacritic ــّـ called شَدَّة (shaddah) on top of it.

| Diacritic | Name (Arabic Script) | Name (Romanized) | Meaning |
|---|---|---|---|
| ــّـ | شَدَّة | shaddah | Doubling of the consonant. |

A shaddah may be combined with another diacritic for indicating a short vowel.

Ligatures#

Ligatures are quite common in written Arabic. Although there are many optional ligatures, the ligature between ل and ا is compulsory.

| Final | Medial | Initial | Isolated | Letters | IPA |
|---|---|---|---|---|---|
| ﻻ | | | ﻻ | ل + ا | /laː/ |

Phonology#

Consonants#

[[./index|Modern Standard Arabic]] has 28 [[TODO|consonant]] [[TODO|phonemes]].

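Consonant Phonemes of Modern Standard Arabic

| | | Labial | Dental | Denti-Alveolar (Plain) | Denti-Alveolar (Emphatic) | Post-Alveolar / Palatal | Velar | Uvular | Pharyngeal | Glottal |
|---|---|---|---|---|---|---|---|---|---|---|
| Nasal | | m م | | n ن | | | | | | |
| Plosive | Voiceless | | | t ت | tˤ ط | | k ك | q ق | | ʔ ء |
| Plosive | Voiced | b ب | | d د | dˤ ض | d͡ʒ ج | | | | |
| Fricative | Voiceless | f ف | θ ث | s س | sˤ ص | ʃ ش | x خ | | ħ ح | h هـ |
| Fricative | Voiced | | ð ذ | z ز | ðˤ ظ | | ɣ غ | | ʕ ع | |
| Trill | | | | r ر | | | | | | |
| Approximant | | | | l ل | | j ي | w و | | | |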

The consonants /sˤ/, /dˤ/, /tˤ/, /ðˤ/ are known as emphatic consonants.

Vowels#

[[./index|Arabic]] has only six [[TODO|vowel]] [[TODO|phonemes]] and they manifest as three pairs of long and short sounds.

Arabic Vowels

| | Front | Central | Back |
|---|---|---|---|
| Close | /i(ː)/ | | /u(ː)/ |
| Mid | | | |
| Open | | /a(ː)/ | |

The [[TODO|phoneme]] /a/ is realized as [æ] when preceded by most [[Orthography and Phonology#Consonants|consonants]]:
- the [[TODO|labial]] [[TODO|consonants]] /m/, /b/ and /f/;
- the non-[[Orthography and Phonology#Consonants|emphatic]] [[TODO|coronal consonants]] other than /r/, namely /θ/, /ð/, /n/, /t/, /d/, /s/, /z/, /l/, /ʃ/ and /d͡ʒ/;
- the [[TODO|glottal consonants]] /h/ and /ʔ/;
- /j/, /k/ and /w/.

The [[TODO|phoneme]] /a/ is realized as [ɑ] when preceded by the [[Orthography and Phonology#Consonants|emphatic consonants]] /sˤ/, /dˤ/, /tˤ/, /ðˤ/.
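The two conditioning environments for /a/ can be sketched as set lookups. This is a minimal illustration; the function name is made up, and consonants that neither rule mentions (such as /r/ or /q/) deliberately return `None` rather than a guess.

```python
PHARYNGEALIZED = "\u02e4"  # the IPA modifier ˤ
# Emphatic consonants, after which /a/ is realized as [ɑ].
EMPHATIC = {c + PHARYNGEALIZED for c in ("s", "d", "t", "ð")}
# Consonants after which /a/ is realized as [æ], as listed above.
FRONTING = {"m", "b", "f", "θ", "ð", "n", "t", "d", "s", "z", "l", "ʃ", "d͡ʒ",
            "h", "ʔ", "j", "k", "w"}

def realize_a(preceding: str):
    """Return the allophone of /a/ after the given consonant, or None if not covered."""
    if preceding in EMPHATIC:
        return "\u0251"  # ɑ
    if preceding in FRONTING:
        return "\u00e6"  # æ
    return None
```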

Note: Full Form and Pause Form

There are two styles of pronouncing final [[Orthography and Phonology#Vowels|short vowels]] of words in [[./index|Modern Standard Arabic]]:

  • When a word is pronounced in full form (وَصْل, waṣl), its final [[Orthography and Phonology#Vowels|short vowel]] (if any) is always pronounced (e.g. كَتَبَ).
  • When a word is pronounced in pause form (وَقْف, waqf), its final [[Orthography and Phonology#Vowels|short vowel]] (if any) is not pronounced (e.g. كَتَبْ).

Pronouncing every word in full form is rare and is usually restricted to Quranic recitation, poetry, highly formal speeches and language teaching. In formal contexts, pause form is used for the final word of each sentence and wherever there is a natural pause for breath; all other words are in full form. In informal contexts, pause form is used for nearly all words.

Pause form also affects the pronunciation of [[Orthography and Phonology#Orthography|tāʾ marbūṭa]].
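On a fully vocalized spelling, the switch from full form to pause form can be sketched as replacing the final vowel mark with a sukūn. This sketch assumes the vowel mark is the very last code point of the word; it also drops ḍammatayn and kasratayn, following the سَيَّارَةٌ / سَيَّارَةْ example, and leaves fatḥatayn untouched because the text above does not cover it. The names are illustrative.

```python
SUKUN = "\u0652"
# Word-final marks that are not pronounced in pause form:
# fatḥah, ḍammah, kasrah, ḍammatayn, kasratayn.
DROPPED_IN_PAUSE = {"\u064E", "\u064F", "\u0650", "\u064C", "\u064D"}

def pause_form(word: str) -> str:
    """Replace a word-final vowel mark with sukūn; leave other words unchanged."""
    if word and word[-1] in DROPPED_IN_PAUSE:
        return word[:-1] + SUKUN
    return word
```

Applied to كَتَبَ this gives كَتَبْ. The change of tāʾ marbūṭa from /t/ to /h/ is a matter of pronunciation only, so the spelling returned for سَيَّارَةٌ is simply سَيَّارَةْ.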

Syllabification#

[[TODO|Syllable]] structure in [[./index|Modern Standard Arabic]] is fairly restricted because syllables are limited to one of five forms:
- CV (light)
- CVV (heavy)
- CVC (heavy)
- CVVC (super-heavy)
- CVCC (super-heavy)

[[TODO|Syllables]] can never begin with a [[Orthography and Phonology#Vowels|vowel]]. Even though it might sometimes seem like a [[TODO|syllable]] starts with a [[Orthography and Phonology#Vowels|vowel]], such [[TODO|syllables]] actually start with a [[TODO|glottal stop]].
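Because every syllable begins with exactly one consonant, a word can be split by starting a new syllable at each consonant that is directly followed by a vowel. The sketch below assumes a simplified romanization in which every consonant is a single character (so digraphs such as sh or th are not supported) and long vowels are written ā, ī, ū; the function names are made up for this example, and the result is not checked against the five permitted shapes.

```python
SHORT_VOWELS = set("aiu")
LONG_VOWELS = set("\u0101\u012b\u016b")  # ā, ī, ū

def syllabify(word: str) -> list[str]:
    """Split a romanized word into syllables."""
    vowels = SHORT_VOWELS | LONG_VOWELS
    # A syllable starts at every consonant that is directly followed by a vowel.
    starts = [i for i in range(len(word) - 1)
              if word[i] not in vowels and word[i + 1] in vowels]
    if not starts or starts[0] != 0:
        raise ValueError("a word must begin with a consonant followed by a vowel")
    bounds = starts + [len(word)]
    return [word[a:b] for a, b in zip(bounds, bounds[1:])]

def shape(syllable: str) -> str:
    """Describe a syllable as a C/V pattern, counting a long vowel as VV."""
    return "".join("VV" if ch in LONG_VOWELS else "V" if ch in SHORT_VOWELS else "C"
                   for ch in syllable)
```

For instance, mustaqill splits into mus, ta, qill with shapes CVC, CV and CVCC.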

Accent#

[[./index|Modern Standard Arabic]] has a [[TODO|stress accent]] system, similar to the one in English, where an accented (or stressed) syllable is pronounced slightly longer, more clearly and with a bit more volume. For the most part, stress is predictable and follows certain rules, although there are some variations between dialects. These rules are given below in order of decreasing precedence:
- If the final [[Orthography and Phonology#Syllabification|syllable]] is CVVC or CVCC, then the stress falls on it (e.g. كِتَاب ki-TĀB, مُسْتَقِلّ mus-ta-QILL).
- If the second-to-last [[Orthography and Phonology#Syllabification|syllable]] is CVV or CVC, then the stress falls on it.
- If the second-to-last [[Orthography and Phonology#Syllabification|syllable]] is CV, then the stress falls on the third-to-last syllable (e.g. كَتَبَ KA-ta-ba).
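The three rules can be sketched as a function over syllable shapes. This is a minimal sketch under stated assumptions: the input is a list of shapes such as `["CV", "CVVC"]`, the function name is made up, and words too short to have a third-to-last syllable fall back to their first syllable, which the rules above do not spell out.

```python
def stressed_index(shapes: list[str]) -> int:
    """Return the 0-based index of the stressed syllable."""
    # Rule 1: a super-heavy final syllable takes the stress.
    if shapes[-1] in ("CVVC", "CVCC"):
        return len(shapes) - 1
    # Rule 2: otherwise a heavy second-to-last syllable takes it.
    if len(shapes) >= 2 and shapes[-2] in ("CVV", "CVC"):
        return len(shapes) - 2
    # Rule 3: otherwise the third-to-last syllable, or the first one
    # in shorter words (assumption, not stated in the rules).
    return max(len(shapes) - 3, 0)
```

For ki-tāb (`["CV", "CVVC"]`) this returns 1, for ka-ta-ba (`["CV", "CV", "CV"]`) it returns 0, and a word shaped `["CV", "CVC", "CV"]` is stressed on its middle syllable by the second rule.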
