You may find this article helpful for understanding the physiology of speech sound production in English. Sound production involves the coordination of various speech organs, such as the lungs, diaphragm, larynx, vocal cords, tongue, lips, and teeth. Air is expelled from the lungs (the air pressure rises above atmospheric pressure, forcing air out of the lungs) and sets the vocal folds (cords) in the larynx vibrating, creating sound. The shape and movement of the articulators (tongue, lips, alveolar ridge, palate, velum, vocal folds, etc.) modify these vibrations to produce "speech sounds". The process involves a complex interaction between the nervous and muscular systems, and being aware of what is going on helps to identify the root cause of speech and language difficulties.

Articulator pathologies can result in speech sound disorders and impact an individual's ability to communicate effectively. Knowing the specific articulators involved and the nature of the problem (e.g. muscle weakness, structural abnormality) can guide the development of targeted, effective therapy goals and interventions. For example, if the problem is related to weakness in the muscles used for speech, the therapist may focus on strengthening exercises for those muscles. By understanding the pathology of articulators, the therapist can provide more targeted, effective treatment and improve the individual's overall communication abilities.

Here are some of the structures involved in speech sound production. Most of them act as articulators, shaping and modifying the basic sound produced by the vocal folds:

Lungs: Provide the air pressure required to produce speech sounds;
Larynx: Contains the vocal cords, which vibrate to produce voice;
Pharynx: A resonating chamber that modifies the sounds produced by the vocal cords;
Mandible (Lower jaw): Used to shape the oral cavity for speech; it is especially important in shaping vowel sounds;
Lip(s) movement: Used to produce bilabial sounds such as "b", "p", "m", and "w" and some labiodental sounds produced by bringing the lower lip to the upper front teeth as in "f" and "v";
Teeth: Used to produce sounds such as "th" in "think";
Alveolar ridge: Used in producing sounds such as "t," "d," "s," and "z";
Tongue movements (tip, blade, front, back, and root): Used to produce sounds such as "t", "d", "n", and "l";
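Velum (Soft Palate): Lowered to let air pass through the nose for nasal sounds such as "m", "n", and "ng"; it is also where the back of the tongue makes contact for "k", "g", and "ng";
Hard palate: Place of articulation for sounds such as "y" in "yes";
Nasal & Oral cavities: Provide resonating chambers for speech sounds;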
Hyoid bone: Plays a crucial role not only in speech production by serving as a support for the tongue and larynx, but also in swallowing;
etc.

These articulators work together to produce the wide range of speech sounds used in human language.

Phonetic animation

B – [b] – Voiced bilabial plosive
C – [k] – Voiceless velar plosive
CH – [tʃ] – Voiceless post-alveolar affricate
D – [d] – Voiced alveolar plosive
DZ – [dʒ] – Voiced post-alveolar affricate
F – [f] – Voiceless labio-dental fricative
G – [g] – Voiced velar plosive
H – [h] – Voiceless glottal fricative
L – [l] – Voiced alveolar lateral liquid
M – [m] – Voiced bilabial nasal
N – [n] – Voiced alveolar nasal
NG – [ŋ] – Voiced velar nasal
P – [p] – Voiceless bilabial plosive
R – [r] – Trilled alveolar r
R – [r] – Voiced alveolar retroflex liquid prevocalic
R – [r] – Voiced palatal tip-down “bunched-r”
S – [s] – Voiceless alveolar fricative
SH – [ʃ] – Voiceless post-alveolar fricative
T – [t] – Voiceless alveolar plosive
Th – [ð] – Voiced dental fricative
Th – [θ] – Voiceless dental fricative
V – [v] – Voiced labio-dental fricative
W – [w] – Voiced labio-velar approximant
Z – [z] – Voiced alveolar fricative
ZH – [ʒ] – Voiced post-alveolar fricative
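
The labels in the list above all follow the same pattern: voicing, then place of articulation, then manner of articulation. As a minimal sketch of that pattern, a few of the sounds can be stored as records and turned back into labels. The names `Consonant` and `CONSONANTS` are illustrative assumptions and do not come from any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Consonant:
    """One consonant described by voicing, place, and manner (illustrative)."""
    symbol: str
    voiced: bool
    place: str
    manner: str

    def label(self) -> str:
        # Rebuild a label in the same order the list uses: voicing, place, manner.
        voicing = "Voiced" if self.voiced else "Voiceless"
        return f"[{self.symbol}] – {voicing} {self.place} {self.manner}"


# A small sample taken from the list above, not the full inventory.
CONSONANTS = [
    Consonant("b", True, "bilabial", "plosive"),
    Consonant("k", False, "velar", "plosive"),
    Consonant("ʃ", False, "post-alveolar", "fricative"),
    Consonant("ŋ", True, "velar", "nasal"),
]

for consonant in CONSONANTS:
    print(consonant.label())
```

Storing the three features separately, rather than as one string, makes it possible to group or compare sounds by any single feature.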

*******

P-T-C-Voiceless-Stops-target & B-D-G-Voiced-Stops-target (Diadochokinetic (DDK))

Transcribed with the International Phonetic Alphabet (IPA), English has around 24 consonants: /p/, /b/, /t/, /d/, /k/, /g/, /f/, /v/, /θ/, /ð/, /s/, /z/, /ʃ/, /ʒ/, /h/, /tʃ/, /dʒ/, /m/, /n/, /ŋ/, /l/, /r/, /w/, and /j/. Some sources also include the glottal stop /ʔ/. The number of consonants can vary slightly depending on the accent or dialect, as well as on the definition of what constitutes a distinct consonant sound. English speech sounds can be classified in various ways; a common one is by place of articulation, manner of articulation, and voicing. Here is a table that classifies speech sounds in English based only on the manner of articulation:

Diadochokinetic (DDK) tasks: the p-t-k/b-d-g tasks involve repeating the syllable sequences "p-t-k" and "b-d-g" rapidly and smoothly. Speech-language pathologists use these tasks to assess speech motor control and coordination, typically in a clinical setting. They can help in diagnosing speech disorders, such as apraxia of speech or stuttering, and in monitoring the progress of speech therapy.
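
A DDK trial is often summarized as a repetition rate. The sketch below assumes the syllables have already been counted and the trial timed; the function name `ddk_rate` and the syllables-per-second measure are illustrative assumptions, not something the description above specifies, and the numbers in the example are made up.

```python
def ddk_rate(syllable_count: int, duration_seconds: float) -> float:
    """Return syllables per second for one DDK trial (illustrative measure)."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return syllable_count / duration_seconds


# Hypothetical trial: 10 repetitions of "p-t-k" (3 syllables each) in 6 seconds.
rate = ddk_rate(syllable_count=10 * 3, duration_seconds=6.0)
print(f"{rate:.1f} syllables per second")
```

A rate on its own says nothing about smoothness or accuracy, which is why the task is still judged by a clinician listening to the production.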

Note that some sounds may be classified into multiple categories, depending on the classification system being used. This table is a general guide and there may be variations in speech sounds among different individuals and regions. However, it provides a basic framework for understanding and categorizing speech sounds based on their manner of articulation.

*** Nasals, specifically [m], [n], and [ŋ], are produced by stopping the airflow in the oral cavity and redirecting it through the nasal cavity, which makes them nasal stops. Some S-LPs classify nasals separately from plosives because the air is redirected through the nasal cavity rather than simply blocked, as it is in plosives. However, from a linguistic perspective, nasals can be grouped with plosives because they involve a complete stoppage of the airflow in the mouth, even if the air is redirected through the nasal cavity. This viewpoint aligns with my linguistic perspective, but I am not dismissive of others' viewpoints.

Plosives (Stops)

B – [b] – Voiced bilabial plosive

The /b/ sound is a voiced bilabial plosive, which means that it is a consonant sound that is produced by using both lips to stop the airflow from the lungs, and then releasing the airflow suddenly to produce a burst of sound.

To produce the /b/ sound in English:

The lips are brought together and the airflow from the lungs is stopped by the lips;
The vocal cords start to vibrate as the lips are held tightly together;
The lips are suddenly released, allowing the built-up air to burst out of the mouth, producing the /b/ sound.

In English, [b] is usually most fully voiced in the middle of a word, between voiced sounds. At the beginning of a word after a silent pause, or at the end of a word before a pause, the vocal cords often vibrate for only part of the closure.

In Romanian, the /b/ sound is also a voiced bilabial plosive, and it is produced in a similar way to English. However, there may be some subtle differences in the way the sound is pronounced, due to differences in accent and regional pronunciation variations. For example, the /b/ sound in Romanian may be pronounced with the lips slightly more rounded than in English, and the sound may be held for a slightly longer duration. However, these differences are generally quite small, and the /b/ sound in Romanian is quite similar to the /b/ sound in English.

[b] & [p] are consonant cognates. Two sounds are considered cognates when they are two variations of the same basic sound that differ only in the presence or absence of a feature (voice in our case).

P – [p] – Voiceless bilabial plosive

The voiceless bilabial plosive [p] is produced by:

Bringing the two lips together to create a complete closure in the oral cavity, then
Suddenly releasing the air pressure built up behind the closure ... to create a burst of air which produces the sound [p].

In general, [p] at the beginning of a word is more aspirated than [p] in the middle or final positions: a puff of air is released after the lips open and before the vowel begins. The aspiration is strongest at the start of a stressed syllable, as in "pin", and absent after [s], as in "spin".

* The difference in features between [p] and its voiced counterpart [b] lies in the vibration of the vocal cords. [p] is voiceless, meaning that the vocal cords do not vibrate during its production, whereas [b] is voiced, meaning that the vocal cords do vibrate. This results in a significant difference in sound quality between the two sounds, with [b] having a warmer and more resonant sound than [p]. Additionally, [p] is generally produced with more airflow and higher air pressure compared to [b].

** The "attack" (the beginning of the sound and how it is produced) of the voiced bilabial stop [b] in English words may differ from that in Spanish or Romanian. In English, voicing for a word-initial [b] often starts at or just after the moment the lips open, with little to no aspiration, so the vocal cord vibration is heard mainly as the vowel begins. In Spanish and Romanian, the [b] sound is typically voiced throughout: the vocal cords begin to vibrate while the lips are still closed, so voicing is audible from the very start of the sound.

The [b] and [p] sounds are sometimes substituted by children during their language development. Here are some common substitutions for [b] and [p]:

- B for P (e.g., "bup" for "pup")
- D for P (e.g., "dup" for "pup")
- T for P (e.g., "tup" for "pup")
- P for B (e.g., "pup" for "bup")

These are normal developmental processes and typically resolve as children's speech and language abilities mature.

Here is a list of all English consonant cognates:

[p] (voiceless bilabial plosive) & [b] (voiced bilabial plosive)
[t] (voiceless alveolar plosive) & [d] (voiced alveolar plosive)
[k] (voiceless velar plosive) & [g] (voiced velar plosive)
[f] (voiceless labiodental fricative) & [v] (voiced labiodental fricative)
[s] (voiceless alveolar fricative) & [z] (voiced alveolar fricative)
[θ] (voiceless dental fricative) & [ð] (voiced dental fricative)
[ʃ] (voiceless postalveolar fricative) & [ʒ] (voiced postalveolar fricative)
[tʃ] (voiceless palato-alveolar affricate) & [dʒ] (voiced palato-alveolar affricate)
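
The cognate list above can also be derived from the feature labels: two sounds are cognates when they share place and manner and differ only in voicing. The following is a minimal sketch under that definition; `FEATURES` and `cognate_pairs` are hypothetical names, and [h] and [m] are included only to show sounds that have no partner.

```python
from collections import defaultdict

# symbol -> (voiced, place, manner), using the labels from the list above.
FEATURES = {
    "p": (False, "bilabial", "plosive"),
    "b": (True, "bilabial", "plosive"),
    "t": (False, "alveolar", "plosive"),
    "d": (True, "alveolar", "plosive"),
    "k": (False, "velar", "plosive"),
    "g": (True, "velar", "plosive"),
    "f": (False, "labiodental", "fricative"),
    "v": (True, "labiodental", "fricative"),
    "s": (False, "alveolar", "fricative"),
    "z": (True, "alveolar", "fricative"),
    "θ": (False, "dental", "fricative"),
    "ð": (True, "dental", "fricative"),
    "ʃ": (False, "postalveolar", "fricative"),
    "ʒ": (True, "postalveolar", "fricative"),
    "tʃ": (False, "palato-alveolar", "affricate"),
    "dʒ": (True, "palato-alveolar", "affricate"),
    "h": (False, "glottal", "fricative"),  # no voiced partner in this table
    "m": (True, "bilabial", "nasal"),      # no voiceless partner in this table
}


def cognate_pairs(features):
    """Pair sounds that share place and manner but differ in voicing."""
    groups = defaultdict(dict)
    for symbol, (voiced, place, manner) in features.items():
        groups[(place, manner)][voiced] = symbol
    # Keep only groups that contain both a voiceless and a voiced member.
    return [(g[False], g[True]) for g in groups.values() if len(g) == 2]


for voiceless, voiced in cognate_pairs(FEATURES):
    print(f"[{voiceless}] & [{voiced}]")
```

Grouping by place and manner first means the voicing contrast is the only thing left to compare, which mirrors the definition of a cognate pair.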

D – [d] – Voiced alveolar plosive

The voiced alveolar plosive [d] is produced by the following steps:
1. The tip of the tongue presses against the alveolar ridge, and air pressure builds up behind this closure;
2. The vocal cords vibrate, creating the source of sound;
3. The closure is released, causing a burst of air that creates a plosive sound;
4. The vibration of the vocal cords continues during the release of the closure, making the sound [d] voiced.

* In Spanish and Romanian, the production of [d] may differ because of differences in the phonology and phonetics of these languages. For example, the Spanish and Romanian [d] is typically realized as a dental plosive, with the tongue touching the upper front teeth rather than the alveolar ridge.
- [d] & [t] are consonant cognates: they differ only in voicing.

T – [t] – Voiceless alveolar plosive

The voiceless alveolar plosive [t] is produced by the following steps:

The air pressure builds up behind the closure of the tongue and the alveolar ridge;
There is no vibration of the vocal folds (voiceless);
The closure is released, causing a burst of air that creates a plosive sound.

In English, [t] is typically produced as a voiceless alveolar plosive, with the tongue pressed against the alveolar ridge and the air released in a burst as the closure opens. In my language (Romanian) and in Spanish, [t] may be produced as a dental plosive or an alveolar plosive, with the tongue touching the upper teeth or the alveolar ridge, respectively. The exact realization of [t] in Romanian & Spanish can also be influenced by the position of the sound within a word, regional dialects, and individual differences.

*** I am not a native speaker of English, but from my analysis, the sound [t] is often pronounced with a following glottal fricative [h] in some contexts. This is known as "aspirated [t]" or "[tʰ]." It was a while until I was able to create a [tʰ] at the beginning of some words in English. So, I can say now that the [t] sound is pronounced with a strong burst of air, followed immediately by a glottal fricative [h]. This [h] sound is produced by a narrowing of the airway in the region of the vocal cords, which creates friction and a characteristic hissing sound. I just wanted to mention this for your accent training if you are a non-native speaker like me. English is my third language and I started from scratch with no prior knowledge.

Paying attention to the pronunciation of aspirated [t] is important if you want to sound like a native speaker of English. In English, aspiration does not distinguish one word from another by itself, but an unaspirated [t] at the beginning of a word can sound like [d] to native listeners, so a speaker's use of these sounds can affect their overall intelligibility and comprehensibility.

Just also want to mention that I have traveled to more than half of the states in the US and I was amazed seeing so many different varieties of English.

The [d] and [t] sounds are sometimes substituted by children during their language development. Here are some common substitutions for [d] and [t]:

- T for D (e.g. "tog" for "dog")
- K for T (e.g. "kop" for "top")
- D for T (e.g. "dop" for "top")
- G for T (e.g. "gop" for "top")

These are normal developmental processes and typically resolve as children's speech and language abilities mature.

G – [g] – Voiced velar plosive

The steps in producing the English [g] sound are:

Keep the mouth slightly open, with the lips relaxed;
Raise the back of the tongue (towards the velum (the soft palate)) and create a complete closure blocking the airway;
Build up air pressure (behind the closure) and rapidly release the air by suddenly dropping the tongue to normal position. As a result, the built-up air pressure is released and [g] sound is produced;
Make the vocal folds vibrate (creating the voiced aspect of the [g] sound).

[g] & [k] are consonant cognates: they differ only in voicing.

C – [k] – Voiceless velar plosive

The steps in producing the English [k] sound are:

Keep the mouth slightly open, with the lips relaxed, and raise the back of the tongue to make contact with the soft palate (velum), creating a complete closure that blocks the airway.
Remember - the vocal cords do not vibrate (resulting in a voiceless sound).
Build up air pressure (behind the closure) and rapidly release the air by suddenly dropping the tongue to normal position. As a result, the built-up air pressure is released and [k] sound is produced.

The [k] and [g] sounds are sometimes substituted by children during their language development. Here are some common substitutions for [k] and [g]:

- G for K (e.g. "gat" for "cat")
- T for K (e.g. "tat" for "cat")
- D for G (e.g. "doat" for "goat")
- D for K (e.g. "dat" for "cat")
- K for G (e.g. "koat" for "goat")
- J for G (e.g. "joat" for "goat")
- Y for G (e.g. "yoat" for "goat")

These are normal developmental processes and typically resolve as children's speech and language abilities mature.
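
Each of the plosive substitutions listed above can be described by which feature changes between the target sound and the sound the child produces. The sketch below is only an illustration of that idea; the table `PLOSIVES` and the function `changed_features` are hypothetical names, and only the six plosives are covered.

```python
# symbol -> (voicing, place) for the six English plosives discussed above.
PLOSIVES = {
    "p": ("voiceless", "bilabial"),
    "b": ("voiced", "bilabial"),
    "t": ("voiceless", "alveolar"),
    "d": ("voiced", "alveolar"),
    "k": ("voiceless", "velar"),
    "g": ("voiced", "velar"),
}


def changed_features(target: str, substitute: str) -> list[str]:
    """List which features differ between the target sound and its substitute."""
    target_voicing, target_place = PLOSIVES[target]
    sub_voicing, sub_place = PLOSIVES[substitute]
    changes = []
    if target_voicing != sub_voicing:
        changes.append(f"voicing: {target_voicing} -> {sub_voicing}")
    if target_place != sub_place:
        changes.append(f"place: {target_place} -> {sub_place}")
    return changes


# "G for K", "T for K", and "D for K" from the list above.
for target, substitute in [("k", "g"), ("k", "t"), ("k", "d")]:
    print(f"{substitute.upper()} for {target.upper()}:", changed_features(target, substitute))
```

Seen this way, "G for K" changes only voicing, "T for K" changes only place, and "D for K" changes both.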

M – [m] – Voiced bilabial nasal

The sound [m] in English is produced by the following steps:

Close the lips to create a complete obstruction in the vocal tract***;
Open the nasal passage by lowering the soft palate (velum) allowing the air to escape through the nose;
Make the vocal folds vibrate;
Maintain the lip closure throughout the duration of the sound and control the airflow to create the desired volume and pitch of the sound;
Release of the closure.

*** The /m/ sound is considered a stop sound. When producing it, the lips are brought together to create a complete closure in the vocal tract, stopping the airflow there and redirecting it through the nasal cavity by lowering the velum. This creates a build-up of air pressure behind the closure (lips) and the release of the air through the nose. The release of the closure (lips) then results in a burst of air, which creates the characteristic sound of a stop consonant.

N – [n] – Voiced alveolar nasal

The sound [n] in English is produced by the following steps:

Place the tip of the tongue on the alveolar ridge (the bumpy ridge just behind the upper front teeth);
Keep the mouth slightly open (or even closed) while making this sound: the tongue is pressed against the alveolar ridge to close off the oral cavity, and the air is allowed to escape through the nose;
Vibrate your vocal folds to produce sound;
Allow air to escape through your nose while maintaining the closure at the alveolar ridge with the tip of your tongue;
Release the closure.

NG – [ŋ] – Voiced velar nasal

The sound [ŋ] in English is produced by the following steps:

Place the back of your tongue against the velum (the soft palate at the back of the mouth);
Keep your mouth open;
Vibrate your vocal cords to produce sound;
Allow air to escape through your nose while maintaining the closure at the velum with the back of your tongue.

- Common substitutions for the [m] sound include [b], [w], [n], and [v].
- Common substitutions for the [n] sound include [m], [d], [t], and [l].
- Common substitutions for the [ŋ] sound include [k], [g], [m], and [n].

These are normal developmental processes and typically resolve as children's speech and language abilities mature.

The [ŋ] sound does not occur at the beginning of words in English, but we find it in the middle or final position (often as part of the suffix "-ing"):

- being [biːɪŋ], seeing [siːɪŋ], doing [duːɪŋ], going [ɡoʊɪŋ], swimming [swɪmɪŋ], helping [hɛlpɪŋ], missing [mɪsɪŋ], ruling [ruːlɪŋ], thinking [θɪŋkɪŋ];
- bring [brɪŋ], long [lɔŋ], strong [strɔŋ], wrong [rɔŋ], among [əˈmʌŋ], tongue [tʌŋ], song [sɔŋ], young [jʌŋ], ring [rɪŋ], cling [klɪŋ], hang [hæŋ];
- bank [bæŋk], sink [sɪŋk], shrink [ʃrɪŋk], think [θɪŋk], stink [stɪŋk], drink [drɪŋk], link [lɪŋk], pink [pɪŋk];
- brink [brɪŋk], chunk [tʃʌŋk], punk [pʌŋk], monk [mʌŋk], hunk [hʌŋk], spunk [spʌŋk].
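
The positions of [ŋ] in transcriptions like the ones above can be checked mechanically. This is a small sketch, assuming each sound of interest is written as a single character; the function name `positions` and the sample dictionary `EXAMPLES` are illustrative only.

```python
def positions(transcription: str, sound: str = "ŋ") -> list[str]:
    """Report where `sound` occurs in a transcription: initial, medial, or final."""
    cleaned = transcription.replace("ˈ", "")  # drop the primary stress mark
    found = []
    for index, char in enumerate(cleaned):
        if char != sound:
            continue
        if index == 0:
            found.append("initial")
        elif index == len(cleaned) - 1:
            found.append("final")
        else:
            found.append("medial")
    return found


# A few transcriptions copied from the word lists above.
EXAMPLES = {"bring": "brɪŋ", "bank": "bæŋk", "thinking": "θɪŋkɪŋ", "among": "əˈmʌŋ"}

for word, ipa in EXAMPLES.items():
    print(word, positions(ipa))
```

None of the sample words has [ŋ] in initial position, and "thinking" shows both a medial and a final occurrence.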

Don't forget! Nasal sounds, specifically [m], [n], and [ŋ], are produced by stopping the airflow in the oral cavity, which creates a complete closure. This closure and the subsequent release of the pressure is what makes nasals a type of stop sound, sometimes referred to as a nasal plosive.

Fricatives

V – [v] – Voiced labio-dental fricative

The steps in producing the English [v] sound are:

Bring the lower lip towards the upper front teeth (contact between the upper central incisors and the lower lip);
Exhale air through the narrow opening (the exhalation will cause turbulence in the airflow);
Activate the vocal cords (the vibration of the folds produces the voiced sound).

By narrowing the space between the upper teeth and the lower lip, the air is forced through a narrow opening and passes with turbulence, creating the characteristic "fricative" sound.

[v] & [f] are consonant cognates: they differ only in voicing.

F – [f] – Voiceless labio-dental fricative

The steps in producing the English [f] sound are:

Bring the lower lip towards the upper front teeth (contact between the upper central incisors and the lower lip);
Exhale air through the narrow opening (the exhalation will cause turbulence in the airflow);

For the [f] sound, the vocal cords are not vibrating, producing the voiceless sound.

By narrowing the space between the upper teeth and the lower lip, the air is forced through a narrow opening and passes with turbulence, creating the characteristic "fricative" sound.

- Common substitutions for the [v] sound include [b], [w], and [f].
- Common substitutions for the [f] sound include [v], [p], and [θ].

These are normal developmental processes and typically resolve as children's speech and language abilities mature.

Z – [z] – Voiced alveolar fricative

Here are the steps to produce the voiced alveolar fricative [z] sound in English:

Relax the mouth and jaw;
Place the tip of the tongue just behind your upper front teeth (alveolar ridge);

- The tip and the blade of the tongue form a narrow channel in the mouth, and the sides of the tongue touch the molars when producing fricative sounds (see the pictures for the [z] sound). This narrow space (a little furrow) creates friction as the air passes through, resulting in the fricative sound. You can make the sound with the tip of the tongue up or down, but the blade is up and creates the narrow space through which the air is forced to pass.

Start a steady flow of air from your lungs and let it escape through the constricted space between your tongue and the alveolar ridge;
Vibrate your vocal cords to produce a voiced sound.
Keep the constricted space while maintaining the steady airflow and vocal folds vibration.

The position of the tip of the tongue during the production of the "z" sound can vary, and it can be either raised or lowered. The sides of the tongue seal the back cavity of the mouth, allowing the narrow channel formed by the blade of the tongue and the roof of the mouth to direct the airflow and create turbulence, producing the fricative sound.

[z] & [s] are consonant cognates: they differ only in voicing.

S – [s] – Voiceless alveolar fricative

Here are the steps to produce the voiceless alveolar fricative [s] sound in English:

Relax the mouth and jaw;
Place the tip of the tongue just behind your upper front teeth (alveolar ridge);

- The tip and the blade of the tongue form a narrow channel in the mouth, and the sides of the tongue touch the molars when producing fricative sounds (see the pictures for the [s] sound). This narrow space (a little furrow) creates friction as the air passes through, resulting in the fricative sound. You can make the sound with the tip of the tongue up or down, but the blade is up and creates the narrow space through which the air is forced to pass.

Start a steady flow of air from your lungs and let it escape through the constricted space between your tongue and the alveolar ridge;
Don't vibrate your vocal folds (it is a voiceless sound).
Keep the constricted space while maintaining the steady airflow without vibrating the vocal folds.

The position of the tip of the tongue during the production of the "s" sound can vary, and it can be either raised or lowered. The sides of the tongue seal the back cavity of the mouth, allowing the narrow channel formed by the blade of the tongue and the roof of the mouth to direct the airflow and create turbulence, producing the fricative sound.

- Common substitutions for the [s] sound include [ʃ], [z], [tʃ], and [j].
- Common substitutions for the [z] sound include [s], [d], and [ʒ].

These are normal developmental processes and typically resolve as children's speech and language abilities mature.

ZH – [ʒ] – Voiced post-alveolar fricative

The steps to produce the [ʒ] sound (voiced post-alveolar fricative) in English are as follows:

The blade of the tongue is raised toward the roof of the mouth, just behind the alveolar ridge;

- The blade of the tongue is positioned further back in the mouth for the [ʒ] sound than for the [s] sound. The tip of the tongue may rest behind the lower front teeth or be slightly curled back.

The blade of the tongue and the roof of the mouth form a narrow channel, while the sides of the tongue help to seal the back cavity of the mouth;
The vocal cords vibrate, producing a voiced sound;
The air is forced through the narrow channel, creating turbulence and producing the [ʒ] sound;
The lips are rounded.

[ʒ] & [ʃ] are consonant cognates: they differ only in voicing.

SH – [ʃ] – Voiceless post-alveolar fricative

The steps to produce the [ʃ] sound (voiceless post-alveolar fricative) in English are as follows:

The blade of the tongue is raised toward the roof of the mouth, just behind the alveolar ridge;

- The blade of the tongue is positioned further back in the mouth for the [ʃ] sound than for the [s] sound. The tip of the tongue may rest behind the lower front teeth or be slightly curled back.

The blade of the tongue and the roof of the mouth form a narrow channel, while the sides of the tongue help to seal the back cavity of the mouth.
The vocal cords do not vibrate, producing a voiceless sound.
The air is forced through the narrow channel, creating turbulence and producing the [ʃ] sound.
The lips are rounded.

- Common substitutions for the [ʃ] sound include [s], [tʃ], [f], and [h].
- Common substitutions for the [ʒ] sound include [dʒ], [z], and [j].

These are normal developmental processes and typically resolve as children's speech and language abilities mature.

Th – [ð] – Voiced dental fricative

The steps to produce the [ð] sound, which is a voiced dental fricative in English, are as follows:

The tip of the tongue is placed between the upper and lower front teeth.
The tongue and the upper front teeth form a narrow channel, while the sides of the tongue help to seal the back cavity of the mouth.
The vocal cords vibrate, producing a voiced sound.
The air is forced through the narrow channel, creating turbulence and producing the [ð] sound.
The lips are relaxed and slightly apart.

The [ð] English sound is sometimes substituted by children during their language development. Here are some common substitutions for [ð]:

- D for [ð] (e.g. "dese" for "these")
- Z for [ð] (e.g. "zeese" for "these")
- S for [ð] (e.g. "seese" for "these")
- F for [ð] (e.g. "fese" for "these")

These are normal developmental processes and typically resolve as children's speech and language abilities mature.

[ð] & [θ] are consonant cognates: they differ only in voicing.

Th – [θ] – Voiceless dental fricative

The steps to produce the [θ] sound, which is a voiceless dental fricative in English, are as follows:

The tip of the tongue is placed between the upper and lower front teeth.
The tongue and the upper front teeth form a narrow channel, while the sides of the tongue help to seal the back cavity of the mouth.
The vocal cords don't vibrate, producing a voiceless sound.
The air is forced through the narrow channel, creating turbulence and producing the [θ] sound.
The lips are relaxed and slightly apart.

The [θ] English sound is sometimes substituted by children during their language development.

Common substitutions for the [θ] sound include [s], [t], [f], and [d].

These are normal developmental processes and typically resolve as children's speech and language abilities mature.

H – [h] – Voiceless glottal fricative

The voiceless glottal fricative sound represented by the symbol [h] in English is produced by narrowing the space between the vocal cords and releasing a burst of air. Here are the steps to make the [h] sound:

Relax the jaw and open the mouth slightly, with the lips relaxed;
Take a deep breath in and let the air build up in your lungs;
Narrow the space between the vocal cords*;
Release a burst of air while keeping the glottal constriction of the vocal cords;
Maintain the glottal constriction and continue to release the air for as long as you need to produce the sound.

*The space between the vocal cords is narrowed by bringing the cords closer together. This can be achieved through the following steps:

- Take a deep breath and let the air build up in the lungs
- Relax the entire body (especially the throat & neck muscles)
- Contract the muscles in the larynx (voice box) to bring the vocal cords closer together by doing glottal stops or murmurs, which involve making a short, abrupt closure of the vocal cords, followed by a release of air.
- Gradually increase the duration and intensity of the closure until you can produce the voiceless glottal fricative sound [h].

Affricates

CH – [tʃ] – Voiceless post-alveolar affricate

An affricate speech sound is produced by a stop (a complete closure of the vocal tract) followed by a fricative (a narrowing of the vocal tract that creates turbulence in the air, with a characteristic hissing or buzzing quality). Although affricates combine a stop and a fricative, they are unique sounds, different from the individual stop and fricative sounds that make them up. The voiceless post-alveolar affricate [tʃ] is a common example of an affricate, as it is produced by combining the voiceless alveolar stop [t] and the voiceless postalveolar fricative [ʃ].

The English voiceless post-alveolar affricate [tʃ] is made by narrowing the space between the alveolar ridge (just behind the upper front teeth) and the hard palate (roof of the mouth) so that the air must pass through a narrow channel, producing turbulence.

The steps to produce the [tʃ] sound are as follows:

Start with the tongue in the position for the voiceless alveolar stop [t], with the tip of the tongue touching the alveolar ridge and the sides of the tongue touching the molars.
Move the tip of the tongue slightly forward and down while still blocking the airflow, creating the narrow channel between the alveolar ridge and the hard palate.
Quickly release the airflow, creating turbulence which produces the distinct sound of the affricate [tʃ]. The precise shape and size of the vocal tract, as well as the precise manner in which the airflow is released, determine the specific quality of the affricate [tʃ] sound.
Hold the narrowing of the space between the tongue and the roof of the mouth briefly, allowing the turbulence to continue just long enough to complete the [tʃ] sound.

Here are some examples of English words that contain the [tʃ] sound: "church," "teach," "watch," "lunch," etc.

In young children, the [tʃ] sound is often one of the later-developing sounds in English, and they may use different substitutions for it. Common substitutions for the [tʃ] sound include:

[t]: Children may produce a "[t]" sound instead of the expected "[tʃ]" sound in words like "cheese" or "teach".
[s]: Children may produce an "[s]" sound instead of "[tʃ]" in words like "cheese" or "teach".
[k]: Some children may use the "[k]" sound in place of the "[tʃ]" sound. This is a common substitution in words like "check" or "chalk".
Glottal stop [ʔ]: Children may produce a glottal stop in place of the "[tʃ]" sound, especially in words that start with "ch". For example, they may say "uh-uh" instead of "choo-choo".

These substitutions are common and typical in young children's speech development, and they often improve as they grow older and their speech and language skills mature.
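The substitutions listed above can be illustrated with a small sketch. It is only an illustration, not a clinical tool: the function name `apply_substitution` and the list `CH_SUBSTITUTIONS` are hypothetical, and the transcriptions of "cheese" and "teach" are assumed broad transcriptions written with the two-character sequence "tʃ".

```python
# Substitutes for [tʃ] named in the list above (illustrative only).
CH_SUBSTITUTIONS = ["t", "s", "k", "ʔ"]

def apply_substitution(transcription, target, substitute):
    """Replace every occurrence of the target sound with the substitute."""
    return transcription.replace(target, substitute)

# Show how "cheese" (assumed transcription: tʃiːz) might sound with each substitute.
for substitute in CH_SUBSTITUTIONS:
    print(substitute, "->", apply_substitution("tʃiːz", "tʃ", substitute))
```

Real substitutions depend on the position of the sound in the word and on the individual child, so a plain string replacement like this one is a simplification.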

[tʃ] & [dʒ] are consonant cognates. Two sounds are considered cognates when they are two variations of the same basic sound that differ only in the presence or absence of a feature (voice in our case).

DZ – [dʒ] – Voiced post-alveolar affricate

The [dʒ] affricate sound in English is produced as a combination of two separate articulations: the plosive [d] sound made at the alveolar ridge, followed by the fricative [ʒ] made at the post-alveolar region. [dʒ] is a voiced affricate, meaning that the vocal cords vibrate during the sound's production.

Here are the steps to produce the [dʒ] sound:

Start with the tongue in a [d] position, with the tip of the tongue touching the alveolar ridge just behind the upper front teeth.
Close the vocal tract, creating a build-up of air pressure, and then quickly release the closure to produce the plosive [d] sound while the center part of the tongue rises and moves forward to the post-alveolar region, narrowing the space between the tongue and the roof of the mouth. This narrowing of the vocal tract creates the turbulence in the airflow, producing the fricative [ʒ] sound.
Don't forget to make the vocal folds vibrate during the [dʒ] sound production.
Maintain the position of the tongue while allowing air to flow through the narrow space, producing the voiced post-alveolar affricate [dʒ].

Examples of English words that contain the [dʒ] sound include "jump", "gym", and "just".

The [dʒ] sound is often substituted with other sounds during the development of speech sounds in children. Here are some common substitutions that young children may use in place of the [dʒ] sound:

[j]: Children may produce a "y" sound ([j]) instead of the expected "[dʒ]" sound in words like "jump" or "gym".
[g]: Some children may use the "[g]" sound in place of the "[dʒ]" sound. This is a common substitution in words like "giant".
[d]: Children may produce a "[d]" sound instead of the "[dʒ]" sound, especially in words that start with "[dʒ]". For example, they may say "duh-ump" instead of "jump".
[ʒ]: Some children may use the "[ʒ]" sound (the sound in the middle of "measure") in place of the "[dʒ]" sound, for example in words like "jump".

Remember! These substitutions are common and typical in young children's speech development, and they often improve as they grow older and their speech and language skills mature.

[dʒ] and [tʃ] are both post-alveolar affricates (this place of articulation is also called palato-alveolar), so they share the same place and manner of articulation. As cognates, they differ only in the involvement of the vocal cords: [dʒ] is a voiced affricate, meaning that the vocal cords vibrate during the sound's production, whereas [tʃ] is an unvoiced affricate, meaning that the vocal cords do not vibrate.
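The idea of cognates can be sketched as a comparison of feature sets. The table below is a minimal, hypothetical representation: the names `FEATURES`, `differing_features`, and `are_cognates` are assumptions made for this illustration, and the feature labels follow the sound names used on this page.

```python
# Hypothetical feature table; labels follow the sound names used in this article.
FEATURES = {
    "tʃ": {"voiced": False, "place": "post-alveolar", "manner": "affricate"},
    "dʒ": {"voiced": True, "place": "post-alveolar", "manner": "affricate"},
    "t": {"voiced": False, "place": "alveolar", "manner": "plosive"},
    "d": {"voiced": True, "place": "alveolar", "manner": "plosive"},
    "m": {"voiced": True, "place": "bilabial", "manner": "nasal"},
}

def differing_features(a, b):
    """Return the sorted names of the features on which sounds a and b disagree."""
    features_a, features_b = FEATURES[a], FEATURES[b]
    return sorted(name for name in features_a if features_a[name] != features_b[name])

def are_cognates(a, b):
    """Treat two sounds as cognates when voicing is their only difference."""
    return differing_features(a, b) == ["voiced"]

print(are_cognates("tʃ", "dʒ"))  # True: same place and manner, different voicing
print(are_cognates("tʃ", "d"))   # False: place and manner differ as well
```

With this representation, the pair [t] and [d] also counts as cognates, while [d] and [m] do not, because they differ in place and manner rather than in voicing.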

Nasals

M – [m] – Voiced bilabial nasal

The sound [m] in English is produced by the following steps:

Close the lips to create a complete obstruction in the vocal tract***;
Open the nasal passage by lowering the soft palate (velum) allowing the air to escape through the nose;
Make the vocal folds vibrate;
Maintain the lip closure throughout the duration of the sound and control the airflow to create the desired volume and pitch of the sound;
Release the closure.

N – [n] – Voiced alveolar nasal

The sound [n] in English is produced by the following steps:

Place the tip of the tongue on the alveolar ridge (the bumpy ridge just behind the upper front teeth);
Keep the mouth slightly open (or even closed) while making this sound: the tongue is pressed against the alveolar ridge to create a complete closure in the mouth, and the air is then allowed to escape through the nose.
Vibrate your vocal folds to produce sound.
Allow air to escape through your nose while maintaining the closure at the alveolar ridge with the tip of your tongue.
Release the closure.

NG – [ŋ] – Voiced velar nasal

The sound [ŋ] in English is produced by the following steps:

Place the back of your tongue against the velum (the soft palate at the back of the mouth).
Keep your mouth open.
Vibrate your vocal cords to produce sound.
Allow air to escape through your nose while maintaining the closure at the velum with the back of your tongue.

The [ŋ] sound does not occur at the beginning of English words, but we find it in the middle or final position (for example, as part of the suffix "-ing"):

being [biːɪŋ], seeing [siːɪŋ], doing [duːɪŋ], going [ɡoʊɪŋ], swimming [swɪmɪŋ], helping [hɛlpɪŋ], missing [mɪsɪŋ], ruling [ruːlɪŋ], thinking [θɪŋkɪŋ];
bring [brɪŋ], long [lɔŋ], strong [strɔŋ], wrong [rɔŋ], among [əˈmʌŋ], tongue [tʌŋ], song [sɔŋ], young [jʌŋ], ring [rɪŋ], cling [klɪŋ], hang [hæŋ];
bank [bæŋk], sink [sɪŋk], shrink [ʃrɪŋk], think [θɪŋk], stink [stɪŋk], drink [drɪŋk], link [lɪŋk], pink [pɪŋk],
brink [brɪŋk], chunk [tʃʌŋk], punk [pʌŋk], monk [mʌŋk], hunk [hʌŋk], spunk [spʌŋk].
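The position of [ŋ] in the words above can be checked with a short sketch. This is a simplification under stated assumptions: the function name `velar_nasal_positions` is hypothetical, each character of the transcription is treated as one sound, and the stress and length marks are skipped.

```python
def velar_nasal_positions(transcription):
    """Classify each [ŋ] in an IPA string as 'initial', 'medial', or 'final'."""
    # Skip the stress mark and the length mark so they are not counted as sounds.
    symbols = [ch for ch in transcription if ch not in "ˈː"]
    last = len(symbols) - 1
    positions = []
    for index, symbol in enumerate(symbols):
        if symbol != "ŋ":
            continue
        if index == 0:
            positions.append("initial")
        elif index == last:
            positions.append("final")
        else:
            positions.append("medial")
    return positions

# A few of the words listed above, with their transcriptions.
WORDS = {"bring": "brɪŋ", "among": "əˈmʌŋ", "bank": "bæŋk", "thinking": "θɪŋkɪŋ"}
for word, ipa in WORDS.items():
    print(word, velar_nasal_positions(ipa))
```

None of these words yields "initial", which matches the pattern in the lists: [ŋ] appears only in medial or final position.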

Liquids

L – [l] – Voiced alveolar lateral liquid

The voiced alveolar lateral liquid [l] in English is produced by placing the tongue against the alveolar ridge (the ridge behind the upper teeth) while keeping the sides of the tongue relaxed so that air can pass around them (see the animation or the flashcard). Here's how to produce the [l] sound:

Keep your lips apart and relaxed.
Place the tip of the tongue behind the upper front teeth (at the alveolar ridge), with the sides of your tongue relaxed and without touching the teeth.
Create a partial closure between the tongue and the alveolar ridge: the front of the tongue contacts the alveolar ridge along the midline, while the sides of the tongue stay relaxed and lowered, letting the air escape laterally around the sides of the tongue. The airflow stays smooth rather than turbulent.
Vibrate your vocal cords to produce the voiced feature of [l] sound.
Maintain the partial closure and airflow constriction as you exhale, allowing the sound to continue.

It's important to note that the key to producing the [l] sound is to keep the sides of the tongue relaxed and not touching the sides of the mouth or the teeth. This allows the air to flow out through the sides of the tongue, producing the distinctive lateral liquid quality of the [l] sound.

The [l] sound is called a liquid because it has a fluid-like quality. Unlike stops, which completely block the airflow and then release it, and fricatives, which force the air through a narrow gap to create turbulence, liquids are produced with a partial obstruction that lets the air flow smoothly around or over the tongue. This creates a continuous, flowing sound, similar to the flow of a liquid. The term "liquid" is also used for the r-sounds, which are produced in a similarly smooth way. Within the liquids, [l] is a lateral and the r-sounds are rhotics.

The [l] sound is a speech sound that can be difficult for some children to produce accurately. Here are some common substitutions for the [l] sound in children:

[w]: Children may produce a "[w]" sound instead of the expected "[l]", especially when speaking quickly (e.g., the word "ball" may be pronounced as "baw").
[r]: Some children may substitute the "[r]" sound for "[l]" (e.g., "ball" pronounced as "bar").
[d]: Another common substitution is "[d]". For example, the word "ball" may be pronounced as "bad".
[ɫ]: Some children may substitute the dark "[ɫ]" sound for "[l]". [ɫ] is a velarized or "dark" version of the alveolar lateral liquid [l]. The difference between the two sounds is that in [ɫ], the back of the tongue is raised higher towards the soft palate (velum) than in [l]. This gives the [ɫ] sound a more pronounced resonance and makes it sound "darker" than the [l] sound.

It's important to keep in mind that these substitutions are common in children and typically resolve as the child's speech development progresses. If a child's speech difficulties persist, it may be beneficial to seek the assistance of a speech-language pathologist.

R* – [ɻ] – Voiced alveolar retroflex liquid prevocalic**

The sound represented by [ɻ] is a voiced alveolar retroflex liquid, made in English as an allophone of /r/. It is often described as a type of "r-sound" and can be found in many English words, such as "red" or "car." It is produced by shaping the tongue into a retroflex position in the mouth and vibrating the vocal cords. Here's how to produce the [ɻ] sound in English:

Raise the tip of your tongue toward the alveolar ridge, just behind the upper front teeth.
Curl the tip of the tongue backwards, so that it points toward the roof of the mouth without pressing against it.
Vibrate the vocal cords to produce a voiced sound.
Keep the tongue in this position and release the air flow in a continuous, steady manner.

The exact shape and position of the tongue during the production of the [ɻ] sound can vary between native speakers. There is normally a slight opening between the tongue and the roof of the mouth rather than firm contact, which lets the air flow through without interruption.

R – [r] – Trilled alveolar r

The trilled alveolar [r] sound is produced by rapidly vibrating the tip of the tongue against the alveolar ridge, just behind the upper front teeth.

To make this sound in English:

Raise the tip of your tongue to the alveolar ridge.
Rapidly vibrate the tip of your tongue against the alveolar ridge to produce a trilled sound.
Vibrate your vocal cords to produce a voiced sound.
Relax your tongue and release the sound.

The trilled alveolar r is not used in most English dialects; it is heard mainly in some varieties, such as certain Scottish accents. In most dialects, including standard British and American English, the r-sound is typically pronounced as the approximant [ɹ], which is produced by bringing the tongue close to the roof of the mouth without touching it and allowing air to flow freely through the mouth.

R – [ʀ] – Voiced palatal tip-down “bunched-r”

The sound labelled [ʀ] here is the voiced palatal tip-down "bunched r", a variant of the English r-sound. It is produced with the tip of the tongue pointing down while the body of the tongue is bunched up toward the palatal region (the hard palate); the tongue does not vibrate against the roof of the mouth. Some English speakers use it in place of the retroflex variant, and the two sound very similar. Note that in the standard International Phonetic Alphabet the symbol [ʀ] represents a uvular trill, which is not an English sound.

The /r/ is considered a liquid sound in phonetics. It is a type of consonant sound characterized by continuous airflow that is only slightly obstructed (the r sounds are produced with a slight narrowing between the tongue and the hard palate). The classification of the [r] sounds can vary depending on the linguistic framework used.

Glides

W – [w] – Voiced labio-velar (glide) approximant

During the production of the [w] sound, the back of the tongue is raised toward the velum (soft palate), and its sides may touch the soft palate, while the middle part of the tongue remains slightly lower so that air can flow through. The tongue shape and the position of the velum create a small opening; the vocal cords vibrate and the air passes through this opening without a tight closure. The rounded lips also play a role in shaping the airflow and determining the quality of the sound produced.

To produce this sound, follow these steps:

Position the lips in a rounded shape.
Raise the back of the tongue toward the soft palate while the middle of the tongue is slightly lowered to allow air to flow.
Vibrate your vocal cords to produce sound while the air flow continues without any interruption.
Keep the mouth relaxed and the jaw slightly open.

We are not making any tight closure; the air flows freely through the narrow opening. This type of speech sound is considered a continuant, as opposed to a stop, in which the articulators make a complete closure, completely blocking the air flow, before releasing it again. Approximants like the [w] sound are characterized by a relatively low level of turbulence in the airflow, which results in a more moderate or weak sound energy compared to the stronger and more turbulent fricatives or stops. The [w] sound is an example of a voiced labial-velar approximant, meaning it is produced with the lips rounded and the back of the tongue raised toward the velum, creating a narrow opening for the air to flow through.

There are four English approximant sounds:

[w]: Voiced labial-velar approximant, as in the word "wet".
[j]: Voiced palatal approximant, as in the words "yes" or "yellow".
[l]: Voiced alveolar lateral approximant, as in the words "love" and "lull".
[r]: Voiced alveolar approximant (written [ɹ] in the IPA), as in the words "red" and "right".

Note that the distinction between an approximant and a flap or trill is one of manner of articulation, that is, how the speech sound is produced, rather than of place of articulation or voicing. In a flap or trill the tongue briefly strikes or vibrates against the roof of the mouth, whereas an approximant involves only a narrowing of the articulators, creating a continuous and relatively low-turbulence airflow.

The term "liquid" is used to describe a subgroup of approximant sounds that have a relatively fast, smooth, and continuous airflow, similar to liquids. In English, the [l] and [r] sounds are typically referred to as the liquid sounds. Like other approximants, liquid sounds involve a partial closure or narrowing of the articulators, creating a narrow opening for the air to flow through, producing a relatively weak and moderate sound energy compared to other speech sounds such as stops and fricatives.

The glides (/j/ and /w/) and the liquids (/r/ and /l/) in American English can be grouped together in a larger category called the "approximants". Glides are also known as semivowels: speech sounds that are articulated like vowels but function as consonants, moving quickly into the vowel that follows. For example, the [w] sound in the word "wide" [waɪd] functions as a glide: the lips and tongue start in a position close to that of the vowel [u] and move rapidly into the following vowel. This rapid transition is what characterizes the [w] sound as a glide or semivowel.
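The grouping of the four approximants into glides and liquids can be written down as a small lookup table. This is only a sketch: the names `APPROXIMANTS` and `group_by_class` are hypothetical, and the plain letter "r" stands in for the English r-sounds.

```python
# Each English approximant mapped to its subclass, as described above.
APPROXIMANTS = {
    "w": "glide",
    "j": "glide",
    "l": "liquid",
    "r": "liquid",  # stands in for the English r-sounds
}

def group_by_class(table):
    """Group sounds by subclass, keeping the order in which they were listed."""
    groups = {}
    for sound, subclass in table.items():
        groups.setdefault(subclass, []).append(sound)
    return groups

print(group_by_class(APPROXIMANTS))
```

The result lists [w] and [j] under "glide" and [l] and [r] under "liquid", mirroring the classification in the text.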

Approximants

Don't forget that all sounds are influenced by the surrounding sounds, such as vowels, and by the rhythm and stress patterns of the words in which they appear (the process is called "co-articulation").

Co-articulation is the phenomenon in which the articulation of one speech sound is influenced by adjacent speech sounds, resulting in a blending or overlap of the movements involved in producing the sounds.

In other words, when producing speech, the muscles used to produce one sound often begin to move before the previous sound is complete, and continue to move after the next sound begins. This overlapping of movements can result in a smoothing of the sounds and a reduction of the distinction between them.

Co-articulation can occur between sounds produced by the same articulator, such as the lips, or by different articulators, such as the tongue and the lips. It is an important aspect of speech production that allows for the rapid, fluent, and continuous production of speech sounds, and contributes to the distinctive rhythm and melody of different languages and dialects.

Here is a short example of co-articulation that I recall reading about a long time ago while I was in linguistics classes at Concordia University:

ANCA pronunciation!

In this case, the back of the tongue is fronted due to co-articulation with the following [n] sound: the front part of the tongue, rather than the back, may touch the hard palate (roof of the mouth). This fronting of the tongue results in a reduction of the closure between the back of the tongue and the velum (soft palate), and a modification of the airflow and acoustics of the preceding speech sound, such as [k]. The fronting of the tongue can be thought of as a preparation for the upcoming [n] sound, which requires a different tongue position in the mouth.

If you're concerned about your child's speech, it's helpful to have them evaluated by a speech-language pathologist to determine if they have a speech or language difficulty and to develop a plan to improve their speech.

It is my hope that the information on this page will prove to be beneficial. Thanks for reading it!

Written by Natanael Dobra - Communicative Disorders Assistant (CDA)
