Formation of speech sounds

There are two main types of speech sound: vowels and consonants. They differ in the way the air is broken up as it passes through the mouth. When a vowel is formed, air passes continuously over the tongue without any break. The position in which the tongue is held determines which vowel is produced, and the lips help to modify the sound. The O sound is produced when the tongue is pulled back in the mouth and the lips are pursed. E is the opposite, occurring when the tongue is pushed forward in the mouth and the lips are pulled wide. Consonants are produced by breaking up the sound into bursts. Sounds such as the short b in bush and the short p in push are produced by stopping the flow of air for an instant and then releasing it.

The long m and n sounds differ from the others. The air escapes through the nose while they are being produced, which is why these consonants are difficult to produce when the nose is blocked. In almost all other sounds the soft palate moves up and blocks the flow of air through the nose. When consonant and vowel sounds are joined in particular patterns, intelligible speech is produced.

Human speech organs can produce an enormous variation in sounds. In every language only a small fraction of the possible sounds are used. That is why it is sometimes so difficult to pronounce foreign languages that include ‘unusual’ sounds.


Singing is speaking at a controlled pitch. Physiologically, what distinguishes singing from speaking is the manner in which the breath is expended to vibrate the vocal cords. Singing requires far more breath than speech, and the louder, higher and longer you sing, the more breath is needed. A further distinction is the control that singing requires over the movement and reflexes of the larynx. There is not much movement of the larynx within a singer’s normal range, which in most people is about an octave and a third. The ability to sing above or below that range indicates a degree of technical accomplishment: the singer has learned to control the tightening and loosening of the vocal cords in order to obtain the correct frequency. As singers are constantly tightening and vibrating their vocal cords, they sometimes bruise the edges. These bruises are called singer’s nodules.

Several languages, including Chinese and a number of African tongues, rely heavily on the melodic inflection of speech to communicate meaning. This requires the appropriate selection of various rising, sustained and falling sounds to express the full meaning of a word. Chinese words are monosyllabic and many have multiple meanings. If these are spoken without vocal inflection, such as when whispering, intelligibility is reduced by at least one-third.
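The singing range of "about an octave and a third" mentioned above can be turned into a frequency span with a little arithmetic. The sketch below is only an illustration: it assumes equal-tempered tuning, reads "a third" as a major third (16 semitones in all), and uses a hypothetical lowest note of 220 Hz. None of these figures comes from the text.

```python
# Sketch: frequency span of a range of "an octave and a third".
# Assumptions (not from the text): equal temperament, a major third,
# and a hypothetical lowest comfortable note of 220 Hz.
SEMITONES_PER_OCTAVE = 12
OCTAVE_AND_MAJOR_THIRD = 12 + 4  # semitones


def frequency_ratio(semitones: int) -> float:
    """Ratio between the frequencies of two pitches `semitones` apart."""
    return 2.0 ** (semitones / SEMITONES_PER_OCTAVE)


def top_of_range(lowest_hz: float, semitones: int = OCTAVE_AND_MAJOR_THIRD) -> float:
    """Highest frequency of a range that starts at `lowest_hz`."""
    return lowest_hz * frequency_ratio(semitones)


if __name__ == "__main__":
    lowest = 220.0  # hypothetical lowest note in Hz
    print(f"ratio: {frequency_ratio(OCTAVE_AND_MAJOR_THIRD):.2f}")
    print(f"range: {lowest:.0f} Hz to {top_of_range(lowest):.0f} Hz")
```

Under these assumptions the highest note of the range vibrates roughly two and a half times as fast as the lowest.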

Control of speech

In order to speak, three things must happen in the brain: it has to decide what to say, organize it into words and grammar, and then pass the message on to the speech organs. The brain areas involved in controlling speech are normally located in the left hemisphere. This is true of most right-handed individuals, because their left hemisphere is almost always dominant. Even among left-handed people, in whom the right hemisphere would be expected to dominate, about 60 per cent have their speech centres located in the left hemisphere. The main region for controlling speech in the brain is called Broca’s area. This holds the information concerned with the sound patterns of a word. It receives its messages from an area situated behind it called Wernicke’s area, which deals with understanding the meaning of a word.

When the message from the section of the brain that has decided what to say reaches Broca’s area, it is organized into the correct form and grammar. The appropriate instructions are then relayed to the motor area of the brain, which activates the organs of speech.

Speech defects

As speech is organized in such a complicated manner, it can be impaired in many different ways. Damage to Broca’s area results in a loss of the ability to form words. Someone with damage to Wernicke’s area loses the ability to understand speech, both their own and that of others. Both disturbances of speech are called aphasia. Other diseases affecting the nerves, muscles or brain structures may lead to a weak voice or to an inability to articulate words. Hoarseness is most often the result of a local affliction of the vocal cords. This may be caused by an infection (laryngitis) or by a tumour, which may be benign or malignant. It can also be the result of overstraining the cords, for example by yelling at a football match, singing very high notes or having to talk at the top of one’s voice at a party. In the last case alcohol consumption contributes to the impairment, because it makes it more difficult to control the muscles of the vocal cords.

Artificial speech

Scientists have been attempting to re-create speech artificially for centuries. Dramatic developments in modern electronics have enabled the construction of sophisticated speech synthesizers in various laboratories throughout the world. In fact, computer enthusiasts report that it is now possible to purchase a ‘speech synthesis’ chip for the home computer, which is capable of producing a range of basic speech sounds. A speech synthesizer is basically an electronic representation of the human vocal tract, which can be made to produce speech-like sounds that are amazingly natural.
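The idea of an electronic representation of the vocal tract can be illustrated with a very small "source and filter" sketch. It is not the design of any particular synthesizer. A train of pulses stands in for the vibrating vocal cords, and a chain of resonators stands in for the mouth cavity shaped by the tongue and lips. The sample rate, bandwidth, pitch and resonance frequencies are approximate values assumed for the illustration, and every name in the code is hypothetical.

```python
import math

SAMPLE_RATE = 16000  # samples per second (assumed)

# Approximate, illustrative resonance (formant) frequencies in Hz.
# These values are assumptions for the sketch, not taken from the text.
FORMANTS = {
    "ee": (270.0, 2290.0, 3010.0),  # tongue forward, lips wide
    "oo": (300.0, 870.0, 2240.0),   # tongue back, lips pursed
}


def glottal_source(pitch_hz, duration_s):
    """Impulse train standing in for the vibrating vocal cords."""
    n = int(SAMPLE_RATE * duration_s)
    period = int(SAMPLE_RATE / pitch_hz)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def resonator(signal, centre_hz, bandwidth_hz=80.0):
    """Two-pole filter modelling one resonance of the vocal tract."""
    t = 1.0 / SAMPLE_RATE
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = (2.0 * math.exp(-math.pi * bandwidth_hz * t)
         * math.cos(2.0 * math.pi * centre_hz * t))
    a = 1.0 - b - c  # scales the filter to unit gain at 0 Hz
    out = []
    y1 = y2 = 0.0  # the two previous output samples
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize_vowel(name, pitch_hz=120.0, duration_s=0.25):
    """Pass the source through one resonator per formant, then normalize."""
    signal = glottal_source(pitch_hz, duration_s)
    for centre_hz in FORMANTS[name]:
        signal = resonator(signal, centre_hz)
    peak = max(abs(s) for s in signal)
    return [s / peak for s in signal]


if __name__ == "__main__":
    samples = synthesize_vowel("ee")
    print(len(samples), "samples generated")
```

Changing only the resonance frequencies changes the vowel, which mirrors the way the position of the tongue and lips selects the vowel in real speech. The resulting list of samples could be written to a sound file, for example with Python's standard `wave` module, in order to listen to it.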

The speech synthesizer has a range of potential applications stretching beyond that of the robot, into the fields of psychology and medicine. One major contribution so far has been to the study of the various physical characteristics that are involved in the generation, perception and recognition of speech sounds. For example, by making slight adjustments to the electronic ‘model’ of the vocal tract, it is possible to discover the subtle physical anomalies in the real vocal tract of someone who stutters.