Articulator - an overview | ScienceDirect Topics (2024)

An articulator is defined as a “mechanical instrument that represents the temporomandibular joints (TMJs) and jaws, to which maxillary and mandibular casts may be attached to simulate some or all mandibular movements.”

From: Misch's Avoiding Complications in Oral Implantology, 2018


Chapters and Articles

Occlusion

David Ricketts, in Advanced Operative Dentistry, 2011

Fully adjustable articulators

Fully adjustable articulators are more complex devices that allow the clinical situation to be reproduced most closely. Instead of the flat tracks and planes that reproduce the condylar movements on semi-adjustable articulators, fully adjustable articulators have additional adjustable components and use curved condylar inserts that more accurately reproduce the three-dimensional anatomy of the glenoid fossa. They require more detailed information and more time to program, and pantographic (see Cadiax system) and stereographic recordings are needed. Because these articulators are only as accurate as the recordings used to program them, they are usually reserved for the most complex restorative procedures; the semi-adjustable articulator remains the articulator of choice for the vast majority of clinical situations.

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780702031267000065

Diagnostic Casts, Surgical Templates, and Provisionalization

Randolph R. Resnik, Carl E. Misch, in Dental Implant Prosthetics (Second Edition), 2015

Class 2: Arbitrary Plane Line (Average Value)

This type of articulator has evolved from the simple hinge and does allow restricted lateral movement. Arbitrary plane line articulators have fixed arbitrary condylar inclinations, vertical axes of rotation settings, and Bennett angles. The main disadvantage of nonadjustable articulators is the significant difference between the hinge closure of the articulator and the patient's anatomy. A closed-mouth MIP recording of the patient is made because an open-bite registration in centric relation does not correspond to the arc of mandibular closure with a nonadjustable articulator. The distance between the hinge and the teeth is less on a nonadjustable articulator than in the patient; therefore, a steeper curve exists upon closing, which results in premature contacts and incorrect ridge and groove direction in the final prosthesis (Figure 18-2, B).
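The effect of the shorter radius can be illustrated with a small geometric sketch (the radii and the 3 mm opening below are hypothetical values chosen only to show the direction of the error, not clinical measurements): for the same vertical opening, a cusp swinging about a closer hinge axis is displaced further sagittally, so it approaches contact along a more sharply curved arc than the patient's tooth does.

```python
import math

def horizontal_shift(radius_mm: float, vertical_opening_mm: float) -> float:
    """Sagittal displacement of a cusp that drops `vertical_opening_mm`
    below its closed position while swinging on an arc of radius
    `radius_mm` about the hinge axis."""
    return radius_mm - math.sqrt(radius_mm**2 - vertical_opening_mm**2)

# Hypothetical radii: hinge-to-tooth distance on a small nonadjustable
# articulator vs. the longer condyle-to-tooth distance in a patient.
for label, r in [("nonadjustable articulator", 90.0), ("patient", 105.0)]:
    shift = horizontal_shift(r, vertical_opening_mm=3.0)
    print(f"{label:26s} r = {r:5.1f} mm -> sagittal shift ~ {shift:.3f} mm")
```

The absolute numbers printed here are small (hundredths of a millimetre); the point of the sketch is only the direction of the discrepancy, namely that the shorter articulator radius produces a more sharply curved closure path than the patient's own arc of closure.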

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B978032307845000018X

Articulators, Transfer Records, and Study Casts

Rob Jagger, Iven Klineberg, in Functional Occlusion in Restorative Dentistry and Prosthodontics, 2016

Fully Adjustable Articulators

Fully adjustable articulators have the potential to provide the most accurate replication of jaw relationships and of lateral and protrusive mandibular movements. However, they are complex, technique-sensitive instruments, and the individualized condylar adjustments they require are derived from pantographic tracings. Reproducible transfer of this condylar information to the articulator is assumed. Their principal indication has been extensive fixed prosthodontics and restorative dentistry, on the assumption that all aspects of the fixed prosthodontic laboratory work can be completed on such articulators with their individualized condylar settings (derived from pantographic tracings) and minimal clinical verification. They are seldom used in clinical practice because the semiadjustable articulator, with clinical verification and adjustment where indicated, has become the accepted approach.

Neural Models of Motor Speech Control

Frank H. Guenther, Gregory Hickok, in Neurobiology of Language, 2016

58.7 The HSFC Model

DIVA is by far the most detailed and explicit model of speech motor control as well as the most influential to date. As noted, DIVA uses a feedback control architecture to detect and correct overtly produced errors. However, there is evidence in the motor control literature generally (Shadmehr & Mussa-Ivaldi, 1994; Tian & Poeppel, 2010; Wolpert, Ghahramani, & Jordan, 1995) and in the speech production literature more specifically for internal feedback control, that is, detecting and correcting internal coding errors prior to overt speech production. We describe one such model here, the hierarchical state feedback control (HSFC) model, illustrated in Figure 58.6.


Figure 58.6. The HSFC model. The HSFC model includes two hierarchical levels of feedback control, each with its own internal and external sensory feedback loops. As in psycholinguistic models, the input to the HSFC model starts with the activation of a conceptual representation that, in turn, excites a corresponding word (lemma) representation. The word level projects in parallel to sensory and motor sides of the highest, fully cortical level of feedback control, the auditory–Spt–BA44 loop. This higher-level loop, in turn, projects, also in parallel, to the lower-level somatosensory–cerebellum–motor cortex loop. Direct connections between the word level and the lower-level circuit may also exist, although they are not depicted here. The HSFC model differs from the state feedback control (SFC) model in two main respects. First, “phonological” processing is distributed over two hierarchically organized levels, implicating a higher-level cortical auditory-motor circuit and a lower-level somatosensory-motor circuit, which approximately map onto syllabic and phonemic levels of analysis, respectively. Second, a true efference copy signal is not a component of the model. Instead, the function served by an efference copy is integrated into the motor planning process. BA, Brodmann area; M1, primary motor cortex; S1, primary somatosensory area; aSMG, anterior supramarginal gyrus; STG, superior temporal gyrus; STS, superior temporal sulcus; vBA6, ventral BA6. The HSFC model is squarely within the tradition of the DIVA model in that it assumes that the targets of speech gestures are coded in auditory space and that feedback control is a key computational operation of the network. HSFC differs from DIVA in three respects: (i) it assumes an internal as well as an external feedback detection/correction mechanism; (ii) it situates auditory and somatosensory feedback loops in a hierarchical arrangement (auditory loop being higher-level and somatosensory loop being lower-level); and (iii) it assumes a modified computational architecture for the feedback loops.

The empirical motivation for an internal feedback loop in the speech domain comes from three sources. One is simply that we can imagine speaking and hear ourselves in our “mind’s ear.” Experimental research on such inner speech has shown that imagined speech can contain inadvertent errors that are internally detected. Further, the types and distribution of such errors are similar to what is observed in overt speech (e.g., phonemic errors show a “lexical bias”; Nozari, Dell, & Schwartz, 2011; Oppenheim & Dell, 2008). A second source is that talkers correct partially articulated speech errors faster than they should be able to if they were relying on overt auditory feedback alone (Nozari et al., 2011); it is an open question whether somatosensory feedback may explain this phenomenon. The third source is conduction aphasia, a syndrome characterized by fluent speech output and intact speech perception abilities but with a higher than normal rate of phonemic errors in production. Crucially, affected individuals can readily detect their own errors once spoken but have trouble correcting them, even after they have been overtly detected (Goodglass, 1992). This pattern of speech behavior can be explained by a damaged internal error detection and correction loop (leading to the higher error rate) with an intact external feedback loop (allowing for detection of overtly produced errors).

The HSFC model, like DIVA, assumes that a basic planning unit in auditory space is approximately at the syllable level. The somatosensory circuit, however, is hypothesized to code sensory targets at a lower level. The basic idea is that speech production involves a cyclic opening and closing of the vocal tract (approximately corresponding to vowels and consonants, respectively) and that the somatosensory system defines the targets of these opening or closing gestures (Gracco, 1994; Gracco & Lofqvist, 1994), similar to the somatosensory target map in the DIVA model but involving targets for individual phonemes rather than a single target for a whole syllable. The HSFC model holds that the internal auditory loop comprises a fully cortical network including auditory regions in the superior temporal gyrus (STG), motor regions in the inferior frontal gyrus (IFG), and an auditory-motor interface network, Spt, in the posterior planum temporale region (of course, the external feedback loop involves noncortical structures). The somatosensory loop comprises somatosensory regions in the inferior parietal lobe, lower-level motor regions in primary motor cortex and/or Brodmann area 6, and a somatosensory-motor interface in the cerebellum. The hypothesis that the cerebellum is part of the lower-level sensory-motor circuit is motivated by the nature of speech deficits after cerebellar damage, which tend to be fairly low-level dysarthrias compared with the higher-level phonological deficits found in conduction aphasics with cortical damage (Ackermann, Mathiak, & Riecker, 2007; Baldo, Klostermann, & Dronkers, 2008; Buchsbaum et al., 2011; Kohn, 1984).

Architecturally and computationally, the HSFC (Figure 58.6) differs somewhat from DIVA (Figure 58.3). In the HSFC there are two sensory-motor feedback loops, both of which involve three components: sensory target representations, motor codes for “hitting” those targets (learned via external feedback), and a sensory-motor coordinate transform network. The latter is assumed to compute the relation between sensory and motor representations of speech units. In DIVA, speech production begins with the activation of a speech sound map in the frontal lobe; in HSFC, production begins with the activation of both auditory and motor units corresponding to the intended word. The auditory units comprise the target for motor unit selection in the same sense that a visually perceived object might comprise the target for activating motor units to execute a reaching action. The difference with speech is that the sensory (auditory) target is not physically present in the environment but is instead a (re-)activated mental representation of the sensory target (i.e., a sound pattern). During motor activation, the accuracy of motor unit selection is checked, internally, prior to speech articulation. If motor and sensory units match, then articulation proceeds. If there is a mismatch, then a correction signal can be generated prior to articulation. Computationally, this internal “checking” mechanism is instantiated via excitatory connections from auditory target units to their previously learned corresponding motor units (via the interface network) and via inhibitory feedback connections from motor units to their corresponding auditory units. When the motor and auditory units match, the motor units will inhibit the auditory target units and carry on with their activation pattern. When there is a mismatch, motor units will inhibit nontarget units in the auditory network, allowing the target auditory units to persist in exciting the correct motor units; this is the “error signal.” Although a full-scale implementation of this architecture has not yet been demonstrated, a small-scale computational simulation has shown the feasibility of this architecture for internal error detection and correction (Hickok, 2012). A similar mechanism is assumed to hold at the various levels of the sensory-motor control hierarchy.
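A minimal numerical sketch of this checking mechanism follows. It is not the published small-scale simulation (Hickok, 2012); the unit inventory, weights, and noise level are invented for illustration, but the connectivity mirrors the description above: excitatory auditory-to-motor links through the interface, and inhibitory links from each motor unit back to its own auditory unit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy inventory of speech units (e.g., syllables).
units = ["ba", "da", "ga", "pa"]
n = len(units)

# Interface network: learned auditory-to-motor mapping.  Identity weights
# mean each auditory target excites its own motor unit.
W_interface = np.eye(n)

def produce(target, selection_noise=1.5, max_steps=10):
    """Toy internal error detection/correction loop.

    The auditory target excites its motor unit through the interface;
    each motor unit sends inhibition back to its own auditory unit.  An
    erroneously selected motor unit fails to silence the target, which
    keeps exciting the correct motor unit until it wins.
    """
    auditory = np.zeros(n)
    auditory[units.index(target)] = 1.0          # (re-)activated auditory target

    # Noisy initial motor selection: this is where internal errors arise.
    motor = W_interface @ auditory + selection_noise * rng.random(n)

    for step in range(max_steps):
        winner = int(np.argmax(motor))
        auditory[winner] = max(0.0, auditory[winner] - 1.0)  # motor-to-auditory inhibition
        if auditory.sum() == 0.0:                # target silenced: match, articulate
            return units[winner], step
        # Surviving auditory target re-excites the correct motor unit
        # (the internal "error signal").
        motor += W_interface @ auditory
    return units[int(np.argmax(motor))], max_steps

for trial in range(5):
    sound, corrections = produce("da")
    print(f"produced '{sound}' after {corrections} internal correction step(s)")
```

In the same toy network, zeroing `W_interface` (a crude stand-in for damage to the auditory-motor interface discussed below) removes both the target-driven bias on initial motor selection and the internal correction signal, so noise-driven selection errors go undetected until overt feedback arrives.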

This architecture explains conduction aphasia as damage to the cortical interface network. Production is fluent because motor units are fully intact and can be activated directly from higher-level word representations. However, because of damage to the auditory-motor interface, motor unit activation cannot be checked against their auditory targets and an increase in error rates is observed. Once the conduction aphasic overtly produces an error, it is immediately detected via external feedback because the auditory target network is intact and the appropriate units are activated. However, subsequent attempts to correct such errors often fail, again because of the damage to the auditory-motor interface. Analysis of the relation between the lesions that typically cause conduction aphasia and the anatomical location of Spt, the auditory-motor interface, has shown good correspondence, lending further support for the proposed model (Buchsbaum et al., 2011).

The advantage of the HSFC model is that it incorporates an internal feedback loop that has some explanatory power regarding a speech disorder that has proven difficult to explain. It also includes a computational architecture that integrates auditory target activation, error detection, and error correction into a single process, which has some appeal from a parsimony standpoint. However, the model is far less developed than DIVA.

Although the models described herein can account for a wide range of experimental phenomena, it is important to note that these models are incomplete in their characterization of the vastly complex neural processes involved in speech production. An iterative process of generating testable predictions from a neurocomputational model, experimentally testing those predictions, and modifying the model as necessary to account for the experimental findings will lead to increasingly accurate accounts of the neural computations underlying speech.

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780124077942000584

Phonetics: Articulatory

P.A. Keating, in International Encyclopedia of the Social & Behavioral Sciences, 2001

1.1 International Phonetic Alphabet

The articulators specified in the IPA system are the lungs, the larynx, the two lips, the upper surface of the oral cavity from the teeth back to the uvula (divided into alveolar ridge, hard palate, and soft palate), the uvula, the pharynx and epiglottis, the tongue (divided into the tip, blade, front, back, root, and sides), and the vocal folds (or cords). These are divided conveniently according to their functions in speech as follows.

Respiratory (or initiatory): The respiratory system provides an outward pulmonic airstream with a roughly constant pressure; however, since other articulators, such as the larynx, can also be used to initiate airflow, this is sometimes also called, more generally, the airstream process. Sounds are described in terms of the source and direction of movement of their airstream.

Phonatory: Vibration of the vocal folds (which, with their control structures, lie inside the larynx) modulates the pulmonic airstream to produce a sound source called voicing; the rate of this vibration is perceived as the pitch of the voice, and the mode of vibration is perceived as voice quality. Sounds are described in terms of the presence/absence, and if present, the quality, of voicing; larger spans of speech are described in terms of the pitch of voicing (e.g., tones, intonation—see Suprasegmentals).

Articulatory (in a narrower sense): Structures above the larynx, including the pharynx walls, the tongue (and its subparts), the jaw, the velum (or soft palate—sometimes classified separately as the oro-nasal process), the uvula, and the lips, which move within the vocal tract to modify the shape, and therefore the resonances, of the airway. In addition, stationary articulators—upper teeth and hard palate (and its subparts)—provide contact sites for these movable articulators. Within a given sound, the articulators may be classified as active vs. passive, with the active articulators moving toward the passive and often forming a constriction or narrowing along the vocal tract. (While stationary articulators are necessarily passive, movable articulators may be active or passive.) For most sounds, the extreme or target positions of the active articulators relative to the passive are taken to suffice to characterize the sounds. For example, labiodental means active lower lip to passive upper teeth; velar means active tongue body to (presumably) passive soft palate (the velum); a front vowel has the tongue as a whole to the front, relative to the surface of the palate. These locations must be further qualified according to how close the articulators come together. For example, in a stop consonant, they touch, whereas in a vowel they are still relatively far apart. These two dimensions of the location of the active articulator are sometimes called the location or place, vs. the degree, stricture, or manner, of the constriction.

The IPA uses these articulatory dimensions (and others not presented here) both to describe speech sounds and as a basis for defining phonetic symbols to represent those sounds. These dimensions serve as the row and column labels on the well-known IPA chart, the most recent (1996) version of which is reproduced here as Fig. 1. This chart comprises two consonant charts, a vowel chart, and lists of other symbols. Each symbol represents the combination of articulatory properties expressed by the row and column labels of its chart. Square brackets are usually used to show that a symbol is part of a phonetic alphabet. Thus the first symbol on the IPA chart, [p], represents a bilabial (column label—made with the two lips approaching each other) plosive (row label—an oral stop; i.e., made with complete closure between the active and passive articulators, and the soft palate raised to prevent nasal airflow) which is voiceless (leftmost symbol in its cell—made without vocal fold vibration).
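The row-and-column logic of the chart can be made concrete with a small lookup table. Only a hand-picked handful of symbols is included here; the full chart covers many more places, manners, diacritics, and non-pulmonic sounds.

```python
# A small subset of the IPA consonant chart, keyed the same way the chart
# itself is organized: (voicing, place, manner) -> symbol.
IPA_CONSONANTS = {
    ("voiceless", "bilabial", "plosive"): "p",
    ("voiced",    "bilabial", "plosive"): "b",
    ("voiceless", "alveolar", "plosive"): "t",
    ("voiced",    "alveolar", "plosive"): "d",
    ("voiceless", "velar",    "plosive"): "k",
    ("voiced",    "velar",    "plosive"): "g",
    ("voiced",    "bilabial", "nasal"):   "m",
    ("voiced",    "alveolar", "nasal"):   "n",
    ("voiceless", "alveolar", "fricative"): "s",
    ("voiced",    "alveolar", "fricative"): "z",
}

def describe(symbol: str) -> str:
    """Read the chart 'backwards': recover the articulatory description
    (row/column labels) from a symbol (cell)."""
    for (voicing, place, manner), sym in IPA_CONSONANTS.items():
        if sym == symbol:
            return f"[{symbol}] = {voicing} {place} {manner}"
    return f"[{symbol}] not in this subset"

print(describe("p"))   # [p] = voiceless bilabial plosive
```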


Figure 1. The 1996 IPA Chart (Source: International Phonetic Association)

The consonant chart can be seen as a grid representing the articulatory space, with sounds made in the front of the mouth located toward the left of the chart, and sounds made in the throat (pharynx) located toward the right of the chart, and the degree of openness of the vocal tract given from top to bottom of the chart. The vowel chart's layout also gives a similar articulatory grid. However, rather different articulatory descriptions are given for consonants vs. vowels, even when the same articulators are involved, as can be seen from the different row and column labels. The basic descriptive dimensions for vowels are the overall vertical and horizontal position of the active articulator (the tongue), and the position of the lips. While the position of the tongue is judged implicitly relative to passive structures, these are not referred to overtly; only the active articulator, the tongue, is discussed. Thus what might be called a ‘close palatal vowel with bilabial approximation’ is instead nowadays generally called a ‘high (close) front rounded vowel.’ Not all phonetic systems make this kind of functional differentiation of vowel vs. consonant articulations, and the issue remains controversial.

The IPA system aims to provide a symbol for every phoneme of every language. However, it is a crucial property of this system that the descriptive framework is independent of the particular symbols chosen to fill the cells; other symbols could be substituted for these as long as they were defined appropriately, and the charts provide places for sounds that, while possible, have not been assigned symbols.

For further explication of this chart and the IPA system, including its resources for description of suprasegmental phonetics (see Suprasegmentals) and of the range of speech sounds found in the world's languages, see the Handbook of the International Phonetic Association (IPA 1999), a phonetic textbook such as Ladefoged (2000), or a phonetics chapter such as that in Fromkin (2000). For more on how these and other articulatory dimensions can be used to classify speech sounds, see Phonology.

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B0080430767029776

Speech Recognition and Production by Machines

W. Ward, in International Encyclopedia of the Social & Behavioral Sciences, 2001

2 Speech Production

Currently, the various approaches to speech synthesis can be grouped into three basic categories: articulator models, source-filter models, and concatenative synthesis. An excellent brief discussion of the details of these approaches can be found in Pellom (1998). This section will present only a short overview.

2.1 Articulator Model Synthesis

Articulator model-based systems attempt to explicitly model the physical aspects of the human speech production system. They produce detailed models of the articulators and the airflow in the vocal tract. These models are generally based on detailed measurements of human speech production. Models of this type are not widely used because it is difficult to get the sensor data and because the models are computationally intensive.

2.2 Source-filter Based Synthesis

Source-filter-based systems use an abstract model of the speech production system (Fant 1960). Speech production is modeled as an excitation source that is passed through a linear digital filter. The excitation source represents either voiced or unvoiced speech, and the filter models the effect produced by the vocal tract on the signal. Within the general class of source-filter models, there are two basic synthesis techniques: formant synthesis and linear predictive synthesis.

Formant synthesis models the vocal tract as a digital filter with resonators and antiresonators (Klatt 1980). These systems use a low-pass-filtered periodic pulse train as the source for voiced signals and a random noise generator as the unvoiced source. A mixture of the two sources can also be used for speech units that have both voiced and unvoiced properties. Rules are created to specify the time-varying values of the control parameters for the filter and excitation. This is the technique used by DECTalk (Bruckert et al. 1983), probably the most widely used commercial synthesis system. Linear predictive synthesis uses linear predictive coding to represent the output signal. Each speech sample is represented as a linear combination of the N previous samples plus an additive excitation term. As in formant synthesis, the excitation term uses a pulse train for a voiced signal and noise for an unvoiced one.
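A minimal source-filter sketch in this spirit is shown below. It is not DECTalk or Klatt's synthesizer: the formant frequencies and bandwidths are illustrative values for an /a/-like vowel, and the "glottal" source is simply a low-pass-filtered pulse train.

```python
import numpy as np
from scipy.signal import lfilter

fs = 16000                      # sample rate (Hz)
f0 = 110                        # fundamental frequency of the voice source (Hz)
duration = 0.5                  # seconds
# Illustrative formant (frequency, bandwidth) pairs for an /a/-like vowel.
formants = [(700, 110), (1220, 120), (2600, 160)]

# Voiced source: a periodic impulse train with a crude low-pass "glottal" rolloff.
n = int(fs * duration)
source = np.zeros(n)
source[::fs // f0] = 1.0
source = lfilter([1.0], [1.0, -0.98], source)

# Vocal-tract filter: a cascade of two-pole resonators, one per formant.
speech = source
for freq, bw in formants:
    r = np.exp(-np.pi * bw / fs)
    theta = 2 * np.pi * freq / fs
    a = [1.0, -2 * r * np.cos(theta), r * r]   # denominator: resonator poles
    b = [1.0 - 2 * r * np.cos(theta) + r * r]  # numerator: unity gain at DC
    speech = lfilter(b, a, speech)

speech /= np.max(np.abs(speech))               # normalize before writing/playback
# e.g. scipy.io.wavfile.write("vowel.wav", fs, (speech * 32767).astype(np.int16))
```

The same skeleton turns into a crude linear predictive resynthesizer if the hand-picked resonators are replaced by an all-pole filter whose coefficients are estimated from recorded speech.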

While the source-filter-based systems are capable of producing quite intelligible speech, they have a distinct mechanical sound, and would not be mistaken for a human. This quality arises from the simplifying assumptions made by the model and therefore is not easily remedied.

2.3 Concatenative Synthesis

In recent years, much research has been focused on techniques for concatenative synthesis (Taylor et al. 1998). This technique uses an inventory of prestored units derived from real recorded speech. A database of speech is recorded from a single speaker and segmented to create an inventory of speech units. These units are concatenated to create the synthesized output. The speech units differ between systems. One popular unit is the diphone, which consists of parts of two phones, including the transition between them (Peterson et al. 1958). The beginning and end of the diphone are positioned to lie in stationary portions of the two phones, which reduces the problem of smoothing the junctions of concatenated units. There is also a manageable number of units: at most 1600 (40 × 40). The actual number is somewhat smaller, since not all possible pairs of phones occur in English. Context-dependent phones, clustered by acoustic similarity, are also used. A search algorithm is used to find the optimal sequence of units (Hunt and Black 1996).

An extension of the small-unit concatenative synthesis technique is to use variable-sized units for concatenation (Yi and Glass 1998). In this method, task-specific sentences, phrases, and words are recorded from a single speaker. A search process is used to concatenate the largest units available when generating the output. In some cases, an entire prerecorded sentence or prompt can be used; much of the time, phrase-level and word-level units are used. If a novel word is encountered, it is synthesized by concatenating phone-sized units. This technique sounds very natural when word-sized or larger units are being concatenated. The problem with this approach is that good coverage with large units is only possible in limited domains, and a new database must be recorded for each new domain. Work is in progress to unify the small- and variable-unit approaches into a more general system.
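The longest-unit-first idea can be sketched as follows. Real systems such as those cited use a cost-based search over candidate units rather than this simple greedy fallback; the inventory, file names, and pronunciations below are invented for illustration.

```python
# Toy variable-sized unit selection: greedily take the longest prestored
# unit (phrase > word) that matches the start of the text, and fall back
# to phone-sized units for novel words.

INVENTORY = {
    "the next train to": "unit_0097.wav",
    "london": "unit_0210.wav",
    "departs at": "unit_0133.wav",
    "what time is it": "unit_0412.wav",
}

PHONES = {"paddington": ["p", "ae", "d", "ih", "ng", "t", "ah", "n"]}  # fallback lexicon

def select_units(text: str) -> list[str]:
    words = text.lower().split()
    units, i = [], 0
    while i < len(words):
        # Try the longest remaining span first, shrinking until a unit matches.
        for j in range(len(words), i, -1):
            span = " ".join(words[i:j])
            if span in INVENTORY:
                units.append(INVENTORY[span])
                i = j
                break
        else:
            # Novel word: synthesize it from phone-sized units instead.
            units.extend(f"phone_{p}.wav" for p in PHONES.get(words[i], ["?"]))
            i += 1
    return units

print(select_units("The next train to Paddington departs at"))
```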

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B0080430767016235

Social Cognitive Neuroscience, Cognitive Neuroscience, Clinical Brain Mapping

F.H. Guenther, ... J.W. Bohland, in Brain Mapping, 2015

The GODIVA Model of Speech Sound Sequencing

The DIVA model accounts for the production of individual speech motor programs, each corresponding to a different speech sound. Additional brain areas, particularly in the left prefrontal cortex, are involved in the production of longer speech sequences (i.e., multisyllabic pseudowords). The GODIVA model (Bohland et al., 2010) provides a natural extension to DIVA, to account for multisyllabic planning, timing, and coordination. According to the GODIVA model, an upcoming multisyllabic utterance is represented simultaneously in two complementary modules within the prefrontal cortex. The syllabic frame structure (i.e., an abstract metrical structure) is hypothesized to be represented in cells in the preSMA, without regard for the phonemes involved, whereas the phonemic content of the utterance is represented in the left IFS, organized by the location of the phonemes within the syllable frame. Each of these representations is modeled as a working memory, which can contain multiple coactive items (i.e., the model can represent multiple forthcoming syllables), while representing the order of those items using an activity gradient (e.g., Bullock & Rhodes, 2003; Grossberg, 1978a,1978b; Houghton & Hartley, 1996). This gradient representation of serial order gives rise to the model's name (gradient order DIVA). The coordination of these two representations involves a basal ganglia ‘planning loop,’ hypothesized to involve the caudate nucleus.
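The activity-gradient readout can be sketched in a few lines. This is a schematic of competitive queuing in the spirit of the cited work, not the GODIVA implementation itself; the syllable labels and activation values are invented.

```python
import numpy as np

# Gradient-order working memory: planned items are held in parallel with
# graded activations, and serial order is read out by repeatedly selecting
# the most active item and then suppressing it (competitive queuing).
plan_items = ["go", "di", "va"]
activations = np.array([1.0, 0.7, 0.4])   # gradient: earlier items are more active

def read_out(items, acts):
    acts = acts.copy()
    order = []
    while np.any(acts > 0):
        k = int(np.argmax(acts))          # choice layer picks the strongest item
        order.append(items[k])
        acts[k] = 0.0                     # suppress the chosen item
    return order

print(read_out(plan_items, activations))  # ['go', 'di', 'va']
```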

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780123970251002657

Phonetics, Articulatory

J.C. Catford, J.H. Esling, in Encyclopedia of Language & Linguistics (Second Edition), 2006

Lower Articulators

The lower articulators are normally named by prefixes (labio-, dorso-, etc.) attached to the names of the upper zones against which they articulate. The lower articulators, then, are the lower lip (‘labio-,’ subdivided if necessary into ‘exolabio-’ and ‘endolabio-’) and the lower teeth (‘denti-’) and the tongue. The tongue has no clear-cut natural divisions, but for phonetic purposes it is divided into the tip or apex (‘apico-’) and behind that the blade (‘lamino-’), which is usually taken to consist of that part of the upper surface of the tongue that lies immediately below the alveolar ridge, extending back about 1 to 1.5 cm from the tip. This definition of ‘blade,’ which goes back at least to Sweet (1877), is traditionally used in articulatory phonetics, though some writers have treated what is normally called the blade as part of the apex, using the term ‘blade’ to refer to the entire front half of the dorsal surface of the tongue behind the apex (e.g., Peterson and Shoup, 1966). The traditional usage of the term ‘blade,’ however, is much more convenient for phonetic taxonomy. The underside of the blade (‘sublamino-’) can be used in the articulation of ‘retroflex’ sounds. For these sounds, the apex of the tongue is raised and somewhat turned back, so that, in the extreme case, the underblade articulates against the ‘prepalatal’ arch, behind the alveolar ridge, giving a ‘sublaminoprepalatal’ articulation.

The remaining dorsal surface of the tongue (‘dorso-’) can be subdivided into front (‘anterodorso-’) and rear (‘posterodorso-’) halves. However, since it is normal for ‘dorsodomal’ articulations to be made by the juxtaposition of the appropriate part of the tongue with the upper articulatory zone that lies opposite or nearest to it, these terms are not often used. Finally, the cover term ‘linguo-’ can be used, when required, to refer in the most general way to articulation by the tongue.
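The prefixing convention described above can be written as a tiny lookup (a hypothetical helper; the prefixes and upper zones listed are only a subset of those in the text).

```python
# Compose an articulation label by prefixing the lower (active) articulator
# to the upper zone it articulates against.
LOWER_PREFIXES = {
    "lower lip": "labio",
    "tongue tip": "apico",
    "tongue blade": "lamino",
    "underside of blade": "sublamino",
    "tongue dorsum": "dorso",
}

UPPER_ZONES = ["dental", "alveolar", "prepalatal", "palatal", "velar"]

def articulation_label(lower: str, upper: str) -> str:
    return f"{LOWER_PREFIXES[lower]}-{upper}"

print(articulation_label("tongue tip", "alveolar"))            # apico-alveolar
print(articulation_label("underside of blade", "prepalatal"))  # sublamino-prepalatal (retroflex)
```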

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B008044854200002X

Advances in Child Development and Behavior

Julia A. Venditti, ... Michael H. Goldstein, in Advances in Child Development and Behavior, 2023

6 Vocal learning in human infants

Maturation of the articulators and the body constrains the types of vocalizations that infants can make. In infants’ first two months of life, their sounds are typically limited to vegetative and distress vocalizations, which are produced involuntarily. They do, however, produce some vowel-like sounds, known as quasivowels (Oller, 2000). By approximately 4 months, infants begin to make more mature vowel-like sounds that are fully resonant and begin to integrate consonants into their repertoires, typically resulting in slow consonant-vowel alternations called marginal syllables (Oller, 2000). By 6 months of age, infants begin to produce canonical syllables. These syllables are defined by quick alternations between vocal tract closures and openings to produce consonant-vowel pairs with fully resonant voicing, resembling adult speech sounds (Oller, 2000).

Vocal development depends on development in sensory, motor, social, and language domains (Kent, 2022). The influence of the maturing vocal tract on the sounds that infants are capable of making across the first year can be considered alongside internal and external factors driving vocal behavior. Machine learning models offer unique insight into how intrinsic motivation plays an important role in supporting the feedback loop of vocal development. By mastering immature behaviors first and then going on to attempt and master increasingly mature behaviors, vocal tract models demonstrate that phonetic learning can result from active exploration of acoustic space (Moulin-Frier, Nguyen, & Oudeyer, 2014). By babbling, infants can explore their vocal capacities while incidentally receiving social feedback from responsive caregivers.
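A schematic of the intrinsic-motivation idea is sketched below. It is not the model of Moulin-Frier et al. (2014); the "regions" of vocal behavior, the error dynamics, and the maturity gate are invented, but the core loop is the same: the learner tracks its recent learning progress per region and preferentially practices where progress is currently highest, which naturally produces a progression from easier to harder vocalizations.

```python
import random

random.seed(0)

# Invented regions of vocal behavior and their practice "difficulty".
regions = ["quasivowels", "marginal syllables", "canonical syllables"]
difficulty = {"quasivowels": 0.80, "marginal syllables": 0.92, "canonical syllables": 0.97}

error = {r: 1.0 for r in regions}        # current prediction error per region
progress = {r: 0.0 for r in regions}     # most recent learning progress per region
history = []

for t in range(400):
    # Mostly exploit the region with the highest recent progress; explore occasionally.
    if random.random() < 0.1 or max(progress.values()) == 0.0:
        r = random.choice(regions)
    else:
        r = max(progress, key=progress.get)

    # Practicing reduces error.  Harder regions improve slowly and (as a crude
    # maturity gate) hardly improve until an easier region has been mastered.
    gate = 1.0 if r == "quasivowels" else 1.0 - min(error.values())
    new_error = error[r] - (1.0 - difficulty[r]) * gate * error[r]
    progress[r] = error[r] - new_error   # learning progress = error reduction
    error[r] = max(0.0, new_error)
    history.append(r)

# The practice history shifts from immature to mature behaviors over time.
for start in range(0, 400, 100):
    window = history[start:start + 100]
    counts = {r: window.count(r) for r in regions}
    print(f"trials {start:3d}-{start + 99:3d}: {counts}")
```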

Studying how social feedback is integrated into vocal behaviors aids in the understanding of how infants begin to produce sounds relevant to their ambient language environment once they have developed the physical capacity to do so.

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/S0065240723000186
