Multilingual speech stress analysis as a fundamental basis of speech databases construction for Slavic languages

UDC 81'-34 V. V. Potapov
Doctor of Philology, Principal Research Fellow, Department of Applied and Experimental Linguistics, Faculty of Humanities and Applied Sciences, MSLU; Senior Research Fellow, Faculty of Philology, MSU; e-mail: rkpotapova@yandex.ru
The aim of the article is to describe and define the features of speech stress, regarded as a basis for building a multilingual speech database within modern corpus linguistics. Prosodic features such as duration, fundamental frequency and intensity level were investigated. All acoustic parameters were analysed in connected speech. This approach is especially important for building speech databases because it makes it possible to describe the typology of speech stress within particular speech units.
Key words: spoken language; stress; analysis; database; rhythm; speech prosody.
Potapov V. V.
D. Sc., Principal Research Fellow, Department of Applied and Experimental Linguistics, Institute of Applied and Mathematical Linguistics, Faculty of Humanities and Applied Sciences, MSLU; e-mail: rkpotapova@yandex.ru
The survey is being carried out with the support of the Ministry of Education and Science of the Russian Federation in the framework of project № 34.1254.2014К at Moscow State Linguistic University (scientific head of the project: R. K. Potapova).
The aim of this article is to describe and define the features of speech stress as a fundamental basis for database construction in the domain of modern corpus linguistics. Prosodic features (duration, pitch, intensity) were investigated, especially for Slavic languages. All acoustic parameters of stress were analysed on the basis of connected speech. This approach to speech database construction is very important because it makes it possible to describe the typology of speech stress units.
Key words: speech; stress; analysis; database; rhythm; speech prosody.
Modern phonetics is characterized by the examination of various aspects of speech and by various trends in linguistic analysis [28]. Research on the rhythmic organization of speech is impossible without considering the problem of stress, or accent. These two terms ("stress" and "accent") are now distinguished by some linguists (e.g., see: [443; 53]). "Specialists in accentuation, phonology and history of phonetics can be opposed to phoneticians who study the modern state of the language and seek to find accurate both variant and invariant correlates of stress, being well aware of the experimental difficulties of such studies" [17, p. 66]. The specificity of the examination also presupposes experimental research directed at a contrastive analysis of the acoustic (spectral and prosodic) features of stressed vowels in the languages being analysed. In this connection we shall touch on some points of crucial importance that are related in one way or another to the problems considered here.
The problem of stress perception has been in the focus of attention for the last several decades (e.g. [8; 33; 38; 39; 40; 41; 46; 48]). The results of various studies are basically the same and show that there is no direct relationship between stress perception judgements and certain physical characteristics of the speech signal. This has been demonstrated experimentally [47], in particular for English [39]. For American English a similar observation is made in [37; 45]. A similar situation was described for Polish by W. Jassem [43] and for French by A. Rigault [52].
P. Janota [42] attempted in his early works to determine the influence of three physical parameters (intensity, fundamental frequency and duration) on stress perception in Czech. The experimental material consisted of two-syllable synthesized stimuli. The judgements of the listeners about such texts are based, as a rule, on complex phenomena that can be identified as a) the acoustic features of the speech signal and b) the role of the context. The correlation between these two factors can vary considerably for natural language utterances.
The processing of the data obtained from 100 listeners showed that slight changes in the fundamental frequency contributed to the perception of the syllable as a stressed one while greater changes resulted in the interpretation of the phrase intonation.
On the basis of the results obtained in the experiment with synthetic stimuli, the author concluded that an increase in one of the above-mentioned parameters makes the listeners evaluate the presented syllable as a stressed one. Small changes in the fundamental frequency have a greater effect on the judgement of stress, while greater changes influence the evaluation of the intonation pattern of the whole utterance.
The role of the context is very important for perception: in texts with a constant fundamental frequency on the first syllable in a two-syllable structure (= two-syllable phonetic word), a deviation from this constant level (either a fall or a rise) results in the identification of the subsequent syllable as a stressed one; in texts with a varying frequency level on the first syllable, only an increase of the fundamental frequency in the subsequent syllable can be considered a perceptual correlate of stress.
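The context rule described above can be written down as a small decision function. This is only a sketch under stated assumptions: the function name, the representation of each syllable as a list of F0 samples in Hz, and the 5 Hz tolerance used to decide whether the first syllable is "constant" are all hypothetical and are not taken from [42].

```python
def second_syllable_stressed(f0_first, f0_second, tolerance_hz=5.0):
    """Toy version of the context rule for a two-syllable stimulus.

    f0_first, f0_second: lists of F0 samples (Hz), one list per syllable.
    tolerance_hz: hypothetical threshold, used both to decide whether the
    first syllable is 'constant' and whether a deviation is noticeable.
    """
    first_mean = sum(f0_first) / len(f0_first)
    second_mean = sum(f0_second) / len(f0_second)
    first_is_constant = (max(f0_first) - min(f0_first)) <= tolerance_hz
    deviation = second_mean - first_mean
    if first_is_constant:
        # Constant first syllable: a fall or a rise both count.
        return abs(deviation) > tolerance_hz
    # Varying first syllable: only a rise counts.
    return deviation > tolerance_hz
```

With a level first syllable, both a fall and a rise on the second syllable return True; with a varying first syllable, only a rise does.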
While discussing the specificity of the realization of word stress in Czech and Russian, it is necessary to take into account two possible approaches to the problem [3]. The first one, rooted in the works of the Prague School (the Prague Linguistic Circle), could be called a phonological approach. In this approach the main feature of stress in both Czech and Russian is the expiratory enhancement of the syllable. Stress in Czech is fixed on the first syllable of words and is described as a weak dynamic stress, while in Russian it is described as a strong dynamic one. R. Jakobson [36] showed that the weakness of stress in Czech is explained by the language system: Czech, with its phonological opposition of short and long vowel phonemes, lacks the expiratory enhancement which, as R. Jakobson believes, accompanies the increase in vowel duration [2; 3]. By contrast, the strength of stress in Russian is a result of a free, functionally unloaded length of vowels.
The second approach, the acoustic one, excludes the dynamic interpretation of stress in both Russian and Czech, since the high intensity that is believed to be a correlate of strength is not always characteristic of stressed syllables in the two languages. In the acoustic approach the vowel duration is considered to be the main feature of stress, as the most constant acoustic feature of the stressed syllable [4; 10; 11; 13]. Regarding stress in Czech, the acoustic approach in a number of studies has failed to reveal any regularity in its realization. Moreover, in terms of the traditional parameters related to stress, such as intensity, duration and fundamental frequency, a stressed syllable in Czech can be less marked than an unstressed syllable. This explains why stress in Czech is described as paradoxical [18; 24; 49].
Therefore, the above two approaches lead to contradictory results regarding the specificity of stress in Czech and Russian. The phonological approach locates this specificity in the degree of the expiratory enhancement of the syllable: stress is classified as strong in Russian and as weak in Czech. This result can be substantiated typologically and contributes to the auditory perception, but it is not supported by acoustic data. The acoustic approach does not show any difference in the realization of stress in Czech and Russian syllables. For example, a longer vowel duration in a stressed syllable in Russian compared to Czech does not explain the specificity of stress in Russian, because in Czech stress can be realized in a syllable with a phonologically long vowel.
In the experiment on recognition of word stress in Czech and Russian listeners, the first syllable in words was recognized as a stressed one in only 22.5% of all answers. This figure cannot be explained by the presence of phonologically long vowels in these words, because for words consisting of syllables of the same phonological duration this figure was even lower: 18.5%. Therefore we can speak about a different rhythmic predisposition of Czech and Russian people to the perception of Czech words, which is explained by the fact that rhythm in Czech is quite different from rhythm in Russian [22; 24–27; 50].
In Bulgarian, stressed vowels are not marked by a longer duration as frequently as in Russian, owing to some environmental restrictions (such as the presence of a subsequent sonorant or the absolute end of a word pronounced with the intonation of a completed statement [10]).
In Russian, unstressed sounds tend, to a greater extent than in Bulgarian, to have a shorter duration. Durational differences are also found in the second and third pre-stress syllables as well as in post-stress non-terminal syllables. In the Russian literary language the vowels of the second pre-stress syllable are reduced both quantitatively and qualitatively due to a considerably lower intensity compared to the stressed ones. These two types of reduction are expressed quite well [1; 19; 20; 23; 34; 35].
The reduction of vowels, though not so marked, is also observed in Bulgarian. In Bulgarian, reduction in the second pre-stress syllable can be compared to the reduction of Russian high vowels in this position, which are reduced in duration but not in the manner of articulation.
Therefore, the comparison of duration of Bulgarian and Russian vowels shows that longer duration of sounds is characteristic of stress in both Bulgarian and Russian. Unlike Russian, however, Bulgarian is characterized by smaller durational differences pertaining to stress. Besides, duration, as one of the parameters characterizing the nature of Bulgarian stress, is differently correlated to other characteristics of stress [10].
Bulgarian and Russian have many common features as regards the peculiarities of word structure. As in Russian, in Bulgarian the initial part of words is more loaded than other parts, which is manifested in the longer duration of consonants at the absolute beginning of words.
The examination of the role played by individual physical components in the stress effect makes it possible to conclude that in Russian the most effective (all other conditions being the same) is the syllable duration change (either an increase or decrease) or the physical intensity increase. The increase in the fundamental frequency of the vowel in the syllable as well as pitch and amplitude changes have a smaller effect, which corresponds to the scientifically established fact that stress in Russian is mainly expressed as an increase in intensity and duration. There is a further point to be made here: that temporal contrasts also affect the perception of stress [10].
The effect of word stress in Russian is determined not so much by the correlation of such physical parameters as intensity, fundamental frequency and duration of individual syllables as by the fact that these physical correlations are included into a language meaningful stimulus and are, therefore, analysed by listeners through the system of their experience with temporal relations determined by the language meanings and the language system.
The impact of the language system on perception is manifested in the following:
a) the stress effect depends not only on the physical parameters (intensity, duration and fundamental frequency of individual syllables) but also on the recognition of the word and its lexical meaning by the subject (e.g., see [31]);
b) word stress perception is influenced by speech automatic reactions that have been developed in the speech experience of the subjects under the influence of the language phonetic system [33].
In the present study, problems directly related to the acoustic characteristics of stressed vowels in the rhythmic structure (phonetic word), RS, will be considered only partially, since rhythmic units cover the problem of stress. Stress is one of the components of intonation, but its role varies from language to language because there are various types of stress, determined by the constituent elements of stress (intensity, duration, fundamental frequency) and their various combinations [32].
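For a speech database, the three constituent elements named above can be stored per syllable within each rhythmic structure. The following is one possible record layout, given as a minimal sketch and not as the layout used in the project: the class names, field names, units and the prominence measures are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Syllable:
    text: str
    duration_ms: float    # assumed unit: milliseconds
    f0_hz: float          # assumed: mean fundamental frequency in Hz
    intensity_db: float   # assumed: mean intensity level in dB
    stressed: bool = False


@dataclass
class RhythmicStructure:
    language: str
    syllables: List[Syllable]

    def stress_position(self) -> int:
        """Return the 1-based index of the stressed syllable, or 0 if none is marked."""
        for index, syllable in enumerate(self.syllables, start=1):
            if syllable.stressed:
                return index
        return 0

    def relative_prominence(self) -> Dict[str, float]:
        """Compare the stressed syllable with the mean of the unstressed ones.

        Duration and F0 are given as ratios; intensity is given as a
        difference in dB. Averaging dB values arithmetically is a
        simplification chosen for this sketch.
        """
        stressed = [s for s in self.syllables if s.stressed]
        unstressed = [s for s in self.syllables if not s.stressed]
        if len(stressed) != 1 or not unstressed:
            raise ValueError("need exactly one stressed and at least one unstressed syllable")
        s = stressed[0]
        n = len(unstressed)
        mean_duration = sum(u.duration_ms for u in unstressed) / n
        mean_f0 = sum(u.f0_hz for u in unstressed) / n
        mean_intensity = sum(u.intensity_db for u in unstressed) / n
        return {
            "duration_ratio": s.duration_ms / mean_duration,
            "f0_ratio": s.f0_hz / mean_f0,
            "intensity_diff_db": s.intensity_db - mean_intensity,
        }
```

Keeping the three parameters separate, rather than collapsing them into one prominence score, lets the same record describe languages in which stress relies on different combinations of them.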
As early as [13], the analysis of connected speech showed that the stressed syllable does not consistently have a higher intensity: a stressed vowel can be the weakest in a word unless its emphasis by means of intensity is required by the specific need to mark the word boundary in the speech flow or the phrase stress.
Therefore, comprehensive consideration of the organization of both stressed and unstressed syllables in various phonetic positions, including various phrase positions, makes it possible to explain the phenomenon of the stressed vowel the intensity of which is rather frequently lower than that of an unstressed syllable.
The position of the absolute beginning of the word, strict differentiation of the initial and final position of the phrase, differences between narrow and wide vowels — all these as well as some other factors are considered [13] to be directly related to the vowel duration distribution in word forms.
Stress [14] is to be considered as a means of organizing all syllables of the word and a corresponding organization of stressed and unstressed syllables on the basis of some definite features. Among the acoustic parameters directly characterizing stress, it is traditional to name relationships in terms of intensity, duration, fundamental frequency and spectral structure. The identification of relationships that are relevant for the recognition of the stressed syllable is impeded by the dependence of the above-mentioned parameters on phonetic factors as well as on phonological relationships inherent to every individual language system. Generally, every word in Russian can be underlined by stress. In terms of articulation, stress in Russian speech is manifested in emphasizing a single syllable in a rhythmic structure (RS) by means of longer duration [7] and higher intensity [34]. Changes in the fundamental frequency transcend the framework of the word and constitute a major component of the phrase intonation. In this quantitatively and dynamically marked syllable the nucleus is the vowel, around which the unstressed components, reduced in terms of quantity and dynamics, are grouped. Therefore, the influence of stress covers the whole RS, making it a single entity. It could also be called a centralizing type of stress. It serves as a kind of marker for the listener and enables them to identify the RS in the speech flow.
Some papers [5; 6] supported the data obtained in [11], showing that the relative vowel intensity is determined more by the position of the vowel in the word than by its position relative to the stress: the closer the vowel is to the beginning of the word, the higher its intensity. A stressed vowel in the final position can be weaker than an unstressed one in the initial position in the word. Research on vowel duration revealed that the quantitative opposition of stressed and unstressed vowels persists in almost all cases. The exception is the post-stress vowel at the absolute end of the word where, depending on the intonation and individual manner of pronunciation, an unstressed vowel may be of the same or nearly the same duration as a stressed one.
It was found that differences in duration between the first pre-stress syllable, on the one hand, and the second pre-stress and post-stress syllables, on the other, are significant only for the /a/ vowel and, to a lesser degree, for the /h/ vowel. For the other vowels no duration differences were found in unstressed syllables. Here we have only the significant opposition of stressed vowels to unstressed ones. The third specific characteristic of unstressed vowels is manifested at the syllable level: in unstressed syllables the characteristics of vowels and consonants cannot be differentiated. This is evident in palatalized and labialized syllables. The data obtained convince us that it is these deviations from the "standard" relationships between vowels and consonants existing in stressed syllables that are the general and important consequence of unstressedness.
On the whole there are several points of view on the physical nature of word stress in Russian (see e.g. [54]). In some works the stressed vowel is defined as the strongest in the word. Other experimental phonetic studies of word stress in Russian showed that stressed vowels have a longer duration than unstressed ones. There are also works describing the acoustic characteristics of stressed and unstressed syllables and showing that stressed vowels have a specific spectral structure.
New works have appeared of late that study how listeners determine the location of word stress. For example, experiments have been conducted on the perception of synthetic words with various parameters of stressed and unstressed elements [5]. There is a theory put forward by N. I. Zhinkin and widely accepted among linguists which postulates that, when determining the location of stress in a word, the listener compares the parameters of the perceived vowel (first of all, its loudness) to some standard characteristics of this vowel and on the basis of this comparison decides whether the vowel is stressed or not [8; 9]. Nevertheless, this theory is opposed by some physiologists who study human speech [15].
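The idea of comparison with a standard can be illustrated as follows. This is a deliberately simplified sketch and not a model proposed in [8; 9]: the reference table, its numeric values and the function name are hypothetical, and loudness is approximated here by intensity level in dB.

```python
# Hypothetical reference intensity levels (dB) per vowel.
# These are placeholder values for illustration, not measurements.
REFERENCE_DB = {"a": 70.0, "o": 68.0, "u": 64.0, "i": 63.0}


def locate_stress(vowels, reference=REFERENCE_DB):
    """Return the 0-based index of the vowel that most exceeds its own standard.

    vowels: list of (vowel_symbol, measured_intensity_db) pairs.
    Each vowel is compared with the reference level for that vowel,
    not with the other vowels in the word.
    """
    if not vowels:
        raise ValueError("the word must contain at least one vowel")
    excess = [level - reference[symbol] for symbol, level in vowels]
    return max(range(len(excess)), key=excess.__getitem__)
```

Under these placeholder values, a word with /a/ at 69 dB and /i/ at 66 dB is judged to be stressed on /i/: the /a/ is louder in absolute terms but falls below its own standard, while the /i/ exceeds its standard.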
The results obtained in the experiments on the determination of stress in words with intentionally shorter duration of stressed vowels show the following [5]:
1) native Russian speakers can correctly define the location of stress in a word even when the duration of the stressed vowel does not exceed that of unstressed ones;
2) native Russian speakers correctly determine the place of stress in a word irrespective of the tonetic characteristics of stressed and unstressed vowels.
These data therefore show that when determining the RS of a word, native speakers use not only quantitative (duration, pitch) but also qualitative (spectral) features. In cases when the stress is correctly located with a maximally shortened stressed vowel, subjects recognized the other syllables as unstressed owing to their qualitative reduction, which is also manifested in weaker syllabic contrasts.
In conclusion we would like to underline the fact that modern typological language studies are characterized by three main directions of search: for universals, for type and for individual specificity. We considered RS as a basic universal feature of rhythm [12; 24; 25; 26; 27; 30; 50; 51] serving as a link between languages with free and fixed stress.
The frequency of RSs with respect to their morphological composition is governed by certain laws intrinsic, e.g., to Slavic languages. In this respect it is crucial whether the language is synthetic or analytical because this fact determines the structure of RS and, consequently, the specific rhythm of utterances. Differences in style may also affect the prevailing grammar composition of RS. The latter factor, however, does not exert as considerable influence as the language grammar does.
The results of the acoustic analysis have revealed the phonetic specificity of RS and rhythmic pattern structures (RPS), which is determined by the phonetic nature of stress in RS, realized in the studied languages by different means: definite combinations of prosodic features of vowels at the RS boundaries in Czech speech, a dynamic component in Bulgarian, and spectral and temporal components in Russian. All these features are responsible for specific combinations of qualitative and quantitative parameters that shape the rhythm of spoken language. RSs are identified not only on the basis of stress but also on the basis of the acoustic features of boundaries between RSs that are present in the speech flow but frequently impede identification because they are weakly expressed acoustically.
From the point of view of integral perception of rhythm in Czech, it can be represented as a dot-and-dash line where dots correspond to equally stressed vowels (syllables) while dashes correspond to specifically emphasized sounds (syllables) that are a result of phonological length, wide vowels, syllable-forming sonorants and RS final position. The corresponding picture of Bulgarian speech rhythm will be a dotted line where dots represent vowels (syllables) with a minimum quantitative expression. Russian speech which is characterized by a longer duration of vowels (syllables) can be represented as a dash line or a solid wave-like curve.
The assumption of a greater similarity of Bulgarian and Russian as opposed to Czech with respect to a number of parameters has been supported by phonetic experimental data.
Therefore, our study has supported the validity of the hypothesis on the existence of a hierarchy of factors that determine the rhythmic organization of utterances: the underlying factor is stress, followed by grammar, which affects the speech rhythm indirectly through various combinations of parts of speech forming proclitics and enclitics, and finally by stylistic peculiarities (e.g. the style of journalistic articles, i.e. political essay style, of scientific articles, etc.), though the latter exert a weaker influence on rhythm.
