Completed research project

Title / Titel The relative frequencies of nouns, pronouns, and verbs cross-linguistically
Summary / Zusammenfassung
This project will investigate the relative frequencies of core parts of speech, such as nouns, verbs, and pronouns, in spoken language corpora of seven languages that represent a wide range of areal and typological diversity. We focus on two research questions:

(1) Why do languages vary so drastically in the relative frequencies of noun, pronoun, and verb tokens used in discourse? Our pilot study for this project suggests that in some languages (such as Chintang) the number of nouns and pronouns taken together roughly equals the number of verbs, while in others (such as Sri Lanka Malay) nouns and pronouns taken together are roughly twice as frequent as verbs. What typological or other differences between languages can explain these differences in the use of parts of speech? One hypothesis we will test is that argument indexing on verbs makes the overt realization of arguments as nouns or pronouns unnecessary, which would explain the low frequencies of nouns and pronouns in some languages.
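
As a rough illustration of the quantity compared in question (1), the ratio of noun and pronoun tokens to verb tokens can be computed from a sequence of part-of-speech tags. The sketch below is not the project's actual pipeline: the tag names (`NOUN`, `PRON`, `VERB`), the function name, and the toy tag sequences are all assumptions made for the example, and real corpora will use their own tag sets.

```python
from collections import Counter

# Hypothetical tag names; each real corpus uses its own tag set.
NOMINAL_TAGS = {"NOUN", "PRON"}
VERB_TAG = "VERB"


def nominal_to_verb_ratio(pos_tags):
    """Return (nouns + pronouns) / verbs for a sequence of POS tags."""
    counts = Counter(pos_tags)
    nominals = sum(counts[tag] for tag in NOMINAL_TAGS)
    verbs = counts[VERB_TAG]
    if verbs == 0:
        raise ValueError("no verb tokens in input")
    return nominals / verbs


# Invented toy data, not taken from any corpus.
balanced = ["NOUN", "VERB", "PRON", "VERB", "VERB", "NOUN"]
nominal_heavy = ["NOUN", "PRON", "VERB", "NOUN", "NOUN", "VERB"]

print(nominal_to_verb_ratio(balanced))       # 1.0
print(nominal_to_verb_ratio(nominal_heavy))  # 2.0
```

A ratio near 1 corresponds to the pattern reported for Chintang in the pilot study, and a ratio near 2 to the pattern reported for Sri Lanka Malay.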

(2) Why do the relative frequencies of nouns, pronouns, and verbs vary within texts? Our pilot study has shown that, consistently across languages, nouns are used particularly frequently at the beginning of narrative texts, reflecting the introduction of new discourse participants, as expected. Furthermore, noun frequency shows characteristic, sinusoidal alternations as narrative texts unfold, with regular peaks of heavy noun use roughly every 10-15 clauses. These peaks may reflect universal cognitive constraints on the activation of discourse participants: once the activation of a participant has decayed, ultimately because of the limits of short-term memory, the participant has to be re-introduced by a full lexical noun.
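
One simple way to look at the within-text variation in question (2) is to compute the share of noun tokens in each clause, optionally smooth it, and mark local peaks. The sketch below is only an illustration under stated assumptions: it treats a text as a list of clauses, each a list of part-of-speech tags, and the tag name `NOUN`, the function names, the window size, and the toy data are all hypothetical. The summary does not say which method the project uses to detect the peaks.

```python
def noun_share_per_clause(clauses, noun_tag="NOUN"):
    """Proportion of noun tokens in each clause (a clause is a list of POS tags)."""
    return [
        sum(tag == noun_tag for tag in clause) / len(clause) if clause else 0.0
        for clause in clauses
    ]


def moving_average(values, window):
    """Mean of each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def local_peaks(values):
    """Indices whose value is strictly greater than both neighbours."""
    return [
        i for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] > values[i + 1]
    ]


# Invented toy narrative: a noun-heavy opening, then pronouns,
# then a participant re-introduced by a full noun in clause 3.
clauses = [
    ["NOUN", "NOUN", "VERB"],
    ["PRON", "VERB"],
    ["PRON", "VERB"],
    ["NOUN", "VERB"],
    ["PRON", "VERB"],
]

shares = noun_share_per_clause(clauses)
print(local_peaks(shares))  # [3]
```

On a real text, the distances between successive peak indices could then be compared with the interval of roughly 10-15 clauses reported in the pilot study. Smoothing with `moving_average` before peak detection is one option for noisy data, and the choice of window size is itself an assumption that would need justification.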

We will also investigate the influence of further factors on the relative frequencies of nouns, pronouns, and verbs, such as the degree of speakers’ and listeners’ mutual acquaintance (known/familiar vs. unknown) and text genres. In this context we will empirically test the assumed universality of ‘nouniness’ of formal genres.

The newly available data compiled in the DoBeS framework allow us to develop and then appropriately address these new and exciting research questions for the first time, as they require data from diverse languages that are annotated for parts of speech by experts, time-aligned, and described with detailed metadata with respect to speakers’ social status, mutual acquaintance, etc. These data allow us to capture subtle language usage patterns and explore their relation to typological differences between languages, narrative strategies, and other linguistic and non-linguistic factors. This project thus further develops documentary linguistics, connecting it with areas such as corpus linguistics, morphological typology, syntactic theory, discourse studies, and cognitive linguistics. In order to connect our findings with research on well-known languages such as English, we will additionally carry out analyses on published corpora of English.

The methods applied include advanced computational techniques for the quantitative analysis of textual data of the type produced by DoBeS projects, with as little additional manual annotation as possible. This permits us to analyze the huge amount of data necessary to detect and appropriately describe the subtle patterns under investigation. The project will also involve developing solutions to a number of technological and computational issues in cross-corpus studies, which will be additional valuable outcomes.
Publications / Publikationen
Stoll, Sabine; Bickel, Balthasar (2009). How deep are differences in referential density? In: Guo, Jiansheng; Lieven, Elena; Budwig, Nancy; Ervin-Tripp, Susan; Nakamura, Keiko; Özçalişkan, Şeyda. Crosslinguistic approaches to the psychology of language: research in the traditions of Dan Slobin. London: Psychology Press, 543-555.

Bickel, Balthasar (2003). Referential density in discourse and syntactic typology. Language, 79(4):708-736.

Project leadership and contacts /
Projektleitung und Kontakte
Prof. Dr. Balthasar Bickel (Project Leader)
Dr. Alena Witzlack-Makarevich
Taras Zakharko, MA
Funding source(s) /
Unterstützt durch
In collaboration with /
In Zusammenarbeit mit
Dr. Frank Seifart
Faculteit der Geesteswetenschappen
Capaciteitsgroep Taalwetenschap
University of Amsterdam
Dr. Brigitte Pakendorf
Laboratoire Dynamique du Langage
Université Lumière Lyon 2
Duration of Project / Projektdauer Jun 2012 to Jun 2015