Fluctuations in Word Use from Word Birth to Word Death
A team of mathematicians and physicists, Alexander M. Petersen, Joel Tenenbaum, Shlomo Havlin, and H. Eugene Stanley, studied the "Statistical Laws Governing Fluctuations in Word Use from Word Birth to Word Death" by analysing the dynamic properties of 10^7 words recorded in English, Spanish and Hebrew over the period 1800–2008. (The paper is available on arXiv.org.)
We quote at length from the concluding Discussion:
"… words are competing actors in a system of finite resources. Just as business firms compete for market share, words demonstrate the same growth statistics because they are competing for the use of the writer/speaker and for the attention of the corresponding reader/listener. A prime example of fitness mediated evolutionary competition is the case of irregular and regular verb use in English. By analyzing the regularization rate of irregular verbs through the history of the English language, Lieberman et al. show that the irregular verbs that are used more frequently are less likely to be overcome by their regular verb counterparts. Specifically, they find that the irregular verb death rate scales as the inverse square root of the word’s relative use. A study of word diffusion across Indo-European languages shows similar frequency-dependence of word replacement rates."
"We document the case example of 'X-ray', which shows how categorically related words can compete in a zero-sum game. Moreover, this competition does not occur in a vacuum. Instead, the dynamics are significantly related to diffusion and technology. Lexical diffusion occurs at many scales, both within relatively small groups and across nations. The technological forces underlying word selection have changed significantly over the last 20 years. With the advent of automatic spell-checkers in the digital era, words recognized by spell-checkers receive a significant boost in their “reproductive fitness” at the expense of their “misspelled” or unstandardized counterparts.
We find that the dynamics are influenced by historical context, trends in global communication, and the means for standardizing that communication. Analogous to recessions and booms in a global economy, the marketplace for words waxes and wanes with a global pulse as historical events unfold. And in analogy to financial regulations meant to limit risk and market domination, standardization technologies such as the dictionary and spell checkers serve as powerful arbiters in determining the characteristic properties of word evolution. Context matters, and so we anticipate that niches in various language ecosystems (ranging from spoken word to professionally published documents to various online forms such as chats, tweets and blogs) have heterogenous selection laws that may favor a given word in one arena but not another. Moreover, the birth and death rate of words and their close associates (misspellings, synonyms, abbreviations) depend on factors endogenous to the language domain such as correlations in word use to other partner words and polysemous contexts as well as exogenous socio-technological factors and demographic aspects of the writers, such as age and social niche.
We find a pronounced peak in the fluctuations of word growth rates when a word has reached approximately 30-50 years of age. … Another important timescale in evolutionary systems is the reproduction age of the interacting gene or meme host. Interestingly, a 30-50 year timescale is roughly equal to the characteristic human generational time scale.
The impact of historical context on language dynamics is not limited to emerging languages, but extends to languages that have been active and evolving continuously for a thousand years. We find that historical episodes can drastically perturb the properties of existing languages over large time scales. Moreover, recent studies show evidence for short-timescale cascading behavior in blog trends, analogous to the aftershocks following earthquakes and the cascades of market volatility following financial news announcements. The nontrivial autocorrelations and the leptokurtic growth distributions demonstrate the significance of exogenous shocks which can result in growth rates that significantly exceed the frequencies that one would expect from non-interacting proportional growth models.
A large number of the world’s ethnic groups are separated along linguistic lines. A language barrier can isolate its speakers by serving as a screen to external events, which may further slow the rate of language evolution by stalling endogenous change. Nevertheless, we find that the distribution of word growth rates significantly broadens during times of large scale conflict…. This can be understood as manifesting from the unification of public consciousness that creates fertile breeding ground for new topics and ideas. During war, people may be more likely to have their attention drawn to global issues. Remarkably, the pronounced change during WWII was not observed for the Spanish corpus, documenting the relatively small roles that Spain and Latin American countries played in the war."
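Two of the quantitative claims in the quoted passage can be made concrete with a minimal Python sketch. This is our own illustration, not code from the paper. It assumes that "scales as the inverse square root" means a death rate proportional to usage raised to the power -1/2, and that a word's growth rate is the year-on-year change in the logarithm of its relative frequency. The function names and the frequency series are hypothetical, chosen only for the example.

```python
import math


def relative_death_rate(usage_ratio):
    """Death-rate ratio of two irregular verbs, assuming rate ~ usage ** -0.5.

    usage_ratio is how many times more often the first verb is used.
    """
    if usage_ratio <= 0:
        raise ValueError("usage_ratio must be positive")
    return 1.0 / math.sqrt(usage_ratio)


def log_growth_rates(freqs):
    """Year-on-year log growth rates r(t) = ln f(t+1) - ln f(t) (assumed definition)."""
    return [math.log(b / a) for a, b in zip(freqs, freqs[1:])]


def excess_kurtosis(values):
    """Excess kurtosis from population moments; positive means leptokurtic."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("values have zero variance")
    return m4 / m2 ** 2 - 3.0


# Hypothetical relative frequencies: flat use with one sudden jump (a "shock").
freqs = [1.0] * 5 + [3.0] * 5
rates = log_growth_rates(freqs)

print(relative_death_rate(100))    # a verb used 100x more often: rate ratio 0.1
print(excess_kurtosis(rates) > 0)  # True: the single shock gives a heavy tail
```

Under these assumptions, a verb used 100 times more often than another regularizes at one tenth the rate, and a single exogenous jump in an otherwise flat series is enough to make the distribution of growth rates leptokurtic, which is the kind of signature the authors attribute to shocks such as wars.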