Fluctuations in Word Use from Word Birth to Word Death

A team of mathematicians and physicists, Alexander M. Petersen, Joel Tenenbaum, Shlomo Havlin, and H. Eugene Stanley, studied the “Statistical Laws Governing Fluctuations in Word Use from Word Birth to Word Death” by analysing the dynamic properties of 10^7 words recorded in English, Spanish and Hebrew over the period 1800–2008. (The paper is available on arXiv.org.)

We quote at length from the concluding Discussion:

“… words are competing actors in a system of finite resources. Just as business firms compete for market share, words demonstrate the same growth statistics because they are competing for the use of the writer/speaker and for the attention of the corresponding reader/listener. A prime example of fitness mediated evolutionary competition is the case of irregular and regular verb use in English. By analyzing the regularization rate of irregular verbs through the history of the English language, Lieberman et al. show that the irregular verbs that are used more frequently are less likely to be overcome by their regular verb counterparts. Specifically, they find that the irregular verb death rate scales as the inverse square root of the word’s relative use. A study of word diffusion across Indo-European languages shows similar frequency-dependence of word replacement rates.”
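
To make the inverse-square-root scaling concrete, here is a minimal Python sketch. It is an illustration only, not the method of Lieberman et al. or of the paper: the function name `relative_death_rate`, the `prefactor` argument and the two example frequencies are hypothetical, and only the exponent of -1/2 comes from the passage above.

```python
import math


def relative_death_rate(relative_use: float, prefactor: float = 1.0) -> float:
    """Hypothetical helper: a rate proportional to relative_use ** -0.5.

    Only the exponent (-1/2) is taken from the quoted passage; the
    prefactor is an arbitrary placeholder, so only ratios are meaningful.
    """
    if relative_use <= 0:
        raise ValueError("relative_use must be positive")
    return prefactor / math.sqrt(relative_use)


# Two made-up relative frequencies, a factor of 100 apart.
common = 1e-4
rare = 1e-6

# Ratio of the rarer verb's death rate to the more common verb's death rate.
ratio = relative_death_rate(rare) / relative_death_rate(common)
print(ratio)
```

Under this scaling, a verb used 100 times less often regularizes about 10 times faster, because the square root of 100 is 10. The arbitrary prefactor cancels in the ratio.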

“We document the case example of ‘X-ray’, which shows how categorically related words can compete in a zero-sum game. Moreover, this competition does not occur in a vacuum. Instead, the dynamics are significantly related to diffusion and technology. Lexical diffusion occurs at many scales, both within relatively small groups and across nations. The technological forces underlying word selection have changed significantly over the last 20 years. With the advent of automatic spell-checkers in the digital era, words recognized by spell-checkers receive a significant boost in their ‘reproductive fitness’ at the expense of their ‘misspelled’ or unstandardized counterparts.

We find that the dynamics are influenced by historical context, trends in global communication, and the means for standardizing that communication. Analogous to recessions and booms in a global economy, the marketplace for words waxes and wanes with a global pulse as historical events unfold. And in analogy to financial regulations meant to limit risk and market domination, standardization technologies such as the dictionary and spell checkers serve as powerful arbiters in determining the characteristic properties of word evolution. Context matters, and so we anticipate that niches in various language ecosystems (ranging from spoken word to professionally published documents to various online forms such as chats, tweets and blogs) have heterogenous selection laws that may favor a given word in one arena but not another. Moreover, the birth and death rate of words and their close associates (misspellings, synonyms, abbreviations) depend on factors endogenous to the language domain such as correlations in word use to other partner words and polysemous contexts as well as exogenous socio-technological factors and demographic aspects of the writers, such as age and social niche.

We find a pronounced peak in the fluctuations of word growth rates when a word has reached approximately 30-50 years of age. … Another important timescale in evolutionary systems is the reproduction age of the interacting gene or meme host. Interestingly, a 30-50 year timescale is roughly equal to the characteristic human generational time scale.

The impact of historical context on language dynamics is not limited to emerging languages, but extends to languages that have been active and evolving continuously for a thousand years. We find that historical episodes can drastically perturb the properties of existing languages over large time scales. Moreover, recent studies show evidence for short-timescale cascading behavior in blog trends, analogous to the aftershocks following earthquakes and the cascades of market volatility following financial news announcements. The nontrivial autocorrelations and the leptokurtic growth distributions demonstrate the significance of exogenous shocks which can result in growth rates that significantly exceed the frequencies that one would expect from non-interacting proportional growth models.

A large number of the world’s ethnic groups are separated along linguistic lines. A language barrier can isolate its speakers by serving as a screen to external events, which may further slow the rate of language evolution by stalling endogenous change. Nevertheless, we find that the distribution of word growth rates significantly broadens during times of large scale conflict…. This can be understood as manifesting from the unification of public consciousness that creates fertile breeding ground for new topics and ideas. During war, people may be more likely to have their attention drawn to global issues. Remarkably, the pronounced change during WWII was not observed for the Spanish corpus, documenting the relatively small roles that Spain and Latin American countries played in the war.”
