Into the dynamics of hot topics

"Have you heard about the subprime crisis?" A few days before I was first asked that question I did not even know what a subprime was, but within a week or so subprimes had become a major topic of news, conversation, rumors and even jokes. The 'subprime crisis' is an example of a piece of information spreading successfully through a population by word of mouth, the media and other sources. Most topics of conversation, even important ones, obviously do not become hot, so what makes the difference between a topic and a hot topic? Between information that spreads only locally and information that becomes known to each and every one of us?

Crane and Sornette take a big step toward a better understanding of 'hot topic dynamics' with a simple model.

In sum, they show that the distribution of waiting times (the time span between the cause of a behaviour and the behaviour itself) and the influence that individuals have on each other are major determinants of the propagation dynamics. Crane and Sornette are in fact able to distinguish, among the 5 million YouTube videos they analyse, the most interesting videos and, among these, those whose popularity has grown endogenously, i.e. inside YouTube itself, and those for which interest comes exogenously, i.e. from external sources (such as TV, newspapers, etc.). Even more interestingly, based on a video's dynamics they can predict which endogenous videos are going to be hot in the days to come.

Why is this important?

First, from the Cognition and Culture point of view, it is a quite successful attempt to understand the link between the behaviour of individuals and its consequences at the population scale. Crane and Sornette's model shows that the connectivity of the network, meaning the mean number of individuals you can influence in a given time, is what makes the difference between a topic and a hot topic. In their model, the distribution of waiting times is a constant parameter. Let me speculate from their results. I think they strongly suggest that what really matters for a piece of information to have large, population-scale consequences is the way it can be passed on from individual to individual. If your brother had a baby recently, you are not likely to talk about it beyond your family and close friends, and they are not going to tell their own friends and family, i.e. you cannot influence many people. By contrast, if you see a nice movie you are going to tell your family, your close friends and anyone else you know. What Crane and Sornette's results suggest is that the difference between a topic and a hot topic is not how fast you spread a piece of news (you may be quite eager to tell people about your brother's baby), nor how convinced you are that it is important (again, you may be convinced that your brother's life is news), but the number of individuals you can influence. For you to influence many individuals simply requires that everyone, from close friends to distant connections, shares your interest in the topic you are talking about.

Second, from a more practical point of view, applications of their model could be diverse and really useful. Think of the way most information is sorted on large websites like YouTube, Amazon and the like. What would you use to find the most interesting piece of information? One way would be to go to the most popular item, the latest blockbuster. This would work, but you may miss really nice works that are not so popular, and you may also get frustrated at the sight of what is popular. Crane and Sornette's model can be used to find the most interesting items within a huge quantity of similar stuff, based not on how many people have already looked at them but on the effect that such items have on other individuals. Do individuals who have watched that movie convince others to watch it? If yes, it may be great; if not, it may not be so interesting. Applications in the field of books, music, movies, blogs and the like are logical extensions of their work.

Last, thinking of the scientific domain, this model could be used to distinguish between publications that are cited a lot because of exogenous factors, such as a journal's impact factor or a scientist's reputation, and those cited because of endogenous factors, i.e. because the paper progressively attracts the attention of readers. Within the two categories, Crane and Sornette's model could also be used to tell the most important papers in a field of interest from the less important ones, even if the latter initially benefit from 'advertising'. This model could therefore be used inside a field to find the papers that have an intrinsic quality (that influence many scientists), independently of the impact factor of the journal they appeared in and of their last author's reputation.

See Riley Crane's website for a reprint and more explanations.

Riley Crane and Didier Sornette. Robust dynamic classes revealed by measuring the response function of a social system. PNAS 2008 105: 15649-15653; published online before print September 29, 2008, doi:10.1073.

For those interested, here are some details on the model and the results.

Crane and Sornette recorded the number of daily views of 5 million YouTube videos and analysed their dynamics according to a model based on two ingredients:

  • The distribution of waiting times between the cause of a future action and the action itself. For instance, the distribution of the number of hours it usually takes someone who hears about a nice video on YouTube to go to YouTube and watch it.

  • The spread of information through the influence that people have on each other and through external factors. You may hear about a nice video when browsing YouTube itself or from other sources, such as TV news.
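To make the two ingredients concrete, here is a minimal sketch of how they could combine into expected daily views. It is not Crane and Sornette's code: the power-law shape of the waiting-time distribution, the exponent value 1.4, and all names (`waiting_time_kernel`, `expected_daily_views`, `influence`) are assumptions chosen for illustration only.

```python
def waiting_time_kernel(max_delay, exponent=1.4):
    """Normalised weights for delays of 1..max_delay days (assumed power law)."""
    weights = [d ** -exponent for d in range(1, max_delay + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_daily_views(external, influence, kernel):
    """Expected views per day.

    external  -- views per day coming from outside sources (TV, news, ...)
    influence -- mean number of new views each view eventually triggers
    kernel    -- kernel[d - 1] is the share of triggered views arriving d days later
    """
    views = []
    for day in range(len(external)):
        triggered = 0.0
        # Sum the delayed contributions of all earlier days.
        for delay in range(1, min(day, len(kernel)) + 1):
            triggered += kernel[delay - 1] * views[day - delay]
        views.append(external[day] + influence * triggered)
    return views


# Hypothetical scenario: one external burst of 1000 views on day 5, nothing else.
burst = [0.0] * 60
burst[5] = 1000.0
kernel = waiting_time_kernel(max_delay=60)

weak = expected_daily_views(burst, influence=0.1, kernel=kernel)
strong = expected_daily_views(burst, influence=0.9, kernel=kernel)
print(round(sum(weak)), round(sum(strong)))
```

In this sketch the same burst generates far more follow-up views when `influence` is close to 1 than when it is small, which is the intuition behind the critical versus subcritical distinction.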

The model predicts that videos can be classified into four different classes depending on the combination of two parameters (see Figure below).

  • Endogenous subcritical: for 90% of videos, the dynamics of daily views cannot be distinguished from random processes by the model because they show no peak of activity; these videos do not become hot topics at all.

  • Exogenous subcritical: some videos experience a peak of activity because of external factors (such as advertising) but are not worth the attention. The number of their daily views decreases rapidly.

  • Endogenous critical: some YouTube videos get the attention of YouTube users, their reputation progressively grows and they become a major topic for some time.

  • Exogenous critical: some other videos have a burst of activity linked to some external factor and then spread like epidemics. After an initial burst, their activity decreases slowly.

The figure shows examples of dynamics for the four classes found. Crane and Sornette estimate θ (the decay exponent) on the post-peak dynamics only. The peak fraction F is the proportion of views contained in the peak (in black), relative to the total number of views. Reproduced from Crane and Sornette.
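As an illustration of the peak fraction F, here is a small sketch. It assumes that "the peak" means the single busiest day of the series, which the caption does not spell out; the function name and the two series are made up.

```python
def peak_fraction(daily_views):
    """Share of all views that fall on the single busiest day (assumed peak)."""
    total = sum(daily_views)
    if total == 0:
        raise ValueError("the series contains no views")
    return max(daily_views) / total


print(peak_fraction([10, 20, 900, 50, 20]))     # 900 of 1000 views -> 0.9
print(peak_fraction([90, 100, 110, 100, 100]))  # 110 of 500 views -> 0.22
```

A value close to 1 means almost all the attention was concentrated in one burst; a small value means the views were spread out over time.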

Hot topic dynamics differ quite sharply depending on whether the spread of information is linked to endogenous or exogenous causes. Here is an example of an endogenous event. In the summer of 2007 you may have heard about the forthcoming Harry Potter movie, Harry Potter and the Order of the Phoenix, through friends, news in the media or other sources, and somehow, because it was expected by many, news and conversations progressively came to revolve around Harry Potter. By contrast, in winter 2005 no one was expecting a tsunami, and suddenly it became the most important topic of conversation. This is an exogenous event. Both events are associated with a burst of activity, but they are generated by quite different mechanisms, as shown in the Figure below. Endogenous events are associated with a progressive increase before the peak and a progressive decrease afterward (the graph looks almost symmetrical). Exogenous events, on the other hand, show an abrupt increase in activity followed by a progressive decrease after the peak. Using Crane and Sornette's model, the build-up to the Harry Potter movie would have been predictable, but obviously not the tsunami.

Number of Google searches for "tsunami" and "harry potter" over the previous four years.
Example found in Crane and Sornette, figure reproduced from Google Trends.
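The difference in shape between the two kinds of events can be illustrated with a crude measure: how much of the activity outside the peak happens before it. This is a heuristic of my own for illustration, not the authors' classification method, and the two series are invented, not Google Trends data.

```python
def pre_peak_share(series):
    """Fraction of the activity outside the peak that occurs before the peak."""
    peak_index = series.index(max(series))
    before = sum(series[:peak_index])
    after = sum(series[peak_index + 1:])
    if before + after == 0:
        raise ValueError("no activity outside the peak")
    return before / (before + after)


anticipated = [1, 2, 4, 8, 16, 8, 4, 2, 1]  # builds up, then fades
unexpected = [0, 0, 0, 0, 16, 8, 4, 2, 1]   # no build-up at all

print(pre_peak_share(anticipated))  # 0.5
print(pre_peak_share(unexpected))   # 0.0
```

A share near 0.5 corresponds to the almost symmetrical endogenous shape, while a share near 0 corresponds to a sudden, exogenous burst.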

Crane and Sornette estimate the model's single parameter θ (the decay exponent) from the post-peak dynamics and show that their model, based on this single parameter, accurately recovers the four classes it predicts. Furthermore, they show that the model predicts the pre-peak dynamics for the endogenous critical class only, which is consistent with their general assumptions.
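For readers who want to try this on their own data, here is a minimal sketch of how a power-law decay exponent could be read off the post-peak views with a least-squares fit on log-log axes. The fitting procedure, the function name and the synthetic series are assumptions for illustration; the paper should be consulted for the authors' actual estimation method and for how the fitted exponent relates to their parameter.

```python
import math


def fit_decay_exponent(daily_views):
    """Least-squares slope of log(views) against log(days since peak).

    Returns the exponent p in views ~ (days since peak) ** -p,
    using only the days after the peak.
    """
    peak_index = daily_views.index(max(daily_views))
    xs, ys = [], []
    for offset, views in enumerate(daily_views[peak_index + 1:], start=1):
        if views > 0:  # the logarithm is undefined for days without views
            xs.append(math.log(offset))
            ys.append(math.log(views))
    if len(xs) < 2:
        raise ValueError("need at least two post-peak days with views")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope


# Synthetic series: a peak followed by a decay with exponent 0.6.
series = [5.0, 5.0, 2000.0] + [1000.0 * t ** -0.6 for t in range(1, 31)]
print(round(fit_decay_exponent(series), 3))
```

On real view counts the fit will be noisy, so the number of post-peak days used matters; the sketch simply uses all of them.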

1 Comment

  • guest guest 16 December 2009 (22:57)

    Hey Nicolas, I wrote a small calculator that allows one to calculate how many views a YouTube video will get based on this model. You can find it here: http://www.squidoo.com/youtube-super-star Steven