Short exploration of the rise (and fall) of hype-laden buzzwords announcing the impending domination of the world by sentient computers.
To-Do:
- Explain the irSIR model
- Cite references
The approach to prediction is to fit the (normalized) popularity data using the irSIR model. The fit is performed in a Bayesian fashion. Namely, a generative model is specified that completely describes the data generation process:
- temporal evolution of the state using ODEs, and
- subsequent addition of Poisson-like noise.
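The temporal evolution can be sketched numerically. The sketch below assumes the standard irSIR equations (uptake proportional to SI/N, abandonment proportional to IR/N, matching the terms mentioned later in this post); the parameter values and initial state are purely illustrative, not fitted values.

```python
import numpy as np
from scipy.integrate import solve_ivp

def irsir(t, y, beta, nu, N):
    """Right-hand side of the irSIR ODEs: susceptible -> infected -> recovered."""
    S, I, R = y
    dS = -beta * S * I / N                   # uptake driven by contact with users
    dI = beta * S * I / N - nu * I * R / N   # abandonment driven by contact with quitters
    dR = nu * I * R / N
    return [dS, dI, dR]

# Illustrative parameters and initial state, not fitted values.
N = 100.0
beta, nu = 0.8, 0.6
y0 = [N - 2.0, 1.0, 1.0]  # small seeds of users and quitters are required

sol = solve_ivp(irsir, (0.0, 60.0), y0, args=(beta, nu, N),
                dense_output=True, rtol=1e-8, atol=1e-8)
t = np.linspace(0.0, 60.0, 601)
S, I, R = sol.sol(t)  # I(t) rises to a single peak and then decays
```

Note that R must be seeded with a small nonzero value: with R(0) = 0 the abandonment term vanishes identically and the user population never declines.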
As the absolute scale is not known, the noise is approximated by a normal distribution whose width is proportional to the square root of the normalized popularity score. This approach has been shown to adequately describe the observed variability.
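The sqrt-scaled noise approximation can be illustrated in a few lines; the scale parameter `sigma0` below is a hypothetical illustrative value (in practice it would be a fitted model parameter).

```python
import numpy as np

rng = np.random.default_rng(42)

# On the normalized scale the absolute Poisson rate is unknown, so only the
# sqrt(mean) shape of the noise is kept, with a free scale sigma0.
sigma0 = 0.3   # illustrative, would be fitted
mu = 50.0      # model-predicted popularity at some time point

noisy = mu + rng.normal(0.0, sigma0 * np.sqrt(mu), size=10_000)
# The empirical spread of `noisy` should match sigma0 * sqrt(mu).
```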
The key principle is to propagate all sources of uncertainty into the prediction.
Coupled with weakly informative priors, this yields the posterior predictive distribution of the normalized popularity, which can then be extrapolated into the future.
The model is implemented in the Stan probabilistic programming language, which uses the No-U-Turn Sampler (NUTS), an advanced MCMC algorithm.
Of course, it goes without saying that even a Bayesian approach cannot mitigate the consequences of fitting the wrong model ;-)
Fit of Google Trends data for the Facebook and LinkedIn search keywords.
The model is fitted to data up to 2017-05-01. The remaining data will be used for ongoing validation. I am genuinely curious to see how accurate the predictions will turn out to be.
Below, a slightly different model is fitted to the Cryptocurrency search keyword. This model, which I call the FOMO/FUD model, builds upon the ideas of irSIR.
The differential equations are similar to irSIR, but the SI/N and IR/N terms are replaced with S(I/N)^2 and I(R/N)^2. The square terms approximate the perceived value of belonging to a particular sub-group as modeled by Metcalfe's law.
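The modified right-hand side can be sketched as follows; again, the parameter values and initial state are illustrative only, not fitted values.

```python
import numpy as np
from scipy.integrate import solve_ivp

def fomo_fud(t, y, beta, nu, N):
    """irSIR variant with Metcalfe-style squared group-fraction terms."""
    S, I, R = y
    dS = -beta * S * (I / N) ** 2                          # FOMO: joining ~ (user fraction)^2
    dI = beta * S * (I / N) ** 2 - nu * I * (R / N) ** 2
    dR = nu * I * (R / N) ** 2                             # FUD: leaving ~ (quitter fraction)^2
    return [dS, dI, dR]

# Illustrative parameters, not fitted values.
N = 100.0
beta, nu = 5.0, 5.0
y0 = [N - 11.0, 10.0, 1.0]

sol = solve_ivp(fomo_fud, (0.0, 100.0), y0, args=(beta, nu, N),
                dense_output=True, rtol=1e-8, atol=1e-8)
t = np.linspace(0.0, 100.0, 1001)
S, I, R = sol.sol(t)  # a single bubble: rise, peak, decay
```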
The fit to the time period from 2017 onwards is shown in the figure below. Two things can be noticed immediately. First, the minor bubbles of June and September are not described very well, which is to be expected: the model can only describe a single bubble. Second, at early times, as well as after the big peak, the lower limit of the 95% prediction interval dips below zero. This is a consequence of the variability being modeled by a normal distribution whose width is proportional to the square root of the mean value.
In order to improve the fit further, the process is repeated using data from 2017-10-01 onwards (thereby excluding the two pre-peaks), and the variability model is replaced by a log-normal distribution. The fit of this improved model on the reduced data set is shown below.
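The advantage of the log-normal noise model is that it is supported on the positive half-line, so the prediction interval can no longer dip below zero. A small sketch with illustrative values of the median and the log-scale width:

```python
import numpy as np

rng = np.random.default_rng(0)

mu = 5.0      # model prediction (median) on the popularity scale; illustrative
sigma = 0.2   # log-scale noise width; illustrative, would be fitted in practice

samples = rng.lognormal(mean=np.log(mu), sigma=sigma, size=10_000)
lo, hi = np.percentile(samples, [2.5, 97.5])
# Unlike the sqrt-scaled normal, a log-normal can never produce values
# below zero, so the 95% interval [lo, hi] stays strictly positive.
```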
I personally find it astounding how well the median prediction line matches the 31-day moving average of the Google Trends data.
One way to use the above result is to monitor the Google Trends data for a break outside the 95% prediction interval. If the above assumptions hold approximately, one would expect to see one of two things:
- future trend data staying within the 95% prediction interval (the decay will continue),
- or an upwards break-out (which would indicate that this was indeed a bubble, but overlaid on a slowly rising background adoption curve).
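Such monitoring amounts to a simple per-point interval check. The helper below is hypothetical (not part of the fitting code), just to make the idea concrete:

```python
import numpy as np

def breakouts(observed, lower, upper):
    """Boolean mask marking points that fall outside the 95% prediction band.

    `lower` and `upper` are the per-time-point interval limits; all three
    arguments are sequences of equal length. (Hypothetical helper.)
    """
    observed = np.asarray(observed)
    return (observed < np.asarray(lower)) | (observed > np.asarray(upper))

# Example: only the third observation breaks out above the band.
mask = breakouts([4.0, 3.5, 9.0], [3.0, 3.0, 3.0], [8.0, 8.0, 8.0])
```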
Let's see what the future holds :-)