Friday, April 16, 2010

IDEA's Bass Diffusion Model

In the previous post I argued that a Bass Diffusion Model fits the administrative prevalence of autism in California remarkably well, and made specific predictions based on this observation. You can think of a Bass Diffusion Model as a word-of-mouth or an adoption-of-innovation type of model.

For the sake of completeness, I will now present a couple of Bass models I derived for the administrative prevalence of autism at the US level, based on data from the Department of Education, otherwise known as IDEA data. The following is a graph of the 6-17 IDEA prevalence along with Bass model hind-casting and forecasting all the way to 2030.

Model # 2 (the red line) is the one I prefer in this case. (I'll explain why shortly.) It predicts that prevalence will eventually level off at almost 1.1%. This is completely plausible, not only because that's roughly the new consensus prevalence of ASD, but also because Minnesota is already there.

I also find it to be a fascinating prediction of the model. If you recall, a Bass model predicts a maximum prevalence of about 0.65% (at most 0.7%) for children 6 to 9 in California DDS. This absolutely makes sense. California DDS is not like IDEA. DDS does not find every autistic person to be eligible for services, and not all developmentally disabled Californians pursue eligibility with DDS. So, in my view, a Bass model makes predictions that are remarkably consistent with our current reality.

If the models are correct, by 2013 IDEA prevalence should just have surpassed 80 in 10,000. Additionally, a leveling-off trend should not be completely evident yet. It may be slightly noticeable. Meanwhile, in the California report of Q4 2013 (and let's hope they produce data equivalent to that of reports currently available) a leveling-off trend should already be evident in the 6-9 cohort.

Technical Details

For formulas and variable names, see the California post. Parameters of both models are, again, estimated by means of genetic programming. For model # 1 I simply tried to fit the 1993-2007 prevalence series without any modifications. The resulting parameters were:

p = 4.808·10⁻⁸
q = 0.22
t0 = 1938.809 (year)
m = 118.32 (per 10,000 population)

Model # 2 is based on the observation that IDEA practically did not have an autism category prior to 1993. However, once the category was introduced, many children would've been put in the category all at once. It's like introducing a product into the market that already has a number of owners. So I performed the calculation by reducing the prevalence in all report years by 3.864, which is the 1993 prevalence. Hence, t0 should be equal to 1993. The parameters actually derived by the code I wrote were:

p = 0.0072
q = 0.222
t0 = 1993.03 (year)
m = 105.992 (per 10,000 population)

Note: In this case, model results must be added to 3.864 to obtain the projected prevalence.
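To make the two parameter sets easier to check, here is a minimal Python sketch that evaluates both models. It assumes the standard cumulative form of the Bass model, F(t) = m·(1 − e^(−(p+q)(t−t0))) / (1 + (q/p)·e^(−(p+q)(t−t0))), which may be written differently in the California post. The function names (bass_cumulative, model_1, model_2) are illustrative and are not taken from the code used to derive the parameters.

```python
import math


def bass_cumulative(year, p, q, t0, m):
    """Cumulative Bass curve (assumed standard form), per 10,000 population.

    p  : coefficient of innovation
    q  : coefficient of imitation
    t0 : year in which diffusion starts
    m  : saturation level
    """
    t = year - t0
    if t <= 0:
        # Before the start year the curve contributes nothing.
        return 0.0
    decay = math.exp(-(p + q) * t)
    return m * (1.0 - decay) / (1.0 + (q / p) * decay)


# 1993 prevalence that was subtracted before fitting model # 2.
OFFSET_1993 = 3.864


def model_1(year):
    """Model # 1: fitted to the unmodified 1993-2007 series."""
    return bass_cumulative(year, p=4.808e-8, q=0.22, t0=1938.809, m=118.32)


def model_2(year):
    """Model # 2: Bass curve starting in 1993, plus the 1993 offset."""
    return OFFSET_1993 + bass_cumulative(
        year, p=0.0072, q=0.222, t0=1993.03, m=105.992
    )


if __name__ == "__main__":
    for year in (1993, 2000, 2007, 2013, 2020, 2030):
        print(f"{year}: model 1 = {model_1(year):7.2f}, "
              f"model 2 = {model_2(year):7.2f} (per 10,000)")
    # Long-run ceiling of model # 2: m plus the 1993 offset.
    print(f"model 2 ceiling: {105.992 + OFFSET_1993:.3f} per 10,000")
```

Under this assumed form, the ceiling of model # 2 is m + 3.864 = 109.856 per 10,000, which is the "almost 1.1%" figure quoted above.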

The rationale of the derivation of Model # 2 makes sense to me, and that's why I prefer it. However, there's not a huge difference between the models.

Addendum (4/16/2010)

I forgot to mention that the correlation coefficient R for both models was approximately the same: 0.99993. This is exceedingly good, and better than the fit for CalDDS.


  1. But I could have told you that, this is nothing more than the confirmation of the obvious to the social scientist.

    Who invented the use of statistics to support scientific conclusions? It wasn't Pasteur (with his suppression of equivocal results), was it?

    No, it was the sociologists like Comte and Durkheim, so there you have it.

  2. "But I could have told you that, this is nothing more than the confirmation of the obvious to the social scientist."

    Not really, Larry. In that case, I could've told me this too. There's a difference between hypothesizing about mechanisms (and we've done that many times before), and making actual projections, or describing mathematically how things probably work.

    And it's true that sociologists would be closest to figuring this sort of thing out (Liu et al. 2010 are clearly on the right track) but they haven't been involved as much, have they?

  3. It has been a couple of years since I analysed age distributions in Aspie-quiz, along with genealogy data. The long-term trend is clear, and there is a natural amount of Aspie genes in the population that has remained rather constant at least back to the 1800s. I also calculated the final prevalence based on California data, but I can't remember the exact figures.

    There is also a (less clear) trend of a reduction of the natural rate of Aspie traits by 1% per year.

    I'll recollect the data and put it into a blog-entry.