Thursday, November 06, 2008

Is Precipitation Associated with Autism? Now I'm Quite Sure It's Not.

In the last post I attempted to confirm if there was a naive ecological state-level association between precipitation and IDEA autism prevalence. To my surprise, there wasn't, and there was no need to control for urbanicity.

Technically, what the result means is that, considering just this one analysis, we can't reject the null hypothesis. Of course, one could argue that state-level data is poor. The confidence interval is too wide, and a real effect could easily hide in it. (In part this is what "not being able to prove a negative" means.)

So I couldn't leave it at that. I wanted to confirm it in some other way. I remembered I had birth-year caseload data from California DDS dating back to 1920 (contiguous since 1930) that David Kirby had originally requested, and a copy of which I had obtained in order to rebut one of his posts. This is data from a file called AUT_200703.xls contained in, which may be requested from California DDS. Corresponding precipitation data is not difficult to obtain.

The year range I will use is 1930 to 2000. (I'm left-censoring autism caseload starting at 2000). For precipitation we have to assume some sort of a lag. I will use precipitation at 1 year of age. The autism and precipitation time series follow.
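To make the lag concrete, here is a minimal Python sketch of how each birth-year caseload could be paired with the precipitation recorded one year later. The function name `align_with_lag` and all the numbers are made up for illustration; they are not values from AUT_200703.xls or from any precipitation record.

```python
def align_with_lag(caseload_by_birth_year, precip_by_year, lag_years=1):
    """Pair each birth-year caseload with precipitation `lag_years` later.

    Birth years with no matching precipitation year are skipped.
    """
    pairs = []
    for birth_year in sorted(caseload_by_birth_year):
        precip_year = birth_year + lag_years
        if precip_year in precip_by_year:
            pairs.append((birth_year,
                          caseload_by_birth_year[birth_year],
                          precip_by_year[precip_year]))
    return pairs


# Hypothetical numbers, for illustration only.
caseload = {1930: 5, 1931: 7, 1932: 6}
precip_inches = {1931: 18.2, 1932: 25.0, 1933: 14.1, 1934: 20.0}

for birth_year, cases, rain in align_with_lag(caseload, precip_inches):
    print(birth_year, cases, rain)
```

Changing `lag_years` is an easy way to try other assumptions, such as precipitation in the birth year itself.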

The time series in themselves don't look very promising, do they? But I wanted to apply some math to them in order to confirm if there's at least a trend, even if not a statistically significant one.

Whenever you compare two time series, there's always a possibility that you'll end up with a pirates vs. global-warming type of association. There are different ways to control for this. One that I particularly like is called detrended cross-correlation analysis (Podobnik & Stanley, 2008). Basically, you remove the trends from the series, and then compare them. The reason I like this technique is that it's intuitive, can be illustrated graphically, and is easy for anyone with passing knowledge of Excel formula syntax to reproduce.

Now, one problem is that there is no single thing we can call the trend of a time series. There are many different ways to model trends. What we should ideally do is try several different types of trends, e.g. linear, quadratic, and cubic. For simplicity I will skip the linear and quadratic trends (they don't look adequate) and use cubic trend lines, which you can see in the graph above.
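For anyone who would rather script the cubic detrending than do it in a spreadsheet, here is a minimal sketch in plain Python. It is an illustration under my own assumptions, not the calculation behind the graphs in this post: it fits a least-squares cubic through the normal equations and subtracts it. The helper names (`solve_linear_system`, `cubic_detrend`) and the example series are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def cubic_detrend(years, values):
    """Return the residuals left after subtracting a least-squares cubic trend."""
    mid = (max(years) + min(years)) / 2
    half = (max(years) - min(years)) / 2
    # Rescale years to [-1, 1] so that powers up to t**6 stay well behaved.
    t = [(y - mid) / half for y in years]
    powers = [[ti ** k for k in range(4)] for ti in t]
    # Normal equations (A^T A) coef = A^T v for the basis 1, t, t^2, t^3.
    ata = [[sum(p[i] * p[j] for p in powers) for j in range(4)] for i in range(4)]
    atv = [sum(p[i] * v for p, v in zip(powers, values)) for i in range(4)]
    coef = solve_linear_system(ata, atv)
    trend = [sum(c * pk for c, pk in zip(coef, p)) for p in powers]
    return [v - tr for v, tr in zip(values, trend)]


# Hypothetical series: a cubic-shaped rise plus a small alternating wiggle.
years = list(range(1930, 2001))
series = [0.001 * (y - 1960) ** 3 + 2 * (y - 1960) + 5 + (1 if y % 2 else -1)
          for y in years]
residuals = cubic_detrend(years, series)
print(min(residuals), max(residuals))
```

The same function would be applied separately to the autism series and to the precipitation series, and the two sets of residuals are what get compared.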

The following graph represents the cubic detrending of the original time series.

At this point we can just put the detrended data points in a scatter chart and see if there's an association.

This is the kind of scatter you'd expect to see if you compare two completely independent random variables. That is, you see a random distribution of dots and a linear regression slope that is almost completely horizontal.

Of course, we're still left with the problem of not being able to prove a negative. The slope of the linear regression is 0.11 (0.11 more California autistics for every extra inch of rain in a year) with a 95% confidence interval of -3.896 to 4.133.
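A slope and its confidence interval of this kind can be sketched in a few lines of Python. This is a rough illustration, not a reproduction of the figures above: the function name `slope_with_ci` and the sample data are hypothetical, and the interval uses a normal approximation rather than the exact t distribution, so with about 70 points it comes out slightly narrower than the textbook interval.

```python
from math import sqrt
from statistics import NormalDist, mean


def slope_with_ci(x, y, level=0.95):
    """Ordinary least-squares slope of y on x with an approximate CI.

    Assumption: the critical value comes from the normal distribution,
    which is a large-sample stand-in for the exact t value.
    """
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares around the fitted line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    std_err = sqrt(sse / (n - 2) / sxx)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return slope, slope - z * std_err, slope + z * std_err


# Hypothetical detrended values, for illustration only.
rain_residuals = [1, 2, 3, 4, 5]
autism_residuals = [2, 4, 5, 4, 5]
slope, low, high = slope_with_ci(rain_residuals, autism_residuals)
print(slope, low, high)
```

If the interval comfortably straddles zero, as it does in the result reported above, the data can't distinguish a small positive effect from a small negative one or from none at all.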

But I think the scatter graph is compelling. What we see in it is entirely consistent with a complete lack of association between autism and precipitation.


  1. Very cool... and very much what I was thinking the actual data would show.

  2. I suspect correlation between autism and the increase in the use of mobile phones can easily be found; and the use of the internet; And the increase in the number of Tesco superstores in the UK...

  3. @Socrates: Absolutely. But a couple observations about that:

    (1) An association with the internet is quite plausible. I don't think it's a coincidence that autism diagnoses increased significantly in the 1990s, which is when the internet caught on. The internet drives a lot of awareness I'm sure.

    (2) A detrended cross-correlation analysis is one technique that can be used to deal with this type of pirates vs. global warming association. Of course, it can still be confounded by more than just coincidence.

  4. The Internet would be an interesting variable to study. Though I am not quite sure how one would do the study. Though one idea is to track Usenet postings from about 1995 to 2005. Google has a fairly complete archive that covers that time period, and includes some newer groups.

    Going on my own very poor anecdotal experience, I know it would never be complete. When we got our first modem I stuck to the ISP forums (Compuserve*), and then an email listserv pertaining to my son's disability. There was a definite shift about the time of the turn of the century (Year 2000 crisis!), where there was a change in attitude and awareness (I remember it was when I found as a link on Quackwatch when I was looking into some questionable treatments proposed on the listserv).

    In the last five years there has been a shift towards blogs and forums, with some Yahoo and Google groups. So I believe that searching the might be a valid tactic.

    *Note: An observation of awareness, one of the first things I did when I found the Compuserve Disabilities forum was to ask about my son's disability "dyspraxia". I got several replies, but they were all on "dyslexia"! sigh

    (By the way, it turned out the ADD forum on Compuserve had a "Dyspraxia" section... and no, I do not know why. But it was where I found out about the Apraxia-Kids listserv.)

  5. "Though one idea is to track Usenet postings from about 1995 to 2005."

    I was thinking exactly that. You could count the number of references to the word "California", for example.

    Usenet is not necessarily representative of internet usage, especially in recent times, but that's where detrending would help.

    Birth year data would not be very useful for such an analysis, though. Autistics born in a given year might have been diagnosed much later. I think it would be necessary to calculate prevalence of autistic 3 year olds in a given year.

  6. Another idea is to compare sugar and fat (should distinguish between O3, O6 and transfats to be useful) consumption respectively (the cholesterol scare and fatless foods fad).

    Notice though that the environmental conditions during your wife's mother's pregnancy may affect your child's health. This is because females get the full complement of eggs before birth.

    I am not sure about the male physiology.

    (Rainfall is not an easy variable as indoor living may confound the issue. I assume you are interested in the vitamin D hypothesis as you studied rainfall patterns.)