Error in Scientist Mom's Vaccine & Autism Data Analysis

Back in September there was some noise about a post by someone I'll call "Scientist Mom" (apparently she doesn't use a pseudonym at all) titled The Correlation that Does Indicate Causation. I didn't want to even read the post back then because I had a feeling I would become involved in analyzing the data and spend way too much time that I was supposed to spend doing something else. The obvious critique of such an analysis, without knowing much about it, is that it was a pirates vs. global-warming type of correlation. Orac slammed Scientist Mom for it, and rightly so.

In my last post on the (lack of) association between rainfall and autism I had used birth-year data from California. I thought a natural extension of that work was to apply a detrended cross-correlation analysis to the caseload data and Scientist Mom's vaccine data.

Well, to my disappointment, the post doesn't provide any usable vaccine data. It's more of a qualitative analysis, where Scientist Mom just lists vaccines that were recommended during different time periods.

I noticed a significant error in the analysis, however. It has to do with Scientist Mom's key claim:

Most compelling of all, there was no increase in the percentage of autism cases in 2002-2004, when no vaccines were added to the childhood schedule.

I wonder if the error is obvious to some of my readers. If I mention "left censorship" as a hint, do you see the problem now? What if I mention that in my last post I decided to left-censor California birth-year autism caseload such that I only used data up to 2000?

You see, series of autism prevalence by birth year always have a hook shape on the right-hand side of the graph. It doesn't matter if I survey the prevalence in 2004 or 1994. They always do. The following is an IDEA graph representing prevalence by birth year, as reported in 2001, 2002 and 2003.

Not only is there a natural decline in prevalence by birth year because some autistics are diagnosed late; it's also the case that prevalence by birth year data is not fixed in time. If we request new data from California DDS next year, the data potentially changes in all birth years, and it likely changes considerably in recent birth years.

This is a common mistake. Mark Blaxill has fallen for it. The Geiers have as well, assuming they didn't know what they were doing.

One way to solve the issue is to left-censor the data. Basically, you only consider the birth year data that is more likely to remain stable in the future.

I believe a much better way to solve the issue (although this is not always feasible) is to use data on prevalence by a given age in a given year, e.g. prevalence of 3 year olds in the system in a given year. This data shouldn't change with time. This is the type of data that was used when there was a debate over the expected decline in the California DDS 3-5 caseload. And as you may recall, the 3-5 prevalence continued to increase.

Since California DDS provides birth year data as reported in different years, we can estimate the caseload of autistic 3 year olds from, say, 2000 to 2007. You basically look at each of the 32 files (8 years times 4 quarters) for the years we're interested in, and get the birth-year caseload of the report year minus 3. The resulting graph follows.

This is an approximation, of course. Consider that on 03/2002, the number of children born in 1999 will not be as many as you'll have in 12/2002. Hence the seesaw pattern.
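The lookup just described can be sketched in a few lines of Python. The data layout is an assumption made for illustration (a dict keyed by report year and quarter, each entry holding birth-year counts); the actual DDS files are spreadsheets and would have to be parsed first. The numbers are made up.

```python
def three_year_old_caseload(reports, age=3):
    """For each quarterly report, pick the caseload of the birth year
    equal to the report year minus `age`.

    `reports` maps (report_year, quarter) -> {birth_year: caseload}.
    This layout is hypothetical; it is not the format of the DDS files.
    """
    series = []
    for (year, quarter) in sorted(reports):
        by_birth_year = reports[(year, quarter)]
        birth_year = year - age
        if birth_year in by_birth_year:
            series.append(((year, quarter), by_birth_year[birth_year]))
    return series


# Made-up numbers, only to show the shape of the calculation.
example_reports = {
    (2002, 1): {1998: 410, 1999: 150},
    (2002, 4): {1998: 430, 1999: 260},
    (2003, 1): {1999: 300, 2000: 160},
}

for (year, quarter), count in three_year_old_caseload(example_reports):
    print(f"{year} Q{quarter}: {count}")
```

With these toy numbers the count rises within 2002 and then drops at the first quarter of 2003, when the lookup switches to a younger birth year. That is the seesaw.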

The point is that Scientist Mom is mistaken in her finding that the prevalence of autism dropped or was stable after 2002. This completely undermines her analysis, since that was her key claim.

Is Precipitation Associated with Autism? Now I'm Quite Sure It's Not.

In the last post I attempted to confirm if there was a naive ecological state-level association between precipitation and IDEA autism prevalence. To my surprise, there wasn't, and there was no need to control for urbanicity.

Technically what the result means is that, just considering this one analysis, we can't reject the null hypothesis. Of course, one could argue that state-level data is poor. The confidence interval is too big, and a real effect could easily hide in it. (In part this is what "not being able to prove a negative" means).

So I couldn't leave it at that. I wanted to confirm it in some other way. I remembered I had birth-year caseload data from California DDS dating back to 1920 (contiguous since 1930) that David Kirby had originally requested, and a copy of which I had obtained in order to rebut one of his posts. This is data from a file called AUT_200703.xls contained in Job5028.zip, which may be requested from California DDS. Corresponding precipitation data is not difficult to obtain.

The year range I will use is 1930 to 2000. (I'm left-censoring autism caseload starting at 2000). For precipitation we have to assume some sort of a lag. I will use precipitation at 1 year of age. The autism and precipitation time series follow.

[Figure: autism caseload and precipitation time series, California]

The time series in themselves don't look very promising, do they? But I wanted to apply some math to them in order to confirm if there's at least a trend, even if not a statistically significant one.

Whenever you compare two time series, there's always a possibility that you'll end up with a pirates vs. global-warming type of association. There are different ways to control for this. One that I particularly like is called detrended cross-correlation analysis (Podobnik & Stanley, 2008). Basically, you remove the trends from the series, and then compare them. The reason I like this technique is that it's intuitive, can be illustrated graphically, and is easy for anyone with passing knowledge of Excel formula syntax to reproduce.

Now, one problem is that there isn't something we can call the trend of the time series. There are many different ways to model trends. What we should ideally do is try many different types of trends, e.g. linear, quadratic, and cubic. For simplicity I will skip the linear and quadratic trends (they don't look adequate) and use cubic trend lines, which you can see in the graph above.

The following graph represents the cubic detrending of the original time series.
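For readers who prefer code to Excel, here is a minimal sketch of the cubic detrending step, assuming NumPy is available. It implements the simplified version described in this post (one global cubic trend per series, then a comparison of the residuals), not the full windowed procedure in Podobnik & Stanley. The function name and the synthetic series are mine and purely illustrative; this is not the California data.

```python
import numpy as np


def cubic_residuals(years, values, degree=3):
    """Fit a polynomial trend by least squares and return observed minus fitted."""
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    # Centre the years so the polynomial fit is numerically well behaved.
    x = years - years.mean()
    coeffs = np.polyfit(x, values, degree)
    return values - np.polyval(coeffs, x)


# Synthetic series (not real data): two upward cubic trends plus
# independent noise.
rng = np.random.default_rng(0)
years = np.arange(1930, 2001)
t = years - 1930
series_a = 0.002 * t**3 + rng.normal(0, 20, size=t.size)
series_b = 20 + 0.0001 * t**3 + rng.normal(0, 5, size=t.size)

res_a = cubic_residuals(years, series_a)
res_b = cubic_residuals(years, series_b)

# Correlation before and after detrending. The raw series share an
# upward trend by construction; the residuals do not.
print(np.corrcoef(series_a, series_b)[0, 1])
print(np.corrcoef(res_a, res_b)[0, 1])
```

Changing `degree` to 1 or 2 gives the linear and quadratic alternatives mentioned earlier.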

At this point we can just put the detrended data points in a scatter chart and see if there's an association.

This is the kind of scatter you'd expect to see if you compare two completely independent random variables. That is, you see a random distribution of dots and a linear regression slope that is almost completely horizontal.

Of course, we're still left with the problem of not being able to prove a negative. The slope of the linear regression is 0.11 (0.11 more California autistics for every extra inch of rain in a year) with 95% confidence interval of -3.896 to 4.133.
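The slope and its confidence interval can be reproduced outside of Excel as well. The following is a sketch using only the Python standard library. Note one assumption: it uses the normal critical value (about 1.96 for 95%) rather than the Student t value, so the interval comes out slightly too narrow for small samples. The input values are made up.

```python
import math
from statistics import NormalDist


def slope_with_ci(x, y, level=0.95):
    """Ordinary least-squares slope with an approximate confidence interval.

    Uses a normal critical value instead of the Student t value, which
    is an approximation that is slightly too narrow for small samples.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with n - 2 degrees of freedom.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return slope, slope - z * std_err, slope + z * std_err


# Made-up detrended values, only to exercise the function.
x = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [1.0, -1.5, 0.5, -0.5, 1.5, -1.0]
slope, low, high = slope_with_ci(x, y)
print(f"slope = {slope:.3f}, 95% CI {low:.3f} to {high:.3f}")
```

An interval that straddles zero, as in this toy case, is what "can't reject the null hypothesis" looks like in practice.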

But I think the scatter graph is compelling. What we see in it is entirely consistent with a complete lack of association between autism and precipitation.

Is Precipitation Associated with Autism? Apparently Not.

A while back I wrote a critique of the TV hypothesis by Waldman et al. I noted the likely confound is population density, which should not be considered a "fixed effect" in Waldman's methodology (an interesting statistical methodology that is apparently used in Economics frequently). When we talk about population density as a confound, we're really using it as a proxy of other confounds that are clearly not fixed in time. These more specific confounds could be things like awareness, availability of autism specialists, etc.

In general, studies like Waldman's and Palmer's likely suffer from the fundamentally incorrect assumption that regional differences in the administrative prevalence of autism reflect a real difference in actual prevalence. But I do believe it is possible to use administrative data to draw preliminary conclusions, so long as confounding factors are accounted for.

My intention in writing this post was to walk through an analysis of publicly available data, controlling for population density, to see if the rainfall effect remained. I fully expected there to be a naive ecological association between precipitation and autism. To my surprise, the effect didn't appear to exist in the first place at the US level, and there was no need to control for confounding.

The following is a scatter graph of annual precipitation by state (1971-2000) vs. the 3-5 IDEA prevalence of autism (estimated for 2006).

[Figure: scatter graph of autism prevalence vs. precipitation by state]

There's not even a trend in the expected direction. This is quite the head-scratcher, and it left me wondering what was going on. Why is it unexpected? Let's first look at a population density map of the US.

[Figure: population density map of the United States]

It would be reasonable to expect that counties with a higher concentration of people will have higher rates of autism diagnoses, due to increased awareness and a greater availability of autism specialists. Let's now look at a map of precipitation in the US.

[Figure: precipitation map of the United States]

The correlation between precipitation and population density is quite clear, isn't it? Why didn't we see an association trend in the expected direction in the scatter graph then? First, it seems that a few states bring the slope down. These would be states with a low autism prevalence but high precipitation rates, like Louisiana, Alabama and Mississippi.

That's a bit of bad luck for Waldman et al. Additionally, we don't have that many data points. There's unfortunately too much variability in this US-level data, which makes it pretty inadequate. Perhaps using 6-11 IDEA prevalence would be better than 3-5 prevalence. In any case, it's doubtful statistical significance would be achieved, and even if it were, it is doubtful it could withstand controlling for population density.

I think the association needs to be revisited in a different way. But this exercise left me wondering why Waldman et al. decided to only look at counties from certain states, namely, California, Oregon and Washington (with California not showing a clear association).

I'm going to suggest cherry-picking might have occurred when it comes to Oregon and Washington. In order to argue this point, I will simply post population density and precipitation maps of each of these states. You will see that the pattern in these two states is fairly unusual. Most people live on the west side of the state, and that's also where it rains.

[Figure: Oregon population density map]

[Figure: Oregon precipitation map]

[Figure: Washington state population density map]

To summarize: (1) It was not easy to confirm the reported association. (2) Analysis of any such associations should account for population density. (3) Cherry-picking might have occurred in this particular case.

Are the children of first-generation immigrants more likely to be autistic?

I've known about this for about a year, but I never got around to writing about it. I was reminded of it by a recent David Kirby post where he informs us that "an unusually large proportion of Somali-speaking children in Minnesota have autism." I suppose David Kirby is trying to make this about vaccines. The thing is that a link between autism and immigration has been suspected since at least the 1970s, even though it apparently never became an area of research that interested many investigators. Let me just quote from some abstracts, in chronological order.

Harper & Williams (1976): "In a survey on the occurrence of infantile autism in New South Wales it was found that 21.9% of children had at least one foreign-born parent whose native language was not English."

Gillberg et al. (1987): "Urban children with autism more often than age-matched children in the general population had immigrant parents from 'exotic' countries."

Gillberg et al. (1995): "The prevalence for autistic disorder in Göteborg children born to mothers who were born in Uganda was 15% which is almost 200 times higher than in the general population of children."

Gillberg & Gillberg (1996): "Fifteen of these children (27%) were born to parents, at least one of whom had migrated to Sweden."

Bernard-Opitz et al. (2001): "Discussion focuses on possible risk factors and psychosocial adversities for autism such as a high frequency of caregivers who are foreign maids, the use of multiple languages and the high level of punitive educational practices."

Lauritsen et al. (2005): "An increased relative risk of 1.4 was found if the mother was born outside Europe, and in children of parents who were born in different countries."

Maimburg and Vaeth (2006): "The risk of infantile autism was increased for mothers aged >35 years, with foreign citizenship, and mothers who used medicine during pregnancy."

Kolevzon et al. (2007): "The parental characteristics associated with an increased risk of autism and autism spectrum disorders included advanced maternal age, advanced paternal age, and maternal place of birth outside Europe or North America."

There are reports along these lines from Canada as well. In fact, this was discussed in Interverbal's discussion of critiques of Fombonne et al. (2006).

Now, the first thing we need to ask ourselves about these findings is whether they document an actual phenomenon or an artifact. Is there a confound that explains the apparent association?

But if in fact there's an association, what explains it? We don't really know. You will find some unsubstantiated speculation based on old ideas in the cited abstracts, some speculation that I'm sure many readers will find objectionable. It's not surprising to find these types of explanations in old papers, though. If I may engage in some speculation of my own, based on newer ideas, I would say that maternal stress during gestation – see Kinney et al. (2008) – cannot be discounted.

Finally, I'd like to bring attention to Roberts et al. (2007), a study claiming to associate autism with proximity to agricultural pesticide applications in the California Central Valley. The authors stated that they could not dismiss the possibility that the women studied may be disproportionately employed in agriculture. It just so happens that immigrant women also tend to be disproportionately employed in agriculture.

Anthropogenic Global Warming is Absolutely Occurring

I need to ask for the reader's indulgence, as this post is not about autism, except insofar as determining the merit of correlations has become a perseveration of mine. You see, it is trivial to come up with naive correlations of autism trends vs. practically anything about the modern world. The administrative prevalence of autism has been increasing almost continuously since records have been kept. Concurrent upward trends of nearly anything, from vaccines to environmental pollution, from trans fats to electromagnetic radiation, and so on, are easy to come by.

In my latest post at LB/RB I suggested that instead of correlating trends in a naive manner, we could attempt to correlate the residuals of time regression models of each trend. A residual is a delta or difference between an observed value and a modeled value. (Here's a concise explanation).

When modeling real world phenomena, regression models will never (or almost never) be perfect fits. For all sorts of reasons, even if simply random fluctuation, there will be deviations from a modeled trend. If there's a causative relationship between two trends, the residuals of (or deviations from) corresponding close-fitting regression models should correlate with one another as well. By this I don't mean that the residuals should always be in the same direction; but they should be in the same direction more often than not, on average.
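To make the "same direction more often than not" idea concrete, here is a small Python sketch that counts how often two sets of residuals deviate from their trends in the same direction. The function name and the numbers are hypothetical. This is a crude sanity check, not the regression used later in the post; for unrelated, roughly symmetric residuals the share should hover around one half.

```python
def same_direction_share(residuals_a, residuals_b):
    """Share of paired residuals that deviate from their trends in the same direction.

    Pairs where either residual is exactly zero are ignored.
    """
    pairs = [(a, b) for a, b in zip(residuals_a, residuals_b) if a != 0 and b != 0]
    if not pairs:
        raise ValueError("no non-zero residual pairs to compare")
    agree = sum(1 for a, b in pairs if (a > 0) == (b > 0))
    return agree / len(pairs)


# Made-up residuals for illustration.
res_a = [0.4, -0.2, 0.1, -0.5, 0.3]
res_b = [0.1, -0.3, -0.2, -0.1, 0.6]
print(same_direction_share(res_a, res_b))  # 4 of 5 pairs agree -> 0.8
```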

The nice thing about this technique is that it is completely accessible to anyone with Excel installed. It can also be illustrated graphically, as the reader will see.

So it occurred to me to test this idea in a different field of science where there's controversy over correlation vs. causation. I thought global warming would be a great candidate. After all, the spoof about a decrease in the number of pirates correlating with many other arbitrary trends appears to originate in the global warming debate (see this).

To summarize what I found, there is a strong and statistically significant correlation between cumulative human CO2 emissions and northern hemisphere temperature anomalies. Because of the methodology used, I'm quite confident this cannot be explained by coincidence, data collection errors, solar output as a confound, or causation in the opposite direction.

Now, I fully recognize that I'm only superficially familiar with the debate over anthropogenic global warming. I am also not versed in climatology. Therefore, I cannot be entirely sure that this type of analysis hasn't been done before. Google and Google Scholar searches didn't seem to turn up anything, and given the importance of the topic, I thought it was not only prudent but necessary to put this evidence out there. As always, scrutiny and discussion are welcome.

Northern hemisphere temperature data from 1850 to 2004 was obtained from the Climatic Research Unit of the University of East Anglia, UK.

Global CO2 emission data was obtained from CDIAC. I did not use CO2 atmospheric concentration data because temperature increases can theoretically cause this concentration to increase. Human emissions are what we're interested in. More specifically, I calculated cumulative CO2 emissions for every year since 1850. Greenhouse temperature anomalies are presumably caused by the total amount of CO2 in the atmosphere, not by the emissions in any given year. Since CO2 stays in the atmosphere for 50 to 200 years (source) modeling the cumulative human contribution of CO2 should be adequate enough.
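The cumulative series is just a running total of the annual figures. A minimal Python sketch follows, with made-up annual values in arbitrary units (these are not the CDIAC numbers). Note that it models no removal of CO2 from the atmosphere.

```python
from itertools import accumulate

# Hypothetical annual emissions, in arbitrary units, for consecutive years.
start_year = 1850
annual_emissions = [54, 54, 57, 59, 69]

# Running total: each entry is the sum of all emissions up to that year.
cumulative = list(accumulate(annual_emissions))
for offset, total in enumerate(cumulative):
    print(start_year + offset, total)
```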

Figure 1 is a graph of the general time trends of these two sets of data. It also shows the modeled trend lines we will use to calculate residuals. In this analysis we're using third-order polynomial models. They seem to give a considerably closer fit than second-order polynomial models.

I calculated the residuals and built a scatter graph matching cumulative CO2 (X axis) and temperature (Y axis) residuals for each year from 1850 to 2004. As expected, the slope of a linear regression of the scatter was positive (1.9x10^-5) and statistically significant (95% confidence interval 1.13x10^-5 to 2.66x10^-5).

[Note: Instructions on how to calculate the slope confidence interval of a linear regression with Excel can be found here.]

I suspected, however, that there should be lag between cumulative CO2 fluctuations and temperature fluctuations. It presumably takes some time for heat to be trapped. I proceeded to create a moving average trend line of the temperature residuals. It did in fact have a similar shape to the cumulative CO2 residuals graph, but it appeared to lag it by about 10 years. The reader should be able to roughly see this lag in Figure 1.

So I re-ran the whole analysis by only considering the years 1850 to 1997 and correlating CO2 residuals with residuals of temperature 10 years later. The correlation between these two sets of data is remarkable. Let's start with a bar graph of both sets of residuals, Figure 2.
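Pairing each year's residual with the other series' residual some years later is a simple lookup. Here is a Python sketch; the function and variable names are hypothetical, and the toy data is built so that the response simply repeats the driver two years later.

```python
def pair_with_lag(years, driver, response, lag=10):
    """Pair driver[year] with response[year + lag] wherever both exist."""
    response_by_year = dict(zip(years, response))
    pairs = []
    for year, value in zip(years, driver):
        later = response_by_year.get(year + lag)
        if later is not None:
            pairs.append((year, value, later))
    return pairs


# Toy data: the response repeats the driver two years later.
years = [2000, 2001, 2002, 2003, 2004]
driver = [1.0, -1.0, 2.0, 0.0, 0.5]
response = [9.0, 9.0, 1.0, -1.0, 2.0]

for year, d, r in pair_with_lag(years, driver, response, lag=2):
    print(year, d, r)
```

The last `lag` years of the driver series have no partner and drop out, which is why a lagged analysis covers a shorter year range than the unlagged one.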

Figure 2 is a good graph to get a subjective sense of the correlation. Let's see if the math confirms this. Figure 3 is the scatter graph of the residuals.

The slope of a linear regression of the scatter is 2.6x10^-5, and it is statistically significant (95% confidence interval 1.88x10^-5 to 3.33x10^-5). Even the 99.99999999% confidence interval is entirely positive. Unless anthropogenic global warming is a reality, there is no apparent reason why the residuals of cumulative human CO2 emissions should correlate so well with the residuals of temperature 10 years later throughout the last 150 years.

The slope of the scatter is actually steeper than expected, if you consider the naive correlation between cumulative CO2 emissions and temperature. There are probably several reasons for this. The one I believe to be the most likely is that over time CO2 does get removed from the atmosphere. Adding this consideration to the analysis should produce a more accurate slope. The other potential reasons don't bode so well for our species.

Critique of Palmer et al. (2008)

I have posted a critique of Palmer et al. (2008) over at LB/RB. The paper claims to associate autism with coal-fired power-plant emissions, particularly mercury. I argue that the control for urbanicity in the paper is limited. Then I illustrate how it is that population density mediates the correlation in California.

Change in Comment Policy

This doesn't mean that this blog is necessarily becoming active again, but I've decided I need to change my comment policy, as you can see in the blog's description above. The new policy is basically the same lenient policy I've had before specifically designed to encourage critical comments and rebuttals. In general, comments are not deleted unless they clearly violate Blogger's content policy. Messages that violate the policy include things like spam, threats of violence or death, pornography and so forth. In addition to this, from now on I will delete any comments from the following persons:
  • John Best Jr. (AKA Fore Sam)

As everyone knows, John does not contribute to any discussion in any productive way, and simply disrupts comment threads. Furthermore, recently he has made unacceptable personal accusations which he cannot support. For this reason, he is the first commenter to be permanently banned from this blog.

Pleasantly Surprised by David Kirby and Dan Olmsted

Orac over at Respectful Insolence has received a response to his open letter to David Kirby and Dan Olmsted regarding the Seidel Subpoena. Their response reads as follows.
We both take this matter very seriously, and strongly oppose any effort to subpoena the records of Ms. Kathleen Seidel. We have also clearly expressed our feelings to Mr. Shoemaker. While we may not agree with her opinions, we consider Ms. Seidel to be a colleague. Rights to privacy, and to free speech as guaranteed by the First Amendment, must be upheld for all. We urge Mr. Shoemaker to reconsider, and drop this action against Ms. Seidel.

David Kirby
Dan Olmsted

I have to admit that was very big of both of them. They did not try to be apologetic. No "yes - but" or anything like that. For this I applaud them.

I had previously said the letter would be a test of the ethical standards of both journalists. It was. And they passed. I had my doubts that they had it in them. I apologize for any suggestion on my part to that effect.

Conflicts of Interest Disclosure

As everyone knows, a couple of worthless pieces of shit from the mercury militia have gone after Kathleen and another blogger by means of legal bullying, with the presumed intention to not only silence two important voices of dissent but also to chill much of the neurodiversity and skeptical blogosphere.

After Kathleen was subpoenaed, she filed a top-notch motion to quash, where she happened to disclose the sort of revenue that is generated by running the site neurodiversity.com (incidentally, apparently less than its maintenance costs). I see that at least D'oC, Kassiane and Liz Ditz have also disclosed their conflicts of interest.

Earlier today Orac over at Respectful Insolence posted An open letter to David Kirby and Dan Olmsted about the Kathleen Seidel subpoena essentially asking Kirby and Olmsted to issue a position statement on what is now being referred to as the "Seidel Subpoena." Evidently, the letter will end up testing the ethical standards of these two autism journalists – as they are often thought of.

In the comments section of Orac's post, the question came up as to David Kirby's and Dan Olmsted's source of income. I think this is a legitimate question. After all, they do not appear to be employed by any major media. They are not parents of autistic children. They are not autistic. So are we to assume they write about autism because of their love of journalism? It doesn't seem plausible to me, and unlike Shoemaker's apparent suspicions regarding Kathleen, it's not necessary to invoke major conspiracy theories to conclude that Kirby and Olmsted must be paid by an organization with significant funds. I believe it would be of interest to find out whether, as some have suggested, David Kirby and/or Dan Olmsted are receiving money from parties with a stake in vaccine litigation.

Of course, neither Kirby nor Olmsted has an obligation to disclose this. No one is going to drag them to court to force them to disclose how they make a living; at least I sincerely hope not. I have decided to disclose my conflicts of interest, nevertheless. Of course, this is all done under the honor system, and that's fine. I'm willing to take Kirby's and Olmsted's word about their disclosures, if any. Other bloggers who write about vaccines and autism (from any side of the debate) should feel free to use my template below.

How much money do I make blogging?

I make exactly $0 (Zero US dollars) blogging about autism. As you can see, I don't have any ads in this blog, nor do I think I would make much money to speak of at my current traffic. I do not get paid to blog by anyone, be it pharmaceutical companies, the government, or the Illuminati. I currently do not take donations.

Why do I blog?

I blog about autism because I am the parent of an autistic child and because I am autistic myself. I also blog about autism because of my interest in science and scientific skepticism.

What is my occupation?

My formal employment is in an area of IT that is not related in any way to autism, disability, health care, vaccines or pharmaceuticals.

What about other income?

Believe it or not, I do not own any stock in any company of any kind, not even through mutual funds or 401Ks or anything of the sort. (I don't own a single credit card either). What can I tell you, I'm a strange guy.

The vast majority of my income comes from my formal employment. I do make perhaps an additional $15-$50 a month from hobbies that have nothing to do with anything I write in this blog. That is the extent of my income.

Clifford Shoemaker, What a Dick, Plus Other Views

I've decided to interrupt this blog's hiatus to bring you an important message; one that is actually uncharacteristic for this blog. And that is this: Clifford Shoemaker is a dick-face. In my opinion, that is.

Now let's go over select opinions on the matter around the blogosphere.

