
The arcsine and square root transformations were tested on 82 weed control data sets and 62 winter wheat winter survival data sets to determine effects on normality of the error terms, homogeneity of variance, and additivity of the model. Transformations appeared to correct deficiencies in these three parameters in the majority of data sets, but had adverse effects in certain other data sets. Performing the recommended transformation in conjunction with omitting treatments having identical replicate observations provided a high percentage of correction of non-normality, heterogeneity of variance, and nonadditivity. The arcsine transformation, not generally recommended for data sets having values from 0 to 20% or 80 to 100%, was as effective in correcting non-normality, heterogeneity of variance, and nonadditivity in these data sets as was the recommended square root transformation. A majority of data sets showed differences between transformed and nontransformed data in mean separations determined using LSD (0.05), although most of these differences were minor and had little effect on interpretation of results.

OK, so, the title of this article is actually Do not log-transform count data, but, as mentioned, you just can’t resist adding the “bitches” to the end.

If you’re like me, when you learned experimental stats, you were taught to worship at the throne of the Normal Distribution. Always check your data and make sure it is normally distributed! Or, make sure that whatever lines you fit to it have normally distributed error around them! Normal! Normal normal normal!

And if you violate normality, say, you have count data with no negative values, and a normal linear regression would create situations where negative values are possible (e.g., what does it mean if you predict negative kelp? Ah, the old dreaded nega-kelp), then no worries. Just log-transform your data. Or SOMETHING to linearize it before fitting a line and ensure the sacrament of normality is preserved.

This has led to decades of transformation of count data by in-the-field ecologists without any real thought as to the consequences.

But statistics has had a better answer for decades: generalized linear models (glm for R nerds, gzlm for SAS goombas who use proc genmod. What? I’m biased!), whereby one specifies a nonlinear function with a corresponding non-normal error distribution. The canonical book on this was first published ’round 1983. Sure, one has to think more about the particular model and error distribution they specify, but, if you’re not thinking about these things in the first place, why are you doing science?

“But, hey!” you might say, “GLMs and transformed count data should produce the same results, no?”

From first principles, Jensen’s inequality says no: consider the consequences for error of the transformation approach of log(y) = ax+b+error versus the glm approach y=e^(ax+b)+error. More importantly, the error distributions from generalized linear models may often be far far faaar more appropriate to the data you have at hand. For example, count data is discrete, and hence a normal distribution will never be quite right. Better to use a Poisson or a negative binomial.

But, “Sheesh!”, one might say, “Come on! How different can these models be? I mean, I’m going to get roughly the same answer, right?”

O’Hara and Kotze’s paper takes this question and runs with it. They simulate count data from negative binomial distributions and look at the results from generalized linear models with negative binomial or quasi-Poisson error terms (see here for the difference) versus a slew of transformations.
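
To make the Jensen’s inequality point about log(y) = ax+b+error versus y=e^(ax+b)+error a bit more concrete, here is a minimal Python sketch. It is not code from O’Hara and Kotze’s paper: the mean of 10, the dispersion of 1, the sample size, the seed, and the function names are all assumptions picked for illustration. It draws negative binomial counts, then compares the plain sample mean (which is what an intercept-only Poisson GLM with a log link would recover) against the mean computed on the log(y + 1) scale and back-transformed.

```python
import numpy as np


def simulate_counts(mean, dispersion, n, rng):
    """Draw n negative binomial counts with the requested mean.

    NumPy parameterizes the distribution by (n, p); setting
    p = dispersion / (dispersion + mean) gives an expected value of `mean`.
    """
    p = dispersion / (dispersion + mean)
    return rng.negative_binomial(dispersion, p, size=n)


def back_transformed_log_mean(y):
    """Average on the log(y + 1) scale, then map back to the count scale."""
    return np.expm1(np.mean(np.log1p(y)))


# Illustrative settings only (assumptions, not values from the paper).
rng = np.random.default_rng(42)
true_mean = 10.0
y = simulate_counts(true_mean, dispersion=1.0, n=100_000, rng=rng)

# What a log-link, intercept-only Poisson GLM estimates: the sample mean.
arithmetic_mean = y.mean()

# What the transform-then-average approach estimates after back-transforming.
log_scale_mean = back_transformed_log_mean(y)

print(f"true mean:                 {true_mean:.2f}")
print(f"sample (GLM-style) mean:   {arithmetic_mean:.2f}")
print(f"back-transformed log mean: {log_scale_mean:.2f}")
```

Because the log is concave, the back-transformed value sits below the arithmetic mean for any sample that isn’t constant, so the two approaches are answering different questions about the same counts.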
