Zero over non zero limits

#ZERO OVER NON ZERO LIMITS HOW TO#

In controlled clinical trials, outcome variables often take the form of integers or counts, such as the number of symptoms or the number of risk behaviors during some defined time period (e.g., episodes of drug use or episodes of risky sex per month). Such counts are generally not normally distributed. Ordinary least squares models, of which t-tests, ANOVA and ANCOVA are special cases, assume that the outcome is normally distributed and may yield a biased estimate of the effect of a treatment (and of other factors) if that assumption is violated. In practical terms, the size of the treatment effect and its statistical significance may be either overestimated or underestimated, neither of which is desirable.

The last several decades have therefore seen the growing availability, in standard statistical packages (e.g., Mplus, R, SAS, Splus, Stata), of parametric models for non-normally distributed data, including Poisson, negative binomial, zero-inflated, and hurdle models. These models have all the flexibility and power of parametric models, handling repeated measures, multiple covariates, and various configurations of fixed and random effects, while allowing the outcome to follow a distribution other than the normal (Poisson, negative binomial, etc.). Previous reports have compared Poisson, negative binomial, zero-inflated and hurdle models applied to various outcomes, including counts of adverse events related to a vaccine (2), hospital stays (3, 4), and traffic accidents (5). The purpose of this paper is to illustrate the differences between these distributions and models and to explore how to compare different models using data from a multi-site clinical trial of behavioral interventions to reduce episodes of HIV risk behavior (CTN-0019) conducted through the National Institute on Drug Abuse Clinical Trials Network (1).

Zero-inflated (8) and “hurdle” (7) models (each assuming either the Poisson or negative binomial distribution of the outcome) have been developed to cope with zero-inflated outcome data with over-dispersion (negative binomial) or without it (Poisson) (see Figures 1b and 1c). Both zero-inflated and hurdle models deal with the high occurrence of zeros in the observed data, but they differ in one important respect: how they interpret and analyze zero counts.

A zero-inflated model assumes that the zero observations have two different origins: “structural” and “sampling”. Figure 1b shows a zero-inflated Poisson model with the zero observations split according to their structural origin (the dark grey portion of the zero bar; call them “structural zeros”) or sampling origin (the light grey portion of the zero bar; call them “sampling zeros”). The sampling zeros are due to the usual Poisson (or negative binomial) distribution, which assumes that those zero observations happened by chance. Zero-inflated models assume that some zeros are observed due to some specific structure in the data. For example, if a count of high-risk sexual behaviors is the outcome, some participants may score zero because they do not have a sexual partner; these are the structural zeros, since they cannot exhibit unprotected sexual behavior. Other participants have sexual partners but score zero because they have eliminated their high-risk behavior. That is, their risk behavior is assumed to follow a Poisson or negative binomial distribution that includes both zero (the “sampling zeros”) and non-zero counts. In contrast, a hurdle model (see Figure 1c for an illustration of a Poisson hurdle) assumes that all zero data come from one “structural” source. The positive (i.e., non-zero) data have a “sampling” origin, following either a truncated Poisson (Figure 1c) or a truncated negative binomial distribution (7).

For example, consider a study of cocaine users in which a secondary outcome is the number of tobacco cigarettes smoked during the last month. In this case, it is safe to assume that only non-smokers will smoke zero cigarettes during the last month and that smokers will score some positive (non-zero) number of cigarettes during the last month. Hence the zero observations can come from only one “structural” source, the non-smokers. If a subject is considered a smoker, they do not have the “ability” to score zero cigarettes smoked during the last month and will always score a positive number of cigarettes in a hurdle model with either a truncated Poisson or a truncated negative binomial distribution. The distinction between structural and sampling zeros, and hence between zero-inflated and hurdle models, may seem subtle. However, one model or the other may be more appropriate depending on the nature of the experimental design and the outcome data being observed (2).
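
The difference between the two treatments of zero counts can be made concrete with a small numerical sketch. The code below is illustrative only and is not taken from the trial analysis: it writes out the probability mass functions of a zero-inflated Poisson model and a Poisson hurdle model under their standard textbook definitions. The function names and the parameter names `pi_structural`, `p_zero` and `lam` are assumptions introduced here for illustration.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(Y = k) for an ordinary Poisson distribution with mean lam > 0."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def zip_pmf(k: int, pi_structural: float, lam: float) -> float:
    """Zero-inflated Poisson: zeros arise from two sources.

    pi_structural is the (assumed) probability of a structural zero;
    the remaining mass follows an ordinary Poisson, which can also yield zero.
    """
    sampling_part = (1.0 - pi_structural) * poisson_pmf(k, lam)
    if k == 0:
        return pi_structural + sampling_part  # structural + sampling zeros
    return sampling_part


def hurdle_pmf(k: int, p_zero: float, lam: float) -> float:
    """Poisson hurdle: every zero is structural.

    p_zero is the (assumed) probability of a zero; positive counts follow
    a Poisson distribution truncated at zero.
    """
    if k == 0:
        return p_zero
    truncated = poisson_pmf(k, lam) / (1.0 - math.exp(-lam))
    return (1.0 - p_zero) * truncated


def zip_zero_split(pi_structural: float, lam: float) -> tuple:
    """Return (structural, sampling) shares of the zero probability in a ZIP model."""
    return pi_structural, (1.0 - pi_structural) * math.exp(-lam)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to show the two shapes.
    for k in range(5):
        print(k, round(zip_pmf(k, 0.3, 2.0), 4), round(hurdle_pmf(k, 0.4, 2.0), 4))
    print("ZIP zero split (structural, sampling):", zip_zero_split(0.3, 2.0))
```

In the zero-inflated function the zero probability is the sum of two terms, corresponding to the dark and light portions of the zero bar in Figure 1b, whereas in the hurdle function the zero probability is a single parameter and the Poisson part is renormalized so that it places no mass on zero.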