The Poisson Deviance for Regression
You’ve probably heard of the Poisson distribution, a probability distribution often used for modeling counts, that is, positive integer values. Imagine you’re modeling “events”, like the number of customers that walk into a store, or birds that land in a tree in a given hour. That’s what the Poisson is often used for. From the perspective of regression, we have this thing called Generalized Poisson Models (GPMs), where the response variable you are modeling is some kind of Poisson-ish type distribution. In a Poisson distribution, the mean is equal to the variance, but in real life, the variance is often greater (over-dispersed) or less than the mean (under-dispersed).
One of the most intuitive and easy to understand explanations of GPMs is from Sachin Date, who, if you don’t know, explains many difficult topics well. What I want to focus on here is something kind of related — the Poisson deviance, or the Poisson loss function.
Most of the time, your models, whether they are linear models, neural networks, or some tree-based method (XGBoost, Random Forest, etc.) are going to use mean squared error (MSE) or root mean squared error (RMSE) as your objective function. As you might know, MSE tends to favor the median, and RMSE and tends to favor the mean of a conditional distribution — this is why people worry about how the loss function works with the outliers in their data sets. However, most of the time, with some normally distributed response, you expect the values of the response (if z-score normalized, especially), to be…