Monday, March 11, 2013

The Gini coefficient of log normal wealth

I recently learned how to use $\LaTeX$ and I found out how to display it in Blogger.  I use MathJax.  Let me know if it doesn't display properly.  I use a script-blocker, and I had to allow MathJax in order for it to work. (ETA: Also, it apparently doesn't work in my RSS feed.)

I might as well take the excuse to write a post with lots of math in it.  I'm going to talk about wealth distribution, because I saw a video about it recently.



(Via Pharyngula) This is not new information to me, because I remember when the study by Norton and Ariely made news in 2011.  But very nice presentation!

The goal of the calculation

Back in 2011, I observed that the wealth distribution (desired, imagined, and actual) appeared to follow a log normal distribution, which is about the simplest distribution of wealth I can think of.  The log normal distribution is $$D(y) = \frac{1}{\sigma \sqrt{2\pi}}e^{-\frac{y^2}{2\sigma^2}}$$ where y is the log of the wealth, D(y) is the density of people with respect to y, and $\sigma$ is the standard deviation.*  Of course, the "density with respect to the log of wealth" is intuitively meaningless, so I'll convert to the density with respect to wealth.  $\sigma$ is also intuitively meaningless, so I'll let $\sigma = \log{N}$.  Here, N is the ratio of wealth between two people who are a standard deviation apart.  For example, if I'm at the mean, and you're one standard deviation above the mean, then you own N times as much wealth.

*Note that I'm setting the mean of y to zero, which is the same as setting the median wealth to 1.  This can always be done by appropriate choice of units.

With these changes, we can rewrite the log normal distribution as $$D(x) = \frac{1}{x\log{N} \sqrt{2\pi}}e^{-\frac{(\log{x})^2}{2(\log{N})^2}}$$ where x is the wealth, and D(x) is the density of people who have wealth x.
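To make the formula concrete, here is a minimal Python sketch that evaluates D(x) and checks numerically that it integrates to 1, with half the population below the median x = 1. The function names and the integration range are illustrative assumptions, not part of the original calculation; the range of ±12 in log-wealth is only wide enough when log N is modest.

```python
import math

def lognormal_density(x, N):
    """D(x): density of people at wealth x (median wealth 1, sigma = log N)."""
    sigma = math.log(N)
    return math.exp(-math.log(x) ** 2 / (2 * sigma ** 2)) / (x * sigma * math.sqrt(2 * math.pi))

def integrate_density(N, y_lo=-12.0, y_hi=12.0, steps=20000):
    """Midpoint-rule integral of D(x) dx, substituting x = e^y so dx = x dy."""
    h = (y_hi - y_lo) / steps
    total = 0.0
    for i in range(steps):
        x = math.exp(y_lo + (i + 0.5) * h)
        total += lognormal_density(x, N) * x * h
    return total

print(integrate_density(2.0))            # whole population, should be close to 1
print(integrate_density(2.0, y_hi=0.0))  # people below the median x = 1, close to 0.5
```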

This model has a single parameter, N.  However, this is not the standard way of measuring wealth inequality.  The standard way is using the Gini coefficient.  The Gini coefficient is a number between 0 and 1.  0 represents a situation where everyone has exactly the same amount of wealth.  1 represents a situation where one person has all the wealth.  The Gini coefficient is represented graphically here:


The Lorenz curve is the plot of cumulative wealth vs cumulative population when the population is arranged from poorest to wealthiest.  To illustrate, say we're given a percentage, 40%.  So we look at the poorest 40% of the population, and determine what fraction of the total wealth they own.  Say that they own 1%.  The Lorenz curve will contain the point (0.4,0.01).

The Gini coefficient ("G") is defined as twice the area of A, the region between the Lorenz curve and the diagonal line of perfect equality.
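For a finite list of people, the same definitions can be sketched in a few lines of Python. This is only an illustration with made-up wealth values and hypothetical function names; it builds the Lorenz curve point by point and takes G as one minus twice the area under it (trapezoid rule).

```python
def lorenz_points(wealths):
    """Points (cumulative population share, cumulative wealth share), poorest first."""
    ordered = sorted(wealths)
    total = sum(ordered)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, w in enumerate(ordered, start=1):
        running += w
        points.append((i / len(ordered), running / total))
    return points

def gini_from_lorenz(points):
    """G = 1 - 2 * (area under the Lorenz curve), using trapezoids between points."""
    area_under = 0.0
    for (p0, l0), (p1, l1) in zip(points, points[1:]):
        area_under += (p1 - p0) * (l0 + l1) / 2
    return 1 - 2 * area_under

print(gini_from_lorenz(lorenz_points([1, 1, 1, 1])))   # everyone equal
print(gini_from_lorenz(lorenz_points([0, 0, 0, 10])))  # one person owns everything
```

With only four people the second case gives 0.75 rather than 1, because a finite sample with one owner yields 1 - 1/n; the value approaches 1 as the population grows.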

So the question I'm going to answer is, how does G relate to N in a log normal wealth distribution?

The calculation
$\DeclareMathOperator{\erf}{erf}$
Starting with a log normal distribution, I'm going to calculate the Gini coefficient.  First thing we need to do is calculate the number of people who own wealth x or less.  Let's call this function P(x) (P stands for population).  We can calculate it from $$P(x) = \int_0^x D(x') \mathrm{d}x'$$ It's simpler to evaluate this integral if we integrate with respect to $y = \log{x}$ rather than x.  So we get $$P(x) = \int_{-\infty}^{\log{x}} D(y) \mathrm{d}y$$ Substituting in D(y), $$P(x) = \int_{-\infty}^{\log{x}} \frac{1}{\log{N} \sqrt{2\pi}}e^{-\frac{y^2}{2(\log{N})^2}} \mathrm{d}y$$ $$P(x) = \frac{1}{2}(\erf{(\frac{\log{x}}{\log{N}\sqrt{2}})} + 1)$$ erf is the error function, which is basically defined as the integral of a normal distribution.
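As a quick sanity check on the formula for P(x), here is a small Python sketch using the standard library's `math.erf`. The function name is illustrative. It confirms that half the population lies below the median, and that the share of people within a factor of N of the median is the familiar one-standard-deviation share.

```python
import math

def population_fraction(x, N):
    """P(x): fraction of people with wealth x or less (median wealth 1)."""
    sigma = math.log(N)
    return 0.5 * (math.erf(math.log(x) / (sigma * math.sqrt(2))) + 1)

N = 2.0
within_one_sigma = population_fraction(N, N) - population_fraction(1 / N, N)
print(population_fraction(1.0, N))  # half the population is below the median
print(within_one_sigma)             # about 0.68, the usual one-standard-deviation share
```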

The next thing we need to do is calculate W(x), which is the total amount of wealth owned by people who own wealth x or less.  It can be calculated similarly to P(x). $$W(x) = \int_0^x x' D(x') \mathrm{d}x'$$ $$W(x) = \int_{-\infty}^{\log{x}} e^y D(y) \mathrm{d}y$$ $$W(x) = \int_{-\infty}^{\log{x}} \frac{1}{\log{N} \sqrt{2\pi}}e^{-\frac{y^2}{2(\log{N})^2} + y} \mathrm{d}y$$ Completing the square in the exponent, $$W(x) = \frac{1}{\log{N} \sqrt{2\pi}} e^{(\log{N})^2/2} \int_{-\infty}^{\log{x}} e^{-\frac{(y-(\log{N})^2)^2}{2(\log{N})^2}} \mathrm{d}y$$ $$W(x) = \frac{1}{2} e^{(\log{N})^2/2} (\erf{( \frac{\log{x}}{\log{N}\sqrt{2}} - \frac{\log{N}}{\sqrt{2}})} + 1 )$$ For this to really be meaningful, instead of the total wealth, I want to talk about the fraction of the total wealth owned by people who own x or less.  Let's call this fraction F(x).  $$F(x) = \frac{W(x)}{W(\infty)} = \frac{1}{2} (\erf{( \frac{\log{x}}{\log{N}\sqrt{2}} - \frac{\log{N}}{\sqrt{2}})} + 1 )$$ I'm a very visual person, so I'm going to show plots of F(x) and P(x).

In this image, x=1 is the median wealth, and N=2.
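The closed form for F(x) can be checked against a brute-force integral. The sketch below is an illustration under stated assumptions (hypothetical function names, a midpoint rule over log-wealth from -12 to 12, which is wide enough for N = 2): it integrates $e^y D(y)$ directly and divides by the total, then compares with the erf expression.

```python
import math

def wealth_fraction(x, N):
    """F(x): share of total wealth held by people with wealth x or less."""
    sigma = math.log(N)
    return 0.5 * (math.erf(math.log(x) / (sigma * math.sqrt(2)) - sigma / math.sqrt(2)) + 1)

def wealth_fraction_numeric(x, N, y_lo=-12.0, y_hi=12.0, steps=20000):
    """Same quantity by midpoint integration of e^y D(y), divided by the total wealth."""
    sigma = math.log(N)

    def weighted_density(y):
        return math.exp(y - y * y / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

    def integral(upper):
        h = (upper - y_lo) / steps
        return sum(weighted_density(y_lo + (i + 0.5) * h) for i in range(steps)) * h

    return integral(math.log(x)) / integral(y_hi)

print(wealth_fraction(1.0, 2.0))          # wealth share of the poorer half, closed form
print(wealth_fraction_numeric(1.0, 2.0))  # same share by direct integration
```

For N = 2 the poorer half of the population comes out owning roughly a quarter of the wealth, and the two numbers agree closely.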

Now, what we really want is the Lorenz curve.  As I explained earlier, the Lorenz curve is the cumulative wealth vs the cumulative population when people are sorted from poorest to wealthiest.  By definition, every point (P(x),F(x)) is on the Lorenz curve.  But I'd like an explicit formula, which I'll call L(p).  $$L(p) = F( P^{-1}(p) )$$ At this point, it's just elementary plugging in and simplification.  Skipping to the result, $$L(p) = \frac{1}{2} ( \erf{(\erf{^{-1}(2p-1)} - \frac{\log{N}}{\sqrt{2}} )} + 1)$$ Here is a plot of L(p) for N=2:
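A few points on this curve can also be tabulated directly from the definition $L(p) = F(P^{-1}(p))$. The following is a hedged Python sketch, not the code behind the plot. Python's `math` module has no inverse error function, so it assumes the identity $\erf^{-1}(2p-1) = \Phi^{-1}(p)/\sqrt{2}$ and uses `statistics.NormalDist().inv_cdf` for $\Phi^{-1}$; the function names are illustrative.

```python
import math
from statistics import NormalDist

def wealth_fraction(x, N):
    """F(x): share of total wealth held by people with wealth x or less."""
    sigma = math.log(N)
    return 0.5 * (math.erf(math.log(x) / (sigma * math.sqrt(2)) - sigma / math.sqrt(2)) + 1)

def inverse_population_fraction(p, N):
    """P^{-1}(p): the wealth x below which a fraction p of people fall."""
    return math.exp(math.log(N) * NormalDist().inv_cdf(p))

def lorenz(p, N):
    """L(p) = F(P^{-1}(p)): wealth share of the poorest fraction p."""
    return wealth_fraction(inverse_population_fraction(p, N), N)

for p in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(f"L({p}) = {lorenz(p, 2.0):.3f}")
```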

G is defined as twice the area of A as shown above.  In mathematical terms, $$G = 1 - 2 \int_0^1 L(p) \mathrm{d}p$$ This simplifies to $$G = -\int_0^1 \erf{(\erf{^{-1}(2p-1)} - \frac{\log{N}}{\sqrt{2}})} \mathrm{d}p$$ And that's where we stop, because I don't see how to evaluate this integral by hand.  Instead I'll use Mathematica to numerically evaluate and plot G as a function of N.

Note that the plot only shows $N \geq 1$ because values of N below 1 are meaningless.
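The same numerical evaluation can be sketched in Python with only the standard library. This is not the Mathematica code used for the plot; the function names and the midpoint rule are assumptions for illustration. For comparison, the sketch also evaluates the closed form usually quoted for the log normal Gini coefficient, $G = \erf(\sigma/2)$, which here reads $G = \erf(\log{N}/2)$, even though the integral above resists a direct attack.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def lorenz(p, N):
    """L(p) for log normal wealth, using erf^{-1}(2p-1) = Phi^{-1}(p) / sqrt(2)."""
    sigma = math.log(N)
    return STD_NORMAL.cdf(STD_NORMAL.inv_cdf(p) - sigma)

def gini_numeric(N, steps=20000):
    """G = 1 - 2 * integral of L(p) over [0, 1], by the midpoint rule."""
    h = 1.0 / steps
    area = sum(lorenz((i + 0.5) * h, N) for i in range(steps)) * h
    return 1 - 2 * area

def gini_closed_form(N):
    """Closed form usually quoted for the log normal Gini: erf(sigma / 2)."""
    return math.erf(math.log(N) / 2)

for N in (1.5, 2.7, 6.5):
    print(N, round(gini_numeric(N), 3), round(gini_closed_form(N), 3))
```

The numerical and closed-form columns should agree to the printed precision, and both tend to 0 as N approaches 1.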

Concluding remarks

Isn't $\LaTeX$ great?  Now I can scare off my readers with math equations that are better formatted than ever!

It's somewhat difficult to find Gini coefficients for the US.  As far as income goes, it's somewhere between 0.378 and 0.486 depending on the study.  But the above YouTube video is about wealth inequality, which is much greater.  It appears that in 1984, the Gini coefficient was 0.84 in 1989, and 0.801 in 2000 (it's unclear whether this is a change over time, or if it's just from differences between studies).  In any case, it's pretty high.

Previously, I determined that N is about 6.5 in the US, because this led to a distribution that looked rather like the one reported by Norton and Ariely.  The corresponding Gini coefficient is 0.814.  That's quite close!  When people tried guessing the amount of inequality, they came up with a distribution with N about 2.7, which corresponds to G = 0.518.  When people were asked about the ideal amount of inequality, they gave a distribution with N about 1.5, which corresponds to G = 0.226.

I don't really understand the economic significance of wealth inequality (or income inequality for that matter).  But the high degree of inequality in the US is clearly an unhappy situation.

6 comments:

Jeffrey Ellis said...

The high degree of inequality isn't necessarily as bad as you think. The video totally ignores an important point: income mobility. Most people who are poor in a given year are in higher income groups 3, 5, 7, etc. years later.

miller said...

The video is not about income inequality, it's about wealth inequality (which is much larger than income inequality). I'm not sure how much income mobility mitigates income inequality, and I'm not sure that it mitigates wealth inequality at all. (Also, a citation is needed.)

Isaac said...

It's true, in the RSS feeds one sees the $\rm\LaTeX$ source code. It's like when one inserts $\rm\LaTeX$ code in a plain-text email: you may read the source code and visualize the formula in your mind, and you can compile it.

Anonymous said...

I think it is interesting why wealth or income distribution is log-normal. A normal distribution comes from the SUM of many random variables, whatever the individual distributions of those random variables (within reason). But a log-normal distribution comes from the PRODUCT of many random variables, whatever the individual distributions of those random variables. So a person's income or net worth tends to increase by a percentage for each of many factors, not by a fixed amount of money. We tend to get a percentage increase or decrease for such things as education level, circumstances of birth, disabilities or discrimination, country that we are born in, etc., and income from investments of course increases by a percentage that looks random depending on the type of investment and luck.

Anonymous said...

It shouldn't surprise people that the bottom 20% has almost no wealth. This doesn't include just those on welfare, but anybody who has a negative net worth due to student debt, credit card debt, or because they have a home that is "underwater" meaning that because of the fall of home prices since 2008, their home is worth less than their mortgage debt.

miller said...

This is an interesting discussion that is evolving. I think the distinction between wealth distribution and income distribution is an important one.

To offer a constructive criticism of the model: the wealth distribution would comprise two sets of people, the 'workers' who primarily earn a salary and accumulate wealth from (a relatively small) surplus of earnings over spending, and the 'investors' who primarily earn from their existing wealth, and whose income is inherently proportional to their existing wealth.

The investor group would naturally tend to diverge more extremely, but would be limited to a small percentage of the population (since you can't have a whole population of investors, otherwise nothing is produced!). The 'worker' majority would probably appear relatively flat (and I suspect may perhaps look more like the 'ideal' distribution?).
Somewhere there should be a cross-over between the two groups and it would be interesting to see a salary-investment income model that could show where that crossover lay (perhaps around 80-90%, from looking at the graphs?).

One might expect crashes and inflation to affect the investor group proportionately more than the worker group, and generate some 'reversion to the mean'; however, taxes must surely do most to average out the difference. But ironically, the biggest beneficiary of the wealth distribution could be a government which taxes the wealthy at higher rates than the poor, and so receives a higher percentage of tax take the more extreme the imbalance in the distribution becomes (but must remain within the limits where they push those wealthy taxpayers abroad, to another country).