Monday, March 11, 2013

The Gini coefficient of log normal wealth

I recently learned how to use LaTeX, and I found out how to display it in Blogger using MathJax.  Let me know if it doesn't display properly.  I use a script blocker, and I had to allow MathJax for it to work. (ETA: Also, it apparently doesn't work in my RSS feed.)

I might as well take the excuse to write a post with lots of math in it.  I'm going to talk about wealth distribution, because I saw a video about it recently.



(Via Pharyngula) This is not new information to me, because I remember when the study by Norton and Ariely made news in 2011.  But very nice presentation!

The goal of the calculation

Back in 2011, I observed that the wealth distribution (desired, imagined, and actual) appeared to follow a log normal distribution, which is about the simplest distribution of wealth I can think of.  The log normal distribution is $$D(y) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{y^2}{2\sigma^2}}$$
where y is the log of the wealth, D(y) is the density of people with respect to y, and σ is the standard deviation.*  Of course, the "density with respect to the log of wealth" is intuitively meaningless, so I'll convert to the density with respect to wealth.  σ is also intuitively meaningless, so I'll let σ=logN.  Here, N is the ratio of wealth between two people who are a standard deviation apart.  For example, if I'm at the mean, and you're one standard deviation above the mean, then you own N times as much wealth.

*Note that I'm setting the mean of y to zero, which is the same as setting the median wealth to 1.  This can always be done by appropriate choice of units.

With these changes, we can rewrite the log normal distribution as $$D(x) = \frac{1}{x \log N \sqrt{2\pi}}\, e^{-\frac{(\log x)^2}{2(\log N)^2}}$$
where x is the wealth, and D(x) is the density of people who have wealth x.
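
The step from the density in y to the density in x is a change of variables.  Here is a short sketch of it; the subscripts on D are my own notation, added only to tell the two densities apart.

$$\begin{aligned}
D_x(x)\,dx &= D_y(y)\,dy, \qquad y = \log x, \quad dy = \frac{dx}{x} \\
D_x(x) &= \frac{1}{x}\, D_y(\log x) = \frac{1}{x \log N \sqrt{2\pi}}\, e^{-\frac{(\log x)^2}{2(\log N)^2}}
\end{aligned}$$

The number of people in a small interval has to be the same whichever variable labels it, which is where the extra factor of 1/x comes from.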

This model has a single parameter, N.  However, this is not the standard way of measuring wealth inequality.  The standard way is using the Gini coefficient.  The Gini coefficient is a number between 0 and 1.  0 represents a situation where everyone has exactly the same amount of wealth.  1 represents a situation where one person has all the wealth.  The Gini coefficient is represented graphically here:


The Lorenz curve is the plot of cumulative wealth vs cumulative population when the population is arranged from poorest to wealthiest.  To illustrate, say we're given a percentage, 40%.  So we look at the poorest 40% of the population, and determine what fraction of the total wealth they own.  Say that they own 1%.  The Lorenz curve will contain the point (0.4,0.01).
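
To make the construction concrete, here is a minimal Python sketch that builds Lorenz curve points from a list of individual wealths.  The function name `lorenz_points` and the five-person sample are made up for illustration; they are not from the Norton and Ariely data.

```python
def lorenz_points(wealths):
    """Return (cumulative population share, cumulative wealth share) pairs."""
    ordered = sorted(wealths)          # arrange from poorest to wealthiest
    total = sum(ordered)
    n = len(ordered)
    points = [(0.0, 0.0)]              # the curve always starts at the origin
    running = 0.0
    for i, w in enumerate(ordered, start=1):
        running += w
        points.append((i / n, running / total))
    return points

# Hypothetical five-person population (made-up numbers, not survey data).
sample = [1, 2, 3, 4, 90]
for p, f in lorenz_points(sample):
    print(f"poorest {p:.0%} own {f:.1%} of the wealth")
```

Each printed line is one point on the Lorenz curve, read the same way as the (0.4, 0.01) example above.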

The Gini coefficient ("G") is defined as twice the area of A, the region between the diagonal line of perfect equality and the Lorenz curve.
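
For a finite list of people, this definition can be sketched in a few lines of Python.  I'm assuming that A is the region between the diagonal and the Lorenz curve, so that twice its area equals 1 minus twice the area under the curve.  The function name `gini_from_wealths` is hypothetical, and the Lorenz curve is treated as piecewise linear between people.

```python
def gini_from_wealths(wealths):
    """Gini coefficient as 1 - 2 * (area under the piecewise-linear Lorenz curve)."""
    ordered = sorted(wealths)          # poorest to wealthiest
    total = sum(ordered)
    n = len(ordered)
    area_under_lorenz = 0.0
    prev_share = 0.0
    running = 0.0
    for w in ordered:
        running += w
        share = running / total
        # Trapezoid between consecutive Lorenz points, each of width 1/n.
        area_under_lorenz += (prev_share + share) / (2 * n)
        prev_share = share
    return 1.0 - 2.0 * area_under_lorenz

print(gini_from_wealths([5, 5, 5, 5]))    # everyone equal
print(gini_from_wealths([0, 0, 0, 10]))   # one person owns everything
```

With only n people, the "one person has everything" case gives 1 - 1/n rather than exactly 1; it approaches 1 as the population grows.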

So the question I'm going to answer is, how does G relate to N in a log normal wealth distribution?

The calculation

Starting with a log normal distribution, I'm going to calculate the Gini coefficient.  First thing we need to do is calculate the number of people who own wealth x or less.  Let's call this function P(x) (P stands for population).  We can calculate it from $$P(x) = \int_0^x D(x')\,dx'$$
It's simpler to evaluate this integral if we integrate with respect to y = log x rather than x.  So we get $$P(x) = \int_{-\infty}^{\log x} D(y)\,dy$$
Substituting in D(y), $$P(x) = \int_{-\infty}^{\log x} \frac{1}{\log N \sqrt{2\pi}}\, e^{-\frac{y^2}{2(\log N)^2}}\,dy$$
$$P(x) = \frac{1}{2}\left(\operatorname{erf}\left(\frac{\log x}{\log N \sqrt{2}}\right) + 1\right)$$
erf is the error function, which is basically defined as the integral of a normal distribution.
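
As a quick sanity check on P(x), here is a small Python sketch using the standard library's `math.erf`.  The function name `population_fraction` is my own, and log means the natural logarithm, as it does throughout this post.

```python
import math

def population_fraction(x, N):
    """P(x): fraction of people with wealth x or less, with median wealth set to 1."""
    sigma = math.log(N)
    return 0.5 * (math.erf(math.log(x) / (sigma * math.sqrt(2))) + 1)

print(population_fraction(1.0, 2.0))   # at the median wealth
print(population_fraction(2.0, 2.0))   # one standard deviation above the median
print(population_fraction(0.5, 2.0))   # one standard deviation below the median
```

At x = 1 the result is one half, as it should be for the median, and at x = N it is the usual one-standard-deviation fraction of a normal distribution, about 84%.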

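The next thing we need to do is calculate W(x), which is the total amount of wealth owned by people who own wealth x or less.  It can be calculated similarly to P(x). $$W(x) = \int_0^x x' D(x')\,dx'$$
$$W(x) = \int_{-\infty}^{\log x} e^{y} D(y)\,dy$$
$$W(x) = \int_{-\infty}^{\log x} \frac{1}{\log N \sqrt{2\pi}}\, e^{-\frac{y^2}{2(\log N)^2} + y}\,dy$$
$$W(x) = \frac{1}{\log N \sqrt{2\pi}}\, e^{(\log N)^2/2} \int_{-\infty}^{\log x} e^{-\frac{\left(y - (\log N)^2\right)^2}{2(\log N)^2}}\,dy$$
$$W(x) = \frac{1}{2}\, e^{(\log N)^2/2} \left(\operatorname{erf}\left(\frac{\log x}{\log N \sqrt{2}} - \frac{\log N}{\sqrt{2}}\right) + 1\right)$$
For this to really be meaningful, instead of the total wealth, I want to talk about the fraction of the total wealth owned by people who own x or less.  Let's call this fraction F(x).  $$F(x) = \frac{W(x)}{W(\infty)} = \frac{1}{2}\left(\operatorname{erf}\left(\frac{\log x}{\log N \sqrt{2}} - \frac{\log N}{\sqrt{2}}\right) + 1\right)$$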
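
The closed form for F(x) can be checked against a brute-force integral.  The Python sketch below does that with a simple midpoint rule; the function names, the step count, and the lower cutoff at -12 standard deviations are all my own choices for illustration.

```python
import math

def wealth_fraction(x, N):
    """F(x): fraction of total wealth held by people with wealth x or less."""
    sigma = math.log(N)
    arg = math.log(x) / (sigma * math.sqrt(2)) - sigma / math.sqrt(2)
    return 0.5 * (math.erf(arg) + 1)

def wealth_fraction_numeric(x, N, steps=20000):
    """Midpoint-rule estimate of W(x) / W(infinity), integrating over y = log(wealth)."""
    sigma = math.log(N)
    lo, hi = -12 * sigma, math.log(x)      # truncate the lower tail at -12 sigma
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * h
        density = math.exp(-y * y / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))
        total += math.exp(y) * density * h
    return total / math.exp(sigma**2 / 2)  # W(infinity) = exp(sigma^2 / 2)

for x in (0.5, 1.0, 3.0):
    print(x, wealth_fraction(x, 2.0), wealth_fraction_numeric(x, 2.0))
```

The two columns should agree to several decimal places, which is a decent check that completing the square in the exponent was done correctly.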
I'm a very visual person, so I'm going to show plots of F(x) and P(x).

In this image, x=1 is the median wealth, and N=2.

Now, what we really want is the Lorenz curve.  As I explained earlier, the Lorenz curve is the cumulative wealth vs the cumulative population when people are sorted from poorest to wealthiest.  By definition, every point (P(x),F(x)) is on the Lorenz curve.  But I'd like an explicit formula, which I'll call L(p).  $$L(p) = F(P^{-1}(p))$$
At this point, it's just elementary plugging in and simplification.  Skipping to the result, $$L(p) = \frac{1}{2}\left(\operatorname{erf}\left(\operatorname{erf}^{-1}(2p-1) - \frac{\log N}{\sqrt{2}}\right) + 1\right)$$
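
For anyone who wants the skipped steps, here is a sketch in the same symbols: first invert P to get the wealth x at population fraction p, then substitute that into F.

$$\begin{aligned}
p = P(x) \;&\Rightarrow\; \operatorname{erf}\left(\frac{\log x}{\log N \sqrt{2}}\right) = 2p - 1 \;\Rightarrow\; \frac{\log x}{\log N \sqrt{2}} = \operatorname{erf}^{-1}(2p-1) \\
L(p) = F(P^{-1}(p)) &= \frac{1}{2}\left(\operatorname{erf}\left(\operatorname{erf}^{-1}(2p-1) - \frac{\log N}{\sqrt{2}}\right) + 1\right)
\end{aligned}$$

The first term inside the error function in F(x) is exactly the quantity that inverting P gives, so x drops out entirely.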
Here is a plot of L(p) for N=2:

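G is defined as twice the area of A as shown above.  In mathematical terms, $$G = 1 - 2\int_0^1 L(p)\,dp$$
This simplifies to $$G = -\int_0^1 \operatorname{erf}\left(\operatorname{erf}^{-1}(2p-1) - \frac{\log N}{\sqrt{2}}\right) dp$$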
And that's where I stop, because I can't find an antiderivative for this integrand.  Instead I'll use Mathematica to numerically evaluate and plot G as a function of N.
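
The same numerical evaluation can be sketched in Python using only the standard library (this is a stand-in, not the Mathematica code used for the plot).  It relies on the identity that erf^-1(2p - 1) equals the standard normal inverse CDF of p divided by √2, which turns L(p) into a normal CDF shifted by log N.  As a cross-check it also evaluates G = erf(log N / 2), which is the standard closed form for the Gini coefficient of a log normal distribution; that formula is not derived in this post.  All function names are my own.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()   # standard normal: mean 0, standard deviation 1

def lorenz(p, N):
    """L(p) for log normal wealth, written with the normal CDF instead of erf."""
    sigma = math.log(N)
    return STD_NORMAL.cdf(STD_NORMAL.inv_cdf(p) - sigma)

def gini_numeric(N, steps=20000):
    """G = 1 - 2 * (integral of L(p) over [0, 1]), by the midpoint rule."""
    h = 1.0 / steps
    # Midpoints keep p strictly inside (0, 1), where inv_cdf is defined.
    area = sum(lorenz((i + 0.5) * h, N) for i in range(steps)) * h
    return 1.0 - 2.0 * area

def gini_closed_form(N):
    """Standard closed form for a log normal distribution (assumed, not derived here)."""
    return math.erf(math.log(N) / 2)

for N in (1.5, 2.7, 6.5):
    print(N, round(gini_numeric(N), 3), round(gini_closed_form(N), 3))
```

The midpoint rule is the simplest choice that avoids evaluating the inverse CDF at p = 0 and p = 1, where it diverges.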

Note that the plot only shows N ≥ 1 because values of N below 1 are meaningless.

Concluding remarks

Isn't LaTeX great?  Now I can scare off my readers with math equations that are better formatted than ever!

It's somewhat difficult to find Gini coefficients for the US.  As far as income goes, it's somewhere between 0.378 and 0.486 depending on the study.  But the YouTube video above is about wealth inequality, which is much greater.  It appears that in 1984, the Gini coefficient was 0.84 in 1989, and 0.801 in 2000 (it's unclear whether this is a change over time, or if it's just from differences between studies).  In any case, it's pretty high.

Previously, I determined that N is about 6.5 in the US, because this led to a distribution that looked rather like the one reported by Norton and Ariely.  The corresponding Gini coefficient is 0.814.  That's quite close!  When people tried guessing the amount of inequality, they came up with a distribution with N about 2.7, which corresponds to G = 0.518.  When people were asked about the ideal amount of inequality, they gave a distribution with N about 1.5, which corresponds to G = 0.226.

I don't really understand the economic significance of wealth inequality (or income inequality for that matter).  But the high degree of inequality in the US is clearly an unhappy situation.