The Gaussian Integral and the Gaussian Probability Density Function

Preface

When you study physics, it is common—or at least it was when I was a student—for textbooks to present mathematical identities without explanation. You become accustomed to using tables of integrals and other precalculated artifacts as a means to solve problems without necessarily understanding why a particular identity holds true. Some form of the Gaussian function appears as a probability density function in different corners of physics, usually with little explanation. Integrating it is a necessary part of finding an expected value, but the process is rarely explained.

The Gaussian function has no elementary indefinite integral. To integrate it over a range generally requires numerical integration techniques for finding an approximate solution. It is, however, possible to integrate it from negative infinity to positive infinity. This improper integral is worth understanding because it yields an identity that recurs in multiple contexts. It is also worth understanding how to write a computer program to perform the numerical integration required for other ranges, but that is a topic for another time.
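Even so, a taste of what such a program looks like fits in a few lines. The sketch below (Python; the names are ours, and only the standard library is assumed) approximates the integral of $e^{-x^2}$ over a finite range with the composite trapezoidal rule and cross-checks the result against the error function:

```python
import math

def gaussian_area(lo, hi, n=10_000):
    """Approximate the integral of exp(-x**2) over [lo, hi]
    using the composite trapezoidal rule with n subintervals."""
    h = (hi - lo) / n
    total = 0.5 * (math.exp(-lo * lo) + math.exp(-hi * hi))
    total += sum(math.exp(-(lo + i * h) ** 2) for i in range(1, n))
    return total * h

# Cross-check: the integral of exp(-x**2) from 0 to z is sqrt(pi)/2 * erf(z).
print(gaussian_area(0.0, 1.0))                 # ~0.746824
print(math.sqrt(math.pi) / 2 * math.erf(1.0))  # ~0.746824
```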

Prerequisites. A knowledge of integral and differential calculus, the exponential function, and basic probability and statistics is required to understand the material.

Revision History
Revision 1.1.1 (2022-02-21)

Changed definition of central moments from singular to plural for consistency with previous wording.

Revision 1.1.0 (2022-02-13)

Added an addendum with the general form of the Gaussian integral and a summary of integrals evaluated in the paper.

Revision 1.0.2 (2022-02-09)

Corrected typo in series expansion of $e^{tx}$. The $t!$ denominator should have been $n!$.

Revision 1.0.1 (2022-02-02)

Reformatted theorem title.

Revision 1.0.0 (2022-01-30)

First draft.


The Gaussian Integral and the Gaussian Probability Density Function

Daniel F. Savarese
(Version: 1.1.1, 2022-02-21)
Abstract.

We review moment-generating functions to provide a context for understanding the utility of Gaussian functions and the Gaussian integral. We then explain Gaussian functions as a class of exponential functions and demonstrate a common technique for integrating a Gaussian function over $(-\infty,\infty)$ by converting to polar coordinates. Finally, we use a moment-generating function and the Gaussian integral to construct the parameterized form of the Gaussian or normal probability density function.

© 2022 Daniel F. Savarese

1. Characterizing Probability Distributions

Although we could start by presenting a Gaussian function and proceed by evaluating its integral over the real numbers, that would not provide a context for the exercise. The relevance of the procedure would be lost and the reader would be left with an isolated mathematical process devoid of applicability. Given that the root of Gaussian functions lies in probability theory, where a specific instance defines the so-called normal distribution, we will review the necessary statistical principles to understand the utility of the Gaussian integral.

1.1. Moments

In probability theory, a parameter associated with a probability distribution, such as its mean or variance, is said to be a characteristic of the distribution. A probability distribution is a function that models a so-called population of data points. A population is the hypothetical collection of all possible measurements made under a given set of conditions. It is a theoretical set of data and not the actual set of measurements. To distinguish the characteristics of a sample from the parameters of a distribution, we use the term statistic. Therefore, the population mean is a parameter of the probability distribution and the sample mean is a statistic of the sample set.

To illustrate the difference between a parameter and a statistic, let’s compare the formula for the expected value of a discrete random variable X with probability mass function P(x),

$$E(X)=\mu_X=\sum_{n=1}^{N}x_nP(x_n), \quad (1)$$

to the formula for the expected value of a continuous random variable X with probability density function P(x),

$$E(X)=\mu_X=\int_{-\infty}^{\infty}xP(x)\,dx. \quad (2)$$

Equation 1 represents a statistic and may have different values for different sample sets. Equation 2 represents a parameter and has a fixed value. Each is a characteristic of its respective discrete and continuous distribution.

A moment is an element of a set of characteristic values that collectively describe the distribution of a random variable, albeit not in a unique manner. Under the right conditions, two random variables with identical moments will have identical probability distributions. That makes it possible to approximate the probability distribution of a random variable using moments.

In the study of statistics, moments fall into two classes: raw and central. A raw moment is computed about the origin and a central moment is computed about the mean. Without proof, we state that if X is a discrete random variable with probability mass function P(x) and g(X) is a function of X, then

$$E(g(X))=\mu_{g(X)}=\sum_{n=1}^{N}g(x_n)P(x_n); \quad (3)$$

and if X is a continuous random variable with probability density function P(x) and g(X) is a function of X, then

$$E(g(X))=\mu_{g(X)}=\int_{-\infty}^{\infty}g(x)P(x)\,dx. \quad (4)$$

We set $g(X)=X^k$ to find the raw moments of a probability distribution.

The raw moments of a discrete probability distribution are defined as

$$E(X^k)=\mu'_k=\sum_{n=1}^{N}x_n^kP(x_n), \quad (5)$$

where $E(X^k)$, written $\mu'_k$, is the $k$th raw moment of the discrete random variable $X$. Similarly, the raw moments of a continuous probability distribution for a continuous random variable $X$ are defined as

$$E(X^k)=\mu'_k=\int_{-\infty}^{\infty}x^kP(x)\,dx. \quad (6)$$

The central moments are defined analogously, setting $g(X)=(X-\mu)^k$ to center the moments about the mean $\mu$. The central moments of a discrete random variable are

$$E([X-\mu]^k)=\mu_k=\sum_{n=1}^{N}(x_n-\mu)^kP(x_n), \quad (7)$$

and for a continuous random variable they are

$$E([X-\mu]^k)=\mu_k=\int_{-\infty}^{\infty}(x-\mu)^kP(x)\,dx. \quad (8)$$

Note that the first raw moment, $\mu'_1$, is equal to the mean, $\mu$. The first central moment, $\mu_1$, will always equal zero because Equation 8 reduces to $\mu-\mu=0$ for $k=1$: the probability-weighted deviations below the mean exactly offset those above it. The second central moment, $\mu_2$, is equal to the population variance. Knowing that the population variance, $\sigma^2$, of random variable $X$ is $E(X^2)-E^2(X)$, the variance can be expressed in terms of raw moments as

$$\sigma^2=\mu'_2-\left(\mu'_1\right)^2. \quad (9)$$

In addition, it is possible to define a set of moments about an arbitrary point, a, in which case you would replace μ in Equations 7 and 8 with the point a.
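To make these definitions concrete, consider a fair six-sided die, a discrete distribution with $P(x)=\frac{1}{6}$ for $x=1,\ldots,6$ (our own illustrative example, not one from the text above). A short Python sketch applies Equations 5, 7, and 9 directly:

```python
from fractions import Fraction

# Fair six-sided die: P(x) = 1/6 for x in 1..6.
outcomes = range(1, 7)
p = Fraction(1, 6)

def raw_moment(k):
    """k-th raw moment, per Equation 5."""
    return sum(Fraction(x) ** k * p for x in outcomes)

mu = raw_moment(1)  # the mean, 7/2

def central_moment(k):
    """k-th central moment, per Equation 7."""
    return sum((Fraction(x) - mu) ** k * p for x in outcomes)

print(central_moment(1))                   # 0, as claimed for the first central moment
print(central_moment(2))                   # 35/12, the variance
print(raw_moment(2) - raw_moment(1) ** 2)  # 35/12 again, per Equation 9
```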

1.2. Moment-Generating Functions

Instead of calculating the moments of a probability distribution on an ad hoc basis, we can identify a function from which the moments can be immediately derived. This so-called moment-generating function exploits the series expansion of $e^{tx}$ to create a function that, when successively differentiated and evaluated at zero, produces the raw moment of a probability distribution corresponding to the order of differentiation. For example, the first derivative at zero produces the first raw moment, the second derivative the second raw moment, and so on. This works by applying Equation 4 to find the expected value of $e^{tX}$, using $g(X)=e^{tX}$ and integrating to yield a moment-generating function,

$$M(t)=E\left(e^{tX}\right)=\int_{-\infty}^{\infty}e^{tx}P(x)\,dx. \quad (10)$$

Expanding $e^{tx}$ to its infinite series gives us

$$\begin{aligned}
M(t)&=\int_{-\infty}^{\infty}\left(\sum_{n=0}^{\infty}\frac{t^nx^n}{n!}\right)P(x)\,dx\\
&=\int_{-\infty}^{\infty}\left(1+tx+\frac{t^2x^2}{2!}+\frac{t^3x^3}{3!}+\cdots\right)P(x)\,dx\\
&=\int_{-\infty}^{\infty}P(x)\,dx+t\int_{-\infty}^{\infty}xP(x)\,dx+\frac{t^2}{2!}\int_{-\infty}^{\infty}x^2P(x)\,dx+\cdots\\
&=1+t\mu'_1+\frac{t^2}{2!}\mu'_2+\frac{t^3}{3!}\mu'_3+\cdots. \quad (11)
\end{aligned}$$

We arrive at Equation 11 by substituting terms using Equation 6 and noting that $\int_{-\infty}^{\infty}P(x)\,dx=1$ for any probability density function.

If we differentiate M(t) with respect to t we get

$$M'(t)=\frac{dM}{dt}=\mu'_1+\frac{2t}{2!}\mu'_2+\frac{3t^2}{3!}\mu'_3+\cdots.$$

Evaluating $M'(t)$ at $t=0$ eliminates all but the first term, yielding

$$M'(0)=\left.\frac{dM}{dt}\right|_{t=0}=\mu'_1.$$

Taking the second derivative produces

$$M''(t)=\frac{d^2M}{dt^2}=\frac{2}{2!}\mu'_2+\frac{6t}{3!}\mu'_3+\frac{12t^2}{4!}\mu'_4+\cdots;$$

and evaluating it at t=0 results in

$$M''(0)=\left.\frac{d^2M}{dt^2}\right|_{t=0}=\mu'_2.$$

It should be apparent that the kth derivative evaluated at t=0 produces the kth raw moment according to

$$\left.\frac{d^k}{dt^k}M(t)\right|_{t=0}=\mu'_k\quad\text{where }k>0. \quad (12)$$

As with Equation 4, if X is a continuous random variable with probability density function P(x) and g(X) is a function of X, then

$$M(t)=E\left(e^{tg(X)}\right)=\int_{-\infty}^{\infty}e^{tg(x)}P(x)\,dx. \quad (13)$$

We can use this formulation to find the moment-generating function of $X-\mu$ by setting $g(X)=X-\mu$. This can be useful for deriving the variance directly from the second central moment instead of using Equation 9. However, we will only be using raw moments.

1.3. Uniform Distribution Moments

So that we can understand how to derive the moment-generating function of the Gaussian distribution in Section 4, we will now work through the example of deriving the moment-generating function of the uniform probability distribution,

$$P(x)=\begin{cases}\dfrac{1}{b-a}&\text{for }a\le x\le b,\\[1ex]0&\text{for }x<a\text{ or }x>b.\end{cases} \quad (14)$$

Applying Equation 10 to the uniform probability density function from Equation 14, we have

$$M(t)=\int_{-\infty}^{\infty}e^{tx}P(x)\,dx=\frac{1}{b-a}\int_a^b e^{tx}\,dx.$$

Integrating gives us

$$M(t)=\frac{1}{b-a}\left.\frac{e^{tx}}{t}\right|_a^b=\frac{1}{b-a}\left(\frac{e^{bt}-e^{at}}{t}\right),$$

which produces the indeterminate form $\frac{0}{0}$ at $t=0$, requiring us to apply l’Hôpital’s rule as follows:

$$\lim_{t\to 0}\frac{1}{b-a}\left(\frac{e^{bt}-e^{at}}{t}\right)=\lim_{t\to 0}\frac{be^{bt}-ae^{at}}{b-a}=\frac{b-a}{b-a}=1.$$

This gives us the complete moment-generating function for the uniform probability distribution,

$$M(t)=\begin{cases}\dfrac{e^{bt}-e^{at}}{(b-a)t}&\text{for }t\ne 0,\\[1ex]1&\text{for }t=0.\end{cases} \quad (15)$$

We can now find the first raw moment of the uniform probability distribution by finding the first derivative with respect to t of its moment-generating function. Using the sum and product rules, we get

$$\begin{aligned}
M'(t)=\frac{d}{dt}M(t)&=\frac{1}{b-a}\left(\frac{d}{dt}\frac{e^{bt}}{t}-\frac{d}{dt}\frac{e^{at}}{t}\right)\\
&=\frac{1}{b-a}\left(\frac{be^{bt}}{t}-\frac{e^{bt}}{t^2}-\frac{ae^{at}}{t}+\frac{e^{at}}{t^2}\right)\\
&=\frac{1}{b-a}\left(\frac{be^{bt}-ae^{at}}{t}+\frac{e^{at}-e^{bt}}{t^2}\right).
\end{aligned}$$

The resulting function is indeterminate at $t=0$, requiring us to rewrite it to produce an expression of the form $\frac{0}{0}$, allowing us to apply l’Hôpital’s rule as follows:

$$\begin{aligned}
\left.\frac{d}{dt}M(t)\right|_{t=0}&=\lim_{t\to 0}\frac{1}{b-a}\left(\frac{bte^{bt}-ate^{at}+e^{at}-e^{bt}}{t^2}\right)\\
&=\frac{1}{b-a}\lim_{t\to 0}\frac{b^2te^{bt}+be^{bt}-a^2te^{at}-ae^{at}+ae^{at}-be^{bt}}{2t}\\
&=\frac{1}{b-a}\lim_{t\to 0}\frac{b^2te^{bt}-a^2te^{at}}{2t}\\
&=\frac{1}{b-a}\lim_{t\to 0}\frac{b^2e^{bt}-a^2e^{at}}{2}\\
&=\frac{1}{b-a}\cdot\frac{b^2-a^2}{2}=\frac{b+a}{2}.
\end{aligned}$$

Swapping the terms from the last result, the first raw moment of the uniform probability distribution is

$$M'(0)=\mu'_1=\frac{a+b}{2}. \quad (16)$$

We can verify the first raw moment is correct by using Equation 2 to find the mean,

$$\begin{aligned}
\mu&=\int_{-\infty}^{\infty}xP(x)\,dx\\
&=\frac{1}{b-a}\int_a^b x\,dx\\
&=\frac{1}{b-a}\left.\frac{x^2}{2}\right|_a^b\\
&=\frac{1}{b-a}\cdot\frac{b^2-a^2}{2}=\frac{a+b}{2}.
\end{aligned}$$

As a final exercise, let’s derive the second raw moment of the uniform probability distribution. Differentiating the first derivative from before, we find the second derivative is

$$M''(t)=\frac{1}{b-a}\left(\frac{b^2e^{bt}-a^2e^{at}}{t}-\frac{be^{bt}-ae^{at}}{t^2}+\frac{ae^{at}-be^{bt}}{t^2}-\frac{2\left(e^{at}-e^{bt}\right)}{t^3}\right).$$

Again, we have a function that cannot be readily evaluated at t=0. As before, we restructure it to allow the application of l’Hôpital’s rule, giving us

$$\begin{aligned}
\left.\frac{d^2}{dt^2}M(t)\right|_{t=0}&=\frac{1}{b-a}\lim_{t\to 0}\left(\frac{t^2(b^2e^{bt}-a^2e^{at})}{t^3}-\frac{t(be^{bt}-ae^{at})}{t^3}+\frac{t(ae^{at}-be^{bt})}{t^3}-\frac{2(e^{at}-e^{bt})}{t^3}\right)\\
&=\frac{1}{b-a}\lim_{t\to 0}\frac{t^2(b^2e^{bt}-a^2e^{at})+2t(ae^{at}-be^{bt})+2(e^{bt}-e^{at})}{t^3}\\
&=\frac{1}{b-a}\lim_{t\to 0}\left(\frac{t^2(b^3e^{bt}-a^3e^{at})}{3t^2}+\frac{2t(b^2e^{bt}-a^2e^{at})+2t(a^2e^{at}-b^2e^{bt})}{3t^2}+\frac{2(ae^{at}-be^{bt})+2(be^{bt}-ae^{at})}{3t^2}\right)\\
&=\frac{1}{b-a}\lim_{t\to 0}\frac{b^3e^{bt}-a^3e^{at}}{3}\\
&=\frac{b^3-a^3}{3(b-a)}=\frac{a^2+ab+b^2}{3}.
\end{aligned}$$

Using Equation 9, we can formulate the variance of the uniform probability distribution as

$$\begin{aligned}
\sigma^2&=\frac{a^2+ab+b^2}{3}-\left(\frac{a+b}{2}\right)^2\\
&=\frac{4a^2+4ab+4b^2-3a^2-6ab-3b^2}{12}\\
&=\frac{a^2-2ab+b^2}{12}\\
&=\frac{(b-a)^2}{12}. \quad (17)
\end{aligned}$$
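The whole derivation can be checked symbolically. The sketch below, assuming SymPy is available, builds $M(t)$ from Equation 10 and recovers Equations 16 and 17; declaring $t$ positive sidesteps the $t=0$ case that we handled above with l’Hôpital’s rule:

```python
import sympy as sp

a, b = sp.symbols('a b', positive=True)
t = sp.symbols('t', positive=True)  # t > 0 avoids the t = 0 special case
x = sp.symbols('x', real=True)

# M(t) for the uniform density 1/(b - a) on [a, b], per Equation 10.
M = sp.integrate(sp.exp(t * x) / (b - a), (x, a, b))

# The l'Hopital arguments above become limits as t -> 0.
mu1 = sp.limit(sp.diff(M, t), t, 0)      # first raw moment
mu2 = sp.limit(sp.diff(M, t, 2), t, 0)   # second raw moment

print(sp.simplify(mu1))           # (a + b)/2, Equation 16
print(sp.simplify(mu2))           # (a**2 + a*b + b**2)/3
print(sp.factor(mu2 - mu1 ** 2))  # (a - b)**2/12, i.e. Equation 17
```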

2. Gaussian Functions

Mathematics literature uses the term Gaussian function either narrowly—using it exclusively to refer to the Gaussian probability density function—or broadly—using it to refer to a class of exponential functions. We will use the term in its broad sense, starting with the simplest instance, $f(x)=e^{-x^2}$, before examining more complex forms. Unfortunately, the author has failed to ascertain the historical origin of Gaussian functions and how their earliest applications eventually led to the identification of the Gaussian probability density function. Therefore, we will present a pedagogical analysis of how starting from $f(x)=e^{-x^2}$ eventually leads to the Gaussian probability density function.

Generally speaking, a Gaussian function is a function whose natural logarithm is a concave quadratic function. A quadratic function is concave if its second derivative is negative. In essence, it is a downward-opening parabola, as in Figure 1.

[Figure: plot of the concave quadratic $f(x)=-x^2+x+1$.]

Figure 1. A concave quadratic function.

Therefore, we can define a Gaussian function as having the form $f(x)=ae^{-bx^2+cx+d}$, where $b$ is positive in order to ensure the second derivative of the quadratic function is negative. To simplify our discussion, we restrict our scrutiny to quadratic exponents of the form $-b(x-c)^2$, giving $f(x)=ae^{-b(x-c)^2}$ as our model Gaussian function. The simplest Gaussian function we can construct sets $a=1$, $b=1$, and $c=0$, leaving us with $f(x)=e^{-x^2}$. We plot this function in Figure 2 and show how it varies by changing the function parameters.

[Figure: plots of $e^{-x^2}$, $e^{-(x-2)^2}$, $e^{-x^2/4}$, $e^{-x^2+1}$, and $2e^{-x^2}$.]

Figure 2. The Gaussian function, $f(x)=e^{-x^2}$, can be stretched, widened or narrowed, and shifted by changing $a$, $b$, and $c$ in $f(x)=ae^{-b(x-c)^2}$.

Starting with $f(x)=e^{-x^2}$, we can visually note it is an even function, meaning that $f(x)=f(-x)$. This property means the function is symmetric about the y-axis. More generally, all Gaussian functions are symmetric about their midpoints, with the area under the curve to the left of the midpoint being equal to the area under the curve to the right of the midpoint. We will exploit this property in Section 3 to compute half of the Gaussian integral.

The midpoint of the function can be shifted by changing the value of $c$, as done in the plot of $f(x)=e^{-(x-2)^2}$ in Figure 2. Notice that the midpoint is shifted by an amount equal to $c$, shifting in the positive $x$ direction when $c$ is positive and in the negative $x$ direction when $c$ is negative. Changing the value of $b$ expands or contracts the width of the curve. A value greater than one narrows the curve and a value less than one and greater than zero widens the curve, as in the plot of $f(x)=e^{-x^2/4}$. Notice how the base of the curve expands by a factor of $\frac{1}{\sqrt{b}}$. The function can be stretched upward or downward by multiplying it by a constant, as in the plot of $f(x)=2e^{-x^2}$, where we see that the maximum of the function changes by a factor of $a$. The same effect can be achieved by adding a constant $k$ to the quadratic term, as in the plot of $f(x)=e^{-x^2+1}$, where $k=1$. But that is equivalent to multiplication by $e^k$, which is one reason we chose not to use a general quadratic form for the exponent (another is that it simplifies integration by substitution, as we will see in Section 4). It should be clear that $e\cdot e^{-x^2}=e^{-x^2+1}$ and $2e^{-x^2}=e^{\ln 2}e^{-x^2}=e^{-x^2+\ln 2}$, making an additive term redundant when parameterizing the function and studying the effects of changing parameter values. In Section 4, we will see that parameters $a$, $b$, and $c$ possess special meanings with respect to the Gaussian probability distribution.
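A few quick evaluations make these effects concrete. In the Python sketch below (the function gaussian and its defaults are our own), we confirm the shift by $c$, the vertical stretch by $a$, and the equivalence of an additive exponent term with multiplication by a constant:

```python
import math

def gaussian(x, a=1.0, b=1.0, c=0.0):
    """Model Gaussian f(x) = a * exp(-b * (x - c)**2), with b > 0."""
    return a * math.exp(-b * (x - c) ** 2)

print(gaussian(2.0, c=2.0))   # 1.0: the peak of e^{-(x-2)^2} sits at x = 2
print(gaussian(0.0, a=2.0))   # 2.0: the maximum scales by a factor of a

# e * e^{-x^2} equals e^{-x^2 + 1}, so an additive term is redundant.
print(math.e * gaussian(1.0), math.exp(-(1.0 ** 2) + 1))  # both 1.0
```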

3. The Gaussian Integral

Recognizing that $f(x)=e^{-x^2}$ ranges in value from 0 to 1, suppose we wanted to use a Gaussian function as a probability density function. Our first step would be to ensure that the integral over $(-\infty,\infty)$ is equal to 1. We would have to integrate the function to determine the area, $A$, under the curve. If $A\ne 1$, we would normalize the function to $\frac{1}{A}f(x)$ so that its integral over $(-\infty,\infty)$ equaled 1. We will, in fact, do all of this while deriving the Gaussian probability density function. However, embarking on this journey requires evaluating the Gaussian integral, for which we must first take a brief detour.

Solving problems in mathematics often requires what could be called tricks, like when we restructured the derivatives in Section 1.3 so we could apply l’Hôpital’s rule. The Gaussian integral can be solved in various ways, all of which require some trickery. We will be using two tricks, the first of which is based on the seemingly trivial identity $x=\sqrt{x^2}$ for nonnegative $x$, allowing us to observe that

$$\int_a^b f(x)\,dx=\left(\int_a^b f(x)\,dx\int_a^b f(x)\,dx\right)^{\frac{1}{2}} \quad (18)$$

implies

$$F(b)-F(a)=\left(\left(F(b)-F(a)\right)^2\right)^{\frac{1}{2}}, \quad (19)$$

where $F$ is an antiderivative of $f$.

The second trick we will use is to convert from Cartesian to polar coordinates, for which the reader will have to refer to a calculus textbook if the process is unclear.

We wish to evaluate the improper integral

$$\int_{-\infty}^{\infty}e^{-x^2}\,dx$$

to see if the integrand can be adapted for use as a probability density function. We cannot evaluate the integral based on an existing indefinite integral because the integrand has no elementary indefinite integral. Instead, our evaluation will depend on the integral's improper nature, integrating from negative to positive infinity. First, we apply Equation 18 to rewrite the integral as

$$\int_{-\infty}^{\infty}e^{-x^2}\,dx=\left(\int_{-\infty}^{\infty}e^{-x^2}\,dx\int_{-\infty}^{\infty}e^{-x^2}\,dx\right)^{\frac{1}{2}}.$$

Next, we replace one of the x dummy variables with another dummy variable, y, allowing us to rewrite the integral as the following double integral:

(-e-x2dx-e-x2dx)12 =(-e-x2dx-e-y2dy)12;
=(--e-(x2+y2)dxdy)12.

For clarity, we’ll set aside the square root until the end and evaluate the double integral by converting to polar coordinates as follows:

$$\begin{aligned}
\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}e^{-(x^2+y^2)}\,dx\,dy&=\int_0^{2\pi}\int_0^{\infty}re^{-r^2}\,dr\,d\theta\\
&=\int_0^{2\pi}\left.-\frac{1}{2}e^{-r^2}\right|_0^{\infty}\,d\theta\\
&=\int_0^{2\pi}\frac{1}{2}\,d\theta\\
&=\left.\frac{1}{2}\theta\right|_0^{2\pi}\\
&=\pi.
\end{aligned}$$

Now we can apply the square root to arrive at our final result,

$$\int_{-\infty}^{\infty}e^{-x^2}\,dx=\sqrt{\pi}. \quad (20)$$

The various transformations we performed were possible because $f(x)=e^{-x^2}$ is a continuous function. As noted in Section 2, $f(x)=f(-x)$ holds true for this Gaussian function. That allows us to infer that

$$\int_0^{\infty}e^{-x^2}\,dx=\frac{\sqrt{\pi}}{2}. \quad (21)$$

More formally, if $f(x)$ is continuous on the interval $(-\infty,\infty)$ containing point $a$, then

$$\int_{-\infty}^{\infty}f(x)\,dx=\int_{-\infty}^{a}f(x)\,dx+\int_a^{\infty}f(x)\,dx.$$

This property of improper integrals allows us to write

$$\int_{-\infty}^{\infty}e^{-x^2}\,dx=\int_{-\infty}^{0}e^{-x^2}\,dx+\int_0^{\infty}e^{-x^2}\,dx.$$

Observing that $f(x)=f(-x)$ gives us

$$\begin{aligned}
\int_{-\infty}^{0}e^{-x^2}\,dx&=\int_0^{\infty}e^{-x^2}\,dx;\\
\int_{-\infty}^{\infty}e^{-x^2}\,dx&=2\int_0^{\infty}e^{-x^2}\,dx;\\
\int_0^{\infty}e^{-x^2}\,dx&=\frac{1}{2}\int_{-\infty}^{\infty}e^{-x^2}\,dx;\\
\int_0^{\infty}e^{-x^2}\,dx&=\frac{\sqrt{\pi}}{2}.
\end{aligned}$$
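Although Equations 20 and 21 rest on the improper limits, we can corroborate them numerically. The sketch below truncates the domain to $[-10,10]$, where the tails of $e^{-x^2}$ are negligible (an assumption about acceptable error), and compares midpoint-rule sums against $\sqrt{\pi}$ and $\frac{\sqrt{\pi}}{2}$:

```python
import math

def integrate(f, lo, hi, n=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

# The tails of exp(-x**2) decay so fast that [-10, 10] captures
# essentially all of the area over (-inf, inf).
f = lambda x: math.exp(-x * x)
print(integrate(f, -10.0, 10.0), math.sqrt(math.pi))      # both ~1.7724539
print(integrate(f, 0.0, 10.0), math.sqrt(math.pi) / 2.0)  # both ~0.8862269
```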

4. The Gaussian Probability Density Function

Having evaluated the most basic Gaussian integral as $\sqrt{\pi}$, we can start to build a probability density function derived from the simplest Gaussian function, $f(x)=e^{-x^2}$, normalizing it to $f(x)=\frac{1}{\sqrt{\pi}}e^{-x^2}$ so that its integral over $(-\infty,\infty)$ evaluates to 1. To investigate whether this is a useful construction, we can derive its moment-generating function and first two raw moments. Using Equation 10, we have

$$\begin{aligned}
M(t)&=\frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty}e^{tx}e^{-x^2}\,dx\\
&=\frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty}e^{-(x^2-tx)}\,dx\\
&=\frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty}e^{-(x^2-tx)}\,dx\,\frac{e^{-t^2/4}}{e^{-t^2/4}}\\
&=\frac{e^{t^2/4}}{\sqrt{\pi}}\int_{-\infty}^{\infty}e^{-(x^2-tx+t^2/4)}\,dx\\
&=\frac{e^{t^2/4}}{\sqrt{\pi}}\int_{-\infty}^{\infty}e^{-(x-t/2)^2}\,dx.
\end{aligned}$$

Using $u=x-\frac{t}{2}$ and $du=dx$, we find

$$\begin{aligned}
M(t)&=\frac{e^{t^2/4}}{\sqrt{\pi}}\int_{-\infty}^{\infty}e^{-u^2}\,du\\
&=\frac{e^{t^2/4}}{\sqrt{\pi}}\sqrt{\pi}\\
&=e^{t^2/4}.
\end{aligned}$$

Note that we used Equation 20 to evaluate $\int_{-\infty}^{\infty}e^{-(x-t/2)^2}\,dx$ via substitution.

We now use Equation 12 to find the first two raw moments. The first raw moment is

$$\left.\frac{d}{dt}M(t)\right|_{t=0}=\mu'_1=\lim_{t\to 0}\frac{d}{dt}e^{t^2/4}=\lim_{t\to 0}\frac{t}{2}e^{t^2/4}=0.$$

A mean of 0 is exactly what we would expect for a probability density function centered about the origin. The second raw moment is

$$\left.\frac{d^2}{dt^2}M(t)\right|_{t=0}=\mu'_2=\lim_{t\to 0}\frac{d^2}{dt^2}e^{t^2/4}=\lim_{t\to 0}\left(\frac{1}{2}e^{t^2/4}+\frac{t^2}{4}e^{t^2/4}\right)=\frac{1}{2},$$

giving us a variance of

$$\sigma^2=\mu'_2-\left(\mu'_1\right)^2=\frac{1}{2}-0=\frac{1}{2}$$

after using Equation 9.
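For readers who want a machine check of this derivation, a symbolic computation (assuming SymPy is available) reproduces the moment-generating function and both raw moments:

```python
import sympy as sp

t = sp.symbols('t', real=True)
x = sp.symbols('x', real=True)

# MGF of the normalized density exp(-x**2)/sqrt(pi), per Equation 10.
M = sp.integrate(sp.exp(t * x) * sp.exp(-x ** 2) / sp.sqrt(sp.pi),
                 (x, -sp.oo, sp.oo))

print(sp.simplify(M))               # exp(t**2/4)
print(sp.diff(M, t).subs(t, 0))     # 0, the first raw moment
print(sp.diff(M, t, 2).subs(t, 0))  # 1/2, the second raw moment
```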

The values of the raw moments we found are constants that don’t really help us toward formulating a general Gaussian probability density function. But they do give us clues about how such a function might be formulated. In Section 2, we saw how the center of a Gaussian function could be shifted by changing the value of the parameter $c$ in $f(x)=ae^{-b(x-c)^2}$. We now have a clue that $c$ is the mean of the distribution because it lies at the midpoint about which the function has symmetric areas. The parameter $b$ may be related in some way to the variance, given how it affects the width of the curve. The parameter $a$ must be a normalizing constant. To test our theory, we can hold these parameters constant, with $b$ being positive, and try to find a moment-generating function parameterized in terms of $a$, $b$, and $c$.

We already know that $c$ does not affect the area under the curve. Therefore we will ignore it and evaluate the integral of $f(x)=ae^{-bx^2}$ over $(-\infty,\infty)$ to find a value of $a$, dependent on $b$, that normalizes the function. That will enable us to find a moment-generating function for the function including $c$ using substitution as we did earlier. Using our earlier work, we can readily evaluate the integral as follows:

$$\int_{-\infty}^{\infty}ae^{-bx^2}\,dx=a\int_{-\infty}^{\infty}e^{-u^2}\frac{du}{\sqrt{b}},$$

where $u=\sqrt{b}\,x$ and $du=\sqrt{b}\,dx$. Substituting Equation 20, we get

$$\frac{a}{\sqrt{b}}\int_{-\infty}^{\infty}e^{-u^2}\,du=\frac{a}{\sqrt{b}}\sqrt{\pi}=a\sqrt{\frac{\pi}{b}}.$$

Solving for $a$ in

$$a\sqrt{\frac{\pi}{b}}=1$$

to normalize the function leaves us with

$$a=\sqrt{\frac{b}{\pi}}. \quad (22)$$

Now we can use $f(x)=\sqrt{\frac{b}{\pi}}\,e^{-b(x-c)^2}$ as a probability density function and find its moment-generating function. Starting with

$$\begin{aligned}
M(t)&=\int_{-\infty}^{\infty}e^{tx}\sqrt{\frac{b}{\pi}}\,e^{-b(x-c)^2}\,dx\\
&=\sqrt{\frac{b}{\pi}}\int_{-\infty}^{\infty}e^{tx}e^{-b(x-c)^2}\,dx\\
&=\sqrt{\frac{b}{\pi}}\int_{-\infty}^{\infty}e^{-(bx^2-(2bc+t)x+bc^2)}\,dx\\
&=\sqrt{\frac{b}{\pi}}\int_{-\infty}^{\infty}e^{-(bx^2-(2bc+t)x+bc^2)}\,dx\,\frac{e^{-(ct+t^2/4b)}}{e^{-(ct+t^2/4b)}}\\
&=e^{ct+t^2/4b}\sqrt{\frac{b}{\pi}}\int_{-\infty}^{\infty}e^{-(bx^2-(2bc+t)x+bc^2+ct+t^2/4b)}\,dx\\
&=e^{ct+t^2/4b}\sqrt{\frac{b}{\pi}}\int_{-\infty}^{\infty}e^{-\left(bx^2-(2bc+t)x+\left(\sqrt{b}c+\frac{t}{2\sqrt{b}}\right)^2\right)}\,dx\\
&=e^{ct+t^2/4b}\sqrt{\frac{b}{\pi}}\int_{-\infty}^{\infty}e^{-\left(\sqrt{b}x-\left(\sqrt{b}c+\frac{t}{2\sqrt{b}}\right)\right)^2}\,dx,
\end{aligned}$$

we can substitute $u=\sqrt{b}x-\left(\sqrt{b}c+\frac{t}{2\sqrt{b}}\right)$ and $du=\sqrt{b}\,dx$ to get

$$\begin{aligned}
M(t)&=e^{ct+t^2/4b}\sqrt{\frac{b}{\pi}}\int_{-\infty}^{\infty}e^{-u^2}\frac{du}{\sqrt{b}}\\
&=e^{ct+t^2/4b}\sqrt{\frac{b}{\pi}}\sqrt{\frac{\pi}{b}}=e^{ct+t^2/4b}.
\end{aligned}$$

The first moment is

$$\begin{aligned}
\left.\frac{d}{dt}M(t)\right|_{t=0}=\mu&=\lim_{t\to 0}\frac{d}{dt}e^{ct+t^2/4b}\\
&=\lim_{t\to 0}\left(\left(c+\frac{t}{2b}\right)e^{ct+t^2/4b}\right)\\
&=c.
\end{aligned}$$

As we suspected, c is equal to the mean, μ. The second moment is

$$\begin{aligned}
\left.\frac{d^2}{dt^2}M(t)\right|_{t=0}&=\lim_{t\to 0}\frac{d^2}{dt^2}e^{ct+t^2/4b}\\
&=\lim_{t\to 0}\frac{d}{dt}\left(\left(c+\frac{t}{2b}\right)e^{ct+t^2/4b}\right)\\
&=\lim_{t\to 0}\frac{d}{dt}\left(ce^{ct+t^2/4b}\right)+\lim_{t\to 0}\frac{d}{dt}\left(\frac{t}{2b}e^{ct+t^2/4b}\right)\\
&=\lim_{t\to 0}\left(c\left(c+\frac{t}{2b}\right)e^{ct+t^2/4b}\right)+\lim_{t\to 0}\left(\frac{e^{ct+t^2/4b}}{2b}+\frac{t}{2b}\left(c+\frac{t}{2b}\right)e^{ct+t^2/4b}\right)\\
&=c^2+\frac{1}{2b}.
\end{aligned}$$

We can now find the variance using Equation 9 as

$$\begin{aligned}
\sigma^2&=\mu'_2-\mu^2=\mu'_2-c^2\\
&=c^2+\frac{1}{2b}-c^2\\
&=\frac{1}{2b}.
\end{aligned}$$

Solving for b in terms of σ produces

$$b=\frac{1}{2\sigma^2}.$$

Substituting this value into Equation 22 gives us

$$a=\frac{1}{\sigma\sqrt{2\pi}}.$$

Replacing $a$, $b$, and $c$ in $f(x)=ae^{-b(x-c)^2}$ with their parameterized values produces the probability density function for what is known as the Gaussian or normal distribution,

$$f(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}. \quad (23)$$

Notice how Equation 23 depends on the mean, μ, and the standard deviation, σ. As mentioned previously, changing μ just moves the center of the distribution left or right. But σ controls both the width and the height of the distribution. When the distribution widens, it gets shorter. When the distribution narrows, it gets taller. The normalizing factor $\frac{1}{\sigma\sqrt{2\pi}}$ constrains the area under the curve to remain constant, independent of the value of σ. Therefore, the Gaussian probability density function represents a family of functions of unit area.
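A direct implementation of Equation 23 makes this constant-area behavior easy to observe. In the sketch below (the function names are ours), numerically integrating the density for two parameter choices yields an area of 1 in both cases, while the peak height shrinks as σ grows:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """The Gaussian probability density function of Equation 23."""
    z = (x - mu) / sigma
    return math.exp(-z * z / 2.0) / (sigma * math.sqrt(2.0 * math.pi))

def area(mu, sigma, n=100_000):
    """Midpoint-rule area under the density, truncated to +/- 12 sigma."""
    lo, hi = mu - 12.0 * sigma, mu + 12.0 * sigma
    h = (hi - lo) / n
    return h * sum(normal_pdf(lo + (i + 0.5) * h, mu, sigma) for i in range(n))

print(area(0.0, 1.0), area(5.0, 3.0))               # both ~1.0
print(normal_pdf(0.0), normal_pdf(0.0, sigma=2.0))  # peak halves as sigma doubles
```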

When μ=0 and σ=1, the distribution corresponding to the probability density function is called the standard normal distribution,

$$f(x)=\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}. \quad (24)$$

The standard normal distribution’s probability density function serves primarily as a reference for numerical integration: the probability within n standard deviations of the mean is the same for every normal distribution, regardless of the value of σ. Therefore, calculating probabilities for the standard normal distribution allows you to determine them for any normal distribution, as long as you express your ranges in terms of standard deviations from the mean instead of absolute numbers.
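For instance, the probability of falling within $n$ standard deviations of the mean can be computed once from the standard normal and reused everywhere, via the identity $\int_{-n}^{n}\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx=\operatorname{erf}\left(\frac{n}{\sqrt{2}}\right)$. A minimal sketch:

```python
import math

def prob_within(n_sigmas):
    """P(mu - n*sigma <= X <= mu + n*sigma) for *any* normal distribution,
    computed from the standard normal via the error function."""
    return math.erf(n_sigmas / math.sqrt(2.0))

for n in (1, 2, 3):
    print(n, prob_within(n))  # ~0.6827, ~0.9545, ~0.9973
```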

5. Commentary

The Gaussian probability density function is usually presented as a formula to be used, but not necessarily understood. Although we attempted to show a step-by-step process by which one can get from $f(x)=e^{-x^2}$ to Equation 23, we did not explain the origin of $f(x)=e^{-x^2}$. Also, we cheated and chose $-b(x-c)^2$ as an exponent instead of $-bx^2+cx+d$ with little explanation. At this point, it should be apparent that we needed the exponent to be in the form of a square paralleling $-x^2$ so that we could integrate by substitution.

Ultimately, our goal was to show that the Gaussian probability density function did not sprout out of thin air fully formed. Instead, it evolved from a series of observations about the family of functions derived from f(x)=e-x2. The Gaussian integral formed the core of our exercise because we only needed to evaluate it once and were subsequently able to use it multiple times via substitution, demonstrating its utility.

The Gaussian probability distribution occurs frequently in many contexts as a result of the central limit theorem.

The Central Limit Theorem.

Given a random sample $X_1,X_2,\ldots,X_N$ of independent and identically distributed random variables from a distribution with finite mean $\mu$ and finite variance $\sigma^2$, let

$$D_N=\frac{\sum_{n=1}^{N}X_n-N\mu}{\sigma\sqrt{N}}=\frac{\sqrt{N}}{\sigma}\left(\bar{X}-\mu\right)\quad\text{where}\quad\bar{X}=\frac{1}{N}\sum_{n=1}^{N}X_n.$$
Then
$$\lim_{N\to\infty}P(D_N\le d)=\int_{-\infty}^{d}\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}\,dx\quad\text{for all }d.$$

In other words, the distribution function of $D_N$ converges to the standard normal distribution from Equation 24 as $N\to\infty$.

For large $N$—usually greater than 30—one can pretend that $\bar{X}$ is distributed according to the normal distribution when calculating probabilities such as $P(a\le\bar{X}\le b)$. This should not be done recklessly, but it does work in many situations. We will not prove the central limit theorem, but familiarity with it goes a long way toward understanding why it is so common for textbooks to assume a Gaussian distribution in various contexts.
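A quick Monte Carlo experiment shows the theorem in action (the sample size and trial count below are arbitrary choices of ours): standardized means of uniform samples land within one standard normal deviation of zero about 68% of the time:

```python
import math
import random

# Uniform [0, 1) has mean 1/2 and variance 1/12 (Equations 16 and 17).
mu, sigma = 0.5, math.sqrt(1.0 / 12.0)
N, trials = 50, 20_000

hits = 0
for _ in range(trials):
    xbar = sum(random.random() for _ in range(N)) / N
    d = math.sqrt(N) / sigma * (xbar - mu)  # D_N from the theorem
    if abs(d) <= 1.0:
        hits += 1

print(hits / trials)  # ~0.683, matching P(-1 <= Z <= 1) for the standard normal
```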

As a final comment, we explored Gaussian functions and the Gaussian probability density function in one dimension. The functions can be extended to multiple dimensions, where analysis becomes considerably more complicated than what we have explored.

6. Addendum

For completeness, we have decided to demonstrate how to evaluate the integral of $f(x)=ae^{-bx^2+cx+d}$ over the real numbers and summarize the integrals we covered in this paper for easy reference.

6.1. General Gaussian Integral

After evaluating the integral of the exponential function of a general concave quadratic function, it should be possible to evaluate the integral of any Gaussian function by simple parameter substitution. As before, the integral will depend on already having arrived at the result of Equation 20 and completing the square to allow integration by substitution as follows:

$$\begin{aligned}
\int_{-\infty}^{\infty}ae^{-bx^2+cx+d}\,dx&=a\int_{-\infty}^{\infty}e^{-(bx^2-cx-d)}\,dx\\
&=a\int_{-\infty}^{\infty}e^{-(bx^2-cx-d)}\,dx\,\frac{e^{c^2/4b}}{e^{c^2/4b}}\\
&=a\int_{-\infty}^{\infty}e^{-(bx^2-cx+c^2/4b)}\,dx\,e^{\frac{c^2}{4b}+d}\\
&=ae^{\frac{c^2}{4b}+d}\int_{-\infty}^{\infty}e^{-\left(\sqrt{b}x-\frac{c}{2\sqrt{b}}\right)^2}\,dx;
\end{aligned}$$

substituting $u=\sqrt{b}x-\frac{c}{2\sqrt{b}}$ and $du=\sqrt{b}\,dx$, we get

$$\begin{aligned}
&=ae^{\frac{c^2}{4b}+d}\int_{-\infty}^{\infty}e^{-u^2}\frac{du}{\sqrt{b}}\\
&=a\sqrt{\frac{\pi}{b}}\,e^{\frac{c^2}{4b}+d}.
\end{aligned}$$

6.2. Summary of Gaussian Integrals

We now summarize the results of evaluating the Gaussian integrals in this paper:

$$\begin{aligned}
\int_{-\infty}^{\infty}e^{-x^2}\,dx&=\sqrt{\pi};\\
\int_{0}^{\infty}e^{-x^2}\,dx&=\frac{\sqrt{\pi}}{2};\\
\int_{-\infty}^{\infty}ae^{-bx^2}\,dx&=a\sqrt{\frac{\pi}{b}};\\
\int_{-\infty}^{\infty}ae^{-b(x-c)^2}\,dx&=a\sqrt{\frac{\pi}{b}};\\
\int_{-\infty}^{\infty}ae^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx&=a\sigma\sqrt{2\pi};
\end{aligned}$$

and

$$\int_{-\infty}^{\infty}ae^{-bx^2+cx+d}\,dx=a\sqrt{\frac{\pi}{b}}\,e^{\frac{c^2}{4b}+d}.$$

All you really need is the final identity, from which all the others can be derived by substituting the appropriate values for the equation’s parameters. Note, however, that we needed to evaluate the first integral in order to evaluate all of the others.
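As a final check, the identity can be corroborated numerically; the parameter values in this sketch are chosen only for illustration:

```python
import math

a, b, c, d = 1.5, 2.0, 0.7, -0.3  # arbitrary, with b > 0

def integrand(x):
    return a * math.exp(-b * x ** 2 + c * x + d)

# Midpoint rule over [-15, 15], wide enough that the tails are negligible.
n, lo, hi = 200_000, -15.0, 15.0
h = (hi - lo) / n
numeric = h * sum(integrand(lo + (i + 0.5) * h) for i in range(n))
closed = a * math.sqrt(math.pi / b) * math.exp(c ** 2 / (4.0 * b) + d)

print(numeric, closed)  # agree to many decimal places
```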