Class GaussianDistribution
java.lang.Object
  smile.stat.distribution.AbstractDistribution
    smile.stat.distribution.GaussianDistribution

All Implemented Interfaces:
java.io.Serializable, Distribution, ExponentialFamily
public class GaussianDistribution extends AbstractDistribution implements ExponentialFamily
The normal distribution or Gaussian distribution is a continuous probability distribution that describes data that clusters around a mean. The graph of the associated probability density function is bell-shaped, with a peak at the mean, and is known as the Gaussian function or bell curve. It is often used as a first approximation for variables that tend to cluster around the mean.
The family of normal distributions is closed under linear transformations. That is, if X is normally distributed, then a linear transform aX + b (for some real numbers a ≠ 0 and b) is also normally distributed. If X1 and X2 are two independent normal random variables, then any linear combination of them is also normally distributed. The converse is also true: if X1 and X2 are independent and their sum X1 + X2 is normally distributed, then both X1 and X2 must also be normal, which is known as Cramér's theorem. Of all probability distributions over the reals with mean μ and variance σ², the normal distribution N(μ, σ²) is the one with the maximum entropy.
The central limit theorem states that under certain, fairly common conditions, the sum of a large number of random variables will have an approximately normal distribution. For example, if X1, …, Xn is a sequence of iid random variables, each having mean μ and variance σ² but otherwise arbitrary distributions, then the central limit theorem states that
√n ((1/n) Σ Xi − μ) → N(0, σ²) in distribution as n → ∞.
The theorem will hold even if the summands Xi are not iid, although some constraints on the degree of dependence and the growth rate of moments still have to be imposed.
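The iid case of the theorem can be checked numerically. The following is a minimal sketch, not part of the Smile API: the class name CltSketch and its methods are hypothetical, and Uniform(0, 1) summands are an arbitrary choice made for illustration. It standardizes the sample mean of n uniforms and reports the empirical mean and variance, which should be close to 0 and 1.

```java
// Illustrative sketch only; CltSketch is a hypothetical name, not a Smile class.
final class CltSketch {
    /**
     * Draws `samples` values of sqrt(n) * (mean(X) - mu) / sigma, where
     * X1..Xn are iid Uniform(0, 1). The CLT says these are roughly N(0, 1).
     */
    static double[] standardizedMeans(int n, int samples, long seed) {
        java.util.Random rng = new java.util.Random(seed);
        double mu = 0.5;                       // mean of Uniform(0, 1)
        double sigma = Math.sqrt(1.0 / 12.0);  // standard deviation of Uniform(0, 1)
        double[] z = new double[samples];
        for (int s = 0; s < samples; s++) {
            double sum = 0.0;
            for (int i = 0; i < n; i++) {
                sum += rng.nextDouble();
            }
            z[s] = Math.sqrt(n) * (sum / n - mu) / sigma;
        }
        return z;
    }

    /** Arithmetic mean of the values. */
    static double mean(double[] v) {
        double sum = 0.0;
        for (double x : v) sum += x;
        return sum / v.length;
    }

    /** Variance of the values about their own mean (divides by n). */
    static double variance(double[] v) {
        double m = mean(v);
        double ss = 0.0;
        for (double x : v) ss += (x - m) * (x - m);
        return ss / v.length;
    }

    public static void main(String[] args) {
        double[] z = standardizedMeans(30, 20000, 42L);
        System.out.println("mean = " + mean(z) + ", variance = " + variance(z));
    }
}
```

Dividing by sigma rescales the limit N(0, σ²) stated above to N(0, 1), which makes the check independent of the summand distribution.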
Therefore, certain other distributions can be approximated by the normal distribution, for example:
- The binomial distribution B(n, p) is approximately normal N(np, np(1-p)) for large n and for p not too close to zero or one.
- The Poisson(λ) distribution is approximately normal N(λ, λ) for large values of λ.
- The chi-squared distribution χ²(k) is approximately normal N(k, 2k) for large k.
- The Student's t-distribution t(ν) is approximately normal N(0, 1) when ν is large.
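As a rough illustration of the first approximation in the list, the sketch below compares the exact binomial probability mass with the density of N(np, np(1-p)) at the same point. It is self-contained and does not use Smile; BinomialNormalSketch and its method names are hypothetical, and p is assumed to lie strictly between 0 and 1.

```java
// Illustrative sketch only; BinomialNormalSketch is a hypothetical name.
final class BinomialNormalSketch {
    /** Exact pmf P(K = k) for K ~ B(n, p), 0 < p < 1, computed in log space. */
    static double binomialPmf(int n, int k, double p) {
        double logC = 0.0; // log of the binomial coefficient C(n, k)
        for (int i = 1; i <= k; i++) {
            logC += Math.log((double) (n - k + i) / i);
        }
        return Math.exp(logC + k * Math.log(p) + (n - k) * Math.log1p(-p));
    }

    /** Density of N(mu, variance) at x. */
    static double normalPdf(double x, double mu, double variance) {
        double d = x - mu;
        return Math.exp(-d * d / (2.0 * variance)) / Math.sqrt(2.0 * Math.PI * variance);
    }

    public static void main(String[] args) {
        int n = 100;
        double p = 0.5;
        for (int k = 40; k <= 60; k += 5) {
            double exact = binomialPmf(n, k, p);
            double approx = normalPdf(k, n * p, n * p * (1 - p));
            System.out.println(k + ": exact = " + exact + ", normal = " + approx);
        }
    }
}
```

The agreement degrades as p approaches zero or one, which is the caveat stated in the list.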
- Author:
- Haifeng Li
- See Also:
- Serialized Form
-
-
Constructor Summary
Constructors
Constructor / Description
- GaussianDistribution(double[] data): Constructor.
- GaussianDistribution(double mu, double sigma): Constructor.
-
Method Summary
Modifier and Type / Method / Description
- double cdf(double x): Cumulative distribution function.
- double entropy(): Shannon entropy of the distribution.
- static GaussianDistribution getInstance()
- double logp(double x): The density at x in log scale, which may prevent underflow.
- Mixture.Component M(double[] x, double[] posteriori): The M step in the EM algorithm, which depends on the specific distribution.
- double mean(): The mean of distribution.
- int npara(): The number of parameters of the distribution.
- double p(double x): The probability density function for continuous distribution or probability mass function for discrete distribution at x.
- double quantile(double p): The quantile, the probability to the left of quantile(p) is p.
- double rand(): Uses the Box-Muller algorithm to transform Random.random()'s into Gaussian deviates.
- double randInverseCDF(): Uses Inverse CDF method to generate a Gaussian deviate.
- double sd(): The standard deviation of distribution.
- java.lang.String toString()
- double var(): The variance of distribution.
Methods inherited from class smile.stat.distribution.AbstractDistribution
inverseTransformSampling, likelihood, logLikelihood, quantile, quantile, rejection
Constructor Detail
-
GaussianDistribution
public GaussianDistribution(double mu, double sigma)
Constructor.
Parameters:
mu - mean.
sigma - standard deviation.
-
GaussianDistribution
public GaussianDistribution(double[] data)
Constructor. Mean and standard deviation will be estimated from the data by MLE.
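For reference, the textbook maximum likelihood estimates for a Gaussian are the sample mean and the square root of the mean squared deviation (dividing by n rather than n - 1). The sketch below computes them directly. It is an assumption-laden illustration: GaussianMleSketch is a hypothetical name, and whether this constructor divides by n or n - 1 internally is not stated here.

```java
// Illustrative sketch only; GaussianMleSketch is a hypothetical name.
final class GaussianMleSketch {
    /** Returns {mu, sigma} that maximize the Gaussian likelihood of the data. */
    static double[] fit(double[] data) {
        if (data.length == 0) {
            throw new IllegalArgumentException("data must not be empty");
        }
        double sum = 0.0;
        for (double x : data) sum += x;
        double mu = sum / data.length;

        double ss = 0.0; // sum of squared deviations from mu
        for (double x : data) ss += (x - mu) * (x - mu);

        // The MLE of the variance divides by n, not by n - 1.
        double sigma = Math.sqrt(ss / data.length);
        return new double[] {mu, sigma};
    }
}
```

With Smile on the classpath, the corresponding call would be new GaussianDistribution(data), followed by mean() and sd() to read back the estimates.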
-
-
Method Detail
-
getInstance
public static GaussianDistribution getInstance()
-
npara
public int npara()
Description copied from interface: Distribution
The number of parameters of the distribution.
Specified by:
npara in interface Distribution
-
mean
public double mean()
Description copied from interface: Distribution
The mean of distribution.
Specified by:
mean in interface Distribution
-
var
public double var()
Description copied from interface: Distribution
The variance of distribution.
Specified by:
var in interface Distribution
-
sd
public double sd()
Description copied from interface: Distribution
The standard deviation of distribution.
Specified by:
sd in interface Distribution
-
entropy
public double entropy()
Description copied from interface: Distribution
Shannon entropy of the distribution.
Specified by:
entropy in interface Distribution
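The differential entropy of N(μ, σ²) has the closed form ½ ln(2πe σ²), which depends only on σ. A minimal sketch of that formula follows. GaussianEntropySketch is a hypothetical name, and the result is in nats because the natural logarithm is used; which unit entropy() itself reports is not stated here.

```java
// Illustrative sketch only; GaussianEntropySketch is a hypothetical name.
final class GaussianEntropySketch {
    /** Differential entropy of N(mu, sigma^2) in nats; it does not depend on mu. */
    static double entropy(double sigma) {
        if (!(sigma > 0.0)) {
            throw new IllegalArgumentException("sigma must be positive");
        }
        return 0.5 * Math.log(2.0 * Math.PI * Math.E * sigma * sigma);
    }
}
```

Doubling σ adds exactly ln 2 to the entropy, since the σ² term enters through a logarithm.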
-
toString
public java.lang.String toString()
Overrides:
toString in class java.lang.Object
-
rand
public double rand()
Uses the Box-Muller algorithm to transform Random.random()'s into Gaussian deviates.
Specified by:
rand in interface Distribution
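The basic (trigonometric) form of the Box-Muller transform maps two independent uniforms to a standard normal deviate. The sketch below shows that form using java.util.Random. BoxMullerSketch is a hypothetical name, and the library's rand() may well use a different variant, such as the polar form, or cache the second deviate.

```java
// Illustrative sketch only; BoxMullerSketch is a hypothetical name.
final class BoxMullerSketch {
    /** Returns one N(mu, sigma^2) deviate built from two independent uniforms. */
    static double next(java.util.Random rng, double mu, double sigma) {
        double u1 = 1.0 - rng.nextDouble(); // in (0, 1], so log(u1) is finite
        double u2 = rng.nextDouble();       // in [0, 1)
        double radius = Math.sqrt(-2.0 * Math.log(u1));
        double z = radius * Math.cos(2.0 * Math.PI * u2); // standard normal deviate
        return mu + sigma * z;
    }
}
```

The sine counterpart, radius * sin(2πu2), gives a second independent deviate from the same pair of uniforms; it is discarded here to keep the sketch short.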
-
randInverseCDF
public double randInverseCDF()
Uses Inverse CDF method to generate a Gaussian deviate.
-
p
public double p(double x)
Description copied from interface: Distribution
The probability density function for continuous distribution or probability mass function for discrete distribution at x.
Specified by:
p in interface Distribution
-
logp
public double logp(double x)
Description copied from interface: Distribution
The density at x in log scale, which may prevent underflow.
Specified by:
logp in interface Distribution
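To see why the log scale matters, consider a point far in the tail: the density itself underflows to zero in double precision while its logarithm remains a perfectly ordinary number. The sketch below evaluates both for a Gaussian. LogDensitySketch is a hypothetical name, and the formula is the standard Gaussian log-density rather than code taken from the library.

```java
// Illustrative sketch only; LogDensitySketch is a hypothetical name.
final class LogDensitySketch {
    /** Log of the N(mu, sigma^2) density at x. */
    static double logp(double x, double mu, double sigma) {
        double z = (x - mu) / sigma;
        return -0.5 * z * z - Math.log(sigma) - 0.5 * Math.log(2.0 * Math.PI);
    }

    /** The density itself; underflows to 0.0 far enough into the tails. */
    static double p(double x, double mu, double sigma) {
        return Math.exp(logp(x, mu, sigma));
    }

    public static void main(String[] args) {
        // 40 standard deviations from the mean: the density is too small for a double.
        System.out.println("p    = " + p(40.0, 0.0, 1.0));
        System.out.println("logp = " + logp(40.0, 0.0, 1.0));
    }
}
```

Summing log-densities over many observations, as a log-likelihood does, avoids multiplying together values that would each round to zero.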
-
cdf
public double cdf(double x)
Description copied from interface: Distribution
Cumulative distribution function. That is, the probability to the left of x.
Specified by:
cdf in interface Distribution
-
quantile
public double quantile(double p)
The quantile: the probability to the left of quantile(p) is p. This is the inverse of cdf. The original algorithm and a Perl implementation can be found at: http://www.math.uio.no/~jacklam/notes/invnorm/index.html
Specified by:
quantile in interface Distribution
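The inverse relationship between cdf and quantile can be sketched without the rational approximation referenced above. The code below is a deliberately simple stand-in and not the library's algorithm: it approximates the error function with Abramowitz and Stegun formula 7.1.26 (absolute error about 1.5e-7) and inverts the resulting CDF by bisection. NormalCdfQuantileSketch and its method names are hypothetical.

```java
// Illustrative sketch only; NormalCdfQuantileSketch is a hypothetical name.
final class NormalCdfQuantileSketch {
    /** erf(x) via Abramowitz and Stegun 7.1.26 (absolute error about 1.5e-7). */
    static double erf(double x) {
        double ax = Math.abs(x);
        double t = 1.0 / (1.0 + 0.3275911 * ax);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))));
        double y = 1.0 - poly * Math.exp(-ax * ax);
        return x >= 0 ? y : -y; // erf is an odd function
    }

    /** Probability to the left of x under N(mu, sigma^2). */
    static double cdf(double x, double mu, double sigma) {
        return 0.5 * (1.0 + erf((x - mu) / (sigma * Math.sqrt(2.0))));
    }

    /** Inverts cdf by bisection on [mu - 10 sigma, mu + 10 sigma]; requires 0 < p < 1. */
    static double quantile(double p, double mu, double sigma) {
        if (!(p > 0.0 && p < 1.0)) {
            throw new IllegalArgumentException("p must be in (0, 1)");
        }
        double lo = mu - 10.0 * sigma;
        double hi = mu + 10.0 * sigma;
        for (int i = 0; i < 100; i++) {
            double mid = 0.5 * (lo + hi);
            if (cdf(mid, mu, sigma) < p) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        return 0.5 * (lo + hi);
    }
}
```

Bisection is slower and less accurate than a dedicated rational approximation, but it makes the definition explicit: quantile(p) is the point x at which cdf(x) reaches p. Feeding a uniform draw on (0, 1) into such a quantile function is the general idea behind inverse CDF sampling.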
-
M
public Mixture.Component M(double[] x, double[] posteriori)
Description copied from interface: ExponentialFamily
The M step in the EM algorithm, which depends on the specific distribution.
Specified by:
M in interface ExponentialFamily
Parameters:
x - the input data for estimation.
posteriori - the posteriori probability.
Returns:
the (unnormalized) weight of this distribution in the mixture.
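For a Gaussian component, the textbook M step is a posterior-weighted version of the usual mean and variance estimates, and the unnormalized weight is the sum of the posteriors. The sketch below illustrates that computation under stated assumptions: GaussianMStepSketch is a hypothetical name, and it returns a plain array instead of a Mixture.Component, whose construction is not shown in this page.

```java
// Illustrative sketch only; GaussianMStepSketch is a hypothetical name.
final class GaussianMStepSketch {
    /**
     * Returns {weight, mu, sigma}, where weight is the sum of the posteriors
     * (unnormalized) and mu, sigma are posterior-weighted estimates.
     */
    static double[] mStep(double[] x, double[] posteriori) {
        if (x.length != posteriori.length) {
            throw new IllegalArgumentException("x and posteriori must have the same length");
        }
        double weight = 0.0;
        double mu = 0.0;
        for (int i = 0; i < x.length; i++) {
            weight += posteriori[i];
            mu += posteriori[i] * x[i];
        }
        mu /= weight;

        double variance = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - mu;
            variance += posteriori[i] * d * d;
        }
        variance /= weight;

        return new double[] {weight, mu, Math.sqrt(variance)};
    }
}
```

Dividing each component's weight by the total over all components yields the mixing proportions, which is why the returned weight is described as unnormalized.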