Class GaussianKernelDistribution

  • All Implemented Interfaces:
    java.io.Serializable, org.apache.commons.math3.analysis.UnivariateFunction, org.apache.commons.math3.distribution.RealDistribution
    Direct Known Subclasses:
    GaussianReflectionKernelDistribution

    public class GaussianKernelDistribution
    extends AnotherAbstractRealDistribution
    Simple Gaussian kernel estimator. Adds a Gaussian kernel for each data point with the specified smoothing parameter #kernelBandwidth.

    A precision parameter controls the width of the intervals into which sample values are grouped: for a precision of 0.1, all sample values between -0.05 and 0.05 are treated as 0.0.
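
    The grouping described above can be sketched as rounding each value to the nearest multiple of the precision. This is only an illustration: the class and method names below (PrecisionBinning, binIndex, binCenter) are hypothetical and are not part of the documented API, and the actual implementation may round differently at interval boundaries.

```java
/** Hypothetical helper illustrating precision-based binning; not part of the documented class. */
final class PrecisionBinning {
    private PrecisionBinning() {}

    /** Index of the bin of width {@code precision} whose center lies nearest to {@code value}. */
    static long binIndex(double value, double precision) {
        // Math.round rounds half up, so [-0.05, 0.05) maps to bin 0 for precision 0.1.
        return Math.round(value / precision);
    }

    /** Center of a bin, i.e. the value that stands in for every sample grouped into it. */
    static double binCenter(long index, double precision) {
        return index * precision;
    }
}
```

    With a precision of 0.1, both 0.04 and -0.04 land in bin 0, while 0.06 lands in bin 1.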

    The bandwidth of the kernel is adjusted to the data distribution and works best for normally distributed data. It uses the formula proposed in:

    Scott, D. W. (1992) Multivariate Density Estimation: Theory, Practice, and Visualization. Wiley

    Author:
    Andreas Rogge-Solti
    See Also:
    Serialized Form
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected double h
      smoothing parameter
      protected java.util.Map<java.lang.Long,​java.lang.Double> kernelPointsAndWeights
      This map stores the number of occurrences of values in defined intervals.
      protected org.apache.commons.math3.distribution.NormalDistribution ndist  
      static int NUMBER_OF_BINS
      Grid over the data
      protected double precision
      The precision parameter determines the interval size for kernels for improved efficiency.
      protected java.util.List<java.lang.Double> sampleValues
      All observed values in a list (easier for sampling)
      protected static java.math.MathContext veryPrecise  
      • Fields inherited from class org.apache.commons.math3.distribution.AbstractRealDistribution

        random, randomData, SOLVER_DEFAULT_ABSOLUTE_ACCURACY
    • Constructor Summary

      Constructors 
      Constructor Description
      GaussianKernelDistribution()  
      GaussianKernelDistribution​(double precision)
      Creates a kernel distribution that groups kernels whose values fall within the range of the precision parameter into one "bin" with added weight. For example, precision 0.1 creates ten bins per unit, and precision 0.5 creates two bins.
    • Field Detail

      • precision

        protected double precision
        The precision parameter determines the interval size for kernels for improved efficiency. We do not store n kernels for n observations, but group kernels falling into a particular interval into one with the weight factor capturing the number of occurrences.

        Change: Make this dynamic depending on the range of values (make it

      • NUMBER_OF_BINS

        public static final int NUMBER_OF_BINS
        Grid over the data
        See Also:
        Constant Field Values
      • kernelPointsAndWeights

        protected java.util.Map<java.lang.Long,​java.lang.Double> kernelPointsAndWeights
        This map stores the number of occurrences of values in defined intervals. The interval size is regulated by the precision argument.
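
        As a rough illustration of such a map, the sketch below counts how many values fall into each interval. The class name KernelWeights and the method group are hypothetical, and the rounding rule is an assumption rather than the documented behavior.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of an occurrence map keyed by interval index; not the documented class. */
final class KernelWeights {
    private KernelWeights() {}

    /** Counts the values per interval of width {@code precision} (assumed nearest-multiple rounding). */
    static Map<Long, Double> group(double[] values, double precision) {
        Map<Long, Double> weights = new TreeMap<>();
        for (double v : values) {
            long bin = Math.round(v / precision);
            // Each observation adds 1.0 to the weight of the kernel shared by its interval.
            weights.merge(bin, 1.0, Double::sum);
        }
        return weights;
    }
}
```

        For the values 0.01, 0.04, -0.03 and 0.12 with precision 0.1, the map holds weight 3.0 for interval 0 and weight 1.0 for interval 1.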
      • sampleValues

        protected java.util.List<java.lang.Double> sampleValues
        All observed values in a list (easier for sampling)
      • veryPrecise

        protected static java.math.MathContext veryPrecise
      • h

        protected double h
        smoothing parameter
      • ndist

        protected org.apache.commons.math3.distribution.NormalDistribution ndist
    • Constructor Detail

      • GaussianKernelDistribution

        public GaussianKernelDistribution()
      • GaussianKernelDistribution

        public GaussianKernelDistribution​(double precision)
        Creates a kernel distribution that groups kernels whose values fall within the range of the precision parameter into one "bin" with added weight. For example, precision 0.1 creates ten bins per unit, and precision 0.5 creates two bins. With precision 0.1, values in the interval [-0.05,0.05[ fall into bin "0".
        Parameters:
        precision - the interval size to be captured by one bin. Instead of creating n kernels for n values, we reduce the kernel count by grouping similar values and adjusting the weight of the shared kernel.
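
        To show how a shared, weighted kernel can replace many individual ones, here is a minimal sketch of a density evaluated over grouped kernels. Everything in it is an assumption made for illustration: the class name BinnedGaussianDensity, the method signature, and the choice of placing each kernel at index * precision.

```java
import java.util.Map;

/** Hypothetical sketch of a weighted Gaussian kernel density; not the documented implementation. */
final class BinnedGaussianDensity {
    private BinnedGaussianDensity() {}

    /**
     * Evaluates sum_i w_i * phi((x - c_i) / h) / (W * h), where c_i = index_i * precision,
     * phi is the standard normal density and W is the sum of all weights.
     */
    static double density(double x, Map<Long, Double> weights, double precision, double h) {
        double totalWeight = 0.0;
        double sum = 0.0;
        for (Map.Entry<Long, Double> e : weights.entrySet()) {
            double center = e.getKey() * precision;   // assumed kernel location
            double z = (x - center) / h;
            sum += e.getValue() * Math.exp(-0.5 * z * z);
            totalWeight += e.getValue();
        }
        if (totalWeight == 0.0) {
            return 0.0; // no observations yet
        }
        return sum / (totalWeight * h * Math.sqrt(2.0 * Math.PI));
    }
}
```

        The cost of one evaluation grows with the number of occupied bins rather than with the number of observations, which is the efficiency gain the precision parameter is meant to provide.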
    • Method Detail

      • addValues

        public void addValues​(double[] values)
      • addValue

        public void addValue​(double val)
      • updateKernels

        protected void updateKernels()
      • updateSmoothingParameter

        public void updateSmoothingParameter()
        Uses the 'rule of thumb' for the kernel bandwidth combined with the more robust quantile-based approximation.
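
        This page does not state the exact formula. One widely used rule of thumb that combines the standard deviation with a quantile-based spread is h = 0.9 * min(sd, IQR / 1.34) * n^(-1/5). The sketch below implements that rule under the assumption that it resembles what this method does; the class name, the quantile interpolation, and the constants are not taken from the documented source.

```java
import java.util.Arrays;

/** Hypothetical sketch of a rule-of-thumb bandwidth; the documented method may differ. */
final class BandwidthRuleOfThumb {
    private BandwidthRuleOfThumb() {}

    /** Returns 0.9 * min(sd, IQR / 1.34) * n^(-1/5), falling back to sd when the IQR is zero. */
    static double bandwidth(double[] sample) {
        int n = sample.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two observations");
        }
        double[] sorted = sample.clone();
        Arrays.sort(sorted);

        double mean = 0.0;
        for (double v : sorted) {
            mean += v;
        }
        mean /= n;

        double squares = 0.0;
        for (double v : sorted) {
            squares += (v - mean) * (v - mean);
        }
        double sd = Math.sqrt(squares / (n - 1)); // sample standard deviation

        double iqr = quantile(sorted, 0.75) - quantile(sorted, 0.25);
        double spread = iqr > 0.0 ? Math.min(sd, iqr / 1.34) : sd;
        return 0.9 * spread * Math.pow(n, -0.2);
    }

    /** Quantile of an ascending array by linear interpolation between neighbouring order statistics. */
    static double quantile(double[] sorted, double p) {
        double pos = p * (sorted.length - 1);
        int lo = (int) Math.floor(pos);
        int hi = (int) Math.ceil(pos);
        return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
    }
}
```

        Taking the minimum of the two spread estimates keeps a few outliers from inflating the bandwidth, which is why the quantile-based term is considered the more robust one.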
      • getDoubleArray

        protected double[] getDoubleArray​(java.util.List<java.lang.Double> values)
      • cumulativeProbability

        public double cumulativeProbability​(double arg0,
                                            double arg1)
                                     throws org.apache.commons.math3.exception.NumberIsTooLargeException
        Specified by:
        cumulativeProbability in interface org.apache.commons.math3.distribution.RealDistribution
        Overrides:
        cumulativeProbability in class org.apache.commons.math3.distribution.AbstractRealDistribution
        Throws:
        org.apache.commons.math3.exception.NumberIsTooLargeException
      • density

        public double density​(double x)
      • getSupportLowerBound

        public double getSupportLowerBound()
      • getSupportUpperBound

        public double getSupportUpperBound()
      • isSupportConnected

        public boolean isSupportConnected()
      • isSupportLowerBoundInclusive

        public boolean isSupportLowerBoundInclusive()
      • isSupportUpperBoundInclusive

        public boolean isSupportUpperBoundInclusive()
      • probability

        public double probability​(double x)
        Use density instead, as P(X=x) is zero for real-valued distributions.
        Specified by:
        probability in interface org.apache.commons.math3.distribution.RealDistribution
        Overrides:
        probability in class org.apache.commons.math3.distribution.AbstractRealDistribution
      • sample

        public double sample()
        Selects one value from the observations at random and samples from its Gaussian kernel.
        Specified by:
        sample in interface org.apache.commons.math3.distribution.RealDistribution
        Overrides:
        sample in class org.apache.commons.math3.distribution.AbstractRealDistribution
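
        The sampling idea described for sample() can be sketched as follows: pick an observation uniformly at random, then add Gaussian noise scaled by the smoothing parameter. The class name KernelSampler and its signature are hypothetical, and the documented method may draw its random numbers differently.

```java
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of sampling from a Gaussian kernel estimate; not the documented method. */
final class KernelSampler {
    private KernelSampler() {}

    /** Draws one value: a uniformly chosen observation plus N(0, h^2) noise. */
    static double sample(List<Double> observations, double h, Random rng) {
        if (observations.isEmpty()) {
            throw new IllegalStateException("no observations to sample from");
        }
        // Choose the kernel, i.e. one of the observed values.
        double center = observations.get(rng.nextInt(observations.size()));
        // Sample from the Gaussian kernel centered on that value.
        return center + h * rng.nextGaussian();
    }
}
```

        Keeping the raw observations in a list makes the first step a single indexed lookup.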
      • getValues

        public java.util.List<java.lang.Double> getValues()
      • getH

        public double getH()
        The smoothing parameter for the density estimation, which depends on the number of nodes and the interquartile range.
        Returns:
        the smoothing parameter h
      • getReasonableUpperBound

        public double getReasonableUpperBound()
      • getReasonableLowerBound

        public double getReasonableLowerBound()