Class Evaluation

  • All Implemented Interfaces:
    Summarizable

    public class Evaluation
    extends java.lang.Object
    implements Summarizable
    Class for evaluating machine learning models.

    -------------------------------------------------------------------

    General options when evaluating a learning scheme from the command-line:

    -t filename
    Name of the file with the training data. (required)

    -T filename
    Name of the file with the test data. If missing, a cross-validation is performed.

    -c index
    Index of the class attribute (1, 2, ...; default: last).

    -x number
    The number of folds for the cross-validation (default: 10).

    -s seed
    Random number seed for the cross-validation (default: 1).

    -m filename
    The name of a file containing a cost matrix.

    -l filename
    Loads classifier from the given file.

    -d filename
    Saves classifier built from the training data into the given file.

    -v
    Outputs no statistics for the training data.

    -o
    Outputs statistics only, not the classifier.

    -i
    Outputs information-retrieval statistics per class.

    -k
    Outputs information-theoretic statistics.

    -p range
    Outputs predictions for test instances, along with the attributes in the specified range (and nothing else). Use '-p 0' if no attributes are desired.

    -r
    Outputs cumulative margin distribution (and nothing else).

    -g
    Only for classifiers that implement "Graphable". Outputs the graph representation of the classifier (and nothing else).
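
As a rough illustration of how these flags reach evaluateModel, the sketch below builds the flat string array for a cross-validation run. Only the -t, -x, and -s flags come from the list above; the class name EvalOptionsSketch, the helper crossValidationOptions, and the file name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class EvalOptionsSketch {
    /** Builds an option array: training file, number of folds, random seed. */
    static String[] crossValidationOptions(String trainFile, int folds, int seed) {
        List<String> opts = new ArrayList<>();
        opts.add("-t");
        opts.add(trainFile);               // training data (required)
        opts.add("-x");
        opts.add(Integer.toString(folds)); // number of cross-validation folds
        opts.add("-s");
        opts.add(Integer.toString(seed));  // seed for the cross-validation
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // "train.arff" is a placeholder file name.
        System.out.println(String.join(" ", crossValidationOptions("train.arff", 5, 42)));
    }
}
```

Because no -T option is included, an array like this would request a cross-validation rather than a separate test-set evaluation.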

    -------------------------------------------------------------------

    Example usage as the main of a classifier (called FunkyClassifier):

     public static void main(String [] args) {
       try {
         Classifier scheme = new FunkyClassifier();
         System.out.println(Evaluation.evaluateModel(scheme, args));
       } catch (Exception e) {
         System.err.println(e.getMessage());
       }
     }
     

    ------------------------------------------------------------------

    Example usage from within an application:

     Instances trainInstances = ...;  // instances obtained from somewhere
     Instances testInstances = ...;   // instances obtained from somewhere
     Classifier scheme = ...;         // scheme obtained from somewhere
    
     Evaluation evaluation = new Evaluation(trainInstances);
     evaluation.evaluateModel(scheme, testInstances);
     System.out.println(evaluation.toSummaryString());
     
    Version:
    $Revision: 1.53.2.6 $
    Author:
    Eibe Frank (eibe@cs.waikato.ac.nz), Len Trigg (trigg@cs.waikato.ac.nz)
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static double[] acc  
      protected static int k_MarginResolution
      Resolution of the margin histogram
      protected boolean m_ClassIsNominal
      Is the class nominal or numeric?
      protected java.lang.String[] m_ClassNames
      The names of the classes.
      protected double[] m_ClassPriors
      The prior probabilities of the classes
      protected double m_ClassPriorsSum
      The sum of counts for priors
      protected double[][] m_ConfusionMatrix
      Array for storing the confusion matrix.
      protected double m_Correct
      The weight of all correctly classified instances.
      protected CostMatrix m_CostMatrix
      The cost matrix (if given).
      protected Estimator m_ErrorEstimator
      Numeric class error estimator for scheme
      protected double m_Incorrect
      The weight of all incorrectly classified instances.
      protected double[] m_MarginCounts
      Cumulative margin distribution
      protected double m_MissingClass
      The weight of all instances that had no class assigned to them.
      protected boolean m_NoPriors
      Enables/disables the use of priors, e.g., if no training set is present in case of de-serialized schemes.
      protected int m_NumClasses
      The number of classes.
      protected int m_NumFolds
      The number of folds for a cross-validation.
      protected int m_NumTrainClassVals
      Number of non-missing class training instances seen
      protected Estimator m_PriorErrorEstimator
      Numeric class error estimator for prior
      protected double m_SumAbsErr
      Sum of absolute errors.
      protected double m_SumClass
      Sum of class values.
      protected double m_SumClassPredicted
      Sum of predicted * class values.
      protected double m_SumErr
      Sum of errors.
      protected double m_SumKBInfo
      Total Kononenko & Bratko Information
      protected double m_SumPredicted
      Sum of predicted values.
      protected double m_SumPriorAbsErr
      Sum of absolute errors of the prior
      protected double m_SumPriorEntropy
      Total entropy of prior predictions
      protected double m_SumPriorSqrErr
      Sum of squared errors of the prior
      protected double m_SumSchemeEntropy
      Total entropy of scheme predictions
      protected double m_SumSqrClass
      Sum of squared class values.
      protected double m_SumSqrErr
      Sum of squared errors.
      protected double m_SumSqrPredicted
      Sum of squared predicted values.
      protected double m_TotalCost
      The total cost of predictions (includes instance weights)
      protected double[] m_TrainClassVals
      Array containing all numeric training class values seen
      protected double[] m_TrainClassWeights
      Array containing all numeric training class weights
      protected double m_Unclassified
      The weight of all unclassified instances.
      protected double m_WithClass
      The weight of all instances that had a class assigned to them.
      protected static double MIN_SF_PROB
      The minimum probability accepted from an estimator to avoid taking log(0) in SF calculations.
    • Constructor Summary

      Constructors 
      Constructor Description
      Evaluation​(Instances data)
      Initializes all the counters for the evaluation.
      Evaluation​(Instances data, CostMatrix costMatrix)
      Initializes all the counters for the evaluation and also takes a cost matrix as parameter.
    • Method Summary

      All Methods Static Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      protected void addNumericTrainClass​(double classValue, double weight)
      Adds a numeric (non-missing) training class value and weight to the buffer of stored values.
      protected static java.lang.String attributeValuesString​(Instance instance, Range attRange)
      Builds a string listing the attribute values in a specified range of indices, separated by commas and enclosed in brackets.
      double avgCost()
      Gets the average cost, that is, total cost of misclassifications (incorrect plus unclassified) over the total number of instances.
      double[][] confusionMatrix()
      Returns a copy of the confusion matrix.
      double correct()
      Gets the number of instances correctly classified (that is, for which a correct prediction was made).
      double correlationCoefficient()
      Returns the correlation coefficient if the class is numeric.
      void crossValidateModel​(java.lang.String classifierString, Instances data, int numFolds, java.lang.String[] options, java.util.Random random)
      Performs a (stratified if class is nominal) cross-validation for a classifier on a set of instances.
      void crossValidateModel​(Classifier classifier, Instances data, int numFolds, java.util.Random random)
      Performs a (stratified if class is nominal) cross-validation for a classifier on a set of instances.
      double[] doEvaluateModelOnce​(Classifier classifier, Instance instance)  
      boolean equals​(java.lang.Object obj)
      Tests whether the current evaluation object is equal to another evaluation object
      double errorRate()
      Returns the estimated error rate or the root mean squared error (if the class is numeric).
      static java.lang.String evaluateModel​(java.lang.String classifierString, java.lang.String[] options)
      Evaluates a classifier with the options given in an array of strings.
      static java.lang.String evaluateModel​(Classifier classifier, java.lang.String[] options)
      Evaluates a classifier with the options given in an array of strings.
      double[] evaluateModel​(Classifier classifier, Instances data)
      Evaluates the classifier on a given set of instances.
      double evaluateModelOnce​(double[] dist, Instance instance)
      Evaluates the supplied distribution on a single instance.
      void evaluateModelOnce​(double prediction, Instance instance)
      Evaluates the supplied prediction on a single instance.
      double evaluateModelOnce​(Classifier classifier, Instance instance)  
      double falseNegativeRate​(int classIndex)
      Calculate the false negative rate with respect to a particular class.
      double falsePositiveRate​(int classIndex)
      Calculate the false positive rate with respect to a particular class.
      double fMeasure​(int classIndex)
      Calculate the F-Measure with respect to a particular class.
      protected static CostMatrix handleCostOption​(java.lang.String costFileName, int numClasses)
      Attempts to load a cost matrix.
      double incorrect()
      Gets the number of instances incorrectly classified (that is, for which an incorrect prediction was made).
      double kappa()
      Returns value of kappa statistic if class is nominal.
      double KBInformation()
      Return the total Kononenko & Bratko Information score in bits
      double KBMeanInformation()
      Return the Kononenko & Bratko Information score in bits per instance.
      double KBRelativeInformation()
      Return the Kononenko & Bratko Relative Information score
      static void main​(java.lang.String[] args)
      A test method for this class.
      protected double[] makeDistribution​(double predictedClass)
      Converts a single prediction into a probability distribution with all zero probabilities except the predicted value, which has probability 1.0.
      protected static java.lang.String makeOptionString​(Classifier classifier)
      Make up the help string giving all the command line options
      double meanAbsoluteError()
      Returns the mean absolute error.
      double meanPriorAbsoluteError()
      Returns the mean absolute error of the prior.
      protected java.lang.String num2ShortID​(int num, char[] IDChars, int IDWidth)
      Method for generating indices for the confusion matrix.
      double numFalseNegatives​(int classIndex)
      Calculate number of false negatives with respect to a particular class.
      double numFalsePositives​(int classIndex)
      Calculate number of false positives with respect to a particular class.
      double numInstances()
      Gets the number of test instances that had a known class value (actually the sum of the weights of test instances with known class value).
      double numTrueNegatives​(int classIndex)
      Calculate the number of true negatives with respect to a particular class.
      double numTruePositives​(int classIndex)
      Calculate the number of true positives with respect to a particular class.
      double pctCorrect()
      Gets the percentage of instances correctly classified (that is, for which a correct prediction was made).
      double pctIncorrect()
      Gets the percentage of instances incorrectly classified (that is, for which an incorrect prediction was made).
      double pctUnclassified()
      Gets the percentage of instances not classified (that is, for which no prediction was made by the classifier).
      double precision​(int classIndex)
      Calculate the precision with respect to a particular class.
      protected static java.lang.String printClassifications​(Classifier classifier, Instances train, java.lang.String testFileName, int classIndex, Range attributesToOutput)
      Prints the predictions for the given dataset into a String variable.
      double priorEntropy()
      Calculate the entropy of the prior distribution
      double recall​(int classIndex)
      Calculate the recall with respect to a particular class.
      double relativeAbsoluteError()
      Returns the relative absolute error.
      double rootMeanPriorSquaredError()
      Returns the root mean prior squared error.
      double rootMeanSquaredError()
      Returns the root mean squared error.
      double rootRelativeSquaredError()
      Returns the root relative squared error if the class is numeric.
      protected void setNumericPriorsFromBuffer()
      Sets up the priors for numeric class attributes from the training class values that have been seen so far.
      void setPriors​(Instances train)
      Sets the class prior probabilities
      double SFEntropyGain()
      Returns the total SF, which is the null model entropy minus the scheme entropy.
      double SFMeanEntropyGain()
      Returns the SF per instance, which is the null model entropy minus the scheme entropy, per instance.
      double SFMeanPriorEntropy()
      Returns the entropy per instance for the null model
      double SFMeanSchemeEntropy()
      Returns the entropy per instance for the scheme
      double SFPriorEntropy()
      Returns the total entropy for the null model
      double SFSchemeEntropy()
      Returns the total entropy for the scheme
      java.lang.String toClassDetailsString()
      Generates a breakdown of the accuracy for each class (with default title), incorporating various information-retrieval statistics, such as true/false positive rate, precision/recall/F-Measure.
      java.lang.String toClassDetailsString​(java.lang.String title)
      Generates a breakdown of the accuracy for each class, incorporating various information-retrieval statistics, such as true/false positive rate, precision/recall/F-Measure.
      java.lang.String toCumulativeMarginDistributionString()
      Output the cumulative margin distribution as a string suitable for input for gnuplot or similar package.
      java.lang.String toMatrixString()
      Calls toMatrixString() with a default title.
      java.lang.String toMatrixString​(java.lang.String title)
      Outputs the performance statistics as a classification confusion matrix.
      java.lang.String toSummaryString()
      Calls toSummaryString() with no title and no complexity stats
      java.lang.String toSummaryString​(boolean printComplexityStatistics)
      Calls toSummaryString() with a default title.
      java.lang.String toSummaryString​(java.lang.String title, boolean printComplexityStatistics)
      Outputs the performance statistics in summary form.
      double totalCost()
      Gets the total cost, that is, the cost of each prediction times the weight of the instance, summed over all instances.
      double trueNegativeRate​(int classIndex)
      Calculate the true negative rate with respect to a particular class.
      double truePositiveRate​(int classIndex)
      Calculate the true positive rate with respect to a particular class.
      double unclassified()
      Gets the number of instances not classified (that is, for which no prediction was made by the classifier).
      protected void updateMargins​(double[] predictedDistribution, int actualClass, double weight)
      Update the cumulative record of classification margins
      protected void updateNumericScores​(double[] predicted, double[] actual, double weight)
      Update the numeric accuracy measures.
      void updatePriors​(Instance instance)
      Updates the class prior probabilities (when incrementally training)
      protected void updateStatsForClassifier​(double[] predictedDistribution, Instance instance)
      Updates all the statistics about a classifier's performance for the current test instance.
      protected void updateStatsForPredictor​(double predictedValue, Instance instance)
      Updates all the statistics about a predictor's performance for the current test instance.
      void useNoPriors()
      Disables the use of priors, e.g., in case of de-serialized schemes that have no access to the original training set but are evaluated on a test set.
      protected static java.lang.String wekaStaticWrapper​(Sourcable classifier, java.lang.String className)
      Wraps a static classifier in enough source to test using the weka class libraries.
      • Methods inherited from class java.lang.Object

        clone, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • m_NumClasses

        protected int m_NumClasses
        The number of classes.
      • m_NumFolds

        protected int m_NumFolds
        The number of folds for a cross-validation.
      • m_Incorrect

        protected double m_Incorrect
        The weight of all incorrectly classified instances.
      • m_Correct

        protected double m_Correct
        The weight of all correctly classified instances.
      • m_Unclassified

        protected double m_Unclassified
        The weight of all unclassified instances.
      • m_MissingClass

        protected double m_MissingClass
        The weight of all instances that had no class assigned to them.
      • m_WithClass

        protected double m_WithClass
        The weight of all instances that had a class assigned to them.
      • m_ConfusionMatrix

        protected double[][] m_ConfusionMatrix
        Array for storing the confusion matrix.
      • m_ClassNames

        protected java.lang.String[] m_ClassNames
        The names of the classes.
      • m_ClassIsNominal

        protected boolean m_ClassIsNominal
        Is the class nominal or numeric?
      • m_ClassPriors

        protected double[] m_ClassPriors
        The prior probabilities of the classes
      • m_ClassPriorsSum

        protected double m_ClassPriorsSum
        The sum of counts for priors
      • m_CostMatrix

        protected CostMatrix m_CostMatrix
        The cost matrix (if given).
      • m_TotalCost

        protected double m_TotalCost
        The total cost of predictions (includes instance weights)
      • m_SumErr

        protected double m_SumErr
        Sum of errors.
      • m_SumAbsErr

        protected double m_SumAbsErr
        Sum of absolute errors.
      • m_SumSqrErr

        protected double m_SumSqrErr
        Sum of squared errors.
      • m_SumClass

        protected double m_SumClass
        Sum of class values.
      • m_SumSqrClass

        protected double m_SumSqrClass
        Sum of squared class values.
      • m_SumPredicted

        protected double m_SumPredicted
        Sum of predicted values.
      • m_SumSqrPredicted

        protected double m_SumSqrPredicted
        Sum of squared predicted values.
      • m_SumClassPredicted

        protected double m_SumClassPredicted
        Sum of predicted * class values.
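
The running sums above are enough to compute a Pearson correlation between actual and predicted class values without storing every prediction. The following is a minimal sketch of that calculation under the assumption that all instances have weight 1; the class and method names are hypothetical, and returning 0 for a degenerate variance is a choice made for this sketch, not something the source specifies.

```java
final class CorrelationFromSumsSketch {
    /**
     * Pearson correlation from running sums over n (actual, predicted) pairs.
     * Parameter names mirror the fields: sum of class values, sum of squared
     * class values, sum of predicted values, sum of squared predicted values,
     * and sum of predicted * class values.
     */
    static double correlation(double n, double sumClass, double sumSqrClass,
                              double sumPredicted, double sumSqrPredicted,
                              double sumClassPredicted) {
        double varActual = sumSqrClass - sumClass * sumClass / n;
        double varPredicted = sumSqrPredicted - sumPredicted * sumPredicted / n;
        double covariance = sumClassPredicted - sumClass * sumPredicted / n;
        double product = varActual * varPredicted;
        if (product <= 0.0) {
            return 0.0; // sketch choice: no variation, so no defined correlation
        }
        return covariance / Math.sqrt(product);
    }

    public static void main(String[] args) {
        // actual = {1, 2, 3}, predicted = {2, 4, 6}
        System.out.println(correlation(3, 6, 14, 12, 56, 28));
    }
}
```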
      • m_SumPriorAbsErr

        protected double m_SumPriorAbsErr
        Sum of absolute errors of the prior
      • m_SumPriorSqrErr

        protected double m_SumPriorSqrErr
        Sum of squared errors of the prior
      • m_SumKBInfo

        protected double m_SumKBInfo
        Total Kononenko & Bratko Information
      • k_MarginResolution

        protected static int k_MarginResolution
        Resolution of the margin histogram
      • m_MarginCounts

        protected double[] m_MarginCounts
        Cumulative margin distribution
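
For orientation, the margin of a prediction is commonly defined as the probability assigned to the actual class minus the largest probability assigned to any other class, which places it in [-1, 1]. The sketch below computes that value and maps it to a histogram bin. The binning formula and all names here are assumptions made for illustration; the source does not state how k_MarginResolution is applied.

```java
final class MarginSketch {
    /** Probability of the actual class minus the best competing probability. */
    static double margin(double[] dist, int actualClass) {
        double bestOther = 0.0;
        for (int i = 0; i < dist.length; i++) {
            if (i != actualClass && dist[i] > bestOther) {
                bestOther = dist[i];
            }
        }
        return dist[actualClass] - bestOther;
    }

    /** Maps a margin in [-1, 1] to a bin in [0, resolution] (assumed scheme). */
    static int bin(double margin, int resolution) {
        return (int) ((margin + 1.0) / 2.0 * resolution);
    }

    public static void main(String[] args) {
        double m = margin(new double[] {0.7, 0.2, 0.1}, 0);
        System.out.println(m + " -> bin " + bin(m, 500));
    }
}
```

Under this scheme a histogram needs resolution + 1 cells, since a margin of exactly 1 lands in the last bin.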
      • m_NumTrainClassVals

        protected int m_NumTrainClassVals
        Number of non-missing class training instances seen
      • m_TrainClassVals

        protected double[] m_TrainClassVals
        Array containing all numeric training class values seen
      • m_TrainClassWeights

        protected double[] m_TrainClassWeights
        Array containing all numeric training class weights
      • m_PriorErrorEstimator

        protected Estimator m_PriorErrorEstimator
        Numeric class error estimator for prior
      • m_ErrorEstimator

        protected Estimator m_ErrorEstimator
        Numeric class error estimator for scheme
      • MIN_SF_PROB

        protected static final double MIN_SF_PROB
        The minimum probability accepted from an estimator to avoid taking log(0) in SF calculations.
        See Also:
        Constant Field Values
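
The idea behind such a floor can be sketched as follows: a probability is clamped to a small positive minimum before the logarithm is taken, so the entropy term stays finite. The floor value 1e-12 and the names below are hypothetical; the actual value of MIN_SF_PROB is not given here.

```java
final class LogClampSketch {
    /** Hypothetical floor; not the real MIN_SF_PROB value. */
    static final double MIN_PROB = 1e-12;

    /** Returns -log2(p), clamping p to MIN_PROB so the result is finite. */
    static double negLog2(double p) {
        double clamped = Math.max(p, MIN_PROB);
        return -Math.log(clamped) / Math.log(2.0);
    }

    public static void main(String[] args) {
        System.out.println(negLog2(0.5)); // about one bit
        System.out.println(negLog2(0.0)); // large but finite
    }
}
```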
      • m_SumPriorEntropy

        protected double m_SumPriorEntropy
        Total entropy of prior predictions
      • m_SumSchemeEntropy

        protected double m_SumSchemeEntropy
        Total entropy of scheme predictions
      • m_NoPriors

        protected boolean m_NoPriors
        Enables/disables the use of priors, e.g., if no training set is present in case of de-serialized schemes.
      • acc

        public static double[] acc
    • Constructor Detail

      • Evaluation

        public Evaluation​(Instances data)
                   throws java.lang.Exception
        Initializes all the counters for the evaluation. Use useNoPriors() if the dataset is the test set and you can't initialize with the priors from the training set via setPriors(Instances).
        Parameters:
        data - set of training instances, to get some header information and prior class distribution information
        Throws:
        java.lang.Exception - if the class is not defined
        See Also:
        useNoPriors(), setPriors(Instances)
      • Evaluation

        public Evaluation​(Instances data,
                          CostMatrix costMatrix)
                   throws java.lang.Exception
        Initializes all the counters for the evaluation and also takes a cost matrix as parameter. Use useNoPriors() if the dataset is the test set and you can't initialize with the priors from the training set via setPriors(Instances).
        Parameters:
        data - set of training instances, to get some header information and prior class distribution information
        costMatrix - the cost matrix---if null, default costs will be used
        Throws:
        java.lang.Exception - if cost matrix is not compatible with data, the class is not defined or the class is numeric
        See Also:
        useNoPriors(), setPriors(Instances)
    • Method Detail

      • doEvaluateModelOnce

        public double[] doEvaluateModelOnce​(Classifier classifier,
                                            Instance instance)
                                     throws java.lang.Exception
        Throws:
        java.lang.Exception
      • confusionMatrix

        public double[][] confusionMatrix()
        Returns a copy of the confusion matrix.
        Returns:
        a copy of the confusion matrix as a two-dimensional array
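
Returning a copy means callers can modify the result without corrupting the evaluation's internal counts. A minimal sketch of that pattern, assuming rows index the actual class and columns the predicted class (an assumption; the source does not state the orientation), with hypothetical names throughout:

```java
final class ConfusionMatrixSketch {
    private final double[][] matrix;

    ConfusionMatrixSketch(int numClasses) {
        matrix = new double[numClasses][numClasses];
    }

    /** Adds the instance weight to the cell [actual][predicted]. */
    void record(int actualClass, int predictedClass, double weight) {
        matrix[actualClass][predictedClass] += weight;
    }

    /** Returns a deep copy so callers cannot change the stored counts. */
    double[][] copy() {
        double[][] result = new double[matrix.length][];
        for (int i = 0; i < matrix.length; i++) {
            result[i] = matrix[i].clone();
        }
        return result;
    }
}
```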
      • crossValidateModel

        public void crossValidateModel​(Classifier classifier,
                                       Instances data,
                                       int numFolds,
                                       java.util.Random random)
                                throws java.lang.Exception
        Performs a (stratified if class is nominal) cross-validation for a classifier on a set of instances. Now performs a deep copy of the classifier before each call to buildClassifier() (just in case the classifier is not initialized properly).
        Parameters:
        classifier - the classifier with any options set.
        data - the data on which the cross-validation is to be performed
        numFolds - the number of folds for the cross-validation
        random - random number generator for randomization
        Throws:
        java.lang.Exception - if a classifier could not be generated successfully or the class is not defined
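
To make the notion of a stratified cross-validation concrete, the sketch below assigns instances to folds so that each fold receives a near-equal share of every class. It is one simple way to stratify (shuffle, group by class, deal round-robin) and is not claimed to match the algorithm this method uses; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

final class StratifiedFoldsSketch {
    /** Returns, for each instance index, its fold number in 0..numFolds-1. */
    static int[] assignFolds(int[] classLabels, int numFolds, Random random) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < classLabels.length; i++) {
            order.add(i);
        }
        // Randomize first, then stable-sort by class so that instances of
        // the same class are adjacent but still in random relative order.
        Collections.shuffle(order, random);
        order.sort(Comparator.comparingInt(i -> classLabels[i]));

        // Dealing positions round-robin spreads each class across the folds.
        int[] fold = new int[classLabels.length];
        for (int pos = 0; pos < order.size(); pos++) {
            fold[order.get(pos)] = pos % numFolds;
        }
        return fold;
    }
}
```

Passing a Random built from a fixed seed makes the fold assignment reproducible, which is the role the random parameter plays above.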
      • crossValidateModel

        public void crossValidateModel​(java.lang.String classifierString,
                                       Instances data,
                                       int numFolds,
                                       java.lang.String[] options,
                                       java.util.Random random)
                                throws java.lang.Exception
        Performs a (stratified if class is nominal) cross-validation for a classifier on a set of instances.
        Parameters:
        classifierString - a string naming the class of the classifier
        data - the data on which the cross-validation is to be performed
        numFolds - the number of folds for the cross-validation
        options - the options to the classifier. Any options accepted by the classifier will be removed from this array.
        random - the random number generator for randomizing the data
        Throws:
        java.lang.Exception - if a classifier could not be generated successfully or the class is not defined
      • evaluateModel

        public static java.lang.String evaluateModel​(java.lang.String classifierString,
                                                     java.lang.String[] options)
                                              throws java.lang.Exception
        Evaluates a classifier with the options given in an array of strings.

        Valid options are:

        -t filename
        Name of the file with the training data. (required)

        -T filename
        Name of the file with the test data. If missing, a cross-validation is performed.

        -c index
        Index of the class attribute (1, 2, ...; default: last).

        -x number
        The number of folds for the cross-validation (default: 10).

        -s seed
        Random number seed for the cross-validation (default: 1).

        -m filename
        The name of a file containing a cost matrix.

        -l filename
        Loads classifier from the given file.

        -d filename
        Saves classifier built from the training data into the given file.

        -v
        Outputs no statistics for the training data.

        -o
        Outputs statistics only, not the classifier.

        -i
        Outputs detailed information-retrieval statistics per class.

        -k
        Outputs information-theoretic statistics.

        -p range
        Outputs predictions for test instances, along with the attributes in the specified range (and nothing else). Use '-p 0' if no attributes are desired.

        -r
        Outputs cumulative margin distribution (and nothing else).

        -g
        Only for classifiers that implement "Graphable". Outputs the graph representation of the classifier (and nothing else).

        Parameters:
        classifierString - class of machine learning classifier as a string
        options - the array of string containing the options
        Returns:
        a string describing the results
        Throws:
        java.lang.Exception - if model could not be evaluated successfully
      • main

        public static void main​(java.lang.String[] args)
        A test method for this class. Just extracts the first command line argument as a classifier class name and calls evaluateModel.
        Parameters:
        args - an array of command line arguments, the first of which must be the class name of a classifier.
      • evaluateModel

        public static java.lang.String evaluateModel​(Classifier classifier,
                                                     java.lang.String[] options)
                                              throws java.lang.Exception
        Evaluates a classifier with the options given in an array of strings.

        Valid options are:

        -t name of training file
        Name of the file with the training data. (required)

        -T name of test file
        Name of the file with the test data. If missing, a cross-validation is performed.

        -c class index
        Index of the class attribute (1, 2, ...; default: last).

        -x number of folds
        The number of folds for the cross-validation (default: 10).

        -s random number seed
        Random number seed for the cross-validation (default: 1).

        -m file with cost matrix
        The name of a file containing a cost matrix.

        -l name of model input file
        Loads classifier from the given file.

        -d name of model output file
        Saves classifier built from the training data into the given file.

        -v
        Outputs no statistics for the training data.

        -o
        Outputs statistics only, not the classifier.

        -i
        Outputs detailed information-retrieval statistics per class.

        -k
        Outputs information-theoretic statistics.

        -p
        Outputs predictions for test instances (and nothing else).

        -r
        Outputs cumulative margin distribution (and nothing else).

        -g
        Only for classifiers that implement "Graphable". Outputs the graph representation of the classifier (and nothing else).

        Parameters:
        classifier - machine learning classifier
        options - the array of string containing the options
        Returns:
        a string describing the results
        Throws:
        java.lang.Exception - if model could not be evaluated successfully
      • handleCostOption

        protected static CostMatrix handleCostOption​(java.lang.String costFileName,
                                                     int numClasses)
                                              throws java.lang.Exception
        Attempts to load a cost matrix.
        Parameters:
        costFileName - the filename of the cost matrix
        numClasses - the number of classes that should be in the cost matrix (only used if the cost file is in old format).
        Returns:
        a CostMatrix value, or null if costFileName is empty
        Throws:
        java.lang.Exception - if an error occurs.
      • evaluateModel

        public double[] evaluateModel​(Classifier classifier,
                                      Instances data)
                               throws java.lang.Exception
        Evaluates the classifier on a given set of instances. Note that the data must have exactly the same format (e.g. order of attributes) as the data used to train the classifier! Otherwise the results will generally be meaningless.
        Parameters:
        classifier - machine learning classifier
        data - set of test instances for evaluation
        Returns:
        the predictions
        Throws:
        java.lang.Exception - if model could not be evaluated successfully
      • evaluateModelOnce

        public double evaluateModelOnce​(Classifier classifier,
                                        Instance instance)
                                 throws java.lang.Exception
        Throws:
        java.lang.Exception
      • evaluateModelOnce

        public double evaluateModelOnce​(double[] dist,
                                        Instance instance)
                                 throws java.lang.Exception
        Evaluates the supplied distribution on a single instance.
        Parameters:
        dist - the supplied distribution
        instance - the test instance to be classified
        Returns:
        the prediction
        Throws:
        java.lang.Exception - if model could not be evaluated successfully
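
For a nominal class, a natural way to turn a distribution into a single prediction is to take the index of the largest probability. The sketch below shows that step only; it assumes this is how "the prediction" is derived, which the description above does not confirm, and the names are hypothetical.

```java
final class DistributionSketch {
    /** Returns the index of the largest entry (first one wins on ties). */
    static int mostLikelyClass(double[] dist) {
        int best = 0;
        for (int i = 1; i < dist.length; i++) {
            if (dist[i] > dist[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(mostLikelyClass(new double[] {0.1, 0.7, 0.2}));
    }
}
```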
      • evaluateModelOnce

        public void evaluateModelOnce​(double prediction,
                                      Instance instance)
                               throws java.lang.Exception
        Evaluates the supplied prediction on a single instance.
        Parameters:
        prediction - the supplied prediction
        instance - the test instance to be classified
        Throws:
        java.lang.Exception - if model could not be evaluated successfully
      • wekaStaticWrapper

        protected static java.lang.String wekaStaticWrapper​(Sourcable classifier,
                                                            java.lang.String className)
                                                     throws java.lang.Exception
        Wraps a static classifier in enough source to test using the weka class libraries.
        Parameters:
        classifier - a Sourcable Classifier
        className - the name to give to the source code class
        Returns:
        the source for a static classifier that can be tested with weka libraries.
        Throws:
        java.lang.Exception - if code-generation fails
      • numInstances

        public final double numInstances()
        Gets the number of test instances that had a known class value (actually the sum of the weights of test instances with known class value).
        Returns:
        the number of test instances with known class
      • incorrect

        public final double incorrect()
        Gets the number of instances incorrectly classified (that is, for which an incorrect prediction was made). This is actually the sum of the weights of these instances.
        Returns:
        the number of incorrectly classified instances
      • pctIncorrect

        public final double pctIncorrect()
        Gets the percentage of instances incorrectly classified (that is, for which an incorrect prediction was made).
        Returns:
        the percent of incorrectly classified instances (between 0 and 100)
      • totalCost

        public final double totalCost()
        Gets the total cost, that is, the cost of each prediction times the weight of the instance, summed over all instances.
        Returns:
        the total cost
      • avgCost

        public final double avgCost()
        Gets the average cost, that is, total cost of misclassifications (incorrect plus unclassified) over the total number of instances.
        Returns:
        the average cost.
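        The relationship between totalCost and avgCost can be illustrated with a small, self-contained sketch. It is not the library's implementation: the class name CostSketch, the cost[actual][predicted] orientation of the matrix, and the division by the summed instance weights are assumptions made for illustration.

```java
final class CostSketch {
    /** Sums cost[actual][predicted] * weight over all predictions (assumed orientation). */
    static double totalCost(double[][] cost, int[] actual, int[] predicted, double[] weight) {
        double total = 0.0;
        for (int i = 0; i < actual.length; i++) {
            total += cost[actual[i]][predicted[i]] * weight[i];
        }
        return total;
    }

    /** Divides the total cost by the summed instance weights. */
    static double avgCost(double[][] cost, int[] actual, int[] predicted, double[] weight) {
        double weightSum = 0.0;
        for (double w : weight) {
            weightSum += w;
        }
        return totalCost(cost, actual, predicted, weight) / weightSum;
    }
}
```

        With a cost of 1 for predicting class 1 on a class-0 instance and 5 for the opposite mistake, a single misclassified class-1 instance of weight 2 contributes 10 to the total.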
      • correct

        public final double correct()
        Gets the number of instances correctly classified (that is, for which a correct prediction was made). This is actually the sum of the weights of these instances.
        Returns:
        the number of correctly classified instances
      • pctCorrect

        public final double pctCorrect()
        Gets the percentage of instances correctly classified (that is, for which a correct prediction was made).
        Returns:
        the percent of correctly classified instances (between 0 and 100)
      • unclassified

        public final double unclassified()
        Gets the number of instances not classified (that is, for which no prediction was made by the classifier). This is actually the sum of the weights of these instances.
        Returns:
        the number of unclassified instances
      • pctUnclassified

        public final double pctUnclassified()
        Gets the percentage of instances not classified (that is, for which no prediction was made by the classifier).
        Returns:
        the percent of unclassified instances (between 0 and 100)
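        Because the counts are sums of instance weights, they need not be whole numbers. The following sketch shows one way such weighted tallies and their percentages could be kept. The class WeightedTally, the use of a negative predicted index to mean "no prediction", and the choice of denominator (all tallied instances) are assumptions for illustration only.

```java
final class WeightedTally {
    private double correct;
    private double incorrect;
    private double unclassified;

    /** Records one prediction; predictedClass < 0 is a hypothetical marker for "no prediction". */
    void add(int actualClass, int predictedClass, double weight) {
        if (predictedClass < 0) {
            unclassified += weight;
        } else if (predictedClass == actualClass) {
            correct += weight;
        } else {
            incorrect += weight;
        }
    }

    /** Total weight of all tallied instances. */
    double total() {
        return correct + incorrect + unclassified;
    }

    double pctCorrect() {
        return 100.0 * correct / total();
    }

    double pctIncorrect() {
        return 100.0 * incorrect / total();
    }

    double pctUnclassified() {
        return 100.0 * unclassified / total();
    }
}
```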
      • errorRate

        public final double errorRate()
        Returns the estimated error rate or the root mean squared error (if the class is numeric). If a cost matrix was given this error rate gives the average cost.
        Returns:
        the estimated error rate (between 0 and 1, or between 0 and maximum cost)
      • kappa

        public final double kappa()
        Returns the value of the kappa statistic if the class is nominal.
        Returns:
        the value of the kappa statistic
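        For readers unfamiliar with the statistic, the sketch below computes Cohen's kappa from a confusion matrix as (observed agreement - chance agreement) / (1 - chance agreement). It is a textbook formulation rather than this class's code; the name KappaSketch and the convention that rows hold actual classes and columns hold predicted classes are assumptions.

```java
final class KappaSketch {
    /** Cohen's kappa for a square confusion matrix m[actual][predicted]. */
    static double kappa(double[][] m) {
        int n = m.length;
        double[] rowSum = new double[n];
        double[] colSum = new double[n];
        double total = 0.0;
        double diagonal = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                rowSum[i] += m[i][j];
                colSum[j] += m[i][j];
                total += m[i][j];
            }
            diagonal += m[i][i];
        }
        // Fraction of instances on which prediction and truth agree.
        double observed = diagonal / total;
        // Agreement expected if predictions were independent of the truth.
        double chance = 0.0;
        for (int i = 0; i < n; i++) {
            chance += rowSum[i] * colSum[i];
        }
        chance /= total * total;
        return chance < 1.0 ? (observed - chance) / (1.0 - chance) : 1.0;
    }
}
```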
      • correlationCoefficient

        public final double correlationCoefficient()
                                            throws java.lang.Exception
        Returns the correlation coefficient if the class is numeric.
        Returns:
        the correlation coefficient
        Throws:
        java.lang.Exception - if class is not numeric
      • meanAbsoluteError

        public final double meanAbsoluteError()
        Returns the mean absolute error. Refers to the error of the predicted values for numeric classes, and the error of the predicted probability distribution for nominal classes.
        Returns:
        the mean absolute error
      • meanPriorAbsoluteError

        public final double meanPriorAbsoluteError()
        Returns the mean absolute error of the prior.
        Returns:
        the mean absolute error of the prior
      • relativeAbsoluteError

        public final double relativeAbsoluteError()
                                           throws java.lang.Exception
        Returns the relative absolute error.
        Returns:
        the relative absolute error
        Throws:
        java.lang.Exception - if it can't be computed
      • rootMeanSquaredError

        public final double rootMeanSquaredError()
        Returns the root mean squared error.
        Returns:
        the root mean squared error
      • rootMeanPriorSquaredError

        public final double rootMeanPriorSquaredError()
        Returns the root mean prior squared error.
        Returns:
        the root mean prior squared error
      • rootRelativeSquaredError

        public final double rootRelativeSquaredError()
        Returns the root relative squared error if the class is numeric.
        Returns:
        the root relative squared error
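        The error measures for a numeric class can be related to one another with a short sketch. It assumes unweighted instances, a prior that always predicts a fixed value such as the training mean, and relative errors expressed as percentages of the prior's error. These are illustrative assumptions, and the class NumericErrorSketch is hypothetical.

```java
import java.util.Arrays;

final class NumericErrorSketch {
    /** Mean of |actual - predicted| over all instances. */
    static double meanAbsoluteError(double[] actual, double[] predicted) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(actual[i] - predicted[i]);
        }
        return sum / actual.length;
    }

    /** Square root of the mean of (actual - predicted)^2. */
    static double rootMeanSquaredError(double[] actual, double[] predicted) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double diff = actual[i] - predicted[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum / actual.length);
    }

    /** Predictions of a baseline that always outputs the same value. */
    static double[] constantPredictions(double value, int n) {
        double[] result = new double[n];
        Arrays.fill(result, value);
        return result;
    }

    /** Scheme MAE as a percentage of the constant baseline's MAE. */
    static double relativeAbsoluteError(double[] actual, double[] predicted, double priorMean) {
        double[] prior = constantPredictions(priorMean, actual.length);
        return 100.0 * meanAbsoluteError(actual, predicted) / meanAbsoluteError(actual, prior);
    }

    /** Scheme RMSE as a percentage of the constant baseline's RMSE. */
    static double rootRelativeSquaredError(double[] actual, double[] predicted, double priorMean) {
        double[] prior = constantPredictions(priorMean, actual.length);
        return 100.0 * rootMeanSquaredError(actual, predicted) / rootMeanSquaredError(actual, prior);
    }
}
```

        A relative error below 100 then means the scheme does better than the constant baseline.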
      • priorEntropy

        public final double priorEntropy()
                                  throws java.lang.Exception
        Calculates the entropy of the prior distribution.
        Returns:
        the entropy of the prior distribution
        Throws:
        java.lang.Exception - if the class is not nominal
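        As a reminder of what the entropy of a class distribution means, the sketch below computes it in bits from per-class counts. The class PriorEntropySketch and the use of raw, unsmoothed counts are assumptions; the library may estimate the priors differently.

```java
final class PriorEntropySketch {
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /** Entropy in bits of the distribution implied by the given class counts. */
    static double priorEntropy(double[] classCounts) {
        double total = 0.0;
        for (double count : classCounts) {
            total += count;
        }
        double entropy = 0.0;
        for (double count : classCounts) {
            if (count > 0.0) {
                double p = count / total;
                entropy -= p * log2(p);
            }
        }
        return entropy;
    }
}
```

        Counts of 2, 1 and 1 give probabilities 0.5, 0.25 and 0.25, and therefore an entropy of 1.5 bits.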
      • KBInformation

        public final double KBInformation()
                                   throws java.lang.Exception
        Returns the total Kononenko & Bratko Information score in bits.
        Returns:
        the K&B information score
        Throws:
        java.lang.Exception - if the class is not nominal
      • KBMeanInformation

        public final double KBMeanInformation()
                                       throws java.lang.Exception
        Returns the Kononenko & Bratko Information score in bits per instance.
        Returns:
        the K&B information score per instance
        Throws:
        java.lang.Exception - if the class is not nominal
      • KBRelativeInformation

        public final double KBRelativeInformation()
                                           throws java.lang.Exception
        Returns the Kononenko & Bratko Relative Information score.
        Returns:
        the K&B relative information score
        Throws:
        java.lang.Exception - if the class is not nominal
      • SFPriorEntropy

        public final double SFPriorEntropy()
        Returns the total entropy for the null model.
        Returns:
        the total null model entropy
      • SFMeanPriorEntropy

        public final double SFMeanPriorEntropy()
        Returns the entropy per instance for the null model.
        Returns:
        the null model entropy per instance
      • SFSchemeEntropy

        public final double SFSchemeEntropy()
        Returns the total entropy for the scheme.
        Returns:
        the total scheme entropy
      • SFMeanSchemeEntropy

        public final double SFMeanSchemeEntropy()
        Returns the entropy per instance for the scheme.
        Returns:
        the scheme entropy per instance
      • SFEntropyGain

        public final double SFEntropyGain()
        Returns the total SF, which is the null model entropy minus the scheme entropy.
        Returns:
        the total SF
      • SFMeanEntropyGain

        public final double SFMeanEntropyGain()
        Returns the SF per instance, which is the null model entropy minus the scheme entropy, per instance.
        Returns:
        the SF per instance
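        The "null model entropy minus scheme entropy" relationship can be sketched as follows, under the assumption that each entropy is the total number of bits, -log2(p), needed to encode the actual class of every test instance, where p is the probability the model assigned to that class. The class EntropyGainSketch is hypothetical, instance weights are ignored, and any clamping of very small probabilities is not modelled.

```java
final class EntropyGainSketch {
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /** Total bits to encode the actual classes, given the probability assigned to each. */
    static double totalEntropy(double[] probOfActualClass) {
        double bits = 0.0;
        for (double p : probOfActualClass) {
            bits -= log2(p);
        }
        return bits;
    }

    /** Null model entropy minus scheme entropy, in bits. */
    static double entropyGain(double[] priorProbOfActual, double[] schemeProbOfActual) {
        return totalEntropy(priorProbOfActual) - totalEntropy(schemeProbOfActual);
    }

    /** The gain divided by the number of instances. */
    static double meanEntropyGain(double[] priorProbOfActual, double[] schemeProbOfActual) {
        return entropyGain(priorProbOfActual, schemeProbOfActual) / priorProbOfActual.length;
    }
}
```

        A positive gain means the scheme assigns more probability to the true classes than the null model does.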
      • toCumulativeMarginDistributionString

        public java.lang.String toCumulativeMarginDistributionString()
                                                              throws java.lang.Exception
        Outputs the cumulative margin distribution as a string suitable as input for gnuplot or a similar package.
        Returns:
        the cumulative margin distribution
        Throws:
        java.lang.Exception - if the class attribute is not nominal
      • toSummaryString

        public java.lang.String toSummaryString()
        Calls toSummaryString() with no title and no complexity statistics.
        Specified by:
        toSummaryString in interface Summarizable
        Returns:
        a summary description of the classifier evaluation
      • toSummaryString

        public java.lang.String toSummaryString​(boolean printComplexityStatistics)
        Calls toSummaryString() with a default title.
        Parameters:
        printComplexityStatistics - if true, complexity statistics are returned as well
        Returns:
        the summary string
      • toSummaryString

        public java.lang.String toSummaryString​(java.lang.String title,
                                                boolean printComplexityStatistics)
        Outputs the performance statistics in summary form. Lists number (and percentage) of instances classified correctly, incorrectly and unclassified. Outputs the total number of instances classified, and the number of instances (if any) that had no class value provided.
        Parameters:
        title - the title for the statistics
        printComplexityStatistics - if true, complexity statistics are returned as well
        Returns:
        the summary as a String
      • toMatrixString

        public java.lang.String toMatrixString()
                                        throws java.lang.Exception
        Calls toMatrixString() with a default title.
        Returns:
        the confusion matrix as a string
        Throws:
        java.lang.Exception - if the class is numeric
      • toMatrixString

        public java.lang.String toMatrixString​(java.lang.String title)
                                        throws java.lang.Exception
        Outputs the performance statistics as a classification confusion matrix. For each class value, shows the distribution of predicted class values.
        Parameters:
        title - the title for the confusion matrix
        Returns:
        the confusion matrix as a String
        Throws:
        java.lang.Exception - if the class is numeric
      • toClassDetailsString

        public java.lang.String toClassDetailsString()
                                              throws java.lang.Exception
        Generates a breakdown of the accuracy for each class (with default title), incorporating various information-retrieval statistics, such as true/false positive rate, precision/recall/F-Measure. Should be useful for ROC curves, recall/precision curves.
        Returns:
        the statistics presented as a string
        Throws:
        java.lang.Exception - if class is not nominal
      • toClassDetailsString

        public java.lang.String toClassDetailsString​(java.lang.String title)
                                              throws java.lang.Exception
        Generates a breakdown of the accuracy for each class, incorporating various information-retrieval statistics, such as true/false positive rate, precision/recall/F-Measure. Should be useful for ROC curves, recall/precision curves.
        Parameters:
        title - the title to prepend the stats string with
        Returns:
        the statistics presented as a string
        Throws:
        java.lang.Exception - if class is not nominal
      • numTruePositives

        public double numTruePositives​(int classIndex)
        Calculate the number of true positives with respect to a particular class. This is defined as

         correctly classified positives
         
        Parameters:
        classIndex - the index of the class to consider as "positive"
        Returns:
        the number of true positives
      • truePositiveRate

        public double truePositiveRate​(int classIndex)
        Calculate the true positive rate with respect to a particular class. This is defined as

         correctly classified positives
         ------------------------------
               total positives
         
        Parameters:
        classIndex - the index of the class to consider as "positive"
        Returns:
        the true positive rate
      • numTrueNegatives

        public double numTrueNegatives​(int classIndex)
        Calculate the number of true negatives with respect to a particular class. This is defined as

         correctly classified negatives
         
        Parameters:
        classIndex - the index of the class to consider as "positive"
        Returns:
        the number of true negatives
      • trueNegativeRate

        public double trueNegativeRate​(int classIndex)
        Calculate the true negative rate with respect to a particular class. This is defined as

         correctly classified negatives
         ------------------------------
               total negatives
         
        Parameters:
        classIndex - the index of the class to consider as "positive"
        Returns:
        the true negative rate
      • numFalsePositives

        public double numFalsePositives​(int classIndex)
        Calculate the number of false positives with respect to a particular class. This is defined as

         incorrectly classified negatives
         
        Parameters:
        classIndex - the index of the class to consider as "positive"
        Returns:
        the number of false positives
      • falsePositiveRate

        public double falsePositiveRate​(int classIndex)
        Calculate the false positive rate with respect to a particular class. This is defined as

         incorrectly classified negatives
         --------------------------------
                total negatives
         
        Parameters:
        classIndex - the index of the class to consider as "positive"
        Returns:
        the false positive rate
      • numFalseNegatives

        public double numFalseNegatives​(int classIndex)
        Calculate the number of false negatives with respect to a particular class. This is defined as

         incorrectly classified positives
         
        Parameters:
        classIndex - the index of the class to consider as "positive"
        Returns:
        the number of false negatives
      • falseNegativeRate

        public double falseNegativeRate​(int classIndex)
        Calculate the false negative rate with respect to a particular class. This is defined as

         incorrectly classified positives
         --------------------------------
                total positives
         
        Parameters:
        classIndex - the index of the class to consider as "positive"
        Returns:
        the false negative rate
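        The four counts and the rates derived from them can be read off a confusion matrix once one class is treated as "positive". The sketch below is illustrative only: the class ConfusionCountsSketch is hypothetical, rows are assumed to hold actual classes and columns predicted classes, and a rate with a zero denominator is assumed to be reported as 0.

```java
final class ConfusionCountsSketch {
    /** Positives predicted as positive. */
    static double truePositives(double[][] m, int c) {
        return m[c][c];
    }

    /** Positives predicted as some other class. */
    static double falseNegatives(double[][] m, int c) {
        double sum = 0.0;
        for (int j = 0; j < m.length; j++) {
            if (j != c) sum += m[c][j];
        }
        return sum;
    }

    /** Negatives predicted as the positive class. */
    static double falsePositives(double[][] m, int c) {
        double sum = 0.0;
        for (int i = 0; i < m.length; i++) {
            if (i != c) sum += m[i][c];
        }
        return sum;
    }

    /** Negatives predicted as any negative class. */
    static double trueNegatives(double[][] m, int c) {
        double sum = 0.0;
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m.length; j++) {
                if (i != c && j != c) sum += m[i][j];
            }
        }
        return sum;
    }

    private static double ratio(double part, double rest) {
        double total = part + rest;
        return total == 0.0 ? 0.0 : part / total;
    }

    static double truePositiveRate(double[][] m, int c) {
        return ratio(truePositives(m, c), falseNegatives(m, c));
    }

    static double falsePositiveRate(double[][] m, int c) {
        return ratio(falsePositives(m, c), trueNegatives(m, c));
    }

    static double trueNegativeRate(double[][] m, int c) {
        return ratio(trueNegatives(m, c), falsePositives(m, c));
    }

    static double falseNegativeRate(double[][] m, int c) {
        return ratio(falseNegatives(m, c), truePositives(m, c));
    }
}
```

        Note that the true positive and false negative rates sum to 1, as do the true negative and false positive rates, whenever their denominators are nonzero.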
      • recall

        public double recall​(int classIndex)
        Calculate the recall with respect to a particular class. This is defined as

         correctly classified positives
         ------------------------------
               total positives
         

        (Which is also the same as the truePositiveRate.)

        Parameters:
        classIndex - the index of the class to consider as "positive"
        Returns:
        the recall
      • precision

        public double precision​(int classIndex)
        Calculate the precision with respect to a particular class. This is defined as

         correctly classified positives
         ------------------------------
          total predicted as positive
         
        Parameters:
        classIndex - the index of the class to consider as "positive"
        Returns:
        the precision
      • fMeasure

        public double fMeasure​(int classIndex)
        Calculate the F-Measure with respect to a particular class. This is defined as

         2 * recall * precision
         ----------------------
           recall + precision
         
        Parameters:
        classIndex - the index of the class to consider as "positive"
        Returns:
        the F-Measure
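        The definitions of recall, precision and F-Measure given above can be combined in a small sketch. It assumes a confusion matrix indexed as m[actual][predicted] and returns 0 when a denominator is zero; both choices, like the class name PrecisionRecallSketch, are assumptions for illustration.

```java
final class PrecisionRecallSketch {
    /** Correctly classified positives over total positives (row sum). */
    static double recall(double[][] m, int c) {
        double actualPositives = 0.0;
        for (int j = 0; j < m.length; j++) {
            actualPositives += m[c][j];
        }
        return actualPositives == 0.0 ? 0.0 : m[c][c] / actualPositives;
    }

    /** Correctly classified positives over total predicted as positive (column sum). */
    static double precision(double[][] m, int c) {
        double predictedPositives = 0.0;
        for (int i = 0; i < m.length; i++) {
            predictedPositives += m[i][c];
        }
        return predictedPositives == 0.0 ? 0.0 : m[c][c] / predictedPositives;
    }

    /** 2 * recall * precision / (recall + precision). */
    static double fMeasure(double[][] m, int c) {
        double r = recall(m, c);
        double p = precision(m, c);
        return (r + p) == 0.0 ? 0.0 : 2.0 * r * p / (r + p);
    }
}
```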
      • setPriors

        public void setPriors​(Instances train)
                       throws java.lang.Exception
        Sets the class prior probabilities.
        Parameters:
        train - the training instances used to determine the prior probabilities
        Throws:
        java.lang.Exception - if the class attribute of the instances is not set
      • updatePriors

        public void updatePriors​(Instance instance)
                          throws java.lang.Exception
        Updates the class prior probabilities (when training incrementally).
        Parameters:
        instance - the new training instance seen
        Throws:
        java.lang.Exception - if the class of the instance is not set
      • useNoPriors

        public void useNoPriors()
        Disables the use of priors, e.g., for deserialized schemes that have no access to the original training set but are evaluated on a test set.
      • equals

        public boolean equals​(java.lang.Object obj)
        Tests whether the current evaluation object is equal to another evaluation object.
        Overrides:
        equals in class java.lang.Object
        Parameters:
        obj - the object to compare against
        Returns:
        true if the two objects are equal
      • printClassifications

        protected static java.lang.String printClassifications​(Classifier classifier,
                                                               Instances train,
                                                               java.lang.String testFileName,
                                                               int classIndex,
                                                               Range attributesToOutput)
                                                        throws java.lang.Exception
        Prints the predictions for the given dataset into a String variable.
        Parameters:
        classifier - the classifier to use
        train - the training data
        testFileName - the name of the test file
        classIndex - the class index
        attributesToOutput - the indices of the attributes to output
        Returns:
        the generated predictions for the attribute range
        Throws:
        java.lang.Exception - if test file cannot be opened
      • attributeValuesString

        protected static java.lang.String attributeValuesString​(Instance instance,
                                                                Range attRange)
        Builds a string listing the attribute values in a specified range of indices, separated by commas and enclosed in brackets.
        Parameters:
        instance - the instance to print the values from
        attRange - the range of the attributes to list
        Returns:
        a string listing values of the attributes in the range
      • makeOptionString

        protected static java.lang.String makeOptionString​(Classifier classifier)
        Builds the help string listing all the command-line options.
        Parameters:
        classifier - the classifier to include options for
        Returns:
        a string detailing the valid command line options
      • num2ShortID

        protected java.lang.String num2ShortID​(int num,
                                               char[] IDChars,
                                               int IDWidth)
        Method for generating indices for the confusion matrix.
        Parameters:
        num - integer to format
        IDChars - the characters to use
        IDWidth - the width of the entry
        Returns:
        the formatted integer as a string
      • makeDistribution

        protected double[] makeDistribution​(double predictedClass)
        Converts a single prediction into a probability distribution with all zero probabilities except the predicted value, which has probability 1.0.
        Parameters:
        predictedClass - the index of the predicted class
        Returns:
        the probability distribution
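        The conversion described above amounts to a one-hot encoding of the predicted class index. A minimal sketch follows; the class OneHotSketch and the explicit numClasses parameter are hypothetical, and returning an all-zero array for a NaN prediction is an assumption about how a missing prediction might be handled.

```java
final class OneHotSketch {
    /** Returns an array of zeros with 1.0 at the predicted class index. */
    static double[] makeDistribution(double predictedClass, int numClasses) {
        double[] result = new double[numClasses];
        if (Double.isNaN(predictedClass)) {
            // Assumed handling: no prediction leaves every probability at zero.
            return result;
        }
        result[(int) predictedClass] = 1.0;
        return result;
    }
}
```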
      • updateStatsForClassifier

        protected void updateStatsForClassifier​(double[] predictedDistribution,
                                                Instance instance)
                                         throws java.lang.Exception
        Updates all the statistics about a classifier's performance for the current test instance.
        Parameters:
        predictedDistribution - the probabilities assigned to each class
        instance - the instance to be classified
        Throws:
        java.lang.Exception - if the class of the instance is not set
      • updateStatsForPredictor

        protected void updateStatsForPredictor​(double predictedValue,
                                               Instance instance)
                                        throws java.lang.Exception
        Updates all the statistics about a predictor's performance for the current test instance.
        Parameters:
        predictedValue - the numeric value the classifier predicts
        instance - the instance to be classified
        Throws:
        java.lang.Exception - if the class of the instance is not set
      • updateMargins

        protected void updateMargins​(double[] predictedDistribution,
                                     int actualClass,
                                     double weight)
        Updates the cumulative record of classification margins.
        Parameters:
        predictedDistribution - the probability distribution predicted for the current instance
        actualClass - the index of the actual instance class
        weight - the weight assigned to the instance
      • updateNumericScores

        protected void updateNumericScores​(double[] predicted,
                                           double[] actual,
                                           double weight)
        Updates the numeric accuracy measures. For numeric classes, the error is measured between the actual and predicted class values. For nominal classes, it is measured between the actual and predicted class probabilities.
        Parameters:
        predicted - the predicted values
        actual - the actual values
        weight - the weight associated with this prediction
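        For a nominal class, comparing predicted and actual class probabilities could look like the sketch below, where the actual distribution puts probability 1.0 on the true class. Averaging the per-class differences over the number of classes is an assumption, as is the class name ProbabilityErrorSketch; the sketch shows the contribution of a single instance before any weighting.

```java
final class ProbabilityErrorSketch {
    /** Mean over classes of |predicted[j] - actual[j]| for one instance. */
    static double absoluteError(double[] predicted, double[] actual) {
        double sum = 0.0;
        for (int j = 0; j < predicted.length; j++) {
            sum += Math.abs(predicted[j] - actual[j]);
        }
        return sum / predicted.length;
    }

    /** Mean over classes of (predicted[j] - actual[j])^2 for one instance. */
    static double squaredError(double[] predicted, double[] actual) {
        double sum = 0.0;
        for (int j = 0; j < predicted.length; j++) {
            double diff = predicted[j] - actual[j];
            sum += diff * diff;
        }
        return sum / predicted.length;
    }
}
```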
      • addNumericTrainClass

        protected void addNumericTrainClass​(double classValue,
                                            double weight)
        Adds a numeric (non-missing) training class value and weight to the buffer of stored values.
        Parameters:
        classValue - the class value
        weight - the instance weight
      • setNumericPriorsFromBuffer

        protected void setNumericPriorsFromBuffer()
        Sets up the priors for numeric class attributes from the training class values that have been seen so far.