Class Instances
- java.lang.Object
-
- org.processmining.plugins.workshop.Yaguang.WekaDiscriminationTree.Instances
-
- All Implemented Interfaces:
java.io.Serializable
public class Instances extends java.lang.Object implements java.io.Serializable
Class for handling an ordered set of weighted instances. Typical usage (code from the main() method of this class):
...
// Read all the instances in the file
reader = new FileReader(filename);
instances = new Instances(reader);
// Make the last attribute be the class
instances.setClassIndex(instances.numAttributes() - 1);
// Print header and instances.
System.out.println("\nDataset:\n");
System.out.println(instances);
...
All methods that change a set of instances are safe, i.e. a change to one set of instances does not affect any other set of instances. All methods that change a dataset's attribute information clone the dataset before it is changed.
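The safety guarantee above can be pictured with a small standalone sketch. It assumes, hypothetically, that two datasets share one attribute list by reference and that a mutating call first swaps in a private copy. The class `HeaderCopySketch` and its methods are illustrative names only and are not part of this API.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: models shared header information with copy-before-change. */
final class HeaderCopySketch {
    // Attribute names stand in for the full attribute information.
    private List<String> attributeNames;

    HeaderCopySketch(List<String> attributeNames) {
        this.attributeNames = attributeNames;
    }

    /** Copies only the reference to the header, as a cheap dataset copy would. */
    HeaderCopySketch(HeaderCopySketch other) {
        this.attributeNames = other.attributeNames;
    }

    /** Clones the header first, so datasets sharing the old header are unaffected. */
    void renameAttribute(int index, String newName) {
        attributeNames = new ArrayList<>(attributeNames);
        attributeNames.set(index, newName);
    }

    String attributeName(int index) {
        return attributeNames.get(index);
    }
}
```

Renaming an attribute on the copy leaves the name seen through the original untouched, which is the behaviour the paragraph above describes.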
- Version:
- $Revision: 1.58.2.6 $
- Author:
- Eibe Frank (eibe@cs.waikato.ac.nz), Len Trigg (trigg@cs.waikato.ac.nz)
- See Also:
- Serialized Form
-
-
Field Summary
Fields
Modifier and Type | Field | Description
static java.lang.String | FILE_EXTENSION | The filename extension that should be used for arff files
protected FastVector | m_Attributes | The attribute information.
protected int | m_ClassIndex | The class attribute's index
protected int[] | m_IndicesBuffer | Buffer of indices for sparse instance
protected FastVector | m_Instances | The instances.
protected java.lang.String | m_RelationName | The dataset's name.
protected double[] | m_ValueBuffer | Buffer of values for sparse instance
static java.lang.String | SERIALIZED_OBJ_FILE_EXTENSION | The filename extension that should be used for binary serialized instances files
-
Constructor Summary
Constructors
Constructor | Description
Instances(java.io.Reader reader) | Reads an ARFF file from a reader, and assigns a weight of one to each instance.
Instances(java.io.Reader reader, int capacity) | Reads the header of an ARFF file from a reader and reserves space for the given number of instances.
Instances(java.lang.String name, FastVector attInfo, int capacity) | Creates an empty set of instances.
Instances(Instances dataset) | Constructor copying all instances and references to the header information from the given set of instances.
Instances(Instances dataset, int capacity) | Constructor creating an empty set of instances.
Instances(Instances source, int first, int toCopy) | Creates a new set of instances by copying a subset of another set.
-
Method Summary
Modifier and Type | Method | Description
void | add(Instance instance) | Adds one instance to the end of the set.
Attribute | attribute(int index) | Returns an attribute.
Attribute | attribute(java.lang.String name) | Returns an attribute given its name.
AttributeStats | attributeStats(int index) | Calculates summary statistics on the values that appear in this set of instances for a specified attribute.
double[] | attributeToDoubleArray(int index) | Gets the value of all instances in this dataset for a particular attribute.
boolean | checkForStringAttributes() | Checks for string attributes in the dataset
boolean | checkInstance(Instance instance) | Checks if the given instance is compatible with this dataset.
Attribute | classAttribute() | Returns the class attribute.
int | classIndex() | Returns the class attribute's index.
void | compactify() | Compactifies the set of instances.
protected void | copyInstances(int from, Instances dest, int num) | Copies instances from one set to the end of another one.
void | delete() | Removes all instances from the set.
void | delete(int index) | Removes an instance at the given position from the set.
void | deleteAttributeAt(int position) | Deletes an attribute at the given position (0 to numAttributes() - 1).
void | deleteStringAttributes() | Deletes all string attributes in the dataset.
void | deleteWithMissing(int attIndex) | Removes all instances with missing values for a particular attribute from the dataset.
void | deleteWithMissing(Attribute att) | Removes all instances with missing values for a particular attribute from the dataset.
void | deleteWithMissingClass() | Removes all instances with a missing class value from the dataset.
java.util.Enumeration | enumerateAttributes() | Returns an enumeration of all the attributes.
java.util.Enumeration | enumerateInstances() | Returns an enumeration of all instances in the dataset.
boolean | equalHeaders(Instances dataset) | Checks if two headers are equivalent.
protected void | errms(java.io.StreamTokenizer tokenizer, java.lang.String theMsg) | Throws error message with line number and last token read.
Instance | firstInstance() | Returns the first instance in the set.
protected void | freshAttributeInfo() | Replaces the attribute information by a clone of itself.
protected void | getFirstToken(java.io.StreamTokenizer tokenizer) | Gets next token, skipping empty lines.
protected void | getIndex(java.io.StreamTokenizer tokenizer) | Gets index, checking for a premature end of line.
protected boolean | getInstance(java.io.StreamTokenizer tokenizer, boolean flag) | Reads a single instance using the tokenizer and appends it to the dataset.
protected boolean | getInstanceFull(java.io.StreamTokenizer tokenizer, boolean flag) | Reads a single instance using the tokenizer and appends it to the dataset.
protected boolean | getInstanceSparse(java.io.StreamTokenizer tokenizer, boolean flag) | Reads a single instance using the tokenizer and appends it to the dataset.
protected void | getLastToken(java.io.StreamTokenizer tokenizer, boolean endOfFileOk) | Gets token and checks if it's end of line.
protected void | getNextToken(java.io.StreamTokenizer tokenizer) | Gets next token, checking for a premature end of line.
java.util.Random | getRandomNumberGenerator(long seed) | Returns a random number generator.
protected void | initTokenizer(java.io.StreamTokenizer tokenizer) | Initializes the StreamTokenizer used for reading the ARFF file.
void | insertAttributeAt(Attribute att, int position) | Inserts an attribute at the given position (0 to numAttributes()) and sets all values to be missing.
Instance | instance(int index) | Returns the instance at the given position.
protected java.lang.String | instancesAndWeights() | Returns string including all instances, their weights and their indices in the original dataset.
double | kthSmallestValue(int attIndex, int k) | Returns the kth-smallest attribute value of a numeric attribute.
double | kthSmallestValue(Attribute att, int k) | Returns the kth-smallest attribute value of a numeric attribute.
Instance | lastInstance() | Returns the last instance in the set.
static void | main(java.lang.String[] args) | Main method for this class.
double | meanOrMode(int attIndex) | Returns the mean (mode) for a numeric (nominal) attribute as a floating-point value.
double | meanOrMode(Attribute att) | Returns the mean (mode) for a numeric (nominal) attribute as a floating-point value.
static Instances | mergeInstances(Instances first, Instances second) | Merges two sets of Instances together.
int | numAttributes() | Returns the number of attributes.
int | numClasses() | Returns the number of class labels.
int | numDistinctValues(int attIndex) | Returns the number of distinct values of a given attribute.
int | numDistinctValues(Attribute att) | Returns the number of distinct values of a given attribute.
int | numInstances() | Returns the number of instances in the dataset.
protected int | partition(int attIndex, int l, int r) | Partitions the instances around a pivot.
protected void | quickSort(int attIndex, int left, int right) | Implements quicksort according to Manber's "Introduction to Algorithms".
void | randomize(java.util.Random random) | Shuffles the instances in the set so that they are ordered randomly.
protected void | readHeader(java.io.StreamTokenizer tokenizer) | Reads and stores header of an ARFF file.
boolean | readInstance(java.io.Reader reader) | Reads a single instance from the reader and appends it to the dataset.
protected void | readTillEOL(java.io.StreamTokenizer tokenizer) | Reads and skips all tokens before next end of line token.
java.lang.String | relationName() | Returns the relation's name.
void | renameAttribute(int att, java.lang.String name) | Renames an attribute.
void | renameAttribute(Attribute att, java.lang.String name) | Renames an attribute.
void | renameAttributeValue(int att, int val, java.lang.String name) | Renames the value of a nominal (or string) attribute value.
void | renameAttributeValue(Attribute att, java.lang.String val, java.lang.String name) | Renames the value of a nominal (or string) attribute value.
Instances | resample(java.util.Random random) | Creates a new dataset of the same size using random sampling with replacement.
Instances | resampleWithWeights(java.util.Random random) | Creates a new dataset of the same size using random sampling with replacement according to the current instance weights.
Instances | resampleWithWeights(java.util.Random random, double[] weights) | Creates a new dataset of the same size using random sampling with replacement according to the given weight vector.
protected int | select(int attIndex, int left, int right, int k) | Implements computation of the kth-smallest element according to Manber's "Introduction to Algorithms".
void | setClass(Attribute att) | Sets the class attribute.
void | setClassIndex(int classIndex) | Sets the class index of the set.
void | setRelationName(java.lang.String newName) | Sets the relation's name.
void | sort(int attIndex) | Sorts the instances based on an attribute.
void | sort(Attribute att) | Sorts the instances based on an attribute.
void | stratify(int numFolds) | Stratifies a set of instances according to its class values if the class attribute is nominal (so that afterwards a stratified cross-validation can be performed).
protected void | stratStep(int numFolds) | Help function needed for stratification of set.
Instances | stringFreeStructure() | Create a copy of the structure, but "cleanse" string types (i.e. doesn't contain references to the strings seen in the past).
double | sumOfWeights() | Computes the sum of all the instances' weights.
void | swap(int i, int j) | Swaps two instances in the set.
static void | test(java.lang.String[] argv) | Method for testing this class.
Instances | testCV(int numFolds, int numFold) | Creates the test set for one fold of a cross-validation on the dataset.
java.lang.String | toString() | Returns the dataset as a string in ARFF format.
java.lang.String | toSummaryString() | Generates a string summarizing the set of instances.
Instances | trainCV(int numFolds, int numFold) | Creates the training set for one fold of a cross-validation on the dataset.
Instances | trainCV(int numFolds, int numFold, java.util.Random random) | Creates the training set for one fold of a cross-validation on the dataset.
double | variance(int attIndex) | Computes the variance for a numeric attribute.
double | variance(Attribute att) | Computes the variance for a numeric attribute.
-
-
-
Field Detail
-
FILE_EXTENSION
public static java.lang.String FILE_EXTENSION
The filename extension that should be used for arff files
-
SERIALIZED_OBJ_FILE_EXTENSION
public static java.lang.String SERIALIZED_OBJ_FILE_EXTENSION
The filename extension that should be used for binary serialized instances files
-
m_RelationName
protected java.lang.String m_RelationName
The dataset's name.
-
m_Attributes
protected FastVector m_Attributes
The attribute information.
-
m_Instances
protected FastVector m_Instances
The instances.
-
m_ClassIndex
protected int m_ClassIndex
The class attribute's index
-
m_ValueBuffer
protected double[] m_ValueBuffer
Buffer of values for sparse instance
-
m_IndicesBuffer
protected int[] m_IndicesBuffer
Buffer of indices for sparse instance
-
-
Constructor Detail
-
Instances
public Instances(java.io.Reader reader) throws java.io.IOException
Reads an ARFF file from a reader, and assigns a weight of one to each instance. Lets the index of the class attribute be undefined (negative).
- Parameters:
reader - the reader
- Throws:
java.io.IOException- if the ARFF file is not read successfully
-
Instances
public Instances(java.io.Reader reader, int capacity) throws java.io.IOException
Reads the header of an ARFF file from a reader and reserves space for the given number of instances. Lets the class index be undefined (negative).
- Parameters:
reader - the reader
capacity - the capacity
- Throws:
java.lang.IllegalArgumentException - if the header is not read successfully or the capacity is negative.
java.io.IOException - if there is a problem with the reader.
-
Instances
public Instances(Instances dataset)
Constructor copying all instances and references to the header information from the given set of instances.
- Parameters:
dataset - the set to be copied
-
Instances
public Instances(Instances dataset, int capacity)
Constructor creating an empty set of instances. Copies references to the header information from the given set of instances. Sets the capacity of the set of instances to 0 if it is negative.
- Parameters:
dataset - the instances from which the header information is to be taken
capacity - the capacity of the new dataset
-
Instances
public Instances(Instances source, int first, int toCopy)
Creates a new set of instances by copying a subset of another set.
- Parameters:
source - the set of instances from which a subset is to be created
first - the index of the first instance to be copied
toCopy - the number of instances to be copied
- Throws:
java.lang.IllegalArgumentException- if first and toCopy are out of range
-
Instances
public Instances(java.lang.String name, FastVector attInfo, int capacity)
Creates an empty set of instances. Uses the given attribute information. Sets the capacity of the set of instances to 0 if it is negative. Given attribute information must not be changed after this constructor has been used.
- Parameters:
name - the name of the relation
attInfo - the attribute information
capacity - the capacity of the set
-
-
Method Detail
-
stringFreeStructure
public Instances stringFreeStructure()
Create a copy of the structure, but "cleanse" string types (i.e. doesn't contain references to the strings seen in the past).- Returns:
- a copy of the instance structure.
-
add
public void add(Instance instance)
Adds one instance to the end of the set. Shallow copies instance before it is added. Increases the size of the dataset if it is not large enough. Does not check if the instance is compatible with the dataset. Note: String values are not transferred.- Parameters:
instance- the instance to be added
-
attribute
public Attribute attribute(int index)
Returns an attribute.- Parameters:
index- the attribute's index- Returns:
- the attribute at the given position
-
attribute
public Attribute attribute(java.lang.String name)
Returns an attribute given its name. If there is more than one attribute with the same name, it returns the first one. Returns null if the attribute can't be found.- Parameters:
name- the attribute's name- Returns:
- the attribute with the given name, null if the attribute can't be found
-
checkForStringAttributes
public boolean checkForStringAttributes()
Checks for string attributes in the dataset- Returns:
- true if string attributes are present, false otherwise
-
checkInstance
public boolean checkInstance(Instance instance)
Checks if the given instance is compatible with this dataset. Only looks at the size of the instance and the ranges of the values for nominal and string attributes.- Returns:
- true if the instance is compatible with the dataset
-
classAttribute
public Attribute classAttribute()
Returns the class attribute.- Returns:
- the class attribute
- Throws:
UnassignedClassException- if the class is not set
-
classIndex
public int classIndex()
Returns the class attribute's index. Returns negative number if it's undefined.- Returns:
- the class index as an integer
-
compactify
public void compactify()
Compactifies the set of instances. Decreases the capacity of the set so that it matches the number of instances in the set.
-
delete
public void delete()
Removes all instances from the set.
-
delete
public void delete(int index)
Removes an instance at the given position from the set.- Parameters:
index- the instance's position
-
deleteAttributeAt
public void deleteAttributeAt(int position)
Deletes an attribute at the given position (0 to numAttributes() - 1). A deep copy of the attribute information is performed before the attribute is deleted.
- Parameters:
position - the attribute's position
- Throws:
java.lang.IllegalArgumentException- if the given index is out of range or the class attribute is being deleted
-
deleteStringAttributes
public void deleteStringAttributes()
Deletes all string attributes in the dataset. A deep copy of the attribute information is performed before an attribute is deleted.- Throws:
java.lang.IllegalArgumentException- if string attribute couldn't be successfully deleted (probably because it is the class attribute).
-
deleteWithMissing
public void deleteWithMissing(int attIndex)
Removes all instances with missing values for a particular attribute from the dataset.- Parameters:
attIndex- the attribute's index
-
deleteWithMissing
public void deleteWithMissing(Attribute att)
Removes all instances with missing values for a particular attribute from the dataset.- Parameters:
att- the attribute
-
deleteWithMissingClass
public void deleteWithMissingClass()
Removes all instances with a missing class value from the dataset.- Throws:
UnassignedClassException- if class is not set
-
enumerateAttributes
public java.util.Enumeration enumerateAttributes()
Returns an enumeration of all the attributes.- Returns:
- enumeration of all the attributes.
-
enumerateInstances
public java.util.Enumeration enumerateInstances()
Returns an enumeration of all instances in the dataset.- Returns:
- enumeration of all instances in the dataset
-
equalHeaders
public boolean equalHeaders(Instances dataset)
Checks if two headers are equivalent.- Parameters:
dataset- another dataset- Returns:
- true if the header of the given dataset is equivalent to this header
-
firstInstance
public Instance firstInstance()
Returns the first instance in the set.- Returns:
- the first instance in the set
-
getRandomNumberGenerator
public java.util.Random getRandomNumberGenerator(long seed)
Returns a random number generator. The initial seed of the random number generator depends on the given seed and the hash code of a string representation of an instance chosen based on the given seed.
- Parameters:
seed - the given seed
- Returns:
- the random number generator
-
insertAttributeAt
public void insertAttributeAt(Attribute att, int position)
Inserts an attribute at the given position (0 to numAttributes()) and sets all values to be missing. Shallow copies the attribute before it is inserted, and performs a deep copy of the existing attribute information.
- Parameters:
att - the attribute to be inserted
position - the attribute's position
- Throws:
java.lang.IllegalArgumentException- if the given index is out of range
-
instance
public Instance instance(int index)
Returns the instance at the given position.- Parameters:
index- the instance's index- Returns:
- the instance at the given position
-
kthSmallestValue
public double kthSmallestValue(Attribute att, int k)
Returns the kth-smallest attribute value of a numeric attribute. Note that calling this method will change the order of the data!
- Parameters:
att - the Attribute object
k - the value of k
- Returns:
- the kth-smallest value
-
kthSmallestValue
public double kthSmallestValue(int attIndex, int k)
Returns the kth-smallest attribute value of a numeric attribute. Note that calling this method will change the order of the data! The number of non-missing values in the data must be at least as large as k for this to work.
- Parameters:
attIndex - the attribute's index
k - the value of k
- Returns:
- the kth-smallest value
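The summary table attributes the selection to Manber's algorithm, and the note about changing the order of the data follows from selecting in place. The standalone sketch below shows that idea on a plain array. It is not this class's code: the name `KthSmallestSketch`, the Lomuto-style partition, the 1-based `k`, and the absence of missing values are all assumptions made for illustration.

```java
/** Illustrative quickselect on a plain array; reorders the array in place. */
final class KthSmallestSketch {

    /** Returns the k-th smallest value, with k counted from 1 (an assumption). */
    static double kthSmallest(double[] values, int k) {
        if (k < 1 || k > values.length) {
            throw new IllegalArgumentException("k out of range: " + k);
        }
        int lo = 0;
        int hi = values.length - 1;
        int target = k - 1;
        while (true) {
            int p = partition(values, lo, hi);
            if (p == target) {
                return values[p];
            }
            if (p < target) {
                lo = p + 1;   // the answer lies to the right of the pivot
            } else {
                hi = p - 1;   // the answer lies to the left of the pivot
            }
        }
    }

    /** Moves values smaller than the pivot (last element) to the front. */
    private static int partition(double[] a, int lo, int hi) {
        double pivot = a[hi];
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (a[i] < pivot) {
                swap(a, i, store);
                store++;
            }
        }
        swap(a, store, hi);
        return store;
    }

    private static void swap(double[] a, int i, int j) {
        double tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

Because the array is partitioned in place, callers that need the original order would have to work on a copy.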
-
lastInstance
public Instance lastInstance()
Returns the last instance in the set.- Returns:
- the last instance in the set
-
meanOrMode
public double meanOrMode(int attIndex)
Returns the mean (mode) for a numeric (nominal) attribute as a floating-point value. Returns 0 if the attribute is neither nominal nor numeric. If all values are missing it returns zero.- Parameters:
attIndex- the attribute's index- Returns:
- the mean or the mode
-
meanOrMode
public double meanOrMode(Attribute att)
Returns the mean (mode) for a numeric (nominal) attribute as a floating-point value. Returns 0 if the attribute is neither nominal nor numeric. If all values are missing it returns zero.- Parameters:
att- the attribute- Returns:
- the mean or the mode
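As a rough illustration of the two cases described above, the following standalone sketch computes a mean for numeric values and a mode for nominal values. It assumes missing values are encoded as NaN, nominal values are stored as label indices, and every instance has weight one. The class `MeanOrModeSketch` is a hypothetical name, and the real method may handle weights and ties differently.

```java
/** Illustrative only: unweighted mean and mode with NaN as the missing marker. */
final class MeanOrModeSketch {

    /** Mean of the non-missing values; 0 if every value is missing. */
    static double mean(double[] values) {
        double sum = 0.0;
        int count = 0;
        for (double v : values) {
            if (!Double.isNaN(v)) {
                sum += v;
                count++;
            }
        }
        return count == 0 ? 0.0 : sum / count;
    }

    /**
     * Index of the most frequent label, returned as a double.
     * Ties go to the lowest index; 0 is returned if every value is missing.
     */
    static double mode(double[] labelIndices, int numLabels) {
        int[] counts = new int[numLabels];
        for (double v : labelIndices) {
            if (!Double.isNaN(v)) {
                counts[(int) v]++;
            }
        }
        int best = 0;
        for (int i = 1; i < numLabels; i++) {
            if (counts[i] > counts[best]) {
                best = i;
            }
        }
        return best;
    }
}
```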
-
numAttributes
public int numAttributes()
Returns the number of attributes.- Returns:
- the number of attributes as an integer
-
numClasses
public int numClasses()
Returns the number of class labels.- Returns:
- the number of class labels as an integer if the class attribute is nominal, 1 otherwise.
- Throws:
UnassignedClassException- if the class is not set
-
numDistinctValues
public int numDistinctValues(int attIndex)
Returns the number of distinct values of a given attribute. Returns the number of instances if the attribute is a string attribute. The value 'missing' is not counted.- Parameters:
attIndex- the attribute- Returns:
- the number of distinct values of a given attribute
-
numDistinctValues
public int numDistinctValues(Attribute att)
Returns the number of distinct values of a given attribute. Returns the number of instances if the attribute is a string attribute. The value 'missing' is not counted.- Parameters:
att- the attribute- Returns:
- the number of distinct values of a given attribute
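The rule that 'missing' is not counted can be sketched on a plain array. This standalone example assumes missing values are encoded as NaN; `DistinctValuesSketch` is a hypothetical name, and the string-attribute special case described above is not modelled.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative only: counts distinct non-missing values in one column. */
final class DistinctValuesSketch {

    static int countDistinct(double[] values) {
        Set<Double> seen = new HashSet<>();
        for (double v : values) {
            if (!Double.isNaN(v)) {   // missing values are skipped
                seen.add(v);
            }
        }
        return seen.size();
    }
}
```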
-
numInstances
public int numInstances()
Returns the number of instances in the dataset.- Returns:
- the number of instances in the dataset as an integer
-
randomize
public void randomize(java.util.Random random)
Shuffles the instances in the set so that they are ordered randomly.- Parameters:
random- a random number generator
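A common way to shuffle with a caller-supplied generator is the Fisher-Yates shuffle, sketched below on an int array. Whether randomize uses exactly this loop is an assumption; `ShuffleSketch` is a hypothetical name used only for illustration.

```java
import java.util.Random;

/** Illustrative Fisher-Yates shuffle driven by a caller-supplied generator. */
final class ShuffleSketch {

    static void shuffle(int[] items, Random random) {
        // Walk from the end; swap each position with a random earlier-or-equal one.
        for (int j = items.length - 1; j > 0; j--) {
            int i = random.nextInt(j + 1);
            int tmp = items[i];
            items[i] = items[j];
            items[j] = tmp;
        }
    }
}
```

Passing generators created with the same seed reproduces the same order, which is useful for repeatable experiments.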
-
readInstance
public boolean readInstance(java.io.Reader reader) throws java.io.IOException
Reads a single instance from the reader and appends it to the dataset. Automatically expands the dataset if it is not large enough to hold the instance. This method does not check for carriage return at the end of the line.
- Parameters:
reader - the reader
- Returns:
- false if end of file has been reached
- Throws:
java.io.IOException- if the information is not read successfully
-
relationName
public java.lang.String relationName()
Returns the relation's name.- Returns:
- the relation's name as a string
-
renameAttribute
public void renameAttribute(int att, java.lang.String name)
Renames an attribute. This change only affects this dataset.
- Parameters:
att - the attribute's index
name - the new name
-
renameAttribute
public void renameAttribute(Attribute att, java.lang.String name)
Renames an attribute. This change only affects this dataset.
- Parameters:
att - the attribute
name - the new name
-
renameAttributeValue
public void renameAttributeValue(int att, int val, java.lang.String name)
Renames the value of a nominal (or string) attribute value. This change only affects this dataset.
- Parameters:
att - the attribute's index
val - the value's index
name - the new name
-
renameAttributeValue
public void renameAttributeValue(Attribute att, java.lang.String val, java.lang.String name)
Renames the value of a nominal (or string) attribute value. This change only affects this dataset.
- Parameters:
att - the attribute
val - the value
name - the new name
-
resample
public Instances resample(java.util.Random random)
Creates a new dataset of the same size using random sampling with replacement.- Parameters:
random- a random number generator- Returns:
- the new dataset
-
resampleWithWeights
public Instances resampleWithWeights(java.util.Random random)
Creates a new dataset of the same size using random sampling with replacement according to the current instance weights. The weights of the instances in the new dataset are set to one.- Parameters:
random- a random number generator- Returns:
- the new dataset
-
resampleWithWeights
public Instances resampleWithWeights(java.util.Random random, double[] weights)
Creates a new dataset of the same size using random sampling with replacement according to the given weight vector. The weights of the instances in the new dataset are set to one. The length of the weight vector has to be the same as the number of instances in the dataset, and all weights have to be positive.
- Parameters:
random - a random number generator
weights - the weight vector
- Returns:
- the new dataset
- Throws:
java.lang.IllegalArgumentException- if the weights array is of the wrong length or contains negative weights.
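Sampling with replacement according to a weight vector can be sketched with cumulative sums: draw a uniform number in [0, total weight) and pick the first index whose running total exceeds it. This is a generic technique, not necessarily the one this class uses. The name `WeightedResampleSketch` is hypothetical, and rejecting an all-zero weight vector is an extra assumption of the sketch.

```java
import java.util.Random;

/** Illustrative weighted sampling with replacement; returns chosen indices. */
final class WeightedResampleSketch {

    static int[] sampleIndices(double[] weights, Random random) {
        int n = weights.length;
        double[] cumulative = new double[n];
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            if (weights[i] < 0) {
                throw new IllegalArgumentException("negative weight at index " + i);
            }
            total += weights[i];
            cumulative[i] = total;
        }
        if (n > 0 && total <= 0) {
            throw new IllegalArgumentException("weights sum to zero");
        }
        int[] picked = new int[n];
        for (int s = 0; s < n; s++) {
            double u = random.nextDouble() * total;
            int idx = 0;
            // Advance to the first index whose running total exceeds u.
            while (idx < n - 1 && cumulative[idx] <= u) {
                idx++;
            }
            picked[s] = idx;
        }
        return picked;
    }
}
```

An index with weight zero is never selected, and the result has as many entries as the input, matching the "same size" wording above.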
-
setClass
public void setClass(Attribute att)
Sets the class attribute.- Parameters:
att- attribute to be the class
-
setClassIndex
public void setClassIndex(int classIndex)
Sets the class index of the set. If the class index is negative, there is assumed to be no class (i.e. it is undefined).
- Parameters:
classIndex - the new class index
- Throws:
java.lang.IllegalArgumentException- if the class index is too big or < 0
-
setRelationName
public void setRelationName(java.lang.String newName)
Sets the relation's name.- Parameters:
newName- the new relation name.
-
sort
public void sort(int attIndex)
Sorts the instances based on an attribute. For numeric attributes, instances are sorted in ascending order. For nominal attributes, instances are sorted based on the attribute label ordering specified in the header. Instances with missing values for the attribute are placed at the end of the dataset.- Parameters:
attIndex- the attribute's index
-
sort
public void sort(Attribute att)
Sorts the instances based on an attribute. For numeric attributes, instances are sorted into ascending order. For nominal attributes, instances are sorted based on the attribute label ordering specified in the header. Instances with missing values for the attribute are placed at the end of the dataset.- Parameters:
att- the attribute
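The ordering rules above can be sketched with a comparator. This standalone example assumes each instance is a double[] row, missing values are NaN, and nominal values are stored as indices into the header's label list, so ascending index order matches the header ordering. `SortByAttributeSketch` is a hypothetical name, and the class itself is described elsewhere on this page as using its own quicksort rather than `Arrays.sort`.

```java
import java.util.Arrays;

/** Illustrative only: sorts rows by one column, placing missing values last. */
final class SortByAttributeSketch {

    static void sortRows(double[][] rows, int attIndex) {
        Arrays.sort(rows, (r1, r2) -> {
            boolean missing1 = Double.isNaN(r1[attIndex]);
            boolean missing2 = Double.isNaN(r2[attIndex]);
            if (missing1 || missing2) {
                // false sorts before true, so non-missing rows come first.
                return Boolean.compare(missing1, missing2);
            }
            return Double.compare(r1[attIndex], r2[attIndex]);
        });
    }
}
```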
-
stratify
public void stratify(int numFolds)
Stratifies a set of instances according to its class values if the class attribute is nominal (so that afterwards a stratified cross-validation can be performed).- Parameters:
numFolds- the number of folds in the cross-validation- Throws:
UnassignedClassException- if the class is not set
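One way to prepare data for stratified cross-validation is to group instances by class and then deal them out with a stride of numFolds, so that consecutive blocks contain a similar class mix. The sketch below shows this on an array of class labels. It is an assumed scheme for illustration; `StratifySketch` is a hypothetical name, and the actual stratify and stratStep methods may differ in detail.

```java
import java.util.Arrays;

/** Illustrative only: returns an index order in which classes are interleaved. */
final class StratifySketch {

    static int[] stratifiedOrder(int[] classLabels, int numFolds) {
        if (numFolds < 1) {
            throw new IllegalArgumentException("numFolds must be at least 1");
        }
        int n = classLabels.length;
        // Step 1: group indices by class label (the sort is stable).
        Integer[] grouped = new Integer[n];
        for (int i = 0; i < n; i++) {
            grouped[i] = i;
        }
        Arrays.sort(grouped, (a, b) -> Integer.compare(classLabels[a], classLabels[b]));
        // Step 2: take every numFolds-th element, starting at offsets 0, 1, ...
        int[] order = new int[n];
        int pos = 0;
        for (int start = 0; start < numFolds && pos < n; start++) {
            for (int j = start; j < n; j += numFolds) {
                order[pos++] = grouped[j];
            }
        }
        return order;
    }
}
```

With eight instances, two classes of four each, and two folds, each half of the returned order holds two instances of each class.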
-
sumOfWeights
public double sumOfWeights()
Computes the sum of all the instances' weights.- Returns:
- the sum of all the instances' weights as a double
-
testCV
public Instances testCV(int numFolds, int numFold)
Creates the test set for one fold of a cross-validation on the dataset.
- Parameters:
numFolds - the number of folds in the cross-validation. Must be greater than 1.
numFold - 0 for the first fold, 1 for the second, ...
- Returns:
- the test set as a set of weighted instances
- Throws:
java.lang.IllegalArgumentException- if the number of folds is less than 2 or greater than the number of instances.
-
toString
public java.lang.String toString()
Returns the dataset as a string in ARFF format. Strings are quoted if they contain whitespace characters, or if they are a question mark.
- Overrides:
toString in class java.lang.Object
- Returns:
- the dataset in ARFF format as a string
-
trainCV
public Instances trainCV(int numFolds, int numFold)
Creates the training set for one fold of a cross-validation on the dataset.
- Parameters:
numFolds - the number of folds in the cross-validation. Must be greater than 1.
numFold - 0 for the first fold, 1 for the second, ...
- Returns:
- the training set
- Throws:
java.lang.IllegalArgumentException- if the number of folds is less than 2 or greater than the number of instances.
-
trainCV
public Instances trainCV(int numFolds, int numFold, java.util.Random random)
Creates the training set for one fold of a cross-validation on the dataset. The data is subsequently randomized based on the given random number generator.
- Parameters:
numFolds - the number of folds in the cross-validation. Must be greater than 1.
numFold - 0 for the first fold, 1 for the second, ...
random - the random number generator
- Returns:
- the training set
- Throws:
java.lang.IllegalArgumentException- if the number of folds is less than 2 or greater than the number of instances.
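The testCV and trainCV entries do not spell out how instances are assigned to folds. One plausible scheme, assumed here purely for illustration, uses contiguous blocks in which the first (numInstances mod numFolds) folds receive one extra instance; the training set for a fold is then everything outside its test block. `FoldRangeSketch` and `testRange` are hypothetical names.

```java
/** Illustrative only: computes the contiguous test block for one fold. */
final class FoldRangeSketch {

    /** Returns {firstIndex, size} of the test block for the given fold. */
    static int[] testRange(int numInstances, int numFolds, int numFold) {
        if (numFolds < 2) {
            throw new IllegalArgumentException("Number of folds must be at least 2");
        }
        if (numFolds > numInstances) {
            throw new IllegalArgumentException("More folds than instances");
        }
        if (numFold < 0 || numFold >= numFolds) {
            throw new IllegalArgumentException("Fold index out of range");
        }
        int base = numInstances / numFolds;
        int remainder = numInstances % numFolds;
        // Early folds absorb the remainder, one extra instance each.
        int size = base + (numFold < remainder ? 1 : 0);
        int first = numFold * base + Math.min(numFold, remainder);
        return new int[] {first, size};
    }
}
```

For 10 instances and 3 folds this yields blocks starting at 0, 4 and 7 with sizes 4, 3 and 3.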
-
variance
public double variance(int attIndex)
Computes the variance for a numeric attribute.- Parameters:
attIndex- the numeric attribute- Returns:
- the variance if the attribute is numeric
- Throws:
java.lang.IllegalArgumentException- if the attribute is not numeric
-
variance
public double variance(Attribute att)
Computes the variance for a numeric attribute.- Parameters:
att- the numeric attribute- Returns:
- the variance if the attribute is numeric
- Throws:
java.lang.IllegalArgumentException- if the attribute is not numeric
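The entries above do not say which variance estimator is used. The standalone sketch below assumes the unbiased sample variance (division by n - 1) over non-missing values, with NaN as the missing marker and unit weights; `VarianceSketch` is a hypothetical name.

```java
/** Illustrative only: sample variance of the non-missing values. */
final class VarianceSketch {

    static double sampleVariance(double[] values) {
        double sum = 0.0;
        double sumSquared = 0.0;
        int count = 0;
        for (double v : values) {
            if (!Double.isNaN(v)) {
                sum += v;
                sumSquared += v * v;
                count++;
            }
        }
        if (count <= 1) {
            return 0.0;   // variance is not defined for fewer than two values
        }
        double result = (sumSquared - (sum * sum / count)) / (count - 1);
        // Guard against tiny negative results caused by rounding.
        return Math.max(result, 0.0);
    }
}
```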
-
attributeStats
public AttributeStats attributeStats(int index)
Calculates summary statistics on the values that appear in this set of instances for a specified attribute.- Parameters:
index- the index of the attribute to summarize.- Returns:
- an AttributeStats object with its fields calculated.
-
attributeToDoubleArray
public double[] attributeToDoubleArray(int index)
Gets the value of all instances in this dataset for a particular attribute. Useful in conjunction with Utils.sort to allow iterating through the dataset in sorted order for some attribute.- Parameters:
index- the index of the attribute.- Returns:
- an array containing the value of the desired attribute for each instance in the dataset.
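Iterating in sorted order without reordering the dataset amounts to sorting an array of indices by the extracted values. The sketch below does this with the standard library instead of Utils.sort, whose exact signature is not shown on this page. `SortedIterationSketch` is a hypothetical name.

```java
import java.util.Arrays;

/** Illustrative only: index order that visits values from smallest to largest. */
final class SortedIterationSketch {

    static Integer[] sortedIndices(double[] values) {
        Integer[] indices = new Integer[values.length];
        for (int i = 0; i < indices.length; i++) {
            indices[i] = i;
        }
        // Sort the indices by the values they point to; the data itself is untouched.
        Arrays.sort(indices, (a, b) -> Double.compare(values[a], values[b]));
        return indices;
    }
}
```

A caller would pass the array returned by attributeToDoubleArray and then visit instance(indices[0]), instance(indices[1]), and so on.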
-
toSummaryString
public java.lang.String toSummaryString()
Generates a string summarizing the set of instances. Gives a breakdown for each attribute indicating the number of missing/discrete/unique values and other information.- Returns:
- a string summarizing the dataset
-
getInstance
protected boolean getInstance(java.io.StreamTokenizer tokenizer, boolean flag) throws java.io.IOException
Reads a single instance using the tokenizer and appends it to the dataset. Automatically expands the dataset if it is not large enough to hold the instance.
- Parameters:
tokenizer - the tokenizer to be used
flag - if method should test for carriage return after each instance
- Returns:
- false if end of file has been reached
- Throws:
java.io.IOException- if the information is not read successfully
-
getInstanceSparse
protected boolean getInstanceSparse(java.io.StreamTokenizer tokenizer, boolean flag) throws java.io.IOException
Reads a single instance using the tokenizer and appends it to the dataset. Automatically expands the dataset if it is not large enough to hold the instance.
- Parameters:
tokenizer - the tokenizer to be used
flag - if method should test for carriage return after each instance
- Returns:
- false if end of file has been reached
- Throws:
java.io.IOException- if the information is not read successfully
-
getInstanceFull
protected boolean getInstanceFull(java.io.StreamTokenizer tokenizer, boolean flag) throws java.io.IOException
Reads a single instance using the tokenizer and appends it to the dataset. Automatically expands the dataset if it is not large enough to hold the instance.
- Parameters:
tokenizer - the tokenizer to be used
flag - if method should test for carriage return after each instance
- Returns:
- false if end of file has been reached
- Throws:
java.io.IOException- if the information is not read successfully
-
readHeader
protected void readHeader(java.io.StreamTokenizer tokenizer) throws java.io.IOException
Reads and stores header of an ARFF file.
- Parameters:
tokenizer - the stream tokenizer
- Throws:
java.io.IOException- if the information is not read successfully
-
copyInstances
protected void copyInstances(int from, Instances dest, int num)
Copies instances from one set to the end of another one.
- Parameters:
from - the position of the first instance to be copied
dest - the destination for the instances
num - the number of instances to be copied
-
errms
protected void errms(java.io.StreamTokenizer tokenizer, java.lang.String theMsg) throws java.io.IOException
Throws error message with line number and last token read.
- Parameters:
tokenizer - the stream tokenizer
theMsg - the error message to be thrown
- Throws:
java.io.IOException - containing the error message
-
freshAttributeInfo
protected void freshAttributeInfo()
Replaces the attribute information by a clone of itself.
-
getFirstToken
protected void getFirstToken(java.io.StreamTokenizer tokenizer) throws java.io.IOException
Gets next token, skipping empty lines.
- Parameters:
tokenizer - the stream tokenizer
- Throws:
java.io.IOException- if reading the next token fails
-
getIndex
protected void getIndex(java.io.StreamTokenizer tokenizer) throws java.io.IOException
Gets index, checking for a premature end of line.
- Parameters:
tokenizer - the stream tokenizer
- Throws:
java.io.IOException- if it finds a premature end of line
-
getLastToken
protected void getLastToken(java.io.StreamTokenizer tokenizer, boolean endOfFileOk) throws java.io.IOException
Gets token and checks if it's end of line.
- Parameters:
tokenizer - the stream tokenizer
- Throws:
java.io.IOException- if it doesn't find an end of line
-
getNextToken
protected void getNextToken(java.io.StreamTokenizer tokenizer) throws java.io.IOException
Gets next token, checking for a premature end of line.
- Parameters:
tokenizer - the stream tokenizer
- Throws:
java.io.IOException- if it finds a premature end of line
-
initTokenizer
protected void initTokenizer(java.io.StreamTokenizer tokenizer)
Initializes the StreamTokenizer used for reading the ARFF file.- Parameters:
tokenizer- the stream tokenizer
-
instancesAndWeights
protected java.lang.String instancesAndWeights()
Returns string including all instances, their weights and their indices in the original dataset.- Returns:
- description of instance and its weight as a string
-
partition
protected int partition(int attIndex, int l, int r)
Partitions the instances around a pivot. Used by quicksort and kthSmallestValue.
- Parameters:
attIndex - the attribute's index
l - the first index of the subset
r - the last index of the subset
- Returns:
- the index of the middle element
-
quickSort
protected void quickSort(int attIndex, int left, int right)
Implements quicksort according to Manber's "Introduction to Algorithms".
- Parameters:
attIndex - the attribute's index
left - the first index of the subset to be sorted
right - the last index of the subset to be sorted
-
readTillEOL
protected void readTillEOL(java.io.StreamTokenizer tokenizer) throws java.io.IOException
Reads and skips all tokens before next end of line token.
- Parameters:
tokenizer - the stream tokenizer
- Throws:
java.io.IOException
-
select
protected int select(int attIndex, int left, int right, int k)
Implements computation of the kth-smallest element according to Manber's "Introduction to Algorithms".
- Parameters:
attIndex - the attribute's index
left - the first index of the subset
right - the last index of the subset
k - the value of k
- Returns:
- the index of the kth-smallest element
-
stratStep
protected void stratStep(int numFolds)
Help function needed for stratification of set.- Parameters:
numFolds- the number of folds for the stratification
-
swap
public void swap(int i, int j)
Swaps two instances in the set.
- Parameters:
i - the first instance's index
j - the second instance's index
-
mergeInstances
public static Instances mergeInstances(Instances first, Instances second)
Merges two sets of Instances together. The resulting set will have all the attributes of the first set plus all the attributes of the second set. The number of instances in both sets must be the same.
- Parameters:
first - the first set of Instances
second - the second set of Instances
- Returns:
- the merged set of Instances
- Throws:
java.lang.IllegalArgumentException- if the datasets are not the same size
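The merge described above joins datasets column-wise: row i of the result holds the attribute values of row i of the first set followed by those of row i of the second. The standalone sketch below shows this on plain double[][] tables. `MergeSketch` is a hypothetical name, and header handling (attribute names and types) is deliberately left out.

```java
/** Illustrative only: joins two tables side by side, row by row. */
final class MergeSketch {

    static double[][] mergeColumns(double[][] first, double[][] second) {
        if (first.length != second.length) {
            throw new IllegalArgumentException("Instance sets must be of the same size");
        }
        double[][] merged = new double[first.length][];
        for (int i = 0; i < first.length; i++) {
            double[] row = new double[first[i].length + second[i].length];
            // Values from the first set come first, then those from the second.
            System.arraycopy(first[i], 0, row, 0, first[i].length);
            System.arraycopy(second[i], 0, row, first[i].length, second[i].length);
            merged[i] = row;
        }
        return merged;
    }
}
```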
-
test
public static void test(java.lang.String[] argv)
Method for testing this class.- Parameters:
argv- should contain one element: the name of an ARFF file
-
main
public static void main(java.lang.String[] args)
Main method for this class. The following calls are possible:
- weka.core.Instances <filename>
prints a summary of a set of instances.
- weka.core.Instances merge <filename1> <filename2>
merges the two datasets (must have same number of instances) and outputs the results on stdout.
- Parameters:
args- the commandline parameters
-
-
-