Class Filter

  • All Implemented Interfaces:
    java.io.Serializable
    Direct Known Subclasses:
    PrefrentialSamplingFilter, RemoveSAFilter

    public abstract class Filter
    extends java.lang.Object
    implements java.io.Serializable
    An abstract class for instance filters: objects that take instances as input, carry out some transformation on the instance and then output the instance. The method implementations in this class assume that most of the work will be done in the methods overridden by subclasses.

    A simple example of filter use. This example doesn't remove instances from the output queue until all instances have been input, so it has higher memory consumption than an approach that uses output instances as they are made available:

      Filter filter = ...;   // some concrete Filter subclass
      Instances data = ...;  // some instances
      filter.setInputFormat(data);  // input() fails if no input format is defined
      for (int i = 0; i < data.numInstances(); i++) {
        filter.input(data.instance(i));
      }
      filter.batchFinished();
      Instances newData = filter.getOutputFormat();
      Instance processed;
      while ((processed = filter.output()) != null) {
        newData.add(processed);
      }
      // ..do something with newData..
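The lower-memory alternative mentioned above collects each output instance as soon as input() reports that one is available, rather than waiting for batchFinished(). The following is a minimal, Weka-independent sketch of that loop: `ToyFilter` and `StreamingUse` are hypothetical names, rows are plain double[] arrays instead of Instance objects, and the doubling step is purely illustrative.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a filter; rows are double[] instead of Instance.
class ToyFilter {
    private final ArrayDeque<double[]> queue = new ArrayDeque<>();

    // Streaming behaviour: each row is transformed at once, so input() returns true.
    boolean input(double[] row) {
        double[] out = new double[row.length];
        for (int j = 0; j < row.length; j++) {
            out[j] = row[j] * 2.0; // illustrative transformation only
        }
        queue.add(out);
        return true;
    }

    // Nothing is held back, so just report whether anything is still queued.
    boolean batchFinished() {
        return !queue.isEmpty();
    }

    // Removes and returns the next queued row, or null when the queue is empty.
    double[] output() {
        return queue.poll();
    }
}

class StreamingUse {
    // Drain the queue after every input() so rows do not pile up in memory.
    static List<double[]> filterStreaming(ToyFilter filter, List<double[]> data) {
        List<double[]> result = new ArrayList<>();
        double[] processed;
        for (double[] row : data) {
            if (filter.input(row)) {
                while ((processed = filter.output()) != null) {
                    result.add(processed);
                }
            }
        }
        // Collect anything a batch-style filter only releases at the end.
        filter.batchFinished();
        while ((processed = filter.output()) != null) {
            result.add(processed);
        }
        return result;
    }
}
```

With a filter that only produces output in batchFinished(), the same loop still works: nothing is collected inside the loop, and everything is drained at the end.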
     
    Version:
    $Revision: 1.24.2.1 $
    Author:
    Len Trigg (trigg@cs.waikato.ac.nz)
    See Also:
    Serialized Form
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected boolean m_FirstBatchDone
      True if the first batch has been done
      protected boolean m_NewBatch
      Record whether the filter is at the start of a batch
    • Constructor Summary

      Constructors 
      Constructor Description
      Filter()  
    • Method Summary

      Modifier and Type Method Description
      static void batchFilterFile(Filter filter, java.lang.String[] options)
      Method for testing a filter's ability to process multiple batches.
      boolean batchFinished()
      Signify that this batch of input to the filter is finished.
      protected void bufferInput(Instance instance)
      Adds the supplied input instance to the input format dataset for later processing.
      protected void copyStringValues(Instance instance, boolean instSrcCompat, Instances srcDataset, int[] srcStrAtts, Instances destDataset, int[] destStrAtts)
      Takes string values referenced by an Instance and copies them from a source dataset to a destination dataset.
      protected void copyStringValues(Instance instance, boolean instSrcCompat, Instances srcDataset, Instances destDataset)
      Takes string values referenced by an Instance and copies them from a source dataset to a destination dataset.
      static void filterFile(Filter filter, java.lang.String[] options)
      Method for testing filters.
      protected void flushInput()
      Removes all buffered instances from the input format dataset.
      protected Instances getInputFormat()
      Gets the currently set input format instances.
      protected int[] getInputStringIndex()
      Returns an array containing the indices of all string attributes in the input format.
      Instances getOutputFormat()
      Gets the format of the output instances.
      protected int[] getOutputStringIndex()
      Returns an array containing the indices of all string attributes in the output format.
      protected int[] getStringIndices(Instances insts)
      Gets an array containing the indices of all string attributes.
      boolean input(Instance instance)
      Input an instance for filtering.
      boolean inputFormat(Instances instanceInfo)
      Deprecated.
      use setInputFormat(Instances) instead.
      protected Instances inputFormatPeek()
      Returns a reference to the current input format without copying it.
      boolean isOutputFormatDefined()
      Returns whether the output format is ready to be collected.
      static void main(java.lang.String[] args)
      Main method for testing this class.
      int numPendingOutput()
      Returns the number of instances pending output.
      Instance output()
      Output an instance after filtering and remove from the output queue.
      Instances outputFormat()
      Deprecated.
      use getOutputFormat() instead.
      protected Instances outputFormatPeek()
      Returns a reference to the current output format without copying it.
      Instance outputPeek()
      Output an instance after filtering but do not remove from the output queue.
      protected void push(Instance instance)
      Adds an output instance to the queue.
      protected void resetQueue()
      Clears the output queue.
      boolean setInputFormat(Instances instanceInfo)
      Sets the format of the input instances.
      protected void setOutputFormat(Instances outputFormat)
      Sets the format of output instances.
      static Instances useFilter(Instances data, Filter filter)
      Filters an entire set of instances through a filter and returns the new set.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • m_NewBatch

        protected boolean m_NewBatch
        Record whether the filter is at the start of a batch
      • m_FirstBatchDone

        protected boolean m_FirstBatchDone
        True if the first batch has been done
    • Constructor Detail

      • Filter

        public Filter()
    • Method Detail

      • setOutputFormat

        protected void setOutputFormat(Instances outputFormat)
        Sets the format of output instances. The derived class should use this method once it has determined the output format. The output queue is cleared.
        Parameters:
        outputFormat - the new output format
      • getInputFormat

        protected Instances getInputFormat()
        Gets the currently set input format instances. This dataset may contain buffered instances.
        Returns:
        the input Instances.
      • inputFormatPeek

        protected Instances inputFormatPeek()
        Returns a reference to the current input format without copying it.
        Returns:
        a reference to the current input format
      • outputFormatPeek

        protected Instances outputFormatPeek()
        Returns a reference to the current output format without copying it.
        Returns:
        a reference to the current output format
      • push

        protected void push(Instance instance)
        Adds an output instance to the queue. The derived class should use this method for each output instance it makes available.
        Parameters:
        instance - the instance to be added to the queue.
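As a rough illustration of how a subclass is expected to use setOutputFormat() and push(), the sketch below models these protected hooks with a hypothetical `MiniFilter` base class in which a "format" is just an array of attribute names and rows are double[] arrays. `MiniFilter` and `DropFirstAttribute` are illustrative names, not Weka classes.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

// Hypothetical, simplified base class: a format is just an array of attribute names.
abstract class MiniFilter {
    private String[] outputFormat;
    private final ArrayDeque<double[]> outputQueue = new ArrayDeque<>();

    // Set the output format once it is known; this also clears the output queue.
    protected void setOutputFormat(String[] format) {
        outputFormat = format.clone();
        outputQueue.clear();
    }

    // Subclasses call this once for every output row they make available.
    protected void push(double[] row) {
        outputQueue.add(row);
    }

    String[] getOutputFormat() {
        return outputFormat.clone(); // NullPointerException if not yet determined
    }

    double[] output() {
        return outputQueue.poll();
    }

    abstract boolean setInputFormat(String[] format);

    abstract boolean input(double[] row);
}

// Example subclass: drops the first attribute, so the output format is known up front.
class DropFirstAttribute extends MiniFilter {
    @Override
    boolean setInputFormat(String[] format) {
        setOutputFormat(Arrays.copyOfRange(format, 1, format.length));
        return true; // output format may be collected immediately
    }

    @Override
    boolean input(double[] row) {
        push(Arrays.copyOfRange(row, 1, row.length));
        return true; // the filtered row can now be collected with output()
    }
}
```

Because this filter can derive its output format from the input format alone, it sets it inside setInputFormat() and returns true; a filter that needs to see the data first would defer the call.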
      • resetQueue

        protected void resetQueue()
        Clears the output queue.
      • bufferInput

        protected void bufferInput(Instance instance)
        Adds the supplied input instance to the input format dataset for later processing. Always use this method rather than calling getInputFormat().add(instance) directly. Note that the supplied instance is copied when it is buffered.
        Parameters:
        instance - the Instance to buffer.
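Because the instance is copied on buffering, the caller may reuse or modify its own instance after passing it in. A minimal sketch of that behaviour, using a hypothetical `InputBuffer` class over double[] rows rather than Weka's Instances, with a flushInput() counterpart that clears the buffer:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of an input-format dataset that buffers rows for later processing.
class InputBuffer {
    private final List<double[]> buffered = new ArrayList<>();

    // Store a copy so later changes to the caller's array cannot affect the buffer.
    void bufferInput(double[] row) {
        buffered.add(row.clone());
    }

    // Analogous to flushInput(): discard everything buffered so far.
    void flushInput() {
        buffered.clear();
    }

    int size() {
        return buffered.size();
    }

    double[] get(int i) {
        return buffered.get(i);
    }
}
```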
      • getInputStringIndex

        protected int[] getInputStringIndex()
        Returns an array containing the indices of all string attributes in the input format. This index is created during setInputFormat().
        Returns:
        an array containing the indices of string attributes in the input dataset.
      • getOutputStringIndex

        protected int[] getOutputStringIndex()
        Returns an array containing the indices of all string attributes in the output format. This index is created during setOutputFormat().
        Returns:
        an array containing the indices of string attributes in the output dataset.
      • copyStringValues

        protected void copyStringValues(Instance instance,
                                        boolean instSrcCompat,
                                        Instances srcDataset,
                                        Instances destDataset)
        Takes string values referenced by an Instance and copies them from a source dataset to a destination dataset. The instance references are updated to be valid for the destination dataset. The instance may have the structure (i.e. number and attribute position) of either dataset (this affects where references are obtained from). The source dataset must have the same structure as the filter input format and the destination must have the same structure as the filter output format.
        Parameters:
        instance - the instance containing references to strings in the source dataset that will have references updated to be valid for the destination dataset.
        instSrcCompat - true if the instance structure is the same as the source, or false if it is the same as the destination
        srcDataset - the dataset for which the current instance string references are valid (after any position mapping if needed)
        destDataset - the dataset for which the current instance string references need to be inserted (after any position mapping if needed)
      • copyStringValues

        protected void copyStringValues(Instance instance,
                                        boolean instSrcCompat,
                                        Instances srcDataset,
                                        int[] srcStrAtts,
                                        Instances destDataset,
                                        int[] destStrAtts)
        Takes string values referenced by an Instance and copies them from a source dataset to a destination dataset. The instance references are updated to be valid for the destination dataset. The instance may have the structure (i.e. number and attribute position) of either dataset (this affects where references are obtained from). Only works if the number of string attributes is the same in both indices (implicitly, these string attributes should be semantically the same, just at shifted positions).
        Parameters:
        instance - the instance containing references to strings in the source dataset that will have references updated to be valid for the destination dataset.
        instSrcCompat - true if the instance structure is the same as the source, or false if it is the same as the destination (i.e. which of the string attribute indices contains the correct locations for this instance).
        srcDataset - the dataset for which the current instance string references are valid (after any position mapping if needed)
        srcStrAtts - an array containing the indices of string attributes in the source dataset.
        destDataset - the dataset for which the current instance string references need to be inserted (after any position mapping if needed)
        destStrAtts - an array containing the indices of string attributes in the destination dataset.
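One way to picture this remapping is to treat each dataset as holding a separate pool of string values per attribute, with an instance storing a string as its index into that pool. The sketch below follows that assumption; `StringDataset` and `StringCopier` are hypothetical names, NaN stands in for a missing value, and throwing on mismatched index lengths is a choice made here rather than documented behaviour.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: one string pool per attribute; an instance stores a
// string value as its (double-encoded) index into the matching pool.
class StringDataset {
    final List<List<String>> pools = new ArrayList<>();

    StringDataset(int numAttributes) {
        for (int i = 0; i < numAttributes; i++) {
            pools.add(new ArrayList<>());
        }
    }

    // Returns the pool index of value, appending it if not already present.
    int addStringValue(int att, String value) {
        List<String> pool = pools.get(att);
        int idx = pool.indexOf(value);
        if (idx < 0) {
            pool.add(value);
            idx = pool.size() - 1;
        }
        return idx;
    }
}

class StringCopier {
    // Re-points each string reference from srcDataset's pools to destDataset's pools.
    static void copyStringValues(double[] instance, boolean instSrcCompat,
                                 StringDataset srcDataset, int[] srcStrAtts,
                                 StringDataset destDataset, int[] destStrAtts) {
        if (srcStrAtts.length != destStrAtts.length) {
            throw new IllegalArgumentException("string attribute counts differ");
        }
        for (int i = 0; i < srcStrAtts.length; i++) {
            // Where the value sits in this instance depends on which structure it has.
            int instIndex = instSrcCompat ? srcStrAtts[i] : destStrAtts[i];
            if (Double.isNaN(instance[instIndex])) {
                continue; // treat NaN as missing: nothing to copy
            }
            String value = srcDataset.pools.get(srcStrAtts[i]).get((int) instance[instIndex]);
            instance[instIndex] = destDataset.addStringValue(destStrAtts[i], value);
        }
    }
}
```

Only the string slots are rewritten; the instance keeps its own attribute layout, which is why instSrcCompat is needed to know where to read and write.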
      • flushInput

        protected void flushInput()
        Removes all buffered instances from the input format dataset. Use this method rather than getInputFormat().delete().
      • inputFormat

        public boolean inputFormat(Instances instanceInfo)
                            throws java.lang.Exception
        Deprecated.
        use setInputFormat(Instances) instead.
        Throws:
        java.lang.Exception
      • setInputFormat

        public boolean setInputFormat(Instances instanceInfo)
                               throws java.lang.Exception
        Sets the format of the input instances. If the filter is able to determine the output format before seeing any input instances, it does so here. This default implementation clears the output format and output queue, and sets the new batch flag. Overriders should call super.setInputFormat(Instances).
        Parameters:
        instanceInfo - an Instances object containing the input instance structure (any instances contained in the object are ignored - only the structure is required).
        Returns:
        true if the outputFormat may be collected immediately
        Throws:
        java.lang.Exception - if the inputFormat can't be set successfully
      • outputFormat

        public Instances outputFormat()
        Deprecated.
        use getOutputFormat() instead.
      • getOutputFormat

        public Instances getOutputFormat()
        Gets the format of the output instances. This should only be called after input() or batchFinished() has returned true. The relation name of the output instances should be changed to reflect the action of the filter (e.g. add the filter name and options).
        Returns:
        an Instances object containing the output instance structure only.
        Throws:
        java.lang.NullPointerException - if no input structure has been defined (or the output format hasn't been determined yet)
      • input

        public boolean input(Instance instance)
                      throws java.lang.Exception
        Input an instance for filtering. Ordinarily the instance is processed and made available for output immediately. Some filters require all instances be read before producing output, in which case output instances should be collected after calling batchFinished(). If the input marks the start of a new batch, the output queue is cleared. This default implementation assumes all instance conversion will occur when batchFinished() is called.
        Parameters:
        instance - the input instance
        Returns:
        true if the filtered instance may now be collected with output().
        Throws:
        java.lang.NullPointerException - if the input format has not been defined.
        java.lang.Exception - if the input instance was not of the correct format or if there was a problem with the filtering.
      • batchFinished

        public boolean batchFinished()
                              throws java.lang.Exception
        Signify that this batch of input to the filter is finished. If the filter requires all instances prior to filtering, output() may now be called to retrieve the filtered instances. Any subsequent instances should be filtered based on settings obtained from the first batch (unless the input format has been re-assigned or new options have been set). This default implementation assumes all instance processing occurs during setInputFormat() and input().
        Returns:
        true if there are instances pending output
        Throws:
        java.lang.NullPointerException - if no input structure has been defined.
        java.lang.Exception - if there was a problem finishing the batch.
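The first-batch rule matters for filters that learn statistics from the data. The following hypothetical `ToyMinMax` filter (one numeric attribute; min-max scaling is chosen only as an example) holds back the first batch until batchFinished(), then scales later input immediately using the minimum and maximum learned from that first batch. Its `firstBatchDone` flag plays the role of m_FirstBatchDone.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-attribute min-max scaler illustrating first-batch semantics.
class ToyMinMax {
    private final List<Double> buffer = new ArrayList<>();
    private final ArrayDeque<Double> queue = new ArrayDeque<>();
    private boolean firstBatchDone = false; // plays the role of m_FirstBatchDone
    private double min, max;

    boolean input(double value) {
        if (!firstBatchDone) {
            buffer.add(value); // statistics not known yet: hold the value back
            return false;
        }
        queue.add(scale(value)); // later batches reuse the first batch's min/max
        return true;
    }

    boolean batchFinished() {
        if (!firstBatchDone && !buffer.isEmpty()) {
            min = Double.POSITIVE_INFINITY;
            max = Double.NEGATIVE_INFINITY;
            for (double v : buffer) {
                min = Math.min(min, v);
                max = Math.max(max, v);
            }
            for (double v : buffer) {
                queue.add(scale(v));
            }
            buffer.clear();
            firstBatchDone = true;
        }
        return !queue.isEmpty();
    }

    Double output() {
        return queue.poll();
    }

    private double scale(double v) {
        return max == min ? 0.0 : (v - min) / (max - min);
    }
}
```

Values from later batches can fall outside [0, 1], since they are scaled with the first batch's range rather than their own.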
      • output

        public Instance output()
        Output an instance after filtering and remove from the output queue.
        Returns:
        the next filtered instance in the output queue (or null if the queue is empty).
        Throws:
        java.lang.NullPointerException - if no output structure has been defined
      • outputPeek

        public Instance outputPeek()
        Output an instance after filtering but do not remove from the output queue.
        Returns:
        the next filtered instance in the output queue (or null if the queue is empty).
        Throws:
        java.lang.NullPointerException - if no input structure has been defined
      • numPendingOutput

        public int numPendingOutput()
        Returns the number of instances pending output.
        Returns:
        the number of instances pending output
        Throws:
        java.lang.NullPointerException - if no input structure has been defined
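output(), outputPeek() and numPendingOutput() are three views of the same output queue. A sketch assuming first-in, first-out ordering, with java.util.ArrayDeque standing in for the filter's internal queue; `OutputQueue` is a hypothetical name:

```java
import java.util.ArrayDeque;

// Hypothetical output queue relating output(), outputPeek() and numPendingOutput().
class OutputQueue<T> {
    private final ArrayDeque<T> queue = new ArrayDeque<>();

    // Called by the filter for each output item it makes available.
    void push(T item) {
        queue.addLast(item);
    }

    // Removes and returns the head of the queue, or null if it is empty.
    T output() {
        return queue.pollFirst();
    }

    // Returns the head without removing it, or null if it is empty.
    T outputPeek() {
        return queue.peekFirst();
    }

    int numPendingOutput() {
        return queue.size();
    }
}
```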
      • isOutputFormatDefined

        public boolean isOutputFormatDefined()
        Returns whether the output format is ready to be collected.
        Returns:
        true if the output format is set
      • getStringIndices

        protected int[] getStringIndices(Instances insts)
        Gets an array containing the indices of all string attributes.
        Parameters:
        insts - the Instances to scan for string attributes.
        Returns:
        an array containing the indices of string attributes in the given structure. Will be zero-length if there are no string attributes.
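The scan itself is a single pass over the attributes. In this sketch a hypothetical `AttType` enum replaces Weka's attribute type constants and `StringIndexScanner` is an illustrative name, so only the logic, not the API, carries over:

```java
import java.util.stream.IntStream;

// Hypothetical attribute-type model standing in for Weka's attribute types.
enum AttType { NUMERIC, NOMINAL, STRING, DATE }

class StringIndexScanner {
    // Returns indices of all STRING attributes; zero-length when there are none.
    static int[] getStringIndices(AttType[] attributes) {
        return IntStream.range(0, attributes.length)
                .filter(i -> attributes[i] == AttType.STRING)
                .toArray();
    }
}
```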
      • useFilter

        public static Instances useFilter(Instances data,
                                          Filter filter)
                                   throws java.lang.Exception
        Filters an entire set of instances through a filter and returns the new set.
        Parameters:
        data - the data to be filtered
        filter - the filter to be used
        Returns:
        the filtered set of data
        Throws:
        java.lang.Exception - if the filter can't be used successfully
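useFilter() wraps the input/batchFinished/output sequence shown in the class description. In Weka code the usual pattern is to call filter.setInputFormat(data) first and then Filter.useFilter(data, filter). The Weka-independent sketch below reproduces only the loop; `SimpleFilter`, `DropNegatives` and `FilterUtil` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal contract mirroring input()/batchFinished()/output().
interface SimpleFilter<T> {
    boolean input(T item);
    boolean batchFinished();
    T output(); // null when nothing is pending
}

// Example batch-style filter: drops negative values, releasing output only at batch end.
class DropNegatives implements SimpleFilter<Integer> {
    private final List<Integer> buffer = new ArrayList<>();
    private final ArrayDeque<Integer> queue = new ArrayDeque<>();

    public boolean input(Integer item) {
        buffer.add(item);
        return false; // nothing can be collected until batchFinished()
    }

    public boolean batchFinished() {
        for (Integer v : buffer) {
            if (v >= 0) {
                queue.add(v);
            }
        }
        buffer.clear();
        return !queue.isEmpty();
    }

    public Integer output() {
        return queue.poll();
    }
}

class FilterUtil {
    // Same shape as useFilter: feed every item, finish the batch, drain the queue.
    static <T> List<T> useFilter(List<T> data, SimpleFilter<T> filter) {
        for (T item : data) {
            filter.input(item);
        }
        filter.batchFinished();
        List<T> result = new ArrayList<>();
        T processed;
        while ((processed = filter.output()) != null) {
            result.add(processed);
        }
        return result;
    }
}
```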
      • filterFile

        public static void filterFile(Filter filter,
                                      java.lang.String[] options)
                               throws java.lang.Exception
        Method for testing filters.
        Parameters:
        options - should contain the following arguments:
        -i input_file
        -o output_file
        -c class_index
        or -h for help on options
        Throws:
        java.lang.Exception - if something goes wrong or the user requests help on command options
      • batchFilterFile

        public static void batchFilterFile(Filter filter,
                                           java.lang.String[] options)
                                    throws java.lang.Exception
        Method for testing a filter's ability to process multiple batches.
        Parameters:
        options - should contain the following arguments:
        -i (first) input file
        -o (first) output file
        -r (second) input file
        -s (second) output file
        -c class_index
        or -h for help on options
        Throws:
        java.lang.Exception - if something goes wrong or the user requests help on command options
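The flags listed for filterFile() and batchFilterFile() can be collected with a small parser like the one below. This is a hypothetical sketch, not Weka's own option handling: `FilterFileOptions` is an illustrative name, it only records the values of -i, -o, -r, -s and -c and the -h switch, and it assumes every other argument is meant for the filter itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser for the flags documented for filterFile/batchFilterFile.
class FilterFileOptions {
    final Map<Character, String> values = new HashMap<>();
    final List<String> filterOptions = new ArrayList<>();
    boolean help;

    static FilterFileOptions parse(String[] options) {
        FilterFileOptions result = new FilterFileOptions();
        for (int i = 0; i < options.length; i++) {
            String opt = options[i];
            if (opt.equals("-h")) {
                result.help = true;
            } else if (opt.length() == 2 && opt.charAt(0) == '-'
                    && "iorsc".indexOf(opt.charAt(1)) >= 0) {
                if (i + 1 >= options.length) {
                    throw new IllegalArgumentException("missing value for " + opt);
                }
                result.values.put(opt.charAt(1), options[++i]);
            } else {
                result.filterOptions.add(opt); // assumed to belong to the filter
            }
        }
        return result;
    }

    // batchFilterFile needs a second input/output pair (-r and -s).
    boolean hasSecondBatch() {
        return values.containsKey('r') && values.containsKey('s');
    }
}
```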
      • main

        public static void main(java.lang.String[] args)
        Main method for testing this class.
        Parameters:
        args - should contain arguments to the filter: use -h for help