Class AlgoPrefixSpan


  • public class AlgoPrefixSpan
    extends java.lang.Object
    This is a 2016 implementation of the PrefixSpan algorithm. PrefixSpan was proposed by Pei et al. 2001. NOTE: This implementation saves the pattern to a file as soon as they are found or can keep the pattern into memory, depending on what the user choose. This implementation was done in 2016. It is different than the previous implementation in SPMF which was implemented in 2008, and the AGP implementation. I have re-implemented the code to make it more efficient. This new implementation can be 10 times faster than the 2008 implementation, since I have added more optimizations Copyright (c) 2008-2012 Philippe Fournier-Viger This file is part of the SPMF DATA MINING SOFTWARE (http://www.philippe-fournier-viger.com/spmf). SPMF is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. SPMF is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with SPMF. If not, see .
    • Constructor Detail

      • AlgoPrefixSpan

        public AlgoPrefixSpan()
        Default constructor
    • Method Detail

      • runAlgorithm

        public SequentialPatterns runAlgorithm​(PatternSequenceDatabase patternSequenceDatabase,
                                               double minsupRelative)
                                        throws java.io.IOException
        Run the algorithm
        Parameters:
        patternSequenceDatabase - : a sequence database
        minsupRelative - : the minimum support as a percentage (e.g. 50%) as a value in [0,1]
        Returns:
        return the result, if saved into memory, otherwise null
        Throws:
        java.io.IOException - exception if error while writing the file
      • runAlgorithm

        public SequentialPatterns runAlgorithm​(PatternSequenceDatabase patternSequenceDatabase,
                                               int minsup)
                                        throws java.io.IOException
        Run the algorithm
        Parameters:
        database - : a sequence database
        minsupPercent - : the minimum support as an integer
        outputFilePath - : the path of the output file to save the result or null if you want the result to be saved into memory
        Returns:
        return the result, if saved into memory, otherwise null
        Throws:
        java.io.IOException - exception if error while writing the file
      • findAllFrequentPairsSingleItems

        protected java.util.Map<java.lang.Integer,​java.util.List<PseudoSequence>> findAllFrequentPairsSingleItems​(java.util.List<PseudoSequence> sequences,
                                                                                                                        int lastBufferPosition)
        Method to find all frequent items in a projected sequence database
        Parameters:
        sequences - the set of sequences
        patternBuffer - the current sequential pattern that we want to try to grow
        lastBufferPosition - the last position used in the buffer for storing the current prefix
        Returns:
        A list of pairs, where a pair is an item with (1) a boolean indicating if it is in an itemset that is "cut" and (2) the sequence IDs where it occurs.
      • findAllFrequentPairs

        protected AlgoPrefixSpan.MapFrequentPairs findAllFrequentPairs​(java.util.List<PseudoSequence> sequences,
                                                                       int lastBufferPosition)
        Method to find all frequent items in a projected sequence database
        Parameters:
        sequences - the set of sequences
        lastBufferPosition - the last position used in the buffer for storing the current prefix
        Returns:
        A list of pairs, where a pair is an item with (1) a boolean indicating if it is in an itemset that is "cut" and (2) the sequence IDs where it occurs.
      • printStatistics

        public void printStatistics()
        Print statistics about the algorithm execution to System.out.
        Parameters:
        size - the size of the database
      • getMaximumPatternLength

        public int getMaximumPatternLength()
        Get the maximum length of patterns to be found (in terms of item count)
        Returns:
        the maximumPatternLength
      • setMaximumPatternLength

        public void setMaximumPatternLength​(int maximumPatternLength)
        Set the maximum length of patterns to be found (in terms of item count)
        Parameters:
        maximumPatternLength - the maximumPatternLength to set
      • setShowSequenceIdentifiers

        public void setShowSequenceIdentifiers​(boolean showSequenceIdentifiers)
        Set that the sequence identifiers should be shown (true) or not (false) for each pattern found
        Parameters:
        showSequenceIdentifiers - true or false