de.uni_leipzig.asv.coocc
Class BinFileMultColPreparer

java.lang.Object
  extended by de.uni_leipzig.asv.coocc.BinFileMultColPreparer

public class BinFileMultColPreparer
extends Object

This class prepares the two temporary files for fast access to a large multi-column table containing only integers. It takes a file as input, which must first be dumped from the database using a command like one of the following examples:

select wort_nr, group_nr into outfile '/var/roedel/ksim/r_word_group.dump' from r_word_group where group_nr >= 3286 and group_nr <= 2578224 order by wort_nr asc;
select group_nr, group_type into outfile '/var/roedel/ksim/r_group_type.dump' from r_group_type where group_nr >= 3286 and group_nr <= 2578224 order by group_nr asc;

The file must be in the working directory of this program.

Assumes that the word numbers in the first column are mostly without holes.
For each missing word number up to the next existing word number, 4 otherwise unused bytes are written as filler.

Format of first file: byte[4] per number

Format of second file: byte[4] per number, times (columns - 1)

Semantics: The integer at 4*n-1 in the first file gives the beginning, and the integer at 4*n in the first file gives the end, of the area to read in the second file in order to retrieve the information stored for the n-th number.
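
The lookup described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the class's actual reader: it assumes that the first file holds one big-endian int per number, that entry n-1 and entry n (byte offsets 4*(n-1) and 4*n) are the begin and end byte offsets into the second file, and that number 0 begins at offset 0. The class BinLookupSketch and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper, not part of BinFileMultColPreparer.
final class BinLookupSketch {

    // Packs ints into big-endian bytes (4 bytes per int), used to build sample input.
    static byte[] ints(int... values) {
        ByteBuffer buf = ByteBuffer.allocate(4 * values.length);
        for (int v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }

    // Returns the ints stored for number n.
    // ASSUMPTION: index entry n-1 is the begin offset and entry n the end offset
    // (both byte offsets into the data); number 0 starts at offset 0.
    static int[] lookup(byte[] index, byte[] data, int n) {
        ByteBuffer idx = ByteBuffer.wrap(index); // ByteBuffer is big-endian by default
        ByteBuffer dat = ByteBuffer.wrap(data);
        int begin = (n == 0) ? 0 : idx.getInt(4 * (n - 1));
        int end = idx.getInt(4 * n);
        int[] values = new int[(end - begin) / 4];
        for (int i = 0; i < values.length; i++) {
            values[i] = dat.getInt(begin + 4 * i);
        }
        return values;
    }

    public static void main(String[] args) {
        // Number 1 owns two ints, number 2 is a hole, number 3 owns one int.
        byte[] index = ints(0, 8, 8, 12);
        byte[] data = ints(10, 11, 12);
        System.out.println(Arrays.toString(lookup(index, data, 1))); // prints [10, 11]
    }
}
```

With this layout a hole simply repeats the previous end offset, so looking it up yields an empty array.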

ASSUMPTIONS:
- column 1: the word numbers do not have overly large 'holes' (the index would become too large otherwise)
- All columns in the dump are non-null (an Exception is thrown otherwise)
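
To illustrate why large holes inflate the index, here is a minimal sketch of how an index and a data file could be built from rows sorted by the first column. It is an assumption-laden illustration, not the code of writeFiles: it assumes the index stores, for every word number from 0 up to the largest one, the end byte offset of that word's data as a big-endian int, so that each missing word number still costs 4 index bytes. HoleFillSketch and build are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of BinFileMultColPreparer.
final class HoleFillSketch {

    // rows must be sorted ascending by rows[i][0] (the word number).
    // Returns { indexBytes, dataBytes }.
    // ASSUMPTION: index entry k holds the end offset of the data for word number k.
    static byte[][] build(int[][] rows) {
        ByteArrayOutputStream idxBytes = new ByteArrayOutputStream();
        ByteArrayOutputStream datBytes = new ByteArrayOutputStream();
        try (DataOutputStream idx = new DataOutputStream(idxBytes);
             DataOutputStream dat = new DataOutputStream(datBytes)) {
            int nextNr = 0; // smallest word number that has no index entry yet
            for (int[] row : rows) {
                // Close every word number below the current one, holes included.
                while (nextNr < row[0]) {
                    idx.writeInt(dat.size()); // 4 bytes per word number, even if missing
                    nextNr++;
                }
                // Store all columns except the first (columns - 1 ints per row).
                for (int c = 1; c < row.length; c++) {
                    dat.writeInt(row[c]);
                }
            }
            idx.writeInt(dat.size()); // close the last word number
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new byte[][] { idxBytes.toByteArray(), datBytes.toByteArray() };
    }

    public static void main(String[] args) {
        byte[][] out = build(new int[][] { {1, 10}, {1, 11}, {3, 12} });
        System.out.println(out[0].length + " index bytes, " + out[1].length + " data bytes");
    }
}
```

In this sketch a dump whose word numbers jump from 3 to 1000000 would add roughly 4 MB of filler entries to the index, which is the cost the first assumption warns about.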


Field Summary
protected  int columns
           
static String ext1
          The data file gets this extension for easier recognition.
static String ext2
          The index file gets another extension
 
Constructor Summary
BinFileMultColPreparer(String fileName, int columnCount)
          Sole constructor.
 
Method Summary
protected  void createFiles(String fileName)
          This method wraps the actual algorithm, taking care of opening and closing the files, etc.
static void main(String[] args)
          Use this main method to prepare the files directly from the command line.
protected  void readFiles(String fileName)
           
protected  int[] splitToIntArray(String line)
          Splits the line into an array of Integers.
protected  void writeFiles(BufferedReader reader, OutputStream datWriter, OutputStream idxWriter)
          This method actually does the work.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ext1

public static final String ext1
The data file gets this extension for easier recognition.

See Also:
Constant Field Values

ext2

public static final String ext2
The index file gets another extension

See Also:
Constant Field Values

columns

protected int columns

Constructor Detail

BinFileMultColPreparer

public BinFileMultColPreparer(String fileName,
                              int columnCount)
Sole constructor. Takes the file name and the column count. If the file has two columns, pass 2; if it has three, pass 3, and so on. If you pass 2 although the file has 3 columns, the third column is ignored.

Parameters:
fileName - String - the dump file
columnCount - int - number of columns in file

Method Detail

createFiles

protected final void createFiles(String fileName)
This method wraps the actual algorithm, taking care of opening and closing the files, etc.

Parameters:
fileName - String - the dump file

writeFiles

protected final void writeFiles(BufferedReader reader,
                                OutputStream datWriter,
                                OutputStream idxWriter)
                         throws Exception
This method actually does the work.

Parameters:
reader - BufferedReader - Where to read from
datWriter - OutputStream - Datafile
idxWriter - OutputStream - Indexfile
Throws:
Exception

splitToIntArray

protected final int[] splitToIntArray(String line)
Splits the line into an array of Integers.

Parameters:
line - String - one line of the dump file
Returns:
int[] - the integers parsed from the line
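
As a rough illustration of what such a split might look like: the sketch below assumes the dump uses tab-separated fields (the MySQL default for select ... into outfile) and takes the column count as an explicit second parameter, whereas the real method takes only the line. SplitSketch is a hypothetical name, and this is not the class's actual implementation.

```java
// Hypothetical helper, not part of BinFileMultColPreparer.
final class SplitSketch {

    // Parses the first `columns` tab-separated fields of a dump line.
    // ASSUMPTION: fields are separated by tabs; extra fields are ignored.
    static int[] splitToIntArray(String line, int columns) {
        String[] fields = line.split("\t");
        if (fields.length < columns) {
            throw new IllegalArgumentException(
                "expected " + columns + " columns but found " + fields.length);
        }
        int[] result = new int[columns];
        for (int i = 0; i < columns; i++) {
            // A NULL value (dumped by MySQL as \N) makes parseInt throw
            // a NumberFormatException.
            result[i] = Integer.parseInt(fields[i].trim());
        }
        return result;
    }

    public static void main(String[] args) {
        int[] values = splitToIntArray("17\t3286\t5", 2);
        System.out.println(values[0] + " " + values[1]); // prints 17 3286
    }
}
```

Passing 2 for a three-column line drops the third field, and a NULL field fails with an exception, in line with the behaviour described for the constructor and in the assumptions.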

readFiles

protected final void readFiles(String fileName)
Parameters:
fileName -

main

public static final void main(String[] args)
Use this main method to prepare the files directly from the command line.