de.uni_leipzig.asv.coocc
Class BinFileStrColPreparer
java.lang.Object
de.uni_leipzig.asv.coocc.BinFileStrColPreparer
- public class BinFileStrColPreparer
- extends Object
This class prepares the two temporary files that are needed for faster access
to word_bins.
It takes the word list file as input, which must first be dumped
from the database with the following command:
select w.wort_nr, w.wort_bin into outfile '/var/roedel/ksim/wortliste.dump' from wortliste w order by w.wort_nr asc;
The file must be in the working directory of this program under data/ksim/ or
as specified.
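As an illustration of the input format, the following is a minimal sketch of reading such a dump. It assumes the tab-separated, newline-terminated layout that `select ... into outfile` produces when no format options are given. The class name `DumpReaderSketch` and the method `read` are hypothetical and not part of this class; escape sequences in the dump are not handled.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.TreeMap;

// Hypothetical helper, not part of BinFileStrColPreparer.
final class DumpReaderSketch {

    /**
     * Reads "wort_nr<TAB>wort_bin" lines and returns the words keyed by
     * word number in ascending order. Assumes tab-separated columns and
     * does not decode escape sequences.
     */
    static TreeMap<Integer, String> read(BufferedReader reader) throws IOException {
        TreeMap<Integer, String> words = new TreeMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                continue; // skip lines that do not have two columns
            }
            int wordNr = Integer.parseInt(line.substring(0, tab));
            words.put(wordNr, line.substring(tab + 1));
        }
        return words;
    }
}
```

A `TreeMap` keeps the entries sorted by word number, so gaps between consecutive keys are easy to detect in a later step.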
Assumes that the word numbers in the first column are mostly without holes.
For each missing word number, 4 otherwise unused bytes are written to the
index file, up to the next existing word number.
Format of first file: char[?] of words
Format of second file: char[4]
Semantics: the nth 4-byte number gives the end position of the collocation numbers of
word number n. The start position is stored at n-1.
ASSUMPTIONS:
column1: wordNrs don't have too large 'holes'
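The layout described above can be sketched as follows. This is not the class's actual implementation, only an in-memory illustration under several assumptions: word numbers start at 1, index entry 0 holds offset 0, offsets are written as big-endian 4-byte integers, words are encoded as UTF-8, and each hole is filled by repeating the previous end offset so that a missing word number yields an empty entry. The names `IndexLayoutSketch`, `build`, and `lookup` are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration, not part of BinFileStrColPreparer.
final class IndexLayoutSketch {

    /** Returns { dataBytes, indexBytes } for words keyed by word number (>= 1). */
    static byte[][] build(TreeMap<Integer, String> words) throws IOException {
        ByteArrayOutputStream dat = new ByteArrayOutputStream();
        ByteArrayOutputStream idxBytes = new ByteArrayOutputStream();
        DataOutputStream idx = new DataOutputStream(idxBytes);

        idx.writeInt(0); // entry 0: start offset of word number 1 (assumed)
        int next = 1;    // next word number that needs an index entry
        for (Map.Entry<Integer, String> e : words.entrySet()) {
            // Fill holes: one 4-byte entry per missing word number,
            // repeating the current end offset (assumed fill value).
            for (; next < e.getKey(); next++) {
                idx.writeInt(dat.size());
            }
            dat.write(e.getValue().getBytes(StandardCharsets.UTF_8));
            idx.writeInt(dat.size()); // entry n: end offset of word number n
            next = e.getKey() + 1;
        }
        idx.flush();
        return new byte[][] { dat.toByteArray(), idxBytes.toByteArray() };
    }

    /** Returns the word for number n, or "" for holes and out-of-range numbers. */
    static String lookup(byte[] dat, byte[] idx, int n) {
        if (n < 1 || n >= idx.length / 4) {
            return "";
        }
        ByteBuffer buf = ByteBuffer.wrap(idx); // big-endian, like DataOutputStream
        int begin = buf.getInt((n - 1) * 4);   // start is stored at n-1
        int end = buf.getInt(n * 4);           // end is stored at n
        return new String(dat, begin, end - begin, StandardCharsets.UTF_8);
    }
}
```

Because every word number has a fixed-size index entry, a lookup needs only two 4-byte reads and one slice of the data file. The cost is 4 wasted bytes per missing word number, which is why large holes are undesirable.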
Field Summary
static String ext1
- for better recognition the data file has this extension
static String ext2
- The index file gets another extension
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
ext1
public static final String ext1
- for better recognition the data file has this extension
- See Also:
- Constant Field Values
ext2
public static final String ext2
- The index file gets another extension
- See Also:
- Constant Field Values
BinFileStrColPreparer
public BinFileStrColPreparer(String fileName)
- Parameters:
fileName - String
createFiles
protected final void createFiles(String fileName)
writeFiles
protected final void writeFiles(BufferedReader reader,
OutputStream datWriter,
OutputStream idxWriter)
throws Exception
- Throws:
Exception
testFiles
public final void testFiles()
main
public static final void main(String[] args)