Issue 1: java.lang.OutOfMemoryError: Java heap space
The full error is as follows:
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
at gnu.trove.TObjectIntHashMap.rehash(TObjectIntHashMap.java:170)
at gnu.trove.THash.postInsertHook(THash.java:359)
at gnu.trove.TObjectIntHashMap.put(TObjectIntHashMap.java:155)
at org.terrier.utility.TermCodes.getCode(TermCodes.java:100)
at org.terrier.structures.indexing.DocumentPostingList.getTermId(DocumentPostingList.java:133)
at org.terrier.structures.indexing.DocumentPostingList$2.execute(DocumentPostingList.java:168)
at org.terrier.structures.indexing.DocumentPostingList$2.execute(DocumentPostingList.java:166)
at gnu.trove.TObjectIntHashMap.forEachEntry(TObjectIntHashMap.java:426)
at org.terrier.structures.indexing.DocumentPostingList.getPostings2(DocumentPostingList.java:165)
at org.terrier.indexing.BasicIndexer.indexDocument(BasicIndexer.java:368)
at org.terrier.indexing.BasicIndexer.createDirectIndex(BasicIndexer.java:261)
at org.terrier.indexing.Indexer.index(Indexer.java:344)
at org.terrier.applications.TRECIndexing.index(TRECIndexing.java:123)
at org.terrier.applications.TrecTerrier.run(TrecTerrier.java:390)
at org.terrier.applications.TrecTerrier.applyOptions(TrecTerrier.java:573)
at org.terrier.applications.TrecTerrier.main(TrecTerrier.java:237)
21877.18user 916.34system 6:01:37elapsed 105%CPU (0avgtext+0avgdata 0maxresident)k
45946520inputs+21416016outputs (1major+1978833minor)pagefaults 0swaps
Solution:
Increase the maximum Java heap space to 2 GB by setting TERRIER_HEAP_MEM to 2048M in bin/terrier-env.sh.
After this change, indexing appears to run smoothly.
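To confirm that a larger heap actually reaches the JVM, a small check like the one below may help. It is a minimal sketch and not part of Terrier. It assumes that TERRIER_HEAP_MEM ends up as the JVM's -Xmx option, and it reads the effective limit with Runtime.getRuntime().maxMemory(). The class name HeapCheck, the 2048 MiB target and the 90% tolerance are illustrative choices. The tolerance exists because maxMemory() often reports somewhat less than the -Xmx value.

```java
public class HeapCheck {
    // Convert a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // True if maxHeapBytes is at least `fraction` of targetMiB.
    // maxMemory() can report slightly less than -Xmx (for example, when the
    // collector excludes a survivor space), so an exact match is not required.
    static boolean meetsTarget(long maxHeapBytes, long targetMiB, double fraction) {
        return toMiB(maxHeapBytes) >= (long) (targetMiB * fraction);
    }

    public static void main(String[] args) {
        long max = Runtime.getRuntime().maxMemory();
        long targetMiB = 2048; // hypothetical target matching TERRIER_HEAP_MEM=2048M
        System.out.println("Max heap: " + toMiB(max) + " MiB");
        if (!meetsTarget(max, targetMiB, 0.9)) {
            System.out.println("Warning: max heap is well below " + targetMiB + " MiB");
        }
    }
}
```

For example, running it with java -Xmx2048m HeapCheck should report a maximum heap close to 2048 MiB. A much smaller figure suggests that the setting is not being picked up.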
Issue 2: (can be summarized as "Key is not unique")
ERROR - This index (Index(/users/ishanic/terrier-3.0/var/index,data_1)) doesnt have an index structure called lexicon-keyfactory: property index.lexicon-keyfactory.class not found
ERROR - Valid structures are: [document-inputstream, meta-inputstream, document-factory, meta, document]
The full error is as follows:
INFO - Collection #0 took 183780 seconds to build the runs for 20000000 documents
ERROR - Problem finishing index
java.io.IOException: Key is not unique: 38131,3514
at org.terrier.structures.collections.FSOrderedMapFile$MultiFSOMapWriter.mergeTwo(FSOrderedMapFile.java:908)
at org.terrier.structures.collections.FSOrderedMapFile$MultiFSOMapWriter.close(FSOrderedMapFile.java:861)
at org.terrier.structures.indexing.CompressingMetaIndexBuilder.close(CompressingMetaIndexBuilder.java:259)
at org.terrier.indexing.BasicSinglePassIndexer.createInvertedIndex(BasicSinglePassIndexer.java:274)
at org.terrier.indexing.BasicSinglePassIndexer.createDirectIndex(BasicSinglePassIndexer.java:147)
at org.terrier.indexing.Indexer.index(Indexer.java:344)
at org.terrier.applications.TRECIndexing.createSinglePass(TRECIndexing.java:221)
at org.terrier.applications.TrecTerrier.run(TrecTerrier.java:384)
at org.terrier.applications.TrecTerrier.applyOptions(TrecTerrier.java:573)
at org.terrier.applications.TrecTerrier.main(TrecTerrier.java:237)
INFO - Optimising structure lexicon
ERROR - This index (Index(/users/ishanic/terrier-3.0/var/index,data_1)) doesnt have an index structure called lexicon-keyfactory: property index.lexicon-keyfactory.class not found
ERROR - Valid structures are: [document-inputstream, meta-inputstream, document-factory, meta, document]
ERROR - This index (Index(/users/ishanic/terrier-3.0/var/index,data_1)) doesnt have an index structure called lexicon-valuefactory: property index.lexicon-valuefactory.class not found
ERROR - Valid structures are: [document-inputstream, meta-inputstream, document-factory, meta, document]
A problem occurred: java.lang.NullPointerException
java.lang.NullPointerException
at org.terrier.structures.collections.FSOrderedMapFile.numberOfEntries(FSOrderedMapFile.java:490)
at org.terrier.structures.FSOMapFileLexicon.optimise(FSOMapFileLexicon.java:389)
at org.terrier.structures.indexing.LexiconBuilder.optimise(LexiconBuilder.java:790)
at org.terrier.indexing.BasicIndexer.finishedInvertedIndexBuild(BasicIndexer.java:438)
at org.terrier.indexing.BasicSinglePassIndexer.createInvertedIndex(BasicSinglePassIndexer.java:292)
at org.terrier.indexing.BasicSinglePassIndexer.createDirectIndex(BasicSinglePassIndexer.java:147)
at org.terrier.indexing.Indexer.index(Indexer.java:344)
at org.terrier.applications.TRECIndexing.createSinglePass(TRECIndexing.java:221)
at org.terrier.applications.TrecTerrier.run(TrecTerrier.java:384)
at org.terrier.applications.TrecTerrier.applyOptions(TrecTerrier.java:573)
at org.terrier.applications.TrecTerrier.main(TrecTerrier.java:237)
Solution:
These problems are caused by building the meta index. The suggested fix follows:
I gave this issue some thought.
* My initial idea was that your indexer.meta.forward.keylens was too small, but this is not the case.
* The error is occurring when building the reverse lookup table (docno -> docid). Will you need this functionality? If not, then you can disable it using indexer.meta.reverse.keys= during indexing.
* Otherwise, can you alter the exception being raised in FSOrderedMapFile to print the value of the key that is causing the collision?
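I went with the second suggestion (disabling the reverse lookup by setting indexer.meta.reverse.keys= during indexing), and that solved the problem.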
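The "Key is not unique" exception in the log is raised while merging sorted runs (MultiFSOMapWriter.mergeTwo). The sketch below is a simplified, hypothetical illustration of that kind of merge and not Terrier's actual code. Two runs of (key, value) entries, each sorted by key and unique within itself, are merged into one sorted list. A key found in both runs is treated as an error. Following the third suggestion, the exception message names the colliding key. The class RunMerger, the Entry type and the docno strings are invented for this example. Under these assumptions, a docno occurring twice in the collection would show up as such a collision when building a docno -> docid table.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class RunMerger {
    // One (key, value) pair, e.g. docno -> docid in a reverse lookup table.
    static final class Entry {
        final String key;
        final int value;

        Entry(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    // Merges two runs that are each sorted by key and unique within the run.
    // Throws if the same key appears in both runs, reporting the key itself.
    static List<Entry> mergeRuns(List<Entry> a, List<Entry> b) throws IOException {
        List<Entry> out = new ArrayList<>(a.size() + b.size());
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = a.get(i).key.compareTo(b.get(j).key);
            if (cmp == 0) {
                Entry x = a.get(i), y = b.get(j);
                throw new IOException("Key is not unique: " + x.key
                        + " (values " + x.value + " and " + y.value + ")");
            }
            out.add(cmp < 0 ? a.get(i++) : b.get(j++));
        }
        // Copy whatever remains of the run that was not exhausted.
        while (i < a.size()) out.add(a.get(i++));
        while (j < b.size()) out.add(b.get(j++));
        return out;
    }

    public static void main(String[] args) {
        List<Entry> run1 = List.of(new Entry("DOC-001", 0), new Entry("DOC-003", 2));
        List<Entry> run2 = List.of(new Entry("DOC-002", 1), new Entry("DOC-003", 3));
        try {
            mergeRuns(run1, run2);
        } catch (IOException e) {
            // prints "Key is not unique: DOC-003 (values 2 and 3)"
            System.out.println(e.getMessage());
        }
    }
}
```

Reporting the key rather than internal positions makes it possible to go back to the collection and find the offending document.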