Terrier Installation and Usage

2. Download and configure Terrier

  • Terrier Requirements:
    Terrier’s only requirement is an installed Java JRE, version 1.8.0 or higher.
  • Download Terrier
  • Step by Step Unix Installation
    After downloading Terrier, copy the archive to the directory where you want to install it. Navigate to this directory and run the following command to decompress the distribution:
      tar -zxvf terrier-project-5.0-bin.tar.gz
    
    This creates a terrier-project-5.0 directory in your current directory. Next, make sure that the correct Java version is available on the system. Type:
    echo $JAVA_HOME
    
    If the environment variable $JAVA_HOME is set, this command prints the path of your Java installation (e.g. /usr/java/jre1.8.0). If this shows that a suitable Java version (1.8.0 or later) is installed, then you're all done. If your system does not meet this requirement, download Java 1.8 from the JRE 1.8 download website and set the environment variable by adding the following line to either /etc/profile or ~/.bashrc:
    export JAVA_HOME=
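The Java requirement can also be checked against the java binary on the PATH. The following is a minimal sketch, not part of Terrier: java_major is a hypothetical helper, and the parsing assumes the usual version string layouts ("1.8.0_292" for Java 8, "11.0.2" for later releases).

```shell
#!/bin/sh
# java_major: hypothetical helper that maps a Java version string to its
# major version number ("1.8.0_292" -> 8, "11.0.2" -> 11).
java_major() {
    case "$1" in
        1.*) echo "$1" | cut -d. -f2 | sed 's/[^0-9].*//' ;;  # legacy scheme: 1.8.x is Java 8
        *)   echo "$1" | cut -d. -f1 | sed 's/[^0-9].*//' ;;  # newer scheme: 11.x is Java 11
    esac
}

# "java -version" prints to stderr, so redirect it before parsing.
if command -v java >/dev/null 2>&1; then
    version=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    major=$(java_major "$version")
    if [ -n "$major" ] && [ "$major" -ge 8 ]; then
        echo "Java $version is recent enough for Terrier"
    else
        echo "Java 1.8 or later is required (found: ${version:-unknown})"
    fi
else
    echo "java not found on PATH"
fi
```

Note that this checks the java found on the PATH, which may differ from the installation that $JAVA_HOME points to.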
    

3. Using Terrier

  • Indexing
  • Go to the Terrier folder.
    cd terrier-project-5.0
    
  • Set up Terrier to use a TREC test collection by calling:
     bin/trec_setup.sh /Users/zcy/Desktop/information/document
    
    Result:
    [Figure 1: output of trec_setup.sh]
    Terrier ships with a small collection called VASWANI_NPL, located at share/vaswani_npl/. It is laid out as a traditional TREC test collection, with a corpus file, topics, and relevance assessments (qrels). The collection used here follows the same format. A sample document:
    
    21
    [Biochemical studies on camomile components/III. In vitro studies about the antipeptic activity of (--)-alpha-bisabolol (author's transl)].
    (--)-alpha-Bisabolol has a primary antipeptic action depending on dosage, which is not caused by an alteration of the pH-value. The proteolytic activity of pepsin is reduced by 50 percent through addition of bisabolol in the ratio of 1/0.5. The antipeptic action of bisabolol only occurs in case of direct contact. In case of a previous contact with the substrate, the inhibiting effect is lost.
     
    
    
    1) If necessary, check and modify the collection.spec file. This may be required if the collection directory contains files that you do not want to index (READMEs, etc.).
    2) Now we are ready to index the collection. This is done with the batchindexing command of the terrier script, as follows:
    [Figure 2: the batchindexing command]
    With Terrier’s default settings, the resulting index will be created in the var/index folder within the Terrier installation folder.
    Note: If you do not need the direct index structure (e.g. for query expansion), you can use bin/terrier batchindexing -j for the faster single-pass indexing.
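The collection.spec clean-up from step 1 can also be done from the command line. The sketch below assumes that the file is at etc/collection.spec inside the Terrier folder and lists one file path per line. drop_from_spec is a hypothetical helper name, and the README pattern is only an example.

```shell
#!/bin/sh
# drop_from_spec: hypothetical helper that removes every line matching a
# pattern (case-insensitive) from a collection.spec-style file list.
# Usage: drop_from_spec <spec-file> <pattern>
drop_from_spec() {
    spec=$1
    pattern=$2
    tmp="$spec.tmp"
    # grep -v keeps the lines that do NOT match. It exits with 1 when no
    # line is left, which is not an error here; only codes above 1 abort.
    grep -i -v -e "$pattern" "$spec" > "$tmp"
    [ $? -le 1 ] || { rm -f "$tmp"; return 1; }
    mv "$tmp" "$spec"
}

# Example: keep README files out of the index (the path is an assumption).
if [ -f etc/collection.spec ]; then
    drop_from_spec etc/collection.spec 'readme'
fi
```

Review the resulting file before indexing, since the pattern is matched anywhere in the path and may also drop documents whose names happen to contain it.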
    Now we can start indexing the documents:
     bin/terrier batchindexing
    
    batch-indexers: the Terrier code for indexing corpora of documents.
    [Figure 3: output of batchindexing]
    Once indexing completes, you can verify your index by obtaining its statistics, using the indexstats command of Terrier.
    bin/terrier indexstats
    
    [Figure 4: output of indexstats]
  • Retrieval
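    Convert the query file (topics2017.xml) to the traditional TREC format:
    [Figure 5]
    Alternatively, to build the queries directly from the tags of the topic file, set the properties as follows. Here each topic is delimited by the topic tag and identified by num, and the query text is taken from the disease, gene, demographic and other tags: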
    TrecQueryTags.doctag=topic
    TrecQueryTags.idtag=num
    TrecQueryTags.process=disease,gene,demographic,other
    TrecQueryTags.skip=DESC,NARR
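One way to apply these settings is to write them into etc/terrier.properties, the file Terrier reads its configuration from by default. A minimal sketch, assuming it is run from the Terrier folder; set_property is a hypothetical helper that replaces an existing line for a key or appends a new one.

```shell
#!/bin/sh
# set_property: hypothetical helper that sets "key=value" in a properties
# file, replacing an existing line for that key or appending a new one.
# Usage: set_property <file> <key> <value>
set_property() {
    file=$1; key=$2; value=$3
    tmp="$file.tmp"
    awk -v k="$key" -v v="$value" '
        BEGIN { FS = "="; done = 0 }
        # Replace the first line whose key matches and drop later duplicates.
        $1 == k { if (!done) print k "=" v; done = 1; next }
        { print }
        # Append the property if the key was not present at all.
        END { if (!done) print k "=" v }
    ' "$file" > "$tmp" && mv "$tmp" "$file"
}

props=etc/terrier.properties   # default configuration file of Terrier
if [ -f "$props" ]; then
    set_property "$props" TrecQueryTags.doctag topic
    set_property "$props" TrecQueryTags.idtag num
    set_property "$props" TrecQueryTags.process disease,gene,demographic,other
    set_property "$props" TrecQueryTags.skip DESC,NARR
fi
```

Commented-out lines (starting with #) are left untouched, because their key does not match exactly.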
    
    Run the queries:
    bin/trec_terrier.sh -r -Dtrec.model=PL2 -c 10.99 -Dtrec.topics=/Users/zcy/Desktop/information/topics2017.xml
    
    Result:
    [Figure 6: output of the retrieval run]
    Once retrieval completes, you can find a result file (here named TF_IDF_2.res) in var/results.
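To take a quick look at a result file, the sketch below prints the first few hits of each query. It assumes the standard TREC run format (query id, Q0, docno, rank, score, run tag, separated by whitespace) and that the lines of each query appear in rank order. top_hits is a hypothetical helper, not a Terrier command.

```shell
#!/bin/sh
# top_hits: hypothetical helper that prints the first N result lines of each
# query from a run file in the standard TREC format:
#   query-id  Q0  docno  rank  score  run-tag
# Usage: top_hits <res-file> [N]   (N defaults to 3)
top_hits() {
    awk -v n="${2:-3}" '
        { seen[$1]++ }                       # lines seen so far for this query id
        seen[$1] <= n { print $1, $3, $5 }   # query id, docno, score
    ' "$1"
}

for res in var/results/*.res; do
    [ -f "$res" ] || continue                # no result files yet
    echo "== $res"
    top_hits "$res" 3
done
```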
  • Evaluation
    Now we will use the “-e” parameter to evaluate the results.
    bin/trec_terrier.sh -e -Dtrec.qrels=/Users/zcy/Desktop/information/qrels-treceval-abstracts.2017.txt
    
    Result:
    [Figure 7: output of the evaluation run]
    Terrier looks in the var/results directory for all .res files, evaluates each of them, and saves the results in a .eval file with the same name as the corresponding .res file.
    [Figure 8]
    The evaluation measures can be viewed in the .eval file:
    [Figure 9: contents of the .eval file]
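Individual measures can also be pulled out of the .eval files on the command line. This sketch assumes that the files use trec_eval-style lines of the form "measure scope value" (for example "map all 0.2500"); if your Terrier version writes a different layout, the awk pattern needs adjusting. get_measure is a hypothetical helper.

```shell
#!/bin/sh
# get_measure: hypothetical helper that prints the value of one measure,
# averaged over all queries, from trec_eval-style output.
# Usage: get_measure <eval-file> <measure>
get_measure() {
    # Field 1 is the measure name, field 2 the scope ("all" = all queries),
    # field 3 the value.
    awk -v m="$2" '$1 == m && $2 == "all" { print $3 }' "$1"
}

for ev in var/results/*.eval; do
    [ -f "$ev" ] || continue                 # no evaluation files yet
    echo "$ev: MAP=$(get_measure "$ev" map) P@10=$(get_measure "$ev" P_10)"
done
```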
