IR-chapter7: computing scores in a complete search system

efficient scoring and ranking

IR-chapter7: computing scores in a complete search system_第1张图片
FastCosineScore
  • constructing a heap to pick out top K components
  • Inexact top K document retrieval
  • index elimination
    considering documents containing terms whose idf exceeds a preset threshold
    considering documents containing many(even all) terms
  • champion list
    precompute r documents with the highest weights for each term.
    r does not to be the same for every term.(rarer term, larger)
  • static quality scoring and ordering
    net-score
    global champion list, expansion two lists sorted by g(d) value
  • impact ordering
    sorted by common ordering: document-at-a-time scoring
    sorted by uncommon ordering: term-at-a-time scoring
    ordered by a decreasing tf value,advantage:
    1.stop after considering a prefix of posting list
    2.consedering query terms in decreasing order of idf.
  • cluster pruning
    pick ,compute nearest, cluster, computing cosine similarity from q to each leader, then the closest L and its follower
    variation - b1,b2

components of an information retrieval system

  • tiered indexes
    motivation: A has fewer than K documents
    solution: we set a tf threshold of 20 for tier 1 and 10 for tier 2, meaning that the tier 1 index only has postings entries with tf values exceeding 20, and the tier 2 index only has postings entries with tf values exceeding 10.
IR-chapter7: computing scores in a complete search system_第2张图片
tiered indexes
  • designing parsing and scoring function
    query parser - translate the user-specified keywords into a query with various operators
    scoring function - manual configuration or machine-learned scoring

  • putting it all together

IR-chapter7: computing scores in a complete search system_第3张图片
a complete search system

results snippets: snippets of text accompanying each document in the results list for a query.

Vector space scoring and query operator interaction

Google: the semantics of a conjunctive query that only retrieves documents containing all or most query terms.

  • Boolean retrieval
  • wildcard queries
  • phase queries

你可能感兴趣的:(IR-chapter7: computing scores in a complete search system)