Position Bias Correction for Search Behavior Analysis

Authors: Wei Min, Jason Wang

1. Introduction

Clicks carry information about user satisfaction with search results and provide a measure of item relevance and quality. However, the observed click-through rate (CTR) is confounded by position bias: users are more likely to click top-ranked items than lower-ranked ones, regardless of relevance. This bias distorts the click and sales statistics that feed machine-learned ranking, so an effective way to offset it could have a large impact. The purpose of this research is to evaluate and compare three methods of compensating for position bias. We use eBay search data to compare the performance of COEC (clicks over expected clicks), GaP (Gamma-Poisson) and Validated Impression. Based on these experiments, we recommend Validated Impression as a reasonable fix.

2. Model Introduction

2.1 Position models

Introduce a hidden variable exam, which is true if an item is examined. We make two assumptions:

  • the probability of a click only depends on whether an item is examined
  • the probability that an item is examined is independent of the particular item, and only depends on its position

From this it follows (see ref. [2]) that the probability of clicking an item i in position j is the product of the relevance-based probability pi and the probability qj that position j gets examined:

P(click = 1 | i, j) = P(click = 1 | exam = 1, i) × P(exam = 1 | j) = pi × qj.

2.1.1 COEC model

In the COEC method, estimate the empirical CTR at position j, qj, as the sum of clicks over all items appearing in position j divided by the sum of impressions over all items appearing in position j. Estimate pi as the sum of clicks over all positions for item i divided by the sum of "expected examinations" given the position of each exposure of the item. For example, if item i has been exposed K times, the position of the k-th exposure is jk, and c_k is 1 if that exposure was clicked and 0 otherwise, then

pi = (c_1 + ... + c_K) / (q_{j_1} + ... + q_{j_K}),

where the positional CTR vector q is scaled so that q1 = 1. The basic idea of the COEC model is to give larger weight to clicks from lower positions, with the weight determined by the scaled empirical positional CTR. The drawback of COEC is that the estimate of the positional CTR (qj) is itself biased, since high-quality items are usually ranked at the top.
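As a minimal sketch of the COEC estimate described above, the following Python function computes the scaled positional CTR and the per-item ratio of clicks to expected examinations. The function name `coec`, the `(item_id, position, clicked)` log format, and the choice to scale by the CTR of the top-most observed position are assumptions made for illustration, not part of the original experiment.

```python
from collections import defaultdict


def coec(impressions):
    """Estimate COEC relevance per item.

    impressions: list of (item_id, position, clicked) tuples, clicked in {0, 1}.
    This log format is hypothetical and chosen only for the example.
    Returns (p, q): p maps item -> clicks / expected examinations,
    q maps position -> scaled empirical positional CTR.
    """
    pos_clicks = defaultdict(int)
    pos_views = defaultdict(int)
    for _, pos, clicked in impressions:
        pos_clicks[pos] += clicked
        pos_views[pos] += 1

    # Empirical CTR per position.
    ctr = {pos: pos_clicks[pos] / pos_views[pos] for pos in pos_views}

    # Assumption: scale so that the top-most observed position has weight 1.
    top_ctr = ctr[min(ctr)]
    if top_ctr == 0:
        raise ValueError("top position has no clicks; cannot scale")
    q = {pos: ctr[pos] / top_ctr for pos in ctr}

    item_clicks = defaultdict(int)
    item_expected = defaultdict(float)
    for item, pos, clicked in impressions:
        item_clicks[item] += clicked
        item_expected[item] += q[pos]  # one "expected examination" per exposure

    # Items seen only at positions with zero weight are skipped.
    p = {
        item: item_clicks[item] / item_expected[item]
        for item in item_clicks
        if item_expected[item] > 0
    }
    return p, q
```

In this sketch, an item with 1 click in 4 exposures at a position whose scaled CTR is 0.5 gets the same estimate as an item with 2 clicks in 4 exposures at position 1, which is the reweighting the method is designed to produce.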

2.1.2 GaP model

Assume the number of clicks follows a binomial distribution with pi × qj as the success probability. When the number of impressions is sufficiently large relative to the observed CTR (many impressions, few clicks), the binomial is well approximated by a Poisson distribution. A prior is imposed on the positional CTR (qj) for regularization; based on empirical data, a Gamma prior is used. An iterative algorithm estimates pi and qj (the recurrence is derived in ref. [1], section 5). The risk is that, even though the Poisson assumption is valid for web advertisements, it might not hold for eBay search. In addition, an eBay item tends to stay at a relatively stable position for a given query, so the data may contain too little positional variation to capture the position effect.
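The alternating structure of such an estimator can be illustrated with a generic coordinate-ascent sketch. This is not the recurrence from ref. [1]: it is a simplified MAP update derived directly from a Poisson likelihood with rate n_ij × pi × qj and a Gamma(alpha, beta) prior (shape, rate) on qj. The function name `fit_gap`, the dense item-by-position count matrices, and the default hyperparameters are assumptions for illustration.

```python
import numpy as np


def fit_gap(clicks, impressions, alpha=1.0, beta=1.0, n_iter=100):
    """Alternating updates for a Poisson click model with a Gamma prior on q.

    clicks, impressions: (n_items, n_positions) arrays of counts.
    alpha, beta: hypothetical Gamma shape and rate; alpha >= 1 is assumed
    so that the MAP update for q stays non-negative.
    Every item is assumed to have at least one impression, and position
    index 0 is assumed to end up with a positive weight.
    """
    clicks = np.asarray(clicks, dtype=float)
    impressions = np.asarray(impressions, dtype=float)
    n_items, n_positions = clicks.shape

    q = np.ones(n_positions)
    p = np.zeros(n_items)
    for _ in range(n_iter):
        # Maximum-likelihood update of p with q held fixed.
        p = clicks.sum(axis=1) / (impressions @ q)
        # MAP update of q with p held fixed, under the Gamma(alpha, beta) prior.
        q = (clicks.sum(axis=0) + alpha - 1.0) / (impressions.T @ p + beta)

    # Only the product p_i * q_j is identified; rescale so q[0] == 1.
    scale = q[0]
    return p * scale, q / scale
```

The final rescaling leaves every predicted CTR pi × qj unchanged and only fixes the arbitrary scale shared between p and q.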

2.2 Cascading and DBN models

Cascading and DBN models assume that the probability of clicking an item depends on whether the user is satisfied with the items shown above it on the search result page. Refer to [2] for details.

2.2.1 Validated Impression model

The Validated Impression model is a simplified DBN model. Estimate P(click=1|exam=1,i) as the number of clicks divided by the number of validated impressions for item i, where an impression is validated if the item appears at or above the last clicked item of the search, which ensures that the item was examined. For example, if a user clicks on the items at positions 2, 5 and 9, then only the items in the top 9 positions are considered to have validated impressions.
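A minimal sketch of this counting rule is shown below. The function name `validated_ctr` and the per-search list of `(item_id, position, clicked)` tuples are assumptions for illustration. The sketch treats the deepest clicked position as the "last clicked item", which matches the example above when clicks occur in rank order, and it skips searches without any click since they yield no validated impressions.

```python
from collections import defaultdict


def validated_ctr(searches):
    """Estimate P(click | exam) per item from validated impressions.

    searches: list of searches, each a list of (item_id, position, clicked)
    tuples (a hypothetical log format used only for this example).
    """
    clicks = defaultdict(int)
    validated = defaultdict(int)
    for results in searches:
        clicked_positions = [pos for _, pos, clicked in results if clicked]
        if not clicked_positions:
            continue  # no click, so no impression in this search is validated
        last_clicked = max(clicked_positions)
        for item, pos, clicked in results:
            if pos <= last_clicked:
                validated[item] += 1
                clicks[item] += int(bool(clicked))
    return {item: clicks[item] / validated[item] for item in validated}
```

For a search with clicks at positions 2, 5 and 9, only the items at positions 1 through 9 receive a validated impression; items at position 10 and below are ignored for that search.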

3. Experiments

3.1 COEC and GaP comparison

The comparison is based on 100% of two weeks of US data (desktop only, top 50 positions). The first week is used as the training set to estimate the parameters, and the predictions are evaluated on the same item set in the second week. To get more accurate CTR estimates, only items with a large number of impressions are selected for the experiment. In addition, to reduce computational complexity, positions are capped at 10; the argument is that the position curve in Figure 1 has a relatively flat tail.

  

Figure 1: Empirical Position CTR

The evaluation process is as follows:

1. Retrieve all SRPs/Items related to a given query

2. Consider items that appear in two consecutive weeks with more than 100 impressions in each week

3. Hold out the second-week data for these items as the test dataset, and calculate the CTR per position for each item

4. Train the models on the first-week data to estimate pi and qj (to save computation cost, only 100K items are selected)

5. Predict the CTR of item i @ position j as pi×qj

6. Evaluate the prediction on test dataset using Mean Square Error (MSE)
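Steps 5 and 6 above can be sketched as follows, assuming the trained parameters are available as vectors and the second-week data as item-by-position count matrices. The function name `prediction_mse` and the choice to score only cells with at least one test impression are assumptions for illustration.

```python
import numpy as np


def prediction_mse(p, q, test_clicks, test_impressions):
    """Mean squared error between predicted and observed per-position CTR.

    p: (n_items,) relevance estimates; q: (n_positions,) position estimates.
    test_clicks, test_impressions: (n_items, n_positions) count arrays
    from the held-out week (a hypothetical layout for this example).
    """
    test_clicks = np.asarray(test_clicks, dtype=float)
    test_impressions = np.asarray(test_impressions, dtype=float)

    predicted = np.outer(p, q)  # predicted CTR of item i at position j
    seen = test_impressions > 0  # score only cells observed in the test week
    observed = test_clicks[seen] / test_impressions[seen]
    return float(np.mean((predicted[seen] - observed) ** 2))
```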

The result of the comparison is shown in Table 1. According to the table, the GaP model has slightly better prediction accuracy.

Figure 2: COEC and GaP Comparison

 

To avoid MSE being skewed by a few extremely bad predictions, the prediction errors are visualized as well, after applying the error function:

             Error(x) = sign(x) × log1p(abs(x))

The function preserves the sign of the prediction error and shrinks its range for easier plotting. Again, in Figure 2 the two models barely differ. Figure 3 shows that the positional CTRs from COEC and GaP are similar, which also indicates that the GaP adjustment is quite limited.
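The error function translates directly into a vectorized NumPy sketch; the name `signed_log_error` is chosen here for illustration only.

```python
import numpy as np


def signed_log_error(x):
    """Compress prediction errors while keeping their sign.

    Works on scalars or arrays; log1p(y) computes log(1 + y).
    """
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.log1p(np.abs(x))
```

Because log1p(0) is 0, an error of zero stays at zero, and errors of equal magnitude but opposite sign map to values of equal magnitude but opposite sign.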

Figure 3: Scaled Positional CTR from COEC and GaP

3.2 COEC and Validated Impression comparison

According to the factor model above, P(Click=1 | position=1) = P(Click=1 | Exam=1) × P(Exam=1 | position=1). Assuming that users always examine the first position of the search result page, that is, P(Exam=1 | position=1) = 1, we have P(Click=1 | position=1) = P(Click=1 | Exam=1). Therefore, the observed CTR of an item at position 1 can be used to evaluate the estimate of the position-bias-removed CTR pi.

In the experiment, we use searches where the query/item pair appears at position 1 for testing and those at other positions for model training. One week of data from 10% of eBay desktop traffic is collected, and only Best-Match search result pages (SRPs) without category or aspect constraints are considered. In addition, around 2.5% of searches are out of order (e.g. the user first clicks a lower-ranked item and then a higher-ranked one) and are removed. The evaluation follows a leave-one-out protocol:

  1. Retrieve all SRPs/Items related to a given query
  2. Filter out items with few impressions (<100), which lead to inaccurate CTR estimates.
  3. Consider items that appear in both position 1 and other positions.
  4. Hold out the position 1 impressions of each item as the test dataset, and calculate the CTR of the item at position 1 from it.
  5. Train the models on the remaining positions and predict P(Click=1 | Exam=1).
  6. Compute the error between the prediction and the CTR at position 1 using Error(x) = sign(x) × log1p(abs(x)).
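Steps 2 to 4 of this protocol can be sketched as a split of the impression log. The function name `split_position_one`, the `(item_id, position, clicked)` row format, and the decision to apply the impression threshold to an item's total impressions are assumptions for illustration; the source does not specify whether the threshold applies per split.

```python
from collections import defaultdict


def split_position_one(impressions, min_impressions=100):
    """Hold out position 1 impressions as the evaluation target.

    impressions: iterable of (item_id, position, clicked) rows for one query
    (hypothetical format), with clicked in {0, 1}.
    Returns (train_rows, target_ctr): rows at positions other than 1 for
    eligible items, and the observed position 1 CTR per eligible item.
    """
    rows = list(impressions)
    total = defaultdict(int)
    top_views = defaultdict(int)
    top_clicks = defaultdict(int)
    for item, pos, clicked in rows:
        total[item] += 1
        if pos == 1:
            top_views[item] += 1
            top_clicks[item] += clicked

    # Eligible items have enough impressions and appear both at position 1
    # and at some other position.
    eligible = {
        item
        for item in total
        if total[item] >= min_impressions and 0 < top_views[item] < total[item]
    }
    train_rows = [row for row in rows if row[0] in eligible and row[1] != 1]
    target_ctr = {item: top_clicks[item] / top_views[item] for item in eligible}
    return train_rows, target_ctr
```

A model trained on `train_rows` can then be scored by comparing its estimate of P(Click=1 | Exam=1) for each item against `target_ctr`.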

As Figure 4 shows, the Validated Impression model outperforms the COEC model.

Figure 4: COEC and Validated Impression Comparison

3.3  Experiment Conclusion

The GaP model performs better in ad search at Microsoft [1] than in item search in our experiments, where its gain over COEC is marginal. A probable reason is that ads receive far fewer clicks than items, so the Poisson approximation, which requires the number of impressions to be large relative to the CTR, is valid in ad search but not in item search. Comparing COEC with Validated Impression, however, Validated Impression estimates the position-bias-corrected CTR with significantly higher accuracy. Based on these experiments, we recommend using validated impressions in the eBay click model instead of raw clicks, which are confounded with position bias.

4. Future Work

Two applications of position bias correction to the eBay click model are under consideration. One is to correct the raw clicks in the item-level metrics that feed the most important factors of the search ranking function; however, the infrastructure cost is high. The other is to correct the raw clicks in the Query Category Demand data, which is used as a business rule to demote items from categories with fewer clicks and promote items from categories with more clicks. In either case, impact analysis will be conducted before any A/B tests. For example, in the Category Demand application, it will answer questions such as how many queries see their category click distribution shift after using position-bias-corrected clicks, and whether the change is positive.

5. References

[1]     Chen and Yan, Position-Normalized Click Prediction in Search Advertising, KDD 2012

[2]     Chapelle and Zhang, A Dynamic Bayesian Network Click Model for Web Search Ranking, WWW 2009
