Simultaneous product attribute name and value extraction from web pages

Authors:

Bo Wu, Xueqi Cheng, Yu Wang, Yan Guo, Linhai Song

Abstract:

Much work has been done in the area of template-independent web data extraction. However, existing approaches either handle attribute value extraction and annotation in separate phases or are constrained to a predefined set of attributes, which is highly ineffective. In this paper, we perform attribute extraction and annotation simultaneously by extracting each attribute name and value pair at the same time. Our approach uses a co-training algorithm with a naive Bayesian classifier to identify candidate attribute name and value pairs in unlabeled pages. These candidate pairs are then used to detect the specification block of the product in each web page. Finally, all attribute name and value pairs in the specification block are discovered. We conduct experiments on three types of products and obtain promising results.
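The abstract names co-training with a naive Bayesian classifier but gives no details, so the following is only a minimal sketch of that general technique, not the authors' implementation. It assumes two feature views per candidate text node: the tokens of the text (view A) and some layout cues (view B). The views, the feature names, the labels "pair" and "other", and the parameters `rounds`, `per_round`, and `threshold` are all hypothetical choices made for illustration. Specification block detection is not shown.

```python
import math
from collections import Counter


class NaiveBayes:
    """Tiny multinomial naive Bayes over token lists with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.counts[label].update(tokens)
        self.vocab = {t for c in self.classes for t in self.counts[c]}
        return self

    def predict_proba(self, tokens):
        total = sum(self.prior.values())
        log_scores = {}
        for c in self.classes:
            denom = sum(self.counts[c].values()) + len(self.vocab)
            score = math.log(self.prior[c] / total)
            for t in tokens:
                if t in self.vocab:  # tokens never seen in training are ignored
                    score += math.log((self.counts[c][t] + 1) / denom)
            log_scores[c] = score
        top = max(log_scores.values())
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exp.values())
        return {c: v / z for c, v in exp.items()}


def fit_views(labeled):
    """Train one classifier per view on the current labeled set."""
    labels = [y for _, y in labeled]
    return tuple(NaiveBayes().fit([x[v] for x, _ in labeled], labels) for v in (0, 1))


def co_train(labeled, unlabeled, rounds=5, per_round=2, threshold=0.6):
    """labeled: [((view_a, view_b), label)]; unlabeled: [(view_a, view_b)]."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        picked = {}
        # Each view's classifier labels the pool examples it is most sure about.
        for view, clf in enumerate(fit_views(labeled)):
            scored = []
            for i, x in enumerate(pool):
                probs = clf.predict_proba(x[view])
                label = max(probs, key=probs.get)
                scored.append((probs[label], i, label))
            scored.sort(reverse=True)
            for conf, i, label in scored[:per_round]:
                if conf >= threshold and i not in picked:
                    picked[i] = label
        if not picked:
            break
        # Confident predictions become training data for both views.
        labeled.extend((pool[i], label) for i, label in picked.items())
        pool = [x for i, x in enumerate(pool) if i not in picked]
    clf_a, clf_b = fit_views(labeled)
    return clf_a, clf_b, labeled


# Hypothetical seed data: (text tokens, layout cues) for each candidate node.
SEED = [
    ((["weight", "NUM", "kg"], ["td", "colon"]), "pair"),
    ((["color", "black"], ["td", "colon"]), "pair"),
    ((["add", "to", "cart"], ["a", "nocolon"]), "other"),
    ((["customer", "reviews"], ["div", "nocolon"]), "other"),
]
UNLABELED = [
    (["height", "NUM", "cm"], ["td", "colon"]),
    (["sign", "in"], ["a", "nocolon"]),
    (["weight", "NUM", "g"], ["li", "colon"]),
]

if __name__ == "__main__":
    clf_a, clf_b, grown = co_train(SEED, UNLABELED)
    for (tokens, _), label in grown[len(SEED):]:
        print(" ".join(tokens), "->", label)
```

In this toy setup each view compensates for the other: the text view knows nothing about the tokens "sign" and "in", but the layout view is confident about them, so the example still enters the training set. Whether the paper splits features this way is not stated in the abstract.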

Conference paper, WI/IAT 2009, ecbs workshop
