Combining Similarity and Distribution Features to Match Attributes

Yu Wang, Bingxing Fang, Yan Guo. Combining Similarity and Distribution Features to Match Attributes. In Proceedings of the 2nd International Workshop on Electronic Commerce, Business, and Services (ECBS 2009), Web Intelligence 2009.

Abstract: The Web contains a great deal of useful semistructured information that can be organized into web objects, many of which are commercially valuable. The inner structures of these web objects are highly heterogeneous: web objects from different web sites cover different subsets of the useful attributes. The complete set of attributes can be mined from web pages through attribute extraction algorithms. However, to construct a high-quality web object schema, some mined attributes should be merged, since they are synonyms for the same concept. Our empirical study shows that the features used by traditional schema matching and deep web integration methods are usually domain specific, so they are not applicable to matching attributes extracted from the Web. To overcome this problem, this paper proposes new features that depict attribute distribution characteristics and uses machine learning techniques to combine them with attribute similarity features. We empirically compare the proposed method with existing methods that use other features, and the results show the effectiveness of our method.
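
The abstract does not spell out the individual features, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes one similarity feature (a string ratio over attribute names) and one distribution feature (how often two attributes appear together on the same site, on the assumption that synonyms rarely co-occur). The function names and the `site_attrs` structure are hypothetical.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Similarity feature (assumed): case-insensitive string ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cooccurrence_rate(a, b, site_attrs):
    """Distribution feature (assumed): among sites that use either attribute,
    the fraction that use both. Synonyms are expected to score low here."""
    either = [attrs for attrs in site_attrs.values() if a in attrs or b in attrs]
    if not either:
        return 0.0
    both = sum(1 for attrs in either if a in attrs and b in attrs)
    return both / len(either)


def feature_vector(a, b, site_attrs):
    """Combine both kinds of feature into one vector for a classifier."""
    return [name_similarity(a, b), cooccurrence_rate(a, b, site_attrs)]
```

Vectors of this kind, labeled as "same concept" or "different concept" for known attribute pairs, could then be used to train a classifier of one's choice, which is the role the abstract assigns to machine learning.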
