lxml encoding conversion: fixing garbled text (mojibake)

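from lxml import etree
from lxml.html.clean import Cleaner
# Decode the raw bytes as UTF-8 ourselves so lxml does not have to guess the encoding.
response = response.content.decode("utf-8")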

Strip CSS styles
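# style=True drops <style> tags and style attributes; scripts=True drops <script> tags.
cleaner = Cleaner(style=True, scripts=True, page_structure=False, safe_attrs_only=False)
# clean_html() returns cleaned text here; etree.HTML() parses it into an element tree.
response = etree.HTML(cleaner.clean_html(response))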
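
To see why the explicit decode matters, here is a minimal sketch. It assumes the page really is UTF-8. `decode_html` is a hypothetical helper (not part of lxml or requests), and `raw` stands in for `response.content`. The lxml part is skipped if lxml is not installed.

```python
# Sketch only: decode_html is a hypothetical helper, not an lxml or requests API.

def decode_html(raw, encoding="utf-8"):
    """Decode raw response bytes to text before handing them to lxml."""
    # errors="replace" substitutes U+FFFD for invalid bytes instead of raising.
    return raw.decode(encoding, errors="replace")

raw = "<p>乱码问题</p>".encode("utf-8")  # stands in for response.content

good = decode_html(raw)        # decoded with the right codec
bad = raw.decode("latin-1")    # decoded with the wrong codec: garbled text

print(good)
print(bad == good)

try:
    from lxml import etree
    # Parsing already-decoded text means lxml never has to guess an encoding.
    tree = etree.HTML(good)
    print(tree.findtext(".//p"))
except ImportError:
    pass  # lxml not installed; the decoding part above still applies
```

If the site uses another encoding (GBK, for example), pass that name to the helper instead of relying on the UTF-8 default.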
