Parsing HTML with HTML Agility Pack and ScrapySharp

HtmlAgilityPack 1.8.0

This is an agile HTML parser that builds a read/write DOM and supports plain XPath or XSLT (you don't have to understand XPath or XSLT to use it). It is a .NET code library that lets you parse "out of the web" HTML files. The parser is very tolerant of "real world" malformed HTML. The object model is very similar to the one System.Xml offers, but for HTML documents (or streams).

PM> Install-Package HtmlAgilityPack -Version 1.8.0

// html is a string holding the page source
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
HtmlAgilityPack.HtmlNode rootnode = doc.DocumentNode;
// SelectSingleNode returns null when the XPath matches nothing
HtmlAgilityPack.HtmlNode row = rootnode.SelectSingleNode("//*[@id='content']/div[3]/div[1]");
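
The snippet above selects a single node. As a minimal sketch of selecting several nodes with the same API, the following wraps LoadHtml and SelectNodes in a helper. The class name XPathDemo, the method name SelectTexts, and the sample markup are hypothetical and only for illustration; the HtmlAgilityPack calls are the same ones used above, plus SelectNodes and InnerText.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class XPathDemo
{
    // Hypothetical helper: returns the trimmed text of every node matched by the XPath.
    public static List<string> SelectTexts(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var texts = new List<string>();
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
        {
            return texts;
        }

        foreach (HtmlNode node in nodes)
        {
            texts.Add(node.InnerText.Trim());
        }
        return texts;
    }

    public static void Main()
    {
        // Sample markup made up for this example.
        string html = "<div id='content'><span>a</span><span>b</span></div>";
        foreach (string text in SelectTexts(html, "//*[@id='content']/span"))
        {
            Console.WriteLine(text);
        }
    }
}
```

The null check matters in practice: iterating the result of SelectNodes directly throws a NullReferenceException whenever the expression matches nothing.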


ScrapySharp 2.6.2

A scraping framework containing:
- a web client able to simulate a web browser
- an HtmlAgilityPack extension to select elements using CSS selectors (like jQuery)


PM> Install-Package ScrapySharp -Version 2.6.2

   

// Requires: using ScrapySharp.Extensions; here html is an HtmlNode (e.g. doc.DocumentNode)
html.CssSelect("div");                      // all div elements
html.CssSelect("div.content");              // all div elements with CSS class 'content'
html.CssSelect("div.widget.monthlist");     // all div elements with both CSS classes 'widget' and 'monthlist'
html.CssSelect("#postPaging");              // all HTML elements with the id postPaging
html.CssSelect("div#postPaging.testClass"); // div elements with the id postPaging and CSS class testClass
html.CssSelect("div.content > p.para");     // p elements that are direct children of div elements with CSS class 'content'
html.CssSelect("input[type=text].login");   // text inputs with CSS class login
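
To show where the `html` node in the selector lines comes from, here is a minimal sketch that parses a string with HtmlAgilityPack and then queries it with ScrapySharp's CssSelect extension. The class name CssSelectDemo, the method name SelectTexts, and the sample markup are hypothetical; the sketch assumes both NuGet packages above are installed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;
using ScrapySharp.Extensions; // brings the CssSelect extension method into scope

public static class CssSelectDemo
{
    // Hypothetical helper: returns the trimmed text of every node matched by the CSS selector.
    public static List<string> SelectTexts(string html, string cssSelector)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // CssSelect is called on an HtmlNode, here the document root.
        return doc.DocumentNode
                  .CssSelect(cssSelector)
                  .Select(node => node.InnerText.Trim())
                  .ToList();
    }

    public static void Main()
    {
        // Sample markup made up for this example.
        string html =
            "<div class='content'><span class='para'>one</span><span>two</span></div>" +
            "<span class='para'>three</span>";

        foreach (string text in SelectTexts(html, "div.content > span.para"))
        {
            Console.WriteLine(text);
        }
    }
}
```

Because CssSelect returns a sequence of HtmlNode objects, the results can be mixed freely with the XPath-based HtmlAgilityPack calls shown earlier.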

For more on CSS selector usage, see the W3 page: CSS Selector Reference


