Fetching Web Page Source Efficiently

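If you do not want to depend on the UI-bound WebBrowser control, you can download a page with WebClient, the ready-made wrapper that .NET provides, or send the request yourself with HttpWebRequest. You can also fall back to the xmlhttp COM component. In my test the COM component turned out to be very fast, and it is convenient to use because the page encoding is detected automatically.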
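With WebClient or HttpWebRequest the caller has to choose the encoding. As a minimal sketch of one way to do that, assuming the page declares its charset either in the Content-Type header or in a meta tag, the helper below picks an encoding from that declaration and otherwise uses a fallback. The names CharsetSniffer, Detect and Decode are hypothetical, and this is not a description of how the COM component works internally.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class CharsetSniffer
{
    // Matches charset=NAME in a Content-Type header value or in an HTML <meta> tag.
    private static readonly Regex CharsetPattern =
        new Regex(@"charset\s*=\s*[""']?\s*([A-Za-z0-9_\-]+)", RegexOptions.IgnoreCase);

    // Returns the declared encoding, or the fallback when none is declared or recognised.
    public static Encoding Detect(string contentType, byte[] body, Encoding fallback)
    {
        // 1. Prefer the charset declared in the HTTP Content-Type header.
        Encoding fromHeader = TryParse(contentType);
        if (fromHeader != null) return fromHeader;

        // 2. Otherwise scan the first bytes of the document for a <meta> declaration.
        //    ASCII decoding is enough here because charset names are plain ASCII.
        int prefixLength = Math.Min(body.Length, 2048);
        string head = Encoding.ASCII.GetString(body, 0, prefixLength);
        Encoding fromMeta = TryParse(head);
        return fromMeta ?? fallback;
    }

    // Decodes the raw bytes with the detected encoding, falling back to UTF-8.
    public static string Decode(string contentType, byte[] body)
    {
        return Detect(contentType, body, Encoding.UTF8).GetString(body);
    }

    private static Encoding TryParse(string text)
    {
        if (string.IsNullOrEmpty(text)) return null;
        Match m = CharsetPattern.Match(text);
        if (!m.Success) return null;
        try { return Encoding.GetEncoding(m.Groups[1].Value); }
        catch (ArgumentException) { return null; } // unknown or unsupported charset name
    }
}
```

The raw bytes could come from WebClient.DownloadData and the header value from the ResponseHeaders collection of the same WebClient instance. On runtimes where legacy code pages such as gb2312 are not registered, Encoding.GetEncoding rejects the name and the sketch simply returns the fallback.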

Enough talk, here is the code. The test code below needs a COM reference to Microsoft XML. Any version will do; I used 6.0.

The test code is as follows. The difference is very noticeable.

using System;
using System.IO;
using System.Net;
using System.Text;
using System.Threading;
using MSXML2; // COM reference: Microsoft XML (v6.0 was used here)

static class Program
{
    // Fetches the same URL three ways and prints the elapsed milliseconds for each.
    public static void Test()
    {
        int tick = Environment.TickCount;
        string html = GetHtmlCom("http://www.csdn.net");
        File.WriteAllText("comhtml.txt", html);
        Console.WriteLine("Com : " + (Environment.TickCount - tick).ToString());

        tick = Environment.TickCount;
        html = GetHtmlWebclient("http://www.csdn.net");
        File.WriteAllText("WebClient.txt", html);
        Console.WriteLine("Webclient : " + (Environment.TickCount - tick).ToString());

        tick = Environment.TickCount;
        html = WebFunc.GetHtmlEx("http://www.csdn.net", Encoding.UTF8);
        File.WriteAllText("WebRequest.txt", html);
        Console.WriteLine("WebRequest : " + (Environment.TickCount - tick).ToString());
    }

    // Downloads the page through the xmlhttp COM component.
    public static string GetHtmlCom(string url)
    {
        XMLHTTP xmlhttp = new XMLHTTPClass();
        xmlhttp.open("GET", url, false, null, null); // false = synchronous request
        xmlhttp.send("");
        // readyState 4 means the response is complete.
        while (xmlhttp.readyState != 4) Thread.Sleep(1);
        return xmlhttp.responseText;
    }

    // Downloads the raw bytes with WebClient and decodes them as UTF-8.
    public static string GetHtmlWebclient(string url)
    {
        using (WebClient client = new WebClient())
        {
            return Encoding.UTF8.GetString(client.DownloadData(url));
        }
    }
}

static class WebFunc
{
    private static CookieContainer cookie = new CookieContainer();
    private static string contentType = "application/x-www-form-urlencoded";
    private static string accept = "image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-shockwave-flash, application/x-silverlight, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, application/x-ms-application, application/x-ms-xbap, application/vnd.ms-xpsdocument, application/xaml+xml, application/x-silverlight-2-b1, */*";
    private static string userAgent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022)";

    // Downloads the page with HttpWebRequest and decodes it with the given encoding.
    public static string GetHtmlEx(string url, Encoding encoding)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.UserAgent = userAgent;
        request.ContentType = contentType;
        request.CookieContainer = cookie;
        request.Accept = accept;
        request.Method = "GET";

        using (WebResponse response = request.GetResponse())
        using (Stream responseStream = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(responseStream, encoding))
        {
            return reader.ReadToEnd();
        }
    }
}

 

The three elapsed times printed (in milliseconds) were:

Com : 140
Webclient : 3074
WebRequest : 218
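Each figure above comes from a single call, so one-time start-up costs could account for part of the gap; that has not been checked here. As a hedged sketch of a more repeatable measurement, the helper below runs an action a few times untimed and then averages several timed runs with Stopwatch. The class name Timing and the method AverageMilliseconds are hypothetical and not part of the original test.

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Runs the action `warmups` times without timing it, then returns the
    // average elapsed milliseconds over `runs` timed executions.
    public static double AverageMilliseconds(Action action, int warmups, int runs)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        if (runs <= 0) throw new ArgumentOutOfRangeException(nameof(runs));

        // Warm-up calls absorb one-time initialisation so it is not counted.
        for (int i = 0; i < warmups; i++) action();

        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++) action();
        stopwatch.Stop();

        return stopwatch.Elapsed.TotalMilliseconds / runs;
    }
}
```

It could be called as Timing.AverageMilliseconds(() => GetHtmlCom("http://www.csdn.net"), 1, 5), and likewise for the other two methods, so that all three are measured under the same conditions.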
