Scraping all the images from a website

<div class="cnblogs_Highlighter"> <pre class="brush:csharp;gutter:false;">using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Web.UI;
using System.Web.UI.WebControls;
using System.Net;
using System.IO;
using System.Text;
using Fizzler;
using Fizzler.Systems.HtmlAgilityPack;
using System.Text.RegularExpressions;
</pre> </div> <p>A while back, with time on my hands, I wrote a scraper that collects the photos from the glamour-photo site <a href="http://babe.1626.com">http://babe.1626.com</a>. If the site's owner ever sees this shoddy code of mine, who knows what they'll say.</p> <p>One look at the site's URLs shows that the listing pages are numbered sequentially (http://babe.1626.com/page/1, /page/2, and so on), so they can be visited in a loop. Each listing page holds N articles, and each article is in turn split across several pages.</p> <p>So the scraping approach is a set of nested loops that works from the widest scope down to the narrowest. It visits every listing page, then every article on that page, then every sub-page of that article, and finally collects every image on each sub-page.</p> <div onclick="cnblogs_code_show('e20cf744-6c38-4acd-b2e9-d7a158410bfa')" class="cnblogs_code"> <div class="cnblogs_code_hide" id="cnblogs_code_open_e20cf744-6c38-4acd-b2e9-d7a158410bfa"> <pre><div>protected void Button1_Click(object sender, EventArgs e)
{
    // Walk every listing page, from the first to the last (pages 1 to 167)
    for (int c = 1; c &lt; 168; c++)
    {
        // Listing page URLs are sequential: http://babe.1626.com/page/1, /page/2, ...
        string UrlBegin = "http://babe.1626.com/page/" + c;

        string HtmlText = GetHtmlSource(UrlBegin, Encoding.UTF8);

        // Skip a listing page that could not be fetched and move on to the next one
        if (string.IsNullOrWhiteSpace(HtmlText))
        {
            continue;
        }
        var html = new HtmlAgilityPack.HtmlDocument();
        html.OptionFixNestedTags = true;
        html.LoadHtml(HtmlText);

        var document = html.DocumentNode;

        // Collect the title and URL of every article on the current listing page
        List&lt;string&gt; PageUrlArticleLink = new List&lt;string&gt;();
        List&lt;string&gt; TitleList = new List&lt;string&gt;();

        // Iterate over the title nodes themselves, so no index can run past the end
        foreach (var titleNode in document.QuerySelectorAll(".dialog .post .title"))
        {
            TitleList.Add(titleNode.InnerText);
            // MatchUrl (defined later) extracts the link URL from the title's HTML
            PageUrlArticleLink.Add(MatchUrl(titleNode.InnerHtml));
        }

        // Download the images of each article
        foreach (string articleUrl in PageUrlArticleLink)
        {
            GetPageSizeUrl(articleUrl);
        }
    }
}

/// &lt;summary&gt;
/// Download all the images of one article, including its sub-pages
/// &lt;/summary&gt;
/// &lt;param name="UrlArticleLink"&gt;Article URL&lt;/param&gt;
/// &lt;returns&gt;Always an empty string; the caller ignores it&lt;/returns&gt;
public static string GetPageSizeUrl(string UrlArticleLink)
{
    // Test value: UrlArticleLink = "http://babe.1626.com/pages/29436";
    string HtmlText = GetHtmlSource(UrlArticleLink, Encoding.UTF8);
    if (string.IsNullOrWhiteSpace(HtmlText))
    {
        return "";
    }
    var html = new HtmlAgilityPack.HtmlDocument();
    html.OptionFixNestedTags = true;
    html.LoadHtml(HtmlText);

    var document = html.DocumentNode;

    // The page &lt;title&gt; (the model's name) becomes the file-name prefix
    var titleNode = document.QuerySelectorAll("title").FirstOrDefault();
    string GirlsName = titleNode != null ? titleNode.InnerText.Trim() : "untitled";

    // The article URL itself is the first page
    List&lt;string&gt; UrlPageSize = new List&lt;string&gt;();
    UrlPageSize.Add(UrlArticleLink);

    // If the article is paginated, add each sub-page link to the list.
    // This may not parse correctly: I don't know which pagination control the site uses.
    foreach (var link in document.QuerySelectorAll(".flickr p a"))
    {
        string subPageUrl = MatchUrl(link.OuterHtml);
        if (!UrlPageSize.Contains(subPageUrl))
        {
            UrlPageSize.Add(subPageUrl);
        }
    }

    // Gather the image URLs from every page of the article
    List&lt;string&gt; AllImgUrl = new List&lt;string&gt;();
    foreach (string pageUrl in UrlPageSize)
    {
        AllImgUrl.AddRange(GetImg(pageUrl));
    }

    // The plan was to save the image links to a database first and then download them;
    // how best to store the images is still undecided, so for now they are downloaded directly.
    for (int i = 0; i &lt; AllImgUrl.Count; i++)
    {
        // Timestamp plus index keeps file names unique even within the same tick
        string FileName = GirlsName + DateTime.Now.ToString("yyyyMMddHHmmssffff") + "_" + i;

        try
        {
            // CnShuk.Common.PageValidate.InputText comes from the author's own library (not shown here)
            DownloadOneFileByURLWithWebClient(CnShuk.Common.PageValidate.InputText(FileName), AllImgUrl[i], System.Web.HttpContext.Current.Server.MapPath("img"));
        }
        catch (SystemException)
        {
            // Skip an image that fails to download and carry on with the rest
            continue;
        }
    }
    return "";
}


/// &lt;summary&gt;
/// Download one image to disk
/// &lt;/summary&gt;
/// &lt;param name="fileName"&gt;File name, without extension&lt;/param&gt;
/// &lt;param name="url"&gt;Image URL&lt;/param&gt;
/// &lt;param name="localPath"&gt;Directory to save into&lt;/param&gt;
public static void DownloadOneFileByURLWithWebClient(string fileName, string url, string localPath)
{
    // Replace every character that is not allowed in a file name (':' included)
    foreach (char invalid in Path.GetInvalidFileNameChars())
    {
        fileName = fileName.Replace(invalid, '_');
    }

    // No-op if the directory already exists
    Directory.CreateDirectory(localPath);

    // Save into localPath as a .jpg file, replacing any file of the same name
    string fullPath = Path.Combine(localPath, fileName + ".jpg");
    if (File.Exists(fullPath))
    {
        File.Delete(fullPath);
    }

    // WebClient is IDisposable; any download error propagates to the caller
    using (WebClient wc = new WebClient())
    {
        wc.DownloadFile(url, fullPath);
    }
}

/// &lt;summary&gt;
/// Get the image URLs on one page
/// &lt;/summary&gt;
/// &lt;param name="Url"&gt;URL of one page (may be the article's second page or later)&lt;/param&gt;
/// &lt;returns&gt;List of image URLs&lt;/returns&gt;
public static List&lt;string&gt; GetImg(string Url)
{
    List&lt;string&gt; ImgUrl = new List&lt;string&gt;();
    string HtmlText = GetHtmlSource(Url, Encoding.UTF8);

    if (string.IsNullOrWhiteSpace(HtmlText))
    {
        return ImgUrl;
    }
    var html = new HtmlAgilityPack.HtmlDocument();
    html.OptionFixNestedTags = true;
    html.LoadHtml(HtmlText);
    var document = html.DocumentNode;

    // Start extracting the images
    var PageCount = document.QuerySelectorAll(".content img").Count();
<span
style="color: #000000;"> </span><span style="color: #0000ff;">for</span><span style="color: #000000;"> (</span><span style="color: #0000ff;">int</span><span style="color: #000000;"> i </span><span style="color: #000000;">=</span><span style="color: #000000;"> </span><span style="color: #800080;">0</span><span style="color: #000000;">; i </span><span style="color: #000000;">&lt;</span><span style="color: #000000;"> PageCount; i</span><span style="color: #000000;">++</span><span style="color: #000000;">)<br /></span><span style="color: #008080;">163</span> <span style="color: #000000;"> {<br /></span><span style="color: #008080;">164</span> <span style="color: #000000;"> var ImgUrlLink </span><span style="color: #000000;">=</span><span style="color: #000000;"> document.QuerySelectorAll(</span><span style="color: #800000;">"</span><span style="color: #800000;">.content img</span><span style="color: #800000;">"</span><span style="color: #000000;">).ToArray()[i].OuterHtml;<br /></span><span style="color: #008080;">165</span> <span style="color: #000000;"><br /></span><span style="color: #008080;">166</span> <span style="color: #000000;"> Match m </span><span style="color: #000000;">=</span><span style="color: #000000;"> Regex.Match(ImgUrlLink, </span><span style="color: #800000;">@"</span><span style="color: #800000;">http://([w-]+.)+[w-]+(/[w- ./?%&amp;=]*)?</span><span style="color: #800000;">"</span><span style="color: #000000;">);<br /></span><span style="color: #008080;">167</span> <span style="color: #000000;"> </span><span style="color: #0000ff;">if</span><span style="color: #000000;"> (m.Success)<br /></span><span style="color: #008080;">168</span> <span style="color: #000000;"> {<br /></span><span style="color: #008080;">169</span> <span style="color: #000000;"> </span><span style="color: #0000ff;">if</span><span style="color: #000000;"> (m.ToString() </span><span style="color: #000000;">!=</span><span style="color: #000000;"> </span><span style="color: 
#800000;">"</span><span style="color: #800000;">http://babe.1626.com/wp-content/themes/elegant-box/images/transparent.gif</span><span style="color: #800000;">"</span><span style="color: #000000;">)<br /></span><span style="color: #008080;">170</span> <span style="color: #000000;"> {<br /></span><span style="color: #008080;">171</span> <span style="color: #000000;"> ImgUrl.Add(m.ToString());<br /></span><span style="color: #008080;">172</span> <span style="color: #000000;"> }<br /></span><span style="color: #008080;">173</span> <span style="color: #000000;"> }<br /></span><span style="color: #008080;">174</span> <span style="color: #000000;"> }<br /></span><span style="color: #008080;">175</span> <span style="color: #000000;"> </span><span style="color: #0000ff;">return</span><span style="color: #000000;"> ImgUrl;<br /></span><span style="color: #008080;">176</span> <span style="color: #000000;"> }<br /></span><span style="color: #008080;">177</span> <span style="color: #000000;"><br /></span><span style="color: #008080;">178</span> <span style="color: #000000;"> </span><span style="color: #808080;">///</span><span style="color: #008000;"> </span><span style="color: #808080;">&lt;summary&gt;</span><span style="color: #008000;"><br /></span><span style="color: #008080;">179</span> <span style="color: #008000;"> </span><span style="color: #808080;">///</span><span style="color: #008000;"> 验证提取Url<br /></span><span style="color: #008080;">180</span> <span style="color: #008000;"> </span><span style="color: #808080;">///</span><span style="color: #008000;"> </span><span style="color: #808080;">&lt;/summary&gt;</span><span style="color: #008000;"><br /></span><span style="color: #008080;">181</span> <span style="color: #008000;"> </span><span style="color: #808080;">///</span><span style="color: #008000;"> </span><span style="color: #808080;">&lt;param name="Url"&gt;</span><span style="color: #008000;">含有Url的字符串</span><span style="color: 
#808080;">&lt;/param&gt;</span><span style="color: #008000;"><br /></span><span style="color: #008080;">182</span> <span style="color: #008000;"> </span><span style="color: #808080;">///</span><span style="color: #008000;"> </span><span style="color: #808080;">&lt;returns&gt;</span><span style="color: #008000;">返回Url</span><span style="color: #808080;">&lt;/returns&gt;</span><span style="color: #808080;"><br /></span><span style="color: #008080;">183</span> <span style="color: #808080;"></span><span style="color: #000000;"> </span><span style="color: #0000ff;">public</span><span style="color: #000000;"> </span><span style="color: #0000ff;">static</span><span style="color: #000000;"> </span><span style="color: #0000ff;">string</span><span style="color: #000000;"> MatchUrl(</span><span style="color: #0000ff;">string</span><span style="color: #000000;"> Url)<br /></span><span style="color: #008080;">184</span> <span style="color: #000000;"> {<br /></span><span style="color: #008080;">185</span> <span style="color: #000000;"> Match m </span><span style="color: #000000;">=</span><span style="color: #000000;"> Regex.Match(Url, </span><span style="color: #800000;">@"</span><span style="color: #800000;">http://([w-]+.)+[w-]+(/[w- ./?%&amp;=]*)?</span><span style="color: #800000;">"</span><span style="color: #000000;">);<br /></span><span style="color: #008080;">186</span> <span style="color: #000000;"><br /></span><span style="color: #008080;">187</span> <span style="color: #000000;"> </span><span style="color: #0000ff;">if</span><span style="color: #000000;"> (m.Success)<br /></span><span style="color: #008080;">188</span> <span style="color: #000000;"> {<br /></span><span style="color: #008080;">189</span> <span style="color: #000000;"> </span><span style="color: #0000ff;">return</span><span style="color: #000000;"> m.ToString();<br /></span><span style="color: #008080;">190</span> <span style="color: #000000;"> }<br /></span><span style="color: #008080;">191</span> 
<span style="color: #000000;"> </span><span style="color: #0000ff;">else</span><span style="color: #000000;"><br /></span><span style="color: #008080;">192</span> <span style="color: #000000;"> {<br /></span><span style="color: #008080;">193</span> <span style="color: #000000;"> </span><span style="color: #0000ff;">return</span><span style="color: #000000;"> </span><span style="color: #800000;">""</span><span style="color: #000000;">;<br /></span><span style="color: #008080;">194</span> <span style="color: #000000;"> }<br /></span><span style="color: #008080;">195</span> <span style="color: #000000;"> }<br /></span><span style="color: #008080;">196</span> <span style="color: #000000;"><br /></span><span style="color: #008080;">197</span> <span style="color: #000000;"> </span><span style="color: #808080;">///</span><span style="color: #008000;"> </span><span style="color: #808080;">&lt;summary&gt;</span><span style="color: #008000;"><br /></span><span style="color: #008080;">198</span> <span style="color: #008000;"> </span><span style="color: #808080;">///</span><span style="color: #008000;"> 得到html<br /></span><span style="color: #008080;">199</span> <span style="color: #008000;"> </span><span style="color: #808080;">///</span><span style="color: #008000;"> </span><span style="color: #808080;">&lt;/summary&gt;</span><span style="color: #008000;"><br /></span><span style="color: #008080;">200</span> <span style="color: #008000;"> </span><span style="color: #808080;">///</span><span style="color: #008000;"> </span><span style="color: #808080;">&lt;param name="url"&gt;&lt;/param&gt;</span><span style="color: #008000;"><br /></span><span style="color: #008080;">201</span> <span style="color: #008000;"> </span><span style="color: #808080;">///</span><span style="color: #008000;"> </span><span style="color: #808080;">&lt;param name="charset"&gt;&lt;/param&gt;</span><span style="color: #008000;"><br /></span><span style="color: #008080;">202</span> <span style="color: 
#008000;"> </span><span style="color: #808080;">///</span><span style="color: #008000;"> </span><span style="color: #808080;">&lt;returns&gt;&lt;/returns&gt;</span><span style="color: #808080;"><br /></span><span style="color: #008080;">203</span> <span style="color: #808080;"></span><span style="color: #000000;"> </span><span style="color: #0000ff;">public</span><span style="color: #000000;"> </span><span style="color: #0000ff;">static</span><span style="color: #000000;"> </span><span style="color: #0000ff;">string</span><span style="color: #000000;"> GetHtmlSource(</span><span style="color: #0000ff;">string</span><span style="color: #000000;"> url, Encoding charset)<br /></span><span style="color: #008080;">204</span> <span style="color: #000000;"> {<br /></span><span style="color: #008080;">205</span> <span style="color: #000000;"> </span><span style="color: #008000;">//</span><span style="color: #008000;">处理内容 </span><span style="color: #008000;"><br /></span><span style="color: #008080;">206</span> <span style="color: #008000;"></span><span style="color: #000000;"> </span><span style="color: #0000ff;">string</span><span style="color: #000000;"> html </span><span style="color: #000000;">=</span><span style="color: #000000;"> </span><span style="color: #800000;">""</span><span style="color: #000000;">;<br /></span><span style="color: #008080;">207</span> <span style="color: #000000;"> </span><span style="color: #0000ff;">try</span><span style="color: #000000;"><br /></span><span style="color: #008080;">208</span> <span style="color: #000000;"> {<br /></span><span style="color: #008080;">209</span> <span style="color: #000000;"> HttpWebRequest request </span><span style="color: #000000;">=</span><span style="color: #000000;"> (HttpWebRequest)WebRequest.Create(url);<br /></span><span style="color: #008080;">210</span> <span style="color: #000000;"> HttpWebResponse response </span><span style="color: #000000;">=</span><span style="color: #000000;"> 
(HttpWebResponse)request.GetResponse();<br /></span><span style="color: #008080;">211</span> <span style="color: #000000;"> Stream stream </span><span style="color: #000000;">=</span><span style="color: #000000;"> response.GetResponseStream();<br /></span><span style="color: #008080;">212</span> <span style="color: #000000;"> StreamReader reader </span><span style="color: #000000;">=</span><span style="color: #000000;"> </span><span style="color: #0000ff;">new</span><span style="color: #000000;"> StreamReader(stream, charset);<br /></span><span style="color: #008080;">213</span> <span style="color: #000000;"> html </span><span style="color: #000000;">=</span><span style="color: #000000;"> reader.ReadToEnd();<br /></span><span style="color: #008080;">214</span> <span style="color: #000000;"> stream.Close();<br /></span><span style="color: #008080;">215</span> <span style="color: #000000;"> }<br /></span><span style="color: #008080;">216</span> <span style="color: #000000;"> </span><span style="color: #0000ff;">catch</span><span style="color: #000000;"> (Exception e)<br /></span><span style="color: #008080;">217</span> <span style="color: #000000;"> {<br /></span><span style="color: #008080;">218</span> <span style="color: #000000;"> }<br /></span><span style="color: #008080;">219</span> <span style="color: #000000;"> </span><span style="color: #0000ff;">return</span><span style="color: #000000;"> html;<br /></span><span style="color: #008080;">220</span> <span style="color: #000000;"> }<br /></span><span style="color: #008080;">221</span> <span style="color: #000000;"> }<br /></span><span style="color: #008080;">222</span> <span style="color: #000000;">}</span></div></pre> </div> </div> <p>里面也没啥技术含量。只是自己采集这玩。我采集了3G的美女图片。只是程序不稳定。也懒得优化了。只是这里面用到了2比较不错的东西。Fizzler和HtmlAgilityPack国内关于这资料也有。第二个的资料国外的比较多。很简单的。这个.其实这个还可以采集到后。直接插入数据库。然后导入到正式数据库里面。我比较懒。只要图片。代码烂。自己看吧.这代码下载的附件咋鸡巴上传呢。长时间没写。不知道咋搞了。这代码自己调调吧。粘贴上去就对了</p>
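<p>In GetImg, QuerySelectorAll(".content img") is evaluated again on every pass through the loop, and each URL is then regex-matched out of the tag's OuterHtml. A minimal sketch of an alternative follows. It assumes the same ".content img" markup and the same HtmlAgilityPack and Fizzler references as above, runs the selector once, reads each src attribute directly, and resolves relative paths against the page URL. ImgExtractor and ExtractImageUrls are hypothetical names, not part of the original code, and the method takes HTML text rather than a URL so it can be tried without network access.</p>
<div class="cnblogs_Highlighter"> <pre class="brush:csharp;gutter:false;">
using System;
using System.Collections.Generic;
using HtmlAgilityPack;
using Fizzler.Systems.HtmlAgilityPack;

public static class ImgExtractor
{
    // Placeholder image that the original GetImg also skips
    const string Placeholder =
        "http://babe.1626.com/wp-content/themes/elegant-box/images/transparent.gif";

    // Returns the absolute URLs of the images under ".content", in document
    // order, without duplicates and without the placeholder image.
    public static List&lt;string&gt; ExtractImageUrls(string htmlText, Uri pageUrl)
    {
        var result = new List&lt;string&gt;();
        if (string.IsNullOrWhiteSpace(htmlText)) return result;

        var doc = new HtmlDocument();
        doc.OptionFixNestedTags = true;
        doc.LoadHtml(htmlText);

        // Evaluate the CSS selector once instead of once per loop iteration
        foreach (HtmlNode img in doc.DocumentNode.QuerySelectorAll(".content img"))
        {
            // Read src directly instead of regex-matching OuterHtml;
            // DeEntitize turns an encoded "&amp;amp;" back into "&amp;"
            string src = HtmlEntity.DeEntitize(img.GetAttributeValue("src", ""));
            if (src.Length == 0) continue;

            // Resolve relative paths such as "/uploads/a.jpg" against the page URL
            Uri absolute;
            if (!Uri.TryCreate(pageUrl, src, out absolute)) continue;

            string url = absolute.AbsoluteUri;
            if (url != Placeholder &amp;&amp; !result.Contains(url)) result.Add(url);
        }
        return result;
    }
}
</pre> </div>
<p>To use it in place of the loop in GetImg, pass it the result of GetHtmlSource(Url, Encoding.UTF8) together with new Uri(Url).</p>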

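<p>The program is described above as not very stable. One likely contributor visible in the code is its error handling: the download method's catch (SystemException) { throw; } rethrows every failure, so unless the caller catches it, one broken image link ends the whole crawl, while GetHtmlSource swallows errors silently. A minimal sketch of more forgiving helpers follows, assuming the same .NET Framework HttpWebRequest and WebClient classes the original uses. SafeFetch, TrySaveImage, LocalFileName, the file-naming scheme and the 15-second timeout are all hypothetical choices, not the author's code.</p>
<div class="cnblogs_Highlighter"> <pre class="brush:csharp;gutter:false;">
using System;
using System.IO;
using System.Net;
using System.Text;

public static class SafeFetch
{
    // Assumed request timeout in milliseconds; tune it for the target site
    const int TimeoutMs = 15000;

    // Like the original GetHtmlSource, returns "" on failure, but disposes the
    // response via using blocks and logs the error instead of hiding it.
    public static string GetHtmlSource(string url, Encoding charset)
    {
        try
        {
            var request = (HttpWebRequest)WebRequest.Create(url);
            request.Timeout = TimeoutMs;
            using (var response = (HttpWebResponse)request.GetResponse())
            using (var reader = new StreamReader(response.GetResponseStream(), charset))
            {
                return reader.ReadToEnd();
            }
        }
        catch (Exception ex) // broad on purpose: bad URL, network or read error
        {
            Console.Error.WriteLine("GET " + url + " failed: " + ex.Message);
            return "";
        }
    }

    // Hypothetical naming scheme: keep the last path segment of the image URL,
    // falling back to a random .jpg name when the URL path has none.
    public static string LocalFileName(string imageUrl)
    {
        string name = Path.GetFileName(new Uri(imageUrl).LocalPath);
        return string.IsNullOrEmpty(name) ? Guid.NewGuid().ToString("N") + ".jpg" : name;
    }

    // Saves one image into localDir. A failure is logged and reported as false,
    // so the caller can skip the image and keep crawling.
    public static bool TrySaveImage(string imageUrl, string localDir)
    {
        try
        {
            Directory.CreateDirectory(localDir); // does nothing if it already exists
            string target = Path.Combine(localDir, LocalFileName(imageUrl));
            using (var wc = new WebClient())
            {
                wc.DownloadFile(imageUrl, target);
            }
            return true;
        }
        catch (Exception ex) // broad on purpose: bad URL, network or disk error
        {
            Console.Error.WriteLine("Download " + imageUrl + " failed: " + ex.Message);
            return false;
        }
    }
}
</pre> </div>
<p>In the crawl loop, call TrySaveImage for each URL that GetImg returns and simply move on when it returns false. Naming files by the last URL segment can overwrite images that share a name in different upload folders, so adding a page or article number to the name may be worthwhile. On .NET 6 and later, WebRequest and WebClient are marked obsolete in favor of HttpClient, although they still work.</p>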