C#超简单爬虫demo

运用正则表达式匹配链接,实现爬取煎蛋网的图片。代码很短,新手值得一试。

不说废话了,直接上图。

C#超简单爬虫demo_第1张图片
C#超简单爬虫demo_第2张图片

using System;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;
namespace Crawler
{
    class Program
    {
        static void Main(string[] args)
        {
            string url = "http://jandan.net/top-zoo";
            string path = @"D:\Picture\";
            HttpWebRequest webRequest = WebRequest.CreateHttp(url);
            webRequest.Method = "GET";
            webRequest.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ";
            var webResponse = webRequest.GetResponse();
            StreamReader streamReader = new StreamReader(webResponse.GetResponseStream(), Encoding.UTF8);
            string str = streamReader.ReadToEnd();
            streamReader.Close();
            if (string.IsNullOrEmpty(str))
            {
                Console.WriteLine("————————-错误—————————");
                Console.ReadKey();
            }
            Regex regex = new Regex("(.*?(?:\\.(?:png|jpg|gif))))['|\"]");
            MatchCollection match = regex.Matches(str);
            WebClient client = new WebClient();
            int name = 0;
            try
            {
                foreach (Match match1 in match)
                {
                    string src = match1.Groups["Collect"].Value;
                    src = "http:"+src;
                    name++;
                    client.DownloadFile(src,path+name+".jpg");
                    Console.WriteLine("\n正在爬取———————" + "|" +src);                  
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine("-------------" + ex);
            }
            Console.ReadKey();
        }

    }
}


希望能给有需要的人一些启示和帮助。

你可能感兴趣的:(爬虫,正则表达式,c#,regex)