Scraping HTML Page Data with JsHttpClient on .NET Core (.NET Core Crawler)

Debug environment: ASP.NET Core Web API
Target framework: .NET Core 2.2
Development tool: Visual Studio 2017
Author: 成长的小猪 (Jason Song)

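While building our startup we often lacked basic data, so we wrote crawlers to collect it from the relevant web pages. On .NET Framework, the HttpHelper class made scraping pages very convenient. Technology keeps iterating, though, and the cross-platform .NET Core is excellent; we need to scrape web pages there as well. Today I would like to introduce a small library I wrote recently, and I hope it helps you too.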

JsHttpClient is a simple and flexible HTML page crawling client library for .NET Core.

Installation option 1: Tools => NuGet Package Manager => Package Manager Console

Enter the following command in the console:

PM> Install-Package JsHttpClient

Installation option 2: Tools => NuGet Package Manager => Manage NuGet Packages for Solution

 

Quick start

First, register the JsHttpClient client service in ConfigureServices(IServiceCollection services).

// Startup.cs
// Article source: http://blog.csdn.net/jasonsong2008
public void ConfigureServices(IServiceCollection services)
{
    // Add JsHttpClient client services
    // Add by Jason.Song(成长的小猪) on 2019/04/23
    services.AddJsHttpClient();
    // Automatic redirects are allowed by default (AllowAutoRedirect = true).
    // To disable them, register the service like this instead:
    // services.AddJsHttpClient(new JsHttpClientOptions{ AllowAutoRedirect = false });
            
    services.AddMvc().SetCompatibilityVersion(CompatibilityVersion.Version_2_2);
}

Usage example
 

using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;
using JasonSoft.Net.JsHttpClient.Http;

namespace JsHttpClient.WebApi.Controllers
{
    /// <summary>
    /// Add by Jason.Song(成长的小猪) on 2019/04/24
    /// http://blog.csdn.net/jasonsong2008
    /// </summary>
    [Route("api/[controller]")]
    [ApiController]
    public class TestController : ControllerBase
    {
        private readonly IJsHttpClient _client;

        /// <summary>
        /// Constructor
        /// Add by Jason.Song(成长的小猪) on 2019/04/24
        /// </summary>
        /// <param name="client">The injected IJsHttpClient instance</param>
        public TestController(IJsHttpClient client)
        {
            _client = client;
        }

        /// <summary>
        /// Asynchronous request test
        /// Add by Jason.Song(成长的小猪) on 2019/04/24
        /// http://blog.csdn.net/jasonsong2008
        /// </summary>
        /// <returns></returns>
        [HttpGet("HttpAsync")]
        public async Task<IActionResult> HttpAsync()
        {
            const string urlString = "https://blog.csdn.net/jasonsong2008";
            var request = new JsHttpRequest {Uri = urlString};
            //request.Method = HttpMethod.Get;
            //request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*";
            //request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36";
            //request.Referer = "https://blog.csdn.net/";
            //request.Host = "blog.csdn.net";
            //request.Cookie = "";
            //request.Timeout = 30;
            //request.Add("Upgrade-Insecure-Requests", "1");

            var response = await _client.SendAsync(request);
            //response.Cookie
            //response.ResultByte
            return Content(response.Html, "text/html; charset=utf-8");
        }

        /// <summary>
        /// Synchronous request test
        /// Add by Jason.Song(成长的小猪) on 2019/04/24
        /// http://blog.csdn.net/jasonsong2008
        /// </summary>
        /// <returns></returns>
        [HttpGet("HttpSync")]
        public IActionResult HttpSync()
        {
            const string urlString = "https://blog.csdn.net/jasonsong2008";
            var request = new JsHttpRequest {Uri = urlString};
            //request.Method = HttpMethod.Get;
            //request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*";
            //request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36";
            //request.Referer = "https://blog.csdn.net/";
            //request.Host = "blog.csdn.net";
            //request.Cookie = "";
            //request.Timeout = 30;
            //request.Add("Upgrade-Insecure-Requests", "1");

            var response = _client.Send(request);
            //response.Cookie
            //response.ResultByte
            return Content(response.Html, "text/html; charset=utf-8");
        }
    }
}
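
Once a page has been fetched, the next step is usually to pull specific data out of the returned HTML. The following is only a minimal sketch of that step, not part of JsHttpClient: the HtmlTitleExtractor class and its ExtractTitle method are hypothetical names introduced here for illustration, and they rely only on the standard Regex and WebUtility classes. A regular expression is enough for a simple case like the page title; for anything more structured, a real HTML parser would likely be a better fit.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; it is not part of JsHttpClient.
public static class HtmlTitleExtractor
{
    // Matches the first <title>...</title> pair, ignoring case and spanning line breaks.
    private static readonly Regex TitlePattern = new Regex(
        @"<title[^>]*>(?<text>.*?)</title>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the HTML-decoded, trimmed title text,
    // or null when the input is empty or has no <title> element.
    public static string ExtractTitle(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return null;
        }

        var match = TitlePattern.Match(html);
        if (!match.Success)
        {
            return null;
        }

        return WebUtility.HtmlDecode(match.Groups["text"].Value).Trim();
    }
}
```

Inside either controller action above, this could be called as HtmlTitleExtractor.ExtractTitle(response.Html), assuming response.Html holds the page markup as in the example.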

 

For more of my original articles, click here.
