Python3 Crawler: Solving the Corporate Proxy Problem

Preamble

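I have not reinvented a wheel in a while, so on a whim I decided to solve the proxy problem I hit as soon as I started writing crawlers at the company.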

Main text

If there is no proxy problem, the following code is enough to fetch a page's HTML source:

import urllib
import urllib.request
from bs4 import BeautifulSoup

url = "http://wintersmilesb101.online/"

user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
# Build the request with a browser-like User-Agent header
req = urllib.request.Request(url, headers={
    'User-Agent': user_agent
})
response = urllib.request.urlopen(req)
# Read the body from the response object and decode it as utf-8
content = response.read().decode('utf-8')
print(content)

Run:

The error message:

"C:\Program Files (x86)\Anaconda3\python.exe" D:/Alvin/PersonalProjects/Python/Spider/WinterSmileSB101Blog/main_error.py
Traceback (most recent call last):
  File "D:/Alvin/PersonalProjects/Python/Spider/WinterSmileSB101Blog/main_error.py", line 12, in <module>
    response = urllib.request.urlopen(req)
  File "C:\Program Files (x86)\Anaconda3\lib\urllib\request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Program Files (x86)\Anaconda3\lib\urllib\request.py", line 532, in open
    response = meth(req, response)
  File "C:\Program Files (x86)\Anaconda3\lib\urllib\request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Program Files (x86)\Anaconda3\lib\urllib\request.py", line 570, in error
    return self._call_chain(*args)
  File "C:\Program Files (x86)\Anaconda3\lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "C:\Program Files (x86)\Anaconda3\lib\urllib\request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 407: Proxy Authentication Required ( Forefront TMG requires authorization to fulfill the request. Access to the Web Proxy filter is denied.  )
Process finished with exit code 1

From the message urllib.error.HTTPError: HTTP Error 407: Proxy Authentication Required ( Forefront TMG requires authorization to fulfill the request. Access to the Web Proxy filter is denied. )

it is clear that we need to set up the proxy.
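To make this failure easier to spot in a script, the 407 case can be caught explicitly instead of letting the traceback end the program. The following is only a minimal sketch: the helper names explain_http_error and fetch are made up for illustration and are not part of urllib.

```python
import urllib.error
import urllib.request


def explain_http_error(err):
    """Return a short hint for an HTTPError raised by urllib."""
    if err.code == 407:
        # 407 comes from the proxy, not from the target site
        return "407: the proxy wants authentication or a different proxy setting"
    return "{}: {}".format(err.code, err.reason)


def fetch(url, timeout=10):
    """Fetch a page and return its text, or None after printing a hint."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode('utf-8')
    except urllib.error.HTTPError as err:
        print(explain_http_error(err))
        return None
```

Calling fetch(url) behind the company proxy would then print the 407 hint and return None rather than crash.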

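The proxy is configured through ProxyHandler in urllib.request (imported as req in the code that follows).

proxy = req.ProxyHandler({'https': 's1firewall:8080'}) is how the company proxy is set up. The key is the connection type, http or https; I tried http and it did not work, so https is used here. The value is the proxy's address and port.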

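Some proxies also require a username and password, in which case the setting looks like the line below. The company proxy here does not need this, and setting it this way actually prevents connecting to the proxy server:

proxy = req.ProxyHandler({'http': r'http://username:password@url:port'})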
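For proxies that do need credentials, the username:password@host:port form can be assembled with a small helper. This is a sketch under assumptions: build_proxy_url is a hypothetical helper, and the host, username and password are placeholder values rather than real settings. The credentials are percent-encoded so that characters such as '@' or ':' in a password do not break the URL.

```python
from urllib.parse import quote
import urllib.request


def build_proxy_url(host, port, username, password, scheme='http'):
    """Build a proxy URL of the form scheme://username:password@host:port."""
    # Percent-encode the credentials so '@' or ':' cannot split the URL
    user = quote(username, safe='')
    pwd = quote(password, safe='')
    return '{}://{}:{}@{}:{}'.format(scheme, user, pwd, host, port)


# Placeholder values, for illustration only
proxy_url = build_proxy_url('proxy.example.com', 8080, 'alice', 'p@ss:word')
proxy = urllib.request.ProxyHandler({'http': proxy_url})
opener = urllib.request.build_opener(proxy)
```

Nothing is sent over the network here; the opener is only used once opener.open(...) is called.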

The complete configuration code is as follows:

After setting the proxy

import urllib
import urllib.request as req
from bs4 import BeautifulSoup

url = "http://wintersmilesb101.online/"

user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
# Set the proxy address; http does not work, so use https
proxy = req.ProxyHandler({
    'https': 's1firewall:8080'})
auth = req.HTTPBasicAuthHandler()
# Build the opener
opener = req.build_opener(proxy, auth, req.HTTPHandler)
# Add the header
opener.addheaders = [('User-Agent', user_agent)]
# Install the opener globally
req.install_opener(opener)
# Open the URL
conn = req.urlopen(url)
# Read the page content, decoded as utf-8
content = conn.read().decode('utf-8')
# Print it
print(content)
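If the proxy should not be installed globally, the same setup can be wrapped in a function that returns an opener, and the opener can be used directly. This is a sketch rather than the code used above: build_proxy_opener and its schemes parameter are hypothetical names, and the proxy address and User-Agent are the ones from this post.

```python
import urllib.request


def build_proxy_opener(proxy_address, user_agent, schemes=('https',)):
    """Return an opener that uses proxy_address for the given URL schemes."""
    proxy = urllib.request.ProxyHandler(
        {scheme: proxy_address for scheme in schemes})
    opener = urllib.request.build_opener(proxy)
    # Every request made through this opener carries the header
    opener.addheaders = [('User-Agent', user_agent)]
    return opener


opener = build_proxy_opener(
    's1firewall:8080',
    'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)')
# To fetch a page through it (needs network access):
# content = opener.open("http://wintersmilesb101.online/", timeout=10).read().decode('utf-8')
```

Because install_opener is not called, other code in the same process that uses urllib.request.urlopen keeps its default behavior.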

Run:

Final output

"C:\Program Files (x86)\Anaconda3\python.exe" D:/Alvin/PersonalProjects/Python/Spider/WinterSmileSB101Blog/main.py



<html class="theme-next mist use-motion" lang="zh-Hans,zh-hk,en,fr-FR,ru,de,ja,id,ko,default">
<head>
  <meta charset="UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1"/>
<meta http-equiv="Cache-Control" content="no-transform" />
<meta http-equiv="Cache-Control" content="no-siteapp" />


  <link href="/lib/fancybox/source/jquery.fancybox.css?v=2.1.5" rel="stylesheet" type="text/css" />































    <link href="//fonts.googleapis.com/css?family=Monda:300,300italic,400,400italic,700,700italic|Roboto Slab:300,300italic,400,400italic,700,700italic|Lobster Two:300,300italic,400,400italic,700,700italic|PT Mono:300,300italic,400,400italic,700,700italic&subset=latin,latin-ext" rel="stylesheet" type="text/css">


<link href="/lib/font-awesome/css/font-awesome.min.css?v=4.6.2" rel="stylesheet" type="text/css" />
<link href="/css/main.css?v=5.1.0" rel="stylesheet" type="text/css" />

  <meta name="keywords" content="Android,JAVA,Unity3D,C#,javaScript,开发者,程序猿,极客,编程,开源,IT网站,Developer,Programmer,Coder,Geek,html,用户体验" />
  <link rel="alternate" href="http://blog.csdn.net/qq_21265915/rss/list" title="WinterSmileSB101 的个人房间" type="application/atom+xml" />

  <link rel="shortcut icon" type="image/x-icon" href="/images/myHeadImg.jpeg?v=5.1.0" />

<meta name="description" content="成都工业学院14级,学习了各种后台技能,对前端也甚是抱有好感,准备再入坑前端。">
<meta property="og:type" content="website">
<meta property="og:title" content="WinterSmileSB101 的个人房间">
<meta property="og:url" content="http://WinterSmileSB101.online/index.html">
<meta property="og:site_name" content="WinterSmileSB101 的个人房间">
<meta property="og:description" content="成都工业学院14级,学习了各种后台技能,对前端也甚是抱有好感,准备再入坑前端。">
<meta name="twitter:card" content="summary">
<meta name="twitter:title" content="WinterSmileSB101 的个人房间">
<meta name="twitter:description" content="成都工业学院14级,学习了各种后台技能,对前端也甚是抱有好感,准备再入坑前端。">
<script type="text/javascript" id="hexo.configurations">
  var NexT = window.NexT || {};
  var CONFIG = {
    root: '/',
    scheme: 'Mist',
    sidebar: {
     "position":"right","display":"post","offset":12,"offset_float":0,"b2t":true,"scrollpercent":true},
    fancybox: true,
    motion: true,
    duoshuo: {
      userId: '6376853978663093000',
      author: 'WinterSmileSB101'
    },
    algolia: {
      applicationID: '',
      apiKey: '',
      indexName: '',
      hits: {
     "per_page":10},
      labels: {
     "input_placeholder":"Search for Posts","hits_empty":"We didn't find any results for the search: ${query}","hits_stats":"${hits} results found in ${time} ms"}
    }
  };
</script>
  <link rel="canonical" href="http://WinterSmileSB101.online/"/>
  <title> WinterSmileSB101 的个人房间 </title>
</head>
<body itemscope itemtype="http://schema.org/WebPage" lang="zh-Hans">






  <div class="container sidebar-position-right
   page-home
 ">
    <div class="headband">div>
    <header id="header" class="header" itemscope itemtype="http://schema.org/WPHeader">
      <div class="header-inner"><div class="site-brand-wrapper">
  <div class="site-meta ">

    <div class="custom-logo-site-title">
      <a href="/"  class="brand" rel="start">
        <span class="logo-line-before"><i>i>span>
        <span class="site-title">WinterSmileSB101 的个人房间span>
        <span class="logo-line-after"><i>i>span>
      a>
    div>

        <h1 class="site-subtitle" itemprop="description">胆小认生,不易相处h1>

  div>
  <div class="site-nav-toggle">
    <button>
      <span class="btn-bar">span>
      <span class="btn-bar">span>
      <span class="btn-bar">span>
    button>
  div>
div>
<nav class="site-nav">


    <ul id="menu" class="menu">


        <li class="menu-item menu-item-home">
          <a href="/" rel="section">

              <i class="menu-item-icon fa fa-fw fa-home">i> <br />

            首页
          a>
        li>


        <li class="menu-item menu-item-categories">
          <a href="/categories" rel="section">

              <i class="menu-item-icon fa fa-fw fa-th">i> <br />

            分类
          a>
        li>


        <li class="menu-item menu-item-about">
          <a href="/about" rel="section">

              <i class="menu-item-icon fa fa-fw fa-user">i> <br />

            关于
          a>
        li>


        <li class="menu-item menu-item-archives">
          <a href="/archives" rel="section">

              <i class="menu-item-icon fa fa-fw fa-archive">i> <br />

            归档
          a>
        li>


        <li class="menu-item menu-item-tags">
          <a href="/tags" rel="section">

              <i class="menu-item-icon fa fa-fw fa-tags">i> <br />

            标签
          a>
        li>


        <li class="menu-item menu-item-commonweal">
          <a href="/404.html" rel="section">

              <i class="menu-item-icon fa fa-fw fa-heartbeat">i> <br />

            公益404
          a>
        li>


        <li class="menu-item menu-item-search">

            <a href="javascript:;" class="popup-trigger">


              <i class="menu-item-icon fa fa-search fa-fw">i> <br />

            搜索
          a>
        li>

    ul>


    <div class="site-search">

  <div class="popup search-popup local-search-popup">
  <div class="local-search-header clearfix">
    <span class="search-icon">
      <i class="fa fa-search">i>
    span>
    <span class="popup-btn-close">
      <i class="fa fa-times-circle">i>
    span>
    <div class="local-search-input-wrapper">
      <input autocapitalize="off" autocomplete="off" autocorrect="off"
             placeholder="搜索..." spellcheck="false"
             type="text" id="local-search-input">
    div>
  div>
  <div id="local-search-result">div>
div>
    div>

nav>
 div>
    header>
    <main id="main" class="main">
      <div class="main-inner">
        <div class="content-wrap">
          <div id="content" class="content">

  <section id="posts" class="posts-expand">






  <article class="post post-type-normal " itemscope itemtype="http://schema.org/Article">
    <link itemprop="mainEntityOfPage" href="http://WinterSmileSB101.online/2017/04/08/Python3.7 爬虫(二)使用 Urllib2 与 BeautifulSoup 抓取解析网页/">
    <span hidden itemprop="author" itemscope itemtype="http://schema.org/Person">
      <meta itemprop="name" content="WinterSmileSB101">
      <meta itemprop="description" content="">
      <meta itemprop="image" content="http://on792ofrp.bkt.clouddn.com/17-3-22/29073846-file_1490159480452_d2de.jpg">
    span>
    <span hidden itemprop="publisher" itemscope itemtype="http://schema.org/Organization">
      <meta itemprop="name" content="WinterSmileSB101 的个人房间">
    span>

      <header class="post-header">


          <h2 class="post-title" itemprop="name headline">




                <a class="post-title-link" href="/2017/04/08/Python3.7 爬虫(二)使用 Urllib2 与 BeautifulSoup 抓取解析网页/" itemprop="url">
                  Python3.7 爬虫(二)使用 Urllib2 与 BeautifulSoup4 抓取解析网页
                a>


          h2>

        <div class="post-meta">
          <span class="post-time">

              <span class="post-meta-item-icon">
                <i class="fa fa-calendar-o">i>
              span>

                <span class="post-meta-item-text">发表于span>

              <time title="创建于" itemprop="dateCreated datePublished" datetime="2017-04-08T16:55:47+08:00">
                2017-04-08
              time>


              <span class="post-meta-divider">|span>


              <span class="post-meta-item-icon">
                <i class="fa fa-calendar-check-o">i>
              span>

                <span class="post-meta-item-text">更新于span>

              <time title="更新于" itemprop="dateModified" datetime="2017-04-09T14:17:52+08:00">
                2017-04-09
              time>

          span>

            <span class="post-category" >

              <span class="post-meta-divider">|span>

              <span class="post-meta-item-icon">
                <i class="fa fa-folder-o">i>
              span>

                <span class="post-meta-item-text">分类于span>


                <span itemprop="about" itemscope itemtype="http://schema.org/Thing">
                  <a href="/categories/爬虫/" itemprop="url" rel="index">
                    <span itemprop="name">爬虫span>
                  a>
                span>


                  ,


                <span itemprop="about" itemscope itemtype="http://schema.org/Thing">
                  <a href="/categories/爬虫/Python-爬虫/" itemprop="url" rel="index">
                    <span itemprop="name">Python 爬虫span>
                  a>
                span>



            span>



              <span class="post-comments-count">
                <span class="post-meta-divider">|span>
                <span class="post-meta-item-icon">
                  <i class="fa fa-comment-o">i>
                span>
                <a href="/2017/04/08/Python3.7 爬虫(二)使用 Urllib2 与 BeautifulSoup 抓取解析网页/#comments" itemprop="discussionUrl">
                  <span class="post-comments-count ds-thread-count" data-thread-key="2017/04/08/Python3.7 爬虫(二)使用 Urllib2 与 BeautifulSoup 抓取解析网页/" itemprop="commentCount">span>
                a>
              span>




             <span id="/2017/04/08/Python3.7 爬虫(二)使用 Urllib2 与 BeautifulSoup 抓取解析网页/" class="leancloud_visitors" data-flag-title="Python3.7 爬虫(二)使用 Urllib2 与 BeautifulSoup4 抓取解析网页">
               <span class="post-meta-divider">|span>
               <span class="post-meta-item-icon">
                 <i class="fa fa-eye">i>
               span>

                 <span class="post-meta-item-text">阅读次数 span>

                 <span class="leancloud-visitors-count">span>
             span>




        div>
      header>


    <div class="post-body" itemprop="articleBody">






版权声明:本文为 wintersmilesb101 -(个人独立博客– http://wintersmilesb101.online 欢迎访问)博主原创文章,未经博主允许不得转载。
开篇上一篇中我们通过原生的 re 模块已经完成了网页的解析,对于熟悉正则表达式的童鞋来说很好上手,但是对于萌新来说
          ...
          
          <div class="post-button text-center">
            <a class="btn" href="/2017/04/08/Python3.7 爬虫(二)使用 Urllib2 与 BeautifulSoup 抓取解析网页/#more" rel="contents">
              阅读全文 »
            a>
          div>
          


    div>
    <div>

    div>
    <div>

    div>
    <div>

    div>
    <footer class="post-footer">





        <div class="post-eof">div>

    footer>
  article>







  <article class="post post-type-normal " itemscope itemtype="http://schema.org/Article">
    <link itemprop="mainEntityOfPage" href="http://WinterSmileSB101.online/2017/04/08/Python3.7 爬虫(一)使用 Urllib 与正则表达式抓取/">
    <span hidden itemprop="author" itemscope itemtype="http://schema.org/Person">
      <meta itemprop="name" content="WinterSmileSB101">
      <meta itemprop="description" content="">
      <meta itemprop="image" content="http://on792ofrp.bkt.clouddn.com/17-3-22/29073846-file_1490159480452_d2de.jpg">
    span>
    <span hidden itemprop="publisher" itemscope itemtype="http://schema.org/Organization">
      <meta itemprop="name" content="WinterSmileSB101 的个人房间">
    span>

      <header class="post-header">


          <h2 class="post-title" itemprop="name headline">




                <a class="post-title-link" href="/2017/04/08/Python3.7 爬虫(一)使用 Urllib 与正则表达式抓取/" itemprop="url">
                  Python3.7 爬虫(一)使用 Urllib2 与正则表达式抓取
                a>


          h2>

        <div class="post-meta">
          <span class="post-time">

              <span class="post-meta-item-icon">
                <i class="fa fa-calendar-o">i>
              span>

                <span class="post-meta-item-text">发表于span>

              <time title="创建于" itemprop="dateCreated datePublished" datetime="2017-04-08T16:55:47+08:00">
                2017-04-08
              time>


              <span class="post-meta-divider">|span>


              <span class="post-meta-item-icon">
                <i class="fa fa-calendar-check-o">i>
              span>

                <span class="post-meta-item-text">更新于span>

              <time title="更新于" itemprop="dateModified" datetime="2017-04-09T10:25:07+08:00">
                2017-04-09
              time>

          span>

            <span class="post-category" >

              <span class="post-meta-divider">|span>

              <span class="post-meta-item-icon">
                <i class="fa fa-folder-o">i>
              span>

                <span class="post-meta-item-text">分类于span>


                <span itemprop="about" itemscope itemtype="http://schema.org/Thing">
                  <a href="/categories/爬虫/" itemprop="url" rel="index">
                    <span itemprop="name">爬虫span>
                  a>
                span>


                  ,


                <span itemprop="about" itemscope itemtype="http://schema.org/Thing">
                  <a href="/categories/爬虫/Python-爬虫/" itemprop="url" rel="index">
                    <span itemprop="name">Python 爬虫span>
                  a>
                span>



            span>



              <span class="post-comments-count">
                <span class="post-meta-divider">|span>
                <span class="post-meta-item-icon">
                  <i class="fa fa-comment-o">i>
                span>
                <a href="/2017/04/08/Python3.7 爬虫(一)使用 Urllib 与正则表达式抓取/#comments" itemprop="discussionUrl">
                  <span class="post-comments-count ds-thread-count" data-thread-key="2017/04/08/Python3.7 爬虫(一)使用 Urllib 与正则表达式抓取/" itemprop="commentCount">span>
                a>
              span>




             <span id="/2017/04/08/Python3.7 爬虫(一)使用 Urllib 与正则表达式抓取/" class="leancloud_visitors" data-flag-title="Python3.7 爬虫(一)使用 Urllib2 与正则表达式抓取">
               <span class="post-meta-divider">|span>
               <span class="post-meta-item-icon">
                 <i class="fa fa-eye">i>
               span>

                 <span class="post-meta-item-text">阅读次数 span>

                 <span class="leancloud-visitors-count">span>
             span>




        div>
      header>


    <div class="post-body" itemprop="articleBody">






版权声明:本文为 wintersmilesb101 -(个人独立博客– http://wintersmilesb101.online 欢迎访问)博主原创文章,未经博主允许不得转载。
我们今天就一起来通过 Python3 自带库 Urllib 与正则表达式来抓取糗事百科。废话不多说,下面正题:分析
          ...
          
          <div class="post-button text-center">
            <a class="btn" href="/2017/04/08/Python3.7 爬虫(一)使用 Urllib 与正则表达式抓取/#more" rel="contents">
              阅读全文 »
            a>
          div>
          


    div>
    <div>

    div>
    <div>

    div>
    <div>

    div>
    <footer class="post-footer">





        <div class="post-eof">div>

    footer>
  article>







  <article class="post post-type-normal " itemscope itemtype="http://schema.org/Article">
    <link itemprop="mainEntityOfPage" href="http://WinterSmileSB101.online/2017/04/08/Python3.7 爬虫(三)使用 Urllib2 与 BeautifulSoup 爬取网易云音乐歌单/">
    <span hidden itemprop="author" itemscope itemtype="http://schema.org/Person">
      <meta itemprop="name" content="WinterSmileSB101">
      <meta itemprop="description" content="">
      <meta itemprop="image" content="http://on792ofrp.bkt.clouddn.com/17-3-22/29073846-file_1490159480452_d2de.jpg">
    span>
    <span hidden itemprop="publisher" itemscope itemtype="http://schema.org/Organization">
      <meta itemprop="name" content="WinterSmileSB101 的个人房间">
    span>

      <header class="post-header">


          <h2 class="post-title" itemprop="name headline">




                <a class="post-title-link" href="/2017/04/08/Python3.7 爬虫(三)使用 Urllib2 与 BeautifulSoup 爬取网易云音乐歌单/" itemprop="url">
                  Python3.7 爬虫(三)使用 Urllib2 与 BeautifulSoup4 爬取网易云音乐歌单
                a>


          h2>

        <div class="post-meta">
          <span class="post-time">

              <span class="post-meta-item-icon">
                <i class="fa fa-calendar-o">i>
              span>

                <span class="post-meta-item-text">发表于span>

              <time title="创建于" itemprop="dateCreated datePublished" datetime="2017-04-08T16:55:47+08:00">
                2017-04-08
              time>


              <span class="post-meta-divider">|span>


              <span class="post-meta-item-icon">
                <i class="fa fa-calendar-check-o">i>
              span>

                <span class="post-meta-item-text">更新于span>

              <time title="更新于" itemprop="dateModified" datetime="2017-04-09T20:03:39+08:00">
                2017-04-09
              time>

          span>

            <span class="post-category" >

              <span class="post-meta-divider">|span>

              <span class="post-meta-item-icon">
                <i class="fa fa-folder-o">i>
              span>

                <span class="post-meta-item-text">分类于span>


                <span itemprop="about" itemscope itemtype="http://schema.org/Thing">
                  <a href="/categories/爬虫/" itemprop="url" rel="index">
                    <span itemprop="name">爬虫span>
                  a>
                span>


                  ,


                <span itemprop="about" itemscope itemtype="http://schema.org/Thing">
                  <a href="/categories/爬虫/Python-爬虫/" itemprop="url" rel="index">
                    <span itemprop="name">Python 爬虫span>
                  a>
                span>



            span>



              <span class="post-comments-count">
                <span class="post-meta-divider">|span>
                <span class="post-meta-item-icon">
                  <i class="fa fa-comment-o">i>
                span>
                <a href="/2017/04/08/Python3.7 爬虫(三)使用 Urllib2 与 BeautifulSoup 爬取网易云音乐歌单/#comments" itemprop="discussionUrl">
                  <span class="post-comments-count ds-thread-count" data-thread-key="2017/04/08/Python3.7 爬虫(三)使用 Urllib2 与 BeautifulSoup 爬取网易云音乐歌单/" itemprop="commentCount">span>
                a>
              span>




             <span id="/2017/04/08/Python3.7 爬虫(三)使用 Urllib2 与 BeautifulSoup 爬取网易云音乐歌单/" class="leancloud_visitors" data-flag-title="Python3.7 爬虫(三)使用 Urllib2 与 BeautifulSoup4 爬取网易云音乐歌单">
               <span class="post-meta-divider">|span>
               <span class="post-meta-item-icon">
                 <i class="fa fa-eye">i>
               span>

                 <span class="post-meta-item-text">阅读次数 span>

                 <span class="leancloud-visitors-count">span>
             span>




        div>
      header>


    <div class="post-body" itemprop="articleBody">






版权声明:本文为 wintersmilesb101 -(个人独立博客– http://wintersmilesb101.online 欢迎访问)博主原创文章,未经博主允许不得转载。
废话在前面的的博客中我们已经能够使用 python3 配合自带的库或者第三方库抓取以及解析网页,我们今天来试试抓取
          ...
          
          <div class="post-button text-center">
            <a class="btn" href="/2017/04/08/Python3.7 爬虫(三)使用 Urllib2 与 BeautifulSoup 爬取网易云音乐歌单/#more" rel="contents">
              阅读全文 »
            a>
          div>
          


    div>
    <div>

    div>
    <div>

    div>
    <div>

    div>
    <footer class="post-footer">





        <div class="post-eof">div>

    footer>
  article>







  <article class="post post-type-normal " itemscope itemtype="http://schema.org/Article">
    <link itemprop="mainEntityOfPage" href="http://WinterSmileSB101.online/2017/03/29/css-els/">
    <span hidden itemprop="author" itemscope itemtype="http://schema.org/Person">
      <meta itemprop="name" content="WinterSmileSB101">
      <meta itemprop="description" content="">
      <meta itemprop="image" content="http://on792ofrp.bkt.clouddn.com/17-3-22/29073846-file_1490159480452_d2de.jpg">
    span>
    <span hidden itemprop="publisher" itemscope itemtype="http://schema.org/Organization">
      <meta itemprop="name" content="WinterSmileSB101 的个人房间">
    span>

      <header class="post-header">


          <h2 class="post-title" itemprop="name headline">




                <a class="post-title-link" href="/2017/03/29/css-els/" itemprop="url">
                  Css 文字省略样式(单行/多行)
                a>


          h2>

        <div class="post-meta">
          <span class="post-time">

              <span class="post-meta-item-icon">
                <i class="fa fa-calendar-o">i>
              span>

                <span class="post-meta-item-text">发表于span>

              <time title="创建于" itemprop="dateCreated datePublished" datetime="2017-03-29T08:47:44+08:00">
                2017-03-29
              time>


              <span class="post-meta-divider">|span>


              <span class="post-meta-item-icon">
                <i class="fa fa-calendar-check-o">i>
              span>

                <span class="post-meta-item-text">更新于span>

              <time title="更新于" itemprop="dateModified" datetime="2017-03-29T09:03:16+08:00">
                2017-03-29
              time>

          span>

            <span class="post-category" >

              <span class="post-meta-divider">|span>

              <span class="post-meta-item-icon">
                <i class="fa fa-folder-o">i>
              span>

                <span class="post-meta-item-text">分类于span>


                <span itemprop="about" itemscope itemtype="http://schema.org/Thing">
                  <a href="/categories/WEB/" itemprop="url" rel="index">
                    <span itemprop="name">WEBspan>
                  a>
                span>


                  ,


                <span itemprop="about" itemscope itemtype="http://schema.org/Thing">
                  <a href="/categories/WEB/前端开发/" itemprop="url" rel="index">
                    <span itemprop="name">前端开发span>
                  a>
                span>


                  ,


                <span itemprop="about" itemscope itemtype="http://schema.org/Thing">
                  <a href="/categories/WEB/前端开发/CSS/" itemprop="url" rel="index">
                    <span itemprop="name">CSSspan>
                  a>
                span>



            span>



              <span class="post-comments-count">
                <span class="post-meta-divider">|span>
                <span class="post-meta-item-icon">
                  <i class="fa fa-comment-o">i>
                span>
                <a href="/2017/03/29/css-els/#comments" itemprop="discussionUrl">
                  <span class="post-comments-count ds-thread-count" data-thread-key="2017/03/29/css-els/" itemprop="commentCount">span>
                a>
              span>




             <span id="/2017/03/29/css-els/" class="leancloud_visitors" data-flag-title="Css 文字省略样式(单行/多行)">
               <span class="post-meta-divider">|span>
               <span class="post-meta-item-icon">
                 <i class="fa fa-eye">i>
               span>

                 <span class="post-meta-item-text">阅读次数 span>

                 <span class="leancloud-visitors-count">span>
             span>




        div>
      header>


    <div class="post-body" itemprop="articleBody">






版权声明:本文为 wintersmilesb101 -(个人独立博客– http://wintersmilesb101.online 欢迎访问)博主转载文章,原文地址。
效果图
上面的效果实现代码如下:12345678910111213141516171819202122232425262728
          ...
          
          <div class="post-button text-center">
            <a class="btn" href="/2017/03/29/css-els/#more" rel="contents">
              阅读全文 »
            a>
          div>
          


    div>
    <div>

    div>
    <div>

    div>
    <div>

    div>
    <footer class="post-footer">





        <div class="post-eof">div>

    footer>
  article>







  <article class="post post-type-normal " itemscope itemtype="http://schema.org/Article">
    <link itemprop="mainEntityOfPage" href="http://WinterSmileSB101.online/2017/03/28/mui-tab-pages/">
    <span hidden itemprop="author" itemscope itemtype="http://schema.org/Person">
      <meta itemprop="name" content="WinterSmileSB101">
      <meta itemprop="description" content="">
      <meta itemprop="image" content="http://on792ofrp.bkt.clouddn.com/17-3-22/29073846-file_1490159480452_d2de.jpg">
    span>
    <span hidden itemprop="publisher" itemscope itemtype="http://schema.org/Organization">
      <meta itemprop="name" content="WinterSmileSB101 的个人房间">
    span>

      <header class="post-header">


          <h2 class="post-title" itemprop="name headline">




                <a class="post-title-link" href="/2017/03/28/mui-tab-pages/" itemprop="url">
                  MUI 使用爬坑之路之 tab 多页面操作
                a>


          h2>

        <div class="post-meta">
          <span class="post-time">

              <span class="post-meta-item-icon">
                <i class="fa fa-calendar-o">i>
              span>

                <span class="post-meta-item-text">发表于span>

              <time title="创建于" itemprop="dateCreated datePublished" datetime="2017-03-28T13:08:23+08:00">
                2017-03-28
              time>


              <span class="post-meta-divider">|span>


              <span class="post-meta-item-icon">
                <i class="fa fa-calendar-check-o">i>
              span>

                <span class="post-meta-item-text">更新于span>

              <time title="更新于" itemprop="dateModified" datetime="2017-03-29T09:04:49+08:00">
                2017-03-29
              time>

          span>

            <span class="post-category" >

              <span class="post-meta-divider">|span>

              <span class="post-meta-item-icon">
                <i class="fa fa-folder-o">i>
              span>

                <span class="post-meta-item-text">分类于span>


                <span itemprop="about" itemscope itemtype="http://schema.org/Thing">
                  <a href="/categories/WEB/" itemprop="url" rel="index">
                    <span itemprop="name">WEBspan>
                  a>
                span>


                  ,


                <span itemprop="about" itemscope itemtype="http://schema.org/Thing">
                  <a href="/categories/WEB/前端开发/" itemprop="url" rel="index">
                    <span itemprop="name">前端开发span>
                  a>
                span>


                  ,


                <span itemprop="about" itemscope itemtype="http://schema.org/Thing">
                  <a href="/categories/WEB/前端开发/Hbuilder/" itemprop="url" rel="index">
                    <span itemprop="name">Hbuilderspan>
                  a>
                span>


                  ,


                <span itemprop="about" itemscope itemtype="http://schema.org/Thing">
                  <a href="/categories/WEB/前端开发/Hbuilder/MUI/" itemprop="url" rel="index">
                    <span itemprop="name">MUIspan>
                  a>
                span>



            span>



              <span class="post-comments-count">
                <span class="post-meta-divider">|span>
                <span class="post-meta-item-icon">
                  <i class="fa fa-comment-o">i>
                span>
                <a href="/2017/03/28/mui-tab-pages/#comments" itemprop="discussionUrl">
                  <span class="post-comments-count ds-thread-count" data-thread-key="2017/03/28/mui-tab-pages/" itemprop="commentCount">span>
                a>
              span>




             <span id="/2017/03/28/mui-tab-pages/" class="leancloud_visitors" data-flag-title="MUI 使用爬坑之路之 tab 多页面操作">
               <span class="post-meta-divider">|span>
               <span class="post-meta-item-icon">
                 <i class="fa fa-eye">i>
               span>

                 <span class="post-meta-item-text">阅读次数 span>

                 <span class="leancloud-visitors-count">span>
             span>




        div>
      header>


    <div class="post-body" itemprop="articleBody">






版权声明:本文为 wintersmilesb101 -(个人独立博客– http://wintersmilesb101.online 欢迎访问)博主原创文章,未经博主允许不得转载。
最近想入坑前端开发,也是为了开发 App 更快更接地气。在各种前端框架的纠结中我还是决定先入坑 MUI ,开坑不易
          ...
          
          <div class="post-button text-center">
            <a class="btn" href="/2017/03/28/mui-tab-pages/#more" rel="contents">
              阅读全文 »
            a>
          div>
          


    div>
    <div>

    div>
    <div>

    div>
    <div>

    div>
    <footer class="post-footer">





        <div class="post-eof">div>

    footer>
  article>







  <article class="post post-type-normal " itemscope itemtype="http://schema.org/Article">
    <link itemprop="mainEntityOfPage" href="http://WinterSmileSB101.online/2017/03/27/IOnic-first/">
    <span hidden itemprop="author" itemscope itemtype="http://schema.org/Person">
      <meta itemprop="name" content="WinterSmileSB101">
      <meta itemprop="description" content="">
      <meta itemprop="image" content="http://on792ofrp.bkt.clouddn.com/17-3-22/29073846-file_1490159480452_d2de.jpg">
    span>
    <span hidden itemprop="publisher" itemscope itemtype="http://schema.org/Organization">
      <meta itemprop="name" content="WinterSmileSB101 的个人房间">
    span>

      <header class="post-header">


          <h2 class="post-title" itemprop="name headline">




                <a class="post-title-link" href="/2017/03/27/IOnic-first/" itemprop="url">
                  Ionic2 的使用之坑
                a>


          h2>

        <div class="post-meta">
          <span class="post-time">

              <span class="post-meta-item-icon">
                <i class="fa fa-calendar-o">i>
              span>

                <span class="post-meta-item-text">发表于span>

              <time title="创建于" itemprop="dateCreated datePublished" datetime="2017-03-27T19:21:20+08:00">
                2017-03-27
              time>


              <span class="post-meta-divider">|span>


              <span class="post-meta-item-icon">
                <i class="fa fa-calendar-check-o">i>
              span>

                <span class="post-meta-item-text">更新于span>

              <time title="更新于" itemprop="dateModified" datetime="2017-03-27T22:20:20+08:00">
                2017-03-27
              time>

          span>

            <span class="post-category" >

              <span class="post-meta-divider">|span>

              <span class="post-meta-item-icon">
                <i class="fa fa-folder-o">i>
              span>

                <span class="post-meta-item-text">分类于span>


                <span itemprop="about" itemscope itemtype="http://schema.org/Thing">
                  <a href="/categories/WEB/" itemprop="url" rel="index">
                    <span itemprop="name">WEBspan>
                  a>
                span>


                  ,


                <span itemprop="about" itemscope itemtype="http://schema.org/Thing">
                  <a href="/categories/WEB/前端开发/" itemprop="url" rel="index">
                    <span itemprop="name">前端开发span>
                  a>
                span>


                  ,


                <span itemprop="about" itemscope itemtype="http://schema.org/Thing">
                  <a href="/categories/WEB/前端开发/IOnic-AngularJS/" itemprop="url" rel="index">
                    <span itemprop="name">IOnic AngularJSspan>
                  a>
                span>



            span>



              <span class="post-comments-count">
                <span class="post-meta-divider">|span>
                <span class="post-meta-item-icon">
                  <i class="fa fa-comment-o">i>
                span>
                <a href="/2017/03/27/IOnic-first/#comments" itemprop="discussionUrl">
                  <span class="post-comments-count ds-thread-count" data-thread-key="2017/03/27/IOnic-first/" itemprop="commentCount">span>
                a>
              span>




             <span id="/2017/03/27/IOnic-first/" class="leancloud_visitors" data-flag-title="Ionic2 的使用之坑">
               <span class="post-meta-divider">|span>
               <span class="post-meta-item-icon">
                 <i class="fa fa-eye">i>
               span>

                 <span class="post-meta-item-text">阅读次数 span>

                 <span class="leancloud-visitors-count">span>
             span>




        div>
      header>


    <div class="post-body" itemprop="articleBody">






版权声明:本文为 wintersmilesb101 -(个人独立博客– http://wintersmilesb101.online 欢迎访问)博主原创文章,未经博主允许不得转载。
在这里引用学习 IOnic 的地方,菜鸟驿站,不仅仅有 IOnic 还有很多其他的比如 Node.js、vue、Re
          ...
          
          <div class="post-button text-center">
            <a class="btn" href="/2017/03/27/IOnic-first/#more" rel="contents">
              阅读全文 »
            a>
          div>
          


    div>
    <div>

    div>
    <div>

    div>
    <div>

    div>
    <footer class="post-footer">





        <div class="post-eof">div>

    footer>
  article>







  <article class="post post-type-normal " itemscope itemtype="http://schema.org/Article">
    <link itemprop="mainEntityOfPage" href="http://WinterSmileSB101.online/2017/03/24/use-phantomjs-dynamic/">
    <span hidden itemprop="author" itemscope itemtype="http://schema.org/Person">
      <meta itemprop="name" content="WinterSmileSB101">
      <meta itemprop="description" content="">
      <meta itemprop="image" content="http://on792ofrp.bkt.clouddn.com/17-3-22/29073846-file_1490159480452_d2de.jpg">
    span>
    <span hidden itemprop="publisher" itemscope itemtype="http://schema.org/Organization">
      <meta itemprop="name" content="WinterSmileSB101 的个人房间">
    span>

      <header class="post-header">


          <h2 class="post-title" itemprop="name headline">




                <a class="post-title-link" href="/2017/03/24/use-phantomjs-dynamic/" itemprop="url">
                  一起学爬虫 Node.js 爬虫篇(三)使用 PhantomJS 爬取动态页面
                a>


          h2>

        <div class="post-meta">
          <span class="post-time">

              <span class="post-meta-item-icon">
                <i class="fa fa-calendar-o">i>
              span>

                <span class="post-meta-item-text">发表于span>

              <time title="创建于" itemprop="dateCreated datePublished" datetime="2017-03-24T09:29:38+08:00">
                2017-03-24
              time>


              <span class="post-meta-divider">|span>


              <span class="post-meta-item-icon">
                <i class="fa fa-calendar-check-o">i>
              span>

                <span class="post-meta-item-text">更新于span>

              <time title="更新于" itemprop="dateModified" datetime="2017-03-24T12:57:00+08:00">
                2017-03-24
              time>

          span>

            <span class="post-category" >

              <span class="post-meta-divider">|span>

              <span class="post-meta-item-icon">
                <i class="fa fa-folder-o">i>
              span>

                <span class="post-meta-item-text">分类于span>


                <span itemprop="about" itemscope itemtype="http://schema.org/Thing">
                  <a href="/categories/爬虫/" itemprop="url" rel="index">
                    <span itemprop="name">爬虫span>
                  a>
                span>


                  ,


                <span itemprop="about" itemscope itemtype="http://schema.org/Thing">
                  <a href="/categories/爬虫/Node-js-爬虫/" itemprop="url" rel="index">
                    <span itemprop="name">Node.js 爬虫span>
                  a>
                span>



            span>



              <span class="post-comments-count">
                <span class="post-meta-divider">|span>
                <span class="post-meta-item-icon">
                  <i class="fa fa-comment-o">i>
                span>
                <a href="/2017/03/24/use-phantomjs-dynamic/#comments" itemprop="discussionUrl">
                  <span class="post-comments-count ds-thread-count" data-thread-key="2017/03/24/use-phantomjs-dynamic/" itemprop="commentCount">span>
                a>
              span>




             <span id="/2017/03/24/use-phantomjs-dynamic/" class="leancloud_visitors" data-flag-title="一起学爬虫 Node.js 爬虫篇(三)使用 PhantomJS 爬取动态页面">
               <span class="post-meta-divider">|span>
               <span class="post-meta-item-icon">
                 <i class="fa fa-eye">i>
               span>

                 <span class="post-meta-item-text">阅读次数 span>

                 <span class="leancloud-visitors-count">span>
             span>




        div>
      header>


    <div class="post-body" itemprop="articleBody">






版权声明:本文为 wintersmilesb101 -(个人独立博客– http://wintersmilesb101.online 欢迎访问)博主原创文章,未经博主允许不得转载。
今天我们来学习如何使用 PhantomJS 来抓取动态网页,至于 PhantomJS 是啥啊什么的,看这里 我们这
          ...
          
          <div class="post-button text-center">
            <a class="btn" href="/2017/03/24/use-phantomjs-dynamic/#more" rel="contents">
              阅读全文 »
            a>
          div>
          


    div>
    <div>

    div>
    <div>

    div>
    <div>

    div>
    <footer class="post-footer">





        <div class="post-eof">div>

    footer>
  article>







  <article class="post post-type-normal " itemscope itemtype="http://schema.org/Article">
    <link itemprop="mainEntityOfPage" href="http://WinterSmileSB101.online/2017/03/24/get-phantomJS-start/">
    <span hidden itemprop="author" itemscope itemtype="http://schema.org/Person">
      <meta itemprop="name" content="WinterSmileSB101">
      <meta itemprop="description" content="">
      <meta itemprop="image" content="http://on792ofrp.bkt.clouddn.com/17-3-22/29073846-file_1490159480452_d2de.jpg">
    span>
    <span hidden itemprop="publisher" itemscope itemtype="http://schema.org/Organization">
      <meta itemprop="name" content="WinterSmileSB101 的个人房间">
    span>

      <header class="post-header">


          <h2 class="post-title" itemprop="name headline">




                <a class="post-title-link" href="/2017/03/24/get-phantomJS-start/" itemprop="url">
                  Node.js 动态网页爬取 PhantomJS 使用入门
                a>


          h2>

        <div class="post-meta">
          <span class="post-time">

              <span class="post-meta-item-icon">
                <i class="fa fa-calendar-o">i>
              span>

                <span class="post-meta-item-text">发表于span>

              <time title="创建于" itemprop="dateCreated datePublished" datetime="2017-03-24T08:43:25+08:00">
                2017-03-24
              time>


              <span class="post-meta-divider">|span>


              <span class="post-meta-item-icon">
                <i class="fa fa-calendar-check-o">i>
              span>

                <span class="post-meta-item-text">更新于span>

              <time title="更新于" itemprop="dateModified" datetime="2017-03-24T10:22:14+08:00">
                2017-03-24
              time>

          span>

            <span class="post-category" >

              <span class="post-meta-divider">|span>

              <span class="post-meta-item-icon">
                <i class="fa fa-folder-o">i>
              span>

                <span class="post-meta-item-text">分类于span>


                <span itemprop="about" itemscope itemtype="http://schema.org/Thing">
                  <a href="/categories/爬虫/" itemprop="url" rel="index">
                    <span itemprop="name">爬虫span>
                  a>
                span>


                  ,


                <span itemprop="about" itemscope itemtype="http://schema.org/Thing">
                  <a href="/categories/爬虫/Node-js-爬虫/" itemprop="url" rel="index">
                    <span itemprop="name">Node.js 爬虫span>
                  a>
                span>



            span>



              <span class="post-comments-count">
                <span class="post-meta-divider">|span>
                <span class="post-meta-item-icon">
                  <i class="fa fa-comment-o">i>
                span>
                <a href="/2017/03/24/get-phantomJS-start/#comments" itemprop="discussionUrl">
                  <span class="post-comments-count ds-thread-count" data-thread-key="2017/03/24/get-phantomJS-start/" itemprop="commentCount">span>
                a>
              span>




             <span id="/2017/03/24/get-phantomJS-start/" class="leancloud_visitors" data-flag-title="Node.js 动态网页爬取 PhantomJS 使用入门">
               <span class="post-meta-divider">|span>
               <span class="post-meta-item-icon">
                 <i class="fa fa-eye">i>
               span>

                 <span class="post-meta-item-text">阅读次数 span>

                 <span class="leancloud-visitors-count">span>
             span>




        div>
      header>


    <div class="post-body" itemprop="articleBody">






版权声明:本文为 wintersmilesb101 -(个人独立博客– http://wintersmilesb101.online 欢迎访问)博主原创文章,未经博主允许不得转载。
既然是入门,那我们就从人类的起源。。PhantomJS 来说起吧。1、PhantomJS是什么?PhantomJS
          ...
          
          <div class="post-button text-center">
            <a class="btn" href="/2017/03/24/get-phantomJS-start/#more" rel="contents">
              阅读全文 »
            a>
          div>
          


    div>
    <div>

    div>
    <div>

    div>
    <div>

    div>
    <footer class="post-footer">





        <div class="post-eof">div>

    footer>
  article>







  <article class="post post-type-normal " itemscope itemtype="http://schema.org/Article">
    <link itemprop="mainEntityOfPage" href="http://WinterSmileSB101.online/2017/03/23/node-spider-scend/">
    <span hidden itemprop="author" itemscope itemtype="http://schema.org/Person">
      <meta itemprop="name" content="WinterSmileSB101">
      <meta itemprop="description" content="">
      <meta itemprop="image" content="http://on792ofrp.bkt.clouddn.com/17-3-22/29073846-file_1490159480452_d2de.jpg">
    span>
    <span hidden itemprop="publisher" itemscope itemtype="http://schema.org/Organization">
      <meta itemprop="name" content="WinterSmileSB101 的个人房间">
    span>

      <header class="post-header">


          <h2 class="post-title" itemprop="name headline">




                <a class="post-title-link" href="/2017/03/23/node-spider-scend/" itemprop="url">
                  一起学爬虫 Node.js 爬虫篇(二)
                a>


          h2>

        <div class="post-meta">
          <span class="post-time">

              <span class="post-meta-item-icon">
                <i class="fa fa-calendar-o">i>
              span>

                <span class="post-meta-item-text">发表于span>

              <time title="创建于" itemprop="dateCreated datePublished" datetime="2017-03-23T17:17:58+08:00">
                2017-03-23
              time>


              <span class="post-meta-divider">|span>


              <span class="post-meta-item-icon">
                <i class="fa fa-calendar-check-o">i>
              span>

                <span class="post-meta-item-text">更新于span>

              <time title="更新于" itemprop="dateModified" datetime="2017-03-24T10:22:12+08:00">
                2017-03-24
              time>

          span>

            <span class="post-category" >

              <span class="post-meta-divider">|span>

              <span class="post-meta-item-icon">
                <i class="fa fa-folder-o">i>
              span>

                <span class="post-meta-item-text">分类于span>


                <span itemprop="about" itemscope itemtype="http://schema.org/Thing">
                  <a href="/categories/爬虫/" itemprop="url" rel="index">
                    <span itemprop="name">爬虫span>
                  a>
                span>


                  ,


                <span itemprop="about" itemscope itemtype="http://schema.org/Thing">
                  <a href="/categories/爬虫/Node-js-爬虫/" itemprop="url" rel="index">
                    <span itemprop="name">Node.js 爬虫span>
                  a>
                span>



            span>



              <span class="post-comments-count">
                <span class="post-meta-divider">|span>
                <span class="post-meta-item-icon">
                  <i class="fa fa-comment-o">i>
                span>
                <a href="/2017/03/23/node-spider-scend/#comments" itemprop="discussionUrl">
                  <span class="post-comments-count ds-thread-count" data-thread-key="2017/03/23/node-spider-scend/" itemprop="commentCount">span>
                a>
              span>




             <span id="/2017/03/23/node-spider-scend/" class="leancloud_visitors" data-flag-title="一起学爬虫 Node.js 爬虫篇(二)">
               <span class="post-meta-divider">|span>
               <span class="post-meta-item-icon">
                 <i class="fa fa-eye">i>
               span>

                 <span class="post-meta-item-text">阅读次数 span>

                 <span class="leancloud-visitors-count">span>
             span>




        div>
      header>


    <div class="post-body" itemprop="articleBody">






版权声明:本文为 wintersmilesb101 -(个人独立博客– http://wintersmilesb101.online 欢迎访问)博主原创文章,未经博主允许不得转载。
上一篇中我们对百度首页进行了标题的爬取,本来打算这次直接对上次没有爬取到的推荐新闻进行爬取,谁知道网页加载出来没网
          ...
          
          <div class="post-button text-center">
            <a class="btn" href="/2017/03/23/node-spider-scend/#more" rel="contents">
              阅读全文 »
            a>
          div>
          


    div>
    <div>

    div>
    <div>

    div>
    <div>

    div>
    <footer class="post-footer">





        <div class="post-eof">div>

    footer>
  article>







  <article class="post post-type-normal " itemscope itemtype="http://schema.org/Article">
    <link itemprop="mainEntityOfPage" href="http://WinterSmileSB101.online/2017/03/23/node-spider-first/">
    <span hidden itemprop="author" itemscope itemtype="http://schema.org/Person">
      <meta itemprop="name" content="WinterSmileSB101">
      <meta itemprop="description" content="">
      <meta itemprop="image" content="http://on792ofrp.bkt.clouddn.com/17-3-22/29073846-file_1490159480452_d2de.jpg">
    span>
    <span hidden itemprop="publisher" itemscope itemtype="http://schema.org/Organization">
      <meta itemprop="name" content="WinterSmileSB101 的个人房间">
    span>

      <header class="post-header">


          <h2 class="post-title" itemprop="name headline">




                <a class="post-title-link" href="/2017/03/23/node-spider-first/" itemprop="url">
                  一起学爬虫 Node.js 爬虫篇(一)
                a>


          h2>

        <div class="post-meta">
          <span class="post-time">

              <span class="post-meta-item-icon">
                <i class="fa fa-calendar-o">i>
              span>

                <span class="post-meta-item-text">发表于span>

              <time title="创建于" itemprop="dateCreated datePublished" datetime="2017-03-23T14:16:38+08:00">
                2017-03-23
              time>


              <span class="post-meta-divider">|span>


              <span class="post-meta-item-icon">
                <i class="fa fa-calendar-check-o">i>
              span>

                <span class="post-meta-item-text">更新于span>

              <time title="更新于" itemprop="dateModified" datetime="2017-03-24T10:22:55+08:00">
                2017-03-24
              time>

          span>

            <span class="post-category" >

              <span class="post-meta-divider">|span>

              <span class="post-meta-item-icon">
                <i class="fa fa-folder-o">i>
              span>

                <span class="post-meta-item-text">分类于span>


                <span itemprop="about" itemscope itemtype="http://schema.org/Thing">
                  <a href="/categories/爬虫/" itemprop="url" rel="index">
                    <span itemprop="name">爬虫span>
                  a>
                span>


                  ,


                <span itemprop="about" itemscope itemtype="http://schema.org/Thing">
                  <a href="/categories/爬虫/Node-js-爬虫/" itemprop="url" rel="index">
                    <span itemprop="name">Node.js 爬虫span>
                  a>
                span>



            span>



              <span class="post-comments-count">
                <span class="post-meta-divider">|span>
                <span class="post-meta-item-icon">
                  <i class="fa fa-comment-o">i>
                span>
                <a href="/2017/03/23/node-spider-first/#comments" itemprop="discussionUrl">
                  <span class="post-comments-count ds-thread-count" data-thread-key="2017/03/23/node-spider-first/" itemprop="commentCount">span>
                a>
              span>




             <span id="/2017/03/23/node-spider-first/" class="leancloud_visitors" data-flag-title="一起学爬虫 Node.js 爬虫篇(一)">
               <span class="post-meta-divider">|span>
               <span class="post-meta-item-icon">
                 <i class="fa fa-eye">i>
               span>

                 <span class="post-meta-item-text">阅读次数 span>

                 <span class="leancloud-visitors-count">span>
             span>




        div>
      header>


    <div class="post-body" itemprop="articleBody">






版权声明:本文为 wintersmilesb101 -(个人独立博客– http://wintersmilesb101.online 欢迎访问)博主原创文章,未经博主允许不得转载。
一看到爬虫或者一百度爬虫,那是铺天盖地的全是 Python 爬虫啊,不得不说爬虫的框架与资料,Python 基本是最
          ...
          
          <div class="post-button text-center">
            <a class="btn" href="/2017/03/23/node-spider-first/#more" rel="contents">
              阅读全文 »
            a>
          div>
          


    div>
    <div>

    div>
    <div>

    div>
    <div>

    div>
    <footer class="post-footer">





        <div class="post-eof">div>

    footer>
  article>


  section>

  <nav class="pagination">
    <span class="page-number current">1span><a class="page-number" href="/page/2/">2a><span class="space">span><a class="page-number" href="/page/6/">6a><a class="extend next" rel="next" href="/page/2/"><i class="fa fa-angle-right">i>a>
  nav>
          div>



        div>



  <div class="sidebar-toggle">
    <div class="sidebar-toggle-line-wrap">
      <span class="sidebar-toggle-line sidebar-toggle-line-first">span>
      <span class="sidebar-toggle-line sidebar-toggle-line-middle">span>
      <span class="sidebar-toggle-line sidebar-toggle-line-last">span>
    div>
  div>
  <aside id="sidebar" class="sidebar">
    <div class="sidebar-inner">


      <section class="site-overview sidebar-panel sidebar-panel-active">
        <div class="site-author motion-element" itemprop="author" itemscope itemtype="http://schema.org/Person">
          <img class="site-author-image" itemprop="image"
               src="http://on792ofrp.bkt.clouddn.com/17-3-22/29073846-file_1490159480452_d2de.jpg"
               alt="WinterSmileSB101" />
          <p class="site-author-name" itemprop="name">WinterSmileSB101p>

              <p class="site-description motion-element" itemprop="description">p>

        div>
        <nav class="site-state motion-element">

            <div class="site-state-item site-state-posts">
              <a href="/archives">
                <span class="site-state-item-count">52span>
                <span class="site-state-item-name">日志span>
              a>
            div>




            <div class="site-state-item site-state-categories">
              <a href="/categories/index.html">
                <span class="site-state-item-count">26span>
                <span class="site-state-item-name">分类span>
              a>
            div>




            <div class="site-state-item site-state-tags">
              <a href="/tags/index.html">
                <span class="site-state-item-count">113span>
                <span class="site-state-item-name">标签span>
              a>
            div>

        nav>

          <div class="feed-link motion-element">
            <a href="http://blog.csdn.net/qq_21265915/rss/list" rel="alternate">
              <i class="fa fa-rss">i>
              RSS
            a>
          div>

        
        <div class="links-of-author motion-element">
         <span class="links-of-author-item">
         <a href="https://github.com/WinterSmileSB101" title="Github">
         <i class="fa fa-fw fa-github fa-lg">i>
         a>
         span>
         <span class="links-of-author-item">
                  <a href="http://weibo.com/5602632941/profile?rightmod=1&wvr=6&mod=personinfo&is_all=1" title="微博">
                  <i class="fa fa-fw fa-weibo fa-lg">i>
                  a>
                  span>
         <span class="links-of-author-item">
         <a href="http://www.jianshu.com/users/73344bc7e890/timeline" title="简书">
         <i class="fa fa-fw fa-bookmark fa-lg">i>
         a>
         span>
<br />
        <span class="links-of-author-item">
                 <a href="https://www.douban.com/people/159359470/" title="豆瓣">
                 <i class="fa fa-fw fa-newspaper-o fa-lg">i>
                 a>
                 span>
        <span class="links-of-author-item">
                 <a href="http://blog.csdn.net/qq_21265915" title="CSDN博客">
                 <i class="fa fa-fw fa-bug fa-lg">i>
                 a>
                 span>
        div>
        






      section>


        <div class="back-to-top">
          <i class="fa fa-arrow-up">i>

            <span id="scrollpercent"><span>0span>%span>

        div>

    div>
  aside>


      div>
    main>
    <footer id="footer" class="footer">
      <div class="footer-inner">
        <div class="copyright" >

  ©  2017.3.20 -
  <span itemprop="copyrightYear">2017span>
  <span class="with-love">
    <i class="fa fa-heart">i>
  span>
  <span class="author" itemprop="copyrightHolder">Powered By - WinterSmileSB101span>
div>

<div class="powered-by">
    个人专属
div>
<div class="theme-info">
  博客 -
  WinterSmileSB101
div>



      div>
    footer>

  div>

<script type="text/javascript">
  if (Object.prototype.toString.call(window.Promise) !== '[object Function]') {
    window.Promise = null;
  }
script>




  <script type="text/javascript" src="/lib/jquery/index.js?v=2.1.3">script>

  <script type="text/javascript" src="/lib/fastclick/lib/fastclick.min.js?v=1.0.6">script>

  <script type="text/javascript" src="/lib/jquery_lazyload/jquery.lazyload.js?v=1.9.7">script>

  <script type="text/javascript" src="/lib/velocity/velocity.min.js?v=1.2.1">script>

  <script type="text/javascript" src="/lib/velocity/velocity.ui.min.js?v=1.2.1">script>

  <script type="text/javascript" src="/lib/fancybox/source/jquery.fancybox.pack.js?v=2.1.5">script>

  <script type="text/javascript" src="/lib/canvas-nest/canvas-nest.min.js">script>



  <script type="text/javascript" src="/js/src/utils.js?v=5.1.0">script>
  <script type="text/javascript" src="/js/src/motion.js?v=5.1.0">script>





  <script type="text/javascript" src="/js/src/bootstrap.js?v=5.1.0">script>




  <script type="text/javascript">
    var duoshuoQuery = {short_name:"wintersmilesb101"};
    (function() {
      
      var ds = document.createElement('script');
      ds.type = 'text/javascript';ds.async = true;
      ds.id = 'duoshuo-script';
      ds.src = (document.location.protocol == 'https:' ? 'https:' : 'http:') + '//static.duoshuo.com/embed.js';
      ds.charset = 'UTF-8';
      (document.getElementsByTagName('head')[0]
      || document.getElementsByTagName('body')[0]).appendChild(ds);
    })();
  script>



      <script src="/lib/ua-parser-js/dist/ua-parser.min.js?v=0.7.9">script>
      <script src="/js/src/hook-duoshuo.js?v=5.1.0">script>


    <script src="/lib/ua-parser-js/dist/ua-parser.min.js?v=0.7.9">script>
    <script src="/js/src/hook-duoshuo.js">script>



  <script type="text/javascript">
    // Popup Window;
    var isfetched = false;
    // Search DB path;
    var search_path = "search.xml";
    if (search_path.length == 0) {
      search_path = "search.xml";
    }
    var path = "/" + search_path;
    // monitor main search box;
    function proceedsearch() {
      
      $("body")
        .append('
'
) .css('overflow', 'hidden'); $('.popup').toggle(); } // search function; var searchFunc = function(path, search_id, content_id) { 'use strict'; $.ajax({ url: path, dataType: "xml", async: true, success: function( xmlResponse ) { // get the contents from search data isfetched = true; $('.popup').detach().appendTo('.header-inner'); var datas = $( "entry", xmlResponse ).map(function() { return { title: $( "title", this ).text(), content: $("content",this).text(), url: $( "url" , this).text() }; }).get(); var $input = document.getElementById(search_id); var $resultContent = document.getElementById(content_id); $input.addEventListener('input', function(){ var matchcounts = 0; var str='
    '; var keywords = this.value.trim().toLowerCase().split(/[\s\-]+/); $resultContent.innerHTML = ""; if (this.value.trim().length > 1) { // perform local searching datas.forEach(function(data) { var isMatch = false; var content_index = []; var data_title = data.title.trim().toLowerCase(); var data_content = data.content.trim().replace(/<[^>]+>/g,"").toLowerCase(); var data_url = decodeURIComponent(data.url); var index_title = -1; var index_content = -1; var first_occur = -1; // only match artiles with not empty titles and contents if(data_title != '') { keywords.forEach(function(keyword, i) { index_title = data_title.indexOf(keyword); index_content = data_content.indexOf(keyword); if( index_title >= 0 || index_content >= 0 ){ isMatch = true; if (i == 0) { first_occur = index_content; } } }); } // show search results if (isMatch) { matchcounts += 1; str += "
  • "+ data_title +""; var content = data.content.trim().replace(/<[^>]+>/g,""); if (first_occur >= 0) { // cut out 100 characters var start = first_occur - 20; var end = first_occur + 80; if(start < 0){ start = 0; } if(start == 0){ end = 50; } if(end > content.length){ end = content.length; } var match_content = content.substring(start, end); // highlight all keywords keywords.forEach(function(keyword){ var regS = new RegExp(keyword, "gi"); match_content = match_content.replace(regS, ""+keyword+""); }); str += "

    " + match_content +"...

    "
    } str += "
  • "
    ; } })}; str += "
"
; if (matchcounts == 0) { str = '
'
} if (keywords == "") { str = '
'
} $resultContent.innerHTML = str; }); proceedsearch(); } });} // handle and trigger popup window; $('.popup-trigger').click(function(e) { e.stopPropagation(); if (isfetched == false) { searchFunc(path, 'local-search-input', 'local-search-result'); } else { proceedsearch(); }; }); $('.popup-btn-close').click(function(e){ $('.popup').hide(); $(".local-search-pop-overlay").remove(); $('body').css('overflow', ''); }); $('.popup').click(function(e){ e.stopPropagation(); });
script> <script src="https://cdn1.lncld.net/static/js/av-core-mini-0.6.1.js">script> <script>AV.initialize("cOFi0858xVYxKW1wnErxqEra-gzGzoHsz", "7LaqqR82XnjzTbkv5eCKw5aW");script> <script> function showTime(Counter) { var query = new AV.Query(Counter); var entries = []; var $visitors = $(".leancloud_visitors"); $visitors.each(function () { entries.push( $(this).attr("id").trim() ); }); query.containedIn('url', entries); query.find() .done(function (results) { var COUNT_CONTAINER_REF = '.leancloud-visitors-count'; if (results.length === 0) { $visitors.find(COUNT_CONTAINER_REF).text(0); return; } for (var i = 0; i < results.length; i++) { var item = results[i]; var url = item.get('url'); var time = item.get('time'); var element = document.getElementById(url); $(element).find(COUNT_CONTAINER_REF).text(time); } for(var i = 0; i < entries.length; i++) { var url = entries[i]; var element = document.getElementById(url); var countSpan = $(element).find(COUNT_CONTAINER_REF); if( countSpan.text() == '') { countSpan.text(0); } } }) .fail(function (object, error) { console.log("Error: " + error.code + " " + error.message); }); } function addCount(Counter) { var $visitors = $(".leancloud_visitors"); var url = $visitors.attr('id').trim(); var title = $visitors.attr('data-flag-title').trim(); var query = new AV.Query(Counter); query.equalTo("url", url); query.find({ success: function(results) { if (results.length > 0) { var counter = results[0]; counter.fetchWhenSave(true); counter.increment("time"); counter.save(null, { success: function(counter) { var $element = $(document.getElementById(url)); $element.find('.leancloud-visitors-count').text(counter.get('time')); }, error: function(counter, error) { console.log('Failed to save Visitor num, with error message: ' + error.message); } }); } else { var newcounter = new Counter(); /* Set ACL */ var acl = new AV.ACL(); acl.setPublicReadAccess(true); acl.setPublicWriteAccess(true); newcounter.setACL(acl); /* End Set ACL */ 
newcounter.set("title", title); newcounter.set("url", url); newcounter.set("time", 1); newcounter.save(null, { success: function(newcounter) { var $element = $(document.getElementById(url)); $element.find('.leancloud-visitors-count').text(newcounter.get('time')); }, error: function(newcounter, error) { console.log('Failed to create'); } }); } }, error: function(error) { console.log('Error:' + error.code + " " + error.message); } }); } $(function() { var Counter = AV.Object.extend("Counter"); if ($('.leancloud_visitors').length == 1) { addCount(Counter); } else if ($('.post-title-link').length > 1) { showTime(Counter); } }); script> body> html>
Process finished with exit code 0

OK, the request now gets through the proxy and returns the page source, so you can do whatever you like with it.
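As one example of what can be done with the fetched source, here is a minimal sketch that pulls the post titles and links out of the HTML. It assumes the page marks each title with an `<a class="post-title-link">` tag, as the output above does, and it uses only the standard library's `html.parser`. The names `PostTitleParser` and `extract_post_titles` are hypothetical and not part of the original script; the `sample` string stands in for the fetched page source.

```python
from html.parser import HTMLParser


class PostTitleParser(HTMLParser):
    """Collect (title, href) pairs from <a class="post-title-link"> tags."""

    def __init__(self):
        super().__init__()
        self.posts = []      # collected (title, href) tuples
        self._href = None    # href of the title link being read, if any
        self._text = []      # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "post-title-link" in classes:
            self._href = attrs.get("href") or ""
            self._text = []

    def handle_data(self, data):
        # Only keep text while we are inside a title link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            self.posts.append((title, self._href))
            self._href = None


def extract_post_titles(html):
    """Return a list of (title, href) tuples found in the given HTML string."""
    parser = PostTitleParser()
    parser.feed(html)
    parser.close()
    return parser.posts


# Stand-in for the page source returned by the request (assumed markup).
sample = '''
<h2 class="post-title">
  <a class="post-title-link" href="/2017/03/27/IOnic-first/" itemprop="url">
    Ionic2 的使用之坑
  </a>
</h2>
'''
print(extract_post_titles(sample))
```

In the real script, the decoded page source would be passed to `extract_post_titles` in place of `sample`. BeautifulSoup, which the script already imports, could do the same job with less code; the standard-library parser is used here only to keep the sketch dependency-free.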

If you spot any problems, corrections are welcome.
