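[Scrapy: Scrape a Site in Five Minutes] A crawler to rival every crawler on the web: index of all articles
Every post in this series uses the Scrapy framework. Each article walks through scraping one named site end to end and adds practical tips. As long as I have the energy I will publish new crawlers every week, and followers can leave a comment to request a crawler script for a specific site.
中国经济网 (China Economic Net, ce.cn) is the only national key news site centered on economic reporting. It produces a large volume of economic news every day and aggregates economic news and information from major domestic media, providing an authoritative reference for government departments and business decisions…
1. If you are new to the five-minute scraping approach, read this first:
[Scrapy: Scrape a Site in Five Minutes] Essential basics for full-site data
2. If you are new to organizing and managing a scraping job, read this first:
[Scrapy: Scrape a Site in Five Minutes] Organizing crawl targets and preparing data
3. If you are new to mass-producing spiders from the Scrapy template, read this first (required):
[Scrapy: Scrape a Site in Five Minutes] General-purpose project template for data scraping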
1. Create the spider
scrapy genspider www_ce_cn " "
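2. Catalog the site-wide CSS styles
Start by looking at the CSS of the list pages. The whole site shares six base list styles; the many remaining special cases are handed to extract_list from gerapy_auto_extractor.extractors.
3. Edit www_ce_cn.py
Only the parts that need changing are described here. Everything else follows the template and needs no modification.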
allowed_domains = []
web_name = "中国经济网"
start_menu = [
# 金融证券
[
{
"channel_name": "金融证券-互联网金融观察", "url": "http://finance.ce.cn/hlwjr/", },
{
"channel_name": "金融证券-专题精选", "url": "http://finance.ce.cn/home/zt/", },
{
"channel_name": "金融证券-财经滚动新闻", "url": "http://finance.ce.cn/rolling/", },
{
"channel_name": "金融证券-公司聚焦", "url": "http://finance.ce.cn/home/jrzq/dc/", },
{
"channel_name": "金融证券-私募观点", "url": "http://finance.ce.cn/jjpd/jjdsp/jjsyzn/", },
{
"channel_name": "金融证券-板块研究", "url": "http://finance.ce.cn/10cjsy/bk/", },
],
# 期货频道
[
{
"channel_name": "金融证券-期货频道-交易所&协会通知", "url": "http://finance.ce.cn/futures/jysjxhtz/", },
{
"channel_name": "金融证券-期货频道-公司专栏", "url": "http://finance.ce.cn/futures/qhgszl/", },
{
"channel_name": "金融证券-期货频道-会议论坛", "url": "http://finance.ce.cn/futures/qhhyjlt/", },
{
"channel_name": "金融证券-期货频道-投资顾问", "url": "http://finance.ce.cn/futures/qhtzgw/", },
{
"channel_name": "金融证券-期货频道-评论&研报", "url": "http://finance.ce.cn/futures/qhscpl/", },
{
"channel_name": "金融证券-期货频道-资讯&公告", "url": "http://finance.ce.cn/futures/qtzx/", },
{
"channel_name": "金融证券-期货频道-现货资讯", "url": "http://finance.ce.cn/futures/xhzx/", },
{
"channel_name": "金融证券-期货频道-期货滚动报道", "url": "http://finance.ce.cn/futures/qhgdbd/", },
{
"channel_name": "金融证券-期货频道-期市要闻区", "url": "http://finance.ce.cn/futures/qhywq/", },
{
"channel_name": "金融证券-期货频道-期货交易所", "url": "http://finance.ce.cn/futures/gjqhjyslj/", },
{
"channel_name": "金融证券-期货频道-期货", "url": "http://finance.ce.cn/futures/zjqhxy/", },
],
# 新三板
[
{
"channel_name": "金融证券-新三板-新三板滚动新闻", "url": "http://finance.ce.cn/xsb/xsbgdxw/", },
],
# 理财
[
{
"channel_name": "金融证券-理财-热点聚焦", "url": "http://finance.ce.cn/money/2016lc/rdjj/", },
],
# 基金频道
[
{
"channel_name": "金融证券-基金频道-基金滚动新闻", "url": "http://finance.ce.cn/jjpd/jjpdgd/", },
{
"channel_name": "金融证券-基金频道-基金人物秀", "url": "http://finance.ce.cn/jjpd/jjpddyp/rwx/", },
{
"channel_name": "金融证券-基金频道-中经视点", "url": "http://finance.ce.cn/jjpd/jjpddyp/zjsd/", },
{
"channel_name": "金融证券-基金频道-基金研报", "url": "http://finance.ce.cn/jjpd/jjdsp/yb/", },
{
"channel_name": "金融证券-基金频道-基金看市", "url": "http://finance.ce.cn/jjpd/jjpddyp/jjks/", },
{
"channel_name": "金融证券-基金频道-基金公告", "url": "http://finance.ce.cn/jjpd/jjdsp/jjgg/", },
{
"channel_name": "金融证券-基金频道-机构专栏", "url": "http://finance.ce.cn/jjpd/jjpddyp/jjpdzl/jjpdjg/", },
{
"channel_name": "金融证券-基金频道-私募动态", "url": "http://finance.ce.cn/jjpd/jjdep/dt/", },
{
"channel_name": "金融证券-基金频道-要闻", "url": "http://finance.ce.cn/jjpd/jjpddyp/jjpdyw/", },
{
"channel_name": "金融证券-基金频道-基金经理风采", "url": "http://finance.ce.cn/jjpd/jjdsp/jjfc/", },
{
"channel_name": "金融证券-基金频道-基金申赎异动", "url": "http://finance.ce.cn/jjpd/jjdsp/ssyd/", },
{
"channel_name": "金融证券-基金频道-新基速递", "url": "http://finance.ce.cn/jjpd/jjdsp/tj/", },
{
"channel_name": "金融证券-基金频道-基金创新", "url": "http://finance.ce.cn/jjpd/jjdsp/jjcx/", },
{
"channel_name": "金融证券-基金频道-私募研报", "url": "http://finance.ce.cn/jjpd/jjdsp/zcg/", },
{
"channel_name": "金融证券-基金频道-基金学堂", "url": "http://finance.ce.cn/jjpd/jjdep/xt/", },
],
# 保险频道
[
{
"channel_name": "金融证券-保险频道-业内交流", "url": "http://finance.ce.cn/insurance/jjdt/", },
{
"channel_name": "金融证券-保险频道-记者观察", "url": "http://finance.ce.cn/insurance/zzbx/", },
{
"channel_name": "金融证券-保险频道-保险专题", "url": "http://finance.ce.cn/insurance/bxzt/", },
{
"channel_name": "金融证券-保险频道-政策法规", "url": "http://finance.ce.cn/insurance/zcfg/", },
{
"channel_name": "金融证券-保险频道-保险课堂", "url": "http://finance.ce.cn/insurance/bxlp/", },
{
"channel_name": "金融证券-保险频道-险企新闻", "url": "http://finance.ce.cn/insurance/ylbx/", },
{
"channel_name": "金融证券-保险频道-行业动态", "url": "http://finance.ce.cn/insurance/ccbx/", },
{
"channel_name": "金融证券-保险频道-理赔维权", "url": "http://finance.ce.cn/insurance/jkx/", },
{
"channel_name": "金融证券-保险频道-险种产品", "url": "http://finance.ce.cn/insurance/ywx/", },
{
"channel_name": "金融证券-保险频道-保险数据", "url": "http://finance.ce.cn/insurance/zbx/", },
{
"channel_name": "金融证券-保险频道-2021保险", "url": "http://finance.ce.cn/insurance1/scrollnews/", },
{
"channel_name": "金融证券-保险频道-要闻", "url": "http://finance.ce.cn/insurance/yw/", },
{
"channel_name": "金融证券-保险频道-黑名单", "url": "http://finance.ce.cn/insurance/wzbx/", },
{
"channel_name": "金融证券-保险频道-深度报道", "url": "http://finance.ce.cn/sub/gssj/sdbd/", },
],
# 银行频道
[
{
"channel_name": "金融证券-银行频道-要闻关注", "url": "http://finance.ce.cn/bank/yw/", },
{
"channel_name": "金融证券-银行频道-信贷融资", "url": "http://finance.ce.cn/bank/xdfx/", },
{
"channel_name": "金融证券-银行频道-理财产品", "url": "http://finance.ce.cn/bank/lccp/", },
{
"channel_name": "金融证券-银行频道-上市银行", "url": "http://finance.ce.cn/bank/dzyh/", },
{
"channel_name": "金融证券-银行频道-行业新闻", "url": "http://finance.ce.cn/bank/sryh/", },
{
"channel_name": "金融证券-银行频道-优惠信息", "url": "http://finance.ce.cn/bank/yhk/", },
{
"channel_name": "金融证券-银行频道-滚动新闻", "url": "http://finance.ce.cn/bank12/scroll/", },
{
"channel_name": "金融证券-银行频道-银行专题", "url": "http://finance.ce.cn/bank/yhzt/", },
{
"channel_name": "金融证券-银行频道-政策法规", "url": "http://finance.ce.cn/bank/zcfg/", },
{
"channel_name": "金融证券-银行频道-银行课堂", "url": "http://finance.ce.cn/bank/hqjr/", },
{
"channel_name": "金融证券-银行频道-独家报道", "url": "http://finance.ce.cn/bank/jgdt/", },
{
"channel_name": "金融证券-银行频道-机构专栏", "url": "http://finance.ce.cn/bank/zzyh/", },
{
"channel_name": "金融证券-银行频道-业绩一览", "url": "http://finance.ce.cn/bank/wzyh/", },
],
# 股市频道
[
{
"channel_name": "金融证券-股市频道-债市聚焦", "url": "http://finance.ce.cn/home/cfzq/zq/", },
{
"channel_name": "金融证券-股市频道-股指期货", "url": "http://finance.ce.cn/10cjsy/qt/", },
{
"channel_name": "金融证券-股市频道-海外市场", "url": "http://finance.ce.cn/10cjsy/hw/", },
{
"channel_name": "金融证券-股市频道-并购重组", "url": "http://finance.ce.cn/10cjsy/bg/", },
{
"channel_name": "金融证券-股市频道-大势研判", "url": "http://finance.ce.cn/home/zqzq/dp/", },
{
"channel_name": "金融证券-股市频道-即时解盘", "url": "http://finance.ce.cn/stock/jsjp/", },
{
"channel_name": "金融证券-股市频道-上市全观察", "url": "http://finance.ce.cn/shqgc/", },
{
"channel_name": "金融证券-股市频道-金融证券", "url": "http://finance.ce.cn/", },
{
"channel_name": "金融证券-股市频道-滚动资讯", "url": "http://finance.ce.cn/stock/gsgdbd/", },
],
# 滚动新闻
[
{
"channel_name": "金融证券-外汇滚动新闻", "url": "http://finance.ce.cn/fe/gdxw/", },
{
"channel_name": "金融证券-小微金融滚动", "url": "http://finance.ce.cn/xwjr/gd/", },
{
"channel_name": "金融证券-债券滚动报道", "url": "http://finance.ce.cn/bond/zqgdbd/", },
{
"channel_name": "金融证券-新三板评论", "url": "http://finance.ce.cn/xsb/xsbpl/", },
{
"channel_name": "金融证券-新三板知识", "url": "http://finance.ce.cn/xsb/xsbzs/", },
{
"channel_name": "金融证券-新三板公司动态", "url": "http://finance.ce.cn/xsb/xsbgsdt/", },
{
"channel_name": "金融证券-上市动态", "url": "http://finance.ce.cn/shqgc/sc/", },
{
"channel_name": "金融证券-公司解析", "url": "http://finance.ce.cn/shqgc/pl/", },
{
"channel_name": "金融证券-融资滚动", "url": "http://finance.ce.cn/rz/rzgd/", },
{
"channel_name": "金融证券-股市七日谈", "url": "http://finance.ce.cn/sub/qrt/gs/", },
{
"channel_name": "金融证券-银行七日谈", "url": "http://finance.ce.cn/sub/qrt/yh/", },
{
"channel_name": "金融证券-保险七日谈", "url": "http://finance.ce.cn/sub/qrt/bx/", },
{
"channel_name": "金融证券-上市公司人事更多报道", "url": "http://finance.ce.cn/sub/ssgsrs/gd/", },
{
"channel_name": "金融证券-最新报道", "url": "http://finance.ce.cn/sub/ggttk/zx/", },
{
"channel_name": "金融证券-热点聚焦", "url": "http://finance.ce.cn/2015home/jj/", },
{
"channel_name": "金融证券-焦点财讯", "url": "http://finance.ce.cn/sub/cj2009/", },
],
# 食品
[
{
"channel_name": "产业市场-食品-食品专题", "url": "http://www.ce.cn/cysc/sp/subject/", },
{
"channel_name": "产业市场-食品-曝光台", "url": "http://www.ce.cn/cysc/sp/baoguantai/", },
{
"channel_name": "产业市场-食品-公司观察", "url": "http://www.ce.cn/cysc/sp/ssgs/", },
{
"channel_name": "产业市场-食品-食品行业动态", "url": "http://www.ce.cn/cysc/sp/info/", },
{
"channel_name": "产业市场-食品-中经舆情", "url": "http://www.ce.cn/cysc/sp/zhongjingyuqing/", },
{
"channel_name": "产业市场-食品-食品安全大讲堂", "url": "http://www.ce.cn/cysc/sp/djt/", },
{
"channel_name": "产业市场-食品-食品监管动态", "url": "http://www.ce.cn/cysc/sp/shiyaojianju/", },
{
"channel_name": "产业市场-食品-老年食品与营养", "url": "http://www.ce.cn/cysc/sp/lnsp/", },
{
"channel_name": "产业市场-食品-各地美食", "url": "http://www.ce.cn/cysc/sp/wy/", },
{
"channel_name": "产业市场-食品-科学用药", "url": "http://www.ce.cn/cysc/sp/aqts/", },
{
"channel_name": "产业市场-食品-中经调查", "url": "http://www.ce.cn/cysc/sp/dc/", },
{
"channel_name": "产业市场-食品-酒业", "url": "http://www.ce.cn/cysc/sp/jiu/", },
{
"channel_name": "产业市场-食品-会展报道", "url": "http://www.ce.cn/cysc/sp/xcbd/", },
{
"channel_name": "产业市场-食品-本网专稿", "url": "http://www.ce.cn/cysc/sp/bwzg/", },
{
"channel_name": "产业市场-食品-医疗器械", "url": "http://www.ce.cn/cysc/sp/tjj/", },
{
"channel_name": "产业市场-食品-餐饮", "url": "http://www.ce.cn/cysc/sp/cyaq/", },
{
"channel_name": "产业市场-食品-饮料", "url": "http://www.ce.cn/cysc/sp/cy/", },
{
"channel_name": "产业市场-食品-乳业", "url": "http://www.ce.cn/cysc/sp/ry/", },
{
"channel_name": "产业市场-食品-保健食品", "url": "http://www.ce.cn/cysc/sp/ly/", },
{
"channel_name": "产业市场-食品-药品", "url": "http://www.ce.cn/cysc/sp/bk/", },
],
# 房产
[
{
"channel_name": "产业市场-房产-房产资讯", "url": "http://www.ce.cn/cysc/fdc/fc/", },
{
"channel_name": "产业市场-房产-商业地产", "url": "http://www.ce.cn/cysc/fdc/jn/sy/", },
{
"channel_name": "产业市场-房产-本网专稿", "url": "http://www.ce.cn/cysc/fdc/12/", },
],
# 能源
[
{
"channel_name": "产业市场-能源-滚动新闻", "url": "http://www.ce.cn/cysc/ny/gdxw/", },
{
"channel_name": "产业市场-能源-冶金", "url": "http://www.ce.cn/cysc/newmain/jdpd/yj/", },
{
"channel_name": "产业市场-能源-本网专稿", "url": "http://www.ce.cn/cysc/newmain/right/zg/", },
{
"channel_name": "产业市场-能源-专题列表", "url": "http://www.ce.cn/cysc/newmain/yc/zt/", },
],
# IT
[
{
"channel_name": "产业市场-IT-本网专稿", "url": "http://www.ce.cn/cysc/newmain/right/zg/", },
{
"channel_name": "产业市场-IT-滚动新闻", "url": "http://www.ce.cn/cysc/tech/gd2012/", },
],
# 家电
[
{
"channel_name": "产业市场-家电-网购卖场", "url": "http://www.ce.cn/cysc/zgjd/wgsv/", },
{
"channel_name": "产业市场-家电-政策法规", "url": "http://www.ce.cn/cysc/zgjd/zcfg/", },
{
"channel_name": "产业市场-家电-业绩财报", "url": "http://www.ce.cn/cysc/zgjd/yjcb/", },
{
"channel_name": "产业市场-家电-质量曝光", "url": "http://www.ce.cn/cysc/zgjd/jdsh/", },
{
"channel_name": "产业市场-家电-行业新闻", "url": "http://www.ce.cn/cysc/zgjd/hyfx/", },
{
"channel_name": "产业市场-家电-公司观察", "url": "http://www.ce.cn/cysc/zgjd/qycz/", },
{
"channel_name": "产业市场-家电-业界动态", "url": "http://www.ce.cn/cysc/zgjd/yjxw/", },
{
"channel_name": "产业市场-家电-今日更新", "url": "http://www.ce.cn/cysc/zgjd/kx/", },
],
# 交通
[
{
"channel_name": "产业市场-交通-要闻", "url": "http://www.ce.cn/cysc/jtys/yw/", },
{
"channel_name": "产业市场-交通-铁路", "url": "http://www.ce.cn/cysc/jtys/tielu/", },
{
"channel_name": "产业市场-交通-航空", "url": "http://www.ce.cn/cysc/jtys/hangkong/", },
{
"channel_name": "产业市场-交通-公路", "url": "http://www.ce.cn/cysc/jtys/gonglu/", },
{
"channel_name": "产业市场-交通-海运", "url": "http://www.ce.cn/cysc/jtys/haiyun/", },
{
"channel_name": "产业市场-交通-城市交通", "url": "http://www.ce.cn/cysc/jtys/csjt/", },
{
"channel_name": "产业市场-交通-综合物流", "url": "http://www.ce.cn/cysc/jtys/zhwl/", },
{
"channel_name": "产业市场-交通-交通法规", "url": "http://www.ce.cn/cysc/jtys/fgjd/", },
{
"channel_name": "产业市场-交通-交通运输", "url": "http://www.ce.cn/cysc/jtys/", },
],
# 质量安全
[
{
"channel_name": "产业市场-质量安全-每日更新", "url": "http://www.ce.cn/cysc/zljd/gd/", },
{
"channel_name": "产业市场-质量安全-权威发布", "url": "http://www.ce.cn/cysc/zljd/qwfb/", },
{
"channel_name": "产业市场-质量安全-消费预警", "url": "http://www.ce.cn/cysc/zljd/xfyj/", },
{
"channel_name": "产业市场-质量安全-黑榜", "url": "http://www.ce.cn/cysc/zljd/hb/", },
{
"channel_name": "产业市场-质量安全-红榜", "url": "http://www.ce.cn/cysc/zljd/hong/", },
{
"channel_name": "产业市场-质量安全-电子商务", "url": "http://www.ce.cn/cysc/zljd/dzsw/", },
{
"channel_name": "产业市场-质量安全-召回信息", "url": "http://www.ce.cn/cysc/zljd/zhxx/", },
{
"channel_name": "产业市场-质量安全-各地市场信息", "url": "http://www.ce.cn/cysc/zljd/zlxx/", },
{
"channel_name": "产业市场-质量安全-本网原创", "url": "http://www.ce.cn/cysc/zljd/yc/", },
{
"channel_name": "产业市场-质量安全-关注度", "url": "http://www.ce.cn/cysc/zljd/gzd/", },
{
"channel_name": "产业市场-质量安全-质量观察", "url": "http://www.ce.cn/cysc/zljd/yqhz/", },
{
"channel_name": "产业市场-质量安全-服务质量", "url": "http://www.ce.cn/cysc/zljd/fwzl/", },
{
"channel_name": "产业市场-质量安全-消协资讯", "url": "http://www.ce.cn/cysc/zljd/xxzx/", },
{
"channel_name": "产业市场-质量安全-标准纵览", "url": "http://www.ce.cn/cysc/zljd/bz/", },
{
"channel_name": "产业市场-质量安全-政策法规", "url": "http://www.ce.cn/cysc/zljd/zcfg/", },
{
"channel_name": "产业市场-质量安全-质量知识大讲堂", "url": "http://www.ce.cn/cysc/zljd/djt/", },
{
"channel_name": "产业市场-质量安全-滚动", "url": "http://www.ce.cn/cysc/zljd/gd/", },
],
# 质量经济
[
{
"channel_name": "产业市场-质量经济-曝光台", "url": "http://12365.ce.cn/zlpd/bgtd/", },
{
"channel_name": "产业市场-质量经济-质量专题", "url": "http://12365.ce.cn/zlpd/rdzt/", },
{
"channel_name": "产业市场-质量经济-质量舆论", "url": "http://12365.ce.cn/zlpd/zlsp/", },
{
"channel_name": "产业市场-质量经济-质量管理", "url": "http://12365.ce.cn/zlpd/jdgl/", },
{
"channel_name": "产业市场-质量经济-品牌建设", "url": "http://12365.ce.cn/zlpd/bytx/", },
{
"channel_name": "产业市场-质量经济-地方质检", "url": "http://12365.ce.cn/zlpd/dfzj/", },
{
"channel_name": "产业市场-质量经济-权威发布", "url": "http://12365.ce.cn/zlpd/qwfb/", },
{
"channel_name": "产业市场-质量经济-质量资讯", "url": "http://12365.ce.cn/zlpd/jsxx/", },
{
"channel_name": "产业市场-质量经济-质量提升", "url": "http://12365.ce.cn/zlpd/yw/yw/", },
{
"channel_name": "产业市场-质量经济-高度关注", "url": "http://12365.ce.cn/zlpd/ldr/", },
{
"channel_name": "产业市场-质量经济-质量技术基础", "url": "http://12365.ce.cn/zlpd/rzbz/", },
{
"channel_name": "产业市场-质量经济-诚信责任", "url": "http://12365.ce.cn/zlpd/ppjs/", },
],
# 医药频道
[
{
"channel_name": "产业市场-医药频道-大咖谈", "url": "http://www.ce.cn/cysc/yy/qyjzf/", },
{
"channel_name": "产业市场-医药频道-行业动态", "url": "http://www.ce.cn/cysc/yy/hydt/", },
{
"channel_name": "产业市场-医药频道-权威发布", "url": "http://www.ce.cn/cysc/yy/qwfb/", },
{
"channel_name": "产业市场-医药频道-医药科普", "url": "http://www.ce.cn/cysc/yy/yydjt/", },
{
"channel_name": "产业市场-医药频道-资本市场", "url": "http://www.ce.cn/cysc/yy/ssgs/", },
{
"channel_name": "产业市场-医药频道-监督报道", "url": "http://www.ce.cn/cysc/yy/yyhhb/", },
{
"channel_name": "产业市场-医药频道-公司新闻", "url": "http://www.ce.cn/cysc/yy/gdpl/", },
{
"channel_name": "产业市场-医药频道-中医药", "url": "http://www.ce.cn/cysc/yy/zy/", },
{
"channel_name": "产业市场-医药频道-药店", "url": "http://www.ce.cn/cysc/yy/yd/", },
{
"channel_name": "产业市场-医药频道-临床研究", "url": "http://www.ce.cn/cysc/yy/hzp/", },
{
"channel_name": "产业市场-医药频道-医美·化妆品", "url": "http://www.ce.cn/cysc/yy/lcyj/", },
{
"channel_name": "产业市场-医药频道-医疗器械", "url": "http://www.ce.cn/cysc/yy/ylqx/", },
{
"channel_name": "产业市场-医药频道-医疗新闻", "url": "http://www.ce.cn/cysc/yy/ylxw/", },
{
"channel_name": "产业市场-医药频道-创新药", "url": "http://www.ce.cn/cysc/yy/hwkx/", },
],
# 生态文明
[
{
"channel_name": "产业市场-生态文明-滚动新闻", "url": "http://www.ce.cn/cysc/stwm/gd/", },
{
"channel_name": "产业市场-生态文明-美丽中国", "url": "http://www.ce.cn/cysc/stwm/mlzg/", },
{
"channel_name": "产业市场-生态文明-环境监管", "url": "http://www.ce.cn/cysc/stwm/qygc/", },
{
"channel_name": "产业市场-生态文明-绿色发展", "url": "http://www.ce.cn/cysc/stwm/lsjj/", },
{
"channel_name": "产业市场-生态文明-污染防治", "url": "http://www.ce.cn/cysc/stwm/wrfz/", },
{
"channel_name": "产业市场-生态文明-生态保护", "url": "http://www.ce.cn/cysc/stwm/zxdt/", },
{
"channel_name": "产业市场-生态文明-政策解读", "url": "http://www.ce.cn/cysc/stwm/zc/", },
{
"channel_name": "产业市场-生态文明-本网专稿", "url": "http://www.ce.cn/cysc/stwm/zg/", },
],
# 旅游频道
[
{
"channel_name": "产业市场-旅游频道-滚动", "url": "http://travel.ce.cn/gdtj/", },
{
"channel_name": "产业市场-旅游频道-文化旅游", "url": "http://travel.ce.cn/xsy/gl/", },
{
"channel_name": "产业市场-旅游频道-舆情投诉", "url": "http://travel.ce.cn/xsy/yq/", },
{
"channel_name": "产业市场-旅游频道-酒店航空", "url": "http://travel.ce.cn/xsy/jd/", },
{
"channel_name": "产业市场-旅游频道-在线旅游", "url": "http://travel.ce.cn/xsy/zx/", },
{
"channel_name": "产业市场-旅游频道-旅游经济信息联播", "url": "http://travel.ce.cn/xsy/sp/", },
{
"channel_name": "产业市场-旅游频道-权威发布", "url": "http://travel.ce.cn/xsy/fb/", },
{
"channel_name": "产业市场-旅游频道-产业经济", "url": "http://travel.ce.cn/xsy/cy/", },
],
# 文化产业
[
{
"channel_name": "产业市场-文化产业-中经文化产业", "url": "http://www.ce.cn/culture/whcyk/zjwhcy/", },
{
"channel_name": "产业市场-文化产业-独家专稿", "url": "http://www.ce.cn/culture/whcyk/zg/", },
{
"channel_name": "产业市场-文化产业-专题", "url": "http://www.ce.cn/culture/whcyk/zt/", },
{
"channel_name": "产业市场-文化产业-文化名人访", "url": "http://www.ce.cn/culture/whmrf/", },
{
"channel_name": "产业市场-文化产业-文化达人", "url": "http://www.ce.cn/culture/dr/", },
{
"channel_name": "产业市场-文化产业-文化月报", "url": "http://www.ce.cn/culture/yb/", },
{
"channel_name": "产业市场-文化产业-文化舆情", "url": "http://www.ce.cn/culture/whcyk/jrht/", },
{
"channel_name": "产业市场-文化产业-文化要闻", "url": "http://www.ce.cn/culture/whcyk/yaowen/", },
{
"channel_name": "产业市场-文化产业-滚动", "url": "http://www.ce.cn/culture/gd/", },
],
# 书画
[
{
"channel_name": "产业市场-书画-文化名人访", "url": "http://www.ce.cn/culture/whmrf/", },
{
"channel_name": "产业市场-书画-文化产业频道", "url": "http://www.ce.cn/culture/", },
{
"channel_name": "产业市场-书画-书画高清图", "url": "http://shuhua.ce.cn/dtbf/", },
{
"channel_name": "产业市场-书画-名人库", "url": "http://shuhua.ce.cn/ren/", },
{
"channel_name": "产业市场-书画-展览", "url": "http://shuhua.ce.cn/sy2015/zhan/", },
{
"channel_name": "产业市场-书画-艺术市场", "url": "http://shuhua.ce.cn/sy2015/pmxw/", },
{
"channel_name": "产业市场-书画-要闻", "url": "http://shuhua.ce.cn/sy2015/yw/", },
{
"channel_name": "产业市场-书画-书画快报", "url": "http://shuhua.ce.cn/xinxi/", },
],
# 时政社会
[
{
"channel_name": "时政社会-人事动态", "url": "http://district.ce.cn/newarea/sddy/", },
{
"channel_name": "时政社会-宏观经济", "url": "http://www.ce.cn/macro/more/", },
{
"channel_name": "时政社会-时政", "url": "http://www.ce.cn/xwzx/gnsz/gdxw/", },
{
"channel_name": "时政社会-要闻", "url": "http://www.ce.cn/xwzx/gnsz/szyw/", },
{
"channel_name": "时政社会-社会", "url": "http://www.ce.cn/xwzx/shgj/", },
{
"channel_name": "时政社会-法制", "url": "http://www.ce.cn/xwzx/fazhi/", },
{
"channel_name": "时政社会-地方党政人物库", "url": "http://district.ce.cn/zt/rwk/", },
{
"channel_name": "时政社会-专题", "url": "http://www.ce.cn/zt/sz/", },
{
"channel_name": "时政社会-专稿", "url": "http://www.ce.cn/xwzx/xinwen/bwzg/", },
{
"channel_name": "时政社会-即时要闻", "url": "http://www.ce.cn/xwzx/xinwen/jsyw/", },
{
"channel_name": "时政社会-社会广角", "url": "http://www.ce.cn/xwzx/shgj/gdxw/", },
{
"channel_name": "时政社会-科教", "url": "http://www.ce.cn/xwzx/kj/", },
{
"channel_name": "时政社会-科普知识", "url": "http://www.ce.cn/xwzx/xinwen/kjjy/kpzs/", },
{
"channel_name": "时政社会-教育资讯", "url": "http://www.ce.cn/xwzx/xinwen/kjjy/jyzx/", },
{
"channel_name": "时政社会-图片中心", "url": "http://www.ce.cn/xwzx/photo/", },
],
# 中经视频
[
{
"channel_name": "中经视频-最新", "url": "http://cen.ce.cn/more/", },
{
"channel_name": "中经视频-中韩专线直击", "url": "http://cen.ce.cn/cevideo/cen/zj/", },
{
"channel_name": "中经视频-巴中特快", "url": "http://cen.ce.cn/cevideo/cen/ct/", },
{
"channel_name": "中经视频-一带一路·面对面", "url": "http://cen.ce.cn/cevideo/cen/ff/", },
{
"channel_name": "中经视频-中巴经贸热线", "url": "http://cen.ce.cn/cevideo/cen/rx/", },
{
"channel_name": "中经视频-短视频", "url": "http://cen.ce.cn/cevideo/sv/", },
{
"channel_name": "中经视频-每周中国经济", "url": "http://cen.ce.cn/cevideo/cen/mz/", },
{
"channel_name": "中经视频-巴基斯坦人在中国", "url": "http://cen.ce.cn/cevideo/cen/wic/", },
{
"channel_name": "中经视频-直播", "url": "http://cen.ce.cn/cevideo/zb/h/", },
{
"channel_name": "中经视频-中巴经贸企业名录", "url": "http://cen.ce.cn/cevideo/cen/qy/", },
{
"channel_name": "中经视频-专题·活动", "url": "http://cen.ce.cn/cevideo/sc/", },
{
"channel_name": "中经视频-关于中经网韩国(株)", "url": "http://cen.ce.cn/cevideo/cek/", },
{
"channel_name": "中经视频-关于中经视频", "url": "http://cen.ce.cn/cevideo/cevideo/", },
],
# 评论理论
[
{
"channel_name": "评论理论-专题", "url": "http://views.ce.cn/main/zt/", },
{
"channel_name": "评论理论-经济大讲堂", "url": "http://views.ce.cn/view/society/", },
{
"channel_name": "评论理论-观察家", "url": "http://views.ce.cn/view/obs/", },
{
"channel_name": "评论理论-经济眼", "url": "http://views.ce.cn/view/economy/", },
{
"channel_name": "评论理论-经济学人", "url": "http://views.ce.cn/fun/who/", },
{
"channel_name": "评论理论-声音", "url": "http://views.ce.cn/main/net/", },
{
"channel_name": "评论理论-理论前沿", "url": "http://views.ce.cn/main/qy/", },
{
"channel_name": "评论理论-经点热评", "url": "http://views.ce.cn/main/jdrp/", },
{
"channel_name": "评论理论-网言众议", "url": "http://views.ce.cn/main/disc/", },
{
"channel_name": "评论理论-中经天天评", "url": "http://views.ce.cn/main/yc/", },
{
"channel_name": "评论理论-理论动态", "url": "http://views.ce.cn/main/lldt/", },
{
"channel_name": "评论理论-今日看点", "url": "http://views.ce.cn/main/kd/", },
{
"channel_name": "评论理论-理论百科", "url": "http://views.ce.cn/fun/llbk/", },
],
# 脱贫攻坚
[
{
"channel_name": "脱贫攻坚-攻坚先锋", "url": "http://tuopin.ce.cn/rw/", },
{
"channel_name": "脱贫攻坚-书记县长纵横谈", "url": "http://tuopin.ce.cn/sjxz/", },
{
"channel_name": "脱贫攻坚-政策指南", "url": "http://tuopin.ce.cn/zczn/", },
{
"channel_name": "脱贫攻坚-产业兴县", "url": "http://tuopin.ce.cn/cyxx/", },
{
"channel_name": "脱贫攻坚-独家视角", "url": "http://tuopin.ce.cn/exclusive/", },
{
"channel_name": "脱贫攻坚-热点话题", "url": "http://tuopin.ce.cn/rdht/", },
{
"channel_name": "脱贫攻坚-省部动态", "url": "http://tuopin.ce.cn/sbdt/", },
{
"channel_name": "脱贫攻坚-今日要闻", "url": "http://tuopin.ce.cn/yw/", },
{
"channel_name": "脱贫攻坚-专稿", "url": "http://tuopin.ce.cn/zg/", },
{
"channel_name": "脱贫攻坚-滚动资讯", "url": "http://tuopin.ce.cn/news/", },
{
"channel_name": "脱贫攻坚-美丽乡村", "url": "http://tuopin.ce.cn/mlxc/", },
{
"channel_name": "脱贫攻坚-实用信息", "url": "http://tuopin.ce.cn/syxx/", },
{
"channel_name": "脱贫攻坚-谈贫论富", "url": "http://tuopin.ce.cn/pfl/", },
{
"channel_name": "脱贫攻坚-国际扶贫", "url": "http://tuopin.ce.cn/gjfp/", },
{
"channel_name": "脱贫攻坚-驻村帮扶", "url": "http://tuopin.ce.cn/zcbf/", },
{
"channel_name": "脱贫攻坚-培训讲坛", "url": "http://tuopin.ce.cn/pxjt/", },
{
"channel_name": "脱贫攻坚-社会扶贫", "url": "http://tuopin.ce.cn/sh/", },
],
# 汽车
[
{
"channel_name": "汽车频道-滚动", "url": "http://auto.ce.cn/auto/gundong/", },
{
"channel_name": "汽车频道-社会责任", "url": "http://auto.ce.cn/car/csr/", },
{
"channel_name": "汽车频道-后市场", "url": "http://auto.ce.cn/car/hsc/", },
{
"channel_name": "汽车频道-观察家", "url": "http://auto.ce.cn/car/gcj/", },
{
"channel_name": "汽车频道-领袖说", "url": "http://auto.ce.cn/car/lx/", },
{
"channel_name": "汽车频道-新车", "url": "http://auto.ce.cn/car/xc/", },
{
"channel_name": "汽车频道-产经", "url": "http://auto.ce.cn/car/cj/", },
{
"channel_name": "汽车频道-资讯", "url": "http://auto.ce.cn/car/zx/", },
{
"channel_name": "汽车频道-试驾", "url": "http://auto.ce.cn/auto/shijia/", },
{
"channel_name": "汽车频道-特别报道", "url": "http://auto.ce.cn/car/tbbd/", },
{
"channel_name": "汽车频道-特别策划", "url": "http://auto.ce.cn/car/ch/", },
{
"channel_name": "汽车频道-原创观点", "url": "http://auto.ce.cn/car/yc/", },
],
# 会展
[
{
"channel_name": "会展中国-滚动", "url": "http://expo.ce.cn/gd/", },
{
"channel_name": "会展中国-直播", "url": "http://expo.ce.cn/shy/zb/", },
{
"channel_name": "会展中国-专题", "url": "http://expo.ce.cn/shy/zt/", },
{
"channel_name": "会展中国-论道", "url": "http://expo.ce.cn/shy/ld/", },
{
"channel_name": "会展中国-政策", "url": "http://expo.ce.cn/shy/zc/", },
{
"channel_name": "会展中国-会展名人堂", "url": "http://expo.ce.cn/shy/mrt/", },
{
"channel_name": "会展中国-艺术博览", "url": "http://expo.ce.cn/shy/ys/01/", },
{
"channel_name": "会展中国-节庆活动", "url": "http://expo.ce.cn/shy/jq/01/", },
{
"channel_name": "会展中国-会奖商旅", "url": "http://expo.ce.cn/shy/MICE/01/", },
{
"channel_name": "会展中国-名企", "url": "http://expo.ce.cn/shy/mq/", },
{
"channel_name": "会展中国-产业会展", "url": "http://expo.ce.cn/shy/cy/", },
],
# 城市频道
[
{
"channel_name": "城市频道-城市建设", "url": "http://city.ce.cn/main/build/", },
{
"channel_name": "城市频道-城市探索者", "url": "http://city.ce.cn/main/exclusive/", },
{
"channel_name": "城市频道-生态城市", "url": "http://city.ce.cn/main/ecological/", },
{
"channel_name": "城市频道-城市经济", "url": "http://city.ce.cn/main/economy/", },
{
"channel_name": "城市频道-环球观察", "url": "http://city.ce.cn/main/observation/", },
{
"channel_name": "城市频道-省市动态", "url": "http://city.ce.cn/main/news/", },
{
"channel_name": "城市频道-城市周刊", "url": "http://city.ce.cn/main/cityweek/", },
],
# 公益频道
[
{
"channel_name": "公益频道-社会责任", "url": "http://gongyi.ce.cn/gy/zr/", },
{
"channel_name": "公益频道-公益行动", "url": "http://gongyi.ce.cn/gy/gyxd/", },
{
"channel_name": "公益频道-公益新闻", "url": "http://gongyi.ce.cn/news/", },
],
# 生活频道
[
{
"channel_name": "生活频道", "url": "http://fashion.ce.cn/", },
],
# 健康频道
[
{
"channel_name": "健康频道-专稿", "url": "http://health.ce.cn/zg/", },
{
"channel_name": "健康频道-资讯", "url": "http://health.ce.cn/news/", },
{
"channel_name": "健康频道-养老咨询", "url": "http://health.ce.cn/sy2015/ylzx/", },
{
"channel_name": "健康频道-权威发布", "url": "http://health.ce.cn/sy2015/qwfb/", },
{
"channel_name": "健康频道-家庭护理", "url": "http://health.ce.cn/sy2015/jthl/", },
{
"channel_name": "健康频道-高端访谈", "url": "http://health.ce.cn/sy2015/gdft/", },
{
"channel_name": "健康频道-休闲健身", "url": "http://health.ce.cn/sy2015/xxjs/", },
{
"channel_name": "健康频道-健康产业", "url": "http://health.ce.cn/sy2015/jkcy/", },
{
"channel_name": "健康频道-图片", "url": "http://health.ce.cn/sy2015/tp/", },
{
"channel_name": "健康频道-医药科技", "url": "http://health.ce.cn/sy2015/yykj/", },
{
"channel_name": "健康频道-健康名人堂", "url": "http://health.ce.cn/sy2015/qwzj/", },
{
"channel_name": "健康频道-养生保健", "url": "http://health.ce.cn/sy2015/ysbj/", },
{
"channel_name": "健康频道-曝光台", "url": "http://health.ce.cn/sy2015/hyxw/", },
{
"channel_name": "健康频道-健康资讯", "url": "http://health.ce.cn/sy2015/jkzx/", },
],
# 科技频道
[
{
"channel_name": "科技频道-产经资讯", "url": "http://tech.ce.cn/cjzx/", },
{
"channel_name": "科技频道-在线教育", "url": "http://tech.ce.cn/tech2018/zxjy/", },
{
"channel_name": "科技频道-网络安全", "url": "http://tech.ce.cn/tech2018/safe/", },
{
"channel_name": "科技频道-创新科技", "url": "http://tech.ce.cn/tech2018/newtech/", },
{
"channel_name": "科技频道-科技生活", "url": "http://tech.ce.cn/tech2018/life/", },
{
"channel_name": "科技频道-人工智能", "url": "http://tech.ce.cn/tech2018/rgzn/", },
{
"channel_name": "科技频道-科学新知", "url": "http://tech.ce.cn/tech2018/kx/", },
{
"channel_name": "科技频道-科技名企", "url": "http://tech.ce.cn/tech2018/kjmq/", },
{
"channel_name": "科技频道-科技新闻", "url": "http://tech.ce.cn/news/", },
],
# 旅游经济
[
{
"channel_name": "旅游经济-滚动", "url": "http://travel.ce.cn/gdtj/", },
{
"channel_name": "旅游经济-海南", "url": "http://travel.ce.cn/ztk/hainan/", },
{
"channel_name": "旅游经济-文化旅游", "url": "http://travel.ce.cn/xsy/gl/", },
{
"channel_name": "旅游经济-舆情投诉", "url": "http://travel.ce.cn/xsy/yq/", },
{
"channel_name": "旅游经济-酒店航空", "url": "http://travel.ce.cn/xsy/jd/", },
{
"channel_name": "旅游经济-在线旅游", "url": "http://travel.ce.cn/xsy/zx/", },
{
"channel_name": "旅游经济-旅游经济信息联播", "url": "http://travel.ce.cn/xsy/sp/", },
{
"channel_name": "旅游经济-权威发布", "url": "http://travel.ce.cn/xsy/fb/", },
{
"channel_name": "旅游经济-产业经济", "url": "http://travel.ce.cn/xsy/cy/", },
],
# 中国商用汽车网
[
{
"channel_name": "中国商用汽车网-滚动新闻", "url": "http://cv.ce.cn/news/", },
{
"channel_name": "中国商用汽车网-交通新闻", "url": "http://cv.ce.cn/2020/jtxw/", },
{
"channel_name": "中国商用汽车网-专题推荐", "url": "http://cv.ce.cn/2020/zttj/", },
{
"channel_name": "中国商用汽车网-试驾报告", "url": "http://cv.ce.cn/2020/xcsj/", },
{
"channel_name": "中国商用汽车网-新车上市", "url": "http://cv.ce.cn/2020/xcsj/", },
{
"channel_name": "中国商用汽车网-企业动态", "url": "http://cv.ce.cn/2020/qydt/", },
{
"channel_name": "中国商用汽车网-行业资讯", "url": "http://cv.ce.cn/2020/hyzx/", },
{
"channel_name": "中国商用汽车网-本网专稿", "url": "http://cv.ce.cn/2020/bwzg/", },
],
# 家电频道
[
{
"channel_name": "家电频道-网购卖场", "url": "http://www.ce.cn/cysc/zgjd/wgsv/", },
{
"channel_name": "家电频道-政策法规", "url": "http://www.ce.cn/cysc/zgjd/zcfg/", },
{
"channel_name": "家电频道-业绩财报", "url": "http://www.ce.cn/cysc/zgjd/yjcb/", },
{
"channel_name": "家电频道-质量曝光", "url": "http://www.ce.cn/cysc/zgjd/jdsh/", },
{
"channel_name": "家电频道-行业新闻", "url": "http://www.ce.cn/cysc/zgjd/hyfx/", },
{
"channel_name": "家电频道-公司观察", "url": "http://www.ce.cn/cysc/zgjd/qycz/", },
{
"channel_name": "家电频道-业界动态", "url": "http://www.ce.cn/cysc/zgjd/yjxw/", },
{
"channel_name": "家电频道-今日更新", "url": "http://www.ce.cn/cysc/zgjd/kx/", },
],
]
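With a menu this long, the same URL can easily end up listed more than once, sometimes under a different channel name. Here is a minimal sketch, assuming the nested list-of-lists shape shown above, that reports every URL appearing more than once. The function name find_duplicate_urls and the two-group sample menu are illustrative only and are not part of the template.

```python
from collections import defaultdict


def find_duplicate_urls(start_menu):
    """Return {url: [channel_name, ...]} for every URL listed more than once."""
    seen = defaultdict(list)
    for group in start_menu:
        for entry in group:
            seen[entry["url"]].append(entry["channel_name"])
    return {url: names for url, names in seen.items() if len(names) > 1}


# Hypothetical two-group sample in the same shape as start_menu.
sample_menu = [
    [
        {"channel_name": "A-rolling", "url": "http://example.com/a/"},
        {"channel_name": "A-focus", "url": "http://example.com/b/"},
    ],
    [
        {"channel_name": "B-rolling", "url": "http://example.com/a/"},
    ],
]

duplicates = find_duplicate_urls(sample_menu)
print(duplicates)
```

Running it against the real start_menu before a crawl shows which channels point at the same list page, so you can decide whether the repeat is intentional.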
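Write one parseX callback for each list style found on the site, then register the callbacks in parse_list, one entry per start_menu group and in the same order: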
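parse_list = [
    self.parse1,  # 金融证券 (Finance & Securities)
    self.parse1,  # 期货频道 (Futures)
    self.parse1,  # 新三板 (New Third Board)
    self.parse1,  # 理财 (Wealth Management)
    self.parse1,  # 基金频道 (Funds)
    self.parse1,  # 保险频道 (Insurance)
    self.parse1,  # 银行频道 (Banking)
    self.parse1,  # 股市频道 (Stocks)
    self.parse1,  # 滚动新闻 (Rolling News)
    self.parse2,  # 食品 (Food)
    self.parse3,  # 房产 (Real Estate)
    self.parse2,  # 能源 (Energy)
    self.parse2,  # IT
    self.parse2,  # 家电 (Home Appliances)
    self.parse2,  # 交通 (Transport)
    self.parse2,  # 质量安全 (Quality & Safety)
    self.parse4,  # 质量经济 (Quality Economy)
    self.parse5,  # 医药频道 (Pharma)
    self.parse2,  # 生态文明 (Ecological Civilization)
    self.parse6,  # 旅游频道 (Travel)
    self.parse0,  # 文化产业 (Cultural Industry)
    self.parse0,  # 书画 (Calligraphy & Painting)
    self.parse0,  # 时政社会 (Politics & Society)
    self.parse0,  # 中经视频 (CE Video)
    self.parse0,  # 评论理论 (Commentary & Theory)
    self.parse6,  # 脱贫攻坚 (Poverty Alleviation)
    self.parse7,  # 汽车 (Autos)
    self.parse0,  # 会展 (Exhibitions)
    self.parse0,  # 城市频道 (Cities)
    self.parse6,  # 公益频道 (Public Welfare)
    self.parse0,  # 中国经济网-生活频道 (Lifestyle)
    self.parse6,  # 健康频道 (Health)
    self.parse6,  # 科技频道 (Technology)
    self.parse6,  # 旅游经济 (Tourism Economy)
    self.parse2,  # 中国商用汽车网 (China Commercial Vehicle Net)
    self.parse2,  # 家电频道 (Home Appliances channel)
]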
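The comments suggest that parse_list is positional: entry i handles group i of start_menu. Here is a sketch of how that pairing could be expanded into crawl jobs, assuming that reading is right. The name build_jobs and the string callbacks are stand-ins; a real spider would yield scrapy.Request(url, callback=...) objects rather than tuples.

```python
def build_jobs(start_menu, parse_list):
    """Pair each start_menu group with the callback at the same index."""
    if len(start_menu) != len(parse_list):
        raise ValueError(
            f"{len(start_menu)} menu groups but {len(parse_list)} callbacks"
        )
    jobs = []
    for group, callback in zip(start_menu, parse_list):
        for entry in group:
            jobs.append((entry["url"], callback, entry["channel_name"]))
    return jobs


# Hypothetical miniature menu; strings stand in for the self.parseX methods.
menu = [
    [{"channel_name": "finance-rolling", "url": "http://example.com/f/"}],
    [
        {"channel_name": "food-news", "url": "http://example.com/s1/"},
        {"channel_name": "food-topics", "url": "http://example.com/s2/"},
    ],
]
callbacks = ["parse1", "parse2"]

jobs = build_jobs(menu, callbacks)
for url, callback, channel in jobs:
    print(callback, channel, url)
```

The length check is the useful part: with 36 groups, adding a group to start_menu without adding its callback would otherwise shift every later channel onto the wrong parser.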
# Style 0 (generic, used by parse0): let extract_list locate the list automatically
data = extract_list(response.text)
for entry in data:
    item['title'] = entry["title"].strip()  # article title
    item['url'] = parse.urljoin(response.url, entry["url"])  # absolute URL of the article
# Style 1
Item_title = response.xpath('//tr/td[@class="font14"]/a/text()').extract()  # list of article titles
Item_url = response.xpath('//tr/td[@class="font14"]/a/@href').extract()  # list of article links
# Style 2
Item_title = response.xpath('//div[@class="left"]/ul/li/a/text()').extract()  # list of article titles
Item_url = response.xpath('//div[@class="left"]/ul/li/a/@href').extract()  # list of article links
# Style 3
Item_title = response.xpath('//div[@class="sec_left"]/ul/li/span/a/text()').extract()  # list of article titles
Item_url = response.xpath('//div[@class="sec_left"]/ul/li/span/a/@href').extract()  # list of article links
# Style 4
Item_title = response.xpath('//div[@class="listf"]/ul/li/span/a/text()').extract()  # list of article titles
Item_url = response.xpath('//div[@class="listf"]/ul/li/span/a/@href').extract()  # list of article links
# Style 5
Item_title = response.xpath('//tr/td[@style="font-size:14px"]/a/text()').extract()  # list of article titles
Item_url = response.xpath('//tr/td[@style="font-size:14px"]/a/@href').extract()  # list of article links
# Style 6
Item_title = response.xpath('//div[@class="list"]/ul/li/a/text()').extract()  # list of article titles
Item_url = response.xpath('//div[@class="list"]/ul/li/a/@href').extract()  # list of article links
# Style 7
Item_title = response.xpath('//div[@class="piclist plno clearfix"]/h2/a/text()').extract()  # list of article titles
Item_url = response.xpath('//div[@class="piclist plno clearfix"]/h2/a/@href').extract()  # list of article links
Item_thumbImg = response.xpath('//div[@class="piclist plno clearfix"]/a/img/@src').extract()  # list of cover image URLs
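Each style yields parallel lists of titles and links. Here is a sketch of turning such lists into items, assuming relative links should be resolved against the list page URL as the generic style does. The name pair_items and the sample values are illustrative only.

```python
from urllib.parse import urljoin


def pair_items(page_url, titles, hrefs):
    """Zip parallel title/href lists into dicts with absolute URLs."""
    items = []
    for title, href in zip(titles, hrefs):
        title = title.strip()
        if not title:
            continue  # skip anchors whose text is only whitespace
        items.append({"title": title, "url": urljoin(page_url, href)})
    return items


# Hypothetical values shaped like the output of the .extract() calls.
page_url = "http://example.com/rolling/"
titles = [" First headline ", "Second headline"]
hrefs = ["./202101/t1.shtml", "http://example.com/other/t2.shtml"]

items = pair_items(page_url, titles, hrefs)
for item in items:
    print(item)
```

One caveat with parallel lists: zip stops at the shorter list, and an anchor with no text node shifts every later title onto the wrong link. Selecting the anchor nodes first and reading the text and href from each node avoids that.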
1. Scrape the detail page content
# Keep the detail page's markup; as a last resort, capture the whole page
item['content'] = ""
if 'class="content"' in response.text and len(None2Str(item['content'])) < 5:
    item['content'] = response.xpath('//div[@class="content"]').extract_first()
if 'tbody' in response.text and len(None2Str(item['content'])) < 5:
    item['content'] = response.xpath('//tbody').extract_first()
if 'body' in response.text and len(None2Str(item['content'])) < 5:
    item['content'] = response.xpath('//body').extract_first()
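The three checks form a fallback chain: each selector is tried only while the content is still shorter than five characters. Here is the same idea as a standalone sketch. The function none_to_str stands in for the template's None2Str helper, whose implementation is not shown in this post, so its behavior here is an assumption; first_usable is a hypothetical name.

```python
def none_to_str(value):
    """Assumed behavior of None2Str: map None to an empty string."""
    return "" if value is None else str(value)


def first_usable(candidates, min_len=5):
    """Return the first candidate of at least min_len characters, else ''."""
    for candidate in candidates:
        text = none_to_str(candidate)
        if len(text) >= min_len:
            return text
    return ""


# Candidates in priority order, as extract_first() might return them:
# div.content missing (None), tbody empty, whole body present.
content = first_usable([None, "", "<body><p>article text</p></body>"])
print(content)
```

Unlike the spider code, this version receives candidates that are already computed, so it trades the lazy XPath evaluation for a shorter, easier-to-test function.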
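2. Special notes
Some sites' developers are reckless enough to use nine different layouts across ten pages. Since we cannot open every detail page to inspect its CSS, there is a generic workaround: let the whole-page fallback above capture those pages, then query the database for records whose content contains body to find the pages that still need a dedicated rule.
db.your_collection_name.find({content: /body/})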
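The shell query above matches every record whose content contains the string body. Here is a sketch of building the same filter from Python, assuming the items are stored in MongoDB and read back through pymongo. The database and collection names are placeholders, so the pymongo lines are left as comments, and fallback_filter is a hypothetical helper.

```python
def fallback_filter(marker="body"):
    """Build a MongoDB filter for records whose content matches marker.

    marker is passed to $regex unchanged, so regex metacharacters in it
    keep their special meaning.
    """
    return {"content": {"$regex": marker}}


query = fallback_filter()
print(query)

# With pymongo (assumed installed; the names below are placeholders):
# from pymongo import MongoClient
# collection = MongoClient()["your_db"]["your_collection_name"]
# for doc in collection.find(query, {"url": 1}):
#     print(doc["url"])
```

Note that body also matches tbody and any article that merely mentions the word. Passing "<body" as the marker narrows the result to records that captured the whole page.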