Web Scraper Crawler, Part 1

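I. Setting up Web Scraper
Install Web Scraper from the Chrome Web Store. The installation itself is not covered here.
Once it is installed, press F12 on a browser page to open the developer tools, then click the Web Scraper tab to start working.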

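II. Basic scraping operations
1. Scraping the same content across multiple similar pages
Put a numeric range in square brackets in the start URL, such as [x-y], and Web Scraper visits each page in that range in turn. (This is a range pattern rather than a true regular expression.)
2. Showing table data one row per record
Set "multiple" to true on the selector for the first column, and set "multiple" to false on the selectors for the other columns.
3. Scraping elements from sub-pages
Create a link selector, and use it as the parent node of the selectors for the sub-page elements.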
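
To make the [x-y] range in item 1 concrete, here is a minimal Python sketch of what such a range stands for. It is only an illustration, not Web Scraper's own implementation: the function name expand_range_url and the example.com URL are made up for this example, and it handles only the plain [x-y] form.

```python
import re

# Matches a plain numeric range such as [1-9] (illustrative helper, not part of Web Scraper).
RANGE_PATTERN = re.compile(r"\[(\d+)-(\d+)\]")


def expand_range_url(url):
    """Return the list of URLs that a single [x-y] range in `url` stands for."""
    match = RANGE_PATTERN.search(url)
    if match is None:
        # No range present: the URL stands for itself.
        return [url]
    start, end = int(match.group(1)), int(match.group(2))
    prefix, suffix = url[:match.start()], url[match.end():]
    # The range is inclusive on both ends.
    return [f"{prefix}{n}{suffix}" for n in range(start, end + 1)]


# Hypothetical URL used only for demonstration.
for page_url in expand_range_url("http://example.com/list?page=[1-3]"):
    print(page_url)
```

Each URL printed here corresponds to one page that the scraper would open as a start page.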

A simple example:
{"selectors":[
{"parentSelectors":["_root"],"type":"SelectorLink","multiple":true,"id":"link","selector":"td.table-com-name a","delay":""},
{"parentSelectors":["_root"],"type":"SelectorText","multiple":false,"id":"name","selector":"td.table-com-name a","regex":"","delay":""},
{"parentSelectors":["_root"],"type":"SelectorText","multiple":false,"id":"date","selector":"td.table-time","regex":"","delay":""},
{"parentSelectors":["_root"],"type":"SelectorText","multiple":false,"id":"jieduan","selector":"td.table-stage a","regex":"","delay":""},
{"parentSelectors":["_root"],"type":"SelectorText","multiple":false,"id":"lingyu","selector":"td.table-type a","regex":"","delay":""}
],
"startUrl":"http://www.cyzone.cn/index.php?c=index&a=init&tpl=dbsearch&wq=%E5%86%9C%E6%9D%91&modelid=18&page=[1-9]","_id":"nongcun2"}
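
In the sitemap above every selector hangs off "_root". For sub-page scraping as described in item 3, a child selector would instead list the link selector's id in its "parentSelectors". The Python sketch below builds such a sitemap and prints it as JSON. The "detail_title" selector, its "h1" CSS selector, the example.com start URL, and the "subpage_sketch" id are all assumptions made for illustration; the field names are copied from the example above.

```python
import json


def text_selector(selector_id, css, parent):
    """Build a text selector entry using the same fields as the example sitemap."""
    return {
        "parentSelectors": [parent],
        "type": "SelectorText",
        "multiple": False,
        "id": selector_id,
        "selector": css,
        "regex": "",
        "delay": "",
    }


sitemap = {
    "selectors": [
        # Link selector on the list page; one link per table row.
        {
            "parentSelectors": ["_root"],
            "type": "SelectorLink",
            "multiple": True,
            "id": "link",
            "selector": "td.table-com-name a",
            "delay": "",
        },
        # Hypothetical child selector: evaluated on the page each link opens.
        text_selector("detail_title", "h1", parent="link"),
    ],
    "startUrl": "http://example.com/list?page=[1-3]",  # placeholder URL
    "_id": "subpage_sketch",  # placeholder sitemap id
}

sitemap_json = json.dumps(sitemap, ensure_ascii=False)
print(sitemap_json)
```

The printed JSON could then be pasted into Web Scraper's Import Sitemap dialog, after replacing the placeholder URL and selectors with ones that match the target site.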
