Scraping the special numbers

Scraping the data

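from selenium import webdriver
from selenium.webdriver.common.by import By
import re
# PhantomJS support was removed in Selenium 4; this line needs Selenium 3 or earlier.
driver = webdriver.PhantomJS()
driver.get("http://6hc.98vp.com/")
# Locate the results table body by its absolute XPath.
t1 = driver.find_element(By.XPATH, "/html/body/div[2]/div[2]/div[3]/table/tbody")
print(t1.text)
# Turn line breaks into commas, then capture every 1-3 digit number that follows a comma.
l1 = re.findall(',([0-9]{1,3})', t1.text.replace('\n', ','))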
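
To check the regular expression without opening a browser, the same extraction can be run on a hand-written string. This is only a sketch: `sample_text` is a made-up stand-in for `t1.text`, assuming each table cell ends up on its own line, and `extract_numbers` is a hypothetical helper name. The real page layout may differ.

```python
import re

def extract_numbers(text):
    """Return every 1-3 digit number that directly follows a line break."""
    # Same approach as the scraping code: newlines become commas,
    # then the regex captures the digits after each comma.
    return re.findall(',([0-9]{1,3})', text.replace('\n', ','))

# Hypothetical sample: a header line, then a draw number and 7 numbers per row.
sample_text = (
    "Draw\n001\n05\n12\n23\n31\n38\n44\n07\n"
    "002\n11\n19\n02\n40\n27\n33\n48"
)

numbers = extract_numbers(sample_text)
print(numbers[:8])
```

Note that the first token of the text is never captured, because no comma precedes it. In the sample that token is the header, so nothing is lost.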

Processing the data into a dictionary


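# Every 8 items form one row: the first item is the key, the next 7 are its values.
dd = {}
for i in range(len(l1)):
    if i % 8 == 0:
        # Start of a new row: remember the key and reset the value list.
        val1 = []
        key1 = l1[i]
    else:
        val1.append(l1[i])
    # Store the row only once all 7 values have been collected.
    if len(val1) == 7:
        dd[key1] = val1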
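
The grouping step can also be written with slicing. The sketch below assumes the same layout as above (one key followed by 7 values); `rows_to_dict` and the sample `tokens` list are hypothetical names and data used only for illustration.

```python
def rows_to_dict(tokens, row_len=8):
    """Group a flat list into {first_item: remaining_items} for each full row."""
    result = {}
    # Step through the list one row at a time; a trailing partial row is skipped.
    for start in range(0, len(tokens) - row_len + 1, row_len):
        row = tokens[start:start + row_len]
        result[row[0]] = row[1:]
    return result

tokens = [
    '001', '05', '12', '23', '31', '38', '44', '07',
    '002', '11', '19', '02', '40', '27', '33', '48',
    '003', '09',  # incomplete last row, ignored
]
grouped = rows_to_dict(tokens)
print(grouped)
```

As in the loop version, a row is stored only when all 7 values are present, so a truncated final row does not produce a short entry.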

Writing to a file

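import json
# Save the dictionary as JSON, then load it back to confirm the round trip.
with open('dd.txt', 'w') as f:
    json.dump(dd, f)
with open('dd.txt', 'r') as f:
    ldd = json.load(f)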
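
One detail worth checking is that the data survives the round trip unchanged. The sketch below uses a temporary directory and a made-up one-row dictionary, both assumptions for illustration. Because the keys and values here are strings, the loaded dictionary equals the original; integer keys would come back as strings, since JSON object keys are always strings.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical one-row sample in the same shape as the scraped dictionary.
sample = {'001': ['05', '12', '23', '31', '38', '44', '07']}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'dd.json'
    # Write the dictionary, then read it back from the same file.
    path.write_text(json.dumps(sample, indent=2), encoding='utf-8')
    loaded = json.loads(path.read_text(encoding='utf-8'))

# String keys round-trip unchanged; integer keys are converted to strings.
int_keys = json.loads(json.dumps({1: 'a'}))
print(loaded == sample, int_keys)
```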

Joining the URLs

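from selenium import webdriver
from selenium.webdriver.common.by import By
# PhantomJS support was removed in Selenium 4; this line needs Selenium 3 or earlier.
driver = webdriver.PhantomJS()
driver.get('http://6hc.98vp.com/')
t = driver.find_element(By.XPATH, '//*[@id="main"]')
# Split the text of the #main element into one entry per line.
str_ = t.text.replace('\n', ',')
l1 = str_.split(',')
url0 = "http://6hc.98vp.com/lhc/gethistory?y="
# Append each entry to the base URL to build the history-page URLs.
urls = []
for item in l1:
    url1 = url0 + item
    urls.append(url1)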
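
Plain string concatenation works when every entry is a clean value. A slightly more defensive sketch uses `urllib.parse.urlencode` to escape the value and skips blank entries left over from splitting. The helper name `build_history_urls` and the sample values are assumptions for illustration; the actual entries in the `#main` element may look different.

```python
from urllib.parse import urlencode

BASE_URL = "http://6hc.98vp.com/lhc/gethistory"

def build_history_urls(values):
    """Build one history URL per non-empty value, passed as the y parameter."""
    result = []
    for value in values:
        value = value.strip()
        if not value:
            continue  # skip blank entries produced by splitting the page text
        result.append(BASE_URL + "?" + urlencode({"y": value}))
    return result

# Hypothetical sample values standing in for the split page text.
history_urls = build_history_urls(["2018", " 2019 ", ""])
print(history_urls)
```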
