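A small crawler for scraping shellcode

Shellcode site: http://shell-storm.org/shellcode/

Use the online tool http://tools.bugscaner.com/sitemapspider to generate a site map of it, then save the result locally as an HTML file.

The code below reads every link from that site map, downloads each page, and writes the shellcode it finds into a markdown file.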

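# coding=utf-8
import html

import requests
from bs4 import BeautifulSoup

# Site map saved from the online generator, and the markdown file to write to
haatml = r"C:\Users\Administrator.WQ-20160501NYYU\Downloads\sitemap.html"
shellcode_dir = r"F:\wangpei\\tools\scripts\shellcode\\1.md"

with open(haatml, encoding="utf8") as sitemap:
    soup = BeautifulSoup(sitemap, features="html.parser")

with open(shellcode_dir, "w", encoding="utf8") as f:
    for i, x in enumerate(soup.find_all("a")):
        url = x.get("href")
        if not url:
            continue  # skip anchors that carry no link
        title = x.get("title") or url  # fall back to the URL if there is no title
        req = requests.get(url=url)
        # The shellcode listing sits inside the page's <pre>...</pre> block
        pre_idx = req.text.find("<pre>") + 5  # 5 == len("<pre>")
        post_idx = req.text.find("</pre>")
        # One markdown section per page: heading plus the unescaped listing
        markdown = "## " + title + "\n```python\n" + html.unescape(req.text[pre_idx:post_idx]) + "\n```\n"
        f.write(markdown)
        print(str(i) + title)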
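
The extraction step in the script relies on plain string search, which returns -1 when a page has no `<pre>` tag and then slices the wrong range. A slightly more defensive variant might look like the sketch below. It uses only the standard library; the helper names `extract_pre_text` and `to_markdown_section` are made up for this illustration, and it assumes each page keeps its shellcode in the first `<pre>` block.

```python
import html
import re
from typing import Optional

# Matches the first <pre ...>...</pre> block, across newlines, in any letter case.
PRE_BLOCK = re.compile(r"<pre\b[^>]*>(.*?)</pre>", re.DOTALL | re.IGNORECASE)


def extract_pre_text(page_html: str) -> Optional[str]:
    """Return the unescaped contents of the first <pre> block, or None if absent."""
    match = PRE_BLOCK.search(page_html)
    if match is None:
        return None
    return html.unescape(match.group(1))


def to_markdown_section(title: str, code: str) -> str:
    """Format one entry as a level-2 heading followed by a fenced code block."""
    fence = "`" * 3  # built at runtime so this example contains no literal fence
    return "## " + title + "\n" + fence + "python\n" + code + "\n" + fence + "\n"


if __name__ == "__main__":
    # Hypothetical page content, used only to exercise the helpers.
    sample = "<html><body><pre>char code[] = &quot;\\x90&quot;;</pre></body></html>"
    text = extract_pre_text(sample)
    if text is not None:
        print(to_markdown_section("sample entry", text))
```

In the loop, a page for which `extract_pre_text(req.text)` returns `None` could simply be skipped instead of writing a broken section.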


