How to Generate an E-book from a Biquge Novel

Why I wrote this script

1. Paywalls
2. Ads
I just want to read novels on a clean page, nothing more.

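Steps

1. Use the script below to download the novel and save it as xxx.html.
2. Use calibre to generate the e-book. When converting, remember to set Table of Contents -> Level 1 TOC to //h:h4 so that the generated e-book has a table of contents. //h:h4 is an XPath expression that selects every h4 heading, so each chapter title needs to be an h4 element in the HTML.
3. Done. Now you can enjoy a clean, ad-free novel.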
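
Step 2 can probably also be done without the calibre GUI. The sketch below assumes calibre's ebook-convert command-line tool is installed and on PATH; the helper name build_convert_command and the file names are made up for illustration. It only builds the command, passing the same //h:h4 expression through the --level1-toc option.

```python
import subprocess


def build_convert_command(html_path, ebook_path, toc_xpath="//h:h4"):
    """Build an ebook-convert command line (hypothetical helper).

    toc_xpath selects the elements that become level-1 TOC entries.
    """
    return ["ebook-convert", html_path, ebook_path, "--level1-toc", toc_xpath]


if __name__ == "__main__":
    cmd = build_convert_command("xiaoshuo.html", "xiaoshuo.epub")
    print(" ".join(cmd))
    # Uncomment to actually run the conversion (requires calibre):
    # subprocess.run(cmd, check=True)
```

Keeping the command as a list avoids shell quoting problems with the XPath expression.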

The script

#!/usr/bin/python3
# -*- coding: utf-8 -*-
# Download a novel from biquge
import time
import urllib.request

from pyquery import PyQuery as pq

def getHtml(url):
    # Fetch a page and decode it as UTF-8
    with urllib.request.urlopen(url) as page:
        return page.read().decode('utf-8')

# Download the chapter list as (url, title) tuples
def getArticleList(listurl,contenturl_prefix):
    html=getHtml(listurl)
    doc = pq(html)
    ret=[]
    for a in doc("div#list dd a").items():
        href=contenturl_prefix+a.attr("href")
        title=a.text()
        ret.append((href,title))
    return ret

# Download the body of one chapter
def getArticle(contenturl):
    html=getHtml(contenturl)
    doc = pq(html)
    # .html() returns None when div#content is missing, so fall back to ""
    return doc("div#content").html() or ""

# Entry point
if __name__ == "__main__":
    article_list_url,article_url_prefix=("https://www.biquge.info/1_1760/","https://www.biquge.info/1_1760/")
    article_items = getArticleList(article_list_url,article_url_prefix)
    save2file = "/Users/myname/Downloads/xiaoshuo.html"
    with open(save2file,'w',encoding="utf-8") as f:
        # Opening wrapper (assumed; any valid HTML skeleton should do)
        f.write("<html><body>")
        for art in article_items:
            content = getArticle(art[0])
            # Chapter title as an h4 heading, which //h:h4 picks up in calibre
            f.write("<h4>"+art[1]+"</h4>")
            f.write(content)
            print(art[1])
            time.sleep(1)
        f.write("</body></html>")

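This is really just about the simplest crawler script there is.
With small changes, mainly to the URLs and the two CSS selectors (div#list dd a and div#content), it should work for other novel sites too.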
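
One way to make the "small changes" explicit is to collect everything site-specific in one place. The sketch below is only an illustration: SiteConfig, chapter_url, and the example.com entry are hypothetical names, not part of the original script. It also swaps the plain string concatenation of prefix and href for urllib.parse.urljoin, on the assumption that other sites may use absolute paths or full URLs in their chapter links.

```python
from dataclasses import dataclass
from urllib.parse import urljoin


@dataclass
class SiteConfig:
    """Site-specific settings (hypothetical structure)."""
    list_url: str          # page that lists all chapters
    link_selector: str     # CSS selector for chapter links on that page
    content_selector: str  # CSS selector for the chapter body


# Values for biquge come from the script above; example.com is made up.
BIQUGE = SiteConfig(
    list_url="https://www.biquge.info/1_1760/",
    link_selector="div#list dd a",
    content_selector="div#content",
)
OTHER_SITE = SiteConfig(
    list_url="https://example.com/book/42/",
    link_selector="ul.chapters li a",
    content_selector="div.article",
)


def chapter_url(config, href):
    """Resolve a chapter link against the list page URL.

    Handles relative names, absolute paths, and full URLs alike.
    """
    return urljoin(config.list_url, href)


if __name__ == "__main__":
    print(chapter_url(BIQUGE, "123.html"))
    print(chapter_url(OTHER_SITE, "/book/42/7.html"))
```

The selectors would then be passed to the pyquery calls in place of the hard-coded strings.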

Enhanced version: splitting large books

#!/usr/bin/python3
# -*- coding: utf-8 -*-
# Download a novel from biquge
import time
import urllib.request

from pyquery import PyQuery as pq

def getHtml(url):
    # Fetch a page and decode it as UTF-8
    with urllib.request.urlopen(url) as page:
        return page.read().decode('utf-8')

# Download the chapter list as (url, title) tuples
def getArticleList(listurl,contenturl_prefix):
    html=getHtml(listurl)
    doc = pq(html)
    ret=[]
    for a in doc("div#list dd a").items():
        href=contenturl_prefix+a.attr("href")
        title=a.text()
        ret.append((href,title))
    return ret

# Download the body of one chapter
def getArticle(contenturl):
    html=getHtml(contenturl)
    doc = pq(html)
    # .html() returns None when div#content is missing, so fall back to ""
    return doc("div#content").html() or ""

# Clean up the chapter text
def clearArticle(content):
    # TODO: implement your own cleanup, e.g. with regular expressions or string replacement
    return content.replace("龗","")
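# Main program with splitting support
if __name__ == "__main__":
    article_list_url,article_url_prefix=("https://www.biquge.info/1_1760/","https://www.biquge.info/1_1760/")
    article_items = getArticleList(article_list_url,article_url_prefix)
    save2file = "/Users/myname/Downloads/xiaoshuo_{0:0>4d}.html"
    single_book_size = 100 # max chapters per book, so no single book gets too large and fails to convert
    fo = None
    index=0
    book_no=1
    for art in article_items:
        print(art[1])
        if index%single_book_size==0:
            # Start a new book; close the previous file first, if there is one
            if fo is not None:
                fo.write("</body></html>")
                fo.close()
                fo=None
                book_no=book_no+1
            fo=open(save2file.format(book_no),'w',encoding="utf-8")
            # Opening wrapper (assumed; any valid HTML skeleton should do)
            fo.write("<html><body>")
        content = clearArticle(getArticle(art[0]))
        # Chapter title as an h4 heading, which //h:h4 picks up in calibre
        fo.write("<h4>"+art[1]+"</h4>")
        fo.write(content)
        time.sleep(1) # pause 1 second after each chapter; be a polite crawler
        index=index+1
    # Close the last book, even when the chapter count is an exact multiple of single_book_size
    if fo is not None:
        fo.write("</body></html>")
        fo.close()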
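
clearArticle above leaves the actual cleanup as a TODO and suggests regular expressions or string replacement. Here is a minimal sketch of what that might look like. The function name clean_article and the patterns are assumptions for illustration only; what really needs removing depends on what the target site injects into its chapter HTML.

```python
import re

# Hypothetical junk patterns; adjust them to the target site.
JUNK_PATTERNS = [
    re.compile(r"<script\b.*?</script>", re.IGNORECASE | re.DOTALL),  # inline scripts
    re.compile(r"<a\b[^>]*>.*?</a>", re.IGNORECASE | re.DOTALL),      # links, often ads
]
# Plain strings to delete; "龗" is the one the original clearArticle removes.
JUNK_STRINGS = ["龗"]


def clean_article(content):
    """Strip scripts, links, and stray characters from chapter HTML."""
    if not content:
        return ""
    for pattern in JUNK_PATTERNS:
        content = pattern.sub("", content)
    for junk in JUNK_STRINGS:
        content = content.replace(junk, "")
    # Collapse runs of three or more <br> tags into two
    content = re.sub(r"(?:<br\s*/?>\s*){3,}", "<br/><br/>", content, flags=re.IGNORECASE)
    return content.strip()


if __name__ == "__main__":
    sample = 'Hello龗<script>ad()</script> world<br><br/><br />\n<br>end<a href="x">ad</a>'
    print(clean_article(sample))
```

Regular expressions are fine for this kind of rough cleanup, but they are not a general HTML parser; for anything more structured, removing elements through pyquery is likely the safer route.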
