使用xpath解析B站视频播放目录

有时对于B站很多集的技术视频,可以通过xpath分析元素,打印出来作为目录使用,以下为参考代码:

功能:

1、解析xml元素,逐行打印;

2、记录当前集时长,播放总时长;

3、前面留有标志位,可以自行打勾作为提醒

xml提取:

将目录元素手工保存至文本文件中,如 bilibili_video_lst.xml

保存

...
元素即可:

import requests
from lxml import etree


def get_content():
    video_xml_path = "my_useful_tools/data/bilibili_video_lst.xml"
    with open(video_xml_path, encoding="utf-8") as f:
        xml_str = f.read()
    all_durtime = {
        "hour": 0,
        "min": 0,
        "sec": 0
    }
    xml_root_data = etree.HTML(xml_str)
    for li_xml_data in xml_root_data.xpath('//div/ul/li'):
        # print(li_xml_data)
        cur_durtime = {"hour": 0, "min": 0, "sec": 0}
        title = li_xml_data.xpath("a/@title")[0]
        durtime = li_xml_data.xpath('a/div/div[@class="duration"]/text()')[0]

        hour = 0
        mins = 0
        sec = 0
        
        if len(durtime.split(":")) == 2: 
            mins, sec = [int(i) for i in durtime.split(":")]
        else:
            hour, mins, sec = [int(i) for i in durtime.split(":")]

        if (all_durtime["sec"] + sec) >= 60:
            mins += 1
            all_durtime["sec"] = all_durtime["sec"] + sec - 60
        else:
            all_durtime["sec"] = all_durtime["sec"] + sec

        if (all_durtime["min"] + mins) >= 60:
            hour += 1
            all_durtime["min"] = all_durtime["min"] + mins - 60
        else:
            all_durtime["min"] = all_durtime["min"] + mins

        all_durtime["hour"] += hour

        iter_durtime = str(all_durtime["hour"]).zfill(2) + ":" + str(all_durtime["min"]).zfill(2)


        print("[_] [_] [_]",iter_durtime, durtime + '\t' + title)
    

if __name__ == '__main__':
    get_content()
    print('THE END.')

结果示例:

[_] [_] [_] 00:16 16:41 day01_01.离线批计算与实时流失计算的理解(核心要点:数据流的哲学观)
[_] [_] [_] 01:06 49:19 day01_02.flink的基本概念、运行时架构及重要特性
[_] [_] [_] 02:00 54:03 day01_03.flink入门程序编写示例(从socket数据源做wordcount)
[_] [_] [_] 02:18 18:29 day01_04.flink入门程序编写示例(scala版本的wordcount)
[_] [_] [_] 02:47 28:59 day02_01.flink入门编程-批计算编程示例

你可能感兴趣的:(html,python,xml,xpath)