Scraping all courseware download links from 教学立方 (teaching.applysquare.com)

This script scrapes the links to every courseware file of a single course; clicking a link downloads the file.

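Planned features:

  1. Refactor the code into an object-oriented design
  2. Rework how the links are written to file
  3. Download the files directly into a specified directory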
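
Planned feature 3 could look roughly like the sketch below. It is only an illustration: `target_path` and `download` are hypothetical names, the `downloads` folder is an arbitrary default, and it assumes each scraped link can be fetched without extra headers or a login, which the original post does not confirm. It uses the standard library's `urllib.request` instead of `requests` so that it has no third-party dependency.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlparse


def target_path(dest_dir, title, link):
    """Build a local path from the title, borrowing the link's file
    extension when the title does not already end with it."""
    name = title.replace("/", "_").replace("\\", "_").strip() or "unnamed"
    ext = os.path.splitext(urlparse(link).path)[1]
    if ext and not name.lower().endswith(ext.lower()):
        name += ext
    return os.path.join(dest_dir, name)


def download(link, title, dest_dir="downloads"):
    """Stream one file into dest_dir and return the saved path.
    Assumption: the link needs no authentication."""
    os.makedirs(dest_dir, exist_ok=True)
    path = target_path(dest_dir, title, link)
    with urllib.request.urlopen(link, timeout=30) as resp, open(path, "wb") as out:
        shutil.copyfileobj(resp, out)  # copy in chunks, not all in memory
    return path
```

Calling `download(link, title, "some/folder")` for each title and link pair found by the scraper would then save the files instead of only listing them.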

Implemented features:

  1. Get the download links of the courseware
  2. Get the total number of courseware pages
  3. Write the links to a text file

import requests
import jsonpath

# The token segment of the URL is hidden here for security reasons.
# {} is filled in with the page number.
url = "https://teaching.applysquare.com/Api/CourseAttachment/getList/token/<token hidden for security>?page={}&cid=7612"
headers = {
    "User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Mobile Safari/537.36"
}


for page_no in range(5):
    real_url = url.format(page_no)
    # print(real_url)
    r = requests.get(url=real_url, headers=headers, timeout=10)
    r.raise_for_status()
    data = r.json()
    # print(data)

    # Get the item count, the titles, and the links.
    # jsonpath.jsonpath returns False when nothing matches, so fall back to [].
    list_num = jsonpath.jsonpath(data, '$..count') or []
    names = jsonpath.jsonpath(data, '$..title') or []
    paths = jsonpath.jsonpath(data, '$..path') or []

    # Work out the number of pages (10 items per page, rounded up).
    # "".join(list_num) does not work here because the list contains numbers.
    # Note: page is computed but not yet used to bound the loop above.
    if list_num:
        total = int(list_num[-1])
        page = (total + 9) // 10

    # Build one "title : link" line per file and append it to the text file.
    str1 = "".join("\n{} : {}".format(n, p) for n, p in zip(names, paths))
    print(str1)
    with open("教学立方.txt", "a", encoding="utf-8") as f:
        f.write(str1)
    print("-------------------------------------------------")
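
The page-count check and the line formatting in the script above can also be isolated as two small pure functions, which makes them easy to test without touching the network. This is a sketch only: `page_count` and `format_entries` are names introduced here for illustration, and the page size of 10 is taken from the division by 10 in the script.

```python
def page_count(total_items, page_size=10):
    """Number of pages needed for total_items, rounded up."""
    return (total_items + page_size - 1) // page_size


def format_entries(titles, links):
    """Return one 'title : link' line per courseware file."""
    return "\n".join("{} : {}".format(t, l) for t, l in zip(titles, links))


if __name__ == "__main__":
    print(page_count(25))  # 3
    print(format_entries(["week1.pdf", "week2.pdf"], ["link-a", "link-b"]))
```

Integer arithmetic avoids the float comparison `(i/10) > (i//10)`, and building each line with `zip` avoids inserting into a list while iterating over it.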

The part of the URL omitted in the code is the ***:path*** value from the captured request.
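
As a sketch of how a captured `:path` value could be turned into per-page URLs: the host below is the one used in the script, while `page_url` and the `TOKEN` placeholder are hypothetical and stand in for the real captured value, which is not shown in this post.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

HOST = "https://teaching.applysquare.com"


def page_url(captured_path, page):
    """Join the host with a captured :path value and set its page parameter."""
    parts = urlsplit(HOST + captured_path)
    query = dict(parse_qsl(parts.query))
    query["page"] = str(page)  # overwrite whatever page was captured
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), ""))


if __name__ == "__main__":
    # TOKEN is a placeholder, not a real token.
    captured = "/Api/CourseAttachment/getList/token/TOKEN?page=1&cid=7612"
    print(page_url(captured, 3))
```

Rewriting the query string this way means the captured value can be pasted in unchanged, without first editing it into a `{}` format template.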
