Web Crawler (Part 3)

1. JS reverse engineering in practice: cracking the X-Bogus value

X-Bogus: starts with DFS and is 28 characters long in total
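
As a quick sanity check, the two properties stated above (a `DFS` prefix and a total length of 28) can be tested in Python. This is only a minimal sketch: the function name `looks_like_x_bogus` and the sample values are made up for illustration, and passing this check does not mean a server would accept the value.

```python
def looks_like_x_bogus(value: str) -> bool:
    """Return True if value has the DFS prefix and is 28 characters long."""
    return value.startswith("DFS") and len(value) == 28


# Hypothetical sample values, not real signatures
sample_ok = "DFS" + "x" * 25
sample_short = "DFS" + "x" * 10

print(looks_like_x_bogus(sample_ok))     # True
print(looks_like_x_bogus(sample_short))  # False
```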

[Images 1-2]
The parameter to reverse is X-Bogus, because all of the values in the request payload are packed together to generate it.
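
Because the signature covers the whole payload, the query string has to be final before it is signed, and the exact string that was signed is the one that must be sent. The sketch below illustrates that ordering. `fake_sign` is a stand-in invented for this example (it is not the real X-Bogus algorithm), and the parameter values are placeholders.

```python
import hashlib
from urllib.parse import urlencode


def fake_sign(query: str) -> str:
    """Stand-in signer for illustration only; NOT the real X-Bogus algorithm."""
    digest = hashlib.sha256(query.encode("utf-8")).hexdigest()
    return "DFS" + digest[:25]


def build_signed_url(base_url: str, params: dict, sign=fake_sign) -> str:
    # 1. Serialize every payload value into one query string
    query = urlencode(params)
    # 2. Sign that exact string
    x_bogus = sign(query)
    # 3. Append the signature last, without altering what was signed
    return f"{base_url}?{query}&X-Bogus={x_bogus}"


base = "https://www.douyin.com/aweme/v1/web/aweme/post/"
params = {"device_platform": "webapp", "aid": "6383", "max_cursor": "0"}
url_a = build_signed_url(base, params)

# Changing any single value changes the signature
params["max_cursor"] = "18"
url_b = build_signed_url(base, params)

print(url_a.rsplit("=", 1)[1] != url_b.rsplit("=", 1)[1])  # True
```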

1.1 Locating where X-Bogus is generated (request call stack)

[Images 3-7]

1.1.1 The go-to trick plus advanced breakpoints (logpoints)

Use a logpoint to check whether the X-Bogus value shows up.
[Images 8-10]
Even with a logpoint there is still too much request output to sift through, so next we turn to conditional breakpoints.
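
The filtering idea that a conditional breakpoint relies on, reacting only when a request actually carries the parameter, can be sketched offline in Python. This example assumes the URLs printed by a logpoint have been copied into a list; the URLs and the helper name `carries_x_bogus` are invented for illustration.

```python
from urllib.parse import parse_qs, urlsplit


def carries_x_bogus(url: str) -> bool:
    """Return True if the URL's query string contains an X-Bogus parameter."""
    query = parse_qs(urlsplit(url).query)
    return "X-Bogus" in query


# Made-up URLs standing in for logpoint output
logged_urls = [
    "https://example.com/static/app.js",
    "https://example.com/api/list?aid=6383&X-Bogus=DFSabc",
    "https://example.com/api/ping?aid=6383",
]

matches = [u for u in logged_urls if carries_x_bogus(u)]
print(matches)  # ['https://example.com/api/list?aid=6383&X-Bogus=DFSabc']
```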

1.1.2 The go-to trick plus advanced breakpoints (conditional breakpoints)

[Images 11-14]

1.1.3 Doing the reverse engineering (JS reversing)

[Images 15-22]

2. Calling JS from Python to get the X-Bogus value

Install:

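pip install PyExecJS

import execjs

# Load the JS file that holds the extracted X-Bogus logic
with open("douyin.js", encoding="utf-8") as f:
    js_data = f.read()

# Compile the JS source, then call the function exposed on window
js_compile = execjs.compile(js_data)
xb_data = js_compile.call("window.xiaoc")
print(xb_data)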
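
To see the `compile`/`call` pattern in isolation, here is a minimal sketch that uses a tiny inline function instead of the real JS file. The JS function `add` and the helper `call_js` are invented for this example. It assumes PyExecJS and a JavaScript runtime such as Node.js are installed; if either is missing, the helper returns `None` instead of raising.

```python
try:
    import execjs  # provided by the PyExecJS package
except ImportError:  # PyExecJS is not installed
    execjs = None

JS_SOURCE = """
function add(a, b) {
    return a + b;
}
"""


def call_js(source: str, func_name: str, *args):
    """Compile JS source and call one of its functions; None if unavailable."""
    if execjs is None:
        return None
    try:
        ctx = execjs.compile(source)
        return ctx.call(func_name, *args)
    except execjs.RuntimeUnavailableError:
        # No JavaScript runtime (for example Node.js) was found
        return None


print(call_js(JS_SOURCE, "add", 1, 2))
```

Arguments passed after the function name are forwarded to the JS function, which is the same mechanism used when a query string is handed to the signing function.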

[Images 23-24]

import requests
import execjs

# Load and compile the JS file that holds the extracted X-Bogus logic
with open("douying.js", encoding="utf-8") as f:
    js_code = f.read()
js_compile = execjs.compile(js_code)
url = 'https://www.douyin.com/aweme/v1/web/aweme/post/?'
user_id = "MS4wLjABAAAA2WKmM-8lEtk72YjLLI6CFWFZRDtA_WtTUmg-5p7wHqI"
params = f"device_platform=webapp&aid=6383&channel=channel_pc_web&sec_user_id={user_id}&max_cursor=0&locate_query=false&show_live_replay_strategy=1&need_time_list=1&time_list_query=0&whale_cut_token=&cut_version=1&count=18&publish_video_strategy_type=2&pc_client_type=1&version_code=170400&version_name=17.4.0&cookie_enabled=true&screen_width=1536&screen_height=864&browser_language=zh-CN&browser_platform=Win32&browser_name=Edge&browser_version=122.0.0.0&browser_online=true&engine_name=Blink&engine_version=122.0.0.0&os_name=Windows&os_version=10&cpu_core_num=12&device_memory=8&platform=PC&downlink=10&effective_type=4g&round_trip_time=100&webid=7331003658269885952&msToken=i-THFcUZPJzlfcptH7pAamO1QadvQ88RnCYldJseXyIeYmMRC7guwnHnX0z6ENz1dxnyj-1QWQQLjqp9_pHjr8lU-MqWQ9g466pOEyefDAGUGskgcu6wkKoWNzH6"
x_b = js_compile.call("window.yuan", params)
print("xb:", x_b)

new_url = url + params + "&X-Bogus=" + x_b

headers = {
    'authority': 'www.douyin.com',
    'accept': 'application/json, text/plain, */*',
    'accept-language': 'zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6',
    'cache-control': 'no-cache',
    # The cookie must be one single-line string copied from your own logged-in browser session
    'cookie': '<paste your own cookie string here>',
    'pragma': 'no-cache',
    'referer': 'https://www.douyin.com/user/MS4wLjABAAAA2WKmM-8lEtk72YjLLI6CFWFZRDtA_WtTUmg-5p7wHqI',
    'sec-ch-ua': '"Chromium";v="122", "Not(A:Brand";v="24", "Microsoft Edge";v="122"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36 Edg/122.0.0.0',
}

response = requests.get(
    new_url,
    headers=headers)

print(response.text)
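
Before indexing into the response, it can help to check that the body is valid JSON and has the expected shape. The sketch below works on a response body string, so it runs without network access. The nesting (`aweme_list`, then `video`, then `play_addr`, then `url_list`) follows the fields used in section 3 of this post; the sample payload and the helper name `extract_play_urls` are made up, and treating an empty or non-JSON body as "no results" is an assumption about what a rejected request might look like.

```python
import json


def extract_play_urls(body: str) -> list:
    """Return the first play URL of each video, or [] if the body is unusable."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return []  # empty or non-JSON body
    if not isinstance(data, dict):
        return []

    urls = []
    for aweme in data.get("aweme_list") or []:
        video = aweme.get("video") or {}
        play_addr = video.get("play_addr") or {}
        url_list = play_addr.get("url_list") or []
        if url_list:
            urls.append(url_list[0])
    return urls


# Made-up payload with the same nesting as described above
sample = json.dumps({
    "aweme_list": [
        {"video": {"play_addr": {"url_list": [
            "https://example.com/a.mp4",
            "https://example.com/a-mirror.mp4",
        ]}}},
        {"video": {"play_addr": {"url_list": []}}},
    ]
})

print(extract_play_urls(sample))  # ['https://example.com/a.mp4']
print(extract_play_urls(""))      # []
```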

3. Downloading the videos

[Image 25]
Complete code:
JS reverse engineering crawler for Douyin short videos

import requests
import execjs
import threading

# Load and compile the JS file that holds the extracted X-Bogus logic
with open("douying.js", encoding="utf-8") as f:
    js_code = f.read()
js_compile = execjs.compile(js_code)
url = 'https://www.douyin.com/aweme/v1/web/aweme/post/?'
user_id = "MS4wLjABAAAA2WKmM-8lEtk72YjLLI6CFWFZRDtA_WtTUmg-5p7wHqI"
params = f"device_platform=webapp&aid=6383&channel=channel_pc_web&sec_user_id={user_id}&max_cursor=0&locate_query=false&show_live_replay_strategy=1&need_time_list=1&time_list_query=0&whale_cut_token=&cut_version=1&count=18&publish_video_strategy_type=2&pc_client_type=1&version_code=170400&version_name=17.4.0&cookie_enabled=true&screen_width=1536&screen_height=864&browser_language=zh-CN&browser_platform=Win32&browser_name=Edge&browser_version=122.0.0.0&browser_online=true&engine_name=Blink&engine_version=122.0.0.0&os_name=Windows&os_version=10&cpu_core_num=12&device_memory=8&platform=PC&downlink=10&effective_type=4g&round_trip_time=100&webid=7331003658269885952&msToken=i-THFcUZPJzlfcptH7pAamO1QadvQ88RnCYldJseXyIeYmMRC7guwnHnX0z6ENz1dxnyj-1QWQQLjqp9_pHjr8lU-MqWQ9g466pOEyefDAGUGskgcu6wkKoWNzH6"
x_b = js_compile.call("window.yuan", params)
print("xb:", x_b)

new_url = url + params + "&X-Bogus=" + x_b

headers = {
    'authority': 'www.douyin.com',
    'accept': 'application/json, text/plain, */*',
    'accept-language': 'zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6',
    'cache-control': 'no-cache',
    # The cookie must be one single-line string copied from your own logged-in browser session
    'cookie': '<paste your own cookie string here>',
    'pragma': 'no-cache',
    'referer': 'https://www.douyin.com/user/MS4wLjABAAAA2WKmM-8lEtk72YjLLI6CFWFZRDtA_WtTUmg-5p7wHqI',
    'sec-ch-ua': '"Chromium";v="122", "Not(A:Brand";v="24", "Microsoft Edge";v="122"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36 Edg/122.0.0.0',
}

response = requests.get(
    new_url,
    headers=headers)

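# print(response.text)

# A missing or null "aweme_list" becomes an empty list instead of None
aweme_list = response.json().get("aweme_list") or []

# Build the list of full video addresses: for each item, follow
# video -> play_addr -> url_list and take the first address. The list repeats
# the same video, so the first entry is enough (expand the response step by
# step in the browser's Preview tab to see this nesting).
url_list = [aweme.get("video").get("play_addr").get("url_list")[0] for aweme in aweme_list]
# print(url_list)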


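# Download the short videos
import os

os.makedirs("./videos", exist_ok=True)  # make sure the output directory exists


def get_one_video(url, c):
    res = requests.get(url)
    # Write the file to disk
    with open(f"./videos/{c}.mp4", "wb") as f:  # "w" writes text, "wb" writes bytes
        f.write(res.content)
    print(f"{c}.mp4 downloaded successfully!")


c = 1
t_list = []
for url in url_list:
    t = threading.Thread(target=get_one_video, args=(url, c))
    t.start()
    t_list.append(t)
    c += 1

for t in t_list:
    t.join()    # Wait for every thread in t_list to finish before moving on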
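
Starting one thread per video works for a single page of results, but the thread count grows with the list. One alternative (a swapped-in technique, not what the code above does) is a bounded pool from `concurrent.futures`. In this sketch `fake_download` is a placeholder that only simulates work so the example runs offline; in practice it would be replaced by a function like `get_one_video`. The worker count of 4 is an arbitrary choice.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(url: str, index: int) -> str:
    """Placeholder for a real download; sleeps briefly and returns a file name."""
    time.sleep(0.01)
    return f"{index}.mp4"


def download_all(urls, worker=fake_download, max_workers=4):
    """Run worker(url, index) for every URL using at most max_workers threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(worker, url, index)
            for index, url in enumerate(urls, start=1)
        ]
        # result() re-raises any exception that occurred in the worker thread
        return [future.result() for future in futures]


urls = ["https://example.com/a.mp4", "https://example.com/b.mp4"]
print(download_all(urls))  # ['1.mp4', '2.mp4']
```

Leaving the `with` block waits for all submitted tasks, which plays the same role as the `join()` loop in the thread-based version.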
