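Hello! I'm Xiaoxiao. This is my third post this week, and in it I'll introduce a way to scrape data, using Douyin (the Chinese version of TikTok) as the example.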
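Preparation: you need an Android phone with two apps installed: HttpCanary and the Douyin app.
On the phone, open https://play.google.com/store/apps/details?id=com.guoshi.httpcanary&hl=zh&gl=US as shown in the figure below. Download the app and launch it, as shown in the figure.
Then download Douyin and launch it as well.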
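First, a quick primer on packet capture. Packet capture is a technique similar to a man-in-the-middle attack. It lets you intercept and read the HTTP traffic that a user's device sends and receives, and you can then use that traffic for data analysis. Capturing packets on Android is quite simple: just tap the small paper-airplane button at the bottom. You will see that capture has started.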
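Tap pause, then open any request to see its captured content in detail. This is where the detailed analysis happens:
- The overview section shows general information about the request.
- The request section shows the request body and the full request headers.
- The response section shows the response.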
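Now let's capture Douyin's user search list, for a single search only.

1. Clear any previously captured requests.
2. Tap the button shown in the figure to start capturing.
3. Quickly switch to Douyin and search for a user, as shown above.
4. Stop the capture.

In this run, 35 HTTP requests were captured in total. Next, go through them one by one to find the request that returns the user list.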
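There are two common ways to do this analysis:
- By response: look at what each request returns and judge which one is likely to be the one you want.
- By name: for a search, the URL will usually contain keywords such as search or keyword. For a like, it will usually contain something related to like.

Neither rule is absolute, though.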
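The name-based approach can be sketched in a few lines of Python. Given a list of captured URLs (for example, copied out of HttpCanary by hand), it keeps only the URLs whose path or query contains a hint word. The hint table and the function name are hypothetical, chosen only for illustration. A match makes a URL a candidate, not a confirmed endpoint.

```python
from urllib.parse import urlsplit

# Hypothetical hint table: action -> substrings that often show up in the URL.
# These are heuristics only; real endpoints may use other names.
HINTS = {
    "search": ("search", "keyword"),
    "like": ("like",),
}

def candidate_requests(urls, action):
    """Return the captured URLs whose path or query mentions a hint for `action`."""
    hints = HINTS[action]
    matches = []
    for url in urls:
        parts = urlsplit(url)
        # Only look at path and query; the host name is ignored on purpose.
        haystack = (parts.path + "?" + parts.query).lower()
        if any(h in haystack for h in hints):
            matches.append(url)
    return matches
```

With 35 captured requests, this typically narrows the list down to a handful that you still need to inspect by their responses.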
The analysis turned up the following request:
https://aweme.snssdk.com/aweme/v1/search/sug/?keyword=%E5%B0%8F%E6%A9%99%E5%AD%90&source=user&from_group_id=6901212774668504323&os_api=23&device_type=MI+5s&ssmix=a&manifest_version_code=130701&dpi=240&uuid=910000000073543&app_name=aweme&version_name=13.7.0&ts=1606905781&cpu_support64=false&app_type=normal&appTheme=ddark&ac=wifi&host_abi=armeabi-v7a&update_version_code=13709900&channel=aweGW&_rticket=1606905782673&device_platform=android&iid=2497731620770349&version_code=130700&cdid=8ef1cc20-e0a7-478a-9193-4f396474f75e&openudid=ea10cded4241887b&device_id=69441294706&resolution=810*1440&os_version=6.0.1&language=zh&device_brand=Xiaomi&aid=1128
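The parameters in this URL are easier to read once they are decoded. Here is a minimal sketch using Python's standard urllib.parse module. The helper name query_params is hypothetical, and it is not known here which of these parameters the server actually requires.

```python
from urllib.parse import urlsplit, parse_qs

def query_params(url):
    """Split a captured URL into its endpoint and a flat dict of query parameters."""
    parts = urlsplit(url)
    # parse_qs percent-decodes values as UTF-8, turns '+' into a space,
    # and returns a list per key; keep only the first value of each.
    params = {k: v[0] for k, v in parse_qs(parts.query).items()}
    endpoint = f"{parts.scheme}://{parts.netloc}{parts.path}"
    return endpoint, params
```

Run on the URL above, the keyword parameter decodes to 小橙子, source is user, and device_type=MI+5s decodes to "MI 5s".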
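This very long URL is the request behind the user search. Strictly speaking, its path /aweme/v1/search/sug/ and the sug_list field in the response indicate that it returns search suggestions for the typed keyword. The figure below shows the request's details in the app.

Export this information and copy it into Postman on the computer, as shown in the figure, then send the request. The request goes out and the response comes back: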
{
"sug_list": [
{
"pos": [
{
"begin": 0,
"end": 2
}
],
"content": "小橙子",
"sug_type": "",
"word_record": {
"group_id": "6541999374455543053",
"words_position": 0,
"words_content": "小橙子",
"words_source": "sug"
},
"extra_info": {
"combine_utility": "0.105454",
"is_rich_sug": "0",
"latency": "52489",
"recall_reason": "orion_qse_recall|origin_query|bg_search_after_read|aweme_index_query_word_shortterm|viking_recall|new_user_word",
"rich_sug_type": "0",
"score": "10033.483513"
}
},
{
"pos": [
{
"begin": 0,
"end": 2
}
],
"content": "小橙子先生",
"sug_type": "",
"word_record": {
"group_id": "6663331049126335757",
"words_position": 1,
"words_content": "小橙子先生",
"words_source": "sug"
},
"extra_info": {
"combine_utility": "0.075498",
"is_rich_sug": "0",
"latency": "52489",
"recall_reason": "orion_qse_recall|aweme_orion_word|aweme_index_query_word_shortterm|new_user_word",
"rich_sug_type": "0",
"score": "35.694026"
}
},
{
"pos": [
{
"begin": 0,
"end": 2
}
],
"content": "小橙子????",
"sug_type": "",
"word_record": {
"group_id": "6605856988817528077",
"words_position": 2,
"words_content": "小橙子????",
"words_source": "sug"
},
"extra_info": {
"combine_utility": "0.022446",
"is_rich_sug": "0",
"latency": "52489",
"recall_reason": "new_user_word|aweme_index_query_word_shortterm",
"rich_sug_type": "0",
"score": "23.660895"
}
},
{
"pos": [
{
"begin": 0,
"end": 2
}
],
"content": "小橙子向李尖尖道歉",
"sug_type": "",
"word_record": {
"group_id": "6867526537433453828",
"words_position": 3,
"words_content": "小橙子向李尖尖道歉",
"words_source": "sug"
},
"extra_info": {
"combine_utility": "0.010760",
"is_rich_sug": "0",
"latency": "52489",
"recall_reason": "orion_qse_recall|aweme_index_query_word_shortterm|aweme_orion_word",
"rich_sug_type": "0",
"score": "15.679590"
}
},
{
"pos": [
{
"begin": 0,
"end": 2
}
],
"content": "小橙子姐姐",
"sug_type": "",
"word_record": {
"group_id": "6595874137606984963",
"words_position": 4,
"words_content": "小橙子姐姐",
"words_source": "sug"
},
"extra_info": {
"combine_utility": "0.006576",
"is_rich_sug": "0",
"latency": "52489",
"recall_reason": "orion_qse_recall|aweme_orion_word|aweme_index_query_word_shortterm|new_user_word",
"rich_sug_type": "0",
"score": "21.280143"
}
},
{
"pos": [
{
"begin": 0,
"end": 2
}
],
"content": "小橙子妲己视频",
"sug_type": "",
"word_record": {
"group_id": "6733562976994874637",
"words_position": 5,
"words_content": "小橙子妲己视频",
"words_source": "sug"
},
"extra_info": {
"combine_utility": "0.005108",
"is_rich_sug": "0",
"latency": "52489",
"recall_reason": "orion_qse_recall|aweme_index_query_word_shortterm|aweme_orion_word",
"rich_sug_type": "0",
"score": "16.143941"
}
},
{
"pos": [
{
"begin": 0,
"end": 2
}
],
"content": "小橙子2.0",
"sug_type": "",
"word_record": {
"group_id": "6626934284127048967",
"words_position": 6,
"words_content": "小橙子2.0",
"words_source": "sug"
},
"extra_info": {
"combine_utility": "0.001549",
"is_rich_sug": "0",
"latency": "52489",
"recall_reason": "aweme_orion_word|new_user_word",
"rich_sug_type": "0",
"score": "16.700139"
}
},
{
"pos": [
{
"begin": 0,
"end": 2
}
],
"content": "小橙子吖",
"sug_type": "",
"word_record": {
"group_id": "6657385354091386119",
"words_position": 7,
"words_content": "小橙子吖",
"words_source": "sug"
},
"extra_info": {
"combine_utility": "0.001493",
"is_rich_sug": "0",
"latency": "52489",
"recall_reason": "new_user_word|aweme_index_query_word_shortterm",
"rich_sug_type": "0",
"score": "16.461999"
}
},
{
"pos": [
{
"begin": 0,
"end": 2
}
],
"content": "小橙子摔下楼梯",
"sug_type": "",
"word_record": {
"group_id": "6860077678121194759",
"words_position": 8,
"words_content": "小橙子摔下楼梯",
"words_source": "sug"
},
"extra_info": {
"combine_utility": "0.001470",
"is_rich_sug": "0",
"latency": "52489",
"recall_reason": "orion_qse_recall|aweme_index_query_word_shortterm|aweme_orion_word",
"rich_sug_type": "0",
"score": "15.075646"
}
},
{
"pos": [
{
"begin": 0,
"end": 2
}
],
"content": "小橙子妈妈????",
"sug_type": "",
"word_record": {
"group_id": "6741513233120630030",
"words_position": 9,
"words_content": "小橙子妈妈????",
"words_source": "sug"
},
"extra_info": {
"combine_utility": "0.001376",
"is_rich_sug": "0",
"latency": "52489",
"recall_reason": "new_user_word",
"rich_sug_type": "0",
"score": "14.622774"
}
}
],
"status_code": 0,
"status_msg": "",
"rid": "20201202202753010198065013351E671F",
"words_query_record": {
"info": "{}",
"words_source": "sug",
"query_id": ""
},
"extra": {
"now": 1606912074000,
"logid": "20201202202753010198065013351E671F",
"fatal_item_ids": [],
"search_request_id": ""
},
"log_pb": {
"impr_id": "20201202202753010198065013351E671F"
}
}
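The Postman step, and reading the response, can also be sketched in Python with only the standard library.
- build_request and fetch_body rebuild and send the GET request.
- suggestions pulls the words_position, content, and score fields out of a response shaped like the one above.

All three function names are hypothetical. Headers should be copied from the capture. The server may also reject a replayed request, for example if timestamps or signatures expire, so treat this as a sketch rather than a working client.

```python
import json
import urllib.request
from urllib.parse import urlencode

def build_request(endpoint, params, headers=None):
    """Rebuild the captured GET request, roughly what Postman does with a pasted URL."""
    url = endpoint + "?" + urlencode(params)  # re-encodes values as UTF-8
    return urllib.request.Request(url, headers=headers or {}, method="GET")

def fetch_body(request, timeout=10):
    """Send the request and return the response body as text (raises on HTTP errors)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")

def suggestions(body):
    """Return (position, content, score) tuples from a search/sug body, ordered by position."""
    data = json.loads(body)
    # In the captured response, status_code 0 accompanies a successful result.
    if data.get("status_code") != 0:
        raise ValueError(f"unexpected status_code: {data.get('status_code')}")
    rows = [
        (
            item["word_record"]["words_position"],
            item["content"],
            float(item["extra_info"]["score"]),  # scores arrive as strings
        )
        for item in data.get("sug_list", [])
    ]
    return sorted(rows)
```

For example, you could call suggestions(fetch_body(build_request(endpoint, params))), with endpoint and params taken from the captured URL.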
At this point the data is in hand. What you do with it next is up to you: you can save it, analyze it, and so on.
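If you want to save the data, CSV is one simple option. This minimal sketch assumes the suggestions have already been turned into (position, content, score) tuples, for example from the words_position, content, and score fields of the JSON above. The column names and function names are my own, chosen for illustration.

```python
import csv
import io

HEADER = ["position", "content", "score"]

def to_csv_text(rows):
    """Render (position, content, score) rows as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)  # the default dialect ends each row with "\r\n"
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buf.getvalue()

def save_csv(rows, path):
    """Write the rows to a file; utf-8-sig adds a BOM so Excel detects UTF-8."""
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        f.write(to_csv_text(rows))
```

Keeping the text rendering separate from the file write makes the output easy to check without touching the disk.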
I'm Xiaoxiao, a programmer. See you next time. I'm a Pisces, by the way~
END
小明菜市场
Source: the Internet (will be removed on request)
Images: the Internet (will be removed on request)