Extracting the real video URLs behind 51CTO Academy's online player

Take this video as an example:
http://edu.51cto.com/static/js/51player.swf?id=16836&autoplay=1&callbackJs=SyPlayerStatus

When it plays, the browser sends this request:
http://edu.51cto.com/index.php?do=api&m=index&lesson_id=16836&sign=1c1a46c7b63a403c15c9d94bd610b887
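To make the structure of that request explicit, here is a minimal sketch that splits a captured URL into its query parameters using only the Python standard library. The helper name parse_api_request is made up for illustration; the URL is the one shown above.

```python
from urllib.parse import urlsplit, parse_qs


def parse_api_request(url):
    """Return the query parameters of a captured API request as a flat dict."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs maps every key to a list of values; keep the first one of each.
    return {key: values[0] for key, values in query.items()}


# The request captured from the browser in this post.
captured = (
    "http://edu.51cto.com/index.php?do=api&m=index"
    "&lesson_id=16836&sign=1c1a46c7b63a403c15c9d94bd610b887"
)
params = parse_api_request(captured)
print(params["lesson_id"], params["sign"])
```

The lesson_id matches the id in the player URL, so sign is the only part that cannot be derived from the page.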

The sign value is presumably generated by the swf file. The response looks like this:

{"lesson_title":"\u5229\u7528python socketServer\u591a\u7ebf\u7a0b\u5f00\u53d1FTP\u8f6f\u4ef6","lesson_duration":"3493","video_url":"aHR0cDovL3ZpZGVvLjUxY3RvLmNvbS8yMDEzLTEyLzExLzE0NzM2L3lhdmNfNTJhODBmZTFkN2M1Ml8wLm1wND9LRVkxPTdkZTVhOWY2YWI4ZjQ0NzBlOGZkMGE3YjY3MmU0NTBhJktFWTI9NTJmNDY5MDF8aHR0cDovL3ZpZGVvLjUxY3RvLmNvbS8yMDEzLTEyLzExLzE0NzM2L3lhdmNfNTJhODBmZTFkN2M1Ml8xLm1wND9LRVkxPTYzNTgzOWIxMGRhZTg5Y2QxOGRjM2E4NjM0ZjJhNTgxJktFWTI9NTJmNDY5MDF8aHR0cDovL3ZpZGVvLjUxY3RvLmNvbS8yMDEzLTEyLzExLzE0NzM2L3lhdmNfNTJhODBmZTFkN2M1Ml8yLm1wND9LRVkxPWRmNWY0MWVmYmFhNjA4YjhiZTZmNzc3MjZlMWU2MDhmJktFWTI9NTJmNDY5MDF8aHR0cDovL3ZpZGVvLjUxY3RvLmNvbS8yMDEzLTEyLzExLzE0NzM2L3lhdmNfNTJhODBmZTFkN2M1Ml8zLm1wND9LRVkxPTUxMWQ1M2M1MzU5MTk4OTZmZTI3Y2U1MGNiNGNjYWJkJktFWTI9NTJmNDY5MDF8aHR0cDovL3ZpZGVvLjUxY3RvLmNvbS8yMDEzLTEyLzExLzE0NzM2L3lhdmNfNTJhODBmZTFkN2M1Ml80Lm1wND9LRVkxPWViZDY0MzJjYTA1Y2MyMzRhZDEyYTYzODM3ZWJjNzM5JktFWTI9NTJmNDY5MDF8aHR0cDovL3ZpZGVvLjUxY3RvLmNvbS8yMDEzLTEyLzExLzE0NzM2L3lhdmNfNTJhODBmZTFkN2M1Ml81Lm1wND9LRVkxPWFiNjMyMjUzMjBmMGE3ZjM0ZTJkYzkyODU1NWMzNDQwJktFWTI9NTJmNDY5MDF8aHR0cDovL3ZpZGVvLjUxY3RvLmNvbS8yMDEzLTEyLzExLzE0NzM2L3lhdmNfNTJhODBmZTFkN2M1Ml82Lm1wND9LRVkxPWVhMzA4NzQ2MjViZTllMjExYTBjNDBmYjQzOTkwNjQ5JktFWTI9NTJmNDY5MDF8aHR0cDovL3ZpZGVvLjUxY3RvLmNvbS8yMDEzLTEyLzExLzE0NzM2L3lhdmNfNTJhODBmZTFkN2M1Ml83Lm1wND9LRVkxPWVkOGIzMWI1MWVkMTlkNWNhNjE1ZWViNzk1MmUxOTg4JktFWTI9NTJmNDY5MDF8aHR0cDovL3ZpZGVvLjUxY3RvLmNvbS8yMDEzLTEyLzExLzE0NzM2L3lhdmNfNTJhODBmZTFkN2M1Ml84Lm1wND9LRVkxPTExYzliMGU3OGQzMTU3ODBjNzU5NWRhNDcwNzNjNmQ2JktFWTI9NTJmNDY5MDF8aHR0cDovL3ZpZGVvLjUxY3RvLmNvbS8yMDEzLTEyLzExLzE0NzM2L3lhdmNfNTJhODBmZTFkN2M1Ml85Lm1wND9LRVkxPWYxMDQ4Zjg3ODY2YjIwNzJlODAyMjU0MThiNTVlMzg1JktFWTI9NTJmNDY5MDF8aHR0cDovL3ZpZGVvLjUxY3RvLmNvbS8yMDEzLTEyLzExLzE0NzM2L3lhdmNfNTJhODBmZTFkN2M1Ml8xMC5tcDQ\/S0VZMT05ZTgyMDI3ZjBlZGNmZmNiZjg0YWEzZWYwNTMwMmZlYyZLRVkyPTUyZjQ2OTAxfGh0dHA6Ly92aWRlby41MWN0by5jb20vMjAxMy0xMi8xMS8xNDczNi95YXZjXzUyYTgwZmUxZDdjNTJfMTEubXA0P0tFWTE9Mjk1YjIxOTgyYzEyZDYzNDlhZTg3Mjk4MGRkNGUwOTQmS0VZMj01Mm
Y0NjkwMQ==","uid":8343567,"htime":"2580","utime":1391748703,"stime":1391749377,"infotip":"\u6b22\u8fce\u89c2\u770b51CTO\u5b66\u9662\u8bfe\u7a0b\uff01","heartInterval":30000}


Base64-decoding video_url is all it takes; the result is the list of segment URLs, separated by |:

http://video.51cto.com/2013-12/11/14736/yavc_52a80fe1d7c52_0.mp4?KEY1=7de5a9f6ab8f4470e8fd0a7b672e450a&KEY2=52f46901|http://video.51cto.com/2013-12/11/14736/yavc_52a80fe1d7c52_1.mp4?KEY1=635839b10dae89cd18dc3a8634f2a581&KEY2=52f46901|http://video.51cto.com/2013-12/11/14736/yavc_52a80fe1d7c52_2.mp4?KEY1=df5f41efbaa608b8be6f77726e1e608f&KEY2=52f46901|http://video.51cto.com/2013-12/11/14736/yavc_52a80fe1d7c52_3.mp4?KEY1=511d53c535919896fe27ce50cb4ccabd&KEY2=52f46901|http://video.51cto.com/2013-12/11/14736/yavc_52a80fe1d7c52_4.mp4?KEY1=ebd6432ca05cc234ad12a63837ebc739&KEY2=52f46901|http://video.51cto.com/2013-12/11/14736/yavc_52a80fe1d7c52_5.mp4?KEY1=ab63225320f0a7f34e2dc928555c3440&KEY2=52f46901|http://video.51cto.com/2013-12/11/14736/yavc_52a80fe1d7c52_6.mp4?KEY1=ea30874625be9e211a0c40fb43990649&KEY2=52f46901|http://video.51cto.com/2013-12/11/14736/yavc_52a80fe1d7c52_7.mp4?KEY1=ed8b31b51ed19d5ca615eeb7952e1988&KEY2=52f46901|http://video.51cto.com/2013-12/11/14736/yavc_52a80fe1d7c52_8.mp4?KEY1=11c9b0e78d315780c7595da47073c6d6&KEY2=52f46901|http://video.51cto.com/2013-12/11/14736/yavc_52a80fe1d7c52_9.mp4?KEY1=f1048f87866b2072e80225418b55e385&KEY2=52f46901|http://video.51cto.com/2013-12/11/14736/yavc_52a80fe1d7c52_10.mp4
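The decoding step can be sketched in a few lines of Python. This assumes the response has the shape shown above (a JSON object whose video_url field is base64 text, with | between segments). The function name decode_video_urls and the example.com sample data are hypothetical, built here only so the snippet runs on its own.

```python
import base64
import json


def decode_video_urls(response_text):
    """Decode the base64 video_url field and split it into segment URLs."""
    data = json.loads(response_text)
    decoded = base64.b64decode(data["video_url"]).decode("utf-8")
    # Segments are joined with "|"; drop empty pieces just in case.
    return [url for url in decoded.split("|") if url]


# Hypothetical sample response, encoded the same way as the real one.
sample_urls = [
    "http://video.example.com/demo_0.mp4?KEY1=aaa&KEY2=bbb",
    "http://video.example.com/demo_1.mp4?KEY1=ccc&KEY2=bbb",
]
sample_response = json.dumps({
    "lesson_duration": "3493",
    "video_url": base64.b64encode("|".join(sample_urls).encode("utf-8")).decode("ascii"),
})

segments = decode_video_urls(sample_response)
for url in segments:
    print(url)
```

json.loads also takes care of the escaped \/ that appears inside the real video_url string.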


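The sign value is random and cannot be found in the page source, so the only option is to drive a real browser. Python + Selenium can do this. Chrome is fast and ships with its own Flash plugin, so Selenium is driven through chrome driver here.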
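A minimal sketch of the browser step, assuming Selenium and a matching chrome driver are installed and that the Chrome build in use still runs Flash. The helper names player_url and open_player and the fixed wait are illustrative choices, not part of the original setup.

```python
PLAYER_URL = (
    "http://edu.51cto.com/static/js/51player.swf"
    "?id={lesson_id}&autoplay=1&callbackJs=SyPlayerStatus"
)


def player_url(lesson_id):
    """Build the player URL for a lesson, following the example in this post."""
    return PLAYER_URL.format(lesson_id=lesson_id)


def open_player(lesson_id, wait_seconds=10):
    """Load the player in Chrome so the swf issues its signed API request."""
    # Imported here so player_url() can be used without Selenium installed.
    import time
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get(player_url(lesson_id))
        # Assumption: a fixed pause is enough for the request to be sent.
        time.sleep(wait_seconds)
    finally:
        driver.quit()


print(player_url(16836))
```

Calling open_player(16836) only makes the browser send the request; getting the response back out of Chrome is a separate problem.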


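That leaves the problem of getting the data out of Chrome. I tested two approaches. The first is capturing traffic with the Wireshark command line, but it is slow to start and does not write data in real time; there is a delay of several seconds.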


The second approach is ChromeCacheView.exe, which exports the files in Chrome's cache and can be run from the command line.


What remains is to write a looping script with Python + Selenium + ChromeCacheView.exe.
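The overall loop might look roughly like the sketch below. The part that actually drives Chrome and exports the cache is left as a caller-supplied function, fetch_response, which is a hypothetical stand-in: it should return the API response text for a lesson, or None if nothing was captured. fake_fetch and its example.com URLs exist only to make the snippet runnable.

```python
import base64
import json


def collect_segments(lesson_ids, fetch_response):
    """Map each lesson id to its decoded list of segment URLs."""
    results = {}
    for lesson_id in lesson_ids:
        raw = fetch_response(lesson_id)
        if raw is None:
            # Nothing captured for this lesson; skip it and move on.
            continue
        encoded = json.loads(raw)["video_url"]
        decoded = base64.b64decode(encoded).decode("utf-8")
        results[lesson_id] = decoded.split("|")
    return results


def fake_fetch(lesson_id):
    """Hypothetical stand-in for the Selenium + cache export step."""
    urls = "|".join(
        "http://video.example.com/%d_%d.mp4" % (lesson_id, part) for part in range(2)
    )
    payload = base64.b64encode(urls.encode("utf-8")).decode("ascii")
    return json.dumps({"video_url": payload})


found = collect_segments([16836], fake_fetch)
print(found)
```

Keeping the capture step behind a single function makes it easy to swap the ChromeCacheView export for the Wireshark capture without touching the loop.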
