Scraping and Parsing Web Page Data with Python and BeautifulSoup

Below are the page's UI and the way its HTML is organized. The goal is to fetch the page data and parse it.
[Image 1]
 
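Rank | Name | Level | League | Trophies | Donations | Role
-----|------|-------|--------|----------|-----------|----------
#1   |      | 11    |        | 4438     | 379       | Leader
#2   |      | 12    |        | 4344     | 498       | Co-Leader
#3   |      | 11    |        | 4276     | 322       | Co-Leader
#4   |      | 12    |        | 4229     | 264       | Co-Leader
#5   |      | 12    |        | 4220     | 380       | Co-Leader
#6   |      | 11    |        | 4205     | 204       | Co-Leader
#7   |      | 11    |        | 4171     | 308       | Co-Leader
#8   |      | 11    |        | 4154     | 186       | Co-Leader
#9   | hwj  | 10    |        | 4023     | 48        | Elder
#10  |      | 11    |        | 4000     | 202       | Co-Leader
#11  |      | 11    |        | 3893     | 470       | Co-Leader
#12  |      | 11    |        | 3882     | 72        | Co-Leader
#13  |      | 11    |        | 3873     | 199       | Co-Leader
#14  |      | 11    |        | 3862     | 178       | Co-Leader
#15  |      | 10    |        | 3812     | 38        | Co-Leader
#16  |      | 10    |        | 3769     | 42        | Elder
#17  |      | 10    |        | 3692     | 100       | Elder
#18  |      | 11    |        | 3656     | 240       | Co-Leader
#19  |      | 11    |        | 3615     | 376       | Co-Leader
#20  |      | 9     |        | 3550     | 384       | Co-Leader
#21  |      | 11    |        | 3514     | 74        | Elder
#22  |      | 10    |        | 3480     | 188       | Co-Leader
#23  |      | 11    |        | 3459     | 166       | Co-Leader
#24  |      | 10    |        | 3380     | 92        | Co-Leader
#25  |      | 10    |        | 3305     | 43        | Member
#26  |      | 11    |        | 3264     | 168       | Co-Leader
#27  |      | 10    |        | 3253     | 300       | Co-Leader
#28  | RT   | 10    |        | 3230     | 108       | Elder
#29  |      | 11    |        | 3200     | 64        | Elder
#30  | RT   | 10    |        | 3196     | 244       | Elder
#31  |      | 10    |        | 3147     | 341       | Co-Leader
#32  |      | 10    |        | 3030     | 0         | Co-Leader
#33  |      | 9     |        | 3014     | 10        | Member
#34  |      | 10    |        | 2846     | 32        | Elder
#35  |      | 9     |        | 2841     | 286       | Elder
#36  |      | 9     |        | 2811     | 20        | Member
#37  |      | 9     |        | 2806     | 158       | Co-Leader
#38  |      | 9     |        | 2785     | 60        | Member
#39  |      | 10    |        | 2753     | 34        | Member
#40  |      | 8     |        | 2744     | 20        | Member
#41  |      | 9     |        | 2656     | 108       | Member
#42  |      | 8     |        | 2655     | 116       | Co-Leader
#43  |      | 8     |        | 2625     | 30        | Elder
#44  |      | 10    |        | 2623     | 0         | Member
#45  |      | 10    |        | 2544     | 0         | Member
#46  |      | 10    |        | 2542     | 0         | Member
#47  |      | 9     |        | 2472     | 0         | Elder
#48  |      | 10    |        | 2443     | 88        | Member
#49  |      | 9     |        | 2371     | 0         | Member
#50  |      | 9     |        | 2349     | 40        | Member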

 
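[Image 2]
Looking at the page source, we can see that each player's information is stored in a div whose class is clan__rowContainer.

So we can use soup's findAll selector (named find_all in current BeautifulSoup releases) to fetch every player row, then loop over the rows and parse each player's data one by one.

The code: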

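# `soup` is assumed to be a BeautifulSoup object already built from the page's HTML.
# Each clan__rowContainer div holds one player; its clan__row divs are the columns.
for row in soup.findAll("div", attrs={"class": "clan__rowContainer"}):
    user_dict = {}
    for j, col in enumerate(row.findAll("div", attrs={"class": "clan__row"})):
        if j == 0:
            # Rank is rendered as "#1"; drop the leading "#".
            user_dict["rank"] = col.string.strip().replace("#", "")
        elif j == 1:
            user_dict["name"] = col.a.string.strip()
            # str.strip("/profile/") would remove a set of characters from both
            # ends, not the prefix, so remove the path prefix explicitly.
            user_dict["uid"] = col.a.get("href").replace("/profile/", "")
        elif j == 2:
            user_dict["level"] = col.span.string.strip()
        elif j == 3:
            # The league is encoded in a CSS class such as "league__<name>".
            user_dict["league"] = col.contents[1].div.get("class")[0].replace("league__", "")
        elif j == 4:
            user_dict["score"] = col.div.string.strip()
        elif j == 5:
            user_dict["donations"] = col.string.strip()
        elif j == 6:
            user_dict["role"] = col.string.strip()
    print(user_dict)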
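
To try the row-by-row approach without fetching the real page, here is a minimal self-contained sketch. The sample markup, the player names, the uid values and the parse_rows helper are all hypothetical; only the clan__rowContainer and clan__row class names and the /profile/ link prefix come from the code above, and the real page has more columns than this sample.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the structure described above (not the real page).
SAMPLE_HTML = """
<div class="clan__rowContainer">
  <div class="clan__row">#1</div>
  <div class="clan__row"><a href="/profile/ABC123">PlayerOne</a></div>
  <div class="clan__row"><span>11</span></div>
  <div class="clan__row">4438</div>
</div>
<div class="clan__rowContainer">
  <div class="clan__row">#2</div>
  <div class="clan__row"><a href="/profile/pele9">PlayerTwo</a></div>
  <div class="clan__row"><span>12</span></div>
  <div class="clan__row">4344</div>
</div>
"""


def parse_rows(html):
    """Return one dict per clan__rowContainer div found in html."""
    soup = BeautifulSoup(html, "html.parser")
    players = []
    for row in soup.find_all("div", class_="clan__rowContainer"):
        cols = row.find_all("div", class_="clan__row")
        link = cols[1].a
        players.append({
            "rank": cols[0].get_text(strip=True).lstrip("#"),
            "name": link.get_text(strip=True),
            # Remove the path prefix once; "/profile/pele9".strip("/profile/")
            # would also eat the leading "pele" and leave only "9".
            "uid": link["href"].replace("/profile/", "", 1),
            "level": cols[2].get_text(strip=True),
            "trophies": cols[3].get_text(strip=True),
        })
    return players


players = parse_rows(SAMPLE_HTML)
for player in players:
    print(player)
```

Indexing the columns by position keeps the sketch short, but it breaks if the page inserts or reorders columns, so it is worth checking len(cols) before parsing a real page.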
        

   

 
