Learn Web Scraping in Four Weeks (online course): Week 1

Understanding how a web page is built

HTML : <>     structure, like the layout of the rooms in a house
CSS : <div class=" ">   style, like the decoration
JavaScript : <script>  behavior, like the appliances
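HTML:
1. <div></div> marks off a region of the page
2. <div>
    <p>Wow!</p> adds text inside the region
</div>
3. <div class='a'>  applies a style to the region
    <p>Wow!</p>
</div>
4. <li> list item
5. <img> image
6. <h1> heading
7. <a href=""> link; href sets where it points
8. Content inside the <body> tag: visible on the page
9. Content inside the <head> tag: passed to the browser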
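
To see how these tags look from the scraper's side, here is a minimal sketch that parses a small page with BeautifulSoup and reads each tag back. The sample HTML, the summarize helper, and the file name and URL inside it are made up for illustration; it also uses Python's built-in 'html.parser' rather than 'lxml' so there is nothing extra to install.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page written for this note; not from the course materials.
SAMPLE_HTML = """
<html>
  <head><title>Demo page</title></head>
  <body>
    <div class="a">
      <h1>Heading</h1>
      <p>Wow!</p>
      <ul><li>first</li><li>second</li></ul>
      <img src="cat.png">
      <a href="https://example.com">link</a>
    </div>
  </body>
</html>
"""


def summarize(html):
    """Read back one example of each tag from the notes above."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        "title": soup.head.title.get_text(),        # <head>: information for the browser
        "div_class": soup.body.div.get("class"),    # class is returned as a list
        "heading": soup.h1.get_text(),              # <h1>
        "paragraph": soup.p.get_text(),             # <p>
        "items": [li.get_text() for li in soup.find_all("li")],  # <li>
        "img_src": soup.img.get("src"),             # <img>
        "link_href": soup.a.get("href"),            # <a href="">
    }


summary = summarize(SAMPLE_HTML)
print(summary)
```

Text goes through get_text(), while attributes such as src, href, and class go through get().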

And here is the simple scraper I wrote while following along

from bs4 import BeautifulSoup

# Local copy of the course's practice page
path = '/Users/Administrator/Desktop/1.2/index.html'

with open(path, 'r') as wb:
    soup = BeautifulSoup(wb, 'lxml')
    # Each select() call returns a list of every tag matching the CSS selector
    images = soup.select('body > div > div > div.col-md-9 > div > div > div > img')
    names = soup.select('body > div > div > div.col-md-9 > div > div > div > div.caption > h4 > a')
    moneys = soup.select('body > div > div > div.col-md-9 > div > div > div > div.caption > h4.pull-right')
    peoples = soup.select('body > div > div > div.col-md-9 > div > div > div > div.ratings > p.pull-right')
    # Note: this matches every <span> under the ratings paragraphs,
    # so it may return more than one entry per product
    stars = soup.select('body > div > div > div.col-md-9 > div > div > div > div.ratings > p > span')

# zip() pairs the lists by position and stops at the shortest one
for image, money, people, star, name in zip(images, moneys, peoples, stars, names):
    data = {
        'Image': image.get('src'),      # attribute value
        'people': people.get_text(),    # text content
        'money': money.get_text(),
        'name': name.get_text(),
        'star': star.get_text(),
    }
    print(data)
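
Pairing five separate lists with zip only lines up when every selector returns exactly one match per product. A possible alternative, sketched below, is to select each product block first and then look inside it. This is not the course's method. The sample HTML, the thumbnail and star class names, and the parse_cards helper are assumptions made for illustration; only caption, ratings, and pull-right come from the selectors above. It uses the built-in 'html.parser' so it runs without the local file or lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; "thumbnail" and "star" are assumed class names.
SAMPLE_HTML = """
<body>
<div class="thumbnail">
  <img src="img/a.png">
  <div class="caption">
    <h4 class="pull-right">$24.99</h4>
    <h4><a href="#">First Product</a></h4>
  </div>
  <div class="ratings">
    <p class="pull-right">65 reviews</p>
    <p><span class="star"></span><span class="star"></span><span class="star"></span></p>
  </div>
</div>
<div class="thumbnail">
  <img src="img/b.png">
  <div class="caption">
    <h4 class="pull-right">$64.99</h4>
    <h4><a href="#">Second Product</a></h4>
  </div>
  <div class="ratings">
    <p class="pull-right">12 reviews</p>
    <p><span class="star"></span><span class="star"></span><span class="star"></span><span class="star"></span><span class="star"></span></p>
  </div>
</div>
</body>
"""


def parse_cards(html):
    """Build one dict per product by searching inside each product block."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select("div.thumbnail"):
        rows.append({
            "image": card.select_one("img").get("src"),
            "name": card.select_one("div.caption > h4 > a").get_text(),
            "money": card.select_one("div.caption > h4.pull-right").get_text(),
            "people": card.select_one("div.ratings > p.pull-right").get_text(),
            # Count the star spans instead of reading their (empty) text
            "stars": len(card.select("div.ratings > p > span.star")),
        })
    return rows


rows = parse_cards(SAMPLE_HTML)
for row in rows:
    print(row)
```

Because every field is looked up inside the same block, a product with a different number of star spans cannot shift the other fields out of alignment.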
