A Selenium crawler for CNKI (知网)

[Figure 1: Selenium CNKI crawler]
Step one of the CNKI crawler: entering the search conditions.

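Selenium simulates mouse clicks to automate the whole sequence: choosing the search field for each term, typing the term, selecting exact or fuzzy matching, setting the logical operator, and clicking the search button.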

All you need to supply is the search query:

search_words = '摘要:地理探测器(精确) OR 摘要:geodetector(精确)'

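First, parse the query into 4-tuples: (logical operator, search field, search term, exact|fuzzy). The first condition has no operator of its own, so it gets the placeholder 'BEG'.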

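# Prefix a 'BEG' marker so the first condition also starts with a logic token
search_words = 'BEG ' + search_words
pieces = search_words.split(' ')

conditions = []
for p in pieces:
    if p in ['BEG', 'OR', 'AND', 'NOT']:
        # A logic token opens a new condition
        conditions.append([p])
    else:
        # '摘要:地理探测器(精确)' -> ['摘要', '地理探测器', '精确']
        conditions[-1] += p.replace(')', '').replace('(', ':').split(':')
print(conditions)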
'''
[['BEG', '摘要', '地理探测器', '精确'], ['OR', '摘要', 'geodetector', '精确']]
'''
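The splitting above trusts the query to be well formed. As a rough sketch of a stricter variant (not the author's code), the same parsing can be wrapped in a function that uses a regular expression and raises on malformed input. The names `parse_query`, `OPERATORS` and `TERM` are hypothetical, and it returns tuples rather than the lists used above.

```python
import re

OPERATORS = {"AND", "OR", "NOT"}
# One term looks like  field:word(mode), e.g. 摘要:geodetector(精确)
TERM = re.compile(r"^(?P<field>[^:]+):(?P<word>.+)\((?P<mode>[^()]+)\)$")


def parse_query(query):
    """Parse 'f:w(m) OR f:w(m) ...' into (logic, field, word, mode) tuples."""
    conditions = []
    logic = "BEG"  # same placeholder as above for the first condition
    for token in query.split():
        if token in OPERATORS:
            if logic is not None:
                raise ValueError(f"unexpected operator: {token!r}")
            logic = token
            continue
        if logic is None:
            raise ValueError(f"missing operator before: {token!r}")
        m = TERM.match(token)
        if m is None:
            raise ValueError(f"cannot parse term: {token!r}")
        conditions.append((logic, m["field"], m["word"], m["mode"]))
        logic = None  # the next term must be preceded by an operator
    if conditions and logic is not None:
        raise ValueError("query ends with an operator")
    return conditions


parsed = parse_query('摘要:地理探测器(精确) OR 摘要:geodetector(精确)')
print(parsed)
```

Like the original, this splits on whitespace, so a search term that itself contains a space is rejected instead of being silently mis-parsed.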

Then the clicking begins:

# Search field name -> value attribute of its option in the field dropdown
search_type = {
    "主题":"SU",
    "篇关摘":"TKA",
    "关键词":"KY",
    "篇名":"TI",
    "全文":"FT",
    "作者":"AU",
    "第一作者":"FI",
    "通讯作者":"RP",
    "作者单位":"AF",
    "基金":"FU",
    "摘要":"AB",
    "小标题":"CO",
    "参考文献":"RF",
    "分类号":"CLC",
    "文献来源":"LY",
    "DOI":"DOI"
}

# Match mode -> value attribute of its option in the match-mode dropdown
search_fuzzy = {
    "精确": "=",
    "模糊": "%"
}

# Logical operator -> position of its option in the operator dropdown
logical_id = {
    "AND": 0,
    "OR": 1,
    "NOT": 2
}
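Since every parsed condition is later looked up in these three tables, it can be worth checking the conditions before any clicking starts, so that a typo fails fast rather than halfway through the form. A minimal sketch, assuming the `conditions` layout shown earlier; `validate_conditions` and the abbreviated `demo_*` tables are hypothetical names used only for this illustration.

```python
def validate_conditions(conditions, search_type, search_fuzzy, logical_id):
    """Return a list of problems; an empty list means every lookup will succeed."""
    problems = []
    for i, cond in enumerate(conditions):
        if len(cond) != 4:
            problems.append(f"condition {i}: expected 4 fields, got {len(cond)}")
            continue
        logic, field, _word, mode = cond
        if i == 0:
            if logic != "BEG":
                problems.append(f"condition 0: expected 'BEG', got {logic!r}")
        elif logic not in logical_id:
            problems.append(f"condition {i}: unknown operator {logic!r}")
        if field not in search_type:
            problems.append(f"condition {i}: unknown search field {field!r}")
        if mode not in search_fuzzy:
            problems.append(f"condition {i}: unknown match mode {mode!r}")
    return problems


# Abbreviated copies of the tables above, only for this demo
demo_types = {"摘要": "AB", "篇名": "TI"}
demo_fuzzy = {"精确": "=", "模糊": "%"}
demo_logic = {"AND": 0, "OR": 1, "NOT": 2}

good = [['BEG', '摘要', '地理探测器', '精确'], ['OR', '摘要', 'geodetector', '精确']]
bad = [['BEG', '标题', 'GIS', '精确'], ['OR', '摘要', 'GIS']]

print(validate_conditions(good, demo_types, demo_fuzzy, demo_logic))
for problem in validate_conditions(bad, demo_types, demo_fuzzy, demo_logic):
    print(problem)
```

The length check also catches a misspelled operator: the parser above would not recognize it as a logic token and would append it to the previous condition, leaving that condition with more than four fields.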

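import time

from selenium.webdriver.common.by import By

# Assumes `driver` is a Selenium WebDriver with the CNKI search form already open.
sleep_time = 0.5  # seconds to pause after each click so the dropdown can render

search_middle = driver.find_element(By.CLASS_NAME, 'search-middle')
dds = search_middle.find_elements(By.TAG_NAME, 'dd')  # one <dd> per condition row
if len(dds) < len(conditions):
    # Not handled: the form has fewer rows than conditions, so dds[i] below
    # would raise IndexError; more rows would have to be added first.
    pass

for i in range(len(conditions)):

    if i > 0:
        # Logical operator (AND / OR / NOT); the first row has none
        logical_list = dds[i].find_element(By.XPATH, './/div[@class="sort logical"]')
        logical_list.click()
        time.sleep(sleep_time)
        options = logical_list.find_elements(By.XPATH, './/a')
        options[logical_id[conditions[i][0]]].click()
        time.sleep(sleep_time)

    # Search field (abstract, title, ...)
    dds[i].find_element(By.XPATH, './/div[@class="sort reopt"]').click()
    time.sleep(sleep_time)
    dds[i].find_element(
        By.XPATH, './/a[@value="{}"]'.format(search_type[conditions[i][1]])
    ).click()
    time.sleep(sleep_time)

    # Search term
    search_input = dds[i].find_element(By.TAG_NAME, 'input')
    search_input.clear()
    search_input.send_keys(conditions[i][2])

    # Exact or fuzzy matching
    dds[i].find_element(By.XPATH, './/div[@class="sort special"]')\
        .find_element(By.CLASS_NAME, 'sort-default').click()
    time.sleep(sleep_time)
    dds[i].find_element(
        By.XPATH, './/a[@value="{}"]'.format(search_fuzzy[conditions[i][3]])
    ).click()
    time.sleep(sleep_time)

# Submit the search and give the results page time to load
driver.find_element(By.CLASS_NAME, 'search-buttons').click()

time.sleep(5)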
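The fixed pauses work, but they are either too long on a fast connection or too short on a slow one. One alternative is to poll until the element you need actually shows up. Selenium ships `WebDriverWait` for exactly this; the sketch below shows the same idea in plain Python so it can be run without a browser. `wait_until` is a hypothetical helper, not part of Selenium or of the original code.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.2):
    """Call predicate() repeatedly until it returns something truthy.

    Returns that value, or raises TimeoutError once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)


# Stand-in for "the dropdown option has rendered": becomes true on the third check
calls = {"n": 0}

def ready():
    calls["n"] += 1
    return calls["n"] >= 3

wait_until(ready, timeout=2.0, interval=0.01)
print(calls["n"])  # 3

# With a real page, the predicate could be a lookup that returns an empty
# (falsy) list until the option exists, for example:
#   wait_until(lambda: dds[i].find_elements(By.XPATH, './/a[@value="AB"]'))
```

Returning the predicate's value means the found elements can be used directly, without looking them up a second time.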

Result:
[Figure 2: Selenium CNKI crawler]

Follow the author to download the complete code for free:

https://download.csdn.net/download/itnerd/12832133

Note: the code was written in a Jupyter notebook, a handy tool that is well worth learning.
