Python + Nox emulator (夜神模拟器) + Appium: scraping data from the Xiaohongshu (小红书) app

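The code must be adapted to your own emulator setup. The element identifiers used below differ from device to device, so look them up on your device and change them accordingly.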

Environment setup

Install the Appium Python client by running:

pip install Appium-Python-Client

Install Android Studio

(it bundles the Android SDK)
Download: https://developer.android.google.cn/studio/

Open a command prompt in the emulator's bin directory (D:\Nox\bin) and run adb devices. If the device is listed, the Android SDK environment is configured correctly.

# Connect to the emulator's port
nox_adb.exe connect 127.0.0.1:62001

# adb command:
adb devices

# Command output:
List of devices attached
127.0.0.1:62001    device
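
The check above can also be scripted. The following is a minimal sketch, not part of the original setup: `parse_adb_devices` is a hypothetical helper that parses text in the format shown above. In practice the text would come from running adb devices (for example through `subprocess`); here it is fed the sample output directly.

```python
def parse_adb_devices(output):
    """Return a list of (serial, state) tuples parsed from `adb devices` output."""
    devices = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the header, and adb daemon status lines (they start with "*").
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices.append((parts[0], parts[1]))
    return devices


# Sample text copied from the command output above.
sample = """List of devices attached
127.0.0.1:62001    device
"""
print(parse_adb_devices(sample))  # [('127.0.0.1:62001', 'device')]
```

A state of `device` means the emulator is connected and ready. Any other state reported by adb, or an empty list, means the connection step should be repeated.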



Installing Appium Desktop

1. Download: https://github.com/appium/appium-desktop/releases/tag/v1.13.0

2. Open Appium and, under "Edit Configurations", set the ANDROID_HOME and JAVA_HOME paths

3. After starting the server, click "Start Inspector Session"


Save the capabilities that describe the connected device
{
"platformName": "Android",
"deviceName": "DUK_AL20",
"appPackage": "com.xingin.xhs",
"platformVersion": "9",
"appActivity": ".activity.SplashActivity",
"noReset": true,
"unicodeKeyboard": true
}

 
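// Common capabilities explained:
deviceName: name of the device to launch; it can be found in the device settings
automationName: automation engine to use; defaults to Appium
platformName: mobile platform, Android or iOS
platformVersion: OS version of the platform, for example 8.1.0 for Android
appActivity: Activity of the app under test. Note that for a native app the activity name is prefixed with "." (that is, it is given relative to appPackage)
appPackage: package name of the app under test
udid: device serial; once the phone is connected, adb devices shows it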
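
As an illustration of how the saved capabilities can be reused from Python, here is a small sketch. `load_caps` and `REQUIRED_KEYS` are hypothetical names introduced for this example, and the choice of which keys to treat as required is an assumption rather than an Appium rule.

```python
import json

# Assumption: the keys this tutorial relies on; Appium itself may accept fewer.
REQUIRED_KEYS = ("platformName", "deviceName", "appPackage", "appActivity")


def load_caps(text):
    """Parse capabilities saved as JSON and check that the expected keys exist."""
    caps = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in caps]
    if missing:
        raise ValueError("missing capabilities: " + ", ".join(missing))
    return caps


# Capabilities as saved from the inspector session.
inspector_json = """
{
  "platformName": "Android",
  "deviceName": "DUK_AL20",
  "appPackage": "com.xingin.xhs",
  "platformVersion": "9",
  "appActivity": ".activity.SplashActivity",
  "noReset": true,
  "unicodeKeyboard": true
}
"""
caps = load_caps(inspector_json)
print(caps["appPackage"], caps["noReset"])  # com.xingin.xhs True
```

Note that JSON `true` becomes Python `True` after parsing, which is the form a Python capabilities dictionary needs.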
	

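How to get appPackage and appActivity:
The simplest and most reliable way is the command line. With the Appium client connected to the app, open the app to the page whose appActivity you need, then run: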

adb shell
dumpsys activity | grep mFocusedActivity


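Code

Locator strategies (these find_element_by_* helpers come from older Selenium/Appium Python client releases; Selenium 4.3 and later removed them in favor of find_element(by, value))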
driver.find_element_by_id("XXXX")
driver.find_element_by_xpath("XXXX")
driver.find_element_by_class_name("XXXX")
driver.find_element_by_link_text("XXXX")
driver.find_element_by_name("XXXX")
driver.find_element_by_css_selector("XXXX")

# Click
driver.find_element_by_id("XXXX").click()

# Type text
driver.find_element_by_id("XXXX").send_keys("XXXX")

# Tap at a screen coordinate
driver.tap([(983, 1820)])

# Swipe the screen (x1, y1: start point; x2, y2: end point; a vertical swipe keeps x2 equal to x1)
driver.swipe(x1, y1, x1, y2)
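
To make the swipe coordinates concrete, the sketch below computes them from a screen size without needing a device. `vertical_swipe` is a hypothetical helper, and the 1080x1920 resolution is an assumed example value. Real code would take the size from `driver.get_window_size()`.

```python
def vertical_swipe(width, height, start_ratio, end_ratio):
    """Return (x1, y1, x2, y2) for a vertical swipe along the screen's center line."""
    x = int(width * 0.5)
    return x, int(height * start_ratio), x, int(height * end_ratio)


# Assumed screen size for illustration only.
up = vertical_swipe(1080, 1920, 0.9, 0.1)    # finger moves up, content scrolls down
down = vertical_swipe(1080, 1920, 0.1, 0.9)  # finger moves down, content scrolls up
print(up)
print(down)
```

The resulting tuple can be passed on as `driver.swipe(*up)`. Swapping the two ratios is all that distinguishes the two directions.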

import time
import datetime

from appium import webdriver

# 1. Capabilities used to launch the app
caps = {
    "platformName": "Android",
    "deviceName": "127.0.0.1:62001",
    'platformVersion': '5.1.1',
    "appPackage": "com.xingin.xhs",
    #"appActivity": ".antispam.CaptchaActivity",  # captcha activity
    "appActivity": ".index.v2.IndexActivityV2",  # home page
    "noReset": True,
    "automationName": "UiAutomator1",
    "unicodeKeyboard": True
}


# Get the screen size as (width, height)
def getSize():
    size = driver.get_window_size()
    return (size['width'], size['height'])


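# Scroll the content down (the finger swipes up, from 90% to 10% of the screen height)
def swipeUp():
    l = getSize()
    x1 = int(l[0] * 0.5)  # x coordinate
    y1 = int(l[1] * 0.9)  # start y coordinate
    y2 = int(l[1] * 0.1)  # end y coordinate
    driver.swipe(x1, y1, x1, y2)


# Scroll the content up (the finger swipes down, from 10% to 90% of the screen height)
def swipeDown():
    l = getSize()
    x1 = int(l[0] * 0.5)  # x coordinate
    y1 = int(l[1] * 0.1)  # start y coordinate
    y2 = int(l[1] * 0.9)  # end y coordinate
    driver.swipe(x1, y1, x1, y2)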


# Check whether the given element exists
def isElement(driver, identifyBy, c):
    time.sleep(1)
    flag = False
    try:
        if identifyBy == "id":
            driver.find_element_by_id(c)
        elif identifyBy == "xpath":
            driver.find_element_by_xpath(c)
        elif identifyBy == "class":
            driver.find_element_by_class_name(c)
        elif identifyBy == "link text":
            driver.find_element_by_link_text(c)
        elif identifyBy == "partial link text":
            driver.find_element_by_partial_link_text(c)
        elif identifyBy == "name":
            driver.find_element_by_name(c)
        elif identifyBy == "tag name":
            driver.find_element_by_tag_name(c)
        elif identifyBy == "css selector":
            driver.find_element_by_css_selector(c)
        else:
            # An unknown strategy must not be reported as "element found"
            raise ValueError("unsupported locator strategy: " + identifyBy)
        flag = True
    except Exception as e:
        # Element not found (or unsupported locator strategy)
        flag = False
    finally:
        return flag




if __name__ == '__main__':
    # Open the app
    driver = webdriver.Remote('http://127.0.0.1:4723/wd/hub', caps)

    print('Current time', datetime.datetime.now())

    xx = getSize()
    print('Screen size', xx)
    # Tap the search box on the home page
    driver.find_element_by_id("com.xingin.xhs:id/bc6").click()
    #driver.find_element_by_xpath("/hierarchy/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/androidx.drawerlayout.widget.DrawerLayout/android.widget.LinearLayout[1]/android.widget.RelativeLayout/androidx.viewpager.widget.ViewPager/android.widget.LinearLayout/android.widget.FrameLayout[1]/android.widget.FrameLayout/android.widget.FrameLayout[2]/android.widget.FrameLayout/android.widget.FrameLayout").click()
    time.sleep(2)
    # Type the search keyword
    driver.find_element_by_id('com.xingin.xhs:id/bdd').send_keys('离人泪')
    #driver.find_element_by_xpath('/hierarchy/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.LinearLayout/android.widget.LinearLayout/android.widget.LinearLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.EditText').send_keys('离人泪')
    time.sleep(1)
    # Tap the search button
    driver.find_element_by_id("com.xingin.xhs:id/bdg").click()
    #driver.find_element_by_xpath('/hierarchy/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.LinearLayout/android.widget.LinearLayout/android.widget.LinearLayout/android.widget.LinearLayout/android.widget.TextView').click()
    time.sleep(1)

    # Tap the "综合" (general sort) option
    #driver.find_element_by_xpath('/hierarchy/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.view.View/androidx.viewpager.widget.ViewPager/android.widget.FrameLayout/android.view.View/android.widget.RelativeLayout/android.widget.RelativeLayout/android.widget.RelativeLayout/android.widget.LinearLayout/android.widget.TextView[1]').click()
    driver.find_element_by_id('com.xingin.xhs:id/bd4').click()
    time.sleep(1)

    # Tap "筛选" (Filter)
    driver.find_element_by_id('com.xingin.xhs:id/bd5').click()
    #driver.find_element_by_xpath('/hierarchy/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.view.View/androidx.viewpager.widget.ViewPager/android.widget.FrameLayout/android.view.View/android.widget.RelativeLayout/android.widget.RelativeLayout/android.widget.RelativeLayout/android.widget.TextView').click()
    time.sleep(1)

    # Tap "普通笔记" (regular notes)
    driver.find_element_by_xpath('/hierarchy/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.view.View[1]/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.view.View/android.widget.TextView[1]').click()
    time.sleep(1)

    # Tap "完成" (Done)
    driver.find_element_by_id('com.xingin.xhs:id/bc1').click()
    #driver.find_element_by_xpath('/hierarchy/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.LinearLayout/android.widget.TextView[2]').click()
    time.sleep(1)

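    # Counts how many result pages have been processed (one swipe per page)
    count = 1
    # Running number of the note being read
    wzs = 1
    tt = True

    while tt:
        # Collect the data
        # On the device the item index i changes dynamically; one page shows at most about 8 items
        for i in range(1, 8):
            try:
                # Tap the item to open its detail page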
                driver.find_element_by_xpath('/hierarchy/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.view.View/androidx.viewpager.widget.ViewPager/android.widget.FrameLayout/android.view.View/android.widget.FrameLayout[' + str(i) + ']/android.widget.FrameLayout/android.widget.LinearLayout').click()

                print("Item " + str(wzs))

                if isElement(driver, "id", "com.xingin.xhs:id/btf"):
                    title = driver.find_element_by_id('com.xingin.xhs:id/btf').text
                    print('Title:', title)

                if isElement(driver, "id", "com.xingin.xhs:id/an4"):
                    wz = driver.find_element_by_id('com.xingin.xhs:id/an4').text
                    print('Content:', wz)

                wzs += 1
                # Tap the back button
                driver.find_element_by_id('com.xingin.xhs:id/k1').click()
                time.sleep(1)
            except Exception as e:
                # No item at this index (or a lookup failed): stop scanning this page
                break

        count += 1
        if count < 6 :
            swipeUp()
            time.sleep(1)
        else :
            tt = False

    print(datetime.datetime.now())
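
The script above pauses with fixed time.sleep calls between steps. One possible alternative, shown here only as a sketch, is to poll a condition until it holds or a timeout expires. `wait_until` is a hypothetical helper, not part of Appium, and the timeout values are arbitrary assumptions. The demo predicate stands in for a real element check so the example runs without a device.

```python
import time


def wait_until(predicate, timeout=5.0, interval=0.5):
    """Call predicate() repeatedly until it returns a truthy value or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for an element check: becomes true on the third call.
calls = {"n": 0}


def ready_on_third_call():
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_until(ready_on_third_call, timeout=2.0, interval=0.01))  # True
```

In the scraper, the predicate could wrap an element lookup such as the isElement function, so that fast pages are not delayed and slow pages get more time before the lookup is abandoned.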

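Problems encountered

Problem: developer mode is enabled and everything is configured, but adb devices never lists the connected phone
Cause 1: the phone needs developer mode enabled, the USB connection mode set to file transfer, and USB debugging allowed
Cause 2: the USB cable only supports charging, not data transfer
Cause 3: the adb server needs to be restarted: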

adb kill-server
adb start-server

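Problem: how to obtain appPackage and the other parameters required by the Appium capabilities
Solution 1: open the app you want to scrape, then run the following command in a terminal to view the activity information

// Query command
adb shell dumpsys window w | grep mCurrent

// Result (com.tencent.mm is the appPackage; .ui.LauncherUI is the activity)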
mCurrentFocus=Window{7da7888 u0 com.tencent.mm/com.tencent.mm.ui.LauncherUI}
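
The mapping from this output to the two capabilities can be sketched in Python. `parse_current_focus` is a hypothetical helper, and the regular expression assumes the exact line layout shown above; other Android versions may print the line differently.

```python
import re

# Assumes the layout "mCurrentFocus=Window{<id> <user> <package>/<activity>}".
FOCUS_RE = re.compile(r"mCurrentFocus=Window\{\S+ \S+ ([\w.]+)/([\w.$]+)\}")


def parse_current_focus(line):
    """Return (appPackage, appActivity) from an mCurrentFocus line, or None if it does not match."""
    match = FOCUS_RE.search(line)
    if match is None:
        return None
    package, activity = match.groups()
    # Shorten a fully qualified activity to the leading-dot form relative to the package.
    if activity.startswith(package + "."):
        activity = activity[len(package):]
    return package, activity


line = "mCurrentFocus=Window{7da7888 u0 com.tencent.mm/com.tencent.mm.ui.LauncherUI}"
print(parse_current_focus(line))  # ('com.tencent.mm', '.ui.LauncherUI')
```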

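Solution 2: for the Xiaohongshu app, the information obtained with the command above is not the actual launch activity

// Query command
adb shell dumpsys activity activities
// Result (in the realActivity field, com.xingin.xhs is the appPackage and .activity.SplashActivity is the activity)
// The output is very long; only an excerpt is shown below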
ACTIVITY MANAGER ACTIVITIES (dumpsys activity activities)
Display #0 (activities from top to bottom):
 
  Stack #11: type=standard mode=fullscreen
  isSleeping=false
  mBounds=Rect(0, 0 - 0, 0)
    Task id #454
    mBounds=Rect(0, 0 - 0, 0)
    mMinWidth=-1
    mMinHeight=-1
    mLastNonFullscreenBounds=null
    * TaskRecord{77f77e4 #454 A=com.xingin.xhs U=0 StackId=11 sz=1}
      userId=0 effectiveUid=u0a230 mCallingUid=u0a230 mUserSetupComplete=true mCallingPackage=com.xingin.xhs
      affinity=com.xingin.xhs
      intent={act=android.intent.action.MAIN cat=[android.intent.category.LAUNCHER] flg=0x10200000 cmp=com.xingin.xhs/.activity.SplashActivity}
      realActivity=com.xingin.xhs/.activity.SplashActivity
      autoRemoveRecents=false isPersistable=true numFullscreen=1 activityType=1
      rootWasReset=true mNeverRelinquishIdentity=true mReuseTask=false mLockTaskAuth=LOCK_TASK_AUTH_PINNABLE
      Activities=[ActivityRecord{fca8591 u0 com.xingin.xhs/.index.IndexNewActivity t454}]
      askedCompatMode=false inRecents=true isAvailable=true
      mRootProcess=ProcessRecord{d833203 26131:com.xingin.xhs/u0a230}
      stackId=11
      hasBeenVisible=true mResizeMode=RESIZE_MODE_UNRESIZEABLE mSupportsPictureInPicture=false isResizeable=false lastActiveTime=48960562 (inactive for 8s)
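
Because the dump is long, searching it by hand is tedious. The sketch below pulls out every realActivity entry with a regular expression. `find_real_activities` is a hypothetical helper, and the pattern assumes the `realActivity=<package>/<activity>` form seen in the excerpt above.

```python
import re


def find_real_activities(dump_text):
    """Return a list of (appPackage, appActivity) pairs from realActivity= lines."""
    return re.findall(r"realActivity=([\w.]+)/([\w.$]+)", dump_text)


# Two lines copied from the excerpt above.
excerpt = """
      affinity=com.xingin.xhs
      realActivity=com.xingin.xhs/.activity.SplashActivity
"""
print(find_real_activities(excerpt))  # [('com.xingin.xhs', '.activity.SplashActivity')]
```

If the dump contains several tasks, the list has one pair per task, and the entry for the target package is the one to copy into the capabilities.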
