Scraping App Data with Appium + Python + an Android Device

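I. Introduction

A brief introduction to Appium

Appium is an open-source test automation tool that supports native, web, and hybrid apps on both iOS and Android.

Appium is cross-platform: it runs on macOS, Windows, and Linux. It also works with many programming languages because it follows a client/server design: any client that can send HTTP requests to the server can drive it.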
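
To make the client/server idea concrete, here is a minimal sketch of the kind of HTTP request a client sends to open a session. It only builds the request and does not send it. The server address, the example capabilities, and the helper name build_new_session_request are assumptions for illustration; the body shape follows the JSON Wire Protocol style that Appium 1.x servers accept.

```python
import json
import urllib.request

# Assumed values: the default local Appium 1.x address and a tiny capability set.
SERVER_URL = "http://127.0.0.1:4723/wd/hub"
caps = {"platformName": "Android", "deviceName": "DUK_AL20"}


def build_new_session_request(server_url, capabilities):
    """Build (but do not send) the HTTP request that asks the server for a session."""
    body = json.dumps({"desiredCapabilities": capabilities}).encode("utf-8")
    return urllib.request.Request(
        url=server_url + "/session",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_new_session_request(SERVER_URL, caps)
print(request.get_method(), request.full_url)
```

In practice the Python client library builds and sends requests like this one, which is why any language with an HTTP client can talk to the same server.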

How it works

[Figure 1: how Appium works]

II. Environment Setup

Install the Appium Python client

Run:

pip install Appium-Python-Client

Install Android Studio (it bundles the Android SDK)

1. Download: https://developer.android.google.cn/studio/

2. Add the environment variables to ~/.bash_profile

export ANDROID_HOME=/Users/linchen/Library/Android/sdk
export PATH=${PATH}:${ANDROID_HOME}/platform-tools
export PATH=${PATH}:${ANDROID_HOME}/tools
export PATH=${PATH}:${ANDROID_HOME}/build-tools/29.0.0

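3. After installation, look up the Android SDK location in the Android Studio settings (for example /Users/linchen/Library/Android/sdk). This is the value used for ANDROID_HOME above, and it is needed again later.
[Figure 2]

4. Connect the phone to the computer (on Android, enable developer mode and select the file-transfer USB mode), then run adb devices in a terminal. If the phone is listed, the Android SDK environment is set up correctly.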

# adb command:
adb devices

# Output:
List of devices attached
FFK0217930005760    device
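
If you want a script to check the connection before starting a scraping run, the output above is easy to parse. The following is a small sketch; the function name parse_adb_devices is hypothetical, and the sample text is the output shown above.

```python
def parse_adb_devices(output):
    """Return {serial: state} parsed from the text printed by `adb devices`."""
    devices = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the header line, and adb daemon messages (they start with "*").
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


sample = """List of devices attached
FFK0217930005760    device
"""
devices = parse_adb_devices(sample)
print(devices)
```

With a real phone attached, the text could come from subprocess.run(["adb", "devices"], capture_output=True, text=True).stdout. A state of "device" means the phone is ready; "unauthorized" usually means the USB debugging prompt on the phone has not been accepted yet.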

Install Appium Desktop

1. Download: https://github.com/appium/appium-desktop/releases/tag/v1.13.0

2. Open Appium and set the ANDROID_HOME and JAVA_HOME paths under "Edit Configurations"
[Figure 3]
3. After starting the server, click "Start Inspector Session"
[Figure 4]
Save the capabilities that describe the connected phone
[Figure 5]

{
    "platformName": "Android",
    "deviceName": "DUK_AL20",
    "appPackage": "com.xingin.xhs",
    "platformVersion": "9",
    "appActivity": ".activity.SplashActivity",
    "noReset": true,
    "unicodeKeyboard": true
}
 
 
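// Common capabilities:
deviceName: name of the device to launch; it can be found in the phone's settings
automationName: automation engine to use; defaults to Appium
platformName: mobile platform, Android or iOS
platformVersion: OS version of the platform, for example 8.1.0 for Android
appActivity: Activity of the app under test; for a native app, prefix the Activity name with "." so that it is resolved relative to appPackage
appPackage: package name of the app under test
udid: serial number of the phone; once the phone is connected, adb devices shows it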
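
As a rough illustration of how these fields fit together, the sketch below assembles a capabilities dictionary and expands a leading-dot Activity name into its fully qualified form. Both helper names (build_caps and full_activity_name) are hypothetical and not part of Appium; the values are the ones used in this article.

```python
def build_caps(device_name, platform_version, app_package, app_activity, udid=None):
    """Assemble a capabilities dict from the fields explained above."""
    caps = {
        "platformName": "Android",
        "deviceName": device_name,
        "platformVersion": platform_version,
        "appPackage": app_package,
        "appActivity": app_activity,
    }
    # udid is optional; it matters mostly when several devices are attached.
    if udid is not None:
        caps["udid"] = udid
    return caps


def full_activity_name(app_package, app_activity):
    """Expand a leading-dot Activity name into its fully qualified form."""
    if app_activity.startswith("."):
        return app_package + app_activity
    return app_activity


caps = build_caps("DUK_AL20", "9", "com.xingin.xhs", ".activity.SplashActivity",
                  udid="FFK0217930005760")
print(caps)
print(full_activity_name(caps["appPackage"], caps["appActivity"]))
```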

4. Click an element in the left pane; the right pane then shows its attributes, most importantly its xpath and resource-id
[Figure 6]

III. Code

Basic usage

1. Launch the app

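from appium import webdriver

caps = {
    "platformName": "Android",  # platform name
    "deviceName": "DUK_AL20",  # device name (Honor V9)
    "appPackage": "com.xingin.xhs",  # package name of the app
    "platformVersion": "9",  # OS version
    "appActivity": ".activity.SplashActivity",  # Activity to launch
    "noReset": True,  # keep app data between sessions, so the login is preserved
    "unicodeKeyboard": True  # use the Unicode keyboard so Chinese text can be typed
}
# Connect to the local Appium server started in the previous section
driver = webdriver.Remote('http://127.0.0.1:4723/wd/hub', caps)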

2. Basic operations

# Locator strategies. The find_element_by_* helpers belong to older client releases;
# clients built on Selenium 4 use driver.find_element(by, value) instead.
driver.find_element_by_id("XXXX")
driver.find_element_by_xpath("XXXX")
driver.find_element_by_class_name("XXXX")
driver.find_element_by_link_text("XXXX")
driver.find_element_by_name("XXXX")
driver.find_element_by_css_selector("XXXX")

# Click
driver.find_element_by_id("XXXX").click()

# Type text
driver.find_element_by_id("XXXX").send_keys("XXXX")

# Tap at screen coordinates
driver.tap([(983, 1820)])

# Swipe the screen (x1, y1: start point; x2, y2: end point)
driver.swipe(x1, y1, x2, y2)
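
Hard-coded coordinates only fit one screen size, so it can help to derive the swipe points from the screen dimensions. The sketch below is one way to do that; the function name vertical_swipe_points and the 1080x1920 screen are assumptions for illustration, and integer percentages are used to avoid floating-point rounding.

```python
def vertical_swipe_points(width, height, start_pct=90, end_pct=10):
    """Return (x1, y1, x2, y2) for a vertical swipe along the screen's center line.

    start_pct and end_pct are percentages of the screen height, measured from the top.
    """
    x = width // 2
    y_start = height * start_pct // 100
    y_end = height * end_pct // 100
    return (x, y_start, x, y_end)


# Example with an assumed 1080x1920 screen: drag from near the bottom to near the top.
points = vertical_swipe_points(1080, 1920)
print(points)
```

With a live session, the tuple can be passed on as driver.swipe(*points), and the real width and height come from driver.get_window_size().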

Code for scraping the Xiaohongshu app

import time
import datetime
import pymysql
 
from appium import webdriver
 
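caps = {
    "platformName": "Android",  # platform name
    "deviceName": "DUK_AL20",  # device name (Honor V9)
    "appPackage": "com.xingin.xhs",  # package name of the app
    "platformVersion": "9",  # OS version
    "appActivity": ".activity.SplashActivity",  # Activity to launch
    "noReset": True,  # keep app data between sessions, so the login is preserved
    "unicodeKeyboard": True  # use the Unicode keyboard so Chinese text can be typed
}

# Get the screen size as (width, height)
def getSize():
    size = driver.get_window_size()
    return (size['width'], size['height'])

# Swipe up: drag from 90% to 10% of the screen height, which scrolls the content down
def swipeUp():
    size = getSize()
    x1 = int(size[0] * 0.5)  # x coordinate
    y1 = int(size[1] * 0.9)  # start y coordinate
    y2 = int(size[1] * 0.1)  # end y coordinate
    driver.swipe(x1, y1, x1, y2)

# Swipe down: drag from 25% to 75% of the screen height, which scrolls the content up
def swipeDown():
    size = getSize()
    x1 = int(size[0] * 0.5)  # x coordinate
    y1 = int(size[1] * 0.25)  # start y coordinate
    y2 = int(size[1] * 0.75)  # end y coordinate
    driver.swipe(x1, y1, x1, y2)

# Check whether the given element exists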
def isElement(driver, identifyBy, c):
    time.sleep(1)
    flag = False
    try:
        if identifyBy == "id":
            driver.find_element_by_id(c)
        elif identifyBy == "xpath":
            driver.find_element_by_xpath(c)
        elif identifyBy == "class":
            driver.find_element_by_class_name(c)
        elif identifyBy == "link text":
            driver.find_element_by_link_text(c)
        elif identifyBy == "partial link text":
            driver.find_element_by_partial_link_text(c)
        elif identifyBy == "name":
            driver.find_element_by_name(c)
        elif identifyBy == "tag name":
            driver.find_element_by_tag_name(c)
        elif identifyBy == "css selector":
            driver.find_element_by_css_selector(c)
        flag = True
    except Exception:
        # Any lookup failure is treated as "element not present"
        flag = False
    print("element found:", flag)
    return flag
 
 
if __name__ == '__main__':
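    # Load the authors to scrape from MySQL (PyMySQL 1.0+ requires keyword arguments)
    db = pymysql.connect(host="10.0.1.25", user="phper", password="phper111", database="explosive", charset='utf8')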
    cursor = db.cursor()
    sql = "select * from crawl_author_config where type = 4"
    cursor.execute(sql)
    results = cursor.fetchall()
    db.close()
 
    # Open the app
    driver = webdriver.Remote('http://127.0.0.1:4723/wd/hub', caps)
 
    time.sleep(2)
    print(datetime.datetime.now())
    for row in results:
        # Tap the search box on the home page
        # t = driver.find_element_by_id("com.xingin.xhs:id/a4t").click()
        t = driver.find_element_by_xpath("/hierarchy/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.support.v4.widget.DrawerLayout/android.widget.LinearLayout/android.widget.RelativeLayout/android.support.v4.view.ViewPager/android.widget.LinearLayout/android.widget.LinearLayout/android.widget.FrameLayout[2]/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.TextView").click()
        time.sleep(2)
        # Type the name of the user to search for
        print("Scraping: " + str(row[4]))
        # text = driver.find_element_by_id("com.xingin.xhs:id/aza")
        text = driver.find_element_by_xpath("/hierarchy/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.LinearLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.EditText").send_keys(str(row[4]))
        time.sleep(0.5)
        # Tap the search button
        # driver.find_element_by_id("com.xingin.xhs:id/ayx").click()
        driver.find_element_by_xpath("/hierarchy/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.LinearLayout/android.widget.LinearLayout/android.widget.TextView").click()
        time.sleep(2)
        # Tap the "Users" tab of the search results
        # driver.find_element_by_id("com.xingin.xhs:id/c8t").click()
        driver.find_element_by_xpath("/hierarchy/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.view.ViewGroup/android.widget.LinearLayout/android.widget.HorizontalScrollView/android.widget.LinearLayout/android.support.v7.app.a.b[3]/android.widget.TextView").click()
        time.sleep(1)
        # Tap the first result
        driver.find_element_by_xpath("/hierarchy/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.view.ViewGroup/android.view.ViewGroup/android.widget.FrameLayout/android.view.ViewGroup/android.widget.RelativeLayout[1]/android.widget.LinearLayout/android.widget.TextView[1]").click()
        time.sleep(1)
        # Tap the "videos only" filter
        driver.find_element_by_xpath("/hierarchy/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.RelativeLayout/android.view.ViewGroup/android.view.ViewGroup/android.view.ViewGroup/android.view.ViewGroup/android.widget.LinearLayout/android.view.ViewGroup/android.widget.FrameLayout[2]/android.widget.TextView").click()
        time.sleep(1)
 
        # Scroll the list down
        swipeUp()
        # Collect the data
        for i in range(1, 7):
            try:
                # Tap the item to open its detail page
                driver.find_element_by_xpath("/hierarchy/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.RelativeLayout/android.view.ViewGroup/android.view.ViewGroup/android.view.ViewGroup/android.view.ViewGroup/android.widget.FrameLayout[" + str(i) + "]/android.widget.LinearLayout").click()
                time.sleep(1)
                # Tap "show more" to expand the description
                driver.find_element_by_xpath("/hierarchy/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.view.ViewGroup/android.view.ViewGroup/android.view.ViewGroup/android.view.ViewGroup/android.widget.FrameLayout[3]/android.widget.TextView").click()
                time.sleep(1)
                # Detail content
                print("Item " + str(i))
                if(isElement(driver,"id","com.xingin.xhs:id/b9d")):
                    print("desc:"+driver.find_element_by_id("com.xingin.xhs:id/b9d").text.replace('\n', '').replace('\r', ''))
                else:
                    print("desc:"+driver.find_element_by_xpath("/hierarchy/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.view.ViewGroup/android.view.ViewGroup/android.view.ViewGroup/android.view.ViewGroup/android.widget.FrameLayout[3]/android.widget.TextView").text.replace('\n', '').replace('\r', ''))
                if(isElement(driver,"xpath","/hierarchy/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.view.ViewGroup/android.view.ViewGroup/android.view.ViewGroup/android.view.ViewGroup/android.widget.LinearLayout[1]/android.view.ViewGroup/android.widget.TextView")):
                    print("tag:"+driver.find_element_by_xpath("/hierarchy/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.view.ViewGroup/android.view.ViewGroup/android.view.ViewGroup/android.view.ViewGroup/android.widget.LinearLayout[1]/android.view.ViewGroup/android.widget.TextView").text)
                else:
                    print("no tag")
                print("likeNum:" + driver.find_element_by_id("com.xingin.xhs:id/likeTextView").text)
                print("collectNum:" + driver.find_element_by_id("com.xingin.xhs:id/qz").text)
                print("commentNum:" + driver.find_element_by_id("com.xingin.xhs:id/commentTextView").text)
                print("==========")
                # Tap the back button
                driver.find_element_by_id("com.xingin.xhs:id/backButton").click()
                time.sleep(1)
            except Exception as e:
                continue
 
        driver.find_element_by_xpath("/hierarchy/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.RelativeLayout/android.view.ViewGroup/android.view.ViewGroup/android.widget.LinearLayout/android.widget.FrameLayout/android.view.ViewGroup/android.widget.LinearLayout/android.view.ViewGroup/android.widget.ImageView[1]").click()
        time.sleep(0.5)
        driver.find_element_by_xpath("/hierarchy/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.view.ViewGroup/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.ImageView").click()
        time.sleep(0.5)
        driver.find_element_by_xpath("/hierarchy/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.FrameLayout/android.widget.FrameLayout/android.widget.LinearLayout/android.widget.LinearLayout/android.widget.LinearLayout/android.widget.ImageView").click()
        time.sleep(0.5)
 
    print(datetime.datetime.now())
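
The script above relies on fixed time.sleep calls, which are either too short on a slow screen or wasteful on a fast one. One common alternative is to poll until the element shows up. The sketch below is a generic, standard-library-only version of that idea; the name wait_until and the simulated lookup are assumptions for illustration, not part of Appium.

```python
import time


def wait_until(action, timeout=10.0, interval=0.5):
    """Call `action` until it returns without raising, or re-raise after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            return action()
        except Exception:
            # Give up and surface the last error once the deadline has passed.
            if time.monotonic() >= deadline:
                raise
            time.sleep(interval)


# Simulated lookup that fails twice before succeeding, standing in for a real element search.
attempts = {"count": 0}


def flaky_lookup():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise LookupError("element not on screen yet")
    return "element"


result = wait_until(flaky_lookup, timeout=5.0, interval=0.01)
print(result, attempts["count"])
```

In the scraper, the action could be a lambda such as lambda: driver.find_element_by_id("com.xingin.xhs:id/backButton"). Selenium's WebDriverWait offers the same pattern as a ready-made class.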

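IV. Problems Encountered

Problem: developer mode is on and everything is configured, but adb devices never lists the connected phone
Possible cause 1: the phone must have developer mode enabled, the USB mode set to file transfer, and USB debugging allowed
Possible cause 2: the USB cable is charge-only and cannot transfer data
Possible cause 3: the adb server is stuck; restart it: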

adb kill-server
adb start-server

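Problem: how to find appPackage and the other values needed for the Appium capabilities
Solution 1: open the app you want to scrape, then run the following command in a terminal to see the current Activity

// Query command
adb shell dumpsys window w | grep mCurrent

// Result (com.tencent.mm is the appPackage; .ui.LauncherUI is the Activity)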
mCurrentFocus=Window{7da7888 u0 com.tencent.mm/com.tencent.mm.ui.LauncherUI}
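
The mapping from this line to appPackage and appActivity can also be done in code. The following sketch assumes the line has the format shown above; the pattern and the function name parse_current_focus are hypothetical.

```python
import re

# Matches e.g. "mCurrentFocus=Window{7da7888 u0 com.tencent.mm/com.tencent.mm.ui.LauncherUI}"
FOCUS_PATTERN = re.compile(
    r"mCurrentFocus=Window\{\S+ \S+ (?P<package>[\w.]+)/(?P<activity>[\w.$]+)\}"
)


def parse_current_focus(line):
    """Return (appPackage, appActivity) from an mCurrentFocus line, or None if it does not match."""
    match = FOCUS_PATTERN.search(line)
    if match is None:
        return None
    package = match.group("package")
    activity = match.group("activity")
    # Shorten a fully qualified Activity name to the leading-dot form used in the capabilities.
    if activity.startswith(package + "."):
        activity = activity[len(package):]
    return package, activity


line = "mCurrentFocus=Window{7da7888 u0 com.tencent.mm/com.tencent.mm.ui.LauncherUI}"
focus = parse_current_focus(line)
print(focus)
```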

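Solution 2: for the Xiaohongshu app, the method above does not return the actual launch Activity, so query the activity stack instead

// Query command
adb shell dumpsys activity activities


// Result (in the realActivity field: com.xingin.xhs is the appPackage; .activity.SplashActivity is the Activity)
// The output is very long; only part of it is shown below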
ACTIVITY MANAGER ACTIVITIES (dumpsys activity activities)
Display #0 (activities from top to bottom):
 
  Stack #11: type=standard mode=fullscreen
  isSleeping=false
  mBounds=Rect(0, 0 - 0, 0)
    Task id #454
    mBounds=Rect(0, 0 - 0, 0)
    mMinWidth=-1
    mMinHeight=-1
    mLastNonFullscreenBounds=null
    * TaskRecord{77f77e4 #454 A=com.xingin.xhs U=0 StackId=11 sz=1}
      userId=0 effectiveUid=u0a230 mCallingUid=u0a230 mUserSetupComplete=true mCallingPackage=com.xingin.xhs
      affinity=com.xingin.xhs
      intent={act=android.intent.action.MAIN cat=[android.intent.category.LAUNCHER] flg=0x10200000 cmp=com.xingin.xhs/.activity.SplashActivity}
      realActivity=com.xingin.xhs/.activity.SplashActivity
      autoRemoveRecents=false isPersistable=true numFullscreen=1 activityType=1
      rootWasReset=true mNeverRelinquishIdentity=true mReuseTask=false mLockTaskAuth=LOCK_TASK_AUTH_PINNABLE
      Activities=[ActivityRecord{fca8591 u0 com.xingin.xhs/.index.IndexNewActivity t454}]
      askedCompatMode=false inRecents=true isAvailable=true
      mRootProcess=ProcessRecord{d833203 26131:com.xingin.xhs/u0a230}
      stackId=11
      hasBeenVisible=true mResizeMode=RESIZE_MODE_UNRESIZEABLE mSupportsPictureInPicture=false isResizeable=false lastActiveTime=48960562 (inactive for 8s)
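
Because the dump is long, it can be convenient to filter out the realActivity lines programmatically. The sketch below assumes the realActivity=package/activity format shown above (other Android versions may print it differently); the function name find_real_activities is hypothetical, and the sample text is a short excerpt of the output above.

```python
def find_real_activities(dump_text):
    """Return unique (package, activity) pairs from `realActivity=` lines, in order of appearance."""
    found = []
    for line in dump_text.splitlines():
        line = line.strip()
        if not line.startswith("realActivity="):
            continue
        component = line[len("realActivity="):]
        # The component has the form "package/activity".
        package, _, activity = component.partition("/")
        if activity and (package, activity) not in found:
            found.append((package, activity))
    return found


sample_dump = """
      affinity=com.xingin.xhs
      intent={act=android.intent.action.MAIN cat=[android.intent.category.LAUNCHER] flg=0x10200000 cmp=com.xingin.xhs/.activity.SplashActivity}
      realActivity=com.xingin.xhs/.activity.SplashActivity
      autoRemoveRecents=false isPersistable=true numFullscreen=1 activityType=1
"""
activities = find_real_activities(sample_dump)
print(activities)
```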
