使用Excel+VBA对网页进行操作

使用Excel+VBA对网页进行操作

因为在知乎的一些答案,最近总有私信问我如何使用VBA网抓的,我基本都没有回复。因为这个问题太大了,对于有基础的人来说,自己百度或者上ExcelHome论坛其实很容易找到答案,并不需要我说什么,而对于没有基础的人来说,三言两语不可能解决问题,我也不想把私信变成聊天窗。借着知乎开放专栏的机会,正好来仔细交代一下这个问题。

对于Excel和VBA我所知有限,仅能解决自己遇到的一些问题,并不一定适用于所有场景。以下内容建立在了解基本VBA使用以及HTML语言知识的基础上:

一、前期准备

就我所知,VBA并不能操作任意浏览器及网页,我们所能做的仅仅是对IE进行一些操作,是的,仅仅是IE。不要告诉我电脑上没有IE,那样就可以Exit Sub了。就像Python用import、C#用using一样,VBA也需要引用一些库才能对IE进行操作,不过好在同属微软产品,所以我们能很简便的利用VBA自带的一些库。

首先我们要做的就是在VBA中引用Micorsoft Internet Controls,看这个名字就知道是帮助我们控制IE页面用的。

二、网页操作

引用Micorsoft Internet Controls之后,我们就可以对页面为所欲为了,不过首页我们要有个页面,上帝说要有页面!

1、打开网页

我们以在百度搜索“扯乎”关键词为例:

    With CreateObject("internetexplorer.application")
        .Visible = True
        .Navigate "https://www.baidu.com/s?wd=扯乎"
'关闭网页
'       .Quit
    End With

代码很简单,先创建一个IE对象,然后给一些属性赋值。Visible是可见性,说的是在对网页进行操作时,这个网页是不是会被看见。熟练之后可以设置为False,不仅让程序在跑的时候有种神秘感(并没有),还能稍微加快一点速度。

不过有一点要记住,这个网页我们打开之后并没有关闭,也就是说程序结束后需要手动关闭,如果网页不可见是无法手动关闭的。代码中注释的部分就是关闭网页用的。Navigate不用多说就是URL。

我们必须要等网页完全加载完才能开始信息的抓取,这个时候使用到:(从这里开始,所有的代码都需要写在With代码块中

        While .ReadyState <> 4 Or .Busy
            DoEvents
        Wend

Busy是网页忙碌状态,ReadyState是HTTP的5种就绪状态,对应如下:

  • 0:请求未初始化(还没有调用 open())。
  • 1:请求已经建立,但是还没有发送(还没有调用 send())。
  • 2:请求已发送,正在处理中(通常现在可以从响应中获取内容头)。
  • 3:请求在处理中;通常响应中已有部分数据可用了,但是服务器还没有完成响应的生成。
  • 4:响应已完成;您可以获取并使用服务器的响应了。

2、获取信息

我们先把页面中的所有内容抓下来,后期筛选出有用的部分再慢慢给抓取添加条件。

        Set dmt = .Document
        For i = 0 To dmt.all.Length - 1
            Set htMent = dmt.all(i)
            With ActiveSheet
                .Cells(i + 2, "A") = htMent.tagName
                .Cells(i + 2, "B") = TypeName(htMent)
                .Cells(i + 2, "C") = htMent.ID
                .Cells(i + 2, "D") = htMent.Name
                .Cells(i + 2, "E") = htMent.Value
                .Cells(i + 2, "F") = htMent.Text
                .Cells(i + 2, "G") = htMent.innerText
            End With
        Next i

这块代码和JS有些相似,需要从IE.Document.all中把页面上所有节点找出来。这里也提供其他几种方法:

这些都是在抓取了全部页面内容后帮助筛选有效信息时使用起来比较方便的。当然all还是最好用的,因为all也存在all("IDName")以及all.IDName等等用法。

上面代码部分返回的属性值都是HTML基本内容,就不一一解释了。

3、填充信息

网抓神器当然还是Python,大部分人使用Excel的目的还是在于对页面内容进行自动填充,直接让表格提交网页,问卷录入之类的工作都省心不少。在抓取了页面内容之后,想填充更加是易如反掌的事情,只需要直接给页面标签的Value属性赋值就可以了。

不过网页中除了文本框,可能还存在一些其他没有Value的标签,比如:下拉菜单、单选框。给这些内容赋值就需要一些基本的HTML知识了。

'下拉菜单选择
.all("select")(0).Selected = True
'单选按钮选择
.all("radio").Checked = True
'复选按钮选择
.all("checkbox").Checked = True

下拉菜单是select标签,每个选项都在一个option标签里,所以返回一个集合,需要选中某个选项就要修改对应的Selected属性为True。单选和复选按钮都是input标签,区别在于类型分别是radio和checkbox,要选中某个选项需要修改对应的Checked属性。

三、数据接口

有时候我们能直接拿到一些API,通过API返回数据当然比打开网页更方便快捷,所使用的方法也有一些不太一样。

1、请求接口

比如我从网上得到一个能通过城市查询免费WIFI的API,通过Excel接口访问就使用下面的代码:(虽然是免费的,为了避免麻烦还是把我的AppKey隐去了)

  Dim http
  Set http = CreateObject("Microsoft.XMLHTTP")
  http.Open "GET", "http://api.avatardata.cn/Wifi/QueryByCity", False
  http.setRequestHeader "CONTENT-TYPE", "application/x-www-form-urlencoded"
  http.send "key=[AppKey]&city=北京&page=1"

这时我们创建的对象就不再是IE,而是HTTP对象。这里用的是ajax的Open方法,GET是数据发送方式,第二个参数是接口地址,第三个参数是指定请求方式是否为异步。如果这个API有帐号密码,分别写在第四第五个参数。

setRequestHeader就是给接口发送一个HTTP协议头文件,最后send的内容是接口参数。当然,这个QueryString也可以直接写在URL里,send一个空字符串就可以了。

2、接口返回

接口返回获取的方式很简单:

  If http.Status = 200 Then Range("A1").Value = http.responseText

这里的HTTP状态又变成200了,和之前说好的不一样啊摔~有兴趣可以自己查查具体有哪些。

不过接口返回要么是JSON要么是XML,Excel处理起来十分不方便。这里提供一个处理JSON的方法,是从网上找来的类模块,具体内容放在附录里。在添加了这个clsJSON类模块后,对JSON的处理就变得十分简单了。

将上面的代码改成:

  If http.Status = 200 Then
     Dim json$
     json = http.responseText
     Dim objJSON As New clsJSON, dicJSON As Object
     Set dicJSON = objJSON.parse(json)
     
     For i = 1 To dicJSON("result")("data").Count
       Sheet1.Cells(i + 1, 1) = dicJSON("result")("data")(i)("name")
       Sheet1.Cells(i + 1, 2) = dicJSON("result")("data")(i)("intro")
       Sheet1.Cells(i + 1, 3) = dicJSON("result")("data")(i)("address")
     Next i
  End If

接口返回的示例我也放在附录里了,根据接口返回的对象名、数组名去修改dicJSON后面的内容就可以了。这个处理JSON的模块用的是VBA中字典+集合的原理,所以数据处理后的调用方式也参照字典和集合。

以上是我用Excel+VBA进行网页操作的一些个人经验,希望能帮助到一些有需要的人。有什么错漏的地方,也希望知乎大牛批评指正。

附录一:VBA处理JSON的类模块

Option Explicit
'================================
' VBA处理JSON文件的类模块
'
' http://www.cnhup.com
'================================
Const INVALID_JSON      As Long = 1
Const INVALID_OBJECT    As Long = 2
Const INVALID_ARRAY     As Long = 3
Const INVALID_BOOLEAN   As Long = 4
Const INVALID_NULL      As Long = 5
Const INVALID_KEY       As Long = 6

Private Sub Class_Initialize()

End Sub

Private Sub Class_Terminate()

End Sub

Public Function parse(ByRef str As String) As Object

    Dim index As Long
    index = 1
    
    On Error Resume Next

    Call skipChar(str, index)
    Select Case Mid(str, index, 1)
    Case "{"
        Set parse = parseObject(str, index)
    Case "["
        Set parse = parseArray(str, index)
    End Select

End Function

Private Function parseObject(ByRef str As String, ByRef index As Long) As Object

    Set parseObject = CreateObject("Scripting.Dictionary")
    
    ' "{"
    Call skipChar(str, index)
    If Mid(str, index, 1) <> "{" Then Err.Raise vbObjectError + INVALID_OBJECT, Description:="char " & index & " : " & Mid(str, index)
    index = index + 1
    
    Do
    
        Call skipChar(str, index)
        If "}" = Mid(str, index, 1) Then
            index = index + 1
            Exit Do
        ElseIf "," = Mid(str, index, 1) Then
            index = index + 1
            Call skipChar(str, index)
        End If
        
        Dim key As String
        
        ' add key/value pair
        parseObject.Add key:=parseKey(str, index), Item:=parseValue(str, index)
        
    Loop

End Function

Private Function parseArray(ByRef str As String, ByRef index As Long) As Collection

    Set parseArray = New Collection
    
    ' "["
    Call skipChar(str, index)
    If Mid(str, index, 1) <> "[" Then Err.Raise vbObjectError + INVALID_ARRAY, Description:="char " & index & " : " + Mid(str, index)
    index = index + 1
    
    Do
        
        Call skipChar(str, index)
        If "]" = Mid(str, index, 1) Then
            index = index + 1
            Exit Do
        ElseIf "," = Mid(str, index, 1) Then
            index = index + 1
            Call skipChar(str, index)
        End If
        
        ' add value
        parseArray.Add parseValue(str, index)
        
    Loop

End Function

Private Function parseValue(ByRef str As String, ByRef index As Long)

    Call skipChar(str, index)
    
    Select Case Mid(str, index, 1)
    Case "{"
        Set parseValue = parseObject(str, index)
    Case "["
        Set parseValue = parseArray(str, index)
    Case """", "'"
        parseValue = parseString(str, index)
    Case "t", "f"
        parseValue = parseBoolean(str, index)
    Case "n"
        parseValue = parseNull(str, index)
    Case Else
        parseValue = parseNumber(str, index)
    End Select

End Function

Private Function parseString(ByRef str As String, ByRef index As Long) As String

    Dim quote   As String
    Dim char    As String
    Dim code    As String
    
    Call skipChar(str, index)
    quote = Mid(str, index, 1)
    index = index + 1
    Do While index > 0 And index <= Len(str)
        char = Mid(str, index, 1)
        Select Case (char)
        Case "\"
            index = index + 1
            char = Mid(str, index, 1)
            Select Case (char)
            Case """", "\\", "/"
                parseString = parseString & char
                index = index + 1
            Case "b"
                parseString = parseString & vbBack
                index = index + 1
            Case "f"
                parseString = parseString & vbFormFeed
                index = index + 1
            Case "n"
                parseString = parseString & vbNewLine
                index = index + 1
            Case "r"
                parseString = parseString & vbCr
                index = index + 1
            Case "t"
                parseString = parseString & vbTab
                index = index + 1
            Case "u"
                index = index + 1
                code = Mid(str, index, 4)
                parseString = parseString & ChrW(Val("&h" + code))
                index = index + 4
            End Select
        Case quote
            index = index + 1
            Exit Function
        Case Else
            parseString = parseString & char
            index = index + 1
        End Select
    Loop

End Function

Private Function parseNumber(ByRef str As String, ByRef index As Long)

    Dim value   As String
    Dim char    As String
    
    Call skipChar(str, index)
    Do While index > 0 And index <= Len(str)
        char = Mid(str, index, 1)
        If InStr("+-0123456789.eE", char) Then
            value = value & char
            index = index + 1
        Else
            If InStr(value, ".") Or InStr(value, "e") Or InStr(value, "E") Then
                parseNumber = CDbl(value)
            Else
                parseNumber = CInt(value)
            End If
            Exit Function
        End If
    Loop


End Function

Private Function parseBoolean(ByRef str As String, ByRef index As Long) As Boolean

    Call skipChar(str, index)
    If Mid(str, index, 4) = "true" Then
        parseBoolean = True
        index = index + 4
    ElseIf Mid(str, index, 5) = "false" Then
        parseBoolean = False
        index = index + 5
    Else
        Err.Raise vbObjectError + INVALID_BOOLEAN, Description:="char " & index & " : " & Mid(str, index)
    End If

End Function

Private Function parseNull(ByRef str As String, ByRef index As Long)

    Call skipChar(str, index)
    If Mid(str, index, 4) = "null" Then
        parseNull = Null
        index = index + 4
    Else
        Err.Raise vbObjectError + INVALID_NULL, Description:="char " & index & " : " & Mid(str, index)
    End If

End Function

Private Function parseKey(ByRef str As String, ByRef index As Long) As String

    Dim dquote  As Boolean
    Dim squote  As Boolean
    Dim char    As String
    
    Call skipChar(str, index)
    Do While index > 0 And index <= Len(str)
        char = Mid(str, index, 1)
        Select Case (char)
        Case """"
            dquote = Not dquote
            index = index + 1
            If Not dquote Then
                Call skipChar(str, index)
                If Mid(str, index, 1) <> ":" Then
                    Err.Raise vbObjectError + INVALID_KEY, Description:="char " & index & " : " & parseKey
                End If
            End If
        Case "'"
            squote = Not squote
            index = index + 1
            If Not squote Then
                Call skipChar(str, index)
                If Mid(str, index, 1) <> ":" Then
                    Err.Raise vbObjectError + INVALID_KEY, Description:="char " & index & " : " & parseKey
                End If
            End If
        Case ":"
            If Not dquote And Not squote Then
                index = index + 1
                Exit Do
            End If
        Case Else
            If InStr(vbCrLf & vbCr & vbLf & vbTab & " ", char) Then
            Else
                parseKey = parseKey & char
            End If
            index = index + 1
        End Select
    Loop

End Function

Public Sub skipChar(ByRef str As String, ByRef index As Long)

    While index > 0 And index <= Len(str) And InStr(vbCrLf & vbCr & vbLf & vbTab & " ", Mid(str, index, 1))
        index = index + 1
    Wend

End Sub

Public Function toString(ByRef obj As Variant) As String

    Select Case VarType(obj)
        Case vbNull
            toString = "null"
        Case vbDate
            toString = """" & CStr(obj) & """"
        Case vbString
            toString = """" & encode(obj) & """"
        Case vbObject
            Dim bFI, i
            bFI = True
            If TypeName(obj) = "Dictionary" Then
                toString = toString & "{"
                Dim keys
                keys = obj.keys
                For i = 0 To obj.Count - 1
                    If bFI Then bFI = False Else toString = toString & ","
                    Dim key
                    key = keys(i)
                    toString = toString & """" & key & """:" & toString(obj(key))
                Next i
                toString = toString & "}"
            ElseIf TypeName(obj) = "Collection" Then
                toString = toString & "["
                Dim value
                For Each value In obj
                    If bFI Then bFI = False Else toString = toString & ","
                    toString = toString & toString(value)
                Next value
                toString = toString & "]"
            End If
        Case vbBoolean
            If obj Then toString = "true" Else toString = "false"
        Case vbVariant, vbArray, vbArray + vbVariant
            Dim sEB
            toString = multiArray(obj, 1, "", sEB)
        Case Else
            toString = Replace(obj, ",", ".")
    End Select

End Function

Private Function encode(str) As String
    
    Dim i, j, aL1, aL2, c, p

    aL1 = Array(&H22, &H5C, &H2F, &H8, &HC, &HA, &HD, &H9)
    aL2 = Array(&H22, &H5C, &H2F, &H62, &H66, &H6E, &H72, &H74)
    For i = 1 To Len(str)
        p = True
        c = Mid(str, i, 1)
        For j = 0 To 7
            If c = Chr(aL1(j)) Then
                encode = encode & "\" & Chr(aL2(j))
                p = False
                Exit For
            End If
        Next

        If p Then
            Dim a
            a = AscW(c)
            If a > 31 And a < 127 Then
                encode = encode & c
            ElseIf a > -1 Or a < 65535 Then
                encode = encode & "\u" & String(4 - Len(Hex(a)), "0") & Hex(a)
            End If
        End If
    Next
End Function

Private Function multiArray(aBD, iBC, sPS, ByRef sPT)   ' Array BoDy, Integer BaseCount, String PoSition
    Dim iDU, iDL, i ' Integer DimensionUBound, Integer DimensionLBound
    On Error Resume Next
    iDL = LBound(aBD, iBC)
    iDU = UBound(aBD, iBC)
    
    Dim sPB1, sPB2  ' String PointBuffer1, String PointBuffer2
    If Err.Number = 9 Then
        sPB1 = sPT & sPS
        For i = 1 To Len(sPB1)
            If i <> 1 Then sPB2 = sPB2 & ","
            sPB2 = sPB2 & Mid(sPB1, i, 1)
        Next
'        multiArray = multiArray & toString(Eval("aBD(" & sPB2 & ")"))
        multiArray = multiArray & toString(aBD(sPB2))
    Else
        sPT = sPT & sPS
        multiArray = multiArray & "["
        For i = iDL To iDU
            multiArray = multiArray & multiArray(aBD, iBC + 1, i, sPT)
            If i < iDU Then multiArray = multiArray & ","
        Next
        multiArray = multiArray & "]"
        sPT = Left(sPT, iBC - 2)
    End If
    Err.Clear
End Function

附录二:JSON返回示例

{
"resultcode":"200",
"reason":"ReturnSuccessd!",
"result":{
"data":[
{
"name":"北京市法雨合",
"intro":"法雨合0层",
"address":"北京市朝阳区朝阳区三里屯",
"google_lat":"39.9372423",
"google_lon":"116.4480615",
"baidu_lat":"39.942952987502",
"baidu_lon":"116.45464108129",
"province":"北京市",
"city":"北京市"
},
{
"name":"北京朝阳西坝河光熙门北里",
"intro":"朝阳西坝河光熙门北里34-8号0层",
"address":"北京市朝阳区朝阳区西坝河光熙门北里34号-8号0层",
"google_lat":"39.9635121",
"google_lon":"116.435895",
"baidu_lat":"39.969407173324",
"baidu_lon":"116.44243487981",
"province":"北京市",
"city":"北京市"
},
{
"name":"北京朝阳三里屯北街",
"intro":"",
"address":"北京市朝阳区朝阳三里屯北街8号0层",
"google_lat":"39.9254286",
"google_lon":"116.4605935",
"baidu_lat":"39.931073085771",
"baidu_lon":"116.46719483818",
"province":"北京市",
"city":"北京市"
},
{
"name":"北京大都酒吧街",
"intro":"",
"address":"北京市朝阳区元大都酒吧街11号",
"google_lat":"39.975984",
"google_lon":"116.424389",
"baidu_lat":"39.982089966811",
"baidu_lon":"116.43086831752",
"province":"北京市",
"city":"北京市"
},
{
"name":"北京西城前海北沿",
"intro":"",
"address":"北京市西城区西城前海北沿10号0层",
"google_lat":"39.9369032",
"google_lon":"116.3919335",
"baidu_lat":"39.943215619704",
"baidu_lon":"116.39830652238",
"province":"北京市",
"city":"北京市"
},
{
"name":"北京市西城后海南沿36号对面",
"intro":"后海南沿36号对面0层",
"address":"北京市西城区后海南沿36号",
"google_lat":"39.9396792",
"google_lon":"116.389129",
"baidu_lat":"39.945967638433",
"baidu_lon":"116.39551153315",
"province":"北京市",
"city":"北京市"
},
{
"name":"北京市赛百味",
"intro":"ok",
"address":"北京市西城区中关村东路18号",
"google_lat":"39.9810991",
"google_lon":"116.3333866",
"baidu_lat":"39.9867766224",
"baidu_lon":"116.34001632032",
"province":"北京市",
"city":"北京市"
},
{
"name":"北京市光华路数码01",
"intro":"",
"address":"北京市朝阳区光华路数码01大厦0层",
"google_lat":"39.9132392",
"google_lon":"116.4592309",
"baidu_lat":"39.918885961978",
"baidu_lon":"116.46583845234",
"province":"北京市",
"city":"北京市"
},
{
"name":"北京市盛铭帮逸园会馆",
"intro":"盛铭帮逸园会馆0",
"address":"北京市朝阳区逸园25号",
"google_lat":"39.8710876",
"google_lon":"116.4602965",
"baidu_lat":"39.876744728506",
"baidu_lon":"116.46693498949",
"province":"北京市",
"city":"北京市"
},
{
"name":"北京市地平线酒吧",
"intro":"",
"address":"北京市朝阳区朝阳三里屯北街70号",
"google_lat":"39.9254286",
"google_lon":"116.4605935",
"baidu_lat":"39.931073085771",
"baidu_lon":"116.46719483818",
"province":"北京市",
"city":"北京市"
}
],
"pageinfo":{
"pnums":20,
"current":1
	}
	}
}

网页抓取VBA

分享
  举报

你可能感兴趣的:(VBA)