vba - 利用XMLHTTP實現網頁抓取

  显示原文与译文双语对照的内容
0 0

我想從網頁 http://www.eex.com/en/market-data/power/derivatives-market/phelix-futures 獲得一些數據。

如果我使用舊的InternetExplorer對象( 代碼 below ),我可以遍歷HTML文檔。 但我想使用 XMLHTTP 對象( 第二代碼) 。

Sub IEZagon() 
 'we define the essential variables
 Dim ie As Object 
 Dim TDelement, TDelements 
 Dim AnhorLink, AnhorLinks 
 'add the"Microsoft Internet Controls" reference in your VBA Project indirectly
 Set ie = CreateObject("InternetExplorer.Application") 
 With ie 
. Visible = True 
. navigate ("[URL]http://www.eex.com/en/market-data/power/derivatives-market/phelix-futures[/URL]") 
 While ie.ReadyState <> 4 
 DoEvents 
 Wend 
 Set AnhorLinks =. document.getElementsbytagname("a") 
 Set TDelements =. document.getElementsbytagname("td") 
 For Each AnhorLink In AnhorLinks 
 Debug.Print AnhorLink.innertext 
 Next 
 For Each TDelement In TDelements 
 Debug.Print TDelement.innertext 
 Next 
 End With 
 Set ie = Nothing 
End Sub

在XMLHTTP對象中使用代碼:

Sub FuturesScrap(ByVal URL As String) 
 Dim XMLHttpRequest As XMLHTTP 
 Dim HTMLDoc As New HTMLDocument 
 Set XMLHttpRequest = New MSXML2.XMLHTTP 
 XMLHttpRequest.Open"GET", URL, False 
 XMLHttpRequest.send 
 While XMLHttpRequest.readyState <> 4 
 DoEvents 
 Wend 
 Debug.Print XMLHttpRequest.responseText 
 HTMLDoc.body.innerHTML = XMLHttpRequest.responseText 
 With HTMLDoc.body 
 Set AnchorLinks =. getElementsByTagName("a") 
 Set TDelements =. getElementsByTagName("td") 
 For Each AnchorLink In AnchorLinks 
 Debug.Print AnhorLink.innerText 
 Next 
 For Each TDelement In TDelements 
 Debug.Print TDelement.innerText 
 Next 
 End With 
End Sub 

我只得到基本的HTML:

<html> 
<head> 
<title>Resource Not found</title> 
<link rel= 'stylesheet' type='text/css' href='/blueprint/css/errorpage.css'/>
</head> 
<body> 
<table class="header"> 
<tr> 
<td class="CMTitle CMHFill"><span class="large">Resource Not found</span></td> 
</tr> 
</table> 
<div class="body"> 
<p style="font-weight:bold;">The requested resource does Not exist.</p> 
</div> 
<table class="footer"> 
<tr> 
<td class="CMHFill"> </td> 
</tr> 
</table> 
</body> 
</html>

我想瀏覽表格和coresponding數據。 最後,我想從年到月的時間間隔中選擇:

我很感激你的幫助 ! 謝謝!

时间:原作者:5个回答

0 0

我可以確認在運行代碼( 帶或者不帶url標籤) 時得到了與你相同的HTML 。 我在這裡找到了一個有用的帖子: 我已經使用這裡找到的方法修改了你的代碼,現在它似乎已經下載了正確的信息。

Sub test()
 Call FuturesScrap1("http://www.eex.com/en/market-data/power/derivatives-market/phelix-futures")
End Sub

我包括了調用 for,因為url標記出現了一個錯誤,它導致了。

Sub FuturesScrap1(ByVal URL As String)
 Dim HTMLDoc As New HTMLDocument
 Dim oHttp As MSXML2.XMLHTTP
 Dim sHTML As String
 Dim AnchorLinks As Object
 Dim TDelements As Object
 Dim TDelement As Object
 Dim AnchorLink As Object
 On Error Resume Next
 Set oHttp = New MSXML2.XMLHTTP
 If Err.Number <> 0 Then
 Set oHttp = CreateObject("MSXML.XMLHTTPRequest")
 MsgBox"Error 0 has occured while creating a MSXML.XMLHTTPRequest object"
 End If
 On Error GoTo 0
 If oHttp Is Nothing Then
 MsgBox"For some reason I wasn't able to make a MSXML2.XMLHTTP object"
 Exit Sub
 End If
 'Open the URL in browser object
 oHttp.Open"GET", URL, False
 oHttp.send
 sHTML = oHttp.responseText
 Debug.Print oHttp.responseText
 HTMLDoc.body.innerHTML = oHttp.responseText
 With HTMLDoc.body
 Set AnchorLinks =. getElementsByTagName("a")
 Set TDelements =. getElementsByTagName("td")
 For Each AnchorLink In AnchorLinks
 Debug.Print AnchorLink.innerText
 Next
 For Each TDelement In TDelements
 Debug.Print TDelement.innerText
 Next
 End With
End Sub

編輯下列評論:

我沒有找到使用MSXML2對象的table 元素,源代碼似乎不包含它們。 在firebug中,td 標籤是存在的,因這裡thik是由JavaScript代碼生成的。 我不知道MSXML2是否可以以運行這個 JavaScript,因這裡我修改了,以使用 IE,但它不是快速代碼。 我發現 td 元素可以能需要一些時間才能成為可以用的( 假設 IE 必須運行 JavaScript ),因這裡在下載數據之前,我已經經放入了一些步驟。

如果在工作表中將元素內容下載到活動工作表中,請注意,如果在工作簿中運行有有用數據的工作表,那麼我就。

Sub FuturesScrap3(ByVal URL As String)
 Dim HTMLDoc As New HTMLDocument
 Dim AnchorLinks As Object
 Dim tdElements As Object
 Dim tdElement As Object
 Dim AnchorLink As Object
 Dim lRow As Long
 Dim oElement As Object
 Dim oIE As InternetExplorer
 Set oIE = New InternetExplorer
 oIE.navigate URL
 oIE.Visible = True
 Do Until (oIE.readyState = 4 And Not oIE.Busy)
 DoEvents
 Loop
 'Wait for Javascript to run
 Application.Wait (Now + TimeValue("0:01:00"))
 HTMLDoc.body.innerHTML = oIE.document.body.innerHTML
 With HTMLDoc.body
 Set AnchorLinks =. getElementsByTagName("a")
 Set tdElements =. getElementsByTagName("td") '
 For Each AnchorLink In AnchorLinks
 Debug.Print AnchorLink.innerText
 Next AnchorLink
 End With
 lRow = 1
 For Each tdElement In tdElements
 Debug.Print tdElement.innerText
 Cells(lRow, 1).Value = tdElement.innerText
 lRow = lRow + 1
 Next
 'Clicking the Month tab
 For Each oElement In oIE.document.all
 If Trim(oElement.innerText) ="Month" Then
 oElement.Focus
 oElement.Click
 End If
 Next oElement
 Do Until (oIE.readyState = 4 And Not oIE.Busy)
 DoEvents
 Loop
 'Wait for Javascript to run
 Application.Wait (Now + TimeValue("0:01:00"))
 HTMLDoc.body.innerHTML = oIE.document.body.innerHTML
 With HTMLDoc.body
 Set AnchorLinks =. getElementsByTagName("a")
 Set tdElements =. getElementsByTagName("td") '
 For Each AnchorLink In AnchorLinks
 Debug.Print AnchorLink.innerText
 Next AnchorLink
 End With
 lRow = 1
 For Each tdElement In tdElements
 Debug.Print tdElement.innerText
 Cells(lRow, 2).Value = tdElement.innerText
 lRow = lRow + 1
 Next tdElement
End sub
原作者:
...