Automate Scraping of Hyperlinks from Web Pages Using Excel VBA

 


How to scrape hyperlinks from web pages automatically using Excel VBA.

  • Define the variables required
  • Navigate to the desired website or web page from where you wish to scrape the hyperlinks
  • Ensure that the web page is done downloading before you start extracting the links
  • Use your HTML knowledge to define the tag like ‘a’ for a hyperlink that you wish to capture
  • Loop through all the links and capture them one by one
  • Place each link in the next empty row
  • Finally recover the memory by setting the values to ‘null’ or ‘nothing’
  • To make your code execute faster stop ‘screen updating’ in the beginning and at the end activate the ‘screen updating’ process

Watch the video before you implement the Excel VBA code given below:


View the video on YouTube.

Before we executing the code below we need to activate the ‘Microsoft HTML Object Library’ and ‘Microsoft Internet Controls’ as shown in the image below by clicking on ‘Tools’ and then selecting ‘References’ from the Microsoft Visual Basic for Applications window:

Automate Scraping of hyperlinks from web pages using Excel VBA

Automate Scraping of hyperlinks from web pages using Excel VBA

Sub scrapeHyperlinksWebsite()
‘We refer to an active copy of Internet Explorer
Dim ie As InternetExplorer
‘code to refer to the HTML document returned
Dim html As HTMLDocument
Dim ElementCol As Object
Dim Link As Object
Dim erow As Long
Application.ScreenUpdating = False
‘open Internet Explorer and go to website
Set ie = New InternetExplorer
ie.Visible = False
ie.navigate “http://www.exceltrainingvideos.com”

‘Wait until IE is done loading page
Do While ie.readyState <> READYSTATE_COMPLETE

Application.StatusBar = “Trying to go to website…”
DoEvents
Loop

Set html = ie.document
‘Display text of HTML document returned in a cell
‘Range(“A1”) = html.DocumentElement.innerHTML
Set ElementCol = html.getElementsByTagName(“a”)

For Each Link In ElementCol
erow = Worksheets(“Sheet1”).Cells(Rows.Count, 1).End(xlUp).Offset(1, 0).Row
Cells(erow, 1).Value = Link
Cells(erow, 1).Columns.AutoFit
Next

‘close down IE, reset status bar & turn on screenupdating

Set ie = Nothing
Application.StatusBar = “”
Application.ScreenUpdating = True

End Sub

Download a sample file by clicking on the Excel icon:


Further Reading:
Application.ScreenUpdating Property (Excel)
Two ways to get data from websites using Excel VBA (this blog)

4 thoughts on “Automate Scraping of Hyperlinks from Web Pages Using Excel VBA

  1. Pingback: Automate Email Extraction from Website Using Excel VBA | Excel VBA Training Videos

  2. Pegaso

    Hello!

    Interesting application, but something is not working. I can not make the code to open any webpage, I don’t get any errors but simply runs without open any website. This is the code:

    Sub scrapeHyperlinksWebsite()
    ‘We refer to an active copy of Internet Explorer
    Dim ie As InternetExplorer
    ‘code to refer to the HTML document returned
    Dim html As HTMLDocument
    Dim ElementCol As Object
    Dim Link As Object
    Dim erow As Long
    Application.ScreenUpdating = False
    ‘open Internet Explorer and go to website
    Set ie = New InternetExplorer
    ie.Visible = False
    ie.navigate “http://www.exceltrainingvideos.com”

    ‘Wait until IE is done loading page
    Do While ie.readyState READYSTATE_COMPLETE

    Application.StatusBar = “Trying to go to website…”
    DoEvents
    Loop

    Set html = ie.document
    ‘Display text of HTML document returned in a cell
    Range(“A1”) = html.DocumentElement.innerHTML
    Set ElementCol = html.getElementsByTagName(“a”)

    For Each Link In ElementCol
    erow = Worksheets(“Sheet1”).Cells(Rows.Count, 1).End(xlUp).Offset(1, 0).Row
    Cells(erow, 1).Value = Link
    Cells(erow, 1).Columns.AutoFit
    Next

    ‘close down IE, reset status bar & turn on screenupdating

    Set ie = Nothing
    Application.StatusBar = “”
    Application.ScreenUpdating = True

    End Sub

    I had to copy&paste from this page and QA/QC it for errors when converting the code, (wrong double-quotation marks), and to activate the references through Tools/references, I cleaned the cache in IE, but still doesn’t navigate, any ideas? Thanks for sharing your work!

    Reply
  3. Pingback: Get src attribute from img tag in web page with vba | Excel VBA Training Videos

Leave a Reply

Your email address will not be published. Required fields are marked *