The text that follows is owned by the site above referred.
Here is only a small part of the article, for more please follow the link
Learn to write Excel VBA code to automate your web browser.
VBA = Visual Basic for Applications, the macro programming language within Excel
IE = Internet Explorer
Most browser automation tools are complex, programmed by software engineers. But because XL and IE –both Microsoft products– play so well together, normal folks like you and I can build some pretty cool things too.
Web browsing & interaction, sending & replying to emails, social media interaction, blogging/posting/content editing, product searching/comparing/price-checking/buying, all kinds of stuff … you can build a robot to do it for you, and do it faster, and around the clock.
Let’s start by building a basic web bot to open a search page, submit a search request, evaluate and scrape the results page, and return the results to Excel.
- Excel Basics – but first, if you’re new to Excel you should start here.
- VBA Basics – it would also help to know some VBA fundamentals (eg. what’s a variable, what’s a subroutine, for-next loops, the VBA Editor, etc) — these courses are good.
- HTML and the DOM – and for creating web-bots, you should understand basic HTML structure – how a webpage is laid out – the Document Object Model.
Ok, you’re grounded in the basics. Now let’s jump into building a simple bot to search for a keyword and return the results…
A beginner web automation project: keyword search bot
Let’s create a web bot that goes to my new favorite search engine, enters a keyword, submits the query, then scans the results and prints them to an Excel sheet, highlighting any links from yellowpages.com.
Begin by opening Excel and start a new, blank workbook.
Enter a keyword – let’s say auto parts – in cell A2, and a location – let’s say Houston TX – in cell C1.
Click the Developer tab (not showing?), and click Visual Basic.
Then click Tools › References, and add the reference libraries Microsoft HTML Object Library and Microsoft Internet Controls…
Next, in the Visual Basic Editor (VBE), click View › Project Explorer.
Then, right-click VBAProject, and Insert › a new Module…
Now copy & paste the code below into Module1…
'start a new subroutine called SearchBot Sub SearchBot() 'dimension (declare or set aside memory for) our variables Dim objIE As InternetExplorer 'special object variable representing the IE browser Dim aEle As HTMLLinkElement 'special object variable for an <a> (link) element Dim y As Integer 'integer variable we'll use as a counter Dim result as String 'string variable that will hold our result link 'initiating a new instance of Internet Explorer and asigning it to objIE Set objIE = New InternetExplorer 'make IE browser visible (False would allow IE to run in the background) objIE.Visible = True 'navigate IE to this web page (a pretty neat search engine really) objIE.navigate "https://duckduckgo.com" 'wait here a few seconds while the browser is busy Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop 'in the search box put cell "A2" value, the word "in" and cell "C1" value objIE.document.getElementById("search_form_input_homepage").Value = _ Sheets("Sheet1").Range("A2").Value & " in " & Sheets("Sheet1").Range("C1").Value 'click the 'go' button objIE.document.getElementById("search_button_homepage").Click 'wait again for the browser Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop 'the first search result will go in row 2 y = 2 'for each <a> element in the collection of objects with class of 'result__a'... For Each aEle In objIE.document.getElementsByClassName("result__a") '...get the href link and print it to the sheet in col C, row y result = aEle Sheets("Sheet1").Range("C" & y).Value = result '...get the text within the element and print it to the sheet in col D Sheets("Sheet1").Range("D" & y).Value = aEle.innerText Debug.Print aEle.innerText 'is it a yellowpages link? If InStr(result, "yellowpages.com") > 0 Or InStr(result, "yp.com") > 0 Then 'make the result red Sheets("Sheet1").Range("C" & y).Interior.ColorIndex = 3 'place a 1 to the left Sheets("Sheet1").Range("B" & y).Value = 1 End If 'increment our row counter, so the next result goes below y = y + 1 'repeat times the # of ele's we have in the collection Next 'add up the yellowpages listings Sheets("Sheet1").Range("B1").Value = _ Application.WorksheetFunction.Sum(Sheets("Sheet1").Range("B2:B100")) 'close the browser objIE.Quit 'exit our SearchBot subroutine End Sub
Your module should look like the image below. To run this subroutine you’d press the green play button or just press F5. But first, let’s have a closer look. Click the first line,
Sub SearchBot(). Then press F8 to begin stepping through the code line-by-line (also called debugging).
Each time you press F8 the line of code highlighted in yellow will execute, and then the next line down of code will turn yellow and wait for your key press. Say you change the code in a line and want to re-run it… you’d move the yellow line back up by dragging the yellow arrow with your mouse, or by clicking the line you want and then pressing CTRL+F9
Notice that after executing the line
objIE.Visible = True, Internet Explorer opens? Now position all 3 applications (Excel, VBE, and IE) on the screen so that each is visible, and continue stepping through the code using F8.
What’s happening here…
Let’s look at the code line by line. Watch the video too if you want.
'start a new subroutine called SearchBot
The first thing you’ll notice are lots of comments I added to the code. VBA ignores lines beginning with a single quote (
'), highlighting them green, as comments. Adding your own personal comments to your code is an especially good idea for beginners, or for more complex code that requires a memory-jogger when you go back to it.
This tells VBA we are starting a new subroutine, or Procedure. A module can contain subroutines (
End Sub) and functions (
End Function). In the video, we start the subroutine by putting the cursor within the code and pressing F5 (run) or F8 (execute line-by-line). But usually a
Function is called by another
Function, or by a launch button. The name of a
Function can be almost anything you want. CamelCase naming convention, with first letters capitalized, is typical for naming subroutines and functions.
Dim objIE as InternetExplorer
Think of variables (like objIE) as floating references or containers that refer to objects or numbers we want to manipulate. But before we get to use them, we have to declare, or
Dim, them, which sets aside some memory for whatever kind of use we have in mind for them. To manipulate Internet Explorer as an object in VBA, we need to
Dim the browser — either generically as an
Object variable (an example of late binding objects), or more specifically as an
InternetExplorer special object variable like we did here (an example of early binding objects).
Right now don’t worry about early vs. late binding… there are pros and cons for each method… but one advantage of early binding is VBA knows exactly what you intend for the variable, and therefore can show you its Intellisense (helper hints) popup menu when typing code after a specified object variable, which is great for helping beginners snoop around and see different things the object could possibly do. Why call it objIE? No special reason, aside from making it easy to remember what it’s for. Names of variables can be almost anything you want, and camelCase with first letter lowercase is the usual naming convention.
Dim aEle As HTMLLinkElement
Here we reserve aEle as a different type of object variable. Specifically, we early bind it for use as a webpage link (the
<a> tags on a webpage). The other way would be to declare it more generically as
Dim aEle as Object then later put it to use in one or more specific roles as an object — either way works. Once again, I could choose to call it anything, even mySuperHrefThingy, but I like the simplicity of aEle because it’s short and makes me think <a> element.
Dim y as Integer
Let’s use the variable y to refer to the row number on the Excel sheet where we’ll have the bot put our search results. Integer type variables in VBA are round numbers between -32,768 and 32,768. If we needed a variable to hold a value greater than 32,768 we’d
Dim y as Long instead. Or if we needed it to have decimals we’d use
Dim y as Double.
Dim result as String
For each aEle we find, we’ll copy its href value over to a string (alphanumeric text) variable called result, then print the string to the Excel sheet. Ok, I admit we don’t really need this –we could just print aEle’s value to the sheet directly– but I added it to introduce you to string variables.
Set objIE = New InternetExplorer
Setting an object instantiates or activates a new instance of its object type. So we’re saying let’s start a new instance of the IE browser.
objIE.Visible = True
Awesome – your bot just opened your IE browser! Properties (characteristics, such as making IE Visible) and Methods (actions) are invoked after typing a dot (
.) after an object, like you see here. In your VBE, delete this line and start retyping it. You’ll notice after typing
objIE. that the Intellisense helper pops up, listing all the properties and methods available for the
InternetExplorer object. Go ahead and scroll through the list.
.navigate event tells IE to navigate to whatever webpage is within the quotes. Yay automation time!
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
But the code wants to run all at once, in a millisecond. By the time IE actually loads the page of search results, the entire subroutine is long over. We have to slow things down and wait for the browser while its navigating. We could use something like
Application.Wait(Now + TimeValue("00:00:05")) to wait 5 seconds, but if the page loads faster we’re wasting time waiting for nothing, and if the page takes longer we’re in the same pickle as before.
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop is an elegant
Do...Loopsolution consisting of 3 VBA statements we’ve tidied up on one line, using colons (
:) to connect them. It’s saying: while IE is busy navigating, just sit here and run in circles.
objIE.Busy = True and
objIE.readyState <> 4 (
<>means not equal) are essentially the same thing –one could probably be ommitted– but just to make sure the page is really done loading we make both of these conditions requirements for getting out of the loop, by using the logical operator
DoEvents part is technically not needed but we add it in the loop to free up processing resources and allow other things to happen within Excel during this time, asynchronous to the running loop. For the over-achievers out there, here are some other ways we could have written this line of code to do the same thing.
objIE.document.getElementById("search_form_input_homepage").Value = _
Sheets("Sheet1").Range("A2").Value & " in " & Sheets("Sheet1").Range("C1").Value
This VBA statement says: Find the search box element on the webpage by its ID, which is search_form_input_homepage, and make its text value what we have in cell A2, plus the string ” in “, plus what we have in cell C1. The underscore (
_) just lets us put longer pieces of code on multiple lines for better readability. How did I know what the search input box’s ID was, or that it even had an ID? Firebug of course! If you don’t have Firefox get it, and then install the Firebug add-on. Then right-click the search box, click Inspect Element with Firebug, and behold…
…the line of code is highlighted in the Firebug console (I added the red box around the ID). The image above reminds us a webpage is organized in a nested, tree-like structure, starting with
.document as the root objectand branching out from there into smaller objects. This is called the Document Object Model.
.getElementById("__") is the go-to tool for grabbing one of those objects from the webpage. With it you can scrape data from a page, as well as writing to a webpage as we’re doing here. Here’s a sneak peak at some other ways we could have found the search box and entered data into it (shown in the order I would have tried them, if an ID was not available).
I used Firebug to find the ID of the search button (magnifying glass) on the page, then used
.getElementByIdagain to grab it, and the
.Click method to click it. Alternatively, we could have submitted the search box form like this instead
objIE.document.getElementById("search_form_homepage").Submit. Actually, there’s a simpler method still that would allow us to skip the previous 3 steps altogether. Can you see it? Hint: take a look at the url we ended up with… https://duckduckgo.com/?q=auto+parts+in+Houston+TX. Got it? If you still don’t see it, and you don’t think you’ll be able to sleep tonight, email me 🙂
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
We need to wait for the browser for a second (an eternity in CPU land) while the search is running.
y = 2
Now we have a page full of search results. We’ll put them in column C, but we need to start in row 2 since C1 contains our heading Houston TX.
For Each aEle In objIE.document.getElementsByClassName("result__a")
This is the opening statement in our action-packed
For...Next loop, where the procedure seeks out each
aEle(link) object that’s found in the collection of all objects on the page that have a class of result__a (again I used Firebug to locate this particular class name to target). For each match found, we temporarily assign it to the
aEle variable and do the following with it…
result = aEle
Sheets("Sheet1").Range("C" & y).Value = result
We take the found <a> element (
aEle) and assign its value to a string variable,
result. Ok, we really didn’t need to do this… we could just work with
aEle directly without passing it around… I just wanted to demonstrate what a string variable is! Then we print the string to the sheet in column C, row
Sheets("Sheet1").Range("D" & y).Value = aEle.innerText
Let’s say we also grab the anchor text for each link and plop that into the cell to the right. The
.textContent) gets us that text. Side-note: I’ve built many bots with the code
If InStr(objIE.document.body.innerText, "some text") > 0 then ... where I needed to know if the web page had some text in it somewhere.
We’ll also print this text in the debugging console, the Immediate Window, as well (note: you may have to press CTRL+G to show the Immediate Window, or open it from the View menu).
If InStr(result, "yellowpages.com") > 0 Or InStr(result, "yp.com") > 0 Then
Sheets("Sheet1").Range("C" & y).Interior.ColorIndex = 3
Sheets("Sheet1").Range("B" & y).Value = 1
Let’s say we want to single out Yellow Pages links.
InStr looks for one string inside another, using the format
InStr(haystack, needle) and returns the numerical starting point of the first occurence of needle, or
0 if not found. If ‘yp’ or ‘yellowpages’ is found in a link we make the interior of the cell red with the
.ColorIndex VBA property, and place a ‘1’ in the cell to the left. FYI, short and simple
If...Then statements, like this one:
If x = y Then Debug.print "Yo, x is the same as y, bro!", can be written on one line with the closing
If...Then statements are the backbone of any great A.I. … Watson, Wall-E, and now SearchBot()!
y = y + 1
Now we increment
y by 1, so each new result found will be printed on the next row down.
Continue looping until we run out of
aEle‘s to look at.
Sheets("Sheet1").Range("B1").Value = _
Since we put a 1 in row B each time we found a Yellowpages link, we can now just add them up, making cell B1 display the total. Alternatively, we could have used another
For...Next loop to look at each cell in column B one at a time, adding up all the 1’s. But using Excel’s powerful built-in WorksheetFunction‘s whenever possible is better.
So we opened IE, navigated to DuckDuckGo, performed a search, parsed the results on the web page, printed them to Excel, and even did some further analysis on the retrieved data. All that’s left is to close (
.Quit) the browser object and end (
End Sub) the procedure.
Finally, let’s save the file. We need to save the file with the extension .xlsm, since this is most definitely a macro-enabled workbook.
Congratulations! You built your first bot, and in the process learned a good deal about the most common VBA commands used to interact with and automate the web.
To download this Excel VBA project: xl-searchbot.xlsm (keyword search & scrape bot)
Recommended next: Web scraping with getElementsByTagName()