Selenium Chrome For Web App Testing And Automation
Selenium is an open-source tool that automates web browser interactions for website testing, scraping, and more. It’s useful when you need the browser to perform a series of tasks, such as clicking buttons, filling in forms, or scrolling. Although Selenium is primarily used for website testing, it also works well for web scraping because it helps locate the required public data on a website.
It provides a single interface that lets you write test scripts in programming languages such as Python, Ruby, Java, JavaScript (Node.js), PHP, Perl, and C#.
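For instance, a minimal Python sketch (assuming chromedriver is installed and on your PATH, and using the By-style locators available in recent Selenium releases) might look like this:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()        # launch Chrome through chromedriver
driver.get('https://example.com')  # open the page under test
link = driver.find_element(By.TAG_NAME, 'a')  # locate an element on the page
link.click()                       # interact with it like a user would
print(driver.title)                # read back page state for assertions
driver.quit()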
Modern web development needs Selenium testing because:
- It automates repetitive testing of the smaller components that make up larger code bases
- It’s integral to agile development and CI/CD
- It frees resources from manual testing
- It’s consistently reliable and catches bugs that human testers might miss
- You can test your web application at scale
- It’s precise, and the customizable error reporting is an added plus
- It’s reusable; you can refactor and reuse an end-to-end test script every time a new feature is deployed
- It’s scalable; over time, you can develop an extensive library of repeatable test cases for a product
Selenium WebDriver, Also Known As Selenium 2.0
WebDriver executes test scripts through browser-specific drivers. It consists of an API, language bindings (libraries), the browser drivers, and a framework layer, and it supports libraries for integration with natural-language or programming-language test frameworks.
Essentially, WebDriver has a local end (the ‘client’) that sends commands (test scripts) to a browser-specific driver, and the driver executes those commands on its own browser instance. If a test script calls for execution on Chrome and Firefox, ChromeDriver will run it on Chrome, while GeckoDriver will do the same on Firefox.
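As a rough sketch, assuming both chromedriver and geckodriver are installed and on your PATH, the same commands can be sent through either browser-specific driver:

from selenium import webdriver

# The same test logic runs unchanged against different browser-specific drivers.
for driver in (webdriver.Chrome(), webdriver.Firefox()):
    driver.get('https://httpbin.org/headers')
    print(driver.name, driver.title)
    driver.quit()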
Selenium Chrome Proxy Authentication
When you need to use a proxy with Python and the Selenium library through chromedriver, you usually use code like the following (without any username and password):
from selenium import webdriver

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--proxy-server=%s:%s' % (hostname, port))  # e.g. 'myproxy.example.com:8080'
driver = webdriver.Chrome(chrome_options=chrome_options)
That works fine unless the proxy requires authentication. If the proxy requires you to log in with a username and password, you have to use one of the solutions explained below.
1. HTTP Proxy Authentication with Chromedriver in Selenium
To set up proxy authentication, we will generate a small Chrome extension on the fly and load it into chromedriver with the code below. This code configures Selenium with chromedriver to use an HTTP proxy that requires authentication with a username and password.
import os
import zipfile

from selenium import webdriver

PROXY_HOST = '192.168.10.10'  # rotating proxy or host
PROXY_PORT = 9000  # port
PROXY_USER = 'proxy-user'  # username
PROXY_PASS = 'proxy-password'  # password

manifest_json = """
{
    "version": "1.0.0",
    "manifest_version": 2,
    "name": "Chrome Proxy",
    "permissions": [
        "proxy",
        "tabs",
        "unlimitedStorage",
        "storage",
        "<all_urls>",
        "webRequest",
        "webRequestBlocking"
    ],
    "background": {
        "scripts": ["background.js"]
    },
    "minimum_chrome_version": "22.0.0"
}
"""

background_js = """
var config = {
    mode: "fixed_servers",
    rules: {
        singleProxy: {
            scheme: "http",
            host: "%s",
            port: parseInt(%s)
        },
        bypassList: ["localhost"]
    }
};

chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});

function callbackFn(details) {
    return {
        authCredentials: {
            username: "%s",
            password: "%s"
        }
    };
}

chrome.webRequest.onAuthRequired.addListener(
    callbackFn,
    {urls: ["<all_urls>"]},
    ['blocking']
);
""" % (PROXY_HOST, PROXY_PORT, PROXY_USER, PROXY_PASS)

def get_chromedriver(use_proxy=False, user_agent=None):
    path = os.path.dirname(os.path.abspath(__file__))
    chrome_options = webdriver.ChromeOptions()
    if use_proxy:
        # Package the manifest and background script into a zip and load it
        # as a Chrome extension that answers the proxy auth challenge.
        pluginfile = 'proxy_auth_plugin.zip'
        with zipfile.ZipFile(pluginfile, 'w') as zp:
            zp.writestr("manifest.json", manifest_json)
            zp.writestr("background.js", background_js)
        chrome_options.add_extension(pluginfile)
    if user_agent:
        chrome_options.add_argument('--user-agent=%s' % user_agent)
    driver = webdriver.Chrome(
        os.path.join(path, 'chromedriver'),  # chromedriver binary next to this script
        chrome_options=chrome_options)
    return driver

def main():
    driver = get_chromedriver(use_proxy=True)
    driver.get('https://httpbin.org/ip')  # any url you want to crawl

if __name__ == '__main__':
    main()
The get_chromedriver function returns a configured Selenium WebDriver instance that you can use in your application.
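For example, to combine the authenticated proxy with a custom User-Agent (the UA string below is only a placeholder), you could call it like this:

driver = get_chromedriver(
    use_proxy=True,
    user_agent='Mozilla/5.0 (compatible; example-bot/1.0)')  # placeholder UA string
driver.get('https://httpbin.org/headers')  # echoes the headers the proxy forwarded
print(driver.page_source)
driver.quit()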
2. Using Selenium-Wire Package
Selenium Wire extends Selenium’s Python bindings to give you access to the underlying requests made by the browser. You author your code in the same way as you do with Selenium, but you get extra APIs for inspecting requests and responses and making changes to them on the fly.
Example code from the documentation:
HTTP proxies
from seleniumwire import webdriver
options = {
    'proxy': {
        'http': 'http://user:[email protected]:8888',
        'https': 'https://user:[email protected]:8888',
        'no_proxy': 'localhost,127.0.0.1'
    }
}
driver = webdriver.Chrome(seleniumwire_options=options)
SOCKS proxies
from seleniumwire import webdriver
options = {
    'proxy': {
        'http': 'socks5://user:[email protected]:8888',
        'https': 'socks5://user:[email protected]:8888',
        'no_proxy': 'localhost,127.0.0.1'
    }
}
driver = webdriver.Chrome(seleniumwire_options=options)
Install with:
pip install selenium-wire
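Beyond proxy configuration, Selenium Wire exposes the captured traffic on the driver object itself. A short sketch of inspecting requests and their responses via the documented driver.requests attribute:

from seleniumwire import webdriver

driver = webdriver.Chrome()
driver.get('https://httpbin.org/ip')

# Each captured request carries its response once the browser has received it.
for request in driver.requests:
    if request.response:
        print(request.url, request.response.status_code)
driver.quit()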
Another recommended package is webdriver-manager, which helps manage the binary drivers for different browsers. There’s no need to manually download a new version of a web driver after each update.
You can install the webdriver-manager using the pip command:
pip install webdriver-manager
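A typical usage sketch (the Service-based call shown here applies to Selenium 4; older Selenium versions pass the downloaded path directly to webdriver.Chrome):

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# webdriver-manager downloads and caches a chromedriver build that matches
# the installed Chrome, so the binary never has to be fetched by hand.
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service)
driver.get('https://httpbin.org/ip')
driver.quit()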
Selenium is a great tool for public web scraping, especially when learning the basics. With the help of ProxyEmpire’s Residential And Mobile Proxies, web scraping becomes even more efficient.
TL;DR
Selenium is an automation tool for browser testing and web scraping. It supports multiple languages like Python. WebDriver executes scripts through browser-specific drivers.
To use a proxy with Selenium’s Python chromedriver binding:
- Generate a manifest and background script with proxy details.
- Zip the files into a plugin and add the extension to ChromeOptions.
- Pass ChromeOptions when creating the driver to enable proxy auth.
Alternatively, use the Selenium-Wire package which offers simple proxy configuration:
from seleniumwire import webdriver
options = {'proxy': {'http': 'http://user:[email protected]:8888'}}
driver = webdriver.Chrome(seleniumwire_options=options)
Selenium-Wire also allows inspecting requests/responses.
For public web scraping, Selenium is great combined with reliable residential and mobile proxies from ProxyEmpire to avoid blocks.