Chromedriver Proxy with Selenium using Python

Selenium Chrome For Web App Testing And Automation

Selenium is an open-source tool that automates web browser interactions for website testing, scraping, and more. It’s useful when you need the browser to perform tasks such as clicking buttons or scrolling. Although Selenium is used primarily for website testing, it also works for web scraping because it can locate the required public data on a website.

It provides a single interface that lets you write scripts in programming languages like Python, Ruby, Java, NodeJS, PHP, Perl, and C#.

Selenium automates frequent and recurrent functional, performance, and compatibility testing. This gives developers near-instant feedback for faster debugging, leaving them with more time to code business logic for newer versions/features.

Modern web development needs Selenium testing because:

  • It automates repeated testing tasks of smaller components of larger code-bases
  • It’s integral to agile development and CI/CD
  • It frees resources from manual testing
  • It’s consistently reliable; it catches bugs that human testers might miss
  • You can test your web application at scale
  • It’s precise; the customizable error reporting is an added plus
  • It’s reusable; you can refactor and reuse an end-to-end test script every time a new feature gets deployed
  • It’s scalable; over time, you can develop an extensive library of repeatable test cases for a product

Selenium WebDriver, Also Known As Selenium 2.0

WebDriver executes test scripts through browser-specific drivers. It consists of an API, a library, a driver, and frameworks, and it supports libraries for integration with natural-language or programming-language test frameworks.

WebDriver has a local end (the ‘client’) that sends commands (the test script) to a browser-specific driver. The driver executes these commands on its browser instance. So if a test script calls for execution on both Chrome and Firefox, ChromeDriver runs it on Chrome while GeckoDriver does the same on Firefox.
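
The client/driver split described above can be sketched in a few lines of Python. This is only an illustration: the names DRIVER_BINARIES, driver_binary_for and create_driver are hypothetical helpers, and create_driver assumes Selenium 4 is installed and that a matching driver binary can be found on the machine.

```python
# Maps a browser name to the browser-specific driver that WebDriver talks to.
DRIVER_BINARIES = {
    "chrome": "chromedriver",
    "firefox": "geckodriver",
}


def driver_binary_for(browser):
    """Return the name of the driver binary used for `browser`."""
    try:
        return DRIVER_BINARIES[browser.lower()]
    except KeyError:
        raise ValueError("unsupported browser: %r" % browser)


def create_driver(browser):
    """Start a browser session through the matching driver.

    Selenium is imported here, not at module level, so the mapping
    above can be used even where Selenium is not installed.
    """
    from selenium import webdriver

    name = browser.lower()
    if name == "chrome":
        return webdriver.Chrome()   # commands go to ChromeDriver
    if name == "firefox":
        return webdriver.Firefox()  # commands go to GeckoDriver
    raise ValueError("unsupported browser: %r" % browser)
```

The same test script can then be run against either browser by changing only the argument passed to create_driver.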

 

Selenium Chrome Proxy Authentication

When you use a proxy with Python, Selenium, and chromedriver, you usually configure it with the following code (no username or password):

from selenium import webdriver

# hostname and port are strings, e.g. '192.168.10.10' and '9000'
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--proxy-server=%s:%s' % (hostname, port))
driver = webdriver.Chrome(options=chrome_options)

That works fine unless the proxy requires authentication. If the proxy requires you to log in with a username and password, you have to use one of the solutions explained below.
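
As a minimal sketch of the unauthenticated case, the proxy argument can be built by a small helper that validates its inputs before Chrome is started. The function names build_proxy_argument and chrome_with_proxy are hypothetical, and the sketch assumes Selenium 4 with a usable chromedriver.

```python
def build_proxy_argument(host, port, scheme="http"):
    """Build the --proxy-server command-line switch for Chrome."""
    if not host:
        raise ValueError("host must not be empty")
    port = int(port)  # accepts 9000 or "9000"
    if not 0 < port < 65536:
        raise ValueError("port out of range: %d" % port)
    return "--proxy-server=%s://%s:%d" % (scheme, host, port)


def chrome_with_proxy(host, port):
    """Start Chrome routed through an unauthenticated proxy."""
    from selenium import webdriver  # imported lazily; requires Selenium

    options = webdriver.ChromeOptions()
    options.add_argument(build_proxy_argument(host, port))
    return webdriver.Chrome(options=options)
```

Validating the port up front turns a typo into an immediate Python error instead of a browser that silently fails to connect.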

1. HTTP Proxy Authentication with Chromedriver in Selenium

To set up proxy authentication, we generate a small Chrome extension and load it into chromedriver dynamically using the code below. This code configures Selenium with chromedriver to use an HTTP proxy that requires a username and password.

import os
import zipfile

from selenium import webdriver

PROXY_HOST = '192.168.10.10'  # rotating proxy or host
PROXY_PORT = 9000 # port
PROXY_USER = 'proxy-user' # username
PROXY_PASS = 'proxy-password' # password


manifest_json = """
{
    "version": "1.0.0",
    "manifest_version": 2,
    "name": "Chrome Proxy",
    "permissions": [
        "proxy",
        "tabs",
        "unlimitedStorage",
        "storage",
        "<all_urls>",
        "webRequest",
        "webRequestBlocking"
    ],
    "background": {
        "scripts": ["background.js"]
    },
    "minimum_chrome_version":"22.0.0"
}
"""

background_js = """
var config = {
    mode: "fixed_servers",
    rules: {
        singleProxy: {
            scheme: "http",
            host: "%s",
            port: parseInt(%s)
        },
        bypassList: ["localhost"]
    }
};

chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});

function callbackFn(details) {
    return {
        authCredentials: {
            username: "%s",
            password: "%s"
        }
    };
}

chrome.webRequest.onAuthRequired.addListener(
    callbackFn,
    {urls: ["<all_urls>"]},
    ['blocking']
);
""" % (PROXY_HOST, PROXY_PORT, PROXY_USER, PROXY_PASS)


def get_chromedriver(use_proxy=False, user_agent=None):
    path = os.path.dirname(os.path.abspath(__file__))
    chrome_options = webdriver.ChromeOptions()
    if use_proxy:
        pluginfile = 'proxy_auth_plugin.zip'

        with zipfile.ZipFile(pluginfile, 'w') as zp:
            zp.writestr("manifest.json", manifest_json)
            zp.writestr("background.js", background_js)
        chrome_options.add_extension(pluginfile)
    if user_agent:
        chrome_options.add_argument('--user-agent=%s' % user_agent)
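    # Selenium 4 takes the driver path through a Service object and the
    # options through `options=`; `chrome_options=` is no longer accepted.
    from selenium.webdriver.chrome.service import Service
    service = Service(os.path.join(path, 'chromedriver'))
    driver = webdriver.Chrome(service=service, options=chrome_options)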
    return driver

def main():
    driver = get_chromedriver(use_proxy=True)
    driver.get('https://httpbin.org/ip')  # any url you want to crawl


if __name__ == '__main__':
    main()
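
To confirm that traffic really leaves through the proxy, the httpbin.org/ip page can be read back and compared with the proxy's address. The sketch below is an assumption-laden illustration: parse_origin and exit_ip are hypothetical helpers, and exit_ip assumes Chrome shows the JSON response inside a <pre> element.

```python
import json


def parse_origin(body_text):
    """Extract the caller's IP from the JSON returned by httpbin.org/ip."""
    return json.loads(body_text)["origin"]


def exit_ip(driver):
    """Load httpbin.org/ip in an existing driver and return the reported IP.

    Assumes the browser renders the raw JSON inside a <pre> element.
    """
    from selenium.webdriver.common.by import By  # requires Selenium

    driver.get("https://httpbin.org/ip")
    body_text = driver.find_element(By.TAG_NAME, "pre").text
    return parse_origin(body_text)
```

If the value returned by exit_ip(driver) is your own public address, not the proxy's, the proxy settings were not applied.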

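The get_chromedriver function returns a configured Selenium WebDriver that you can use in your application. Note that the extension uses Manifest V2, which recent Chrome releases are phasing out, so it may need to be ported to Manifest V3 on newer browsers.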
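
One fragile spot in this approach is that the credentials are pasted into the JavaScript with plain string formatting, so a password containing a quote or backslash would break the script. A possible variation, sketched here with a hypothetical build_proxy_extension helper, encodes each value with json.dumps and writes the archive into a temporary directory. The manifest is the same one shown above.

```python
import json
import os
import tempfile
import zipfile

# Same manifest as above, kept as a dict so json.dumps guarantees valid JSON.
MANIFEST = {
    "version": "1.0.0",
    "manifest_version": 2,
    "name": "Chrome Proxy",
    "permissions": ["proxy", "tabs", "unlimitedStorage", "storage",
                    "<all_urls>", "webRequest", "webRequestBlocking"],
    "background": {"scripts": ["background.js"]},
    "minimum_chrome_version": "22.0.0",
}

BACKGROUND_TEMPLATE = """
var config = {
    mode: "fixed_servers",
    rules: {
        singleProxy: {scheme: "http", host: %(host)s, port: %(port)d},
        bypassList: ["localhost"]
    }
};
chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});
chrome.webRequest.onAuthRequired.addListener(
    function(details) {
        return {authCredentials: {username: %(user)s, password: %(password)s}};
    },
    {urls: ["<all_urls>"]},
    ["blocking"]
);
"""


def build_proxy_extension(host, port, user, password, directory=None):
    """Write the proxy-auth extension as a zip file and return its path."""
    directory = directory or tempfile.mkdtemp()
    path = os.path.join(directory, "proxy_auth_plugin.zip")
    # json.dumps yields quoted, escaped string literals that are safe in JS.
    background_js = BACKGROUND_TEMPLATE % {
        "host": json.dumps(host),
        "port": int(port),
        "user": json.dumps(user),
        "password": json.dumps(password),
    }
    with zipfile.ZipFile(path, "w") as zp:
        zp.writestr("manifest.json", json.dumps(MANIFEST, indent=4))
        zp.writestr("background.js", background_js)
    return path
```

The returned path can be passed to chrome_options.add_extension() in the same way as the proxy_auth_plugin.zip file created earlier.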

2. Using Selenium-Wire Package

Selenium Wire extends Selenium’s Python bindings to give you access to the underlying requests made by the browser. You author your code in the same way as you do with Selenium, but you get extra APIs for inspecting requests and responses and making changes to them on the fly.

Selenium-Wire on GitHub

Example code from the documentation:

HTTP proxies

from seleniumwire import webdriver

options = {
    'proxy': {
        'http': 'http://user:pass@192.168.10.100:8888',
        'https': 'https://user:pass@192.168.10.100:8888',
        'no_proxy': 'localhost,127.0.0.1'
    }
}

driver = webdriver.Chrome(seleniumwire_options=options)

SOCKS proxies

from seleniumwire import webdriver

options = {
    'proxy': {
        'http': 'socks5://user:pass@192.168.10.100:8888',
        'https': 'socks5://user:pass@192.168.10.100:8888',
        'no_proxy': 'localhost,127.0.0.1'
    }
}
driver = webdriver.Chrome(seleniumwire_options=options)

Install with:

pip install selenium-wire

Another recommended package is webdriver-manager. It’s a package that helps with the management of binary drivers for different browsers. There’s no need to manually download a new version of a web driver after each update.

You can install the webdriver-manager using the pip command:

pip install webdriver-manager
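
A minimal sketch of how webdriver-manager is typically combined with Selenium 4 follows. The function name chrome_with_managed_driver is hypothetical, and the code assumes both packages are installed and that the machine can download the driver on first use.

```python
def chrome_with_managed_driver():
    """Start Chrome with a chromedriver fetched by webdriver-manager."""
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    # install() downloads a matching chromedriver if needed and
    # returns the path to the binary.
    service = Service(ChromeDriverManager().install())
    return webdriver.Chrome(service=service)
```

Recent Selenium releases (4.6 and later) also ship their own Selenium Manager, which can locate a driver automatically, so this package may be optional depending on the version you use.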

Selenium is a great tool for public web scraping, especially when learning the basics. With the help of ProxyEmpire’s Residential And Mobile Proxies, web scraping becomes even more efficient.
