The underlying concepts behind IP-based rate limiting, followed by a Golang HTTP client that uses Tor and SmartProxy to bypass such limits
This article introduces the underlying concepts behind IP-based rate limiting when communicating with a service and what can be done to circumvent it. We will first build a simple HTTP client in Golang that can rotate IP addresses when necessary, then look at how to build our own local proxy pool with Tor, and finally at how to use a commercial proxy pool service, SmartProxy. By the end, we should know how effective rotating IP addresses is at bypassing IP-based rate limits and how the approach can be improved.
There are many different methods of implementing rate limits. Some websites block access based on the IP range the address belongs to. For instance, websites may deny access to known ranges of Tor nodes or even to ranges of Amazon Web Services IP addresses. Generally, this approach is aimed at reducing the amount of non-human traffic: a bot will typically send many more requests from a single IP address than a human user could generate over the same period.
Websites can easily monitor traffic and know how many requests are being received from a specific IP address. If the number of requests exceeds a certain limit, websites can block that IP address or require a CAPTCHA test.
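To make the server side concrete, here is a minimal sketch of how such per-IP counting might look. The ipLimiter type, its fixed-window strategy, and the example addresses are assumptions made for illustration; real websites use a variety of schemes.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// window tracks how many requests one IP has made since start.
type window struct {
	start time.Time
	count int
}

// ipLimiter is a hypothetical fixed-window request counter keyed by client IP.
type ipLimiter struct {
	mu     sync.Mutex
	limit  int
	period time.Duration
	seen   map[string]*window
}

func newIPLimiter(limit int, period time.Duration) *ipLimiter {
	return &ipLimiter{limit: limit, period: period, seen: make(map[string]*window)}
}

// Allow reports whether a request from ip at time now is still under the limit.
func (l *ipLimiter) Allow(ip string, now time.Time) bool {
	l.mu.Lock()
	defer l.mu.Unlock()

	w, ok := l.seen[ip]
	if !ok || now.Sub(w.start) >= l.period {
		// First request from this IP, or the previous window has expired.
		w = &window{start: now}
		l.seen[ip] = w
	}
	if w.count >= l.limit {
		return false // over the limit: the site would block or show a CAPTCHA
	}
	w.count++
	return true
}

// simulate sends n requests from ip at the same instant and records each verdict.
func simulate(l *ipLimiter, ip string, n int) []bool {
	now := time.Now()
	verdicts := make([]bool, 0, n)
	for i := 0; i < n; i++ {
		verdicts = append(verdicts, l.Allow(ip, now))
	}
	return verdicts
}

func main() {
	l := newIPLimiter(3, time.Minute)
	fmt.Println("203.0.113.7: ", simulate(l, "203.0.113.7", 4))
	fmt.Println("198.51.100.9:", simulate(l, "198.51.100.9", 1))
}
```

Because the counter is keyed by IP address, the fourth request from the first address is rejected while a request from a second address is still accepted. That is the property the rest of this article relies on.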
There are many ways to work around IP-based rate limiting. One option is to limit how many pages on a single site you scrape concurrently, and possibly to introduce delays once the limit has been reached. The simplest way, however, is to change the IP address from which the requests are sent. We can accomplish this with a proxy pool: if we assign each request a different proxy by changing the proxy function referenced by our HTTP Transport, we can make it appear that each request is coming from a different user.
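As a rough sketch of that idea, the Transport's Proxy field can be set to a function that returns a different pool member on each call. The rotatingProxy, parsePool, and chosenHosts helpers and the 127.0.0.1 proxy addresses are hypothetical and only illustrate the mechanism.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"sync/atomic"
)

// rotatingProxy returns a function suitable for http.Transport.Proxy that
// hands out the proxies in pool one after another (round-robin).
func rotatingProxy(pool []*url.URL) func(*http.Request) (*url.URL, error) {
	var next uint64
	return func(*http.Request) (*url.URL, error) {
		if len(pool) == 0 {
			return nil, nil // a nil URL tells the Transport to connect directly
		}
		i := atomic.AddUint64(&next, 1) - 1
		return pool[i%uint64(len(pool))], nil
	}
}

// parsePool turns raw proxy addresses into URLs.
func parsePool(raw []string) ([]*url.URL, error) {
	pool := make([]*url.URL, 0, len(raw))
	for _, r := range raw {
		u, err := url.Parse(r)
		if err != nil {
			return nil, err
		}
		pool = append(pool, u)
	}
	return pool, nil
}

// chosenHosts asks the proxy function n times and records which host it picks.
func chosenHosts(raw []string, n int) ([]string, error) {
	pool, err := parsePool(raw)
	if err != nil {
		return nil, err
	}
	proxy := rotatingProxy(pool)
	hosts := make([]string, 0, n)
	for i := 0; i < n; i++ {
		u, err := proxy(nil) // the request argument is ignored by rotatingProxy
		if err != nil {
			return nil, err
		}
		if u == nil {
			hosts = append(hosts, "direct")
			continue
		}
		hosts = append(hosts, u.Host)
	}
	return hosts, nil
}

func main() {
	// Hypothetical local proxies; replace with real pool members.
	raw := []string{"http://127.0.0.1:8081", "http://127.0.0.1:8082", "http://127.0.0.1:8083"}
	pool, err := parsePool(raw)
	if err != nil {
		fmt.Println("bad proxy address:", err)
		return
	}
	client := &http.Client{Transport: &http.Transport{Proxy: rotatingProxy(pool)}}
	_ = client // requests made with client would go through the pool in turn

	hosts, err := chosenHosts(raw, 4)
	fmt.Println(hosts, err)
}
```

The sections that follow take a slightly different route: instead of keeping a fixed list, they generate a fresh proxy function whenever a new IP address is wanted.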
Now it’s time to discuss how to create a proxy pool with Tor. Tor is an overlay network that enables computers to communicate with the web anonymously by passing traffic through a series of relays. We will be using stream isolation over SOCKS. With this method, you only need one Tor instance, and each request can use a different stream that is routed over its own circuit. One caveat: a separate circuit does not guarantee a different exit node, so it does not guarantee us a different IP address. In order to isolate streams, we have to create unique username:password values for each connection. To start, we create a source of pseudo-randomness and generate a random int to be used in our Tor SOCKS string. Next, we parse the newly created Tor SOCKS string into a URL structure. Finally, we pass our Tor URL to ProxyURL, which returns a proxy function that will be used in our Transport.
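The steps just described could be sketched as follows. This assumes a Tor daemon is listening locally on its default SOCKS port, 127.0.0.1:9050; the names torSOCKSAddr, torURL, and TorProxy, as well as the placeholder password "x", are illustrative choices rather than anything Tor requires.

```go
package main

import (
	"fmt"
	"math/rand"
	"net/http"
	"net/url"
	"time"
)

// torSOCKSAddr assumes a local Tor daemon on its default SOCKS port.
const torSOCKSAddr = "127.0.0.1:9050"

// torURL builds the SOCKS URL for one isolated stream. The id becomes the
// SOCKS username; the password is a fixed placeholder.
func torURL(id int) (*url.URL, error) {
	return url.Parse(fmt.Sprintf("socks5://%d:x@%s", id, torSOCKSAddr))
}

// TorProxy returns a proxy function carrying freshly randomized credentials,
// so that Tor treats connections made through it as a separate stream.
func TorProxy() (func(*http.Request) (*url.URL, error), error) {
	rng := rand.New(rand.NewSource(time.Now().UnixNano()))
	u, err := torURL(rng.Intn(1 << 30))
	if err != nil {
		return nil, err
	}
	return http.ProxyURL(u), nil
}

func main() {
	proxy, err := TorProxy()
	if err != nil {
		fmt.Println("could not build Tor proxy:", err)
		return
	}
	transport := &http.Transport{Proxy: proxy}

	// Ask the Transport's proxy function which URL it would use.
	u, _ := transport.Proxy(nil)
	fmt.Println("proxy host:", u.Host, "stream id:", u.User.Username())
}
```

Nothing here contacts Tor yet; the SOCKS connection is only opened once a request is sent through a client that uses this Transport.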
There are a few problems we should address when using Tor SOCKS as our proxy. Tor encrypts and anonymizes your connection by passing it through three relays, so your traffic bounces through multiple nodes in various parts of the world. This causes bottlenecks and network latency that will always be present, but the most prominent bottleneck is the amount of time it takes to create a Tor circuit. Since we are using one Tor instance and separating sessions across different ports, we can potentially run out of available ports on our host machine if we are not keeping track of sessions. Luckily, we don’t have to worry about these problems with SmartProxy.
SmartProxy gives you access to a pool of over 40 million IP addresses through a single IP address. Just like in the Tor section, we can use a proxy pool; in this case, however, we do not have to create the different IPs ourselves. We use our SmartProxy credentials to connect to the proxy pool, which is then used in our Transport.
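A proxy function for a credentialed commercial pool might look like the sketch below. The endpoint proxy.example.com:7000 and the environment variable names are placeholders, not SmartProxy's real values; the actual endpoint and credentials come from your own account.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"os"
)

// smartProxyURL builds an authenticated HTTP proxy URL from credentials and
// an endpoint in host:port form.
func smartProxyURL(user, pass, endpoint string) *url.URL {
	return &url.URL{
		Scheme: "http",
		User:   url.UserPassword(user, pass),
		Host:   endpoint,
	}
}

// envOr returns the value of the environment variable key, or fallback if unset.
func envOr(key, fallback string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return fallback
}

// SmartProxy returns a proxy function for the commercial pool. The variable
// names and the default endpoint are hypothetical placeholders.
func SmartProxy() func(*http.Request) (*url.URL, error) {
	u := smartProxyURL(
		os.Getenv("SMARTPROXY_USER"),
		os.Getenv("SMARTPROXY_PASS"),
		envOr("SMARTPROXY_ENDPOINT", "proxy.example.com:7000"),
	)
	return http.ProxyURL(u)
}

func main() {
	transport := &http.Transport{Proxy: SmartProxy()}
	u, _ := transport.Proxy(nil)
	fmt.Println("requests would be sent via", u.Host)
}
```

Keeping the credentials in the environment rather than in the source is a design choice of this sketch; it avoids committing secrets alongside the code.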
We can now access the whole pool of 40 million addresses with unlimited connection requests. An additional feature that I found handy is the ability to toggle between sticky and rotating IPs on each request. With sticky IPs, we only need to change our IP when it becomes exhausted (blocked), which frees up connection time. Now let’s use these proxies in our HTTP client.
We are going to create our own HTTP client with the new proxy functions we created. We first create a singleton to ensure that the Transport is created only once.
Note: For the client to reuse the underlying connection, we need to read the response body entirely and close it before we issue a new HTTP request.
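A minimal sketch of such a client is shown here, with the Transport guarded by sync.Once and a helper that drains and closes each response body. The names sharedTransport, newClient, and fetch are hypothetical, the Proxy field is left unset to keep the example short, and the demo talks to a local test server instead of a real website.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

var (
	once      sync.Once
	transport *http.Transport
)

// sharedTransport creates the Transport on first use and returns the same
// instance on every later call.
func sharedTransport() *http.Transport {
	once.Do(func() {
		transport = &http.Transport{MaxIdleConnsPerHost: 10}
	})
	return transport
}

// newClient builds a client on top of the shared Transport. Timeout bounds
// the whole exchange, including reading the response body.
func newClient(timeout time.Duration) *http.Client {
	return &http.Client{Transport: sharedTransport(), Timeout: timeout}
}

// fetch performs a GET, then reads the body to the end and closes it so the
// underlying connection can go back into the pool for reuse.
func fetch(c *http.Client, target string) (int, error) {
	resp, err := c.Get(target)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if _, err := io.Copy(io.Discard, resp.Body); err != nil {
		return 0, err
	}
	return resp.StatusCode, nil
}

func main() {
	// Local stand-in for a remote site, so the example needs no network access.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	}))
	defer srv.Close()

	client := newClient(10 * time.Second)
	for i := 0; i < 2; i++ {
		code, err := fetch(client, srv.URL)
		fmt.Println("status:", code, "error:", err)
	}
}
```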
We specified a Timeout field, which is of type time.Duration. When a client opens a connection to the server via HTTP, the server may take some time to respond to the request. This field enables us to specify a maximum waiting time to get a response from the server. The function NewIP creates another proxy function and reassigns the Proxy field inside of the Transport field. Now let’s tie everything together in our main function.
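One way NewIP and a small main function might fit together is sketched below. The Client type, its newProxy field, NewTorClient, and proxyUser are hypothetical names, and the Tor address is again assumed to be the default 127.0.0.1:9050. To stay runnable without Tor, main prints the SOCKS username before and after rotation instead of sending a request; a real run would fetch an IP-echo page through c.HTTP at those two points.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"net/http"
	"net/url"
	"time"
)

type proxyFunc = func(*http.Request) (*url.URL, error)

// rng is the source of pseudo-randomness for stream ids. It is not safe for
// concurrent use; this sketch assumes a single goroutine.
var rng = rand.New(rand.NewSource(time.Now().UnixNano()))

// Client wraps an http.Client together with a factory for new proxy functions.
type Client struct {
	HTTP     *http.Client
	newProxy func() (proxyFunc, error)
}

// torProxy builds a proxy function with random SOCKS credentials for a local
// Tor daemon (address assumed, see the lead-in).
func torProxy() (proxyFunc, error) {
	u, err := url.Parse(fmt.Sprintf("socks5://%d:x@127.0.0.1:9050", rng.Intn(1<<30)))
	if err != nil {
		return nil, err
	}
	return http.ProxyURL(u), nil
}

// NewTorClient creates a client whose Transport starts with a fresh Tor proxy.
func NewTorClient(timeout time.Duration) (*Client, error) {
	c := &Client{
		HTTP:     &http.Client{Transport: &http.Transport{}, Timeout: timeout},
		newProxy: torProxy,
	}
	if err := c.NewIP(); err != nil {
		return nil, err
	}
	return c, nil
}

// NewIP creates another proxy function and reassigns the Transport's Proxy
// field. It assumes no requests are in flight while the field is swapped.
func (c *Client) NewIP() error {
	t, ok := c.HTTP.Transport.(*http.Transport)
	if !ok {
		return errors.New("client transport is not an *http.Transport")
	}
	proxy, err := c.newProxy()
	if err != nil {
		return err
	}
	t.CloseIdleConnections() // release idle connections opened via the old proxy
	t.Proxy = proxy
	return nil
}

// proxyUser reports the SOCKS username currently configured, or "" if none.
func (c *Client) proxyUser() string {
	t, ok := c.HTTP.Transport.(*http.Transport)
	if !ok || t.Proxy == nil {
		return ""
	}
	u, err := t.Proxy(nil)
	if err != nil || u == nil || u.User == nil {
		return ""
	}
	return u.User.Username()
}

func main() {
	c, err := NewTorClient(30 * time.Second)
	if err != nil {
		fmt.Println("could not create client:", err)
		return
	}
	fmt.Println("stream id before rotation:", c.proxyUser())
	if err := c.NewIP(); err != nil {
		fmt.Println("could not rotate:", err)
		return
	}
	fmt.Println("stream id after rotation: ", c.proxyUser())
}
```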
After configuring and instantiating our Tor client, we can check the IP address of the request. We then call NewIP to rotate our client’s IP address and check again, and voilà!
Implementing rotating IP addresses with Tor or SmartProxy is an easy task that can be used to successfully bypass an IP-based rate limit. This can significantly reduce bottlenecks when making requests or web scraping.
The code in this example is a rather naive approach, and there are many ways to improve it. To see the source code of the project, you can visit my GitHub repo.
I hope this will help you to build fun projects!