Optimizing Content Migrations With Edge Compute

The more I learn about edge compute, the more I look for compelling use cases that developers can relate to. Today, I have one: edge redirects.

Background

Imagine that you are running a site with a lot of content. I’m talking hundreds of thousands of articles. One day, you decide it’s time to change the domain name.

By this time, you’ve probably collected a million links to your old domain. In order to prevent these links from breaking, you will need to set up a redirection server to handle traffic going to the old domain, and forward it to the new domain.

If your old server is using NGINX, you can add some configuration that looks like this:

server {
    listen 80;
    listen 443 ssl;
    server_name www.old-name.com;
    return 301 $scheme://www.new-name.com$request_uri;
}

This small snippet takes any request coming to the old URL and redirects it to the new URL. This has worked well for a long time, but what if I told you we could improve it by using edge compute instead?

Latency issue

Imagine that your servers live somewhere in North America. If a user in Asia goes to your old URL, their request has to travel all the way to North America only to be told “Sorry, your princess is in another castle” (a.k.a. the redirect instructions). This redirect response travels back across the Pacific Ocean with the new location. Upon arrival, their browser sends another request all the way to North America for the real content, and finally, the user reaches their original target, which is a great article on HTML forms.

Here is a small diagram I made myself to help visualize it:

[Diagram: The user requests the old URL, the old server responds with instructions to redirect to the new URL, the browser sends the request to the new URL, and the response is finally sent to the user.]

In case I haven’t made it painfully clear yet, one of the big problems with this redirect chain is that the user has to wait for two round trips halfway around the world.

This extra time spent on getting redirect instructions can actually be greatly reduced by using edge functions.

Edge functions are serverless functions that are distributed all over the world. Akamai, for example, has over 250,000 locations. By setting up an edge function to handle the old URL, the initial user request only needs to go to the nearest edge server for its redirect instructions.

Let’s compare this with the same example above, but this time using edge compute.

A user in Asia goes to your old URL, and their request only has to go to the nearest edge server location (very likely in their same city). The princess is still in another castle (a.k.a. the redirect instructions), but at least they are told right away. Once the browser sees the redirect instructions, everything runs as above, except that now the user didn’t have to wait as long to read that great article (seriously, I spent a lot of time on it. Please check it out).

Once again, here’s my wonderful illustration.

[Diagram: The user requests the old URL, the edge server closest to the user responds with redirect instructions, the browser sends the request to the new URL, and the response is finally sent to the user.]

(Do you see how I made the arrows longer and shorter? This is a symbolic representation of time. My dog says I’m smart.) This might save a few hundredths of a second. In the grand scheme, this might not seem like much. But it’s time spent doing nothing at all.

Just waiting.

Not waiting for a server to perform some calculation or update a database. Just waiting for a request to fly through the air. The thing is, for some organizations, even tens of milliseconds can mean the difference between making a sale and losing it, so that’s actually a lot of time to spend doing nothing. So I say, don’t.
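To put rough numbers on it, here is a tiny sketch of the arithmetic. The round-trip times are placeholders I made up for illustration, not measurements; the only point is that the saving equals the difference between the long round trip and the short one.

```typescript
// Placeholder round-trip times in milliseconds (assumptions, not measurements).
const ORIGIN_RTT_MS = 80; // assumed: user <-> far-away origin server
const EDGE_RTT_MS = 20; // assumed: user <-> nearby edge location

// Total time spent waiting on the network: one round trip to fetch the
// redirect instructions, plus one round trip to fetch the real content.
function totalWaitMs(redirectRttMs: number, contentRttMs: number): number {
  return redirectRttMs + contentRttMs;
}

// Redirect served by the origin: two long round trips.
const viaOriginMs = totalWaitMs(ORIGIN_RTT_MS, ORIGIN_RTT_MS);

// Redirect served by the edge: one short round trip, one long one.
const viaEdgeMs = totalWaitMs(EDGE_RTT_MS, ORIGIN_RTT_MS);

// The saving is simply ORIGIN_RTT_MS - EDGE_RTT_MS.
const savedMs = viaOriginMs - viaEdgeMs;
```

With these made-up values, the edge version saves 60 ms per redirected visit. Your real numbers will depend on where your users and servers are.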

Restructuring issue

This next issue is where it gets more interesting. Let’s say, in addition to changing the domain name, you want to change the URL structure of your blog posts. This might happen because the old website used blog post IDs in the URL, but someone told you it’s better to use the blog post’s slug (its title-based address) in the URL.

So your redirects should look like this:

  • From: old-url.com/blog/1
  • To: new-url.com/articles/10-reasons-why-nugget-is-a-good-boy

There are 3 things going on here:

  1. The domain has changed
  2. The path ‘/blog’ has been renamed to ‘/articles’
  3. The permalink for individual posts uses the title-based slug instead of the ID

The first two requirements are easy enough to handle with something like NGINX rewrite rules and regular expressions. The last one is kind of a pain in the ass because there is no way to programmatically determine how an old URL should map to a new URL.
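Here is a rough sketch of why the first two changes are mechanical and the third is not. It uses plain string patterns with the example domains from this post; the function name is my own invention.

```typescript
// Handles the two pattern-friendly changes: the domain swap and the
// '/blog' -> '/articles' rename. The function name is hypothetical.
function rewriteByPattern(oldUrl: string): string {
  return oldUrl
    // 1. Swap the old domain for the new one.
    .replace(/^(https?:\/\/)old-url\.com/, "$1new-url.com")
    // 2. Rename the leading '/blog' path segment to '/articles'.
    .replace(/^(https?:\/\/[^/]+)\/blog(\/|$)/, "$1/articles$2");
}

// Result: "https://new-url.com/articles/1"
const rewritten = rewriteByPattern("https://old-url.com/blog/1");
```

Notice that the ID is still sitting at the end of the result. No pattern can turn `1` into the post’s title, because that information only lives in the database.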

One solution might be to have the server accept requests to the old URL, look up the requested blog post in a database using the blog post ID, build the new URL using the blog post’s slug, and respond with redirect instructions to the generated URL.

This would work, but there are two problems. Database queries will likely add more response time to each request, and the database may end up costing you more money to keep running.

The best solution, in my opinion, is to create a 1-to-1 mapping from every old URL to its new URL. This means that you will need to create one static rewrite rule for each URL on your old domain. When you have hundreds of thousands of posts, that takes a lot of work.

Fortunately, you will likely only need to do this once during the migration, and you can create a script that iterates through each database entry and creates the rewrite rule for you (yay robots).
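A script like that could look roughly like the sketch below. The `PostRow` shape, the function names, and the sample data are all assumptions for illustration; your database export will look different.

```typescript
// Hypothetical shape of one row exported from the old blog's database.
interface PostRow {
  id: number;
  slug: string;
}

// Builds one old-path -> new-URL entry per post.
function buildRedirectMap(posts: PostRow[]): Map<string, string> {
  const redirects = new Map<string, string>();
  for (const post of posts) {
    redirects.set(`/blog/${post.id}`, `https://new-url.com/articles/${post.slug}`);
  }
  return redirects;
}

// Formats a single entry as a static NGINX rewrite rule.
function toNginxRule(oldPath: string, newUrl: string): string {
  return `rewrite ^${oldPath}$ ${newUrl} permanent;`;
}

// Sample data standing in for the real database export.
const samplePosts: PostRow[] = [
  { id: 1, slug: "10-reasons-why-nugget-is-a-good-boy" },
];

const redirectMap = buildRedirectMap(samplePosts);
const nginxRules: string[] = [];
redirectMap.forEach((newUrl, oldPath) => {
  nginxRules.push(toNginxRule(oldPath, newUrl));
});
```

The same map can feed either target: print `nginxRules` into a config file, or upload the key-value pairs somewhere else.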

Unfortunately, this is also the case if you are using edge functions. There is no way around the need to create a 1 to 1 mapping.

To which you might reply:

If we need the URL map anyway, how is this any different from the latency issue?

To which I would reply:

I’m glad you asked.

The key (this is a pun that will make sense soon) difference here is that web servers like NGINX read rules sequentially. This means that if we have 100,000 redirect rules and someone requests the URL matching the last rule, the server must read through all the previous rules before finally reaching the last one and responding with the redirect instructions.

Edge compute has another trick up its sleeve: key-value storage. Most major edge compute players also have a key-value storage offering.

In the above example, the way NGINX handles the 100,000 redirect rules has an algorithmic complexity of O(n), which is a fancy way of saying that it takes longer to get to the last item in the list as the list grows.

On the other hand, most key-value storage services have an algorithmic complexity of O(1), which is a fancy way of saying that no matter how long the list of items is, our lookup time stays the same. Going back to the example, we can store all of our URL mappings in edge key-value storage and use edge functions to dynamically look up the redirect for each request.
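To make the difference concrete, here is a small sketch that contrasts the two lookup styles. It is not how NGINX or any key-value service is actually implemented; an array scan and an in-memory `Map` are just stand-ins for “check every rule in order” and “jump straight to the key”. All names and URLs are made up.

```typescript
type Rule = { from: string; to: string };

// O(n): check the rules one by one until one matches.
function findSequentially(rules: Rule[], path: string): string | undefined {
  for (const rule of rules) {
    if (rule.from === path) return rule.to;
  }
  return undefined;
}

// O(1) on average: a hash lookup goes straight to the entry for the key.
function findByKey(index: Map<string, string>, path: string): string | undefined {
  return index.get(path);
}

// 100,000 made-up redirect rules, stored both ways.
const ruleList: Rule[] = [];
const ruleIndex = new Map<string, string>();
for (let id = 1; id <= 100000; id++) {
  const from = `/blog/${id}`;
  const to = `/articles/post-${id}`;
  ruleList.push({ from, to });
  ruleIndex.set(from, to);
}

// Worst case for the list: the very last rule needs 100,000 comparisons.
const fromList = findSequentially(ruleList, "/blog/100000");
// The keyed lookup does the same amount of work for any key.
const fromIndex = findByKey(ruleIndex, "/blog/100000");
```

Both calls return the same answer. The difference is how much work it takes to get there as the list grows.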

Let’s break it down into steps:

  1. The user submits a request to old-url.com
  2. The request is handled by the edge function closest to the user.
  3. The edge function looks up the redirect URL in the key-value store based on the requested URL.
  4. The edge function returns redirect instructions to the user.
  5. The user’s browser redirects to the new URL.
  6. The origin server for the new URL handles the request.
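Steps 3 and 4 could be sketched like this. To be clear, this is not the API of any real edge platform: the `KeyValueStore` and `RedirectResponse` interfaces, the in-memory store, and the 404 fallback for unknown paths are all assumptions I made to keep the example self-contained.

```typescript
// Hypothetical minimal interfaces; real edge platforms have their own APIs.
interface KeyValueStore {
  get(key: string): Promise<string | undefined>;
}

interface RedirectResponse {
  status: number;
  headers: Record<string, string>;
}

// In-memory stand-in for an edge key-value store, for illustration only.
class InMemoryStore implements KeyValueStore {
  private entries: Map<string, string>;

  constructor(entries: Map<string, string>) {
    this.entries = entries;
  }

  async get(key: string): Promise<string | undefined> {
    return this.entries.get(key);
  }
}

// Step 3: look up the requested path. Step 4: answer with a redirect.
async function handleRequest(
  path: string,
  store: KeyValueStore
): Promise<RedirectResponse> {
  const target = await store.get(path);
  if (target === undefined) {
    // Assumption: unknown paths get a 404; you may prefer another fallback.
    return { status: 404, headers: {} };
  }
  return { status: 301, headers: { Location: target } };
}

// Example store with a single mapping.
const exampleStore = new InMemoryStore(
  new Map([
    ["/blog/1", "https://new-url.com/articles/10-reasons-why-nugget-is-a-good-boy"],
  ])
);
```

Calling `handleRequest("/blog/1", exampleStore)` resolves to a 301 whose `Location` header is the new article URL, and the browser takes it from there (steps 5 and 6).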

Hopefully that makes sense, but the takeaway here is that in addition to the latency benefit described in the previous section, we may also see performance improvements for large-scale redirects where wildcards or regex transformations don’t make sense.

Complexity issue

Before we get too deep into this, I want to admit that I have an inherent bias from being a front-end/JavaScript developer, but I still think this is a compelling point.

In the examples above, I am making comparisons between edge compute and NGINX, which is one of the most widely used server technologies in the world. However (and here’s where my bias comes in), I’m willing to bet that NGINX configuration isn’t the main programming language for you or your team, and it just adds more complexity to your stack.

The added complexity is worth it for most organizations because NGINX is so good at serving static assets, being a reverse proxy, load balancing, and doing all the other things we use it for.

But do we really need that extra complexity for a redirect server?

I have to admit that one of the reasons edge computing is so attractive to me is that most edge platforms support my favorite programming language, JavaScript. This makes it easier for me to stay productive. Although I may have to switch context between frontend, backend and edge runtimes, at least I can keep writing the same language everywhere.

I cannot say that my experience is the same with NGINX.

In addition to setting up NGINX in the first place, any time I need to make configuration changes I need to look up how to do it again in the documentation. Even for things I’ve done hundreds of times before.

Another great benefit of edge compute is that we are dealing with serverless functions that can scale automatically. We, as developers, don’t have to worry about provisioning a server, knowing how many resources it needs, determining the best region to deploy to, and yada yada.

We just have to write the functions and let the service provider make sure that our code runs in the most efficient way. And when traffic ramps up, it scales to handle the load automatically.

This does not mean that I think NGINX is a bad choice for a server. In fact, a few paragraphs ago I listed some excellent use cases for NGINX. But for redirects, I prefer deploying an edge function all over the world and using the programming language I know well to write the logic. Then I can go back to hanging out with Nugget.

Conclusion

This blog post was mostly inspired by some internal conversations that I’ve been part of while working at Akamai. We work with huge organizations that solve some of the most technically challenging problems, and the people I work with are super smart.

Unfortunately, many of the details of these conversations can’t be shared publicly for one reason or another, so I wanted to take a moment to point out some really cool things that I think can be shared. Take the examples above, where we talked about migrating 100,000 blog posts to a new domain. That actually happened. It’s based on a real migration Akamai did for an internal website with about a third more redirects.

I think the coolest thing about this is that it’s an example of Akamai dogfooding their own product. And that’s without even getting into customer work. I saw one case where a customer needed custom redirects for almost a million URLs, and it looks like EdgeWorkers would be the solution.

Anyway, I’m pretty excited about the future of edge compute, and I hope this article has given you at least one solid use case to get excited about, too.

So, in conclusion, the main benefits of edge redirects over traditional redirects are:

  • Spend less time waiting for requests to travel to the redirect server.
  • Spend less time computing on redirect servers that require large URL mappings.
  • Less complexity than provisioning and scaling a redirect server.

Thank you so much for reading. If you liked this article, please share it, and if you want to know when I publish more articles, follow me on Twitter. Cheers!
