Caching Pacman Packages for Your Arch Linux Network With NGINX

This blog post will explain how to set up an NGINX server to act as a reverse caching proxy for pacman. This can benefit anyone who runs more than one Arch Linux machine on the same network and wants to cut down on bandwidth usage over their WAN connection. This is especially useful if you have a slow Internet connection or your ISP subjects you to data caps.

I read Ars Technica’s Steam caching guide and was intrigued by the lower bandwidth requirements and faster downloads it can provide for dozens of videogame aficionados at a LAN. The article describes setting up a reverse caching proxy for Steam using a Docker image that provides a pre-configured NGINX instance along with the bind9 DNS server. Although that Docker image could be useful in this case, it’s much too heavy-handed an approach, so we will use the relevant portions of the steamcache project’s NGINX configuration and work from there.

I was inspired to try out this kind of solution on my own network. At the time of writing, the linux package available in the core repository is version 4.11.5-1, with a download size of 61.32 MiB. If your network has three separate computers running Arch, you would use around 184 MiB of bandwidth to update all of them. If you were using a caching server, NGINX would download and cache that 61.32 MiB file once, and all three of your computers would then pull the update from the cache, consuming only one third of the WAN bandwidth you would otherwise use.
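The savings scale linearly with the number of machines. A quick back-of-the-envelope sketch (the package size and machine count are just the example figures from above):

```shell
# Back-of-the-envelope bandwidth math for the 61.32 MiB linux package.
pkg_size=61.32   # MiB, download size of the linux package
machines=3       # Arch machines on the LAN

# Without a cache, every machine downloads the package over the WAN.
without_cache=$(awk "BEGIN { printf \"%.2f\", $pkg_size * $machines }")
# With a cache, the package crosses the WAN exactly once.
with_cache=$pkg_size

echo "Without cache: ${without_cache} MiB over the WAN"
echo "With cache:    ${with_cache} MiB over the WAN"
```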

The benefit of using NGINX is that we can set it up as a reverse caching proxy quite easily. Instead of locally mirroring every single package available in the Arch repositories, NGINX will only download the packages you update on demand. This will save tons of bandwidth and hard drive space on your server.

Some projects, such as pacserve, already exist for sharing packages on your network. Since I already had a server running NGINX on my own network, adding one more virtual host configuration seemed like an easier task for my own machines.

Installing NGINX

The NGINX package is available from the extra repository. We’re going to install the non-mainline version of NGINX.

$ sudo pacman -S nginx
$ nginx -v
nginx version: nginx/1.12.0

Configuring the Cache

Now we can begin configuring nginx. By default the nginx config resides in /etc/nginx/nginx.conf.

$ sudo vim /etc/nginx/nginx.conf

We’ll leave all the default configuration options intact. You need to add the cache configuration inside the http block.

http {

...

    # Use a custom log format that will show response times and cache status
    log_format archmirror '$remote_addr - $upstream_cache_status [$time_local] $request_method $host$request_uri $server_protocol $status $body_bytes_sent $request_time $upstream_response_time';

    # Configure the cache directory, size and keys
    proxy_cache_path /path/to/cache/archmirror
                     levels=1:2  keys_zone=archmirror:60m
                     inactive=365d use_temp_path=off max_size=5g;

    server {
        listen 80;
        server_name archmirror archmirror.lan;

        access_log /var/log/nginx/archmirror.access.log archmirror;
        error_log /var/log/nginx/archmirror.error.log;

        # Restrict upstream server requests to these TLS versions
        proxy_ssl_protocols     TLSv1 TLSv1.1 TLSv1.2;
        # Use previously negotiated connection parameters
        proxy_ssl_session_reuse on;
        # Enables revalidation of expired cache items using conditional requests with the "If-Modified-Since" and "If-None-Match" header fields.
        proxy_cache_revalidate  on;
        # Only one request at a time will be allowed to populate a new cache element
        proxy_cache_lock        on;
        # Cache any responses for 1 minute by default, can be overridden by more specific response codes
        proxy_cache_valid       any 1m;

        # Keep connections to upstream server open
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_read_timeout     300;
        proxy_connect_timeout  300;

        location / {
            proxy_pass             https://mirrors.kernel.org;
            proxy_cache            archmirror; # This directive should match the keys_zone option
            proxy_cache_valid      200 5m;
            proxy_cache_use_stale  error timeout invalid_header updating http_500 http_502 http_503 http_504;

            # Add some cache status headers for debugging purposes, you can remove these lines if you want
            add_header X-Upstream-Status $upstream_status;
            add_header X-Cache-Status $upstream_cache_status;
        }
    }
}

Save the file and exit vim or your editor of choice.

Note

I chose the kernel.org mirror since their servers are located physically close to me and they provide HTTPS. You may want to choose a different mirror that will work better for you.

Although it would be cleaner to set up the caching configuration using a separate vhost file, that setup falls outside the scope of this article. For information on how to set up vhosts with NGINX, see the documentation for the include directive.
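For the curious, here is a minimal sketch of what that split might look like. The directory and file names below are illustrative choices, not something Arch’s nginx package ships by default: the http block gains an include line, and the server block from above moves into its own file.

```nginx
# /etc/nginx/nginx.conf -- inside the http block (sketch)
http {
    # ... existing defaults, log_format and proxy_cache_path stay here,
    # since those directives must live directly inside the http block ...

    # Pull in per-site server blocks from a conventional directory
    include /etc/nginx/sites-enabled/*.conf;
}

# /etc/nginx/sites-enabled/archmirror.conf (sketch)
# The entire "server { ... }" block from above moves here unchanged.
```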

Cache Validity

The configuration we are using does several things.

  • Uses HTTPS to connect to the Arch mirror at mirrors.kernel.org.
  • Keeps cached files stored for up to 365 days.
  • Keeps a maximum of five gigabytes’ worth of cached files.
  • Caches any response, including error responses, for one minute: proxy_cache_valid any 1m;.
  • Caches any valid 200 responses for five minutes: proxy_cache_valid 200 5m;.

The proxy_cache_valid directives are slightly misleading. It may seem that HTTP 200 responses are only valid for five minutes and are then discarded, but we also enabled proxy_cache_revalidate. NGINX will serve HTTP 200 responses from the local cache for five minutes after a file is first downloaded, without performing any new requests to the upstream server during that time. After five minutes have elapsed, any new request for that resource will cause NGINX to set the “If-Modified-Since” or “If-None-Match” headers as needed when requesting from the upstream server. If the file has not been modified since it was locally cached, the upstream server will respond with a 304 Not Modified status, signifying that the locally cached file is still valid.

If we use our 61.32 MiB linux package example from before, as long as the file in the cache remains valid, NGINX will use less than one kilobyte (the HTTP request and response headers) of bandwidth to re-check the validity of the locally stored file. Even if your LAN is only 100 megabits, your Arch machine just received its linux package update in five seconds and you’ve only used one kilobyte of bandwidth over your WAN connection. Pretty sweet!

Testing the Configuration

We can test the configuration syntax using nginx -t. If the test is successful, we will enable and start the nginx service.

$ sudo nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
$ sudo systemctl enable --now nginx
Created symlink /etc/systemd/system/multi-user.target.wants/nginx.service → /usr/lib/systemd/system/nginx.service.

Updating Your DNS

Now you can update your DNS server configuration to point the archmirror.lan domain to the IP address of the server running our NGINX cache. Since I am not sure how your DNS is set up on your own network, we will test our changes directly on the server running the cache by editing /etc/hosts, but you should not treat that edit as a permanent solution.

$ sudo vim /etc/hosts

Now we add new entries for our archmirror.lan domain at the bottom of the file and save it.

#
# /etc/hosts: static lookup table for host names
#

#<ip-address>   <hostname.domain.org>   <hostname>
127.0.0.1 localhost.localdomain  localhost
::1       localhost.localdomain  localhost

...

127.0.0.1 archmirror.lan archmirror
::1       archmirror.lan archmirror

Now we make sure our machine can resolve the domain name using ping.

$ ping archmirror.lan
PING archmirror.lan(localhost.localdomain (::1)) 56 data bytes
64 bytes from localhost.localdomain (::1): icmp_seq=1 ttl=64 time=0.053 ms
^C
--- archmirror.lan ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.053/0.053/0.053/0.000 ms

Updating Your Pacman Configuration

First we have to find out the URL format for the mirror we wish to use. As stated before, I have chosen mirrors.kernel.org since their servers are physically close to me and they provide HTTPS for their mirror. Do note, however, that if the mirror you wish to use as your upstream server changes their URL format, you will need to update your pacman configuration accordingly.

We’ll use grep to show us the relevant URLs for our mirror of choice.

$ grep https://mirrors.kernel.org /etc/pacman.d/mirrorlist
#Server = https://mirrors.kernel.org/archlinux/$repo/os/$arch

Now we take that URL and replace the domain name with archmirror.lan in our mirror configuration file.

$ sudo vim /etc/pacman.d/archmirror

The /etc/pacman.d/archmirror file should contain the single line below

Server = http://archmirror.lan/archlinux/$repo/os/$arch
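If you prefer not to edit by hand, the transformation from the commented-out mirrorlist entry to the cache entry can be sketched as a one-liner (this assumes the kernel.org URL format shown above): strip the leading #, then swap the scheme and host.

```shell
# Rewrite the commented-out kernel.org entry to point at our cache.
# Single quotes keep the shell from expanding $repo and $arch.
echo '#Server = https://mirrors.kernel.org/archlinux/$repo/os/$arch' \
  | sed -e 's/^#//' -e 's|https://mirrors.kernel.org|http://archmirror.lan|'
```

You could pipe the output of the earlier grep command through the same sed filter instead of echoing a literal line.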

Save the file, and we’ll edit the main pacman configuration file.

$ sudo vim /etc/pacman.conf

We’re looking to include our archmirror configuration lines above the existing lines that include the main mirrorlist. Add the include to every repository you’d like to use your NGINX cache for.

...

[core]
Include = /etc/pacman.d/archmirror
Include = /etc/pacman.d/mirrorlist

[extra]
Include = /etc/pacman.d/archmirror
Include = /etc/pacman.d/mirrorlist

[community]
Include = /etc/pacman.d/archmirror
Include = /etc/pacman.d/mirrorlist

...

Save the pacman configuration. It’s time to test our work.

Testing the Cache

Let’s watch the NGINX log to see our cache being used.

$ tail -f /var/log/nginx/archmirror.access.log

Now we run a pacman update.

$ sudo pacman -Syy
:: Synchronizing package databases...
 core                               124.3 KiB   464K/s 00:00 [################################] 100%
 extra                             1668.0 KiB   401K/s 00:04 [################################] 100%
 community                            3.9 MiB   399K/s 00:10 [################################] 100%
 multilib                           176.5 KiB   426K/s 00:00 [################################] 100%
127.0.0.1 - MISS [16/Jun/2017:19:09:09 -0700] GET archmirror.lan/archlinux/core/os/x86_64/core.db HTTP/1.1 200 127249 0.423 0.423
127.0.0.1 - MISS [16/Jun/2017:19:09:09 -0700] GET archmirror.lan/archlinux/core/os/x86_64/core.db.sig HTTP/1.1 404 311 0.104 0.104
127.0.0.1 - MISS [16/Jun/2017:19:09:13 -0700] GET archmirror.lan/archlinux/extra/os/x86_64/extra.db HTTP/1.1 200 1708020 4.267 4.267
127.0.0.1 - MISS [16/Jun/2017:19:09:13 -0700] GET archmirror.lan/archlinux/extra/os/x86_64/extra.db.sig HTTP/1.1 404 311 0.100 0.100
127.0.0.1 - MISS [16/Jun/2017:19:09:23 -0700] GET archmirror.lan/archlinux/community/os/x86_64/community.db HTTP/1.1 200 4057104 10.049 10.049
127.0.0.1 - MISS [16/Jun/2017:19:09:23 -0700] GET archmirror.lan/archlinux/community/os/x86_64/community.db.sig HTTP/1.1 404 311 0.094 0.094
127.0.0.1 - MISS [16/Jun/2017:19:09:24 -0700] GET archmirror.lan/archlinux/multilib/os/x86_64/multilib.db HTTP/1.1 200 180770 0.531 0.531
127.0.0.1 - MISS [16/Jun/2017:19:09:24 -0700] GET archmirror.lan/archlinux/multilib/os/x86_64/multilib.db.sig HTTP/1.1 404 311 0.110 0.110

As you can see, the download speed is relatively low since the files are being downloaded from the upstream mirror into our local NGINX cache. The third column of the log shows that none of the files were in the cache (a cache MISS), along with the HTTP response codes and upstream server response times. This particular mirror doesn’t provide any signature files, hence the 404 errors for all the .sig requests.

Now we’ll force pacman to update its database files again.

$ sudo pacman -Syy
:: Synchronizing package databases...
 core                               124.3 KiB  0.00B/s 00:00 [################################] 100%
 extra                             1668.0 KiB   271M/s 00:00 [################################] 100%
 community                            3.9 MiB   276M/s 00:00 [################################] 100%
 multilib                           176.5 KiB  0.00B/s 00:00 [################################] 100%
127.0.0.1 - HIT [16/Jun/2017:19:09:50 -0700] GET archmirror.lan/archlinux/core/os/x86_64/core.db HTTP/1.1 200 127249 0.000 -
127.0.0.1 - MISS [16/Jun/2017:19:09:50 -0700] GET archmirror.lan/archlinux/core/os/x86_64/core.db.sig HTTP/1.1 404 311 0.097 0.097
127.0.0.1 - HIT [16/Jun/2017:19:09:50 -0700] GET archmirror.lan/archlinux/extra/os/x86_64/extra.db HTTP/1.1 200 1708020 0.000 -
127.0.0.1 - MISS [16/Jun/2017:19:09:50 -0700] GET archmirror.lan/archlinux/extra/os/x86_64/extra.db.sig HTTP/1.1 404 311 0.098 0.098
127.0.0.1 - HIT [16/Jun/2017:19:09:50 -0700] GET archmirror.lan/archlinux/community/os/x86_64/community.db HTTP/1.1 200 4057104 0.000 -
127.0.0.1 - MISS [16/Jun/2017:19:09:50 -0700] GET archmirror.lan/archlinux/community/os/x86_64/community.db.sig HTTP/1.1 404 311 0.094 0.094
127.0.0.1 - HIT [16/Jun/2017:19:09:50 -0700] GET archmirror.lan/archlinux/multilib/os/x86_64/multilib.db HTTP/1.1 200 180770 0.000 -
127.0.0.1 - MISS [16/Jun/2017:19:09:50 -0700] GET archmirror.lan/archlinux/multilib/os/x86_64/multilib.db.sig HTTP/1.1 404 311 0.099 0.099

This time the download speeds are much faster (~270M/s compared to ~400K/s before) since the files were read from the hard drive of the machine performing the update. Looking at the logs, we can see we got a cache HIT for every request that wasn’t for a .sig file.
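If you want a quick summary of how well the cache is doing over time, the third column of our custom log format can be tallied with awk. The sketch below uses a couple of sample lines standing in for the real log; on your server you would point it at /var/log/nginx/archmirror.access.log instead.

```shell
# Tally cache HITs and MISSes: the cache status is the third field
# of the custom "archmirror" log format defined earlier.
printf '%s\n' \
  '127.0.0.1 - HIT [16/Jun/2017:19:09:50 -0700] GET archmirror.lan/archlinux/core/os/x86_64/core.db HTTP/1.1 200 127249 0.000 -' \
  '127.0.0.1 - MISS [16/Jun/2017:19:09:50 -0700] GET archmirror.lan/archlinux/core/os/x86_64/core.db.sig HTTP/1.1 404 311 0.097 0.097' \
  '127.0.0.1 - HIT [16/Jun/2017:19:09:50 -0700] GET archmirror.lan/archlinux/extra/os/x86_64/extra.db HTTP/1.1 200 1708020 0.000 -' \
  | awk '{ count[$3]++ } END { for (status in count) print status, count[status] }' \
  | sort
```

Replacing the printf with `cat /var/log/nginx/archmirror.access.log` gives the hit/miss totals for the whole log.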

All that’s left to do now is to update the rest of your Arch machines’ pacman configurations to use the mirror and you’re done!

Final Thoughts

I love the flexibility of pacman. Being able to choose your mirrors like this is fast and dead simple. NGINX is easy to set up and will run reliably, requiring little maintenance. On my own network I’ve been running the cache on a Raspberry Pi for months without any issues. I hope this helps you speed up pacman updates on your own network.