So maybe you’ve followed our post on how to compile HAProxy, or maybe you even read the one on how to configure internal company services to use SSL and now want to put Apache Archiva behind HAProxy with SSL termination.

Installing Apache Archiva is mostly a matter of downloading the binary distribution and spinning it up. There is a caveat, though: up until version 2.2.8 it only runs on Java 8. If you try to start it with Java 11 or later, it will output:

archiva BeanDefinitionStoreException: Unexpected exception parsing XML document from URL

So be sure to edit conf/wrapper.conf and point it at the correct Java version:

# Wrapper Properties
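A minimal sketch of that change, using the Tanuki wrapper's wrapper.java.command property. The Java path below is an assumption; point it at wherever your Java 8 installation actually lives:

```
# wrapper.java.command tells the service wrapper which JVM to launch;
# the path below is an example, adjust it to your Java 8 install
wrapper.java.command=/usr/lib/jvm/java-8-openjdk-amd64/bin/java
```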

As soon as you place Archiva behind an SSL-terminating proxy, you’ll get errors like these from Jetty (the web server powering Archiva):

HTTP Header check failed. Assuming CSRF attack.
Origin Header does not match: originUrl=, targetUrl= Matches: Host=true, Port=true, Protocol=false

This is because Jetty is protecting you from Cross-Site Request Forgery (CSRF) attacks by validating the Origin header. You can simply edit conf/archiva.xml and look for the rest.baseUrl field.

You'll also want to add your scheme and domain to the application and applicationUrl sections.

In short: edit archiva.xml and look for the rest, application and ui sections.
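For reference, the relevant part of archiva.xml ends up looking roughly like this. The domain is a placeholder, and the exact element nesting may differ between Archiva versions, so match it against your own file rather than pasting verbatim:

```xml
<webapp>
  <rest>
    <!-- external URL Archiva should expect in the Origin header -->
    <baseUrl>https://archiva.example.com</baseUrl>
  </rest>
  <ui>
    <!-- external URL used by the web UI -->
    <applicationUrl>https://archiva.example.com</applicationUrl>
  </ui>
</webapp>
```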

And that's all there is to it!

Note that trying to download artifacts via the GUI will still point you to a non-HTTPS page. That's a reported UI bug which will not interfere with Maven, etc.

Edit 2022-09-20
All the information below is here for reference only.
Eventually I'll migrate it to an HAProxy post.

You could probably alter something on the Jetty side to make it accept that you’re accessing the service through an HTTPS proxy, but we’re a little more familiar with HAProxy, so we’re just going to rewrite the requests there.

The configuration below will probably not differ much from what you have:

global
    tune.h2.max-concurrent-streams 5

frontend archiva
    bind :8888 ssl crt /path-to-crt/cert.pem alpn h2,http/1.1
    default_backend archiva_backend

backend archiva_backend
    option forwardfor
    http-request set-header Origin http://%[hdr(host)]/
    # example address; point server1 at your actual Archiva host:port
    server server1 127.0.0.1:8080 check inter 2s rise 2 fall 2 maxconn 5

But there are a couple of very important details!

option forwardfor makes HAProxy send the X-Forwarded-For header to your web server. For a lot of services this is all you need to make them work behind a proxy, so it’s good practice to include.

http-request set-header Origin http://%[hdr(host)]/ is where the magic happens. We rewrite the Origin header, replacing https with http while keeping the same host, which makes the Origin header exactly what Jetty/Apache Archiva expects.
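In plain code terms, the rewrite HAProxy performs is equivalent to this small sketch (the hostname is an arbitrary example):

```python
def rewrite_origin(host: str) -> str:
    """Mimics `http-request set-header Origin http://%[hdr(host)]/`:
    build an http:// Origin from the incoming Host header."""
    return f"http://{host}/"

# An HTTPS request for archiva.example.com reaches the backend with:
print(rewrite_origin("archiva.example.com"))  # http://archiva.example.com/
```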

That's all! You’ll have a working Apache Archiva behind HAProxy with SSL termination.

What about performance?

Notice that we’re using a very low value for maxconn? You can profile your setup for the best throughput, but as a rule of thumb it’s ideal to have the backend servers respond very fast to very few queries. Imagine your setup as an ice-cream shop: the Archiva server is the guy piling scoops. If you overload him with requests he won’t respond any faster; the incoming orders will just take a toll on his performance and, overall, everything will run slower.

This is more or less what happens with servers. Your server can respond to far fewer requests than HAProxy can handle, so it’s more efficient to let HAProxy deal with the pile and have the servers work at maximum speed.
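If you do cap maxconn like this, requests over the limit wait in HAProxy's queue rather than hitting the server, and it's worth bounding how long they may wait. A hedged sketch using HAProxy's timeout queue directive (the 30s value is an arbitrary starting point, and the server address is a placeholder):

```
backend archiva_backend
    # requests beyond maxconn sit in HAProxy's queue instead of
    # overloading the server; don't let them wait forever
    timeout queue 30s
    server server1 127.0.0.1:8080 check inter 2s rise 2 fall 2 maxconn 5
```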

The folks at Lucidchart wrote a post on why turning on HTTP/2 was a mistake for them. It’s a good read and one you should be aware of. You see, even if you limit the number of connections to the backend server(s), HTTP/2 multiplexes many requests over each connection, so they pile up very fast.

At Lucidchart, performance plummeted: the backend servers were overwhelmed, probably started swapping, and things just spiralled out of control. Mind you, they were not using a load balancer with the amount of precision HAProxy gives you.

tune.h2.max-concurrent-streams 5, simple as that. Now every server will take at most 5 connections (maxconn 5), and each of those connections can carry up to 5 parallel streams, which makes for at most 5*5=25 simultaneous requests per server.

Maybe the ideal number for you is even lower. Don’t be shy about tuning parallelism down; what matters is throughput.