I've switched blog engines and now every post has 100 million valid URLs

2025-03-09

This article is part of a series: le blog. Other articles in the series include:

A new blog theme!
Publish a Zola blog with Gitlab CI, real fast
I've switched blog engines and now every post has 100 million valid URLs (👈 you are here)

Another very self-centered update post about me merging my personal and work blogs, and some fun facts about the changes in URLs this caused.

First, a nice trick: tracking Atom feed subscribers server-side, with Nginx

Did you know that using the Nginx reverse proxy, it’s quite easy to have a rough idea of the number of subscribers to an RSS (or an Atom) feed?

Find the logs, usually in /var/log/nginx, then look for the feed URL in the access log of the given website. For me it’s atom.xml, so I can run this line to get the number of regular subscribers:

sudo awk '/atom/ { print $1 }' bouvier.cc.log | sort -u | wc -l

This can be decomposed as such:

sudo because reading files in this directory requires root access on my server.
the awk bit of the command will find all the lines that include “atom” in the bouvier.cc.log file, and select the first field of each, i.e. the first run of characters before a space (print $1). In this case, that's the IP address of the visitor.
sort -u will sort and group the IP addresses (i.e. removes duplicates).
wc -l will count the number of lines, that is, the number of unique IP addresses.
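One caveat with matching “atom” anywhere in the line is that it also catches referrers or user agents that happen to contain that word. Here is a minimal sketch of a slightly stricter variant, assuming the log uses Nginx's default "combined" format, where the request path is the 7th space-separated field. The sample log, its IP addresses and the count_feed_subscribers helper are all made up for illustration.

```shell
# Hypothetical sample log in the default "combined" format (made-up IPs).
log_file=$(mktemp)
cat > "$log_file" <<'EOF'
203.0.113.5 - - [09/Mar/2025:10:00:00 +0000] "GET /atom.xml HTTP/1.1" 200 5120 "-" "FeedReader/1.0"
203.0.113.5 - - [09/Mar/2025:11:00:00 +0000] "GET /atom.xml HTTP/1.1" 304 0 "-" "FeedReader/1.0"
198.51.100.7 - - [09/Mar/2025:10:05:00 +0000] "GET /atom.xml HTTP/1.1" 200 5120 "-" "OtherReader/2.3"
192.0.2.9 - - [09/Mar/2025:10:06:00 +0000] "GET /index.html HTTP/1.1" 200 900 "https://example.org/atomic-habits" "Mozilla/5.0"
EOF

# Count unique client IPs whose request path (7th field) is exactly the feed URL.
# $1: path to the access log, $2: feed path, e.g. /atom.xml
count_feed_subscribers() {
    awk -v feed="$2" '$7 == feed { print $1 }' "$1" | sort -u | wc -l | tr -d ' '
}

count_feed_subscribers "$log_file" /atom.xml   # prints 2
```

On this sample, the looser /atom/ pattern would count 3 addresses, because the last line's referrer contains “atomic”.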

This makes the assumption that one IP address = one visitor, but this can be wrong in multiple ways:

any local network using NAT will appear as a single IP address, while there can be many users in that network: for instance, on a university network, etc.
feed aggregator Web services may access the blog from a single IP, and re-dispatch a blog’s feed to many users.

Ok cool, so the good news is that it’s an underestimate, at worst 😎

Merging notes.bouvier.cc with this blog because ain’t nobody got time for that

So, to the 35 regular readers of notes.bouvier.cc (my former, rather personal, French blog) and to the 48 regular readers of this blog bouvier.cc ^[1], I’ve got great news: I’m merging my two blogs back into one, because I don’t see the point of having two blogs anymore, and the overhead of maintaining two workflows/blogs/themes isn’t worth it.

Posting anything personal on the Internet makes it public forever and more, thanks to search engines, the Wayback Machine, and LLMs trained on that content. As such, there’s not really a point in half-hiding “personal notes”, as long as they’re tied to my name, more or less closely. Better anonymity would only come from using the Tor browser, a pseudonym, and a hosted blog platform: things I don’t consider worth it at this point, and that I’m not interested in doing anyway.

To be honest, this may be slightly disruptive only to people who followed this blog via the Atom feed, as they’ll get all the notes in French, now. I’ve made use of tags and sections to try to keep the content tidy, so there are actually new feeds:

This Atom feed, for notes: this is equivalent to the Atom feed for notes.bouvier.cc.
This Atom feed, for tech posts: this is equivalent to the former Atom feed for bouvier.cc.
Now this website’s feed (that you can get by clicking “atom feed”) includes both kinds of posts.

So how did I get there?

Switching domain name because now I’m a grownup

First of all, the 3 people who have really been following this blog for a long time may have noticed that I’ve moved from the domain name benj.me to this new one, bouvier.cc. The main reason was that benj.me was from a previous era where I found it cool to have a .me website, as it was supposed to be the pinnacle of personal sites.

I’ve also hosted email addresses on that domain. But over the years, I’ve started to feel a tiny bit silly/unhinged when I read out benj.me to my banker or in some other formal settings. To me, benj sounds rather informal, as a shorthand for Benjamin, and there are some boundaries that are sane to set up between me and my bank account manager.

But more importantly, my user experience of spelling address@benj.me out loud was terrible, especially on the phone:

benj.me
yes, like the first 4 letters of Benjamin, B, E, N, J.
no, not G, but J
no I, by the way
no, not M, but N, like Nadine
dot me, not mi ^[2]

And later I would not receive the email, because they sent it to benji.me, or benjamin.me, or benj.ne. Uhhh.

I’ve had enough of that, and wanted a new domain name that sounded serious and would say “Now you shall call me Sir, or else! I have a tie and I won’t hesitate to use it (conditions apply)”, and at the same time would be unambiguous to spell out loud, both in French and in English. bouvier.cc was the perfect match for these two conditions: it includes my name (which is quite common in France, so it still doesn’t point at me too precisely ^[3]), and dot cc may be a bit surprising, but at least people don’t ask me to repeat it multiple times. Mission accomplished 💪

Migrating from Pelican 🐍 to Zola 🦀 because now I’m a zealot

On benj.me, where my previous blog was hosted, I was using the blog engine Pelican, originally authored by a friend and former colleague, Alexis Metaireau. Not only was it doing its job perfectly, but its name is also an anagram of calepin, French for a small notebook, and I find a good pun is always the cherry on top of software that does a great job at what it’s supposed to be doing. Over the years, I’ve started to become an irritating Rust 🦀 Evangelism 🙏 Task 🚀 Force 😎 member though, so I’ve moved over to Zola, which brought a few other benefits. More about this in a previous blog post.

That’s where the first backwards compatibility issue showed up. I had my Pelican-deployed blog at blog.benj.me, and I really wanted the URLs to still be valid, because URLs are one of the most important things to preserve on the Internet: they tend to be visited quite a long time after they’ve been created, and people create links to them, so they’d better stay valid to not break all existing links. While some people may spend a bit of time searching if they’re really interested in a post, most people would likely just close the tab if they ran into a 404 “page not found” error. So, easy, right? I just have to use an Nginx redirect rule to redirect all my posts over to the new domain, right? 👧😁

😏😏😏

RIGHT? 👧🥴

server {
	# (other non interesting details removed from this excerpt)
	
	server_name blog.benj.me;
	
	return 301 $scheme://bouvier.cc$request_uri;
}

Now, the thing is that my Pelican setup included the post’s date in the URL. For instance, a post on my previous blog was hosted at https://blog.benj.me/2021/02/17/cranelift-codegen-primer. Zola does not include the date in the URL. Or more precisely, maybe it can, but I decided I wanted to drop the dates from URLs, to make my posts look more mysterious, and my URLs look shorter and cooler. So the same post URL, generated with Zola, would be https://bouvier.cc/cranelift-codegen-primer, after switching the domain name and removing the date information. How do I make sure that all my links remain backwards-compatible, then?

One solution could have been to manually create directories in my Zola source, that reflect the years and months and days structures, for each blog post I had written in the previous blog. Like, for the above example, create a directory called 2021, in which I create another directory 02, in which I create another directory 17, and then I’d put the cranelift-codegen-primer markdown source in there. But that’s a lot of work, for each blog post, and as I said I don’t really care about having the dates in the URLs (they’re still useful to have in the content, to make sure something isn’t wildly outdated, though).

So I looked into a more general and contrived solution, using a more imaginative rewrite rule that would match any path starting with 4 digits (the year), then 2 digits (the month), then 2 digits (the day), and map all these URLs to the Zola one that doesn’t have the numbers. That is, remove the /2021/02/17 part of the URL in the above example. With Nginx, it would look like this:

server {
	server_name bouvier.cc;

	location /rss {
		# Maintain compatibility with Pelican's feed URL.
		try_files atom.xml $uri $uri/;
		default_type "application/rss+xml";
	}
	
	location / {
		# Maintain compatibility with Pelican URLs.
		rewrite "^/[0-9]{4}/[0-9]{2}/[0-9]{2}/(.*)$" /$1 last;
		try_files $uri $uri/ /index.html;
	}
}
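To get a feel for what the rewrite pattern accepts without touching the server, here is a small sketch that applies the same regular expression with sed. The strip_date_prefix helper is a made-up name for illustration, and sed's extended regex syntax is only assumed to behave like Nginx's for a pattern this simple.

```shell
# Mimic the Nginx rewrite: drop a leading /NNNN/NN/NN/ date-like prefix, if any.
# $1: the request path, e.g. /2021/02/17/cranelift-codegen-primer
strip_date_prefix() {
    printf '%s\n' "$1" | sed -E 's#^/[0-9]{4}/[0-9]{2}/[0-9]{2}/(.*)$#/\1#'
}

strip_date_prefix /2021/02/17/cranelift-codegen-primer   # /cranelift-codegen-primer
strip_date_prefix /cranelift-codegen-primer              # unchanged, no prefix to drop
strip_date_prefix /2021/2/17/cranelift-codegen-primer    # unchanged, the month has a single digit
```

Note that the pattern only checks the number of digits, not whether they form a real date.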

Now for the fun part. Not only will https://blog.benj.me/2021/02/17/cranelift-codegen-primer end up at https://bouvier.cc/cranelift-codegen-primer, but so will https://blog.benj.me/2025/03/09/cranelift-codegen-primer, or even https://blog.benj.me/1337/42/42/cranelift-codegen-primer. In fact, since each digit can take any value from 0 to 9, and there are 8 such digits, that means we have 10**8 = 100,000,000 = 100 million valid URLs for each post.

Yes, you read that right. A one, then 8 zeroes behind it.

And since this rewrite rule doesn’t only apply to old posts, but to new ones as well, this is true of this post as well. Each post on this blog has one hundred million valid URLs, and one extra: the one that doesn’t have any date prefix.
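For the skeptics, the count can be double-checked with a few lines of shell. This is just the arithmetic from above spelled out; count_prefixed_urls is a made-up helper name.

```shell
# Count the date-like prefixes accepted by the rewrite rule: 8 digit slots,
# each with 10 possible values, so 10 multiplied by itself 8 times.
count_prefixed_urls() {
    total=1
    slot=0
    while [ "$slot" -lt 8 ]; do
        total=$((total * 10))
        slot=$((slot + 1))
    done
    echo "$total"
}

prefixed=$(count_prefixed_urls)
echo "$prefixed"          # 100000000
echo $((prefixed + 1))    # 100000001, counting the URL without any date prefix
```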

You may not like it, but this is what peak URLing looks like. Growth mindset at its maximum. I got 99,999,999 problems, but keeping URLs alive ain’t one.

Appendix: maintaining URL backwards compatibility for `notes.bouvier.cc`

All the posts that used to be at notes.bouvier.cc/XYZ now live at bouvier.cc/notes/XYZ. Compared to the above abomination, this was a rather simple Nginx redirect rule, with an exception for the Atom feed:

server {
	server_name notes.bouvier.cc;

	# Redirect feed users (with a 301).
	# The dot is escaped so that it only matches a literal ".".
	rewrite ^/atom\.xml$ https://bouvier.cc/tags/notes/atom.xml permanent;

	# Redirect non feed users.
	return 301 $scheme://bouvier.cc/notes$request_uri;
}
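As a sanity check, the mapping those two rules implement can be sketched as a tiny shell function. This is only an illustration of the intended behavior, not something that runs on the server: notes_redirect_target is a made-up name, and the sketch assumes requests come in over https, whereas the real rule reuses whatever $scheme the request had.

```shell
# Mimic the two rules: the feed gets its own target, everything else moves under /notes.
# $1: the request path on notes.bouvier.cc, e.g. /XYZ
notes_redirect_target() {
    case "$1" in
        /atom.xml) echo "https://bouvier.cc/tags/notes/atom.xml" ;;
        *)         echo "https://bouvier.cc/notes$1" ;;
    esac
}

notes_redirect_target /atom.xml   # https://bouvier.cc/tags/notes/atom.xml
notes_redirect_target /XYZ        # https://bouvier.cc/notes/XYZ
```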

[1] Probably the 35 are part of the 48 🥲
[2] In French, me pronounced the English way and mi may sound pretty close.
[3] In the city where I come from, there are at least 15 or 20 other Benjamin Bouviers, so I’m totally incognito there.
