Botzilla, a multi-purpose Matrix bot tuned for Mozilla

2020-11-12

In this post I reflect on my personal history of writing chat bots, and then present a panel of features that the bot has, some user-facing ones, some others that embody what I esteem to be a sane, well-behaved Matrix bot.

Over the last year, Mozilla has decided to shut down the IRC network and replace it with a more modern platform. To my greatest delight, the Matrix ecosystem has been selected among all the possible replacements. For those who might not know Matrix, it’s a modern, decentralized protocol, using plain HTTP JSON-formatted endpoints, well-documented, and it implements both features that are common in recent messaging systems (e.g. file attachments, message edits and deletions), as well as those needed to handle large groups (e.g. moderation tools, private rooms, invite-only rooms).

but first, some history

Back in 2014 when I was an intern at Mozilla, I made a silly IRC JavaScript bot that would quote the @horsejs twitter account, when asked to do so. Then a few other useless features were added: “karma” tracking ^[1], being a karma guardian angel (lowering the karma of people lowering the karma of some predefined people), keeping track of contextless quotes from misc people…

Over time, it slowly transformed into an IRC bot framework, with modules you could attach and configure at startup, setting which rooms the bot would join, what should be the cooldowns for message sending (more on this later), and so much more! Hence it was renamed meta-bot.

an aside on the morality of bots

I find making bots a fun activity, since once you’ve passed the step of connecting and sending messages, the rest is mostly easy (cough cough regular expressions cough cough) and creative work. And it’s unfortunately easy to be reckless too.

At that time, I never considered the potentially bad effects of quoting text from a random source, viz. fetching tweets from the @horsejs account. If the source returned a message that was inconsiderate, rude, or even worse, aggressive, then the bot would replicate this behavior. It is a real issue because although the bot doesn’t think by itself and doesn’t mean any harm, its programmers can do better, and they should try to avoid these issues at all costs. A chat bot replicates the culture of the engineers who made it on one hand, but also contributes to propagating this culture in the chat rooms it participates in, normalizing it to the chat participants.

My bot happened to be well-behaved most of the time… until the one time it was not. After noticing the incident and expressing my deepest apologies, I deactivated the module and went through the whole list of modules, to make sure none could cause any harm, in any possible way. I should have known better in the first place! I am really not trying to signal my own virtue, since I failed in a way that should have been predictable. I hope by writing this that other people may reflect on the actions of their bots as well, in case they could be misbehaving like this.

the former fleet of mozilla bots

There were a few other useful IRC bots (of which I wasn’t the author) hanging out in the Mozilla IRC rooms, notably Firebot and mrgiggles. The latter probably started as a joke too, to enumerate puns from a list in the JavaScript channel. Then it outgrew its responsibilities by helping with a handful of requests: who can review this or that file in Mozilla’s source code? what’s the status of the continuous integration trees? can this particular C++ function used in Gecko cause a garbage collection?

When we moved over to Matrix, the bots unfortunately became outdated, since the communication protocol (IRC) they were using was different. We could have ported them to the Matrix protocol, but the Not-Invented-Here syndrome was strong with this one: I’ve been making bots for a while, and I was personally interested in the Matrix protocol and trying out the JS facilities offered by the Matrix ecosystem.

Botzilla features

So I’ve decided to write Botzilla, a successor in spirit to meta-bot and mrgiggles, written in TypeScript. This is a very unofficial bot, tailored for Mozilla’s needs but probably useful in other contexts. I’ve worked on it informally as a side-project, in my copious spare time. Crafting tools that prove useful to other people has been sufficient a reward to motivate me to work on it, so it’s been quite fun!

Botzilla’s logo, courtesy of Nical

Let’s take a look at all the features that the bot offers, at this point.

uuid: Generate unique IDs

This was a feature of Firebot, and easy enough to replicate, so it became the first module, designed to test the framework. When someone says !uuid, the bot generates a unique ID (using uuid v4), guaranteed GMO-free and usable in any context that would require it.

Demo of uuid

treestatus: Inform about CI tree status

Mozilla developers tend to interact a lot with the continuous integration trees, because code is sometimes landed, sometimes backed out (sorry/thank you sheriffs!), sometimes merged across branches. This leads to the integration trees being closed. Before we had the feature to automatically land patch stacks when the trees reopened, it was useful to be able to get the open/close status of a tree. Asking !treestatus will answer with a list of the status of some common trees. It is also possible to request the status of a particular tree, e.g. for the “mozilla-central” tree, by asking !treestatus mozilla-central (or just central, as a handy shortcut).
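To make the command shape concrete, here is a minimal sketch of how the !treestatus trigger and its shortcut could be parsed. It is not Botzilla’s actual code: the names `TREE_ALIASES`, `TreestatusRequest` and `parseTreestatus` are made up for illustration, the alias table only contains the one shortcut mentioned above, and the actual query to the tree status service is left out.

```typescript
// Hypothetical alias table; only the "central" shortcut comes from the post.
const TREE_ALIASES = new Map<string, string>([["central", "mozilla-central"]]);

// Either "give me the common trees" or "give me this one tree".
type TreestatusRequest = { kind: "all" } | { kind: "one"; tree: string };

// Returns undefined when the message is not a !treestatus command.
function parseTreestatus(message: string): TreestatusRequest | undefined {
  const match = /^!treestatus(?:\s+(\S+))?$/.exec(message.trim());
  if (match === null) {
    return undefined;
  }
  const name = match[1];
  if (name === undefined) {
    return { kind: "all" };
  }
  // Expand a known shortcut, otherwise keep the name as typed.
  return { kind: "one", tree: TREE_ALIASES.get(name) ?? name };
}
```

With this sketch, `parseTreestatus("!treestatus central")` yields a request for the “mozilla-central” tree, while a bare `!treestatus` asks for the list of common trees.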

Demo of treestatus

Expand bug status

If you have ever interacted with Mozilla’s code, chances are that you’ve used Bugzilla, and mentioned bug numbers in conversations. The bot catches any message containing bug XXX and responds with a link to this bug, the nickname of the person assigned to this bug if there’s one, and the summary of this bug, if it’s public. This is by far the most used and useful module, since it doesn’t require a special incantation, but will react automatically to a lot of messages written with no particular intent (see below where it’s explained how to not be spammy, though).
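As an illustration of the matching step only, a sketch along these lines would pick bug numbers out of a message. The function names are hypothetical and the real module may use a different pattern; fetching the assignee and summary from Bugzilla is not shown.

```typescript
// Collect the distinct bug numbers mentioned as "bug 1234" (case-insensitive),
// in order of first appearance.
function extractBugNumbers(message: string): number[] {
  const seen = new Set<number>();
  const re = /\bbug\s+(\d+)\b/gi;
  let m: RegExpExecArray | null;
  while ((m = re.exec(message)) !== null) {
    seen.add(Number(m[1]));
  }
  return Array.from(seen);
}

// Usual URL form for a bug on bugzilla.mozilla.org.
function bugUrl(id: number): string {
  return `https://bugzilla.mozilla.org/show_bug.cgi?id=${id}`;
}
```

The leading word boundary keeps words such as “debug 12” from triggering the bot, and the set avoids answering twice when the same bug is mentioned several times in one message.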

Who Can Review X?

This was a very nice feature that mrgiggles had: ask for potential reviewers for a particular file in the Gecko source tree and get a list of the most recent reviewers. Botzilla replicates this when it sees the trigger: who can review js/src/wasm/WasmJS.cpp? The list of potential reviewers is extracted from Mercurial logs, looking for the last N reviewers of this particular file.

As a bonus, there’s no need to pass the full path to the file, if the file’s name is unique in the tree’s source code. Botzilla will trigger a search in Searchfox, and will use the unique name in the result list, if there’s such a unique result. The previous example thus can be shortened to who can review WasmJS.cpp? since the file’s name is unique in the whole code base.
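The name-resolution rule can be sketched as a pure function. This is an assumption-laden illustration: `parseReviewQuery` and `resolveUniquePath` are hypothetical names, and `searchResults` stands in for whatever list of paths a Searchfox query would return (the query itself is not shown).

```typescript
// Extract the file argument from a "who can review X?" message, if any.
function parseReviewQuery(message: string): string | undefined {
  const match = /^who can review\s+(\S+?)\?*$/i.exec(message.trim());
  return match ? match[1] : undefined;
}

// Resolve a query to a single path in the tree.
// - A query containing "/" is treated as a full path and returned as-is.
// - A bare file name is accepted only if exactly one search result has that name.
function resolveUniquePath(query: string, searchResults: string[]): string | undefined {
  if (query.includes("/")) {
    return query;
  }
  const matches = searchResults.filter(
    (path) => path === query || path.endsWith("/" + query),
  );
  return matches.length === 1 ? matches[0] : undefined;
}
```

When the name is ambiguous (several files share it) the sketch returns nothing rather than guessing, which mirrors the “if there’s such a unique result” condition above.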

Demo of who can review

{Github,Gitlab} {issues,{P,M}Rs}

It is possible for a room administrator to “connect” a given Matrix room to a Github repository. Later on, any mention of issues or pull requests by their number, e.g. #1234, will make Botzilla react with the summary and a link to the issue/PR at stake.

This also works for Gitlab repositories, with slight differences: the administrator has to specify the root URL of the Gitlab instance (since Gitlab can be self-hosted). Issues are caught when numbers follow a # sign, while merge requests are caught when the numbers follow a ! sign.
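The difference between the two sigils can be sketched as follows. The type and function names are hypothetical, and the exact pattern the bot uses is an assumption; building the link from the configured instance URL and fetching the summary are left out.

```typescript
type MentionKind = "issue" | "merge_request";

interface Mention {
  kind: MentionKind;
  id: number;
}

// Find "#123" (issue) and "!123" (merge request) mentions in a message.
// The sigil must start the message or follow a non-word character, so that
// something like "foo#12" is not picked up.
function extractGitlabMentions(message: string): Mention[] {
  const mentions: Mention[] = [];
  const re = /(^|\W)([#!])(\d+)\b/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(message)) !== null) {
    mentions.push({
      kind: m[2] === "#" ? "issue" : "merge_request",
      id: Number(m[3]),
    });
  }
  return mentions;
}
```

A command such as `!tweet hello` is not mistaken for a merge request, since the ! must be directly followed by digits.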

Demo of gitlab

!tweet/!toot: Post on Twitter/Mastodon

An administrator can configure a room to tie it to a Twitter (respectively Mastodon) user account, using API tokens. Then, any person with an administrative role can post messages with !tweet something shocking for the bird site (respectively !toot something heartful for the mammoth site). This makes it possible to allow other people to post on these social networks without the need to give them the account’s password.

Unfortunately, the Twitter module hasn’t ever been tested: when I tried to create a developer account, Twitter accepted it after a few days but then never displayed the API tokens on the interface. The support also never answered when I asked for help. Thankfully Mastodon can be self-hosted and thus it is easier to test. I’m happy to report that it works quite well!

`confession` and histoire

It is quite common in teams to set up regular standup meetings, where everyone in the team announces what they’ve been working on in the last few days or week. It also strikes me as important for personal recognition, including towards management, to be able to show off (just a bit!) what you’ve accomplished recently, and to remember this when times are harder (see also Julia Evans’ blog post on the topic).

There’s a Botzilla module for this. Every time someone starts a message with confession:, then everything after the colon will be saved in a database (…wait for it!). Then, all the confessions are displayed on the Histoire ^[2] website, with one message feed per user. Note it is possible to send confessions privately to Botzilla (that doesn’t affect the frontend though, which is open and public to all!), or in a public channel. Public channels somehow equate to team members, so channels also get their own pages on the frontend.

Demo of confession

Screenshot of Histoire

Now the fun/cursed part is how all of this works. This was implemented in mrgiggles, and I liked it a lot, since it required no kind of backend or frontend server. How so? By (ab)using Github files as the database and Github pages as the frontend. Sending a confession will trigger a request to a Github endpoint to find a database file segregated by time, then it will trigger another request to create/modify it with the content of the confession. The frontend then uses other requests to public Github APIs to read the confessions before dynamically rendering those. Astute readers will notice that under a lot of confession activity, the bot would be a bit slowed down by Github’s API rate limits. In this case, there’s some exponential backoff behavior before trying to re-send unsaved confessions to Github. Overall it works great, and rate limits have never quite been a problem.
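For readers unfamiliar with exponential backoff, here is a generic sketch of the retry idea. The delays, attempt count and function names are illustrative assumptions, not the values or code Botzilla uses, and `save` stands in for whatever request writes the confession to Github.

```typescript
// Delay before retry number `attempt` (0-based): base, 2x base, 4x base...
// capped at maxMs so that a long outage doesn't produce absurd waits.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 60_000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Try `save` up to maxAttempts times, sleeping longer after each failure.
// Rethrows the last error if every attempt fails.
async function saveWithRetry<T>(
  save: () => Promise<T>,
  maxAttempts = 5,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await save();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt));
      }
    }
  }
  throw lastError;
}
```

Passing `sleep` as a parameter is only a convenience for testing the retry loop without actually waiting.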

Intrinsic features: they’re good bots, bront

In addition to all the user-facing features, the bot has a few other interesting attributes that are more relevant to consider from a framework point of view. Hopefully some of these ideas can be useful for other bot authors!

Join All The Rooms!

Every time the bot is invited to a channel, be it public or private, it will join the channel, making it easy to use in general. It was implemented for free by the JS framework I’ve been using, and it is a definite improvement over the IRC version of the bot.

Sometimes Matrix rooms are upgraded to a new version of the room. The bot will try to join the upgraded room if it can, keeping all its room settings intact during the transition.

Thou shalt not spam

To avoid spamming the channel, especially for modules that are reactions to other messages (think: bug numbers, issues/pull requests mentions), the bot has had to learn how to keep quiet. There are two rules triggering the quieting behavior:

if the bot has already reacted less than N minutes ago (where N is a configurable amount) in the same room,
or if it has already reacted to some entity in a message, and there’s been fewer than M messages in between the last reaction and the last message mentioning the same entity in the same room (M is also configurable)

If either of these two criteria is met, then the bot will keep quiet and it will not react to another similar message. The combination of these two has proven over time to be quite solid in my experience, based on observing the bot’s behavior and public reactions to its behavior.
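The two rules above can be sketched as a small class. This is one possible reading, not Botzilla’s actual implementation: the `Cooldown` name and its methods are hypothetical, and both rules are tracked per (room, entity) pair here, which is an assumption about how “similar message” is interpreted.

```typescript
class Cooldown {
  private readonly minMs: number;
  private readonly minMessages: number;
  // roomId -> number of messages seen so far in that room
  private readonly messagesSeen = new Map<string, number>();
  // "roomId|entity" -> when and at which message count the bot last reacted
  private readonly lastReaction = new Map<string, { at: number; messageIndex: number }>();

  constructor(minMinutes: number, minMessages: number) {
    this.minMs = minMinutes * 60_000;
    this.minMessages = minMessages;
  }

  // Call once for every message seen in a room, before shouldReact.
  onMessage(roomId: string): void {
    this.messagesSeen.set(roomId, (this.messagesSeen.get(roomId) ?? 0) + 1);
  }

  // True when neither quieting rule applies.
  shouldReact(roomId: string, entity: string, nowMs: number): boolean {
    const last = this.lastReaction.get(`${roomId}|${entity}`);
    if (last === undefined) {
      return true;
    }
    const tooSoon = nowMs - last.at < this.minMs;
    // Messages strictly between the previous mention and the current one.
    const between = (this.messagesSeen.get(roomId) ?? 0) - last.messageIndex - 1;
    const tooFewMessages = between < this.minMessages;
    return !tooSoon && !tooFewMessages;
  }

  // Record that the bot reacted to `entity` in this room.
  didReact(roomId: string, entity: string, nowMs: number): void {
    this.lastReaction.set(`${roomId}|${entity}`, {
      at: nowMs,
      messageIndex: this.messagesSeen.get(roomId) ?? 0,
    });
  }
}
```

A module would call `onMessage` for every incoming message, then `shouldReact` before answering and `didReact` after answering, with the entity being for instance the bug number or the issue reference.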

A similar mechanism is used for the confession module: on a first confession, the bot will answer with a message saying it has seen the confession, including a link to where it is going to be posted, and will add an emoji “eyes” reaction to the message. Posting this long form message could be quite spammy, if there’s a lot of confessions around the same time. Under the same criteria, it will just react with an “eyes” emoji to other confessions. Later on, it’ll resend the full message, once neither criterion is blocking it from doing so.

Decentralized administration self-service

The bot can be administered by talking to it using the !admin command. This can happen either in a private conversation with it, or in public channels, yet it is recommended to do so in private channels. To confirm that an admin action has succeeded, it’ll use the thumbs-up emoji on the message doing the particular action.

To have a single administrator for the bot would be quite the burden, and it is not resilient to people switching roles, leaving the company, etc. Normally you’d solve this by implementing your own access control lists. Fortunately, Matrix already has a concept of power levels that assigns roles to users, among which there are the administrator and moderator roles.

The bot will rely on this to decide to which requests it will answer. Somebody marked as an administrator or a moderator of a room can administrate Botzilla in this particular room, using the !admin commands. There’s still a super-admin role, that must be defined in the configuration, in case things go awry. While administrators only have power over the current room, a super-admin can use its super-powers to change anything in any room. This decentralization of the administrative roles makes it easy to have different settings for different rooms, and to rely a bit less on single individuals.
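A permission check built on power levels could look like the sketch below. It assumes the shape of the `m.room.power_levels` event content (a `users` map and a `users_default` fallback) and the conventional thresholds of 50 for moderators and 100 for administrators; the function name and the way the super-admin list is passed in are hypothetical, and how Botzilla actually reads these values is not shown in this post.

```typescript
// Subset of the content of a Matrix m.room.power_levels state event.
interface PowerLevelsContent {
  users?: Record<string, number>;
  users_default?: number;
}

// Conventional level at which clients display a user as a moderator.
const MODERATOR_LEVEL = 50;

// Can this user run !admin commands in the room these power levels belong to?
function canAdministrate(
  userId: string,
  powerLevels: PowerLevelsContent,
  superAdmins: ReadonlySet<string>,
): boolean {
  // Super-admins come from the bot's configuration and apply to every room.
  if (superAdmins.has(userId)) {
    return true;
  }
  const level = powerLevels.users?.[userId] ?? powerLevels.users_default ?? 0;
  return level >= MODERATOR_LEVEL;
}
```

Because the power levels are read per room, the same user can be allowed to configure the bot in one room and not in another, without the bot keeping any access list of its own.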

Key-value store

In general, the bot contains a key-value store implemented in an sqlite database, making it easy to migrate and add context that’s preserved across restarts of the bot. This is used to store private information like user repository information and settings for most rooms. Conceptually, each pair of room and module has its own key-value store, so that there’s no risk of confusion between different rooms and modules. There’s also a per-module key-value store that’s applicable to all the rooms, to represent global settings. If there are some non-global (per room) settings for a room, these are preferred over the global settings.
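The lookup order (room-specific first, then global) can be illustrated with an in-memory stand-in for the sqlite table. Everything here is a sketch: the `SettingsStore` name, the `"*"` marker for global settings and the key encoding are assumptions made for the example, not the bot’s schema.

```typescript
// Marker used in place of a room ID for settings that apply to all rooms.
const GLOBAL_ROOM = "*";

class SettingsStore {
  // In-memory stand-in for the sqlite table: (room, module, key) -> value.
  private readonly data = new Map<string, string>();

  private static compose(roomId: string, moduleName: string, key: string): string {
    return JSON.stringify([roomId, moduleName, key]);
  }

  set(roomId: string, moduleName: string, key: string, value: string): void {
    this.data.set(SettingsStore.compose(roomId, moduleName, key), value);
  }

  // Per-room value if present, otherwise the module's global value.
  get(roomId: string, moduleName: string, key: string): string | undefined {
    return (
      this.data.get(SettingsStore.compose(roomId, moduleName, key)) ??
      this.data.get(SettingsStore.compose(GLOBAL_ROOM, moduleName, key))
    );
  }
}
```

Keying on the (room, module, key) triple is what keeps two modules, or two rooms, from stepping on each other’s settings.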

Self-documentation

Each chat module is implemented as an ECMAScript module and must export a help string along with the main reaction function. This is then captured and aggregated as part of a !help command, that can be used to request help about usage of the bot. The main help message will display the list of all the enabled modules, and help about a specific module may be queried with e.g. !help uuid.
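The aggregation can be sketched as below. To keep the example in a single file, modules are modeled as plain objects rather than separate ECMAScript modules; the `ChatModule` interface, the `helpFor` function and the wording of the messages are all hypothetical.

```typescript
// What each chat module is expected to provide in this sketch.
interface ChatModule {
  name: string;
  help: string;
}

// Answer a "!help" or "!help <module>" message, or return undefined when the
// message is not a help request.
function helpFor(modules: ChatModule[], message: string): string | undefined {
  const match = /^!help(?:\s+(\S+))?$/.exec(message.trim());
  if (match === null) {
    return undefined;
  }
  const wanted = match[1];
  if (wanted === undefined) {
    // Main help message: list every enabled module.
    return "Enabled modules: " + modules.map((m) => m.name).join(", ");
  }
  const found = modules.find((m) => m.name === wanted);
  return found !== undefined ? found.help : `Unknown module: ${wanted}`;
}
```

Requiring the help string at the module level means a module cannot be enabled without also showing up in the !help output.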

Future work and conclusion

If I were to start again, I’d do a few things differently:

now that the Rust ecosystem around the Matrix platform has matured a bit, I’d probably write this bot in Rust. Starting from JavaScript and moving to TypeScript has helped me catch a few static issues. I’d expect moving to Rust would help handling Matrix events faster, provide end-to-end encryption support for free, and be quite pleasant to use in general thanks to the awesome Rust tooling.
use a real single-page app framework for the Histoire website. Maybe? I mean I’m a big fan of VanillaJS, but using it means re-creating your own Web framework like thing to make it nice and productive to use.
despite being a fun hack, using Github as a backend has algorithmic limitations, that can make the web app sluggish. In particular, a combined feed for N users on M eras (think: periods) will trigger NxM Github API requests. Using a plain database with a plain API would probably be simpler at this point. This is mitigated with an in-memory cache so only the first time all the requests happen, but crafting my own requests would be more expressive and efficient, and allow for more features too (like displaying the list of rooms on the start view).
provide a (better) commands parser. Regular expressions in this context are a bit feeble and limited. Also right now each module could in theory reuse the same command triggers as another one, etc.
implement the chat modules in WebAssembly :-) In fact, I think there’s a whole business model which would consist in having the bot framework including a wasm VM, and interacting with different communication platforms (not restricted to Matrix). Developers in such a bot platform could choose which source language to use for developing their own modules. It ought to be possible to define a clear, restricted, WASI-like capabilities-based interface that gets passed to each chat module. In such a sandboxed environment, the responsibility for hosting the bot’s code is decoupled from the responsibility of writing modules. So a company could make the platform available, and paying users would develop the modules and host them. Imagine git pushing your chat modules and they get compiled to wasm and deployed on the fly. But I digress! (Please do not forget to credit me with a large $$$ envelope/a nice piece of swag if implementing this at least multi-billion dollars idea.)

I’d like to finish by thanking the authors of the previous Mozilla bots, namely sfink and glob: your puppets have been incredible sources of inspiration. Also huge thanks to the people hanging out in the matrix-bot-sdk chat room, who’ve answered questions and provided help on a few occasions.

I hope you liked this presentation of Botzilla and its features! Of course, all the code is free and open-source, including the bot as well as the histoire frontend. At this point it is addressing most of the needs I had, so I don’t have immediate plans to extend it further. I’d happily take contributions, though, so feel free to chime in if you’d like to implement anything! It’s also a breeze to run on any machine, thanks to Docker-based deployment. Have fun with it!

[1] Karma is an IRC idiosyncrasy, in which users rate other users up and down using their nickname suffixed with ++ or --. Karma tracking consists of keeping scores and displaying those.
[2] Histoire is the French for “history” and “story”. Inherited from Steve Fink’s very own mrgiggles :-)