<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
    <title>@bnjbvr - mozilla</title>
    <subtitle>Technical blog and random musings.</subtitle>
    <link rel="self" type="application/atom+xml" href="https://bouvier.cc/tags/mozilla/atom.xml"/>
    <link rel="alternate" type="text/html" href="https://bouvier.cc"/>
    <generator uri="https://www.getzola.org/">Zola</generator>
    <updated>2021-02-17T19:00:42+00:00</updated>
    <id>https://bouvier.cc/tags/mozilla/atom.xml</id>
    <entry xml:lang="en">
        <title>A primer on code generation in Cranelift</title>
        <published>2021-02-17T19:00:42+00:00</published>
        <updated>2021-02-17T19:00:42+00:00</updated>
        
        <author>
          <name>
            
              Benjamin Bouvier
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://bouvier.cc/tech/cranelift-codegen-primer/"/>
        <id>https://bouvier.cc/tech/cranelift-codegen-primer/</id>
        <content type="html" xml:base="https://bouvier.cc/tech/cranelift-codegen-primer/">&lt;script src=&quot;https:&#x2F;&#x2F;cdn.jsdelivr.net&#x2F;npm&#x2F;mermaid&#x2F;dist&#x2F;mermaid.min.js&quot;&gt;&lt;&#x2F;script&gt;
&lt;script&gt;mermaid.initialize({startOnLoad:true});&lt;&#x2F;script&gt;
&lt;p&gt;&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;bytecodealliance&#x2F;wasmtime&#x2F;tree&#x2F;main&#x2F;cranelift#cranelift-code-generator&quot;&gt;Cranelift&lt;&#x2F;a&gt; is a code generator written in the Rust programming language. It aims to generate code quickly, while still producing machine code that runs at reasonable speeds.&lt;&#x2F;p&gt;
&lt;span id=&quot;continue-reading&quot;&gt;&lt;&#x2F;span&gt;
&lt;p&gt;The Cranelift compilation model consists of compiling functions one by one, while holding extra information about external entities such as external functions, memory addresses, and so on. This model allows for concurrent and parallel compilation of individual functions, which supports the goal of fast compilation. It was designed this way to allow just-in-time (JIT) compilation of WebAssembly binary code in Firefox, although its scope has since broadened. Nowadays it is used in a few different WebAssembly runtimes, including &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;bytecodealliance&#x2F;wasmtime#wasmtime&quot;&gt;Wasmtime&lt;&#x2F;a&gt; and &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;wasmer.io&#x2F;&quot;&gt;Wasmer&lt;&#x2F;a&gt;, and also as an alternative backend for Rust debug compilation, thanks to &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;bjorn3&#x2F;rustc_codegen_cranelift&quot;&gt;cg_clif&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
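&lt;p&gt;Here is a minimal sketch of the point about parallelism, using only the Rust standard library. Because each function is compiled independently of the others, each one can be handed to its own thread. The &lt;code&gt;compile_function&lt;&#x2F;code&gt; and &lt;code&gt;compile_all&lt;&#x2F;code&gt; helpers are hypothetical placeholders, not Cranelift APIs. A real embedder would more likely use a thread pool than spawn one thread per function.&lt;&#x2F;p&gt;

```rust
use std::thread;

// Hypothetical stand-in for compiling one function to machine code.
// In this sketch a "function" is just its name, and the "machine code" is a string.
fn compile_function(name: String) -> String {
    format!("compiled {}", name)
}

// Compile three independent functions, one per thread. This works because
// compiling one function does not depend on the result of compiling another.
fn compile_all(names: [String; 3]) -> [String; 3] {
    // Spawn one thread per function; `move` hands ownership of each name to its thread.
    let handles = names.map(|name| thread::spawn(move || compile_function(name)));
    // Wait for every thread, keeping the results in the original order.
    handles.map(|handle| handle.join().expect("compilation thread panicked"))
}

fn main() {
    let names = ["f".to_string(), "g".to_string(), "h".to_string()];
    for output in compile_all(names) {
        println!("{}", output);
    }
}
```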
&lt;p&gt;A classic compiler design runs a parser to translate the source code into some form of intermediate representation, runs optimization passes on it, and finally feeds the result to the machine code generator.&lt;&#x2F;p&gt;
&lt;p&gt;This blog post focuses on the final step, namely the concepts that are involved in code generation, and what they map to in Cranelift. To make things more concrete, we’ll take a specific instruction, and see how it’s translated, from its creation down to code generation. At each step of the process, I’ll provide a short (&lt;em&gt;ahem&lt;&#x2F;em&gt;) high-level explanation of the concepts involved, and I’ll show what they map to in Cranelift, using the example instruction. While this is not a tutorial detailing how to add new instructions in Cranelift, this should be an interesting read for anyone who’s interested in compilers, and this could be an entry point if you’re interested in hacking on the Cranelift &lt;code&gt;codegen&lt;&#x2F;code&gt; crate.&lt;&#x2F;p&gt;
&lt;p&gt;This is our plan for this blog post: each square-cornered box represents data, and each
rounded box represents a process. We’re going to go through each of them below.&lt;&#x2F;p&gt;
&lt;div class=&quot;mermaid&quot;&gt;
graph TD;
    clif[Optimized CLIF];
    vcode[VCode];
    final_vcode[Final VCode];
    machine_code[Machine code artifacts];
    lowering([Lowering]);
    regalloc([Register allocation]);
    codegen([Machine code generation]);
    clif --&gt; lowering --&gt; vcode --&gt; regalloc --&gt; final_vcode --&gt; codegen --&gt; machine_code
&lt;&#x2F;div&gt;
&lt;h2 id=&quot;intermediate-representations&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#intermediate-representations&quot; aria-label=&quot;Anchor link for: intermediate-representations&quot;&gt;🔗&lt;&#x2F;a&gt;Intermediate representations&lt;&#x2F;h2&gt;
&lt;p&gt;Compilers use &lt;strong&gt;intermediate representations&lt;&#x2F;strong&gt; (&lt;em&gt;IR&lt;&#x2F;em&gt;) to represent source code. Here we’re interested in representations of the &lt;em&gt;data flow&lt;&#x2F;em&gt;, that is, the instructions themselves and nothing else. An IR contains information about the instructions, their operands, type specialization information, and any additional metadata that might be useful. Each IR usually maps to a certain level of abstraction, and as such, different IRs are useful for solving problems that require different levels of abstraction. Their shape (which data structures they use) and their number often have a large impact on the performance of the compiler itself (that is, how fast it compiles).&lt;&#x2F;p&gt;
&lt;p&gt;Most compilers use IRs internally, and yet these are invisible to programmers. The reason is that source code is first &lt;em&gt;parsed&lt;&#x2F;em&gt; (tokenized, verified) and then translated into an IR. The &lt;em&gt;abstract syntax tree&lt;&#x2F;em&gt;, aka AST, is one such IR, representing the program in a format that’s very close to the source code itself. Cranelift doesn’t start from source code, though. Its raison d’être is to be a code generator, so a text format for its IR is secondary and only useful for testing and debugging purposes. That’s why embedders create and manipulate Cranelift’s IR directly.&lt;&#x2F;p&gt;
&lt;p&gt;At the time of writing, Cranelift has two IRs to represent the function’s code:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;one external, high-level intermediate representation, called &lt;strong&gt;CLIF&lt;&#x2F;strong&gt; (for &lt;em&gt;Cranelift IR format&lt;&#x2F;em&gt;),&lt;&#x2F;li&gt;
&lt;li&gt;one internal, low-level intermediate representation called &lt;strong&gt;VCode&lt;&#x2F;strong&gt; (for &lt;em&gt;virtual-registerized code&lt;&#x2F;em&gt;).&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h2 id=&quot;clif-ir&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#clif-ir&quot; aria-label=&quot;Anchor link for: clif-ir&quot;&gt;🔗&lt;&#x2F;a&gt;CLIF IR&lt;&#x2F;h2&gt;
&lt;p&gt;CLIF is the IR that Cranelift embedders create and manipulate. It consists of high-level typed operations that are convenient to use and&#x2F;or can be simply translated to machine code. It is in &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Static_single_assignment_form&quot;&gt;static single assignment (SSA) form&lt;&#x2F;a&gt;: each value referenced by an operation (an SSA value) is defined exactly once, and may have as many uses as desired. CLIF is practical to use and manipulate for classic compiler optimization passes (e.g. &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Loop-invariant_code_motion&quot;&gt;LICM&lt;&#x2F;a&gt;), since it is independent of the target architecture we’re compiling to.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;let&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; x&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; =&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; builder&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;ins&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;().&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;iconst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #0DB9D7;&quot;&gt;types&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;I64&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt; 42&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;);&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;let&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; y&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; =&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; builder&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;ins&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;().&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;iconst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #0DB9D7;&quot;&gt;types&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;I64&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt; 1337&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;);&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;let&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; sum&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; =&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; builder&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;ins&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;().&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;iadd&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;x&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; y&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;);&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;An example of Rust code that generates CLIF IR: using an IR builder, two 64-bit integer constant SSA values, &lt;code&gt;x&lt;&#x2F;code&gt; and &lt;code&gt;y&lt;&#x2F;code&gt;, are created and then added together. The result is the &lt;code&gt;sum&lt;&#x2F;code&gt; SSA value, which can then be consumed by other instructions.&lt;&#x2F;p&gt;
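&lt;p&gt;Here is a self-contained toy that shows what the SSA property means for the snippet above. It does not use Cranelift’s data structures. Each value is numbered by the instruction that defines it, so it can only ever be defined once. The &lt;code&gt;Inst&lt;&#x2F;code&gt; enum and &lt;code&gt;eval&lt;&#x2F;code&gt; function are hypothetical. The comment in &lt;code&gt;main&lt;&#x2F;code&gt; shows roughly how the same code reads in CLIF’s text format.&lt;&#x2F;p&gt;

```rust
// A toy SSA instruction set with just the two operations used above.
// A value is identified by the index of the instruction defining it,
// so each value is defined exactly once, by construction.
#[derive(Clone, Copy, Debug)]
enum Inst {
    // Defines a new value from a 64-bit constant.
    Iconst(i64),
    // Defines a new value as the wrapping sum of two previously defined values.
    Iadd(usize, usize),
}

// Evaluate a three-instruction toy function, returning every SSA value it defines.
fn eval(insts: [Inst; 3]) -> [i64; 3] {
    let mut values = [0i64; 3];
    for i in 0..insts.len() {
        values[i] = match insts[i] {
            Inst::Iconst(c) => c,
            Inst::Iadd(a, b) => {
                // SSA requires every operand to be defined before it is used.
                assert!(i > a, "use of value v{} before its definition", a);
                assert!(i > b, "use of value v{} before its definition", b);
                values[a].wrapping_add(values[b])
            }
        };
    }
    values
}

fn main() {
    // v0 = iconst.i64 42 ; v1 = iconst.i64 1337 ; v2 = iadd v0, v1
    let values = eval([Inst::Iconst(42), Inst::Iconst(1337), Inst::Iadd(0, 1)]);
    println!("sum = {}", values[2]);
}
```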
&lt;p&gt;The code for the IR builder we’re manipulating above is automatically generated by the &lt;code&gt;cranelift-codegen&lt;&#x2F;code&gt; build script. The build script uses a domain-specific &lt;em&gt;meta&lt;&#x2F;em&gt; language (DSL)&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-2-1&quot;&gt;&lt;a href=&quot;#fn-2&quot;&gt;1&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt; that defines the instructions, their input and output operands, which input types are allowed, how the output type is inferred, etc. We won’t take a look at this &lt;em&gt;today&lt;&#x2F;em&gt;: this is a bit too far from code generation, but this could be material for another blog post.&lt;&#x2F;p&gt;
&lt;p&gt;As an example of a full-blown CLIF generator, there is &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;bytecodealliance&#x2F;wasmtime&#x2F;tree&#x2F;main&#x2F;cranelift&#x2F;wasm&quot;&gt;a crate&lt;&#x2F;a&gt; in the Cranelift project that allows translating from the WebAssembly binary format to CLIF. The Cranelift backend for Rustc uses its own CLIF generator that translates from one of the Rust compiler’s IRs.&lt;&#x2F;p&gt;
&lt;p&gt;Finally, it’s time to reveal our running example! The Chosen One is the &lt;code&gt;iadd&lt;&#x2F;code&gt; CLIF operation, which adds two integers of the same type (of any width) together, with wrapping semantics. It is simple to understand, yet exhibits interesting behavior on the two architectures we’re interested in. So, let’s continue down the pipeline!&lt;&#x2F;p&gt;
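&lt;p&gt;Before moving on, a quick illustration of what “wrapping semantics” means. The result is the exact sum reduced modulo 2^n, where n is the width of the integer type, so an overflow silently wraps around instead of trapping. The following sketch of these semantics is not Cranelift code. It assumes widths of at most 64 bits and treats operands as unsigned. Integers are stored in two’s complement, so the resulting bits are the same for signed values.&lt;&#x2F;p&gt;

```rust
// Wrapping addition for a `bits`-wide integer type (e.g. 8, 16, 32 or 64 bits),
// with both operands held in the low bits of a u64. The sum is computed in u128,
// where it cannot overflow, and then reduced modulo 2^bits.
fn iadd_wrapping(a: u64, b: u64, bits: u32) -> u64 {
    assert!(bits >= 1, "width must be at least 1 bit");
    assert!(64 >= bits, "width must be at most 64 bits in this sketch");
    let modulus = 2u128.pow(bits);
    ((a as u128 + b as u128) % modulus) as u64
}

fn main() {
    // 8-bit: 0xFF + 1 wraps around to 0.
    println!("{}", iadd_wrapping(0xFF, 1, 8));
    // 64-bit: u64::MAX + 2 wraps around to 1.
    println!("{}", iadd_wrapping(u64::MAX, 2, 64));
}
```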
&lt;h2 id=&quot;vcode-ir&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#vcode-ir&quot; aria-label=&quot;Anchor link for: vcode-ir&quot;&gt;🔗&lt;&#x2F;a&gt;VCode IR&lt;&#x2F;h2&gt;
&lt;p&gt;Later on, the CLIF intermediate representation is &lt;em&gt;lowered&lt;&#x2F;em&gt;, i.e. transformed from a high-level representation into a lower-level one. Here, lower-level means a form that is more specialized for a particular machine architecture. This lower IR is called &lt;em&gt;VCode&lt;&#x2F;em&gt; in Cranelift. The values it references are called &lt;em&gt;virtual registers&lt;&#x2F;em&gt; (more on the &lt;em&gt;virtual&lt;&#x2F;em&gt; bit below). They’re not in SSA form anymore: each virtual register may be redefined as many times as we want. This IR is used to encode register allocation constraints, and it guides machine code generation. Since this information is tied to the representation of the machine code itself, this IR is also target-specific: there’s one flavor of VCode per CPU architecture we’re compiling to.&lt;&#x2F;p&gt;
&lt;p&gt;Let’s get back to our example, which we’re going to compile for two instruction set architectures:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;64-bit ARM (aka aarch64), which is used in most mobile devices and is starting to become mainstream on laptops (Apple’s M1 Macs, some Chromebooks),&lt;&#x2F;li&gt;
&lt;li&gt;64-bit x86 (aka x86_64, also abbreviated x64), which is used in most desktop and laptop machines.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;On aarch64, an integer addition machine instruction takes three operands: two input operands (one of which must be a register), and a third, output register operand. On x86_64, by contrast, the equivalent instruction involves only two operands: a read-only source operand, and an in-out register that holds the second source before the instruction executes and the result afterwards. We’ll get back to this.&lt;&#x2F;p&gt;
&lt;p&gt;So, considering &lt;code&gt;iadd&lt;&#x2F;code&gt;, let’s look at (one of&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-4-1&quot;&gt;&lt;a href=&quot;#fn-4&quot;&gt;2&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;) the VCode instructions used to represent integer additions on aarch64 (as defined in &lt;code&gt;cranelift&#x2F;codegen&#x2F;src&#x2F;isa&#x2F;aarch64&#x2F;inst&#x2F;mod.rs&lt;&#x2F;code&gt;):&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;&#x2F;&#x2F;&#x2F; An ALU operation with two register sources and a register destination.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;AluRRR&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;    alu_op&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; ALUOp&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;    rd&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; Writable&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;&amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;Reg&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;&amp;gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;    rn&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; Reg&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;    rm&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; Reg&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;},&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Some details here:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;alu_op&lt;&#x2F;code&gt; defines the sub-opcode used in the ALU (Arithmetic Logic Unit). It will be &lt;code&gt;ALUOp::Add64&lt;&#x2F;code&gt; for a 64-bit integer addition.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;rn&lt;&#x2F;code&gt; and &lt;code&gt;rm&lt;&#x2F;code&gt; are the conventional aarch64 names for the two input registers.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;rd&lt;&#x2F;code&gt; is the destination register. See how it’s marked as &lt;code&gt;Writable&lt;&#x2F;code&gt;, while the two others are not? &lt;code&gt;Writable&lt;&#x2F;code&gt; is a plain Rust wrapper that makes sure that we &lt;em&gt;can&lt;&#x2F;em&gt; statically differentiate read-only registers from writable registers; a neat trick that allows us to catch more issues at compile-time.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
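&lt;p&gt;The &lt;code&gt;Writable&lt;&#x2F;code&gt; trick is an instance of the classic newtype pattern. Below is a stripped-down sketch of the idea, not Cranelift’s definitions. The &lt;code&gt;from_reg&lt;&#x2F;code&gt; and &lt;code&gt;to_reg&lt;&#x2F;code&gt; methods are written for illustration. Unlike Cranelift’s version, this wrapper is not generic, and the &lt;code&gt;alu_op&lt;&#x2F;code&gt; field is omitted.&lt;&#x2F;p&gt;

```rust
// Simplified sketch: Cranelift's `Writable` is generic over the wrapped type,
// while this one only wraps `Reg`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Reg(u8);

#[derive(Clone, Copy, Debug, PartialEq)]
struct Writable(Reg);

impl Writable {
    // Explicitly opt in to writing a register.
    fn from_reg(reg: Reg) -> Writable {
        Writable(reg)
    }
    // Reading a writable register is always allowed.
    fn to_reg(self) -> Reg {
        self.0
    }
}

// Mirrors the register fields of the aarch64 `AluRRR` VCode instruction above.
#[derive(Debug)]
struct AluRRR {
    rd: Writable,
    rn: Reg,
    rm: Reg,
}

fn main() {
    let add = AluRRR { rd: Writable::from_reg(Reg(0)), rn: Reg(1), rm: Reg(2) };
    // Writing `AluRRR { rd: Reg(0), .. }` instead would be rejected at compile time,
    // because a plain `Reg` is not a `Writable`.
    println!("{:?} = {:?} + {:?}", add.rd.to_reg(), add.rn, add.rm);
}
```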
&lt;p&gt;All this information is directly tied to the machine code representation of an addition instruction on aarch64: each field is later used to determine some of the bits of the encoded instruction during code generation.&lt;&#x2F;p&gt;
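&lt;p&gt;The following hand-written sketch illustrates how fields map onto bits. It encodes the register-register form of aarch64’s &lt;code&gt;add&lt;&#x2F;code&gt; from three register numbers plus a 32-bit or 64-bit choice, mirroring &lt;code&gt;Add32&lt;&#x2F;code&gt; and &lt;code&gt;Add64&lt;&#x2F;code&gt;. It follows the layout of the ADD (shifted register) instruction in Arm’s A64 reference, with a zero shift amount. The &lt;code&gt;encode_add_rrr&lt;&#x2F;code&gt; name is hypothetical. This is not Cranelift’s emission code, which handles many more cases.&lt;&#x2F;p&gt;

```rust
// Encode aarch64 "ADD (shifted register)" with no shift:
//   ADD Xd, Xn, Xm (64-bit, sf = 1) or ADD Wd, Wn, Wm (32-bit, sf = 0).
// Bit layout: sf | 0 | 0 | 01011 | shift=00 | 0 | Rm | imm6=000000 | Rn | Rd
fn encode_add_rrr(is_64: bool, rd: u32, rn: u32, rm: u32) -> u32 {
    // Register numbers are 5-bit fields.
    assert!(31 >= rd);
    assert!(31 >= rn);
    assert!(31 >= rm);
    // Fixed opcode bits, plus the sf bit (bit 31) selecting the 64-bit variant.
    let base: u32 = if is_64 { 0x8B00_0000 } else { 0x0B00_0000 };
    // Shift each register field to its bit offset: Rm at 16, Rn at 5, Rd at 0.
    base | rm.wrapping_shl(16) | rn.wrapping_shl(5) | rd
}

fn main() {
    // add x0, x1, x2
    println!("{:#010x}", encode_add_rrr(true, 0, 1, 2));
    // add w0, w1, w2
    println!("{:#010x}", encode_add_rrr(false, 0, 1, 2));
}
```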
&lt;p&gt;As said before, the VCode is specific to each architecture, so x86_64 has a different VCode representation for the same instruction (as defined in &lt;code&gt;cranelift&#x2F;codegen&#x2F;src&#x2F;isa&#x2F;x64&#x2F;inst&#x2F;mod.rs&lt;&#x2F;code&gt;):&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;&#x2F;&#x2F;&#x2F; Integer arithmetic&#x2F;bit-twiddling: (add sub and or xor mul adc? sbb?) (32 64) (reg addr imm) reg&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;AluRmiR&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;    is_64&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; bool&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;    op&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; AluRmiROpcode&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;    src&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; RegMemImm&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;    dst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; Writable&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;&amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;Reg&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;&amp;gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;},&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Here, the sub-opcode is defined as part of the &lt;code&gt;AluRmiROpcode&lt;&#x2F;code&gt; enum (the comment hints at which other x86 machine instructions are generated by this same VCode). See how there’s only one &lt;code&gt;src&lt;&#x2F;code&gt; (source) register (or memory or immediate operand), while the instruction conceptually takes two inputs? That’s because the &lt;code&gt;dst&lt;&#x2F;code&gt; (destination) register is expected to be &lt;em&gt;modified&lt;&#x2F;em&gt;, that is, both read (so it’s the second input operand) and written to (so it’s the result register). In C terms, x86’s add instruction doesn’t actually do &lt;code&gt;a = b + c&lt;&#x2F;code&gt;. What it does is &lt;code&gt;a += b&lt;&#x2F;code&gt;, that is, one of the sources is &lt;em&gt;consumed&lt;&#x2F;em&gt; by the instruction. This is an artifact inherited from the design of older x86 machines in the 1970s, when instructions were designed around an accumulator model (and efficiently encoding three operands in a CISC architecture would make the encoding even larger and more complex than it already is).&lt;&#x2F;p&gt;
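&lt;p&gt;The difference between the two forms matters whenever an input must stay alive after the addition. Below is a minimal sketch that models a hypothetical four-register machine in plain Rust. It shows why x86 may need an extra register copy (&lt;code&gt;mov&lt;&#x2F;code&gt;) before &lt;code&gt;add&lt;&#x2F;code&gt; when the first input is still needed afterwards. It also shows that the destination must not alias the second input. These are the kinds of constraints that the in-out &lt;code&gt;dst&lt;&#x2F;code&gt; operand conveys. All function names are illustrative.&lt;&#x2F;p&gt;

```rust
// A tiny register file of four 64-bit registers, passed in and out by value.
type Regs = [u64; 4];

// Three-operand form (aarch64 style): rd = rn + rm; both inputs survive.
fn add3(regs: Regs, rd: usize, rn: usize, rm: usize) -> Regs {
    let mut out = regs;
    out[rd] = regs[rn].wrapping_add(regs[rm]);
    out
}

// Two-operand form (x86 style): dst += src; the old value of dst is consumed.
fn add2(regs: Regs, dst: usize, src: usize) -> Regs {
    let mut out = regs;
    out[dst] = regs[dst].wrapping_add(regs[src]);
    out
}

// Computing dst = a + b on x86 while keeping both inputs alive needs a copy first:
//   mov dst, a
//   add dst, b
fn add_preserving_inputs_x86(regs: Regs, dst: usize, a: usize, b: usize) -> Regs {
    // If dst were b, the `mov` would clobber b before the `add` reads it.
    assert!(dst != b, "dst must not alias the second input");
    let mut out = regs;
    out[dst] = regs[a]; // mov dst, a
    add2(out, dst, b) // add dst, b
}

fn main() {
    let regs: Regs = [10, 20, 0, 0];
    println!("{:?}", add3(regs, 2, 0, 1));
    println!("{:?}", add2(regs, 0, 1));
    println!("{:?}", add_preserving_inputs_x86(regs, 2, 0, 1));
}
```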
&lt;h2 id=&quot;instruction-selection-lowering&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#instruction-selection-lowering&quot; aria-label=&quot;Anchor link for: instruction-selection-lowering&quot;&gt;🔗&lt;&#x2F;a&gt;Instruction selection (lowering)&lt;&#x2F;h2&gt;
&lt;p&gt;As said before, converting from the high-level IR (CLIF) to the low-level IR (VCode) is called lowering. Since VCode is target-dependent, this process is also target-dependent. That’s where we decide which machine instructions eventually get used for a given CLIF opcode. There are many ways to achieve the same machine state for given semantics, but some of these ways are faster than others, and&#x2F;or require fewer code bytes. The problem can be summed up like this: given some CLIF, which VCode can we create to generate the fastest and&#x2F;or smallest machine code that carries out the desired semantics? This is called &lt;em&gt;instruction selection&lt;&#x2F;em&gt;, because we’re selecting the VCode instructions from a set of different possible instructions.&lt;&#x2F;p&gt;
&lt;p&gt;How do these IRs map to each other? A given CLIF node may be lowered into 1 to N VCode instructions. A given VCode instruction may lead to the code generation of 1 to M machine instructions. There is no fixed upper bound on either N or M. For instance, the integer addition CLIF opcode &lt;code&gt;iadd&lt;&#x2F;code&gt; on 64-bit inputs maps to a single VCode instruction on aarch64. That VCode instruction then causes a single machine instruction to be generated.&lt;&#x2F;p&gt;
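&lt;p&gt;Adding a known constant is a concrete, if simplified, example of instruction selection. It also shows one CLIF instruction mapping to one or several machine instructions. The sketch below is hypothetical and much simpler than Cranelift’s lowering code, which produces VCode rather than text. Its choices mirror the “12-bit immediate, maybe negated” idea behind the &lt;code&gt;put_input_in_rse_imm12_maybe_negated&lt;&#x2F;code&gt; helper in the lowering code below. It assumes &lt;code&gt;x&lt;&#x2F;code&gt; lives in &lt;code&gt;x1&lt;&#x2F;code&gt; and the result goes to &lt;code&gt;x0&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;

```rust
// Hypothetical, heavily simplified selection for `iadd x, c` on aarch64, where `c` is
// a constant known at compile time. aarch64's add/sub accept an unsigned 12-bit
// immediate; the variant with the immediate shifted left by 12 is ignored here.
fn select_iadd_const(c: i64) -> String {
    match c {
        // The constant fits in 12 bits: a single `add` with an immediate.
        0..=4095 => format!("add x0, x1, #{}", c),
        // Its negation fits: a single `sub` with the negated immediate.
        -4095..=-1 => format!("sub x0, x1, #{}", -c),
        // Otherwise, materialize the constant in a scratch register first.
        // (A large constant may itself require several movz/movk instructions.)
        _ => format!("mov x2, #{}; add x0, x1, x2", c),
    }
}

fn main() {
    for c in [42, -7, 5000] {
        println!("{}", select_iadd_const(c));
    }
}
```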
&lt;p&gt;Other CLIF opcodes may eventually generate more than a single machine instruction. Consider the CLIF opcode for signed integer division, &lt;code&gt;idiv&lt;&#x2F;code&gt;. Its semantics define that it traps when the divisor is zero, and in case of integer overflow&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-3-1&quot;&gt;&lt;a href=&quot;#fn-3&quot;&gt;3&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;. On aarch64, this is lowered into:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;one VCode instruction that checks whether the divisor is zero, and traps if so,&lt;&#x2F;li&gt;
&lt;li&gt;two VCode instructions comparing the dividend against the minimal integer value, and the divisor against -1,&lt;&#x2F;li&gt;
&lt;li&gt;one VCode instruction that traps if both comparisons matched (i.e. the division would overflow),&lt;&#x2F;li&gt;
&lt;li&gt;and one VCode instruction that does the actual division operation.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Each of these VCode instructions then generates one or more machine code instructions, resulting in a somewhat longer sequence.&lt;&#x2F;p&gt;
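&lt;p&gt;Here is the same sequence of checks written as plain Rust. It is a semantic sketch of what the lowered code computes, not the VCode itself. The &lt;code&gt;DivOutcome&lt;&#x2F;code&gt; enum is a hypothetical stand-in for trapping. The overflow case exists because, in two’s complement, dividing the minimal integer by -1 would produce a result one larger than the maximal integer.&lt;&#x2F;p&gt;

```rust
// Possible outcomes of a signed 64-bit division in this sketch.
#[derive(Debug, PartialEq)]
enum DivOutcome {
    Quotient(i64),
    TrapDivisionByZero,
    TrapIntegerOverflow,
}

// Mirrors the lowered sequence step by step: all checks run before the division.
fn idiv_checked(x: i64, y: i64) -> DivOutcome {
    // Step 1: trap if the divisor is zero.
    if y == 0 {
        return DivOutcome::TrapDivisionByZero;
    }
    // Steps 2 and 3: compare the dividend against i64::MIN and the divisor against -1,
    // and trap if both match, since the quotient 2^63 is not representable in an i64.
    if (x, y) == (i64::MIN, -1) {
        return DivOutcome::TrapIntegerOverflow;
    }
    // Step 4: the actual division, truncating toward zero.
    DivOutcome::Quotient(x / y)
}

fn main() {
    println!("{:?}", idiv_checked(7, 2));
    println!("{:?}", idiv_checked(1, 0));
    println!("{:?}", idiv_checked(i64::MIN, -1));
}
```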
&lt;p&gt;Let’s look at the lowering of &lt;code&gt;iadd&lt;&#x2F;code&gt; on aarch64 (in &lt;code&gt;cranelift&#x2F;codegen&#x2F;src&#x2F;isa&#x2F;aarch64&#x2F;lower_inst.rs&lt;&#x2F;code&gt;), edited and simplified for clarity. I’ve added comments in the code, explaining what each line does:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;Opcode&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;Iadd&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; =&amp;gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; Get the destination register.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;    let&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rd&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; =&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt; get_output_reg&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;ctx&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; outputs&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;[&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt;0&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;]).&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;only_reg&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;().&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;unwrap&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;();&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; Get the controlling type of the addition (32-bits int or 64-bits int or&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; int vector, etc.).&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;    let&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; ty&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; =&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; ty&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;unwrap&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;();&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; Force one of the inputs into a register, not applying any signed- or&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; zero-extension.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;    let&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rn&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; =&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt; put_input_in_reg&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;ctx&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; inputs&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;[&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt;0&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;],&lt;&#x2F;span&gt;&lt;span style=&quot;color: #0DB9D7;&quot;&gt; NarrowValueMode&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;None&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;);&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; Try to see if we can encode the second operand as an immediate on&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; 12-bits, maybe by negating it;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; Otherwise, put it into a register.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;    let&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; (&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;rm&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; negated&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;) =&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt; put_input_in_rse_imm12_maybe_negated&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;        ctx&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;        inputs&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;[&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt;1&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;],&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;        ty_bits&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;ty&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;),&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #0DB9D7;&quot;&gt;        NarrowValueMode&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;None&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;    );&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; Select the ALU subopcode, based on possible negation and controlling&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; type.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;    let&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; alu_op&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; =&lt;&#x2F;span&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt; if !&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;negated&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;        choose_32_64&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;ty&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #0DB9D7;&quot;&gt; ALUOp&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;Add32&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #0DB9D7;&quot;&gt; ALUOp&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;Add64&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;    }&lt;&#x2F;span&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt; else&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;        choose_32_64&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;ty&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #0DB9D7;&quot;&gt; ALUOp&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;Sub32&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #0DB9D7;&quot;&gt; ALUOp&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;Sub64&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;    };&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; Emit the VCode instruction in the VCode stream.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;    ctx&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;emit&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;alu_inst_imm12&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;alu_op&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rd&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rn&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rm&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;));&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;In fact, the &lt;code&gt;alu_inst_imm12&lt;&#x2F;code&gt; wrapper can create one of several possible VCode instructions (since we’re trying to select &lt;em&gt;the best one&lt;&#x2F;em&gt;), depending on the form &lt;code&gt;rm&lt;&#x2F;code&gt; was put in. For the sake of simplicity, we’ll assume that &lt;code&gt;AluRRR&lt;&#x2F;code&gt; is going to be generated, i.e. the selected instruction is the one using only register encodings for the input values.&lt;&#x2F;p&gt;
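&lt;p&gt;One of the deciding factors is whether the second operand is a constant that fits in AArch64’s arithmetic immediate field. That field holds an unsigned 12-bit value, optionally shifted left by 12 bits. The following standalone sketch shows that test, including the negation trick that turns an &lt;code&gt;add&lt;&#x2F;code&gt; of a small negative constant into a &lt;code&gt;sub&lt;&#x2F;code&gt;. The names &lt;code&gt;Imm12&lt;&#x2F;code&gt;, &lt;code&gt;encode_imm12&lt;&#x2F;code&gt;, &lt;code&gt;AddOrSub&lt;&#x2F;code&gt; and &lt;code&gt;select_add&lt;&#x2F;code&gt; are hypothetical. This is not Cranelift’s actual helper, which also handles other operand forms and 32-bit types.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;/// Hypothetical, simplified AArch64 arithmetic immediate:
/// a 12-bit unsigned value, optionally shifted left by 12 bits.
#[derive(Debug, PartialEq, Clone, Copy)]
pub struct Imm12 {
    pub bits: u16,
    pub shift12: bool,
}

/// Tries to encode `value` as an arithmetic immediate.
pub fn encode_imm12(value: u64) -&amp;gt; Option&amp;lt;Imm12&amp;gt; {
    if value &amp;lt; 0x1000 {
        Some(Imm12 { bits: value as u16, shift12: false })
    } else if (value &amp;amp; 0xfff) == 0 &amp;amp;&amp;amp; (value &amp;gt;&amp;gt; 12) &amp;lt; 0x1000 {
        Some(Imm12 { bits: (value &amp;gt;&amp;gt; 12) as u16, shift12: true })
    } else {
        None
    }
}

/// Possible encodings of `rn + constant` (hypothetical).
#[derive(Debug, PartialEq)]
pub enum AddOrSub {
    /// `add rd, rn, #imm`
    AddImm(Imm12),
    /// `sub rd, rn, #imm`, when only the negated constant fits.
    SubImm(Imm12),
    /// Neither fits: the constant must first be put into a register.
    Register,
}

/// Picks an encoding for adding a 64-bit `constant`.
pub fn select_add(constant: i64) -&amp;gt; AddOrSub {
    if let Some(imm) = encode_imm12(constant as u64) {
        AddOrSub::AddImm(imm)
    } else if let Some(imm) = encode_imm12(constant.wrapping_neg() as u64) {
        AddOrSub::SubImm(imm)
    } else {
        AddOrSub::Register
    }
}&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;For instance, &lt;code&gt;select_add(-5)&lt;&#x2F;code&gt; yields a &lt;code&gt;sub&lt;&#x2F;code&gt; with immediate 5. By contrast, &lt;code&gt;select_add(0x1001)&lt;&#x2F;code&gt; falls back to the register form, the &lt;code&gt;AluRRR&lt;&#x2F;code&gt; case assumed above.&lt;&#x2F;p&gt;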
&lt;h2 id=&quot;register-allocation&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#register-allocation&quot; aria-label=&quot;Anchor link for: register-allocation&quot;&gt;🔗&lt;&#x2F;a&gt;Register allocation&lt;&#x2F;h2&gt;
&lt;div class=&quot;mermaid&quot;&gt;
graph TD
    vcode_vreg[VCode with virtual registers]
    regalloc([Register allocation])
    vcode_rreg[VCode with real registers]
    codegen([Code generation])
    machine_code(Machine code)
    vcode_vreg --&gt; regalloc --&gt; vcode_rreg --&gt; codegen --&gt; machine_code
&lt;&#x2F;div&gt;
&lt;h3 id=&quot;vcode-registers-and-stack-slots&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#vcode-registers-and-stack-slots&quot; aria-label=&quot;Anchor link for: vcode-registers-and-stack-slots&quot;&gt;🔗&lt;&#x2F;a&gt;VCode, registers and stack slots&lt;&#x2F;h3&gt;
&lt;p&gt;Hey, ever wondered what the V in VCode stands for? Back to the drawing board. While a program may contain a theoretically unlimited number of instructions, each referencing a theoretically unlimited number of values as inputs and outputs, the physical machine only has a fixed set of containers for those values:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;either they live in machine &lt;strong&gt;registers&lt;&#x2F;strong&gt;: these are very fast for the CPU to access, but they take up CPU real estate and are thus costly, so there are usually few of them;&lt;&#x2F;li&gt;
&lt;li&gt;or they live in the process’ &lt;strong&gt;stack memory&lt;&#x2F;strong&gt;: it’s slower to access, but we can have a virtually unlimited number of stack &lt;em&gt;slots&lt;&#x2F;em&gt;.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;asm&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;mov&lt;&#x2F;span&gt;&lt;span&gt; %&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt;edi&lt;&#x2F;span&gt;&lt;span&gt;,-&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt;0x4&lt;&#x2F;span&gt;&lt;span&gt;(%&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt;rbp&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;mov&lt;&#x2F;span&gt;&lt;span&gt; %&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt;rsi&lt;&#x2F;span&gt;&lt;span&gt;,-&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt;0x10&lt;&#x2F;span&gt;&lt;span&gt;(%&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt;rbp&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;mov&lt;&#x2F;span&gt;&lt;span&gt; -&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt;0x4&lt;&#x2F;span&gt;&lt;span&gt;(%&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt;rbp&lt;&#x2F;span&gt;&lt;span&gt;),%&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt;eax&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;em&gt;In this example of x86 machine code, %edi, %rsi, %rbp and %eax are all registers. Stack slots are memory addresses computed as the frame pointer (%rbp) plus an offset, which happens to be negative here. Note that in general, stack slots may also be addressed relative to the stack pointer (%rsp).&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
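&lt;p&gt;To make those offsets concrete, here is a toy frame layout that hands out stack slots downwards from the frame pointer, aligning each slot to its size. It is only an illustrative sketch. The &lt;code&gt;FrameLayout&lt;&#x2F;code&gt; type is hypothetical, and real compilers also account for saved registers, spill areas and ABI rules. Still, allocating a 4-byte slot for &lt;code&gt;%edi&lt;&#x2F;code&gt; and then an 8-byte slot for &lt;code&gt;%rsi&lt;&#x2F;code&gt; yields the same &lt;code&gt;-0x4&lt;&#x2F;code&gt; and &lt;code&gt;-0x10&lt;&#x2F;code&gt; offsets as the listing above.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;/// Hypothetical toy frame layout: hands out stack slots below the frame
/// pointer, each aligned to its own size.
pub struct FrameLayout {
    /// Bytes reserved so far below the frame pointer.
    used: i64,
}

impl FrameLayout {
    pub fn new() -&amp;gt; Self {
        FrameLayout { used: 0 }
    }

    /// Reserves a slot of `size` bytes (a power of two) and returns its
    /// offset relative to the frame pointer.
    pub fn alloc_slot(&amp;amp;mut self, size: i64) -&amp;gt; i64 {
        assert!(size &amp;gt; 0 &amp;amp;&amp;amp; (size &amp;amp; (size - 1)) == 0, &quot;size must be a power of two&quot;);
        // Grow the frame by `size` bytes, then round up to the slot alignment.
        let end = self.used + size;
        self.used = (end + size - 1) &amp;amp; !(size - 1);
        -self.used
    }
}&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The gap between &lt;code&gt;-0x8&lt;&#x2F;code&gt; and &lt;code&gt;-0x4&lt;&#x2F;code&gt; is padding. It ensures that the 8-byte slot is 8-byte aligned, assuming the frame pointer itself is.&lt;&#x2F;p&gt;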
&lt;h3 id=&quot;defining-the-register-allocation-problem&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#defining-the-register-allocation-problem&quot; aria-label=&quot;Anchor link for: defining-the-register-allocation-problem&quot;&gt;🔗&lt;&#x2F;a&gt;Defining the register allocation problem&lt;&#x2F;h3&gt;
&lt;p&gt;The problem of mapping the IR values (in VCode, these are values of type &lt;code&gt;Reg&lt;&#x2F;code&gt;) to machine “containers” is called &lt;strong&gt;&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Register_allocation&quot;&gt;register allocation&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; (aka regalloc). The inputs to register allocation can be as numerous as we want, and they map to “virtual” values, hence we call them &lt;em&gt;virtual registers&lt;&#x2F;em&gt;. And… that’s where the V in VCode comes from. Before register allocation, the instructions in VCode reference &lt;em&gt;virtual&lt;&#x2F;em&gt; registers, so we say the code is in &lt;em&gt;virtualized&lt;&#x2F;em&gt; register form. The output of register allocation is a new set of instructions, in which the virtual registers have been replaced by &lt;em&gt;real registers&lt;&#x2F;em&gt; (the physical ones, limited in quantity) or stack slot references, plus some additional metadata.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&#x2F;&#x2F; Before register allocation, with unlimited virtual registers:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;v2 = v0 + v1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;v3 = v2 * 2&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;v4 = v2 + 1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;v5 = v4 + v3&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;return v5&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&#x2F;&#x2F; One possible register allocation, on a machine that has 2 registers %r0, %r1:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;%r0 = %r0 + %r1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;%r1 = %r0 * 2&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;%r0 = %r0 + 1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;%r1 = %r0 + %r1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;return %r1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;When all is well, there are never more virtual registers &lt;em&gt;live&lt;&#x2F;em&gt; at the same time than there are physical registers, and they can all be put into physical registers. Issues arise when there aren’t enough physical registers to hold all the virtual registers that are live at the same time, which is the case for… a very large majority of programs. Then, register allocation must decide which values stay in registers at a given program point, and which should be &lt;strong&gt;spilled&lt;&#x2F;strong&gt; into a stack slot, effectively &lt;em&gt;storing&lt;&#x2F;em&gt; them onto the stack for later use. Using them later means we must &lt;strong&gt;reload&lt;&#x2F;strong&gt; them from the stack slot, using a &lt;em&gt;load&lt;&#x2F;em&gt; machine instruction. The complexity lies in three choices: which registers should be spilled, at which program points they should be spilled, and at which program points they should be reloaded, if at all. Making good choices there has a large impact on the speed of the generated code, since memory accesses to the stack incur an additional runtime cost. For instance, a variable that’s frequently used in a hot loop should live in a register for the whole loop’s lifetime, and not be spilled&#x2F;reloaded in the middle of the loop.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&#x2F;&#x2F; Before register allocation, with unlimited virtual registers:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;v2 = v0 + v1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;v3 = v0 + v2&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;v4 = v3 + v1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;return v4&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&#x2F;&#x2F; One possible register allocation, on a machine that has 2 registers %r0, %r1.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&#x2F;&#x2F; We need to spill one value, because there&amp;#39;s a point where 3 values are live at the same time!&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;spill %r1 --&amp;gt; stack_slot(0)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;%r1 = %r0 + %r1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;%r1 = %r0 + %r1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;reload stack_slot(0) --&amp;gt; %r0&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;%r1 = %r1 + %r0&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;return %r1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
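&lt;p&gt;The claim that 3 values are live at the same time can be checked mechanically. Below is a minimal sketch for straight-line code only, with no branches. It uses a hypothetical &lt;code&gt;Inst&lt;&#x2F;code&gt; type where each instruction defines at most one virtual register and uses a few others. The function walks the instructions backwards while tracking the set of live virtual registers. It then reports the largest number live at once, often called the &lt;em&gt;register pressure&lt;&#x2F;em&gt;.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;use std::collections::HashSet;

/// Hypothetical straight-line instruction: defines at most one virtual
/// register and uses any number of them (by index, e.g. 2 for `v2`).
pub struct Inst {
    pub def: Option&amp;lt;u32&amp;gt;,
    pub uses: Vec&amp;lt;u32&amp;gt;,
}

/// Returns the maximum number of virtual registers live at the same time,
/// at any point between two instructions (or before the first one).
pub fn max_pressure(code: &amp;amp;[Inst]) -&amp;gt; usize {
    // Nothing is live after the final `return`.
    let mut live: HashSet&amp;lt;u32&amp;gt; = HashSet::new();
    let mut peak = 0usize;
    // Walk backwards, so `live` becomes the set live right before each instruction.
    for inst in code.iter().rev() {
        // A value is not live above its definition...
        if let Some(d) = inst.def {
            live.remove(&amp;amp;d);
        }
        // ...but it is live from its definition down to its last use.
        for u in &amp;amp;inst.uses {
            live.insert(*u);
        }
        peak = peak.max(live.len());
    }
    peak
}&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Encode each &lt;code&gt;return&lt;&#x2F;code&gt; as an instruction with no def and a single use. The first example then peaks at 2 live values, so two registers suffice. The second peaks at 3, right after &lt;code&gt;v2&lt;&#x2F;code&gt; is defined while &lt;code&gt;v0&lt;&#x2F;code&gt; and &lt;code&gt;v1&lt;&#x2F;code&gt; are still needed. That is why one value has to be spilled on a 2-register machine.&lt;&#x2F;p&gt;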
&lt;p&gt;And, since we like to have our cake and eat it too, the register allocator itself should be &lt;em&gt;fast&lt;&#x2F;em&gt;: it should not take an unbounded amount of time to make these allocation decisions. Register allocation has the good taste to be an &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;NP-completeness&quot;&gt;NP-complete&lt;&#x2F;a&gt; problem. Concretely, this means that implementations cannot be expected to find the &lt;em&gt;best&lt;&#x2F;em&gt; solution for arbitrary inputs in reasonable time. Instead, they estimate &lt;em&gt;good&lt;&#x2F;em&gt; solutions using heuristics, usually aiming for at most quadratic time in the size of the input. All of this makes register allocation a whole research field of its own, and it has been studied extensively for some time now. It is a fascinating problem.&lt;&#x2F;p&gt;
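&lt;p&gt;A classic way to frame the problem is graph coloring. Virtual registers are the nodes of an &lt;em&gt;interference graph&lt;&#x2F;em&gt;, an edge joins two virtual registers that are live at the same time, and each color is a physical register. The following sketch is a deliberately naive greedy heuristic, not the algorithm regalloc.rs implements. It visits virtual registers in order and gives each one the lowest-numbered register not already taken by an interfering neighbor. When no register is left, it spills that virtual register. The &lt;code&gt;Location&lt;&#x2F;code&gt; type and &lt;code&gt;greedy_color&lt;&#x2F;code&gt; function are hypothetical.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;/// Hypothetical result of allocation for one virtual register.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Location {
    /// Index of the physical register.
    Reg(usize),
    /// Lives in a stack slot.
    Spilled,
}

/// Naive greedy coloring: `edges` lists pairs of interfering virtual
/// registers (live at the same time), `num_regs` is the number of colors.
pub fn greedy_color(num_vregs: usize, edges: &amp;amp;[(usize, usize)], num_regs: usize) -&amp;gt; Vec&amp;lt;Location&amp;gt; {
    let mut result = vec![Location::Spilled; num_vregs];
    for v in 0..num_vregs {
        // Mark the registers already assigned to neighbors of `v`.
        let mut taken = vec![false; num_regs];
        for &amp;amp;(a, b) in edges {
            let other = if a == v { b } else if b == v { a } else { continue };
            if let Location::Reg(r) = result[other] {
                taken[r] = true;
            }
        }
        // Take the first free register; otherwise `v` stays spilled.
        if let Some(r) = taken.iter().position(|t| !t) {
            result[v] = Location::Reg(r);
        }
    }
    result
}&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;On the second example above, the interfering pairs are &lt;code&gt;(v0, v1)&lt;&#x2F;code&gt;, &lt;code&gt;(v0, v2)&lt;&#x2F;code&gt;, &lt;code&gt;(v1, v2)&lt;&#x2F;code&gt; and &lt;code&gt;(v1, v3)&lt;&#x2F;code&gt;. With two registers, this heuristic marks &lt;code&gt;v2&lt;&#x2F;code&gt; as spilled, whereas the hand-written allocation spilled &lt;code&gt;v1&lt;&#x2F;code&gt;. This toy model glosses over the fact that a spilled value still needs a register around its definition and uses. That is one reason real allocators are more sophisticated.&lt;&#x2F;p&gt;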
&lt;h3 id=&quot;register-allocation-in-cranelift&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#register-allocation-in-cranelift&quot; aria-label=&quot;Anchor link for: register-allocation-in-cranelift&quot;&gt;🔗&lt;&#x2F;a&gt;Register allocation in Cranelift&lt;&#x2F;h3&gt;
&lt;p&gt;Back to Cranelift. The register allocation contract is that if a value &lt;em&gt;must&lt;&#x2F;em&gt; live in a real register at a given program point, then it &lt;em&gt;does&lt;&#x2F;em&gt; live where it should (unless register allocation is impossible). At the start of code generation for a VCode instruction, we are guaranteed that the input values live in real registers, and that the output real register is available before the next VCode instruction.&lt;&#x2F;p&gt;
&lt;p&gt;You might have noticed that the VCode instructions only refer to registers, and not stack slots. But where are the stack slots, then? The trick is that the stack slots are &lt;em&gt;invisible&lt;&#x2F;em&gt; to VCode. Register allocation may create an arbitrary number of spills, reloads, and register moves&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-5-1&quot;&gt;&lt;a href=&quot;#fn-5&quot;&gt;4&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt; around VCode instructions, to ensure that their register allocation constraints are met. This is why the output of register allocation is a new list of instructions. It includes not only the initial instructions, with their actual registers filled in, but also the additional spill, reload and move (VCode) instructions added by regalloc.&lt;&#x2F;p&gt;
&lt;p&gt;As said before, this problem is sufficiently complex, involved and independent from the rest of the code (assuming the right set of interfaces!) that its code lives in a separate crate, &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;bytecodealliance&#x2F;regalloc.rs&quot;&gt;&lt;code&gt;regalloc.rs&lt;&#x2F;code&gt;&lt;&#x2F;a&gt;, with its own fuzzing and testing infrastructure. I hope to shed some light on it at some point too.&lt;&#x2F;p&gt;
&lt;p&gt;What’s interesting to us today are the register allocation &lt;em&gt;constraints&lt;&#x2F;em&gt;. Consider the aarch64 integer add instruction &lt;code&gt;add rd, rn, rm&lt;&#x2F;code&gt;: &lt;code&gt;rd&lt;&#x2F;code&gt; is the output virtual register, which is written to, while &lt;code&gt;rn&lt;&#x2F;code&gt; and &lt;code&gt;rm&lt;&#x2F;code&gt; are the inputs, which are read from. We need to inform the register allocation algorithm about these constraints. In regalloc jargon, “read from” is known as &lt;em&gt;used&lt;&#x2F;em&gt;, while “written to” is known as &lt;em&gt;defined&lt;&#x2F;em&gt;. Here, the aarch64 VCode instruction &lt;code&gt;AluRRR&lt;&#x2F;code&gt; &lt;em&gt;uses&lt;&#x2F;em&gt; &lt;code&gt;rn&lt;&#x2F;code&gt; and &lt;code&gt;rm&lt;&#x2F;code&gt;, and it &lt;em&gt;def&lt;&#x2F;em&gt;ines &lt;code&gt;rd&lt;&#x2F;code&gt;. This usage information is &lt;em&gt;collected&lt;&#x2F;em&gt; in the &lt;code&gt;aarch64_get_regs&lt;&#x2F;code&gt; function (&lt;code&gt;cranelift&#x2F;codegen&#x2F;src&#x2F;isa&#x2F;aarch64&#x2F;inst&#x2F;mod.rs&lt;&#x2F;code&gt;):&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;fn&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt; aarch64_get_regs&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;inst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;: &amp;amp;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;Inst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; collector&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;: &amp;amp;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9D7CD8;font-style: italic;&quot;&gt;mut&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; RegUsageCollector&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;) {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;    match&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; inst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;        &amp;amp;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;Inst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;AluRRR&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; {&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rd&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rn&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rm&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;, .. } =&amp;gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;            collector&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;add_def&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;rd&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;);&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;            collector&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;add_use&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;rn&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;);&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;            collector&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;add_use&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;rm&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;);&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;        }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
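&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;        &#x2F;&#x2F; etc.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;    }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;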
&lt;p&gt;Then, after register allocation has assigned the physical registers, we need to tell it how to replace mentions of virtual registers with mentions of physical registers. This is done in the &lt;code&gt;aarch64_map_regs&lt;&#x2F;code&gt; function (same file as above):&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;fn&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt; aarch64_map_regs&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;&amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt;RUM&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; RegUsageMapper&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;&amp;gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;inst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;: &amp;amp;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9D7CD8;font-style: italic;&quot;&gt;mut&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; Inst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; mapper&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;: &amp;amp;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt;RUM&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;) {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; ...&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;    match&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; inst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;        &amp;amp;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9D7CD8;font-style: italic;&quot;&gt;mut&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; Inst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;AluRRR&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;            ref&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9D7CD8;font-style: italic;&quot;&gt; mut&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rd&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;            ref&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9D7CD8;font-style: italic;&quot;&gt; mut&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rn&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;            ref&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9D7CD8;font-style: italic;&quot;&gt; mut&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rm&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;            ..&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;        } =&amp;gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;            map_def&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;mapper&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rd&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;);&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;            map_use&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;mapper&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rn&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;);&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;            map_use&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;mapper&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rm&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;);&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;        }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
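&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;        &#x2F;&#x2F; etc.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;    }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;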
&lt;p&gt;Note that this closely mirrors what the usage collector did. We replace the virtual register mention for the defined register &lt;code&gt;rd&lt;&#x2F;code&gt;, and likewise for the used registers &lt;code&gt;rn&lt;&#x2F;code&gt; and &lt;code&gt;rm&lt;&#x2F;code&gt;, with the real register provided by the &lt;code&gt;RegUsageMapper&lt;&#x2F;code&gt;. These two functions must stay in sync, otherwise here be dragons (and bugs that are very hard to debug)!&lt;&#x2F;p&gt;
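&lt;p&gt;To see why the two functions must agree, here is a self-contained toy version of the same pattern. All the types here (&lt;code&gt;Reg&lt;&#x2F;code&gt;, &lt;code&gt;AluRRR&lt;&#x2F;code&gt;, &lt;code&gt;RegUsage&lt;&#x2F;code&gt;, &lt;code&gt;Mapper&lt;&#x2F;code&gt;) are hypothetical simplifications, not the actual Cranelift or regalloc.rs definitions. &lt;code&gt;get_regs&lt;&#x2F;code&gt; reports defs and uses, and an allocator (not shown) fills in the mapper. &lt;code&gt;map_regs&lt;&#x2F;code&gt; then rewrites each operand according to the role it was reported in. The mapper keeps separate tables for uses and defs because a virtual register may end up in different real registers at different program points.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;use std::collections::HashMap;

/// Hypothetical register operand: virtual before allocation, real after.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Reg {
    Virtual(u32),
    Real(u8),
}

/// Hypothetical three-register ALU instruction, like `add rd, rn, rm`.
#[derive(Debug, PartialEq)]
pub struct AluRRR {
    pub rd: Reg,
    pub rn: Reg,
    pub rm: Reg,
}

/// Register usage reported to the allocator.
#[derive(Debug, PartialEq)]
pub struct RegUsage {
    pub defs: Vec&amp;lt;Reg&amp;gt;,
    pub uses: Vec&amp;lt;Reg&amp;gt;,
}

/// Stand-in for a mapper: the real register holding each virtual
/// register where it is read (uses) and where it is written (defs).
pub struct Mapper {
    pub uses: HashMap&amp;lt;u32, u8&amp;gt;,
    pub defs: HashMap&amp;lt;u32, u8&amp;gt;,
}

/// Like `aarch64_get_regs`: `rd` is defined, `rn` and `rm` are used.
pub fn get_regs(inst: &amp;amp;AluRRR) -&amp;gt; RegUsage {
    RegUsage { defs: vec![inst.rd], uses: vec![inst.rn, inst.rm] }
}

fn map_one(table: &amp;amp;HashMap&amp;lt;u32, u8&amp;gt;, reg: &amp;amp;mut Reg) {
    if let Reg::Virtual(v) = *reg {
        // Panics if there is no entry, e.g. when map_regs looks up a
        // register in a role that get_regs never reported.
        *reg = Reg::Real(table[&amp;amp;v]);
    }
}

/// Like `aarch64_map_regs`: each operand is mapped in the same role
/// (def or use) that `get_regs` reported for it.
pub fn map_regs(inst: &amp;amp;mut AluRRR, mapper: &amp;amp;Mapper) {
    map_one(&amp;amp;mapper.defs, &amp;amp;mut inst.rd);
    map_one(&amp;amp;mapper.uses, &amp;amp;mut inst.rn);
    map_one(&amp;amp;mapper.uses, &amp;amp;mut inst.rm);
}&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Consider the allocation from the first register allocation example, with &lt;code&gt;v0&lt;&#x2F;code&gt; and &lt;code&gt;v2&lt;&#x2F;code&gt; in &lt;code&gt;%r0&lt;&#x2F;code&gt; and &lt;code&gt;v1&lt;&#x2F;code&gt; in &lt;code&gt;%r1&lt;&#x2F;code&gt;. Mapping &lt;code&gt;AluRRR { rd: v2, rn: v0, rm: v1 }&lt;&#x2F;code&gt; with it gives &lt;code&gt;%r0 = %r0 + %r1&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;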
&lt;h3 id=&quot;register-allocation-on-x86&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#register-allocation-on-x86&quot; aria-label=&quot;Anchor link for: register-allocation-on-x86&quot;&gt;🔗&lt;&#x2F;a&gt;Register allocation on x86&lt;&#x2F;h3&gt;
&lt;p&gt;On Intel’s x86, register allocation may be a bit trickier: in some cases, the lowering needs to be carefully written so it satisfies some register allocation constraints that are very specific to this architecture. In particular, x86 has &lt;em&gt;fixed register constraints&lt;&#x2F;em&gt; as well as &lt;em&gt;tied operands&lt;&#x2F;em&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;For this specific part, we’ll look at the integer shift-left instruction, which is equivalent to C’s &lt;code&gt;x &amp;lt;&amp;lt; y&lt;&#x2F;code&gt;. Why this particular instruction? It exhibits both properties that we’re interested in studying here. The lowering of &lt;code&gt;iadd&lt;&#x2F;code&gt; is similar, albeit slightly simpler, as it &lt;em&gt;only&lt;&#x2F;em&gt; involves tied operands.&lt;&#x2F;p&gt;
&lt;h4 id=&quot;fixed-register-constraints&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#fixed-register-constraints&quot; aria-label=&quot;Anchor link for: fixed-register-constraints&quot;&gt;🔗&lt;&#x2F;a&gt;Fixed register constraints&lt;&#x2F;h4&gt;
&lt;p&gt;On the one hand, some instructions expect their inputs to be in &lt;em&gt;fixed&lt;&#x2F;em&gt; registers, that is, specific registers arbitrarily predefined by the architecture manual. For the example of the shift instruction, if the count is not statically known at compile time (it’s not a shift by a constant value), then the amount by which we’re shifting must be in the &lt;code&gt;rcx&lt;&#x2F;code&gt; register&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-8-1&quot;&gt;&lt;a href=&quot;#fn-8&quot;&gt;5&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;Now, how do we make sure that the input value actually is in &lt;code&gt;rcx&lt;&#x2F;code&gt;? We can mark &lt;code&gt;rcx&lt;&#x2F;code&gt; as used in the &lt;code&gt;get_regs&lt;&#x2F;code&gt; function so regalloc knows about this, but nothing ensures that the input &lt;em&gt;resides&lt;&#x2F;em&gt; in it at the beginning of the instruction. To resolve this, we’ll introduce a &lt;strong&gt;move instruction&lt;&#x2F;strong&gt; during lowering, that is going to copy the input value into &lt;code&gt;rcx&lt;&#x2F;code&gt;. Then we’re sure it lives there, and register allocation knows it’s used: we’re good to go!&lt;&#x2F;p&gt;
&lt;p&gt;In a nutshell, this shows how lowering and register allocation play together:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;during lowering, we introduce a move from the dynamic shift amount to &lt;code&gt;rcx&lt;&#x2F;code&gt; before the actual shift&lt;&#x2F;li&gt;
&lt;li&gt;in the register usage function, we mark &lt;code&gt;rcx&lt;&#x2F;code&gt; as used&lt;&#x2F;li&gt;
&lt;li&gt;(nothing to do in the register mapping function: &lt;code&gt;rcx&lt;&#x2F;code&gt; is a real register already)&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
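&lt;p&gt;These three steps can be sketched as follows. The &lt;code&gt;Reg&lt;&#x2F;code&gt;, &lt;code&gt;Inst&lt;&#x2F;code&gt; and &lt;code&gt;RegUsage&lt;&#x2F;code&gt; types and the &lt;code&gt;lower_shl&lt;&#x2F;code&gt; function are hypothetical simplifications, not the x64 backend’s actual definitions. Lowering emits a move into &lt;code&gt;rcx&lt;&#x2F;code&gt; right before the shift, and &lt;code&gt;get_regs&lt;&#x2F;code&gt; reports &lt;code&gt;rcx&lt;&#x2F;code&gt; as used by the shift. The shift also reads and writes its destination. The sketch therefore copies &lt;code&gt;src&lt;&#x2F;code&gt; into &lt;code&gt;dst&lt;&#x2F;code&gt; first and reports &lt;code&gt;dst&lt;&#x2F;code&gt; as &lt;em&gt;modified&lt;&#x2F;em&gt;, anticipating the tied operands discussed next.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;/// Hypothetical register operands: virtual registers, plus the fixed rcx.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Reg {
    Virtual(u32),
    Rcx,
}

/// A tiny, hypothetical subset of x86-64 VCode instructions.
#[derive(Debug, PartialEq)]
pub enum Inst {
    /// `mov src, dst`
    Mov { src: Reg, dst: Reg },
    /// `shl %cl, dst`: shifts `dst` left by the amount in `%cl` (low byte of rcx).
    ShlByCl { dst: Reg },
}

/// Register usage: read (uses), written (defs), or both (mods).
#[derive(Debug, Default, PartialEq)]
pub struct RegUsage {
    pub uses: Vec&amp;lt;Reg&amp;gt;,
    pub defs: Vec&amp;lt;Reg&amp;gt;,
    pub mods: Vec&amp;lt;Reg&amp;gt;,
}

/// Lowers `dst = src &amp;lt;&amp;lt; amount` when `amount` is not a constant.
pub fn lower_shl(dst: Reg, src: Reg, amount: Reg) -&amp;gt; Vec&amp;lt;Inst&amp;gt; {
    vec![
        // The shift overwrites its operand, so operate on a copy of `src`.
        Inst::Mov { src, dst },
        // Fixed register constraint: the dynamic amount must be in rcx.
        Inst::Mov { src: amount, dst: Reg::Rcx },
        Inst::ShlByCl { dst },
    ]
}

/// Reports register usage. rcx is already a real register, so there is
/// nothing to rewrite for it later, but regalloc must know it is read.
pub fn get_regs(inst: &amp;amp;Inst) -&amp;gt; RegUsage {
    let mut usage = RegUsage::default();
    match *inst {
        Inst::Mov { src, dst } =&amp;gt; {
            usage.uses.push(src);
            usage.defs.push(dst);
        }
        Inst::ShlByCl { dst } =&amp;gt; {
            usage.uses.push(Reg::Rcx);
            usage.mods.push(dst);
        }
    }
    usage
}&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;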
&lt;h4 id=&quot;tied-operands&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#tied-operands&quot; aria-label=&quot;Anchor link for: tied-operands&quot;&gt;🔗&lt;&#x2F;a&gt;Tied operands&lt;&#x2F;h4&gt;
&lt;p&gt;On the other hand, some instructions have operands that are both read and written at the same time: we call them &lt;em&gt;modified&lt;&#x2F;em&gt; in Cranelift and regalloc.rs, but they’re also known as &lt;em&gt;tied operands&lt;&#x2F;em&gt; in the compiler literature. It’s not just that there’s a register that must be read, and a register that must be written to: they &lt;em&gt;must&lt;&#x2F;em&gt; be the same register. How do we model this, then?&lt;&#x2F;p&gt;
&lt;p&gt;Consider a naive solution: take the input virtual register, and decide that it’s allocated to the same register as the output (modified) register. Unfortunately, if that virtual register is used again by a later VCode instruction, the value it holds will have been overwritten (clobbered) by the current instruction. This would generate incorrect code, so it is not acceptable. In general, lowering can’t decide to clobber the register holding an input value: making this kind of decision is regalloc’s role.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&#x2F;&#x2F; Before register allocation, with virtual registers:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;v2 = v0 + v1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;v3 = v0 + 42&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&#x2F;&#x2F; After register allocation, on a machine with two registers %r0 and %r1:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&#x2F;&#x2F; assign v0 to %r0, v1 to %r1, v2 to %r0&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;%r0 += %r1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;... = %r0 + 42 &#x2F;&#x2F; ohnoes! the value in %r0 is v2, not v0 anymore!&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
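&lt;p&gt;The failure can be replayed on a toy two-register machine, with arbitrary input values &lt;code&gt;v0 = 5&lt;&#x2F;code&gt; and &lt;code&gt;v1 = 3&lt;&#x2F;code&gt;. This sketch simply executes the allocated code above by hand; the functions &lt;code&gt;naive&lt;&#x2F;code&gt; and &lt;code&gt;expected&lt;&#x2F;code&gt; are made up for illustration. The naive allocation computes 50 for &lt;code&gt;v3&lt;&#x2F;code&gt; instead of the intended 47.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;rust&quot;&gt;
```rust
// Replays the register-allocated code above on a toy machine with two
// registers, %r0 and %r1. Inputs are arbitrary: v0 = 5, v1 = 3.

// Naive allocation: v0 in %r0, v1 in %r1, and v2 reusing %r0.
fn naive() -> (u32, u32) {
    let mut r = [5u32, 3u32]; // r[0] is %r0 (v0), r[1] is %r1 (v1)
    r[0] += r[1]; // %r0 += %r1: %r0 now holds v2, and v0 is lost
    let v2 = r[0];
    let v3 = r[0] + 42; // meant to be v0 + 42, but reads v2
    (v2, v3)
}

// What the virtual-register program is supposed to compute.
fn expected() -> (u32, u32) {
    let (v0, v1) = (5u32, 3u32);
    (v0 + v1, v0 + 42)
}
```
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;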
&lt;p&gt;The right solution is, again, to &lt;em&gt;copy&lt;&#x2F;em&gt; this input virtual register into the output virtual register, right before the instruction. This way, we can still reuse the untouched input register in other instructions without modifying it: only the copy is written to.&lt;&#x2F;p&gt;
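&lt;p&gt;Here is the same example with the copy, as a sketch in a hypothetical toy VCode: registers are indices into an array of virtual registers, &lt;code&gt;AddMod&lt;&#x2F;code&gt; stands for a two-operand add whose destination is modified, and &lt;code&gt;lower_add&lt;&#x2F;code&gt; and &lt;code&gt;lower_add_imm&lt;&#x2F;code&gt; emit the copy right before the tied instruction. Starting from &lt;code&gt;v0 = 5&lt;&#x2F;code&gt; and &lt;code&gt;v1 = 3&lt;&#x2F;code&gt;, running the four emitted instructions through &lt;code&gt;step&lt;&#x2F;code&gt; leaves &lt;code&gt;v0&lt;&#x2F;code&gt; at 5, &lt;code&gt;v2&lt;&#x2F;code&gt; at 8 and &lt;code&gt;v3&lt;&#x2F;code&gt; at 47.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;rust&quot;&gt;
```rust
// A toy VCode where instructions name virtual registers by index.
// AddMod is a two-operand add: its dst is read *and* written ("modified").
#[derive(Debug, Clone, Copy, PartialEq)]
enum Inst {
    Mov { src: usize, dst: usize },     // dst = src
    AddMod { src: usize, dst: usize },  // dst += src
    AddModImm { imm: u32, dst: usize }, // dst += imm
}

// Lower `dst = lhs + rhs`: copy lhs into dst first, so that the tied add
// only overwrites the copy and lhs stays intact for later instructions.
fn lower_add(dst: usize, lhs: usize, rhs: usize, mut emit: impl FnMut(Inst)) {
    emit(Inst::Mov { src: lhs, dst });
    emit(Inst::AddMod { src: rhs, dst });
}

// Same idea for `dst = lhs + imm`.
fn lower_add_imm(dst: usize, lhs: usize, imm: u32, mut emit: impl FnMut(Inst)) {
    emit(Inst::Mov { src: lhs, dst });
    emit(Inst::AddModImm { imm, dst });
}

// Execute one instruction over a file of four virtual registers.
fn step(mut regs: [u32; 4], inst: Inst) -> [u32; 4] {
    match inst {
        Inst::Mov { src, dst } => regs[dst] = regs[src],
        Inst::AddMod { src, dst } => regs[dst] += regs[src],
        Inst::AddModImm { imm, dst } => regs[dst] += imm,
    }
    regs
}
```
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;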
&lt;p&gt;Phew! We can now look at the entire lowering for the shift left instruction, edited and commented for clarity:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;&#x2F;&#x2F; Read the instruction operand size from the output&amp;#39;s type.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;let&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; size&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; =&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; dst_ty&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;bytes&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;() as&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; u8&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;&#x2F;&#x2F; Put the left hand side into a virtual register.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;let&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; lhs&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; =&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt; put_input_in_reg&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;ctx&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; inputs&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;[&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt;0&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;]);&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;&#x2F;&#x2F; Put the right hand side (shift amount) into either an immediate (if it&amp;#39;s&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;&#x2F;&#x2F; statically known at compile time), or into a virtual register.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;let&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; (&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;count&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rhs&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;) =&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;    if let&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; Some&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;cst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;) =&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; ctx&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;get_input_as_source_or_const&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;insn&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt; 1&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;).&lt;&#x2F;span&gt;&lt;span&gt;constant &lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;{&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;        &#x2F;&#x2F; Mask count, according to Cranelift&amp;#39;s semantics.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;        let&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; cst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; = (&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;cst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; as&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; u8&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;) &amp;amp; (&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;dst_ty&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;bits&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;() as&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; u8&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; -&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt; 1&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;);&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;        (&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;Some&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;cst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;),&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; None&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;    }&lt;&#x2F;span&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt; else&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;        (&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;None&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; Some&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;put_input_in_reg&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;ctx&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; inputs&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;[&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt;1&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;])))&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;    };&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;&#x2F;&#x2F; Get the destination virtual register.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;let&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; dst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; =&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt; get_output_reg&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;ctx&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; outputs&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;[&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt;0&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;]).&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;only_reg&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;().&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;unwrap&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;();&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;&#x2F;&#x2F; Copy the left hand side into the (modified) output operand, to satisfy the&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;&#x2F;&#x2F; mod constraint.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;ctx&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;emit&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #0DB9D7;&quot;&gt;Inst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;mov_r_r&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt;true&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; lhs&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; dst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;));&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;&#x2F;&#x2F; If the shift count is statically known: nothing particular to do. Otherwise,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;&#x2F;&#x2F; we need to put it in the RCX register.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;if&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; count&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;is_none&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;() {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;    let&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; w_rcx&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; =&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; Writable&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;from_reg&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #0DB9D7;&quot;&gt;regs&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;rcx&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;());&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; Copy the shift count (which is in rhs) into RCX.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;    ctx&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;emit&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #0DB9D7;&quot;&gt;Inst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;mov_r_r&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt;true&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rhs&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;unwrap&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(),&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; w_rcx&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;));&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;&#x2F;&#x2F; Generate the actual shift instruction.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;ctx&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;emit&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #0DB9D7;&quot;&gt;Inst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;shift_r&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;size&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #0DB9D7;&quot;&gt; ShiftKind&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;ShiftLeft&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; count&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; dst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;));&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
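&lt;p&gt;The masking of a constant shift amount can be checked in isolation. The sketch below assumes, following the comment in the lowering code, that the amount is reduced modulo the type’s bit width; since widths are powers of two, it uses &lt;code&gt;%&lt;&#x2F;code&gt;, which gives the same result as masking with &lt;code&gt;bits - 1&lt;&#x2F;code&gt;. &lt;code&gt;mask_shift_count&lt;&#x2F;code&gt; and &lt;code&gt;shl32_const&lt;&#x2F;code&gt; are hypothetical helpers: with them, a 32-bit shift by 33 becomes a shift by 1, and the masked amount is always in range for Rust’s &lt;code&gt;u32::checked_shl&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;rust&quot;&gt;
```rust
// Reduce a constant shift amount to the bit width of the shifted type.
// Assumes `bits` is a power of two (8, 16, 32 or 64): then `x % bits`
// equals x masked with `bits - 1`, as in the lowering code.
fn mask_shift_count(cst: u64, bits: u8) -> u8 {
    (cst as u8) % bits
}

// A 32-bit shift left by a constant amount, masked first.
fn shl32_const(x: u32, cst: u64) -> u32 {
    let count = mask_shift_count(cst, 32) as u32;
    // The masked count is always below 32, so checked_shl cannot fail.
    x.checked_shl(count).expect("masked shift count is in range")
}
```
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Rust’s own &lt;code&gt;u32::wrapping_shl&lt;&#x2F;code&gt; masks its shift amount the same way, which makes it a handy reference when testing this kind of lowering.&lt;&#x2F;p&gt;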
&lt;p&gt;And this is how we tell the register usage collector about our constraints:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;Inst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;ShiftR&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; {&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; num_bits&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; dst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;, .. } =&amp;gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;    if&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; num_bits&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;is_none&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;() {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;        &#x2F;&#x2F; if the shift count is dynamic, mark RCX as used.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;        collector&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;add_use&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #0DB9D7;&quot;&gt;regs&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;rcx&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;());&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;    }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; In all the cases, the destination operand is modified.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;    collector&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;add_mod&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(*&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;dst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;);&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
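&lt;p&gt;To see what this match arm records, here is a stand-in sketch where the collector is a callback receiving hypothetical &lt;code&gt;Usage&lt;&#x2F;code&gt; values, one per register mention, and the optional &lt;code&gt;num_bits&lt;&#x2F;code&gt; is modeled by a &lt;code&gt;Count&lt;&#x2F;code&gt; enum. This is not regalloc.rs’s actual collector API; it only mirrors the use&#x2F;def&#x2F;mod distinction.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;rust&quot;&gt;
```rust
// Hypothetical stand-ins for a register and for the usage categories.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Reg {
    Virtual(u32),
    Rcx,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Usage {
    Use(Reg), // only read
    Def(Reg), // only written (never produced by shifts)
    Mod(Reg), // read and written: a tied operand
}

// Models the optional shift count: an immediate, or a value in RCX.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Count {
    Imm(u8),
    Dynamic,
}

// Mirrors the ShiftR arm: RCX is used for a dynamic count, and the
// destination is modified in every case.
fn shift_r_regs(count: Count, dst: Reg, mut collect: impl FnMut(Usage)) {
    if count == Count::Dynamic {
        collect(Usage::Use(Reg::Rcx));
    }
    collect(Usage::Mod(dst));
}
```
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;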
&lt;p&gt;Only the modified register needs to be mapped to its allocated physical register:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;Inst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;ShiftR&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; { ref&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9D7CD8;font-style: italic;&quot;&gt; mut&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; dst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;, .. } =&amp;gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;    map_mod&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;mapper&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; dst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;);&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;h3 id=&quot;virtual-registers-copies-and-performance&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#virtual-registers-copies-and-performance&quot; aria-label=&quot;Anchor link for: virtual-registers-copies-and-performance&quot;&gt;🔗&lt;&#x2F;a&gt;Virtual register copies and performance&lt;&#x2F;h3&gt;
&lt;p&gt;Do these virtual register copies sound costly to you? In theory, they could lead to the generation of move instructions, increasing the size of the generated code and adding a small runtime cost. In practice, register allocation, through its interface, knows how to identify move instructions, their source and their destination. By analyzing them, it can see when a source isn’t used after a given move instruction, and thus allocate the same register for the source and the destination of the move. Then, when Cranelift generates the code, it avoids generating a move from a physical register to itself&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-7-1&quot;&gt;&lt;a href=&quot;#fn-7&quot;&gt;6&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;. In other words, creating a VCode copy doesn’t necessarily mean that a machine code move instruction will be generated later: the copy is there just in case regalloc &lt;em&gt;needs&lt;&#x2F;em&gt; it, and it can be elided when it’s spurious.&lt;&#x2F;p&gt;
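&lt;p&gt;Both halves of this can be sketched in a few lines, under simplifying assumptions: physical registers are plain numbers, &lt;code&gt;pick_dst_reg&lt;&#x2F;code&gt; is a hypothetical stand-in for the allocator’s choice when a move’s source dies at the move, and &lt;code&gt;emit_move&lt;&#x2F;code&gt; skips any move whose source and destination ended up in the same register. A real allocator weighs many more constraints than this.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;rust&quot;&gt;
```rust
// A move between two physical registers (by number), after allocation.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Move {
    src: u8,
    dst: u8,
}

// Toy allocation choice for a move's destination: when the source value
// is dead after the move, reuse its register (the move is "coalesced").
fn pick_dst_reg(src_reg: u8, src_live_after: bool, free_reg: u8) -> u8 {
    if src_live_after {
        free_reg
    } else {
        src_reg
    }
}

// Emission: a move from a register to itself produces no machine code.
// Returns whether an instruction was emitted.
fn emit_move(m: Move, mut sink: impl FnMut(Move)) -> bool {
    if m.src == m.dst {
        return false;
    }
    sink(m);
    true
}
```
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;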
&lt;h2 id=&quot;code-generation&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#code-generation&quot; aria-label=&quot;Anchor link for: code-generation&quot;&gt;🔗&lt;&#x2F;a&gt;Code generation&lt;&#x2F;h2&gt;
&lt;p&gt;Oh my, we’re getting closer to actually being able to run the code! Once register allocation has run, we can generate the actual machine code for the VCode instructions. Cool kids call this step of the pipeline &lt;em&gt;codegen&lt;&#x2F;em&gt;, for code generation. This is the part where we decipher the architecture manuals provided by the CPU vendors, and generate the raw machine bytes for our machine instructions. In Cranelift, this means filling a code buffer (there’s a &lt;code&gt;MachBuffer&lt;&#x2F;code&gt; sink interface for this!), which is returned along with some internal relocations&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-6-1&quot;&gt;&lt;a href=&quot;#fn-6&quot;&gt;7&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt; and additional metadata. Let’s see what happens for our integer addition when the time comes to generate the code for its VCode equivalent &lt;code&gt;AluRRR&lt;&#x2F;code&gt; on &lt;code&gt;aarch64&lt;&#x2F;code&gt; (in &lt;code&gt;cranelift&#x2F;codegen&#x2F;src&#x2F;isa&#x2F;aarch64&#x2F;inst&#x2F;emit.rs&lt;&#x2F;code&gt;):&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;&#x2F;&#x2F; We match on the VCode&amp;#39;s identity here:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;&amp;amp;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;Inst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;AluRRR&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; {&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; alu_op&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rd&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rn&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rm&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; } =&amp;gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; First select the top 11 bits based on the ALU subopcode.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;    let&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; top11&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; =&lt;&#x2F;span&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt; match&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; alu_op&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;        ALUOp&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;Add32&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; =&amp;gt;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt; 0b00001011_000&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;        ALUOp&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;Add64&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; =&amp;gt;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt; 0b10001011_000&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;        &#x2F;&#x2F; etc&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;    };&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; Then decide the bits 10 to 15, based on the ALU subopcode as well.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;    let&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; bit15_10&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; =&lt;&#x2F;span&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt; match&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; alu_op&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;        &#x2F;&#x2F; other cases&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;        _&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; =&amp;gt;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt; 0b000000&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;    };&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; Then use a helper, passing along the allocated physical registers&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; as operands.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;    sink&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;put4&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;enc_arith_rrr&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;top11&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; bit15_10&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rd&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rn&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rm&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;));&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
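&lt;p&gt;The two &lt;code&gt;top11&lt;&#x2F;code&gt; constants can be cross-checked against the layout of the A64 ADD (shifted register) encoding: bit 31 is &lt;code&gt;sf&lt;&#x2F;code&gt; (1 for the 64-bit variant), bit 30 is &lt;code&gt;op&lt;&#x2F;code&gt; (0 for add), bit 29 is &lt;code&gt;S&lt;&#x2F;code&gt; (0 when flags are not set), bits 28 to 24 hold the fixed pattern &lt;code&gt;01011&lt;&#x2F;code&gt;, bits 23 and 22 select the shift type, and bit 21 is 0. The hypothetical &lt;code&gt;add_top11&lt;&#x2F;code&gt; below assembles those fields, using &lt;code&gt;wrapping_shl&lt;&#x2F;code&gt; for the shifts (all amounts are in range); with &lt;code&gt;sf&lt;&#x2F;code&gt; at 0 and 1 it yields exactly the &lt;code&gt;Add32&lt;&#x2F;code&gt; and &lt;code&gt;Add64&lt;&#x2F;code&gt; values above.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code data-lang=&quot;rust&quot;&gt;
```rust
// Assemble bits 31..21 of A64 ADD (shifted register) from its fields.
// Positions are relative to bit 21, the lowest bit of the group.
fn add_top11(sf: u32, op: u32, s: u32, shift: u32) -> u32 {
    sf.wrapping_shl(10)                // bit 31: 1 for 64-bit
        | op.wrapping_shl(9)           // bit 30: 0 for add, 1 for sub
        | s.wrapping_shl(8)            // bit 29: 1 to set the flags
        | 0b01011u32.wrapping_shl(3)   // bits 28..24: fixed pattern
        | shift.wrapping_shl(1)        // bits 23..22: shift type (LSL = 0)
                                       // bit 21: always 0
}
```
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;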
&lt;p&gt;And what’s this &lt;code&gt;enc_arith_rrr&lt;&#x2F;code&gt; doing, then?&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;fn&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt; enc_arith_rrr&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;bits_31_21&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; u32&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; bits_15_10&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; u32&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rd&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; Writable&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;&amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;Reg&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;&amp;gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rn&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; Reg&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rm&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; Reg&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;) -&amp;gt;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; u32&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;    (&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;bits_31_21&lt;&#x2F;span&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt; &amp;lt;&amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt; 21&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;        |&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; (&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;bits_15_10&lt;&#x2F;span&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt; &amp;lt;&amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt; 10&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;        |&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt; machreg_to_gpr&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;rd&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;to_reg&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;())&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;        |&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; (&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;machreg_to_gpr&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;rn&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;)&lt;&#x2F;span&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt; &amp;lt;&amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt; 5&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;        |&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; (&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;machreg_to_gpr&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;rm&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;)&lt;&#x2F;span&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt; &amp;lt;&amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt; 16&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Encoding the instruction parts (operands, register mentions) involves a lot of bit twiddling, and it is a lot of fun. We do this for each VCode instruction until we’ve generated the whole function’s body. Remember that, at this point, register allocation may have added some spill, reload and move instructions. From the codegen’s point of view, these are just regular instructions with precomputed operands (either real registers, or memory operands involving the stack pointer). They get no special treatment and are emitted the same way as any other VCode instruction.&lt;&#x2F;p&gt;
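&lt;p&gt;To make the bit twiddling concrete, here is a self-contained sketch that mirrors &lt;code&gt;enc_arith_rrr&lt;&#x2F;code&gt;. So that it runs on its own, it takes plain 5-bit hardware register numbers instead of Cranelift’s register types. The names &lt;code&gt;enc_arith_rrr_raw&lt;&#x2F;code&gt; and &lt;code&gt;enc_add64&lt;&#x2F;code&gt; are hypothetical. The 11-bit prefix &lt;code&gt;0b10001011_000&lt;&#x2F;code&gt; is the upper part of the AArch64 encoding of a 64-bit &lt;code&gt;ADD&lt;&#x2F;code&gt; (shifted register) with no shift, as given in the Arm architecture reference manual.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;/// Standalone variant of `enc_arith_rrr` (hypothetical): registers are plain
/// hardware numbers (0..=31) rather than Cranelift&#x27;s register types.
fn enc_arith_rrr_raw(bits_31_21: u32, bits_15_10: u32, rd: u32, rn: u32, rm: u32) -&amp;gt; u32 {
    // Each register field is 5 bits wide.
    debug_assert!(rd &amp;lt; 32 &amp;amp;&amp;amp; rn &amp;lt; 32 &amp;amp;&amp;amp; rm &amp;lt; 32);
    (bits_31_21 &amp;lt;&amp;lt; 21)       // opcode and operand size
        | (bits_15_10 &amp;lt;&amp;lt; 10) // for ADD (shifted register): the shift amount
        | rd                    // destination register, bits 4..0
        | (rn &amp;lt;&amp;lt; 5)          // first source register, bits 9..5
        | (rm &amp;lt;&amp;lt; 16)         // second source register, bits 20..16
}

/// `add xd, xn, xm`: 64-bit ADD (shifted register form) with a zero shift.
fn enc_add64(rd: u32, rn: u32, rm: u32) -&amp;gt; u32 {
    enc_arith_rrr_raw(0b10001011_000, 0, rd, rn, rm)
}&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;For instance, &lt;code&gt;enc_add64(0, 1, 2)&lt;&#x2F;code&gt;, that is &lt;code&gt;add x0, x1, x2&lt;&#x2F;code&gt;, yields &lt;code&gt;0x8B020020&lt;&#x2F;code&gt;. The prefix lands in bits 31 to 21, &lt;code&gt;rm&lt;&#x2F;code&gt; in bits 20 to 16, &lt;code&gt;rn&lt;&#x2F;code&gt; in bits 9 to 5, and &lt;code&gt;rd&lt;&#x2F;code&gt; in bits 4 to 0.&lt;&#x2F;p&gt;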
&lt;p&gt;The codegen backend then does more work: it optimizes block placement, computes final branch offsets, etc. If you’re interested in this, I strongly encourage you to read &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;cfallin.org&#x2F;blog&#x2F;2021&#x2F;01&#x2F;22&#x2F;cranelift-isel-2&#x2F;&quot;&gt;this blog post&lt;&#x2F;a&gt; by Chris Fallin. After this, we’re finally done. We’ve produced a code buffer for a single function, as well as external relocations (to other functions, memory addresses, etc.). The code generator’s task is complete, and the remaining steps consist of linking and, optionally, producing an executable binary.&lt;&#x2F;p&gt;
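&lt;p&gt;Computing final branch offsets follows a classic pattern:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;emit each branch with a placeholder offset;&lt;&#x2F;li&gt;
&lt;li&gt;record where the branch is and which label it targets;&lt;&#x2F;li&gt;
&lt;li&gt;patch the branch once every label’s position is known.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;The following minimal sketch applies that idea to AArch64 unconditional &lt;code&gt;B&lt;&#x2F;code&gt; branches, which use opcode &lt;code&gt;0b000101&lt;&#x2F;code&gt; in the top 6 bits and a signed 26-bit offset counted in instructions. &lt;code&gt;CodeBuffer&lt;&#x2F;code&gt; and its methods are hypothetical and much simpler than what Cranelift actually does.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;use std::collections::HashMap;

/// Hypothetical code buffer with label fixups, supporting AArch64 `B label` only.
struct CodeBuffer {
    words: Vec&amp;lt;u32&amp;gt;,
    labels: HashMap&amp;lt;u32, usize&amp;gt;, // label id -&amp;gt; byte offset, once bound
    fixups: Vec&amp;lt;(usize, u32)&amp;gt;,   // (byte offset of a branch, target label)
}

impl CodeBuffer {
    fn new() -&amp;gt; Self {
        CodeBuffer { words: Vec::new(), labels: HashMap::new(), fixups: Vec::new() }
    }

    fn offset(&amp;amp;self) -&amp;gt; usize {
        self.words.len() * 4
    }

    fn emit(&amp;amp;mut self, word: u32) {
        self.words.push(word);
    }

    /// Binds `label` to the current position in the buffer.
    fn bind_label(&amp;amp;mut self, label: u32) {
        let here = self.offset();
        self.labels.insert(label, here);
    }

    /// Emits `B label` with a zero placeholder offset and records a fixup.
    fn emit_branch(&amp;amp;mut self, label: u32) {
        let here = self.offset();
        self.fixups.push((here, label));
        self.emit(0b000101 &amp;lt;&amp;lt; 26);
    }

    /// Patches every branch once all labels are bound (panics on an unbound label).
    fn finish(mut self) -&amp;gt; Vec&amp;lt;u32&amp;gt; {
        for (at, label) in std::mem::take(&amp;amp;mut self.fixups) {
            let target = self.labels[&amp;amp;label] as i64;
            let delta_insts = (target - at as i64) / 4;
            // imm26 is a two&#x27;s complement offset counted in 4-byte instructions.
            self.words[at / 4] |= (delta_insts as u32) &amp;amp; 0x03FF_FFFF;
        }
        self.words
    }
}&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;A forward branch that skips one instruction gets an offset of +2 instructions (&lt;code&gt;0x14000002&lt;&#x2F;code&gt;). A branch going back two instructions encodes -2 as &lt;code&gt;0x17FFFFFE&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;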
&lt;h2 id=&quot;mission-accomplished&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#mission-accomplished&quot; aria-label=&quot;Anchor link for: mission-accomplished&quot;&gt;🔗&lt;&#x2F;a&gt;Mission accomplished!&lt;&#x2F;h2&gt;
&lt;p&gt;So, we’re done for today! Thanks for reading this far; I hope it has been a useful and pleasant read! Feel free to reach out to me on the &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;twitter.com&#x2F;bnjbvr&quot;&gt;twitterz&lt;&#x2F;a&gt; if you have additional remarks or questions. If this sort of thing interests you, go contribute to &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;bytecodealliance&#x2F;wasmtime&quot;&gt;Wasmtime&#x2F;Cranelift&lt;&#x2F;a&gt; 😇. Until next time, take care of yourselves!&lt;&#x2F;p&gt;
&lt;p&gt;Thanks to &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;cfallin.org&quot;&gt;Chris Fallin&lt;&#x2F;a&gt; for reading and suggesting improvements to this blog post.&lt;&#x2F;p&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;Really, Rust &lt;em&gt;is&lt;&#x2F;em&gt; the DSL. It used to be Python code, which had the advantage of being faster to update. Yet it did a lot of magic behind the curtain, which wasn’t very friendly to newcomers trying to learn and use Cranelift. A statically typed language helps exploration through tooling, but this meta-language is still expected to partially disappear in the long run; see Chris’ &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;cfallin.org&#x2F;blog&#x2F;2020&#x2F;09&#x2F;18&#x2F;cranelift-isel-1&#x2F;&quot;&gt;blog post&lt;&#x2F;a&gt; on this topic. &lt;a href=&quot;#fr-2-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-4&quot;&gt;
&lt;p&gt;AArch64 connoisseurs may notice that there are other ways to encode an addition. For instance, one of the input operands may be the result of a shift by an immediate value. In that case, it’s possible to &lt;em&gt;embed&lt;&#x2F;em&gt; the shift within the add, which yields fewer machine instructions (and lowers register pressure). This other encoding is different enough in terms of register allocation and code generation to justify its own VCode instruction. &lt;code&gt;AluRRR&lt;&#x2F;code&gt; is simpler because it only deals with register inputs and outputs, which makes it a perfect example for this post. &lt;a href=&quot;#fr-4-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;What’s an integer overflow for signed integer division? Consider an integer value represented on &lt;code&gt;N&lt;&#x2F;code&gt; bits. If you divide the smallest integer value &lt;code&gt;-2**(N-1)&lt;&#x2F;code&gt; by &lt;code&gt;-1&lt;&#x2F;code&gt;, the result should be &lt;code&gt;2**(N-1)&lt;&#x2F;code&gt;. This is out of range, since the biggest signed integer value we can represent on &lt;code&gt;N&lt;&#x2F;code&gt; bits is &lt;code&gt;2**(N-1) - 1&lt;&#x2F;code&gt;! So the result overflows and is set to &lt;code&gt;-2**(N-1)&lt;&#x2F;code&gt;, which is the initial value, not the correct result. Good luck debugging this without a software trap! &lt;a href=&quot;#fr-3-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-5&quot;&gt;
&lt;p&gt;Register moves may be introduced because a successor block (in the control flow graph) expects a given virtual register to live in a particular real register, or because a particular instruction requires a virtual register to be allocated to a &lt;em&gt;fixed&lt;&#x2F;em&gt; real register that’s busy: regalloc can then temporarily divert the busy register into another unused register. &lt;a href=&quot;#fr-5-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-8&quot;&gt;
&lt;p&gt;The &lt;code&gt;c&lt;&#x2F;code&gt; in &lt;code&gt;rcx&lt;&#x2F;code&gt; actually stands for &lt;code&gt;count&lt;&#x2F;code&gt;; this is a naming convention inherited from earlier CPU designs. &lt;a href=&quot;#fr-8-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-7&quot;&gt;
&lt;p&gt;Unless this move carries sign- or zero-extending semantics. This is the case, for example, for x86’s 32-bit &lt;code&gt;mov&lt;&#x2F;code&gt; instructions on a 64-bit architecture, which zero-extend their result into the full 64-bit register. &lt;a href=&quot;#fr-7-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-6&quot;&gt;
&lt;p&gt;Relocations are placeholders for information we don’t have access to &lt;em&gt;yet&lt;&#x2F;em&gt;. For instance, when we’re generating jump instructions, the jump target offsets are not determined yet. So we record where the jump instruction is in the code stream, as well as which control flow block it should jump to. That way, we can &lt;em&gt;patch it&lt;&#x2F;em&gt; later, once the final offsets are known. That’s the content of our relocation. &lt;a href=&quot;#fr-6-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;&#x2F;section&gt;
</content>
    </entry>
    <entry xml:lang="en">
        <title>Botzilla, a multi-purpose Matrix bot tuned for Mozilla</title>
        <published>2020-11-12T18:49:42+00:00</published>
        <updated>2020-11-12T18:49:42+00:00</updated>
        
        <author>
          <name>
            
              Benjamin Bouvier
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://bouvier.cc/tech/botzilla/"/>
        <id>https://bouvier.cc/tech/botzilla/</id>
        <content type="html" xml:base="https://bouvier.cc/tech/botzilla/">&lt;p&gt;In this post I reflect on my personal history of writing chat bots, and then
present the bot’s features: some are user-facing, while others embody what I
consider to be a sane, well-behaved Matrix bot.&lt;&#x2F;p&gt;
&lt;span id=&quot;continue-reading&quot;&gt;&lt;&#x2F;span&gt;
&lt;p&gt;Over the last year, Mozilla decided to shut down its IRC network and
replace it with a more modern platform. To my great delight, &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;http:&#x2F;&#x2F;exple.tive.org&#x2F;blarg&#x2F;2019&#x2F;12&#x2F;19&#x2F;over-the-line&#x2F;&quot;&gt;the Matrix
ecosystem has been
selected&lt;&#x2F;a&gt; among all the
possible replacements. For those who might not know Matrix, it’s a modern,
well-documented, decentralized protocol built on plain HTTP endpoints that
exchange JSON. It implements both the features that are common in recent
messaging systems (e.g. file attachments, message edits and deletions) and
those needed to handle large groups (e.g. moderation tools, private rooms,
invite-only rooms).&lt;&#x2F;p&gt;
&lt;h2 id=&quot;but-first-some-history&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#but-first-some-history&quot; aria-label=&quot;Anchor link for: but-first-some-history&quot;&gt;🔗&lt;&#x2F;a&gt;but first, some history&lt;&#x2F;h2&gt;
&lt;p&gt;Back in 2014, when I was an intern at Mozilla, I made a silly IRC JavaScript bot
that would quote the @horsejs Twitter account when asked to do so. Then a few
other useless features were added: “karma” tracking &lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-1-1&quot;&gt;&lt;a href=&quot;#fn-1&quot;&gt;1&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;, being a karma
guardian angel (lowering the karma of people who lowered the karma of some
predefined people), keeping track of contextless quotes from various people…&lt;&#x2F;p&gt;
&lt;p&gt;Over time, it slowly turned into an IRC bot &lt;em&gt;framework&lt;&#x2F;em&gt;, with &lt;em&gt;modules&lt;&#x2F;em&gt;
you could attach and configure at startup. You could set which rooms the bot would
join, what the cooldowns for message sending should be (more on this later),
and so much more! Hence it was renamed &lt;em&gt;meta-bot&lt;&#x2F;em&gt;.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;an-aside-on-the-morality-of-bots&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#an-aside-on-the-morality-of-bots&quot; aria-label=&quot;Anchor link for: an-aside-on-the-morality-of-bots&quot;&gt;🔗&lt;&#x2F;a&gt;an aside on the morality of bots&lt;&#x2F;h3&gt;
&lt;p&gt;I find making bots a fun activity, since once you’ve passed the step of
connecting and sending messages, the rest is mostly easy (&lt;em&gt;cough cough regular
expressions cough cough&lt;&#x2F;em&gt;) and creative work. And it’s unfortunately easy to be
reckless too.&lt;&#x2F;p&gt;
&lt;p&gt;At the time, I never considered the potentially bad effects of quoting text
from a random source, viz. fetching tweets from the @horsejs account. If the
source returned a message that was inconsiderate, rude, or even worse,
aggressive, then the bot would replicate this behavior. This is a real issue.
The bot doesn’t think by itself and doesn’t &lt;em&gt;mean&lt;&#x2F;em&gt; any harm, but
its programmers can do better, and they should try to avoid these issues at all
costs. A chat bot replicates the culture of the engineers who made it, and it
also helps propagate this culture in the chat rooms it participates in,
&lt;em&gt;normalizing&lt;&#x2F;em&gt; it to the chat participants.&lt;&#x2F;p&gt;
&lt;p&gt;My bot happened to be well-behaved most of the time… until one time when it
was not. After noticing the incident and expressing my deepest apologies, I
deactivated the module. I then went through the whole list of modules to make sure
none could cause any harm, in any possible way. I should have known better in
the first place! I am really not trying to signal my own virtue, since I failed
in a way that should have been predictable. By writing this, I hope that other
people may reflect on the actions of their bots as well, in case they could
be misbehaving like this.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;the-former-fleet-of-mozilla-bots&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#the-former-fleet-of-mozilla-bots&quot; aria-label=&quot;Anchor link for: the-former-fleet-of-mozilla-bots&quot;&gt;🔗&lt;&#x2F;a&gt;the former fleet of mozilla bots&lt;&#x2F;h3&gt;
&lt;p&gt;There were a few other useful IRC bots (of which I wasn’t the author) hanging
out in the Mozilla IRC rooms, notably
&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;globau&#x2F;firebot&quot;&gt;Firebot&lt;&#x2F;a&gt; and
&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;wiki.mozilla.org&#x2F;Mrgiggles&quot;&gt;mrgiggles&lt;&#x2F;a&gt;. The latter probably started as
a joke too, enumerating puns from a list in the JavaScript channel. It then
outgrew its responsibilities and helped with a handful of requests: who can
review this or that file in Mozilla’s source code? What’s the status of the
continuous integration trees? Can this particular C++ function used in Gecko
cause a garbage collection?&lt;&#x2F;p&gt;
&lt;p&gt;When we moved over to Matrix, the bots unfortunately became obsolete, since
they spoke a different communication protocol (IRC). We could have ported them
to the Matrix protocol, but the Not-Invented-Here syndrome was strong with this
one. I’ve been making bots for a while, and I was personally interested in the
Matrix protocol and in trying out the JS facilities offered by the Matrix
ecosystem.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;botzilla-features&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#botzilla-features&quot; aria-label=&quot;Anchor link for: botzilla-features&quot;&gt;🔗&lt;&#x2F;a&gt;&lt;strong&gt;Botzilla features&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;So I decided to write &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;bnjbvr&#x2F;botzilla&quot;&gt;Botzilla&lt;&#x2F;a&gt;, a
successor in spirit to &lt;em&gt;meta-bot&lt;&#x2F;em&gt; and &lt;em&gt;mrgiggles&lt;&#x2F;em&gt;, written in TypeScript. This
is a very &lt;em&gt;unofficial&lt;&#x2F;em&gt; bot, tailored for Mozilla’s needs but probably useful in
other contexts. I’ve worked on it informally as a side project, in my &lt;em&gt;copious&lt;&#x2F;em&gt;
spare time. Crafting tools that prove useful to other people has been reward
enough to keep me motivated, so it’s been quite fun!&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;bouvier.cc&#x2F;tech&#x2F;botzilla&#x2F;logo.png&quot; alt=&quot;Botzilla’s logo&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;&lt;em&gt;Botzilla’s logo, courtesy of &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;nical.github.io&#x2F;index.html&quot;&gt;Nical&lt;&#x2F;a&gt;&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Let’s take a look at all the features the bot currently offers.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;uuid-generate-unique-ids&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#uuid-generate-unique-ids&quot; aria-label=&quot;Anchor link for: uuid-generate-unique-ids&quot;&gt;🔗&lt;&#x2F;a&gt;&lt;strong&gt;uuid: Generate unique IDs&lt;&#x2F;strong&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;This was a feature of Firebot and easy enough to replicate, so it became the
first module, used to test the Matrix bot framework. When someone says &lt;code&gt;!uuid&lt;&#x2F;code&gt;, the bot
generates a unique ID (using UUID v4). The ID is guaranteed GMO-free and usable in any
context that requires one.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;bouvier.cc&#x2F;tech&#x2F;botzilla&#x2F;uuid.png&quot; alt=&quot;Demo of uuid&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
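&lt;p&gt;For the curious: a version 4 UUID is 128 random bits in which 6 bits are overwritten, 4 for the version and 2 for the variant. The sketch below shows only that formatting step. It is written in Rust even though Botzilla itself is TypeScript, and it assumes the 16 random bytes come from elsewhere. &lt;code&gt;uuid_v4_from_bytes&lt;&#x2F;code&gt; is a hypothetical name, not Botzilla’s code.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;/// Hypothetical helper: turns 16 random bytes into an RFC 4122 version 4 UUID.
fn uuid_v4_from_bytes(mut bytes: [u8; 16]) -&amp;gt; String {
    bytes[6] = (bytes[6] &amp;amp; 0x0F) | 0x40; // version 4 in the high nibble
    bytes[8] = (bytes[8] &amp;amp; 0x3F) | 0x80; // variant bits set to 0b10
    let hex: Vec&amp;lt;String&amp;gt; = bytes.iter().map(|b| format!(&quot;{:02x}&quot;, b)).collect();
    // Group as 8-4-4-4-12 hexadecimal digits.
    format!(
        &quot;{}-{}-{}-{}-{}&quot;,
        hex[0..4].concat(),
        hex[4..6].concat(),
        hex[6..8].concat(),
        hex[8..10].concat(),
        hex[10..16].concat()
    )
}&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;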
&lt;h3 id=&quot;treestatus-inform-about-ci-tree-status&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#treestatus-inform-about-ci-tree-status&quot; aria-label=&quot;Anchor link for: treestatus-inform-about-ci-tree-status&quot;&gt;🔗&lt;&#x2F;a&gt;&lt;strong&gt;treestatus: Inform about CI tree status&lt;&#x2F;strong&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;Mozilla developers tend to interact a lot with the continuous integration
trees, because code is sometimes landed, sometimes backed out (sorry&#x2F;thank you
sheriffs!), and sometimes merged across branches. These operations can lead to the
integration trees being closed. Before patch stacks could be landed
automatically once the trees reopened, it was useful to be able to get the
open&#x2F;closed status of a tree. Asking &lt;code&gt;!treestatus&lt;&#x2F;code&gt; makes the bot answer with the status of
some &lt;em&gt;common&lt;&#x2F;em&gt; trees. It is also possible to request the status of a particular
tree, e.g. for the “mozilla-central” tree, by asking &lt;code&gt;!treestatus mozilla-central&lt;&#x2F;code&gt; (or just &lt;code&gt;central&lt;&#x2F;code&gt;, as a handy shortcut).&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;bouvier.cc&#x2F;tech&#x2F;botzilla&#x2F;treestatus.png&quot; alt=&quot;Demo of treestatus&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;h3 id=&quot;expand-bug-status&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#expand-bug-status&quot; aria-label=&quot;Anchor link for: expand-bug-status&quot;&gt;🔗&lt;&#x2F;a&gt;&lt;strong&gt;Expand bug status&lt;&#x2F;strong&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;If you have ever interacted with Mozilla’s code, chances are that you’ve
used Bugzilla and mentioned bug numbers in conversations. The bot catches any
message containing &lt;code&gt;bug XXX&lt;&#x2F;code&gt; and responds with a link to this bug. It also
gives the nickname of the person assigned to the bug, if there’s one, and the bug’s
summary, if the bug is public. This is by far the most used and useful module, since
it doesn’t require a special incantation. Instead, it reacts automatically to many
messages written with no particular intent (see below for how it avoids being
spammy, though).&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;bouvier.cc&#x2F;tech&#x2F;botzilla&#x2F;expand-bug.png&quot; alt=&quot;Demo of expand-bug&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
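&lt;p&gt;Catching bug mentions boils down to scanning each message for the word “bug” followed by a number. Here is a rough sketch of that scanning step, written in Rust with only the standard library rather than a regular expression. &lt;code&gt;find_bug_numbers&lt;&#x2F;code&gt; is hypothetical, and the follow-up Bugzilla query for the assignee and summary is left out.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;/// Hypothetical helper: returns the numbers of every `bug NNN` mention,
/// case-insensitively, ignoring matches inside longer words like `debug 7`.
fn find_bug_numbers(message: &amp;amp;str) -&amp;gt; Vec&amp;lt;u64&amp;gt; {
    // ASCII lowercasing keeps byte offsets identical to the original text.
    let text = message.to_ascii_lowercase();
    let mut found = Vec::new();
    let mut start = 0;
    while let Some(rel) = text[start..].find(&quot;bug &quot;) {
        let pos = start + rel;
        let at_word_start = text[..pos]
            .chars()
            .next_back()
            .map_or(true, |c| !c.is_alphanumeric());
        let digits: String = text[pos + 4..]
            .chars()
            .take_while(|c| c.is_ascii_digit())
            .collect();
        if at_word_start {
            if let Ok(number) = digits.parse::&amp;lt;u64&amp;gt;() {
                found.push(number);
            }
        }
        start = pos + 4;
    }
    found
}&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;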
&lt;h3 id=&quot;who-can-review-x&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#who-can-review-x&quot; aria-label=&quot;Anchor link for: who-can-review-x&quot;&gt;🔗&lt;&#x2F;a&gt;Who Can Review X?&lt;&#x2F;h3&gt;
&lt;p&gt;This was a very nice feature that mrgiggles had: ask for potential reviewers
of a particular file in the Gecko source tree, and get a list of its most recent
reviewers. Botzilla replicates this whenever it sees a trigger such as &lt;code&gt;who can review js&#x2F;src&#x2F;wasm&#x2F;WasmJS.cpp?&lt;&#x2F;code&gt;. The list of potential reviewers is extracted from
Mercurial logs, by looking for the last N reviewers of this particular file.&lt;&#x2F;p&gt;
&lt;p&gt;As a bonus, there’s no need to pass the full path to the file if the file’s
name is unique in the tree’s source code. Botzilla will trigger a search in
Searchfox and, if the search yields a single result, use it. The previous
example can thus be shortened to &lt;code&gt;who can review WasmJS.cpp?&lt;&#x2F;code&gt; since the file’s name is unique in the whole code base.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;bouvier.cc&#x2F;tech&#x2F;botzilla&#x2F;who-can-review.png&quot; alt=&quot;Demo of who can review&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
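&lt;p&gt;Once the log entries touching a file are available, picking reviewers is mostly string processing. The sketch below makes the following assumptions:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;commit summaries follow Mozilla’s &lt;code&gt;r=nick&lt;&#x2F;code&gt; convention, possibly listing several reviewers as &lt;code&gt;r=a,b&lt;&#x2F;code&gt;;&lt;&#x2F;li&gt;
&lt;li&gt;the summaries are ordered newest first;&lt;&#x2F;li&gt;
&lt;li&gt;fetching them from Mercurial happens elsewhere.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;&lt;code&gt;recent_reviewers&lt;&#x2F;code&gt; is a hypothetical name.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;/// Hypothetical helper: returns up to `n` distinct reviewers, most recent first,
/// from commit summaries ordered newest first.
fn recent_reviewers(summaries: &amp;amp;[&amp;amp;str], n: usize) -&amp;gt; Vec&amp;lt;String&amp;gt; {
    let mut reviewers: Vec&amp;lt;String&amp;gt; = Vec::new();
    for summary in summaries {
        for word in summary.split_whitespace() {
            if let Some(list) = word.strip_prefix(&quot;r=&quot;) {
                // `r=a,b` names several reviewers; drop trailing punctuation.
                for name in list.split(&#x27;,&#x27;) {
                    let name = name.trim_end_matches(|c: char| !c.is_alphanumeric());
                    if reviewers.len() &amp;lt; n
                        &amp;amp;&amp;amp; !name.is_empty()
                        &amp;amp;&amp;amp; !reviewers.iter().any(|r| r == name)
                    {
                        reviewers.push(name.to_string());
                    }
                }
            }
        }
    }
    reviewers
}&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;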
&lt;h3 id=&quot;github-gitlab-issues-p-m-rs&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#github-gitlab-issues-p-m-rs&quot; aria-label=&quot;Anchor link for: github-gitlab-issues-p-m-rs&quot;&gt;🔗&lt;&#x2F;a&gt;{Github,Gitlab} {issues,{P,M}Rs}&lt;&#x2F;h3&gt;
&lt;p&gt;It is possible for a room administrator to “connect” a given Matrix room to a
Github repository. Later on, any mention of issues or pull requests by their
number, e.g. &lt;code&gt;#1234&lt;&#x2F;code&gt;, will make Botzilla react with the summary and a link to
the issue&#x2F;PR at stake.&lt;&#x2F;p&gt;
&lt;p&gt;This also works for Gitlab repositories, with slight differences. The
administrator has to specify the root URL of the Gitlab instance (since
Gitlab can be self-hosted). Issues are caught when numbers follow a &lt;code&gt;#&lt;&#x2F;code&gt; sign,
while merge requests are caught when numbers follow a &lt;code&gt;!&lt;&#x2F;code&gt; sign.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;bouvier.cc&#x2F;tech&#x2F;botzilla&#x2F;gitlab.png&quot; alt=&quot;Demo of gitlab&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
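&lt;p&gt;Here is a sketch, in Rust, of how such references could be recognized. The &lt;code&gt;Reference&lt;&#x2F;code&gt; type and &lt;code&gt;find_references&lt;&#x2F;code&gt; are hypothetical, and a real implementation would also resolve each number against the connected repository. Note that on Github, issues and pull requests share the same numbering, so a &lt;code&gt;#N&lt;&#x2F;code&gt; mention can designate either one.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;#[derive(Debug, PartialEq)]
enum Reference {
    /// `#N`: an issue (on Github, possibly a pull request).
    Issue(u64),
    /// `!N`: a Gitlab merge request.
    MergeRequest(u64),
}

/// Hypothetical helper: collects `#N` references, plus `!N` ones for Gitlab rooms.
fn find_references(message: &amp;amp;str, is_gitlab: bool) -&amp;gt; Vec&amp;lt;Reference&amp;gt; {
    let chars: Vec&amp;lt;char&amp;gt; = message.chars().collect();
    let mut refs = Vec::new();
    for (i, &amp;amp;c) in chars.iter().enumerate() {
        if c != &#x27;#&#x27; &amp;amp;&amp;amp; !(is_gitlab &amp;amp;&amp;amp; c == &#x27;!&#x27;) {
            continue;
        }
        // The number is the run of digits right after the sigil.
        let digits: String = chars[i + 1..].iter().take_while(|d| d.is_ascii_digit()).collect();
        if let Ok(number) = digits.parse::&amp;lt;u64&amp;gt;() {
            refs.push(if c == &#x27;#&#x27; { Reference::Issue(number) } else { Reference::MergeRequest(number) });
        }
    }
    refs
}&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;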
&lt;h3 id=&quot;tweet-toot-post-on-twitter-mastodon&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#tweet-toot-post-on-twitter-mastodon&quot; aria-label=&quot;Anchor link for: tweet-toot-post-on-twitter-mastodon&quot;&gt;🔗&lt;&#x2F;a&gt;!tweet&#x2F;!toot: Post on Twitter&#x2F;Mastodon&lt;&#x2F;h3&gt;
&lt;p&gt;An administrator can configure a room to tie it to a Twitter (respectively
Mastodon) user account, using API tokens. Then, any person with an
administrative role can post messages with &lt;code&gt;!tweet something shocking for the bird site&lt;&#x2F;code&gt; (respectively &lt;code&gt;!toot something heartful for the mammoth site&lt;&#x2F;code&gt;). This
makes it possible to let other people post on these social networks
without giving them the account’s password.&lt;&#x2F;p&gt;
&lt;p&gt;Unfortunately, the Twitter module has never been tested. When I tried to
create a developer account, Twitter accepted it after a few days but then never
displayed the API tokens in the interface, and support never answered when I
asked for help. Thankfully, Mastodon can be self-hosted, which makes it easier
to test. I’m happy to report that it works quite well!&lt;&#x2F;p&gt;
&lt;h3 id=&quot;confession-and-histoire&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#confession-and-histoire&quot; aria-label=&quot;Anchor link for: confession-and-histoire&quot;&gt;🔗&lt;&#x2F;a&gt;&lt;code&gt;confession&lt;&#x2F;code&gt; and histoire&lt;&#x2F;h3&gt;
&lt;p&gt;It is quite common in teams to set up regular standup meetings, where everyone
in the team announces what they’ve been working on in the last few days or
week. It also strikes me as important for personal recognition, including
towards management, to be able to &lt;em&gt;show off&lt;&#x2F;em&gt; (just a bit!) what you’ve
accomplished recently, and to remember this when times are harder (see also
&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;jvns.ca&#x2F;blog&#x2F;brag-documents&#x2F;&quot;&gt;Julia Evans’ blog post on the topic&lt;&#x2F;a&gt;).&lt;&#x2F;p&gt;
&lt;p&gt;There’s a Botzilla module for this. Every time someone starts a message with
&lt;code&gt;confession:&lt;&#x2F;code&gt;, everything after the colon will be saved in a database
(…wait for it!). Then, all the confessions are displayed on the
&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;robotzilla.github.io&#x2F;histoire&quot;&gt;Histoire&lt;&#x2F;a&gt; &lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-2-1&quot;&gt;&lt;a href=&quot;#fn-2&quot;&gt;2&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt; website, with one
message feed per user. Note that it is possible to send confessions to Botzilla
privately (this doesn’t affect the frontend though, which is open and public to
all!) or in a public channel. Public channels roughly correspond to teams,
so channels also get their own pages on the frontend.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;bouvier.cc&#x2F;tech&#x2F;botzilla&#x2F;confession.png&quot; alt=&quot;Demo of confession&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;bouvier.cc&#x2F;tech&#x2F;botzilla&#x2F;histoire.png&quot; alt=&quot;Screenshot of Histoire&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Now the fun&#x2F;cursed part is how all of this &lt;em&gt;works&lt;&#x2F;em&gt;. This was implemented in
&lt;em&gt;mrgiggles&lt;&#x2F;em&gt;, and I liked it a lot, since it required no backend or
frontend server at all. How so? By (ab)using Github files as the database and Github
pages as the frontend. Sending a confession triggers a request to a Github
endpoint to find a database file partitioned by time. A second request then
creates or modifies this file with the content of the confession. The
frontend uses other requests to public Github APIs to read the confessions
before dynamically rendering them. Astute readers will notice that under heavy
confession activity, Github’s API rate limits could slow the bot down. In
this case, the bot applies exponential backoff before trying to
re-send unsaved confessions to Github. Overall it works great, and API
rate limits have never really been a problem.&lt;&#x2F;p&gt;
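&lt;p&gt;Exponential backoff means doubling the waiting time after each failed attempt, up to some maximum. A minimal sketch of such a schedule in Rust follows. &lt;code&gt;backoff_delay&lt;&#x2F;code&gt;, the base delay and the cap are all hypothetical, not Botzilla’s actual values. Many implementations also add random jitter, which is left out here.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;use std::time::Duration;

/// Hypothetical retry schedule: `base * 2^attempt`, capped at `max`.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -&amp;gt; Duration {
    // Saturate instead of overflowing for large attempt counts.
    let factor = 1u32.checked_shl(attempt).unwrap_or(u32::MAX);
    base.saturating_mul(factor).min(max)
}&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;With a one-second base and a one-minute cap, successive retries would wait 1, 2, 4, 8, 16, 32 and then 60 seconds.&lt;&#x2F;p&gt;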
&lt;h2 id=&quot;intrinsic-features-they-re-good-bots-bront&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#intrinsic-features-they-re-good-bots-bront&quot; aria-label=&quot;Anchor link for: intrinsic-features-they-re-good-bots-bront&quot;&gt;🔗&lt;&#x2F;a&gt;&lt;strong&gt;Intrinsic features: they’re good bots, bront&lt;&#x2F;strong&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;In addition to all the user-facing features, the bot has a few other
interesting attributes that are more relevant to consider from a framework
point of view. Hopefully some of these ideas can be useful for other bot
authors!&lt;&#x2F;p&gt;
&lt;h3 id=&quot;join-all-the-rooms&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#join-all-the-rooms&quot; aria-label=&quot;Anchor link for: join-all-the-rooms&quot;&gt;🔗&lt;&#x2F;a&gt;&lt;strong&gt;Join All The Rooms!&lt;&#x2F;strong&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;Every time the bot is invited to a channel, be it public or private, it will
join the channel, which makes it easy to use in general. This came for free with
&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;turt2live&#x2F;matrix-js-bot-sdk&quot;&gt;the JS framework I’ve been
using&lt;&#x2F;a&gt;, and it is a
definite improvement over the IRC version of the bot.&lt;&#x2F;p&gt;
&lt;p&gt;Matrix rooms are sometimes upgraded to a new room version. In that case, the bot will
try to join the upgraded room if it can, keeping all its room settings intact
during the transition.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;thou-shalt-not-spam&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#thou-shalt-not-spam&quot; aria-label=&quot;Anchor link for: thou-shalt-not-spam&quot;&gt;🔗&lt;&#x2F;a&gt;&lt;strong&gt;Thou shalt not spam&lt;&#x2F;strong&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;To avoid spamming the channel, especially for modules that are &lt;em&gt;reactions&lt;&#x2F;em&gt; to
other messages (think: bug numbers, issues&#x2F;pull requests mentions), the bot has
had to learn how to keep quiet. There are two rules triggering the quieting
behavior:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;if the bot has already reacted less than N minutes ago (where N is a
configurable amount) in the same room,&lt;&#x2F;li&gt;
&lt;li&gt;or if it has already reacted to some entity in a message, and there’s been
fewer than M messages in between the last reaction and the last message
mentioning the same entity in the same room (M is also configurable)&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;If either of these two criteria is met, the bot keeps quiet and does not
react to another similar message. In my experience, based on observing the bot’s
behavior and people’s reactions to it, the combination of these two rules has
proven quite solid over time.&lt;&#x2F;p&gt;
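&lt;p&gt;The two rules above fit in a small per-room structure. The following Rust sketch is one possible way to implement them, not Botzilla’s actual code. &lt;code&gt;Throttle&lt;&#x2F;code&gt; and its methods are hypothetical; &lt;code&gt;cooldown&lt;&#x2F;code&gt; plays the role of N and &lt;code&gt;min_messages&lt;&#x2F;code&gt; the role of M.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical per-room throttle implementing both quieting rules.
struct Throttle {
    cooldown: Duration,               // N: minimum delay between two reactions
    min_messages: u64,                // M: messages required before repeating an entity
    last_reaction: Option&amp;lt;Instant&amp;gt;,
    message_count: u64,               // index of the latest message in the room
    reacted_at: HashMap&amp;lt;String, u64&amp;gt;, // entity -&amp;gt; message index of the last reaction to it
}

impl Throttle {
    fn new(cooldown: Duration, min_messages: u64) -&amp;gt; Self {
        Throttle { cooldown, min_messages, last_reaction: None, message_count: 0, reacted_at: HashMap::new() }
    }

    /// Must be called for every message posted in the room.
    fn on_message(&amp;amp;mut self) {
        self.message_count += 1;
    }

    /// Decides whether to react to `entity` (e.g. a bug number) in the latest message.
    fn should_react(&amp;amp;mut self, entity: &amp;amp;str, now: Instant) -&amp;gt; bool {
        // Rule 1: the bot already reacted in this room less than `cooldown` ago.
        let too_soon = self.last_reaction.map_or(false, |t| now.duration_since(t) &amp;lt; self.cooldown);
        // Rule 2: fewer than M messages since the last reaction to this entity.
        let too_close = self
            .reacted_at
            .get(entity)
            .map_or(false, |&amp;amp;at| self.message_count.saturating_sub(at + 1) &amp;lt; self.min_messages);
        if too_soon || too_close {
            return false;
        }
        self.last_reaction = Some(now);
        self.reacted_at.insert(entity.to_string(), self.message_count);
        true
    }
}&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;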
&lt;p&gt;A similar mechanism is used for the &lt;em&gt;confession&lt;&#x2F;em&gt; module. On a &lt;em&gt;first&lt;&#x2F;em&gt;
confession, the bot answers with a message saying it has seen the
confession, including a link to where it is going to be posted. It also adds an
“eyes” emoji reaction to the message. Posting this long-form message could be
quite spammy if there are many confessions around the same time. So, under the
same criteria, the bot will just react with an “eyes” emoji to other confessions.
Later on, once neither criterion blocks it anymore, it will send the full
message again.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;decentralized-administration-self-service&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#decentralized-administration-self-service&quot; aria-label=&quot;Anchor link for: decentralized-administration-self-service&quot;&gt;🔗&lt;&#x2F;a&gt;&lt;strong&gt;Decentralized administration self-service&lt;&#x2F;strong&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;The bot can be administered by talking to it with the &lt;code&gt;!admin&lt;&#x2F;code&gt; command.
This works both in a private conversation with it and in public channels,
although private conversations are recommended. To confirm that an admin
action has succeeded, the bot reacts with a thumbs-up emoji to the message
requesting the action.&lt;&#x2F;p&gt;
&lt;p&gt;Having a single administrator for the bot would be quite a burden, and it
would not be resilient to people switching roles, leaving the company, etc.
Normally you’d solve this by implementing your own access control lists.
Fortunately, Matrix already has a concept of &lt;em&gt;power levels&lt;&#x2F;em&gt; that assigns
roles to users, including the administrator and moderator roles.&lt;&#x2F;p&gt;
&lt;p&gt;The bot relies on this to decide which requests to answer. Somebody
marked as an administrator or a moderator of a room can administer Botzilla
in this particular room, using &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;bnjbvr&#x2F;botzilla#admin&quot;&gt;the &lt;code&gt;!admin&lt;&#x2F;code&gt;
commands&lt;&#x2F;a&gt;. There’s still a
super-admin role, which must be defined in the configuration, in case things go
awry. While administrators only have power over their own room, a super-admin
can use their super-powers to change anything in any room. Decentralizing
the administrative roles this way makes it easy to have different settings for
different rooms, and to rely a bit less on single individuals.&lt;&#x2F;p&gt;
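&lt;p&gt;The permission check above amounts to comparing a user’s power level with the moderator threshold. Here is a minimal TypeScript sketch. It assumes two things: the content of the room’s &lt;code&gt;m.room.power_levels&lt;&#x2F;code&gt; state event has already been fetched, and the super-admins are listed in the configuration. The function names are hypothetical, and this is not Botzilla’s actual code.&lt;&#x2F;p&gt;

```typescript
// Hypothetical sketch of the permission check described above, not Botzilla's code.
// Subset of the content of a Matrix `m.room.power_levels` state event.
interface PowerLevelsContent {
  users?: { [userId: string]: number };
  users_default?: number;
}

// By Matrix convention, moderators have power level 50 and admins 100.
const MODERATOR_LEVEL = 50;

function userPowerLevel(content: PowerLevelsContent, userId: string): number {
  const explicit = content.users?.[userId];
  // Users not listed explicitly get `users_default`, which itself defaults to 0.
  return explicit ?? content.users_default ?? 0;
}

// Super-admins come from the bot's configuration and are allowed in every room;
// everyone else needs at least moderator rights in the room at hand.
function canAdministrate(
  content: PowerLevelsContent,
  userId: string,
  superAdmins: string[],
): boolean {
  if (superAdmins.includes(userId)) {
    return true;
  }
  return userPowerLevel(content, userId) >= MODERATOR_LEVEL;
}
```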
&lt;h3 id=&quot;key-value-store&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#key-value-store&quot; aria-label=&quot;Anchor link for: key-value-store&quot;&gt;🔗&lt;&#x2F;a&gt;&lt;strong&gt;Key-value store&lt;&#x2F;strong&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;The bot contains a general key-value store backed by an SQLite
database, which makes it easy to migrate and to add context that’s preserved
across restarts of the bot. It is used to store private information like user
repository information and settings for most rooms. Conceptually, each pair of
room and module has its own key-value store, so that there’s no risk of
confusion between different rooms and modules. There’s also a per-module
key-value store that applies to all the rooms, to represent global
settings. If a room has its own (non-global) settings, these take precedence
over the global settings.&lt;&#x2F;p&gt;
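&lt;p&gt;The lookup rule checks for a per-room value first, then falls back to the module’s global value. It can be sketched in a few lines of TypeScript. This is a hypothetical in-memory model for illustration only: the class and method names are made up, and the real bot persists its entries in SQLite.&lt;&#x2F;p&gt;

```typescript
// Hypothetical in-memory model of the store's scoping rules; Botzilla itself
// persists its entries in SQLite.
class ScopedKeyValueStore {
  // One flat map keyed by (module, room, key); room === null means "global".
  private entries: { [slot: string]: string } = {};

  private static slot(module: string, room: string | null, key: string): string {
    // JSON encoding keeps the three parts unambiguous, whatever they contain.
    return JSON.stringify([module, room, key]);
  }

  // Pass room = null to write a global (all-rooms) setting for the module.
  set(module: string, room: string | null, key: string, value: string): void {
    this.entries[ScopedKeyValueStore.slot(module, room, key)] = value;
  }

  // Per-room values take precedence over the module's global value.
  get(module: string, room: string, key: string): string | undefined {
    const perRoom = this.entries[ScopedKeyValueStore.slot(module, room, key)];
    if (perRoom !== undefined) {
      return perRoom;
    }
    return this.entries[ScopedKeyValueStore.slot(module, null, key)];
  }
}
```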
&lt;h3 id=&quot;self-documentation&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#self-documentation&quot; aria-label=&quot;Anchor link for: self-documentation&quot;&gt;🔗&lt;&#x2F;a&gt;&lt;strong&gt;Self-documentation&lt;&#x2F;strong&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;Each chat module is implemented as an ECMAScript module and must export a help
string alongside its main reaction function. These strings are collected and
aggregated by a &lt;code&gt;!help&lt;&#x2F;code&gt; command, which can be used to get help about using
the bot. The main help message displays the list of all the enabled
modules, and help about a specific module can be queried with e.g. &lt;code&gt;!help uuid&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
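&lt;p&gt;To illustrate this aggregation, here is a small TypeScript sketch of a &lt;code&gt;!help&lt;&#x2F;code&gt; handler. The &lt;code&gt;ChatModule&lt;&#x2F;code&gt; shape and the function name are assumptions made for the example, not Botzilla’s actual interfaces.&lt;&#x2F;p&gt;

```typescript
// Hypothetical module shape: each chat module exports a help string alongside
// its main reaction function (names are illustrative, not Botzilla's).
interface ChatModule {
  name: string;
  help: string;
  handler: (body: string) => void;
}

// Returns the reply to a `!help` message, or null if the message isn't one.
function onHelpMessage(
  body: string,
  modules: ChatModule[],
  enabled: string[],
): string | null {
  const match = body.trim().match(/^!help(?:\s+(\S+))?$/);
  if (match === null) {
    return null;
  }
  const active = modules.filter((m) => enabled.includes(m.name));
  const query: string | undefined = match[1];
  if (query === undefined) {
    // Main help: list every enabled module.
    const names = active.map((m) => m.name).join(", ");
    return `Enabled modules: ${names}. Use !help MODULE for details.`;
  }
  // Per-module help, e.g. "!help uuid".
  const found = active.find((m) => m.name === query);
  return found ? found.help : `Unknown or disabled module: ${query}`;
}
```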
&lt;h2 id=&quot;future-work-and-conclusion&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#future-work-and-conclusion&quot; aria-label=&quot;Anchor link for: future-work-and-conclusion&quot;&gt;🔗&lt;&#x2F;a&gt;Future work and conclusion&lt;&#x2F;h2&gt;
&lt;p&gt;If I were to start again, I’d do a few things differently:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;now that the Rust ecosystem around the Matrix platform has matured a bit, I’d
probably write this bot in Rust. Moving from JavaScript to TypeScript has
already helped me catch a few issues statically. I’d expect Rust to help
handle Matrix events faster, to provide end-to-end encryption
support for free, and to be quite pleasant to use in general thanks to the
awesome Rust tooling.&lt;&#x2F;li&gt;
&lt;li&gt;use a real single-page app framework for the Histoire website. Maybe? I mean,
I’m a big fan of VanillaJS, but using it means re-creating your own Web
framework-like thing to make it nice and productive to use.&lt;&#x2F;li&gt;
&lt;li&gt;despite being a fun hack, using Github as a backend has algorithmic
limitations that can make the web app sluggish. In particular, a combined
feed for N users over M &lt;em&gt;eras&lt;&#x2F;em&gt; (think: periods) will trigger NxM Github API
requests. An in-memory cache mitigates this, so that all these requests only
happen the first time, but using a plain database with a plain API would
probably be simpler at this point. Crafting my own requests would also be more
expressive and efficient, and allow for more features too (like displaying
the list of rooms on the start view).&lt;&#x2F;li&gt;
&lt;li&gt;provide a (better) command parser. Regular expressions are a bit feeble
and limited in this context. Also, right now, a module could in theory reuse the
same command triggers as another one, etc.&lt;&#x2F;li&gt;
&lt;li&gt;implement the chat modules in WebAssembly :-) In fact, I think there’s a
whole business model consisting of a bot framework that embeds a wasm VM
and interacts with different communication platforms (not restricted to
Matrix). Developers on such a bot platform could choose
which source language to use for developing their own modules. It ought to be
possible to define a clear, restricted, WASI-like capabilities-based
interface that gets passed to each chat module. In such a sandboxed
environment, the responsibility for hosting the bot’s code is decoupled from
the responsibility of writing modules. So a company could make the platform
available, and paying users would develop the modules and host them. Imagine
&lt;code&gt;git push&lt;&#x2F;code&gt;ing your chat modules and they get compiled to wasm and deployed on
the fly. But I digress! (Please do not forget to credit me with a large $$$
envelope&#x2F;a nice piece of swag if you implement this &lt;em&gt;at least&lt;&#x2F;em&gt; multi-billion
dollar idea.)&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;I’d like to finish by thanking the authors of the previous Mozilla bots, namely
&lt;strong&gt;sfink&lt;&#x2F;strong&gt; and &lt;strong&gt;glob&lt;&#x2F;strong&gt;: your puppets have been incredible sources of
inspiration. Also huge thanks to the people hanging out in the &lt;code&gt;matrix-bot-sdk&lt;&#x2F;code&gt;
&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;matrix.to&#x2F;#&#x2F;!matrix-bot-sdk:t2bot.io&quot;&gt;chat room&lt;&#x2F;a&gt;, who’ve answered questions and provided
help on a few occasions.&lt;&#x2F;p&gt;
&lt;p&gt;I hope you liked this presentation of Botzilla and its features! Of course, all
the code is free and open-source, including &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;bnjbvr&#x2F;botzilla&quot;&gt;the
bot&lt;&#x2F;a&gt; as well as &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;robotzilla&#x2F;histoire&quot;&gt;the histoire
frontend&lt;&#x2F;a&gt;. At this point it
addresses most of my needs, so I don’t have immediate plans to extend
it further. I’d happily take contributions, though, so feel free to chime in if
you’d like to implement anything! It’s also a breeze to run on any machine,
thanks to Docker-based deployment. Have fun with it!&lt;&#x2F;p&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;Karma is an IRC idiosyncrasy, in which users upvote or downvote other users
by suffixing their nickname with ++ or --. Karma tracking consists of
keeping scores and displaying them. &lt;a href=&quot;#fr-1-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;Histoire is French for both “history” and “story”. The name is inherited from Steve
Fink’s very own mrgiggles :-) &lt;a href=&quot;#fr-2-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;&#x2F;section&gt;
</content>
    </entry>
    <entry xml:lang="en">
        <title>Improving my Github workflow</title>
        <published>2019-10-10T18:00:42+00:00</published>
        <updated>2019-10-10T18:00:42+00:00</updated>
        
        <author>
          <name>
            
              Benjamin Bouvier
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://bouvier.cc/tech/github-workflow/"/>
        <id>https://bouvier.cc/tech/github-workflow/</id>
        <content type="html" xml:base="https://bouvier.cc/tech/github-workflow/">&lt;p&gt;Since I’ve been working on a &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;CraneStation&#x2F;Cranelift&quot;&gt;Github
project&lt;&#x2F;a&gt; for a while now, I
thought now would be a good time to gather ways to make it easier to work with
Github pull requests (PRs). In particular, it’s easy to drown yourself in the
incoming flow of Github emails.&lt;&#x2F;p&gt;
&lt;span id=&quot;continue-reading&quot;&gt;&lt;&#x2F;span&gt;
&lt;p&gt;This post is for you if:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;you get lost tracking which pull requests need your attention, be they
review requests or just mentions.&lt;&#x2F;li&gt;
&lt;li&gt;you would like to strike a better work-life balance when it comes to Github
notifications.&lt;&#x2F;li&gt;
&lt;li&gt;you would like to filter Github email notifications in smarter ways.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Here are a few tricks I’ve collected over the years that make working with
Github easier, focusing on Github notifications and emails, since they were
the largest issue for me. This is not an exhaustive list of all the nice
features Github has, or of all the WebExtensions that could help with Github:
just a few things that work for me and are worth sharing. Note that I go from
the most mundane to the most specific advice here.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;notifications-dashboard&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#notifications-dashboard&quot; aria-label=&quot;Anchor link for: notifications-dashboard&quot;&gt;🔗&lt;&#x2F;a&gt;Notifications dashboard&lt;&#x2F;h3&gt;
&lt;p&gt;If you’re working on several projects, Github can end up sending you too many
email notifications.&lt;&#x2F;p&gt;
&lt;p&gt;It’s possible to disable some kinds of notifications entirely &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;settings&#x2F;notifications&quot;&gt;in the
settings&lt;&#x2F;a&gt;, but that’s too radical
for my needs.&lt;&#x2F;p&gt;
&lt;p&gt;However, Github has a &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;notifications&quot;&gt;notification
dashboard&lt;&#x2F;a&gt; that displays all the activity
related to repositories you’re watching or issues&#x2F;pull-requests you’re involved
in. It’s easy to dismiss all the notifications of all projects at once, or per
project. A tab on the left lets you filter more precisely by your level of
involvement in the issue, e.g. whether you participated in it. You can also save
some notifications for later, so they’re not deleted once you’ve clicked them;
they’ll appear under the “Saved for later” tab. I just discovered this!&lt;&#x2F;p&gt;
&lt;p&gt;Note that Github may also send these notifications by email, if you’ve opted
for it. In this case, I’d strongly recommend allowing images to be downloaded
in Github emails. While this may hurt your privacy by enabling user
tracking, it also synchronizes the notifications’ read state between your
inbox and the dashboard, which is nice.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;bouvier.cc&#x2F;tech&#x2F;github-workflow&#x2F;github-notification-dashboard.png&quot; alt=&quot;Notification dashboard count&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;See how I am totally in control of my notifications? Truth is, I don’t need
notifications in general, because I’m usually more interested in reviews I need
to receive and give.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;pull-requests-dashboard&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#pull-requests-dashboard&quot; aria-label=&quot;Anchor link for: pull-requests-dashboard&quot;&gt;🔗&lt;&#x2F;a&gt;Pull requests dashboard&lt;&#x2F;h3&gt;
&lt;p&gt;Github lets you assign a reviewer to a pull request. At Mozilla, we
require a formal review for each change in the code base, unless it’s really
not meaningful (like removing trailing whitespace). Even documentation and
test changes may require a review, depending on the rules of the code module
you’re working on.&lt;&#x2F;p&gt;
&lt;p&gt;It is very common for a pull request to receive requests for additional
changes. In this case, once they are addressed, it is important to explicitly &lt;strong&gt;re-request a review&lt;&#x2F;strong&gt;,
otherwise this breaks all the review tracking Github offers.&lt;&#x2F;p&gt;
&lt;p&gt;Now Github has two interesting pages for this:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;a list of all the &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;pulls&quot;&gt;pull requests you have created&lt;&#x2F;a&gt;
and that aren’t closed, so you can assign reviewers and follow their progress
over time;&lt;&#x2F;li&gt;
&lt;li&gt;a list of all the &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;pulls&#x2F;review-requested&quot;&gt;pull requests you have been assigned to as a
reviewer&lt;&#x2F;a&gt;.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;navigating-files-quicker-addon&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#navigating-files-quicker-addon&quot; aria-label=&quot;Anchor link for: navigating-files-quicker-addon&quot;&gt;🔗&lt;&#x2F;a&gt;Navigating files quicker (addon)&lt;&#x2F;h3&gt;
&lt;p&gt;When I know my way around a project, I’ll frequently need to see the content of
a particular file or directory, that might be a few directories deep. On
Github, this means going to the files view, clicking once per directory (at
most), and finding the file I want.&lt;&#x2F;p&gt;
&lt;p&gt;The pull request view doesn’t show the directory hierarchy or which files of
which directories have been touched, which is a minor inconvenience too.&lt;&#x2F;p&gt;
&lt;p&gt;Good news, everyone! There is a WebExtension called
&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.octotree.io&#x2F;&quot;&gt;Octotree&lt;&#x2F;a&gt; that adds a directory view within a panel
to the left of Github’s UI. By default, it’s folded and doesn’t take much
space; you need to hover over it with the mouse to make it appear. On pull requests,
it shows the files that have been modified, with a diff summary for each file.
Note that the website shows features from the PRO version, but there’s a free
version that addresses the needs detailed above.&lt;&#x2F;p&gt;
&lt;p&gt;This is an example of the Octotree panel on our project’s repository:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;bouvier.cc&#x2F;tech&#x2F;github-workflow&#x2F;github-octotree.png&quot; alt=&quot;Octotree example&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;To be honest, I haven’t investigated using the search bar, which could be quite
handy for this too, especially thanks to keyboard shortcuts.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;dealing-with-work-and-personal-projects&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#dealing-with-work-and-personal-projects&quot; aria-label=&quot;Anchor link for: dealing-with-work-and-personal-projects&quot;&gt;🔗&lt;&#x2F;a&gt;Dealing with work and personal projects&lt;&#x2F;h3&gt;
&lt;p&gt;If you’re using Github for personal and work related projects, you might have
been bothered by work emails coming into your personal mailbox. That has
happened to me in the past, adding unnecessary mental load over the
weekend and breaking my state of relaxation.&lt;&#x2F;p&gt;
&lt;p&gt;Fortunately, Github allows you to &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;settings&#x2F;notifications#organization_routing&quot;&gt;redirect emails from a particular Github
Organization to a specific email
address&lt;&#x2F;a&gt;. Of
course, this only works when the repository is owned by an organization and
you’re part of this organization.&lt;&#x2F;p&gt;
&lt;p&gt;I’m lucky to work on such projects at the moment. It’s not a silver bullet
though, because some projects are owned by personal accounts, making
this trick useless. As far as I know, there is no good solution in this case.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;email-filters&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#email-filters&quot; aria-label=&quot;Anchor link for: email-filters&quot;&gt;🔗&lt;&#x2F;a&gt;Email filters&lt;&#x2F;h3&gt;
&lt;p&gt;The biggest remaining offender is certainly Github emails in general.
Fortunately, Github has made it easy to filter them. I’ll mention examples in
the Gmail email client, since that’s what we’re using at work, but the same
ideas apply to any other modern email client too.&lt;&#x2F;p&gt;
&lt;h4 id=&quot;filter-by-project&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#filter-by-project&quot; aria-label=&quot;Anchor link for: filter-by-project&quot;&gt;🔗&lt;&#x2F;a&gt;Filter by project&lt;&#x2F;h4&gt;
&lt;p&gt;Each email coming from a specific project carries a mailing-list &lt;code&gt;list-id&lt;&#x2F;code&gt;, which
is a standard email header that some email clients know how to interpret. For
instance, in Gmail, when you click on the small arrow next to the list of
recipients, you’ll see many details about the current email, including, if
there’s one, the “mailing-list” id, and a link to automatically create a filter
for this mailing-list. That allows you to create a particular directory&#x2F;tag in
which the filter can automatically put all the emails with this id.&lt;&#x2F;p&gt;
&lt;h4 id=&quot;filter-by-reason&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#filter-by-reason&quot; aria-label=&quot;Anchor link for: filter-by-reason&quot;&gt;🔗&lt;&#x2F;a&gt;Filter by reason&lt;&#x2F;h4&gt;
&lt;p&gt;In addition to filtering by project (and this is where Gmail tags &#x2F;
Thunderbird’s &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;support.mozilla.org&#x2F;en-US&#x2F;kb&#x2F;using-saved-searches&quot;&gt;saved
searches&lt;&#x2F;a&gt; truly
shine), it’s also possible to infer more information from the Github email
notifications, by looking at the list of recipients or custom email headers.&lt;&#x2F;p&gt;
&lt;p&gt;Indeed, when there’s a specific reason why an email was sent to you, Github
adds a (fake) recipient to the CC field, whose address username (the part
before the @) is that reason. For instance, in an email telling me that
somebody requested a review from me, the email address
&lt;code&gt;review_requested@github.com&lt;&#x2F;code&gt; will appear in the CC list. If you look at the
full message, you’ll also see the custom email header &lt;code&gt;X-GitHub-Reason&lt;&#x2F;code&gt; set to
&lt;code&gt;review_requested&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;All the possible reasons are &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;help.github.com&#x2F;en&#x2F;articles&#x2F;about-email-notifications#filtering-email-notifications&quot;&gt;detailed in Github’s
documentation&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;These extra CC email addresses and email headers allow creating very powerful
filters that add supplementary tags to an email. For me, they relate directly
to the &lt;em&gt;importance&lt;&#x2F;em&gt; of the incoming email: reviews and mentions are usually
something I pay very close attention to, and thus they get filtered in a
special top-level tag in Gmail.&lt;&#x2F;p&gt;
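&lt;p&gt;The same rules can be expressed independently of any particular mail client. The TypeScript sketch below assumes the message headers have already been parsed into an object with lower-cased names. The label names and the &lt;code&gt;githubLabels&lt;&#x2F;code&gt; function are hypothetical. The set of “important” reasons (&lt;code&gt;review_requested&lt;&#x2F;code&gt; and &lt;code&gt;mention&lt;&#x2F;code&gt;) is an assumption based on the reason list in Github’s documentation.&lt;&#x2F;p&gt;

```typescript
// Hypothetical labeling logic mirroring the filters described above.
// Headers are assumed to be parsed already, with lower-cased names.
type MailHeaders = { [name: string]: string | undefined };

// Reasons treated as high priority (assumed values from Github's reason list).
const IMPORTANT_REASONS = ["review_requested", "mention"];

function githubLabels(headers: MailHeaders): string[] {
  const labels: string[] = [];

  // Per-project label: the List-ID value is assumed to start with "owner/repo",
  // followed by a space.
  const project = headers["list-id"]?.trim().split(/\s+/)[0];
  if (project) {
    labels.push(`github/${project}`);
  }

  // Per-reason label, taken from the custom X-GitHub-Reason header.
  const reason = headers["x-github-reason"];
  if (reason) {
    labels.push(`reason/${reason}`);
    if (IMPORTANT_REASONS.includes(reason)) {
      labels.push("important");
    }
  }
  return labels;
}
```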
&lt;p&gt;Here’s an example of all the information you might find about a given email in
Gmail: in particular, look at the CC list and mailing-list type ids.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;bouvier.cc&#x2F;tech&#x2F;github-workflow&#x2F;github-email-example.png&quot; alt=&quot;Github email details in Gmail&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Note that Gitlab also adds some &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;docs.gitlab.com&#x2F;ee&#x2F;workflow&#x2F;notifications.html#email-headers&quot;&gt;similar custom
headers&lt;&#x2F;a&gt;
that can be filtered by some powerful email clients. I won’t go into detail
about those.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;one-more-thing-mozillian-edition&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#one-more-thing-mozillian-edition&quot; aria-label=&quot;Anchor link for: one-more-thing-mozillian-edition&quot;&gt;🔗&lt;&#x2F;a&gt;One more thing, Mozillian edition&lt;&#x2F;h3&gt;
&lt;p&gt;If you’re working on Mozilla code, Gecko and&#x2F;or external projects, there’s
this &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;mikeconley&#x2F;myqonly&#x2F;&quot;&gt;neat addon&lt;&#x2F;a&gt; that Mike Conley
made. It adds an icon to the Firefox toolbar, showing you the number
of requests assigned to you on Github and Phabricator, as well as the number of
pending Bugzilla requests.&lt;&#x2F;p&gt;
&lt;p&gt;It requires a minimal setup step for Github (filling your username) and
Bugzilla (adding a Bugzilla API token), and then it Just Works. It smartly
reuses a Phabricator token from the current Firefox Container’s session, if
there’s one.&lt;&#x2F;p&gt;
&lt;p&gt;You may think that having such a display all the time might provoke anxiety
during non-working hours. And you’d be right to think so! So the author of the
addon has added a feature to &lt;strong&gt;not&lt;&#x2F;strong&gt; display this information outside working
hours, that you can define as you like. Great stuff!&lt;&#x2F;p&gt;
&lt;h3 id=&quot;that-s-it-folks&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#that-s-it-folks&quot; aria-label=&quot;Anchor link for: that-s-it-folks&quot;&gt;🔗&lt;&#x2F;a&gt;That’s it, folks!&lt;&#x2F;h3&gt;
&lt;p&gt;Thanks for reading this far! I hope this helped you to some extent, allowing
you to spend less time in Github and more time doing the actual work. If you
have more interesting tips for using Github effectively, feel free to add a
comment or ping me on &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;twitter.com&#x2F;bnjbvr&quot;&gt;twitter&lt;&#x2F;a&gt;!&lt;&#x2F;p&gt;
</content>
    </entry>
    <entry xml:lang="en">
        <title>Making calls to WebAssembly fast and implementing anyref</title>
        <published>2018-07-04T18:00:42+00:00</published>
        <updated>2018-07-04T18:00:42+00:00</updated>
        
        <author>
          <name>
            
              Benjamin Bouvier
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://bouvier.cc/tech/mozilla-2018-faster-calls-and-anyref/"/>
        <id>https://bouvier.cc/tech/mozilla-2018-faster-calls-and-anyref/</id>
        <content type="html" xml:base="https://bouvier.cc/tech/mozilla-2018-faster-calls-and-anyref/">&lt;p&gt;Since this is the end of the first half-year, I think it is a good time to
reflect and show some work I’ve been doing over the last few months, apart from
the regular batch of random issues, security bugs, reviews and the fixing of 24
bugs found by our &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Fuzzing&quot;&gt;fuzzers&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;span id=&quot;continue-reading&quot;&gt;&lt;&#x2F;span&gt;&lt;h2 id=&quot;bug-1319203-make-js-to-webassembly-calls-blazingly-fast&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#bug-1319203-make-js-to-webassembly-calls-blazingly-fast&quot; aria-label=&quot;Anchor link for: bug-1319203-make-js-to-webassembly-calls-blazingly-fast&quot;&gt;🔗&lt;&#x2F;a&gt;Bug 1319203: Make JS to WebAssembly calls &lt;em&gt;blazingly&lt;&#x2F;em&gt; fast&lt;&#x2F;h2&gt;
&lt;p&gt;If we want more WebAssembly (wasm) adoption, there shouldn’t be a big costly
barrier between the two universes. That is, calls from one world to the other
should be fast. For a very long time, calls from JS to asm.js&#x2F;WebAssembly have
been quite slow in Firefox. In fact, we didn’t optimize them at all. For ease
and speed of implementation at the time, asm.js call activations (data
structures recording information about the function being currently called in
the VM) were very different from the JS ones. This difference indicated some
significant structural differences, like the capability to reconstruct call
stack information used by &lt;code&gt;Error()&lt;&#x2F;code&gt; stack frames, or just tracing the stack for
garbage collection purposes. After putting a lot of hard work into refactoring
and low-level changes over the last year, Spidermonkey was finally ripe for an
optimization.&lt;&#x2F;p&gt;
&lt;p&gt;When we call from JS to asm.js&#x2F;wasm, the call passes through C++, does a bunch of
work and then calls into a piece of glue code directly written in assembly: the
&lt;em&gt;interpreter entry stub&lt;&#x2F;em&gt;. This stub is quite small: it just copies the C++
arguments into the locations the called wasm function expects, sets up
some small machine state, calls into the function, then does error checking and
eventually returns to the C++ caller. The critical case is JIT-compiled code,
that is, JS code compiled to machine code by the just-in-time
compiler, IonMonkey. When a JIT-compiled JS function calls into
wasm, it has to go back to C++ first, before the control
flow is redirected to WebAssembly.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;bouvier.cc&#x2F;tech&#x2F;mozilla-2018-faster-calls-and-anyref&#x2F;2018-07-interpreter-stub.png&quot; alt=&quot;Diagram showing interpreter entry stub&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Starting with Firefox 60, the JIT compiler makes no distinction between
calling a JavaScript function or a WebAssembly function, meaning it uses the
same call optimizations for both kinds of function. A new piece of glue code,
the &lt;em&gt;JIT entry stub&lt;&#x2F;em&gt;, is generated for each exported function: it converts and
unboxes the arguments read from the JIT-compiled JS caller into the right
primitive types as expressed in the wasm function’s signature, sets up some
machine registers, calls into the wasm function being called and then converts
the result into a format the JS caller will understand.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;bouvier.cc&#x2F;tech&#x2F;mozilla-2018-faster-calls-and-anyref&#x2F;2018-07-jit-stub.png&quot; alt=&quot;Diagram showing JIT entry stub&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;As you can see, the C++ step that was originally required to call wasm from JS
has been completely eliminated!&lt;&#x2F;p&gt;
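&lt;p&gt;Here is a rough TypeScript model of the conversion work the JIT entry stub does. The real stub is machine code generated by SpiderMonkey. This sketch only mirrors the argument coercions that the WebAssembly JavaScript API specifies for the &lt;code&gt;i32&lt;&#x2F;code&gt;, &lt;code&gt;f32&lt;&#x2F;code&gt; and &lt;code&gt;f64&lt;&#x2F;code&gt; types. The function names are hypothetical.&lt;&#x2F;p&gt;

```typescript
// Conceptual model only: the actual JIT entry stub is generated machine code.
type WasmNumType = "i32" | "f32" | "f64";

// Mirrors the JS API's ToWebAssemblyValue for numeric types. Number() is used
// as an approximation of ToNumber (they differ for BigInt inputs, for instance).
function toWasmValue(value: unknown, type: WasmNumType): number {
  const n = Number(value); // undefined becomes NaN
  if (type === "i32") {
    return n | 0; // ToInt32: wraps modulo 2^32, NaN becomes 0
  }
  if (type === "f32") {
    return Math.fround(n); // rounds to the nearest float32 value
  }
  return n; // f64: the number is used as is
}

// A missing argument reads as undefined, just as it would for a JS callee.
function convertArguments(args: unknown[], signature: WasmNumType[]): number[] {
  return signature.map((type, i) => toWasmValue(args[i], type));
}
```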
&lt;p&gt;This resulted in massive speedups over a variety of different situations: when
a wasm function is directly &#x2F; indirectly &#x2F; polymorphically called, or used as a
getter&#x2F;setter, or called by &lt;code&gt;Function.prototype.call&#x2F;apply&lt;&#x2F;code&gt;, when the call is
missing required arguments, etc. Here’s a brief summary of the results, but
there might be a full-blown blog post about these optimizations coming on
&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;hacks.mozilla.org&#x2F;&quot;&gt;Mozilla Hacks&lt;&#x2F;a&gt; at some point in the future.
(calling 1 billion times into very simple functions, lower is better)&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;bouvier.cc&#x2F;tech&#x2F;mozilla-2018-faster-calls-and-anyref&#x2F;2018-07-wasm-calls.png&quot; alt=&quot;Charts showing evolution of performance&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;This work is not entirely done yet: we can optimize even further for calls
from JS when the called wasm function is definitely
known to be a unique wasm target; see the &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;bugzilla.mozilla.org&#x2F;show_bug.cgi?id=1437065&quot;&gt;tracking
bug&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;bug-1422043-lazy-entry-stub-generation&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#bug-1422043-lazy-entry-stub-generation&quot; aria-label=&quot;Anchor link for: bug-1422043-lazy-entry-stub-generation&quot;&gt;🔗&lt;&#x2F;a&gt;Bug 1422043: Lazy entry stub generation&lt;&#x2F;h2&gt;
&lt;p&gt;The fix for the previous bug came with a significant memory issue: every exported
function now generates a rather big chunk of code for the JIT entry, increasing
the memory occupied by the code itself. This is fine in most
situations, since the number of exported functions is generally low. But when
the wasm module exports a Table (think of the equivalent of a C++ function
table with signature checks), we have to assume that every single function,
including those not explicitly exported, needs entry stubs. Indeed, each
function can eventually be called through the Table, after calls to
&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;http:&#x2F;&#x2F;webassembly.github.io&#x2F;spec&#x2F;js-api&#x2F;index.html#dom-table-set&quot;&gt;WebAssembly.Table.set&lt;&#x2F;a&gt;.
In fact, the existing code already suffered from this because of the
interpreter entries, but the much larger JIT entry stubs amplified the
problem considerably.&lt;&#x2F;p&gt;
&lt;p&gt;To fix this, we’ve decided to lazily generate all the entry stubs for functions
exported through a table. That is, if a function is &lt;em&gt;explicitly&lt;&#x2F;em&gt; exported, its
stubs will be generated at wasm compile time, but other functions won’t have
stubs yet. If a non-exported function is called through a Table, we’ll generate
the entry stubs the first time it is called. This involves some fun
interactions with our &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;hacks.mozilla.org&#x2F;2018&#x2F;01&#x2F;making-webassembly-even-faster-firefoxs-new-streaming-and-tiering-compiler&#x2F;&quot;&gt;tiered
compilation&lt;&#x2F;a&gt;
mechanism, which can compile functions and create new entry stubs in the
background while the running thread generates lazy ones.&lt;&#x2F;p&gt;
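&lt;p&gt;The lazy scheme can be modeled as a cache that is filled eagerly for explicitly exported functions and on demand for the others. The TypeScript below is only a conceptual sketch. &lt;code&gt;compileStub&lt;&#x2F;code&gt; is a hypothetical stand-in for stub code generation. The sketch does not model the interaction with tiered compilation running on background threads.&lt;&#x2F;p&gt;

```typescript
// Conceptual model: stubs are plain closures standing in for generated code.
type Stub = (...args: number[]) => number;

class EntryStubCache {
  private stubs: (Stub | undefined)[] = [];
  private compileStub: (funcIndex: number) => Stub;
  private generated = 0; // number of stubs built so far

  // Explicitly exported functions get their stubs eagerly, at compile time.
  constructor(compileStub: (funcIndex: number) => Stub, exported: number[]) {
    this.compileStub = compileStub;
    for (const funcIndex of exported) {
      this.stubs[funcIndex] = this.build(funcIndex);
    }
  }

  // Any other function gets its stub the first time it is called via a Table.
  stubFor(funcIndex: number): Stub {
    const existing = this.stubs[funcIndex];
    if (existing !== undefined) {
      return existing;
    }
    const stub = this.build(funcIndex);
    this.stubs[funcIndex] = stub;
    return stub;
  }

  get generatedCount(): number {
    return this.generated;
  }

  private build(funcIndex: number): Stub {
    this.generated += 1;
    return this.compileStub(funcIndex);
  }
}
```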
&lt;p&gt;Not only did this fix the memory regression introduced by bug 1319203, it
actually made the situation even better than the baseline, because we didn’t
need to generate those interpreter entries for table-exported functions by
default anymore:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;bouvier.cc&#x2F;tech&#x2F;mozilla-2018-faster-calls-and-anyref&#x2F;2018-07-wasm-stubs-memory.png&quot; alt=&quot;Charts showing evolution of memory usage&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Since it’s not entirely readable from the chart: after the patches, the
AngryBots and ZenGarden entry stubs memory usage went down to 262 KB and
362 KB respectively. This was also a relatively large win in compilation time, but on
such a small scale that it didn’t make a noticeable difference to total compile time.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;bug-1447591-remove-wasm-binarytotext&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#bug-1447591-remove-wasm-binarytotext&quot; aria-label=&quot;Anchor link for: bug-1447591-remove-wasm-binarytotext&quot;&gt;🔗&lt;&#x2F;a&gt;Bug 1447591: Remove wasm::BinaryToText&lt;&#x2F;h2&gt;
&lt;p&gt;WebAssembly is a binary format, and there is an equivalent human-readable and
debuggable text format: the WebAssembly Text format, or &lt;em&gt;WAT&lt;&#x2F;em&gt; format. While
SpiderMonkey once directly produced WAT for display in C++, it’s now easier for
&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;devtools-html&#x2F;debugger.html&quot;&gt;debugger.html&lt;&#x2F;a&gt; to do so in JS.
This also made the mapping between bytecode offsets and text offsets (source
maps) more consistent with the display, and it could be useful in other places
where this project is being used. Recently, after confirming that the C++
implementation wasn’t used anymore, I was able to remove it. It’s not every day
that you get a net loss of around 5,500 lines of code, which is always nice:
less code means fewer bugs and less maintenance burden, especially when the code
is dead.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;bug-1445272-1450261-implement-basic-anyref-support&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#bug-1445272-1450261-implement-basic-anyref-support&quot; aria-label=&quot;Anchor link for: bug-1445272-1450261-implement-basic-anyref-support&quot;&gt;🔗&lt;&#x2F;a&gt;Bug 1445272 &#x2F; 1450261: Implement basic &lt;code&gt;anyref&lt;&#x2F;code&gt; support&lt;&#x2F;h2&gt;
&lt;p&gt;A few months ago, a new proposal was made to the WebAssembly specification
committee: to add &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;WebAssembly&#x2F;reference-types&quot;&gt;reference
types&lt;&#x2F;a&gt; to the type system.
Reference types are a new way to represent a reference to any &lt;em&gt;host&lt;&#x2F;em&gt; value. In
a Web environment, this means being able to manipulate JavaScript values
within WebAssembly. This is a huge difference from the existing type system,
which only contains primitive types: 32- and 64-bit integers, and 32- and 64-bit
IEEE 754 floating-point numbers. This is also a
first step towards implementing &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;http:&#x2F;&#x2F;github.com&#x2F;webassembly&#x2F;gc&quot;&gt;garbage
collection&lt;&#x2F;a&gt; (GC) integration within
WebAssembly: since these reference values have been allocated on the GC heap in
JavaScript, they need to be traced during wasm execution.&lt;&#x2F;p&gt;
&lt;p&gt;The basic implementation of this feature in the first bug allows one to use a
new type, called &lt;code&gt;anyref&lt;&#x2F;code&gt;, as part of a function’s signature or in local
variables, be it in a function definition or an imported function. This allows
using JS values within wasm and passing them around to other JS functions. The
second bug implemented the capability to read and write &lt;code&gt;anyref&lt;&#x2F;code&gt; values in wasm
Globals [1]. Since Globals can be manipulated outside of the wasm Module thanks
to their JS API, and garbage collections can happen at any time in JS, we
needed to implement GC barriers to make sure that the stored value would not be
marked as unused during tracing. There is good literature explaining why these
barriers are needed and what they do, so I will not expand too much on the
topic.&lt;&#x2F;p&gt;
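&lt;p&gt;To illustrate why such barriers matter, here is a toy sketch of one common scheme, the snapshot-at-the-beginning pre-write barrier used by incremental markers. It is a simplified, hypothetical model, not SpiderMonkey’s actual implementation; &lt;code&gt;ToyHeap&lt;&#x2F;code&gt;, &lt;code&gt;write&lt;&#x2F;code&gt; and &lt;code&gt;scenario&lt;&#x2F;code&gt; are made-up names. While marking is interleaved with the running program, the program moves the only reference to an object into a slot that was already scanned; without the barrier, that object is never marked.&lt;&#x2F;p&gt;

```javascript
// Toy incremental marker with a snapshot-at-the-beginning pre-write barrier.
// Hypothetical model for illustration only, not SpiderMonkey's implementation.
class ToyHeap {
  constructor(barriersEnabled) {
    this.barriersEnabled = barriersEnabled;
    this.marked = new Set();
    this.grayStack = []; // objects reached but not scanned yet
    this.marking = false;
  }
  startMarking(roots) {
    this.marking = true;
    for (const root of roots) this.markGray(root);
  }
  markGray(obj) {
    if (obj == null || this.marked.has(obj)) return; // null/undefined: nothing to mark
    this.marked.add(obj);
    this.grayStack.push(obj);
  }
  // Scan at most `budget` objects, then yield back to the running program.
  step(budget) {
    for (; budget > 0; budget--) {
      const obj = this.grayStack.pop();
      if (obj === undefined) return; // nothing left to scan
      for (const child of obj.fields) this.markGray(child);
    }
  }
  finishMarking() {
    this.step(Infinity);
    this.marking = false;
  }
  // Every store into a heap slot (think: a wasm Global holding an anyref)
  // goes through here. The pre-barrier marks the value about to be overwritten.
  write(obj, slot, value) {
    if (this.marking) {
      if (this.barriersEnabled) this.markGray(obj.fields[slot]);
    }
    obj.fields[slot] = value;
  }
}

function scenario(barriersEnabled) {
  const heap = new ToyHeap(barriersEnabled);
  const c = { fields: [] };
  const a = { fields: [null] };
  const b = { fields: [c] };
  heap.startMarking([b, a]); // a ends up on top of the stack
  heap.step(1);              // a is scanned while it holds no reference yet
  heap.write(a, 0, c);       // the program stores c into the already-scanned a...
  heap.write(b, 0, null);    // ...and drops the only other reference to c
  heap.finishMarking();
  return heap.marked.has(c); // c is reachable from a: was it marked?
}

console.log(scenario(false)); // false: c would be freed while still in use
console.log(scenario(true));  // true: the barrier marked c before it was overwritten
```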
&lt;p&gt;Here’s an example of usage according to latest spec drafts (and therefore
subject to change for now):&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;common-lisp&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;(module&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    (func $alert (&lt;&#x2F;span&gt;&lt;span style=&quot;color: #0DB9D7;&quot;&gt;import&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ECE6A;&quot;&gt;env&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;&amp;quot; &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ECE6A;&quot;&gt;alert&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;) (param anyref))&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    (global $global_ref (mut anyref) (ref.null anyref))&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    (func (&lt;&#x2F;span&gt;&lt;span style=&quot;color: #0DB9D7;&quot;&gt;export&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ECE6A;&quot;&gt;set_and_alert&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;) (param $param anyref) (result anyref)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;        ;; Put the previous value of $global_ref on the virtual value stack.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        get_global $global_ref&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;        ;; Get the argument anyref value and store it in $global_ref.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        get_local $param&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        set_global $global_ref&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;        ;; Call the $alert method with the argument anyref value.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        get_local $param&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        call $alert&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;        ;; The previous value of $global_ref is still on the stack and will be&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;        ;; returned.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    )&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;em&gt;Example of wasm text format using &lt;code&gt;anyref&lt;&#x2F;code&gt;.&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;javascript&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9D7CD8;font-style: italic;&quot;&gt;async&lt;&#x2F;span&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt; function&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;() {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #9D7CD8;font-style: italic;&quot;&gt;    let&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; {&lt;&#x2F;span&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt; instance&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; } =&lt;&#x2F;span&gt;&lt;span style=&quot;color: #BB9AF7;font-style: italic;&quot;&gt; await&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; WebAssembly&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;instantiate&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;wasmBinary&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #73DACA;&quot;&gt;        env&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;            alert&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #E0AF68;&quot;&gt;obj&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;) {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;                alert&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;`&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ECE6A;&quot;&gt;Hello, &lt;&#x2F;span&gt;&lt;span style=&quot;color: #7DCFFF;&quot;&gt;${&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;obj&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7DCFFF;&quot;&gt;name}&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ECE6A;&quot;&gt;!&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;`&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;)&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;            }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;        }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;    })&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;    console&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;log&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;instance&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7DCFFF;&quot;&gt;exports&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;set_and_alert&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;({&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #73DACA;&quot;&gt;        name&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;: &amp;#39;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ECE6A;&quot;&gt;world&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;&amp;#39;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #73DACA;&quot;&gt;        secretVal&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt; 42&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;    }))&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; alerts &amp;quot;Hello, world!&amp;quot;, logs null&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;    console&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;log&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #0DB9D7;&quot;&gt;JSON&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;stringify&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;instance&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7DCFFF;&quot;&gt;exports&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;set_and_alert&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;({&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #73DACA;&quot;&gt;        name&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;: &amp;#39;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ECE6A;&quot;&gt;there&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;    })))&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; alerts &amp;quot;Hello, there!&amp;quot;, logs {&amp;quot;name&amp;quot;:&amp;quot;world&amp;quot;,&amp;quot;secretVal&amp;quot;:42}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;})()&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;em&gt;Example of JavaScript using the module defined above, passing JS values and
reading them from WebAssembly.&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;p&gt;This is a very preliminary prototype and it might change in the next few
months. If you feel adventurous, you can try it on Firefox Nightly by setting
the &lt;code&gt;about:config&lt;&#x2F;code&gt; pref &lt;code&gt;javascript.options.wasm_gc&lt;&#x2F;code&gt; to &lt;code&gt;true&lt;&#x2F;code&gt;; note that we
haven’t fully hooked this up to garbage collection yet, so your experimentation
might occasionally throw out-of-memory exceptions. In any case, if you see
something, &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;bugzilla.mozilla.org&#x2F;enter_bug.cgi?product=Core&amp;amp;component=Javascript%3A%20Web%20Assembly&quot;&gt;say
something&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;Say you are a compiler developer, and you would like to port your language to
WebAssembly, and your language uses a GC. At the moment, the only way you can
do this is by compiling your garbage collector to WebAssembly, and it would be
backed by the wasm Module’s memory. This works, but it won’t be very efficient.
Plus, there’s already a very efficient, solidly tested, constantly improving
garbage collector in your browser that uses all the possible dirty low-level
tricks known to mankind, which is the GC being used for JavaScript. What if we
could give you access to the garbage collector directly? Then you’d just need
a way to define structures, and a set of opcodes to
allocate them, read and write their fields, etc. At the moment, the reference
types proposal only allows you to move garbage-collected values around. There’s
also code in Firefox Nightly to experiment with defining your own data
structures and using them, but it is very, very early. If you’re interested in
following us implementing more parts, this &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;bugzilla.mozilla.org&#x2F;show_bug.cgi?id=1444925&quot;&gt;tracking
issue&lt;&#x2F;a&gt; might be of
interest.&lt;&#x2F;p&gt;
&lt;p&gt;[1] Think of a C++ “global” value, not a JavaScript “global”.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;future-work&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#future-work&quot; aria-label=&quot;Anchor link for: future-work&quot;&gt;🔗&lt;&#x2F;a&gt;Future work&lt;&#x2F;h2&gt;
&lt;p&gt;There is still much more work to be done on the implementation of WebAssembly
in SpiderMonkey: implementing other new proposals, making it faster, or
generating even better code.&lt;&#x2F;p&gt;
&lt;p&gt;A big thank you for the proofreading to &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;http:&#x2F;&#x2F;whereswalden.com&#x2F;&quot;&gt;Waldo&lt;&#x2F;a&gt;,
&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;steveklabnik&quot;&gt;steveklabnik&lt;&#x2F;a&gt; and
&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;twitter.com&#x2F;ag_dubs&quot;&gt;ashleygwilliams&lt;&#x2F;a&gt;. Extra thanks go to
Ashley who also drew the two diagrams showing how stubs evolved.&lt;&#x2F;p&gt;
</content>
    </entry>
    <entry xml:lang="en">
        <title>Making asm.js&#x2F;WebAssembly compilation more parallel in Firefox</title>
        <published>2016-04-22T15:00:42+00:00</published>
        <updated>2016-04-22T15:00:42+00:00</updated>
        
        <author>
          <name>
            
              Benjamin Bouvier
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://bouvier.cc/tech/making-asmjs-webassembly-compilation-more-parallel/"/>
        <id>https://bouvier.cc/tech/making-asmjs-webassembly-compilation-more-parallel/</id>
        <content type="html" xml:base="https://bouvier.cc/tech/making-asmjs-webassembly-compilation-more-parallel/">&lt;p&gt;In December 2015, I worked on reducing startup time of asm.js programs in
Firefox by making compilation more parallel. As our
JavaScript engine, SpiderMonkey, uses the same compilation pipeline for both
asm.js and WebAssembly, this also benefited WebAssembly compilation. Now is a
good time to talk about what this means, how it was achieved, and what the
next ideas are to make it even faster.&lt;&#x2F;p&gt;
&lt;span id=&quot;continue-reading&quot;&gt;&lt;&#x2F;span&gt;&lt;h1 id=&quot;what-does-it-mean-to-make-a-program-more-parallel&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#what-does-it-mean-to-make-a-program-more-parallel&quot; aria-label=&quot;Anchor link for: what-does-it-mean-to-make-a-program-more-parallel&quot;&gt;🔗&lt;&#x2F;a&gt;What does it mean to make a program “more parallel”?&lt;&#x2F;h1&gt;
&lt;p&gt;Parallelization consists of splitting a sequential program into smaller
independent tasks, then having them run on different CPUs. If your program
is using &lt;code&gt;N&lt;&#x2F;code&gt; cores, it can be up to &lt;code&gt;N&lt;&#x2F;code&gt; times faster.&lt;&#x2F;p&gt;
&lt;p&gt;Well, in theory. Let’s say you’re in a car, driving on a 100 km long road.
You’ve already driven the first 50 km in one hour. Let’s say your car can
have unlimited speed from now on. What is the maximal average speed you can
reach, once you get to the end of the road?&lt;&#x2F;p&gt;
&lt;p&gt;People intuitively answer “If it can go as fast as I want, then something close to lightspeed
sounds plausible”. But this is not true! In fact, if you could teleport from
your current position to the end of the road, you’d have traveled 100 km in one
hour, so your maximal theoretical average speed is 100 km per hour. This result is a
consequence of &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Amdahl%27s_law&quot;&gt;Amdahl’s law&lt;&#x2F;a&gt;.
When we get back to our initial problem, this means you can expect an &lt;code&gt;N&lt;&#x2F;code&gt; times
speedup when running your program on &lt;code&gt;N&lt;&#x2F;code&gt; cores if, and only if, your
program can be &lt;strong&gt;entirely&lt;&#x2F;strong&gt; run in parallel. This is usually not the case, and
that is why most wording refers to &lt;em&gt;speedups &lt;strong&gt;up to&lt;&#x2F;strong&gt; N times faster&lt;&#x2F;em&gt; when it
comes to parallelization.&lt;&#x2F;p&gt;
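&lt;p&gt;Amdahl’s law can be written as &lt;code&gt;speedup = 1 &#x2F; ((1 - p) + p &#x2F; N)&lt;&#x2F;code&gt;, where &lt;code&gt;p&lt;&#x2F;code&gt; is the fraction of the work that runs in parallel on &lt;code&gt;N&lt;&#x2F;code&gt; cores. Here is a minimal sketch computing it next to the car example; the function names are made up for illustration. In the car example, the hour already spent plays the role of the sequential part that no amount of speed can shrink.&lt;&#x2F;p&gt;

```javascript
// Amdahl's law: speedup on n cores when a fraction p of the work runs
// perfectly in parallel and the remaining (1 - p) stays sequential.
function amdahlSpeedup(p, n) {
  return 1 / ((1 - p) + p / n);
}

// Car example: average speed over the whole trip, given the time already
// spent and the speed for the remaining distance.
function averageSpeedKmh(totalKm, elapsedHours, remainingKm, speedKmh) {
  const remainingHours = remainingKm / speedKmh; // 0 when speed is Infinity
  return totalKm / (elapsedHours + remainingHours);
}

console.log(amdahlSpeedup(1, 8));                   // 8: fully parallel program
console.log(amdahlSpeedup(0.5, 8));                 // ~1.78
console.log(amdahlSpeedup(0.5, Infinity));          // 2: the sequential half caps it
console.log(averageSpeedKmh(100, 1, 50, Infinity)); // 100: teleporting caps at 100 km/h
```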
&lt;p&gt;Now, say your program is already running some portions in parallel. To make it
faster, you can identify some parts of the program that are sequential, and make
them independent so that they can run in parallel. With respect to our car
metaphor, this means lengthening the portion of the road on which you can drive at
unlimited speed.&lt;&#x2F;p&gt;
&lt;p&gt;This is exactly what we have done with parallel compilation of asm.js programs
under Firefox.&lt;&#x2F;p&gt;
&lt;h1 id=&quot;a-quick-look-at-the-asm-js-compilation-pipeline&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#a-quick-look-at-the-asm-js-compilation-pipeline&quot; aria-label=&quot;Anchor link for: a-quick-look-at-the-asm-js-compilation-pipeline&quot;&gt;🔗&lt;&#x2F;a&gt;A quick look at the asm.js compilation pipeline&lt;&#x2F;h1&gt;
&lt;p&gt;I recommend reading this &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;blog.mozilla.org&#x2F;luke&#x2F;2014&#x2F;01&#x2F;14&#x2F;asm-js-aot-compilation-and-startup-performance&#x2F;&quot;&gt;blog
post&lt;&#x2F;a&gt;.
It clearly explains the differences between JIT (Just In Time) and AOT (Ahead
Of Time) compilation, and elaborates on the different parts of the engines
involved in the compilation pipeline.&lt;&#x2F;p&gt;
&lt;p&gt;As a TL;DR, keep in mind that &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;http:&#x2F;&#x2F;asmjs.org&#x2F;&quot;&gt;asm.js&lt;&#x2F;a&gt; is a strictly
validated, highly optimizable, typed subset of JavaScript. Once
validated, it guarantees high performance and stability (no garbage collector
involved!). That is ensured by
mapping every single JavaScript instruction of this subset to a few CPU
instructions, if not only a single instruction. This means an asm.js program
needs to get &lt;em&gt;compiled&lt;&#x2F;em&gt; to machine code, that is, translated from JavaScript to
the language your CPU directly manipulates (like what GCC would do for a C++
program). If you haven’t heard, the results are impressive and you can run
&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;http:&#x2F;&#x2F;beta.unity3d.com&#x2F;jonas&#x2F;DT2&#x2F;&quot;&gt;video&lt;&#x2F;a&gt;
&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.unrealengine.com&#x2F;html5&quot;&gt;games&lt;&#x2F;a&gt; directly in your browser, without
needing to install anything. No plugins. Nothing more than your usual, everyday
browser.&lt;&#x2F;p&gt;
&lt;p&gt;Because asm.js programs can be gigantic in size (in number of functions as well
as in number of lines of code), the first compilation of the entire program is
going to take some time. Afterwards, Firefox uses a caching mechanism that
avoids recompilation and loads the code almost instantaneously, so
subsequent loads matter less*. The end user will mostly wait for the
first compilation, so that one needs to be fast.&lt;&#x2F;p&gt;
&lt;p&gt;Before the work explained below, the pipeline for compiling a single function
(out of an asm.js module) would look like this:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;parse the function, and as we parse, emit intermediate representation (IR)
nodes for the compiler infrastructure. SpiderMonkey has several IRs,
including the MIR (middle-level IR, mostly carrying semantic information) and the LIR
(low-level IR closer to the CPU memory representation: registers, stack,
etc.). The one generated here is the MIR. All of this happens on the main
thread.&lt;&#x2F;li&gt;
&lt;li&gt;once the entire IR graph is generated for the function, optimize the MIR
graph (i.e. apply a few optimization passes). Then, generate the LIR graph
before carrying out register allocation (probably the most costly task of the
pipeline). This can be done on supplementary helper threads, as the MIR
optimization and LIR generation for a given function don’t depend on other
functions.&lt;&#x2F;li&gt;
&lt;li&gt;since functions can call each other within an asm.js module, they
need references to each other. In assembly, a reference is merely an offset
to somewhere else in memory. In this initial implementation, code generation
is carried out on the main thread, at the cost of speed but for the sake of
simplicity.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;So far, only the MIR optimization passes, register allocation and LIR
generation were done in parallel. Wouldn’t it be nice to be able to do more?&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;*&lt;&#x2F;strong&gt; There are conditions for benefitting from the caching mechanism. In
particular, the script should be loaded
&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;developer.mozilla.org&#x2F;en-US&#x2F;docs&#x2F;Games&#x2F;Techniques&#x2F;Async_scripts&quot;&gt;asynchronously&lt;&#x2F;a&gt;
and it should be large enough.&lt;&#x2F;p&gt;
&lt;h1 id=&quot;doing-more-in-parallel&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#doing-more-in-parallel&quot; aria-label=&quot;Anchor link for: doing-more-in-parallel&quot;&gt;🔗&lt;&#x2F;a&gt;Doing more in parallel&lt;&#x2F;h1&gt;
&lt;p&gt;Our goal is to do more work in parallel: can we move MIR generation
off the main thread? Can we move code generation off it as well?&lt;&#x2F;p&gt;
&lt;p&gt;The answer happens to be &lt;em&gt;yes&lt;&#x2F;em&gt; to both questions.&lt;&#x2F;p&gt;
&lt;p&gt;For the former, instead of emitting a MIR graph as we parse the function’s
body, we emit a small, compact, pre-order representation of the function’s
body. In short, a new IR. As work was starting on
&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;webassembly&#x2F;design&quot;&gt;WebAssembly&lt;&#x2F;a&gt; (wasm) at this time, and
since asm.js semantics and wasm semantics mostly match, the IR could just be
the wasm
&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;WebAssembly&#x2F;design&#x2F;blob&#x2F;master&#x2F;BinaryEncoding.md&quot;&gt;encoding&lt;&#x2F;a&gt;,
consisting of the wasm opcodes plus a few specific asm.js ones*. Then, wasm
is translated to MIR in another thread.&lt;&#x2F;p&gt;
&lt;p&gt;Now, instead of parsing and generating MIR in a single pass, we parse
and generate wasm IR in one pass, and generate the MIR out of the wasm IR in
another pass. The wasm IR is very compact and much cheaper to generate than a
full MIR graph, because generating a MIR graph needs some algorithmic work,
including the creation of Phi nodes (join values after any form of branching).
As a result, compilation time is not expected to suffer. This was a
large refactoring: taking every single asm.js instruction, encoding it
in a compact way, and later decoding it into the equivalent MIR nodes.&lt;&#x2F;p&gt;
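&lt;p&gt;The two-pass idea can be sketched with a toy pre-order encoding: the first pass writes each operator before its operands into a flat array, and the second pass walks that array recursively to rebuild the tree. This is an illustration under assumptions: the opcodes are made up (they are not real wasm opcodes), a real encoder would emit bytes, and building a string stands in for creating MIR nodes.&lt;&#x2F;p&gt;

```javascript
// Toy pre-order encoding of an expression tree. Opcodes are hypothetical.
const Op = { CONST: 0, ADD: 1, MUL: 2, GET_LOCAL: 3 };

// Pass 1 (while parsing): emit the operator first, then its operands.
function encode(node, out = []) {
  switch (node.kind) {
    case "const": out.push(Op.CONST, node.value); break;
    case "local": out.push(Op.GET_LOCAL, node.index); break;
    case "add":
    case "mul":
      out.push(node.kind === "add" ? Op.ADD : Op.MUL);
      encode(node.lhs, out);
      encode(node.rhs, out);
      break;
    default: throw new Error(`unknown node kind ${node.kind}`);
  }
  return out;
}

// Pass 2 (could run on another thread): decode the flat array back into a
// tree. A string stands in for the MIR nodes a real compiler would build.
function decode(code) {
  let pc = 0;
  function expr() {
    const op = code[pc++];
    switch (op) {
      case Op.CONST: return String(code[pc++]);
      case Op.GET_LOCAL: return `local${code[pc++]}`;
      case Op.ADD:
      case Op.MUL: {
        const lhs = expr();
        const rhs = expr();
        return `(${lhs} ${op === Op.ADD ? "+" : "*"} ${rhs})`;
      }
      default: throw new Error(`bad opcode ${op} at ${pc - 1}`);
    }
  }
  return expr();
}

// (x + 2) * 3, where x is local 0
const tree = {
  kind: "mul",
  lhs: { kind: "add", lhs: { kind: "local", index: 0 }, rhs: { kind: "const", value: 2 } },
  rhs: { kind: "const", value: 3 },
};
const bytes = encode(tree);
console.log(bytes.join(", ")); // 2, 1, 3, 0, 0, 2, 0, 3
console.log(decode(bytes));    // ((local0 + 2) * 3)
```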
&lt;p&gt;For the second part, could we generate code on other threads? One structure in
the code base, the &lt;em&gt;MacroAssembler&lt;&#x2F;em&gt;, is used to generate all the code and it
contains all necessary metadata about offsets. By adding more metadata there to
abstract internal calls &lt;strong&gt;**&lt;&#x2F;strong&gt;, we can describe the new scheme in terms of a
classic functional &lt;code&gt;map&lt;&#x2F;code&gt;&#x2F;&lt;code&gt;reduce&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;the wasm IR is sent to a thread, which will return a MacroAssembler. That
is a &lt;code&gt;map&lt;&#x2F;code&gt; operation, transforming an array of wasm IR into an array of
MacroAssemblers.&lt;&#x2F;li&gt;
&lt;li&gt;When a thread is done compiling, we merge its MacroAssembler into one big
MacroAssembler. Most of the merge consists of taking all the offset metadata
in the thread’s MacroAssembler, fixing up all the offsets, and concatenating the
two generated code buffers. This is equivalent to a &lt;code&gt;reduce&lt;&#x2F;code&gt; operation,
merging each MacroAssembler into the module’s one.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;At the end of the compilation of the entire module, there is still some light
work to be done: offsets of internal calls need to be translated to their
actual locations. All this work has been done in &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;bugzilla.mozilla.org&#x2F;show_bug.cgi?id=1181612&quot;&gt;this bugzilla
bug&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;*&lt;&#x2F;strong&gt; In fact, at the time when this was being done, we used a different
superset of wasm. Since then, work has been done so that our asm.js frontend is
really just another wasm emitter.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;**&lt;&#x2F;strong&gt; Referencing functions by their index in order of appearance in the module,
rather than by an offset to the actual start of the function. Unlike code offsets,
this index is stable: it does not depend on where other functions’ code ends up.&lt;&#x2F;p&gt;
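&lt;p&gt;The &lt;code&gt;map&lt;&#x2F;code&gt;&#x2F;&lt;code&gt;reduce&lt;&#x2F;code&gt; scheme and the final linking step can be modeled in a few lines of JavaScript. This is a toy sketch, not SpiderMonkey’s MacroAssembler API: a per-function “assembler” is just an array of pseudo-instructions plus call-site metadata, and &lt;code&gt;compileFunction&lt;&#x2F;code&gt;, &lt;code&gt;merge&lt;&#x2F;code&gt; and &lt;code&gt;link&lt;&#x2F;code&gt; are hypothetical names. Calls record the callee’s index, so merging can happen in whatever order functions finish compiling.&lt;&#x2F;p&gt;

```javascript
// map: compile one function in isolation. Internal calls are recorded as
// (offset in this function's code, callee index) because the callee's final
// location is not known yet.
function compileFunction(fn) {
  const code = [];
  const callSites = [];
  for (const instr of fn.body) {
    if (instr.call !== undefined) {
      callSites.push({ offset: code.length, callee: instr.call });
      code.push("call ?"); // placeholder, patched at link time
    } else {
      code.push(instr.op);
    }
  }
  return { funcIndex: fn.index, code, callSites };
}

// reduce: append a per-function assembler to the module's one, shifting its
// call-site offsets by the position where its code lands.
function merge(moduleMasm, masm) {
  const base = moduleMasm.code.length;
  moduleMasm.funcStart[masm.funcIndex] = base;
  moduleMasm.code.push(...masm.code);
  for (const site of masm.callSites) {
    moduleMasm.callSites.push({ offset: base + site.offset, callee: site.callee });
  }
  return moduleMasm;
}

// Final link: every function's start offset is now known, so patch the calls.
function link(moduleMasm) {
  for (const { offset, callee } of moduleMasm.callSites) {
    moduleMasm.code[offset] = `call @${moduleMasm.funcStart[callee]}`;
  }
  return moduleMasm.code;
}

const functions = [
  { index: 0, body: [{ op: "push 1" }, { call: 1 }, { op: "ret" }] },
  { index: 1, body: [{ op: "add" }, { op: "ret" }] },
];
// Pretend function 1 finished compiling first and got merged first.
const compiled = functions.map(compileFunction).reverse();
const moduleMasm = compiled.reduce(merge, { code: [], callSites: [], funcStart: [] });
console.log(link(moduleMasm).join("; ")); // add; ret; push 1; call @0; ret
```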
&lt;h1 id=&quot;results&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#results&quot; aria-label=&quot;Anchor link for: results&quot;&gt;🔗&lt;&#x2F;a&gt;Results&lt;&#x2F;h1&gt;
&lt;p&gt;Benchmarking has been done on a Linux x64 machine with 8 cores clocked at 4.2
GHz.&lt;&#x2F;p&gt;
&lt;p&gt;First, compilation times of a few massive asm.js games:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;http:&#x2F;&#x2F;beta.unity3d.com&#x2F;jonas&#x2F;DT2&#x2F;&quot;&gt;DeadTrigger2&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;http:&#x2F;&#x2F;beta.unity3d.com&#x2F;jonas&#x2F;AngryBots&#x2F;&quot;&gt;AngryBots&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;lukewagner&#x2F;PlatformerGamePacked&quot;&gt;Platformer game&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.unrealengine.com&#x2F;html5&quot;&gt;Tappy Chicken&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;The X axis is the compilation time in seconds, so lower is better. Each data
point is the best of three runs. For the new scheme, the corresponding
relative speedup (in percentage) has been added:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;bouvier.cc&#x2F;tech&#x2F;making-asmjs-webassembly-compilation-more-parallel&#x2F;2016-04-22_parallelization-times.png&quot; alt=&quot;Compilation times of various benchmarks&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;For all games, compilation is much faster with the new parallelization scheme.&lt;&#x2F;p&gt;
&lt;p&gt;Now, let’s go a bit deeper. The Linux CLI tool &lt;code&gt;perf&lt;&#x2F;code&gt; has a &lt;code&gt;stat&lt;&#x2F;code&gt; command
that reports the average number of CPUs utilized during the program’s
execution. This is a great measure of threading efficiency: the more a CPU is
utilized, the less time it spends idle waiting for other results, and thus the more
useful work it does. For a constant amount of work, the more CPUs are utilized, the more
likely the program will execute quickly.&lt;&#x2F;p&gt;
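&lt;p&gt;As far as I know, &lt;code&gt;perf stat&lt;&#x2F;code&gt; derives its “CPUs utilized” figure from the total CPU time of the program’s threads (its &lt;code&gt;task-clock&lt;&#x2F;code&gt; counter) divided by the elapsed wall-clock time. Assuming that definition, here is a small sketch with made-up timings showing why, for the same total work, a higher ratio means a shorter run; &lt;code&gt;cpusUtilized&lt;&#x2F;code&gt; is a hypothetical helper.&lt;&#x2F;p&gt;

```javascript
// "CPUs utilized": total CPU time consumed by all threads divided by
// wall-clock time. All timings below are made-up numbers.
function cpusUtilized(threadBusyMs, wallClockMs) {
  const totalBusyMs = threadBusyMs.reduce((sum, ms) => sum + ms, 0);
  return totalBusyMs / wallClockMs;
}

// 2500 ms of total work on 4 threads; helpers often idle waiting for work.
console.log(cpusUtilized([1000, 600, 500, 400], 1000)); // 2.5
// The same 2500 ms of work, better spread: less idling, shorter run.
console.log(cpusUtilized([700, 650, 600, 550], 700));   // ~3.57
```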
&lt;p&gt;The X axis shows the number of utilized CPUs, according to the &lt;code&gt;perf stat&lt;&#x2F;code&gt;
command, so higher is better. Again, each data point is the best of three
runs.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;bouvier.cc&#x2F;tech&#x2F;making-asmjs-webassembly-compilation-more-parallel&#x2F;2016-04-22_parallelization-cpu-utilized.png&quot; alt=&quot;CPU utilized on DeadTrigger2&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;With the older scheme, the number of utilized CPUs quickly rises from 1 to 4
cores, then more slowly from 5 cores and beyond. Intuitively, this means that
with 8 cores, we almost reached the theoretical limit of the portion of the
program that can be made parallel (not considering the overhead introduced by
parallelization or altering the scheme).&lt;&#x2F;p&gt;
&lt;p&gt;But with the newer scheme, we get much more CPU usage even beyond 6 cores! The
growth then slows down a bit, although it remains steeper than the slow rise of
the older scheme. So with even more threads, we could likely get even better
speedups than the ones reported above. In fact, we have pushed the theoretical
limit mentioned above a bit further: we have expanded the portion of the program
that can be made parallel. Or, to keep using the initial car&#x2F;road metaphor,
we’ve shortened the constant-speed portion of the road in favor of the
unlimited-speed portion, resulting in a shorter trip overall.&lt;&#x2F;p&gt;
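&lt;p&gt;The car&#x2F;road metaphor corresponds to what is commonly known as Amdahl’s law. As a rough sketch, with hypothetical fractions rather than values measured on the games above, growing the parallel fraction &lt;code&gt;p&lt;&#x2F;code&gt; of the work from 80% to 90% raises the best possible speedup on 8 cores from about 3.3 to about 4.7, and the sequential part always caps the speedup, however many cores are available:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;javascript&quot;&gt;// A sketch of Amdahl’s law: if a fraction p of the work can run in parallel
// on n cores while the remaining (1 - p) stays sequential, the best possible
// speedup over a single core is 1 / ((1 - p) + p / n).
function amdahlSpeedup(p, n) {
  return 1 / ((1 - p) + p / n);
}

// Hypothetical parallel fractions, not measured on the benchmarks above.
console.log(amdahlSpeedup(0.8, 8).toFixed(2)); // 3.33
console.log(amdahlSpeedup(0.9, 8).toFixed(2)); // 4.71
// With unlimited cores, the sequential part caps the speedup at 1 / (1 - p).
console.log(amdahlSpeedup(0.9, Infinity).toFixed(2)); // 10.00&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;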
&lt;h1 id=&quot;future-steps&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#future-steps&quot; aria-label=&quot;Anchor link for: future-steps&quot;&gt;🔗&lt;&#x2F;a&gt;Future steps&lt;&#x2F;h1&gt;
&lt;p&gt;Despite these improvements, compilation time can still be a pain, especially on
mobile. This is mostly because we’re running a whole multi-million-line
codebase through the backend of a compiler to generate optimized code.
Following this work, the next bottleneck in the compilation process is
parsing, which matters for asm.js in particular, since its source is plain text.
Decoding WebAssembly is an order of magnitude faster, though, and it can be made
even faster. Moreover, we have even more load-time optimizations coming down
the pipeline!&lt;&#x2F;p&gt;
&lt;p&gt;In the meantime, we keep improving the WebAssembly backend. Keep track of
our progress on &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;bugzilla.mozilla.org&#x2F;show_bug.cgi?id=1188259&quot;&gt;bug
1188259&lt;&#x2F;a&gt;!&lt;&#x2F;p&gt;
</content>
    </entry>
    <entry xml:lang="en">
        <title>Previous writings about Mozilla work</title>
        <published>2016-03-09T18:00:42+00:00</published>
        <updated>2016-03-09T18:00:42+00:00</updated>
        
        <author>
          <name>
            
              Benjamin Bouvier
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://bouvier.cc/tech/previous-writing-about-mozilla-work/"/>
        <id>https://bouvier.cc/tech/previous-writing-about-mozilla-work/</id>
        <content type="html" xml:base="https://bouvier.cc/tech/previous-writing-about-mozilla-work/">&lt;p&gt;I am currently a compiler engineer at the Mozilla Corporation, the company making
the Firefox browser, among other things. Our JavaScript virtual machine, SpiderMonkey,
is split into several tiers, including a highly optimizing Just-In-Time (JIT)
compiler able to compile JavaScript to machine code at runtime. My previous work
has involved efficiently compiling Float32 arithmetic to hardware instructions
and implementing a new SIMD API for the Web.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;about-float32-optimizations&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#about-float32-optimizations&quot; aria-label=&quot;Anchor link for: about-float32-optimizations&quot;&gt;🔗&lt;&#x2F;a&gt;About Float32 optimizations&lt;&#x2F;h2&gt;
&lt;p&gt;The full blog post is
&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;blog.mozilla.org&#x2F;javascript&#x2F;2013&#x2F;11&#x2F;07&#x2F;efficient-float32-arithmetic-in-javascript&#x2F;&quot;&gt;there&lt;&#x2F;a&gt;.
It was written in November 2013.&lt;&#x2F;p&gt;
&lt;p&gt;The main idea is the following: take float32 inputs to a basic arithmetic
operation (addition, subtraction, multiplication or division), convert them to
doubles, apply the operation to these doubles, and round the result back to a
float32. You get the same result as if you had done the entire computation with
float32 values and operations.&lt;&#x2F;p&gt;
&lt;p&gt;So we’ve introduced an operation in JavaScript that converts a Number to the
closest IEEE 754 single-precision (float32) value: &lt;code&gt;Math.fround&lt;&#x2F;code&gt;. Said
differently, if the addition in &lt;code&gt;f&lt;&#x2F;code&gt; below is understood as a float32
addition, the above equivalence says that:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;javascript&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;function&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt; f&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #E0AF68;&quot;&gt;x&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #E0AF68;&quot;&gt; y&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;) {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;font-style: italic;&quot;&gt;    return&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; x&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; +&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; y&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;function&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt; g&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #E0AF68;&quot;&gt;x&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #E0AF68;&quot;&gt; y&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;) {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #9D7CD8;font-style: italic;&quot;&gt;    var&lt;&#x2F;span&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt; xf&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; =&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; Math&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;fround&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;x&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;)&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #9D7CD8;font-style: italic;&quot;&gt;    var&lt;&#x2F;span&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt; yf&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; =&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; Math&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;fround&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;y&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;)&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;font-style: italic;&quot;&gt;    return&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; Math&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;fround&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;xf&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; +&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; yf&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;)&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;&#x2F;&#x2F; For all x, y that can be represented exactly as float32:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;assert&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;f&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;x&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; y&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;)&lt;&#x2F;span&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt; ===&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt; g&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;x&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; y&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;))&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Yes, &lt;code&gt;===&lt;&#x2F;code&gt;. The same &lt;code&gt;===&lt;&#x2F;code&gt; you’ve been told &lt;strong&gt;not&lt;&#x2F;strong&gt; to use for floating-point
Numbers. But here, we have &lt;em&gt;bitwise&lt;&#x2F;em&gt; equality, so we can use strict equality*.&lt;&#x2F;p&gt;
&lt;p&gt;Processors have dedicated instructions for float32 arithmetic, which can have
higher throughput than their double counterparts. With this result in mind, we
could add a compiler pass that spots opportunities where the computations are
equivalent (thanks to &lt;code&gt;Math.fround&lt;&#x2F;code&gt; hints) and emits float32
instructions instead of double instructions. This sped up some numerical
applications and game engines by a few points.&lt;&#x2F;p&gt;
&lt;p&gt;* A careful reader would object that this does not hold for &lt;code&gt;x = y = NaN&lt;&#x2F;code&gt;,
since &lt;code&gt;NaN === NaN&lt;&#x2F;code&gt; is false; I’ve set that case aside for the sake of simplicity.&lt;&#x2F;p&gt;
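&lt;p&gt;A small sketch of these properties that runs in any modern JavaScript engine: it shows how &lt;code&gt;Math.fround&lt;&#x2F;code&gt; rounds, that the result of &lt;code&gt;g&lt;&#x2F;code&gt; is already a float32 value, and how the NaN caveat behaves. Using &lt;code&gt;Object.is&lt;&#x2F;code&gt; instead of &lt;code&gt;===&lt;&#x2F;code&gt; is one way to make such a comparison also accept two NaN values; the inputs are arbitrary examples.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;javascript&quot;&gt;// Math.fround rounds a Number (a double) to the nearest float32 value.
console.log(Math.fround(5.5));  // 5.5, exactly representable as a float32
console.log(Math.fround(5.05)); // 5.050000190734863, the closest float32

// g from above: both inputs and the result are rounded to float32.
function g(x, y) {
  var xf = Math.fround(x);
  var yf = Math.fround(y);
  return Math.fround(xf + yf);
}

// The result of g is already a float32 value: rounding it again changes nothing.
var r = g(0.1, 0.2);
console.log(Math.fround(r) === r); // true

// The NaN caveat: NaN is never strictly equal to itself...
console.log(g(NaN, NaN) === g(NaN, NaN)); // false
// ...whereas Object.is considers two NaN values to be the same value.
console.log(Object.is(g(NaN, NaN), g(NaN, NaN))); // true&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;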
&lt;h2 id=&quot;about-simd-js&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#about-simd-js&quot; aria-label=&quot;Anchor link for: about-simd-js&quot;&gt;🔗&lt;&#x2F;a&gt;About SIMD.js&lt;&#x2F;h2&gt;
&lt;p&gt;The full blog post is
&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;blog.mozilla.org&#x2F;javascript&#x2F;2015&#x2F;03&#x2F;10&#x2F;state-of-simd-js-performance-in-firefox&#x2F;&quot;&gt;there&lt;&#x2F;a&gt;.
It was written in March 2015.&lt;&#x2F;p&gt;
&lt;p&gt;Nowadays, processors have instruction sets that allow them to execute several
simple arithmetic operations at once. For instance, let’s say you have two
arrays of integers and you want to add each element to the corresponding one in
the other array. If both arrays have size &lt;code&gt;N&lt;&#x2F;code&gt;, this means you’ll have to carry
out &lt;code&gt;N&lt;&#x2F;code&gt; scalar additions. But with SIMD (Single Instruction, Multiple Data),
processors can group these additions into bundles; for 32-bit integers and the
128-bit vector registers available on most modern processors, you need at most
&lt;code&gt;Math.ceil(N &#x2F; 4)&lt;&#x2F;code&gt; addition instructions. The blog post details what
SIMD.js is and what bottlenecks we hit during implementation.&lt;&#x2F;p&gt;
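&lt;p&gt;The following sketch illustrates where &lt;code&gt;Math.ceil(N &#x2F; 4)&lt;&#x2F;code&gt; comes from. It does not use the SIMD.js API: plain scalar JavaScript stands in for the vector unit, and each iteration of the outer loop represents one hypothetical 4-lane addition instruction. The &lt;code&gt;addInBundlesOf4&lt;&#x2F;code&gt; function is made up for illustration.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;javascript&quot;&gt;// A scalar sketch of the idea, not the SIMD.js API: each outer iteration
// stands for one SIMD addition working on a bundle of 4 lanes of 32 bits.
function addInBundlesOf4(a, b) {
  var n = a.length;
  var out = new Int32Array(n);
  var bundles = Math.ceil(n / 4);
  for (var k = 0; k !== bundles; k++) {
    for (var lane = 0; lane !== 4; lane++) {
      var j = 4 * k + lane;
      if (j === n) break; // the last bundle may be only partially filled
      out[j] = a[j] + b[j]; // storing into an Int32Array wraps like a 32-bit add
    }
  }
  return { out: out, instructions: bundles };
}

var a = Int32Array.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
var b = Int32Array.of(10, 20, 30, 40, 50, 60, 70, 80, 90, 100);
var result = addInBundlesOf4(a, b);
console.log(result.instructions); // 3, that is Math.ceil(10 / 4)
console.log(result.out.join()); // 11,22,33,44,55,66,77,88,99,110&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;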
&lt;h2 id=&quot;conclusion&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#conclusion&quot; aria-label=&quot;Anchor link for: conclusion&quot;&gt;🔗&lt;&#x2F;a&gt;Conclusion&lt;&#x2F;h2&gt;
&lt;p&gt;This was a small reminder about previously written blog posts. If you’re into
JavaScript, compilers or low-level optimization, I can only recommend reading
&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;blog.mozilla.org&#x2F;javascript&#x2F;&quot;&gt;Mozilla’s JavaScript blog&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
</content>
    </entry>
    <entry xml:lang="en">
        <title>Hello Mozilla, world!</title>
        <published>2015-12-31T23:59:42+00:00</published>
        <updated>2015-12-31T23:59:42+00:00</updated>
        
        <author>
          <name>
            
              Benjamin Bouvier
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://bouvier.cc/tech/hello-mozilla-world/"/>
        <id>https://bouvier.cc/tech/hello-mozilla-world/</id>
        <content type="html" xml:base="https://bouvier.cc/tech/hello-mozilla-world/">&lt;h2 id=&quot;hello-mozilla&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#hello-mozilla&quot; aria-label=&quot;Anchor link for: hello-mozilla&quot;&gt;🔗&lt;&#x2F;a&gt;Hello Mozilla!&lt;&#x2F;h2&gt;
&lt;p&gt;Hello world! And in advance, happy new year!&lt;&#x2F;p&gt;
&lt;p&gt;I am going to try to blog more about my work at Mozilla, and these posts will
appear in this new &lt;code&gt;mozilla&lt;&#x2F;code&gt; category. You can expect technical posts about
SpiderMonkey (the JS virtual machine implementation in Mozilla Firefox),
OdinMonkey &#x2F; asm.js, WebAssembly (wasm), and all things Mozilla in this
category.&lt;&#x2F;p&gt;
&lt;p&gt;Feel free to reach out to me on &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;twitter.com&#x2F;bnjbvr&quot;&gt;Twitter&lt;&#x2F;a&gt;, and see you
soon!&lt;&#x2F;p&gt;
</content>
    </entry>
</feed>

