<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
    <title>@bnjbvr - compilers</title>
    <subtitle>Technical blog and random musings.</subtitle>
    <link rel="self" type="application/atom+xml" href="https://bouvier.cc/tags/compilers/atom.xml"/>
    <link rel="alternate" type="text/html" href="https://bouvier.cc"/>
    <generator uri="https://www.getzola.org/">Zola</generator>
    <updated>2021-02-17T19:00:42+00:00</updated>
    <id>https://bouvier.cc/tags/compilers/atom.xml</id>
    <entry xml:lang="en">
        <title>A primer on code generation in Cranelift</title>
        <published>2021-02-17T19:00:42+00:00</published>
        <updated>2021-02-17T19:00:42+00:00</updated>
        
        <author>
          <name>
            
              Benjamin Bouvier
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://bouvier.cc/tech/cranelift-codegen-primer/"/>
        <id>https://bouvier.cc/tech/cranelift-codegen-primer/</id>
        <content type="html" xml:base="https://bouvier.cc/tech/cranelift-codegen-primer/">&lt;script src=&quot;https:&#x2F;&#x2F;cdn.jsdelivr.net&#x2F;npm&#x2F;mermaid&#x2F;dist&#x2F;mermaid.min.js&quot;&gt;&lt;&#x2F;script&gt;
&lt;script&gt;mermaid.initialize({startOnLoad:true});&lt;&#x2F;script&gt;
&lt;p&gt;&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;bytecodealliance&#x2F;wasmtime&#x2F;tree&#x2F;main&#x2F;cranelift#cranelift-code-generator&quot;&gt;Cranelift&lt;&#x2F;a&gt; is a code generator written in the Rust programming language. It aims to compile quickly, while still producing machine code that runs at reasonable speeds.&lt;&#x2F;p&gt;
&lt;span id=&quot;continue-reading&quot;&gt;&lt;&#x2F;span&gt;
&lt;p&gt;The Cranelift compilation model consists of compiling functions one by one, while holding extra information about external entities, like external functions, memory addresses, and so on. This model allows for concurrent and parallel compilation of individual functions, which supports the goal of fast compilation. It was designed this way to allow for just-in-time (JIT) compilation of WebAssembly binary code in Firefox, although its scope has broadened a bit. Nowadays it is used in a few different WebAssembly runtimes, including &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;bytecodealliance&#x2F;wasmtime#wasmtime&quot;&gt;Wasmtime&lt;&#x2F;a&gt; and &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;wasmer.io&#x2F;&quot;&gt;Wasmer&lt;&#x2F;a&gt;, and also as an alternative backend for Rust debug compilation, thanks to &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;bjorn3&#x2F;rustc_codegen_cranelift&quot;&gt;cg_clif&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;A classic compiler design usually includes running a parser to translate the source to some form of intermediate representation, then running optimization passes on it, then feeding the result to the machine code generator.&lt;&#x2F;p&gt;
&lt;p&gt;This blog post focuses on the final step, namely the concepts that are involved in code generation, and what they map to in Cranelift. To make things more concrete, we’ll take a specific instruction, and see how it’s translated, from its creation down to code generation. At each step of the process, I’ll provide a short (&lt;em&gt;ahem&lt;&#x2F;em&gt;) high-level explanation of the concepts involved, and I’ll show what they map to in Cranelift, using the example instruction. While this is not a tutorial detailing how to add new instructions in Cranelift, this should be an interesting read for anyone who’s interested in compilers, and this could be an entry point if you’re interested in hacking on the Cranelift &lt;code&gt;codegen&lt;&#x2F;code&gt; crate.&lt;&#x2F;p&gt;
&lt;p&gt;This is our plan for this blog post: each squared box represents data, each
rounded box is a process. We’re going to go through each of them below.&lt;&#x2F;p&gt;
&lt;div class=&quot;mermaid&quot;&gt;
graph TD;
    clif[Optimized CLIF];
    vcode[VCode];
    final_vcode[Final VCode];
    machine_code[Machine code artifacts];
    lowering([Lowering]);
    regalloc([Register allocation]);
    codegen([Machine code generation]);
    clif --&gt; lowering --&gt; vcode --&gt; regalloc --&gt; final_vcode --&gt; codegen --&gt; machine_code
&lt;&#x2F;div&gt;
&lt;h2 id=&quot;intermediate-representations&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#intermediate-representations&quot; aria-label=&quot;Anchor link for: intermediate-representations&quot;&gt;🔗&lt;&#x2F;a&gt;Intermediate representations&lt;&#x2F;h2&gt;
&lt;p&gt;Compilers use &lt;strong&gt;intermediate representations&lt;&#x2F;strong&gt; (&lt;em&gt;IR&lt;&#x2F;em&gt;) to represent source code. Here we’re interested in representations of the &lt;em&gt;data flow&lt;&#x2F;em&gt;, that is instructions themselves and only that. The IRs contain information about the instructions themselves, their operands, type specialization information, and any additional metadata that might be useful. IRs usually map to a certain level of abstraction, and as such, they are useful for solving different problems that require different levels of abstraction. Their shape (which data structures) and numbers often have a huge impact on the performance of the compiler itself (that is, how fast it is at compiling).&lt;&#x2F;p&gt;
&lt;p&gt;In general, most programming languages use IRs internally, and yet, these are invisible to the programmers. The reason is that source code is usually first &lt;em&gt;parsed&lt;&#x2F;em&gt; (tokenized, verified) and then translated into an IR. The &lt;em&gt;abstract syntax tree&lt;&#x2F;em&gt;, aka AST, is one such IR representing the source code itself, in a format that’s very close to the source code itself. Since the raison d’être of Cranelift is to be a code generator, having a text format is secondary, and only useful for testing and debugging purposes. That’s why embedders directly create and manipulate Cranelift’s IR.&lt;&#x2F;p&gt;
&lt;p&gt;At the time of writing, Cranelift has two IRs to represent the function’s code:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;one external, high-level intermediate representation, called &lt;strong&gt;CLIF&lt;&#x2F;strong&gt; (for &lt;em&gt;Cranelift IR format&lt;&#x2F;em&gt;),&lt;&#x2F;li&gt;
&lt;li&gt;one internal, low-level intermediate representation called &lt;strong&gt;VCode&lt;&#x2F;strong&gt; (for &lt;em&gt;virtual-registerized code&lt;&#x2F;em&gt;).&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h2 id=&quot;clif-ir&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#clif-ir&quot; aria-label=&quot;Anchor link for: clif-ir&quot;&gt;🔗&lt;&#x2F;a&gt;CLIF IR&lt;&#x2F;h2&gt;
&lt;p&gt;CLIF is the IR that Cranelift embedders create and manipulate. It consists of high-level typed operations that are convenient to use and&#x2F;or can be simply translated to machine code. It is in &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Static_single_assignment_form&quot;&gt;static single assignment (SSA) form&lt;&#x2F;a&gt;: each value referenced by an operation (SSA value) is defined only once, and may have as many uses as desired. CLIF is practical to use and manipulate for classic compiler optimization passes (e.g. &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Loop-invariant_code_motion&quot;&gt;LICM&lt;&#x2F;a&gt;), as it is generic over the target architecture we’re compiling to.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;let&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; x&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; =&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; builder&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;ins&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;().&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;iconst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #0DB9D7;&quot;&gt;types&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;I64&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt; 42&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;);&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;let&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; y&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; =&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; builder&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;ins&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;().&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;iconst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #0DB9D7;&quot;&gt;types&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;I64&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt; 1337&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;);&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;let&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; sum&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; =&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; builder&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;ins&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;().&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;iadd&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;x&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; y&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;);&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;An example of Rust code that would generate CLIF IR: using an IR builder, two constant 64-bit integer SSA values &lt;code&gt;x&lt;&#x2F;code&gt; and &lt;code&gt;y&lt;&#x2F;code&gt; are created, and then added together. The result is stored into the &lt;code&gt;sum&lt;&#x2F;code&gt; SSA value, which can then be consumed by other instructions.&lt;&#x2F;p&gt;
&lt;p&gt;The code for the IR builder we’re manipulating above is automatically generated by the &lt;code&gt;cranelift-codegen&lt;&#x2F;code&gt; build script. The build script uses a domain specific &lt;em&gt;meta&lt;&#x2F;em&gt; language (DSL)&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-2-1&quot;&gt;&lt;a href=&quot;#fn-2&quot;&gt;1&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt; that defines the instructions, their input and output operands, which input types are allowed, how the output type is inferred, etc. We won’t take a look at this &lt;em&gt;today&lt;&#x2F;em&gt;: this is a bit too far from code generation, but this could be material for another blog post.&lt;&#x2F;p&gt;
&lt;p&gt;As an example of a full-blown CLIF generator, there is &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;bytecodealliance&#x2F;wasmtime&#x2F;tree&#x2F;main&#x2F;cranelift&#x2F;wasm&quot;&gt;a crate&lt;&#x2F;a&gt; in the Cranelift project that allows translating from the WebAssembly binary format to CLIF. The Cranelift backend for Rustc uses its own CLIF generator that translates from one of the Rust compiler’s IRs.&lt;&#x2F;p&gt;
&lt;p&gt;Finally, it’s time to reveal what’s going to be our running example! The Chosen One is the &lt;code&gt;iadd&lt;&#x2F;code&gt; CLIF operation, which adds two integers together, whatever their bit width, with wrapping semantics. It is simple to understand, and it exhibits interesting behaviors on the two architectures we’re interested in. So, let’s continue down the pipeline!&lt;&#x2F;p&gt;
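&lt;p&gt;To make “wrapping semantics” concrete, here is a minimal sketch in plain Rust. It does not use Cranelift at all: the helper name &lt;code&gt;iadd_i64&lt;&#x2F;code&gt; is made up for this illustration, and it only mimics what a 64-bit &lt;code&gt;iadd&lt;&#x2F;code&gt; is described as computing.&lt;&#x2F;p&gt;

```rust
// Hypothetical helper (not part of Cranelift): models a 64-bit addition
// with wrapping semantics.
fn iadd_i64(x: i64, y: i64) -> i64 {
    // wrapping_add keeps the low 64 bits of the result instead of
    // panicking or saturating when the sum does not fit.
    x.wrapping_add(y)
}

fn main() {
    // The values used in the CLIF builder example.
    assert_eq!(iadd_i64(42, 1337), 1379);
    // Overflow wraps around to the other end of the range.
    assert_eq!(iadd_i64(i64::MAX, 1), i64::MIN);
    println!("42 + 1337 = {}", iadd_i64(42, 1337));
}
```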
&lt;h2 id=&quot;vcode-ir&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#vcode-ir&quot; aria-label=&quot;Anchor link for: vcode-ir&quot;&gt;🔗&lt;&#x2F;a&gt;VCode IR&lt;&#x2F;h2&gt;
&lt;p&gt;Later on, the CLIF intermediate representation is &lt;em&gt;lowered&lt;&#x2F;em&gt;, i.e. transformed from a high-level one into a lower-level one. Here lower level means a form more specialized for a machine architecture. This lower IR is called &lt;em&gt;VCode&lt;&#x2F;em&gt; in Cranelift. The values it references are called &lt;em&gt;virtual registers&lt;&#x2F;em&gt; (more on the &lt;em&gt;virtual&lt;&#x2F;em&gt; bit below). They’re not in SSA form anymore: each virtual register may be redefined as many times as we want. This IR is used to encode register allocation constraints and it guides machine code generation. As a matter of fact, since this information is tied to the machine code’s representation itself, this IR is also target-specific: there’s one flavor of VCode per CPU architecture we’re compiling to.&lt;&#x2F;p&gt;
&lt;p&gt;Let’s get back to our example, that we’re going to compile on two instruction set architectures:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;ARM 64-bit (aka aarch64), which is used in most mobile devices and is becoming mainstream on laptops (Apple’s Mac M1, some Chromebooks),&lt;&#x2F;li&gt;
&lt;li&gt;Intel’s x86 64-bit (aka x86_64, also abbreviated x64), which is used in most desktop and laptop machines.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;An integer addition machine instruction on aarch64 takes three operands: two input operands (one of which must be a register), and a third operand, the output register. On the x86_64 architecture, the equivalent instruction involves a total of two registers: a read-only source register, and an in-out register that serves as both the second source and the destination. We’ll get back to this.&lt;&#x2F;p&gt;
&lt;p&gt;So considering &lt;code&gt;iadd&lt;&#x2F;code&gt;, let’s look at (one of&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-4-1&quot;&gt;&lt;a href=&quot;#fn-4&quot;&gt;2&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;) the VCode instruction that’s used to represent integer additions on aarch64 (as defined in &lt;code&gt;cranelift&#x2F;codegen&#x2F;src&#x2F;isa&#x2F;aarch64&#x2F;inst&#x2F;mod.rs&lt;&#x2F;code&gt;):&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;&#x2F;&#x2F;&#x2F; An ALU operation with two register sources and a register destination.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;AluRRR&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;    alu_op&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; ALUOp&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;    rd&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; Writable&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;&amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;Reg&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;&amp;gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;    rn&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; Reg&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;    rm&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; Reg&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;},&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Some details here:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;alu_op&lt;&#x2F;code&gt; defines the sub-opcode used in the ALU (Arithmetic Logic Unit). It will be &lt;code&gt;ALUOp::Add64&lt;&#x2F;code&gt; for a 64-bit integer addition.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;rn&lt;&#x2F;code&gt; and &lt;code&gt;rm&lt;&#x2F;code&gt; are the conventional aarch64 names for the two input registers.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;rd&lt;&#x2F;code&gt; is the destination register. See how it’s marked as &lt;code&gt;Writable&lt;&#x2F;code&gt;, while the two others are not? &lt;code&gt;Writable&lt;&#x2F;code&gt; is a plain Rust wrapper that makes sure that we &lt;em&gt;can&lt;&#x2F;em&gt; statically differentiate read-only registers from writable registers; a neat trick that allows us to catch more issues at compile-time.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;All this information is directly tied to the machine code representation of an addition instruction on aarch64: each field is later used to select some bytes that will be generated during code generation.&lt;&#x2F;p&gt;
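&lt;p&gt;The &lt;code&gt;Writable&lt;&#x2F;code&gt; trick mentioned in the list above can be sketched with a simple newtype. This is only an illustration under simplifying assumptions: Cranelift’s wrapper is generic over the register type, while the &lt;code&gt;Reg&lt;&#x2F;code&gt;, &lt;code&gt;WritableReg&lt;&#x2F;code&gt; and &lt;code&gt;emit_add&lt;&#x2F;code&gt; names below are hypothetical, non-generic stand-ins written for this post.&lt;&#x2F;p&gt;

```rust
// Hypothetical stand-in for a register, identified by its number.
#[derive(Debug, Clone, Copy)]
struct Reg(u8);

// Hypothetical newtype marking a register that an instruction may write to.
#[derive(Debug, Clone, Copy)]
struct WritableReg(Reg);

impl WritableReg {
    // Recover the underlying register, e.g. to print or encode it.
    fn to_reg(self) -> Reg {
        self.0
    }
}

// The signature makes the roles explicit: passing a plain Reg as the
// destination is rejected by the type checker.
fn emit_add(rd: WritableReg, rn: Reg, rm: Reg) -> String {
    format!("add x{}, x{}, x{}", rd.to_reg().0, rn.0, rm.0)
}

fn main() {
    let text = emit_add(WritableReg(Reg(2)), Reg(0), Reg(1));
    assert_eq!(text, "add x2, x0, x1");
    println!("{}", text);
}
```

&lt;p&gt;With this shape, calling &lt;code&gt;emit_add(Reg(2), Reg(0), Reg(1))&lt;&#x2F;code&gt; does not compile, which is the kind of mistake the wrapper is meant to catch early.&lt;&#x2F;p&gt;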
&lt;p&gt;As said before, the VCode is specific to each architecture, so x86_64 has a different VCode representation for the same instruction (as defined in &lt;code&gt;cranelift&#x2F;codegen&#x2F;src&#x2F;isa&#x2F;x64&#x2F;inst&#x2F;mod.rs&lt;&#x2F;code&gt;):&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;&#x2F;&#x2F;&#x2F; Integer arithmetic&#x2F;bit-twiddling: (add sub and or xor mul adc? sbb?) (32 64) (reg addr imm) reg&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;AluRmiR&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;    is_64&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; bool&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;    op&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; AluRmiROpcode&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;    src&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; RegMemImm&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;    dst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; Writable&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;&amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;Reg&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;&amp;gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;},&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Here, the sub-opcode is defined as part of the &lt;code&gt;AluRmiROpcode&lt;&#x2F;code&gt; enum (the comment hints at which other x86 machine instructions are generated by this same VCode). See how there’s only one &lt;code&gt;src&lt;&#x2F;code&gt; (source) register (or memory or immediate operand), while the instruction conceptually takes two inputs? That’s because it’s expected that the &lt;code&gt;dst&lt;&#x2F;code&gt; (destination) register is &lt;em&gt;modified&lt;&#x2F;em&gt;, that is, both read (so it’s the second input operand) and written to (so it’s the result register). In equivalent C code, the x86’s add instruction doesn’t actually do &lt;code&gt;a = b + c&lt;&#x2F;code&gt;. What it does is &lt;code&gt;a += b&lt;&#x2F;code&gt;, that is, one of the sources is &lt;em&gt;consumed&lt;&#x2F;em&gt; by the instruction. This is an artifact inherited from the design of older x86 machines in the 1970s, when instructions were designed around an accumulator model (and efficiently representing three operands in a CISC architecture would make the encoding larger and more complex than it already is).&lt;&#x2F;p&gt;
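&lt;p&gt;The difference between the two operand models can be shown with a toy register file. This is a rough sketch, not Cranelift code: &lt;code&gt;Regs&lt;&#x2F;code&gt;, &lt;code&gt;add_three_operand&lt;&#x2F;code&gt;, &lt;code&gt;add_two_operand&lt;&#x2F;code&gt; and &lt;code&gt;mov&lt;&#x2F;code&gt; are names invented for this example. It shows that a two-operand add needs an extra move whenever both inputs must stay intact.&lt;&#x2F;p&gt;

```rust
// Hypothetical toy register file: four 64-bit registers.
type Regs = [i64; 4];

// aarch64-style three-operand add: rd = rn + rm. Both sources are preserved
// (as long as rd is a different register).
fn add_three_operand(mut regs: Regs, rd: usize, rn: usize, rm: usize) -> Regs {
    regs[rd] = regs[rn].wrapping_add(regs[rm]);
    regs
}

// x86_64-style two-operand add: dst += src. The previous value of dst is lost.
fn add_two_operand(mut regs: Regs, dst: usize, src: usize) -> Regs {
    regs[dst] = regs[dst].wrapping_add(regs[src]);
    regs
}

// Plain register-to-register copy.
fn mov(mut regs: Regs, dst: usize, src: usize) -> Regs {
    regs[dst] = regs[src];
    regs
}

fn main() {
    let regs: Regs = [42, 1337, 0, 0];

    // Three-operand form: one instruction, inputs untouched.
    let three = add_three_operand(regs, 2, 0, 1);
    assert_eq!(three, [42, 1337, 1379, 0]);

    // Two-operand form: copy the first input into the destination, then add.
    let two = add_two_operand(mov(regs, 2, 0), 2, 1);
    assert_eq!(two, [42, 1337, 1379, 0]);

    // Without the copy, the first input is overwritten by the result.
    let clobbered = add_two_operand(regs, 0, 1);
    assert_eq!(clobbered, [1379, 1337, 0, 0]);
    println!("{:?} {:?} {:?}", three, two, clobbered);
}
```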
&lt;h2 id=&quot;instruction-selection-lowering&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#instruction-selection-lowering&quot; aria-label=&quot;Anchor link for: instruction-selection-lowering&quot;&gt;🔗&lt;&#x2F;a&gt;Instruction selection (lowering)&lt;&#x2F;h2&gt;
&lt;p&gt;As said before, converting from the high-level IR (CLIF) to the low-level IR (VCode) is called lowering. Since VCode is target-dependent, this process is also target-dependent. That’s where we consider which machine instructions eventually get used for a given CLIF opcode. There are many ways to achieve the same machine state results for given semantics, but some of these ways are faster than others, and&#x2F;or require fewer code bytes. The problem can be summed up like this: given some CLIF, which VCode can we create to generate the fastest and&#x2F;or smallest machine code that carries out the desired semantics? This is called &lt;em&gt;instruction selection&lt;&#x2F;em&gt;, because we’re selecting the VCode instructions among a set of different possible instructions.&lt;&#x2F;p&gt;
&lt;p&gt;How do these IRs map to each other? A given CLIF node may be lowered into 1 to N VCode instructions. A given VCode instruction may lead to the code generation of 1 to M machine instructions. There is no fixed upper bound on N or M. For instance, the integer addition CLIF opcode &lt;code&gt;iadd&lt;&#x2F;code&gt; on 64-bit inputs maps to a single VCode instruction on aarch64. The VCode instruction then causes a single machine instruction to be generated.&lt;&#x2F;p&gt;
&lt;p&gt;Other CLIF opcodes may eventually generate more than a single machine instruction. Consider the CLIF opcode for signed integer division, &lt;code&gt;sdiv&lt;&#x2F;code&gt;. Its semantics define that it traps when the divisor is zero and in case of integer overflow&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-3-1&quot;&gt;&lt;a href=&quot;#fn-3&quot;&gt;3&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;. On aarch64, this is lowered into:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;one VCode instruction that checks whether the divisor is zero, and traps if it is&lt;&#x2F;li&gt;
&lt;li&gt;two VCode instructions for comparing the input values against the minimal integer value and -1&lt;&#x2F;li&gt;
&lt;li&gt;one VCode instruction that traps if both comparisons match (that is, when dividing the minimal integer value by -1)&lt;&#x2F;li&gt;
&lt;li&gt;and one VCode instruction that does the actual division operation.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Each of these VCode instructions then generates one or more machine code instructions, resulting in a somewhat longer sequence.&lt;&#x2F;p&gt;
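&lt;p&gt;The checks listed above for signed integer division can be modeled in plain Rust. This is a sketch of the semantics only, not the actual lowering code: the &lt;code&gt;Outcome&lt;&#x2F;code&gt; enum and the &lt;code&gt;checked_signed_div&lt;&#x2F;code&gt; function are hypothetical names, and a trap is modeled as an enum variant rather than a hardware trap.&lt;&#x2F;p&gt;

```rust
// Hypothetical result type: either a quotient, or the reason for a trap.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Outcome {
    Value(i64),
    DivisionByZero,
    IntegerOverflow,
}

// Models the sequence of checks, in the same order as the list above.
fn checked_signed_div(x: i64, y: i64) -> Outcome {
    match (x, y) {
        // First check: a zero divisor traps.
        (_, 0) => Outcome::DivisionByZero,
        // Second check: MIN / -1 is not representable on 64 bits, so it traps.
        (i64::MIN, -1) => Outcome::IntegerOverflow,
        // Otherwise, the division itself can be carried out safely.
        _ => Outcome::Value(x / y),
    }
}

fn main() {
    assert_eq!(checked_signed_div(1337, 42), Outcome::Value(31));
    assert_eq!(checked_signed_div(1, 0), Outcome::DivisionByZero);
    assert_eq!(checked_signed_div(i64::MIN, -1), Outcome::IntegerOverflow);
    println!("{:?}", checked_signed_div(1337, 42));
}
```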
&lt;p&gt;Let’s look at the lowering of &lt;code&gt;iadd&lt;&#x2F;code&gt; on aarch64 (in &lt;code&gt;cranelift&#x2F;codegen&#x2F;src&#x2F;isa&#x2F;aarch64&#x2F;lower_inst.rs&lt;&#x2F;code&gt;), edited and simplified for clarity. I’ve added comments in the code, explaining what each line does:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;Opcode&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;Iadd&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; =&amp;gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; Get the destination register.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;    let&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rd&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; =&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt; get_output_reg&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;ctx&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; outputs&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;[&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt;0&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;]).&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;only_reg&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;().&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;unwrap&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;();&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; Get the controlling type of the addition (32-bits int or 64-bits int or&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; int vector, etc.).&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;    let&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; ty&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; =&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; ty&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;unwrap&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;();&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; Force one of the inputs into a register, not applying any signed- or&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; zero-extension.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;    let&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rn&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; =&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt; put_input_in_reg&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;ctx&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; inputs&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;[&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt;0&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;],&lt;&#x2F;span&gt;&lt;span style=&quot;color: #0DB9D7;&quot;&gt; NarrowValueMode&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;None&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;);&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; Try to see if we can encode the second operand as an immediate on&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; 12-bits, maybe by negating it;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; Otherwise, put it into a register.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;    let&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; (&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;rm&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; negated&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;) =&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt; put_input_in_rse_imm12_maybe_negated&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;        ctx&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;        inputs&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;[&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt;1&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;],&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;        ty_bits&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;ty&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;),&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #0DB9D7;&quot;&gt;        NarrowValueMode&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;None&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;    );&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; Select the ALU subopcode, based on possible negation and controlling&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; type.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;    let&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; alu_op&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; =&lt;&#x2F;span&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt; if !&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;negated&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;        choose_32_64&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;ty&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #0DB9D7;&quot;&gt; ALUOp&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;Add32&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #0DB9D7;&quot;&gt; ALUOp&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;Add64&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;    }&lt;&#x2F;span&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt; else&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;        choose_32_64&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;ty&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #0DB9D7;&quot;&gt; ALUOp&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;Sub32&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #0DB9D7;&quot;&gt; ALUOp&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;Sub64&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;    };&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; Emit the VCode instruction in the VCode stream.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;    ctx&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;emit&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;alu_inst_imm12&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;alu_op&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rd&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rn&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rm&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;));&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;In fact, the &lt;code&gt;alu_inst_imm12&lt;&#x2F;code&gt; wrapper can create one VCode instruction among a set of possible ones (since we’re trying to select &lt;em&gt;the best one&lt;&#x2F;em&gt;). For the sake of simplicity, we’ll assume that &lt;code&gt;AluRRR&lt;&#x2F;code&gt; is going to be generated, i.e. the selected instruction is the one using only register encodings for the input values.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;register-allocation&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#register-allocation&quot; aria-label=&quot;Anchor link for: register-allocation&quot;&gt;🔗&lt;&#x2F;a&gt;Register allocation&lt;&#x2F;h2&gt;
&lt;div class=&quot;mermaid&quot;&gt;
graph TD
    vcode_vreg[VCode with virtual registers]
    regalloc([Register allocation])
    vcode_rreg[VCode with real registers]
    codegen([Code generation])
    machine_code(Machine code)
    vcode_vreg --&gt; regalloc --&gt; vcode_rreg --&gt; codegen --&gt; machine_code
&lt;&#x2F;div&gt;
&lt;h3 id=&quot;vcode-registers-and-stack-slots&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#vcode-registers-and-stack-slots&quot; aria-label=&quot;Anchor link for: vcode-registers-and-stack-slots&quot;&gt;🔗&lt;&#x2F;a&gt;VCode, registers and stack slots&lt;&#x2F;h3&gt;
&lt;p&gt;Hey, ever wondered what the V in VCode meant? Back to the drawing board. While a program may reference a theoretically unlimited number of instructions, each referencing a theoretically unlimited number of values as inputs and outputs, the physical machine only has a fixed set of containers for those values:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;either they must live in machine &lt;strong&gt;registers&lt;&#x2F;strong&gt;: these are very fast to access, but they take up CPU real estate and are thus costly, so there are usually few of them.&lt;&#x2F;li&gt;
&lt;li&gt;or they must live in the process’ &lt;strong&gt;stack memory&lt;&#x2F;strong&gt;: it’s slower to access, but we can have virtually any amount of stack &lt;em&gt;slots&lt;&#x2F;em&gt;.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;asm&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;mov&lt;&#x2F;span&gt;&lt;span&gt; %&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt;edi&lt;&#x2F;span&gt;&lt;span&gt;,-&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt;0x4&lt;&#x2F;span&gt;&lt;span&gt;(%&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt;rbp&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;mov&lt;&#x2F;span&gt;&lt;span&gt; %&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt;rsi&lt;&#x2F;span&gt;&lt;span&gt;,-&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt;0x10&lt;&#x2F;span&gt;&lt;span&gt;(%&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt;rbp&lt;&#x2F;span&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;mov&lt;&#x2F;span&gt;&lt;span&gt; -&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt;0x4&lt;&#x2F;span&gt;&lt;span&gt;(%&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt;rbp&lt;&#x2F;span&gt;&lt;span&gt;),%&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt;eax&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;em&gt;In this example of x86 machine code, %edi, %rsi, %rbp, %eax are all registers; stack slots are memory addresses computed as the frame pointer (%rbp) plus an offset value (which happens to be negative here). Note that stack slots may be referred to by the stack pointer (%rsp) in general.&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;h3 id=&quot;defining-the-register-allocation-problem&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#defining-the-register-allocation-problem&quot; aria-label=&quot;Anchor link for: defining-the-register-allocation-problem&quot;&gt;🔗&lt;&#x2F;a&gt;Defining the register allocation problem&lt;&#x2F;h3&gt;
&lt;p&gt;The problem of mapping the IR values (in VCode these are the &lt;code&gt;Reg&lt;&#x2F;code&gt;) to machine “containers” is called &lt;strong&gt;&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Register_allocation&quot;&gt;register allocation&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; (aka regalloc). Inputs to register allocation can be as numerous as we want them, and map to “virtual” values, hence we call them &lt;em&gt;virtual registers&lt;&#x2F;em&gt;. And… that’s where the V from VCode comes from: the instructions in VCode reference values that are &lt;em&gt;virtual&lt;&#x2F;em&gt; registers before register allocation, so we say the code is in &lt;em&gt;virtualized&lt;&#x2F;em&gt; register form. The output of register allocation is a set of new instructions, where the virtual registers have been replaced by &lt;em&gt;real registers&lt;&#x2F;em&gt; (the physical ones, limited in quantity) or stack slot references (and other additional metadata).&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&#x2F;&#x2F; Before register allocation, with unlimited virtual registers:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;v2 = v0 + v1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;v3 = v2 * 2&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;v4 = v2 + 1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;v5 = v4 + v3&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;return v5&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&#x2F;&#x2F; One possible register allocation, on a machine that has 2 registers %r0, %r1:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;%r0 = %r0 + %r1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;%r1 = %r0 * 2&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;%r0 = %r0 + 1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;%r1 = %r0 + %r1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;return %r1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;When all is well, only a few virtual registers are &lt;em&gt;live&lt;&#x2F;em&gt; at the same time, and they can all be put into physical registers. Issues arise when there aren’t enough physical registers to contain all the virtual registers that live at the same time, which is the case for… a very large majority of programs. Then, register allocation must decide which values continue to live in registers at a given program point, and which should be &lt;strong&gt;spilled&lt;&#x2F;strong&gt; into a stack slot, effectively &lt;em&gt;storing&lt;&#x2F;em&gt; them onto the stack for later use. Using them again later requires &lt;strong&gt;reloading&lt;&#x2F;strong&gt; them from the stack slot, using a &lt;em&gt;load&lt;&#x2F;em&gt; machine instruction. The complexity resides in choosing which registers should be spilled, at which program point they should be spilled, and at which program points we should reload them, if we need to do so. Making good choices there will have a large impact on the speed of the generated code, since memory accesses to the stack imply an additional runtime cost. For instance, a variable that’s frequently used in a hot loop should live in a register for the whole loop’s lifetime, and not be spilled&#x2F;reloaded in the middle of the loop.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&#x2F;&#x2F; Before register allocation, with unlimited virtual registers:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;v2 = v0 + v1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;v3 = v0 + v2&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;v4 = v3 + v1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;return v4&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&#x2F;&#x2F; One possible register allocation, on a machine that has 2 registers %r0, %r1.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&#x2F;&#x2F; We need to spill one value, because there&amp;#39;s a point where 3 values are live at the same time!&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;spill %r1 --&amp;gt; stack_slot(0)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;%r1 = %r0 + %r1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;%r1 = %r0 + %r1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;reload stack_slot(0) --&amp;gt; %r0&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;%r1 = %r1 + %r0&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;return %r1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
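&lt;p&gt;The “3 values live at the same time” remark can be checked mechanically. The following is a minimal sketch, not code from Cranelift or regalloc.rs: it walks a straight-line program from bottom to top and counts how many virtual registers are live above each instruction. The &lt;code&gt;Inst&lt;&#x2F;code&gt; and &lt;code&gt;Liveness&lt;&#x2F;code&gt; types, their method names, and the limit of 8 virtual registers are all made up for this illustration.&lt;&#x2F;p&gt;

```rust
/// A toy straight-line instruction `def = uses[0] op uses[1]`.
/// Virtual registers are plain indices, so `v3` is written `3`.
#[derive(Clone, Copy)]
struct Inst {
    def: usize,
    uses: [usize; 2],
}

/// State of a bottom-up liveness scan over at most 8 virtual registers.
struct Liveness {
    live: [bool; 8],
    max_live: usize,
}

impl Liveness {
    /// Start at the `return`, where only the returned value is live.
    fn at_return(returned: usize) -> Self {
        let mut live = [false; 8];
        live[returned] = true;
        Liveness { live, max_live: 1 }
    }

    /// Move the scan from just below `inst` to just above it.
    fn step_back(mut self, inst: Inst) -> Self {
        // Above its definition, the defined value does not exist yet...
        self.live[inst.def] = false;
        // ...but both operands must already be available.
        for used in inst.uses {
            self.live[used] = true;
        }
        let count = self.live.iter().filter(|is_live| **is_live).count();
        self.max_live = self.max_live.max(count);
        self
    }
}

/// Peak register pressure of the second listing above.
fn peak_pressure_of_spill_example() -> usize {
    let body = [
        Inst { def: 2, uses: [0, 1] }, // v2 = v0 + v1
        Inst { def: 3, uses: [0, 2] }, // v3 = v0 + v2
        Inst { def: 4, uses: [3, 1] }, // v4 = v3 + v1
    ];
    let mut state = Liveness::at_return(4);
    for i in (0..body.len()).rev() {
        state = state.step_back(body[i]);
    }
    state.max_live
}
```

&lt;p&gt;&lt;code&gt;peak_pressure_of_spill_example()&lt;&#x2F;code&gt; returns 3, one more than the two registers of our imaginary machine, hence the spill. Running the same scan on the first listing (modeling an immediate operand by repeating the register operand) peaks at 2, which is why that one fits without touching the stack.&lt;&#x2F;p&gt;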
&lt;p&gt;And, since we like to have our cake and eat it too, the register allocator itself should be &lt;em&gt;fast&lt;&#x2F;em&gt;: it should not take an unbounded amount of time to make these allocation decisions. Register allocation has the good taste to be an &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;NP-completeness&quot;&gt;NP-complete&lt;&#x2F;a&gt; problem. Concretely, this means that implementations cannot afford to look for the &lt;em&gt;best&lt;&#x2F;em&gt; solution given arbitrary inputs; instead, they’ll estimate &lt;em&gt;good&lt;&#x2F;em&gt; solutions based on heuristics, in worst-case quadratic time over the size of the input. All of this makes it so that register allocation has its own whole research field, and has been extensively studied for some time now. It is a fascinating problem.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;register-allocation-in-cranelift&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#register-allocation-in-cranelift&quot; aria-label=&quot;Anchor link for: register-allocation-in-cranelift&quot;&gt;🔗&lt;&#x2F;a&gt;Register allocation in Cranelift&lt;&#x2F;h3&gt;
&lt;p&gt;Back to Cranelift. The register allocation contract is that if a value &lt;em&gt;must&lt;&#x2F;em&gt; live in a real register at a given program point, then it &lt;em&gt;does&lt;&#x2F;em&gt; live where it should (unless register allocation is impossible). At the start of code generation for a VCode instruction, we are guaranteed that the input values live in real registers, and that the output real register is available before the next VCode instruction.&lt;&#x2F;p&gt;
&lt;p&gt;You might have noticed that the VCode instructions only refer to registers, and not stack slots. But where are the stack slots, then? The trick is that the stack slots are &lt;em&gt;invisible&lt;&#x2F;em&gt; to VCode. Register allocation may create an arbitrary number of spills, reloads, and register moves&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-5-1&quot;&gt;&lt;a href=&quot;#fn-5&quot;&gt;4&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt; around VCode instructions, to ensure that their register allocation constraints are met. This is why the output of register allocation is a new list of instructions, that includes not only the initial instructions filled with the actual registers, but also additional spill, reload and move (VCode) instructions added by regalloc.&lt;&#x2F;p&gt;
&lt;p&gt;As said before, this problem is sufficiently complex, involved and independent from the rest of the code (assuming the right set of interfaces!) that its code lives in a separate crate, &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;bytecodealliance&#x2F;regalloc.rs&quot;&gt;&lt;code&gt;regalloc.rs&lt;&#x2F;code&gt;&lt;&#x2F;a&gt;, with its own fuzzing and testing infrastructure. I hope to shed some light on it at some point too.&lt;&#x2F;p&gt;
&lt;p&gt;What’s interesting to us today is the register allocation &lt;em&gt;constraints&lt;&#x2F;em&gt;. Consider the aarch64 integer add instruction &lt;code&gt;add rd, rn, rm&lt;&#x2F;code&gt;: &lt;code&gt;rd&lt;&#x2F;code&gt; is the output virtual register that’s written to, while &lt;code&gt;rn&lt;&#x2F;code&gt; and &lt;code&gt;rm&lt;&#x2F;code&gt; are the inputs, thus read from. We need to inform the register allocation algorithm about these constraints. In regalloc jargon, “read from” is known as &lt;em&gt;used&lt;&#x2F;em&gt;, while “written to” is known as &lt;em&gt;defined&lt;&#x2F;em&gt;. Here, the aarch64 VCode instruction &lt;code&gt;AluRRR&lt;&#x2F;code&gt; does &lt;em&gt;use&lt;&#x2F;em&gt; &lt;code&gt;rn&lt;&#x2F;code&gt; and &lt;code&gt;rm&lt;&#x2F;code&gt;, and it &lt;em&gt;def&lt;&#x2F;em&gt;ines &lt;code&gt;rd&lt;&#x2F;code&gt;. This usage information is &lt;em&gt;collected&lt;&#x2F;em&gt; in the &lt;code&gt;aarch64_get_regs&lt;&#x2F;code&gt; function (&lt;code&gt;cranelift&#x2F;codegen&#x2F;src&#x2F;isa&#x2F;aarch64&#x2F;inst&#x2F;mod.rs&lt;&#x2F;code&gt;):&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;fn&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt; aarch64_get_regs&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;inst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;: &amp;amp;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;Inst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; collector&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;: &amp;amp;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9D7CD8;font-style: italic;&quot;&gt;mut&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; RegUsageCollector&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;) {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;    match&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; inst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;        &amp;amp;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;Inst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;AluRRR&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; {&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rd&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rn&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rm&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;, .. } =&amp;gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;            collector&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;add_def&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;rd&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;);&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;            collector&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;add_use&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;rn&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;);&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;            collector&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;add_use&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;rm&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;);&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;        }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;        &#x2F;&#x2F; etc.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Then, after register allocation has assigned the physical registers, we need to tell it how to replace virtual register mentions with physical register mentions. This is done in the &lt;code&gt;aarch64_map_regs&lt;&#x2F;code&gt; function (same file as above):&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;fn&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt; aarch64_map_regs&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;&amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt;RUM&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; RegUsageMapper&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;&amp;gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;inst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;: &amp;amp;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9D7CD8;font-style: italic;&quot;&gt;mut&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; Inst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; mapper&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;: &amp;amp;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt;RUM&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;) {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; ...&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;    match&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; inst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;        &amp;amp;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9D7CD8;font-style: italic;&quot;&gt;mut&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; Inst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;AluRRR&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;            ref&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9D7CD8;font-style: italic;&quot;&gt; mut&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rd&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;            ref&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9D7CD8;font-style: italic;&quot;&gt; mut&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rn&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;            ref&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9D7CD8;font-style: italic;&quot;&gt; mut&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rm&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;            ..&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;        } =&amp;gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;            map_def&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;mapper&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rd&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;);&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;            map_use&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;mapper&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rn&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;);&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;            map_use&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;mapper&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rm&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;);&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;        }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;        &#x2F;&#x2F; etc.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Note that this mirrors quite precisely what the usage collector did: for instance, we’re replacing the virtual register mention for the defined register &lt;code&gt;rd&lt;&#x2F;code&gt; with the information (which real register) provided by the &lt;code&gt;RegUsageMapper&lt;&#x2F;code&gt;. These two functions must stay in sync, otherwise here be dragons! (and bugs very hard to debug!)&lt;&#x2F;p&gt;
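&lt;p&gt;To make the “must stay in sync” point concrete, here is a toy model of the two sides. It is only a sketch under simplifying assumptions: the &lt;code&gt;Reg&lt;&#x2F;code&gt; and &lt;code&gt;AluRRR&lt;&#x2F;code&gt; types below are stand-ins invented for the example, the allocator’s decision is modeled as a plain closure, and &lt;code&gt;fully_mapped&lt;&#x2F;code&gt; is a hypothetical sanity check, not something regalloc.rs provides.&lt;&#x2F;p&gt;

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Reg {
    Virtual(u32),
    Real(u8),
}

/// Toy stand-in for the `AluRRR` VCode instruction.
#[derive(Clone, Copy, Debug, PartialEq)]
struct AluRRR {
    rd: Reg,
    rn: Reg,
    rm: Reg,
}

/// Collection side: every register the instruction mentions
/// (the definition first, then the two uses).
fn get_regs(inst: AluRRR) -> [Reg; 3] {
    [inst.rd, inst.rn, inst.rm]
}

/// Mapping side: rewrite every mention with the allocator's decision.
fn map_regs(inst: AluRRR, assign: impl Fn(Reg) -> Reg) -> AluRRR {
    AluRRR {
        rd: assign(inst.rd),
        rn: assign(inst.rn),
        rm: assign(inst.rm),
    }
}

/// Sanity check: after mapping, the collector must not see any
/// virtual register anymore.
fn fully_mapped(inst: AluRRR) -> bool {
    get_regs(inst)
        .iter()
        .all(|reg| matches!(reg, Reg::Real(_)))
}
```

&lt;p&gt;If &lt;code&gt;map_regs&lt;&#x2F;code&gt; forgot to rewrite, say, &lt;code&gt;rm&lt;&#x2F;code&gt;, then &lt;code&gt;fully_mapped&lt;&#x2F;code&gt; would return &lt;code&gt;false&lt;&#x2F;code&gt; on the mapped instruction: a virtual register would have survived register allocation.&lt;&#x2F;p&gt;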
&lt;h3 id=&quot;register-allocation-on-x86&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#register-allocation-on-x86&quot; aria-label=&quot;Anchor link for: register-allocation-on-x86&quot;&gt;🔗&lt;&#x2F;a&gt;Register allocation on x86&lt;&#x2F;h3&gt;
&lt;p&gt;On Intel’s x86, register allocation may be a bit trickier: in some cases, the lowering needs to be carefully written so it satisfies some register allocation constraints that are very specific to this architecture. In particular, x86 has &lt;em&gt;fixed register constraints&lt;&#x2F;em&gt; as well as &lt;em&gt;tied operands&lt;&#x2F;em&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;For this specific part, we’ll look at the integer shift-left instruction, which is equivalent to C’s &lt;code&gt;x &amp;lt;&amp;lt; y&lt;&#x2F;code&gt;. Why this particular instruction? It exhibits both properties that we’re interested in studying here. The lowering of &lt;code&gt;iadd&lt;&#x2F;code&gt; is similar, albeit slightly simpler, as it &lt;em&gt;only&lt;&#x2F;em&gt; involves tied operands.&lt;&#x2F;p&gt;
&lt;h4 id=&quot;fixed-register-constraints&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#fixed-register-constraints&quot; aria-label=&quot;Anchor link for: fixed-register-constraints&quot;&gt;🔗&lt;&#x2F;a&gt;Fixed register constraints&lt;&#x2F;h4&gt;
&lt;p&gt;On the one hand, some instructions expect their inputs to be in &lt;em&gt;fixed&lt;&#x2F;em&gt; registers, that is, specific registers arbitrarily predefined by the architecture manual. For the example of the shift instruction, if the count is not statically known at compile time (it’s not a shift by a constant value), then the amount by which we’re shifting must be in the &lt;code&gt;rcx&lt;&#x2F;code&gt; register&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-8-1&quot;&gt;&lt;a href=&quot;#fn-8&quot;&gt;5&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;Now, how do we make sure that the input value actually is in &lt;code&gt;rcx&lt;&#x2F;code&gt;? We can mark &lt;code&gt;rcx&lt;&#x2F;code&gt; as used in the &lt;code&gt;get_regs&lt;&#x2F;code&gt; function so regalloc knows about this, but nothing ensures that the input &lt;em&gt;resides&lt;&#x2F;em&gt; in it at the beginning of the instruction. To resolve this, we’ll introduce a &lt;strong&gt;move instruction&lt;&#x2F;strong&gt; during lowering, that is going to copy the input value into &lt;code&gt;rcx&lt;&#x2F;code&gt;. Then we’re sure it lives there, and register allocation knows it’s used: we’re good to go!&lt;&#x2F;p&gt;
&lt;p&gt;In a nutshell, this shows how lowering and register allocation play together:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;during lowering, we introduce a move from a dynamic shift input value to &lt;code&gt;rcx&lt;&#x2F;code&gt; before the actual shift&lt;&#x2F;li&gt;
&lt;li&gt;in the register usage function, we mark &lt;code&gt;rcx&lt;&#x2F;code&gt; as used&lt;&#x2F;li&gt;
&lt;li&gt;(nothing to do in the register mapping function: &lt;code&gt;rcx&lt;&#x2F;code&gt; is a real register already)&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
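&lt;p&gt;The first two bullet points can be sketched as follows. This is a simplified model rather than the actual x86 backend: the &lt;code&gt;Reg&lt;&#x2F;code&gt; and &lt;code&gt;Inst&lt;&#x2F;code&gt; enums and the &lt;code&gt;lower_dynamic_shift&lt;&#x2F;code&gt; and &lt;code&gt;used_regs&lt;&#x2F;code&gt; helpers are hypothetical names, and the shifted value is modified in place to keep the focus on &lt;code&gt;rcx&lt;&#x2F;code&gt; (tied operands are the topic of the next section).&lt;&#x2F;p&gt;

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Reg {
    Virtual(u32),
    /// The only real register this sketch needs.
    Rcx,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Inst {
    /// dst := src
    Mov { dst: Reg, src: Reg },
    /// dst := dst shl rcx. The count operand is implicit: it is always rcx.
    ShiftLeftByRcx { dst: Reg },
}

/// Lower a shift whose count is only known at run time.
fn lower_dynamic_shift(value: Reg, count: Reg) -> [Inst; 2] {
    [
        // Step 1: copy the count into the fixed register.
        Inst::Mov { dst: Reg::Rcx, src: count },
        // Step 2: the shift no longer mentions the virtual count register.
        Inst::ShiftLeftByRcx { dst: value },
    ]
}

/// Registers read by an instruction, as a usage function would report them.
/// Two slots are enough here; a move simply repeats its single source.
fn used_regs(inst: Inst) -> [Reg; 2] {
    match inst {
        Inst::Mov { src, .. } => [src, src],
        // rcx is reported as used even though it is not an explicit operand.
        Inst::ShiftLeftByRcx { dst } => [dst, Reg::Rcx],
    }
}
```

&lt;p&gt;The move gives register allocation an ordinary virtual-to-real copy to work with, while the usage information on the shift keeps &lt;code&gt;rcx&lt;&#x2F;code&gt; from being handed to another value between the two instructions.&lt;&#x2F;p&gt;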
&lt;h4 id=&quot;tied-operands&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#tied-operands&quot; aria-label=&quot;Anchor link for: tied-operands&quot;&gt;🔗&lt;&#x2F;a&gt;Tied operands&lt;&#x2F;h4&gt;
&lt;p&gt;On the other hand, some instructions have operands that are both read and written at the same time: we call them &lt;em&gt;modified&lt;&#x2F;em&gt; in Cranelift and regalloc.rs, but they’re also known as &lt;em&gt;tied operands&lt;&#x2F;em&gt; in the compiler literature. It’s not just that there’s a register that must be read, and a register that must be written to: they &lt;em&gt;must&lt;&#x2F;em&gt; be the same register. How do we model this, then?&lt;&#x2F;p&gt;
&lt;p&gt;Consider a naive solution. We take the input virtual register, and decide it’s allocated to the same register as the output (modified) register. Unfortunately, if the chosen virtual register was going to be reused by a later VCode instruction, then its value would be overwritten (clobbered) by the current instruction. This would result in incorrect code being generated, so this is not acceptable. In general, we can’t clobber an input value during lowering: deciding when a register’s contents may be overwritten is regalloc’s job.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&#x2F;&#x2F; Before register allocation, with virtual registers:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;v2 = v0 + v1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;v3 = v0 + 42&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&#x2F;&#x2F; After register allocation, on a machine with two registers %r0 and %r1:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&#x2F;&#x2F; assign v0 to %r0, v1 to %r1, v2 to %r0&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;%r0 += %r1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;... = %r0 + 42 &#x2F;&#x2F; ohnoes! the value in %r0 is v2, not v0 anymore!&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The right solution is, again, to &lt;em&gt;copy&lt;&#x2F;em&gt; this input virtual register into the output virtual register, right before the instruction. This way, we can still reuse the untouched input register in other instructions without modifying it: only the copy is written to.&lt;&#x2F;p&gt;
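&lt;p&gt;To see the difference in action, here is a small simulation of a made-up two-address machine operating on virtual registers. The instruction set and the &lt;code&gt;run&lt;&#x2F;code&gt;, &lt;code&gt;naive&lt;&#x2F;code&gt; and &lt;code&gt;with_copies&lt;&#x2F;code&gt; helpers are assumptions for illustration, not Cranelift code. Registers 0 to 3 stand for &lt;code&gt;v0&lt;&#x2F;code&gt; to &lt;code&gt;v3&lt;&#x2F;code&gt; from the example above.&lt;&#x2F;p&gt;

```rust
// Toy two-address machine; all names are hypothetical.
#[derive(Clone, Copy)]
enum Inst {
    Copy { src: usize, dst: usize },   // dst = src
    AddReg { src: usize, dst: usize }, // dst += src (dst is a tied operand)
    AddImm { imm: i64, dst: usize },   // dst += imm (dst is a tied operand)
}

// Executes the instructions and returns the final register values.
fn run(insts: &[Inst], mut regs: Vec<i64>) -> Vec<i64> {
    for inst in insts {
        match *inst {
            Inst::Copy { src, dst } => regs[dst] = regs[src],
            Inst::AddReg { src, dst } => {
                let s = regs[src];
                regs[dst] += s;
            }
            Inst::AddImm { imm, dst } => regs[dst] += imm,
        }
    }
    regs
}

// Naive lowering: compute v0 + v1 in place, on top of v0.
fn naive() -> Vec<Inst> {
    vec![
        Inst::AddReg { src: 1, dst: 0 }, // clobbers v0!
        Inst::Copy { src: 0, dst: 3 },
        Inst::AddImm { imm: 42, dst: 3 }, // v3 is now (v0 + v1) + 42
    ]
}

// Lowering with copies: only the fresh copy is written to.
fn with_copies() -> Vec<Inst> {
    vec![
        Inst::Copy { src: 0, dst: 2 },
        Inst::AddReg { src: 1, dst: 2 }, // v2 = v0 + v1
        Inst::Copy { src: 0, dst: 3 },
        Inst::AddImm { imm: 42, dst: 3 }, // v3 = v0 + 42
    ]
}

fn main() {
    let start = vec![10, 5, 0, 0]; // v0 = 10, v1 = 5
    // Naive: v0 was overwritten, so v3 ends up as 57 instead of 52.
    assert_eq!(run(&naive(), start.clone()), vec![15, 5, 0, 57]);
    // With copies: v0 is untouched, v2 = 15 and v3 = 52.
    assert_eq!(run(&with_copies(), start), vec![10, 5, 15, 52]);
}
```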
&lt;p&gt;Pfew! We can now look at the entire lowering for the shift left instruction, edited and commented for clarity:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;&#x2F;&#x2F; Read the instruction operand size from the output&amp;#39;s type.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;let&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; size&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; =&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; dst_ty&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;bytes&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;() as&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; u8&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;&#x2F;&#x2F; Put the left hand side into a virtual register.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;let&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; lhs&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; =&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt; put_input_in_reg&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;ctx&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; inputs&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;[&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt;0&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;]);&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;&#x2F;&#x2F; Put the right hand side (shift amount) into either an immediate (if it&amp;#39;s&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;&#x2F;&#x2F; statically known at compile time), or into a virtual register.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;let&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; (&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;count&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rhs&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;) =&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;    if let&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; Some&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;cst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;) =&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; ctx&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;get_input_as_source_or_const&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;insn&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt; 1&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;).&lt;&#x2F;span&gt;&lt;span&gt;constant &lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;{&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;        &#x2F;&#x2F; Mask count, according to Cranelift&amp;#39;s semantics.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;        let&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; cst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; = (&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;cst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; as&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; u8&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;) &amp;amp; (&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;dst_ty&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;bits&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;() as&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; u8&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; -&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt; 1&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;);&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;        (&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;Some&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;cst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;),&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; None&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;    }&lt;&#x2F;span&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt; else&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;        (&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;None&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; Some&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;put_input_in_reg&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;ctx&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; inputs&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;[&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt;1&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;])))&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;    };&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;&#x2F;&#x2F; Get the destination virtual register.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;let&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; dst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; =&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt; get_output_reg&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;ctx&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; outputs&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;[&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt;0&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;]).&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;only_reg&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;().&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;unwrap&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;();&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;&#x2F;&#x2F; Copy the left hand side into the (modified) output operand, to satisfy the&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;&#x2F;&#x2F; mod constraint.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;ctx&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;emit&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #0DB9D7;&quot;&gt;Inst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;mov_r_r&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt;true&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; lhs&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; dst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;));&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;&#x2F;&#x2F; If the shift count is statically known: nothing particular to do. Otherwise,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;&#x2F;&#x2F; we need to put it in the RCX register.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;if&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; count&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;is_none&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;() {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;    let&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; w_rcx&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; =&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; Writable&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;from_reg&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #0DB9D7;&quot;&gt;regs&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;rcx&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;());&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; Copy the shift count (which is in rhs) into RCX.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;    ctx&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;emit&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #0DB9D7;&quot;&gt;Inst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;mov_r_r&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt;true&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rhs&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;unwrap&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(),&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; w_rcx&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;));&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;&#x2F;&#x2F; Generate the actual shift instruction.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;ctx&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;emit&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #0DB9D7;&quot;&gt;Inst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;shift_r&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;size&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #0DB9D7;&quot;&gt; ShiftKind&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;ShiftLeft&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; count&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; dst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;));&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;And this is how we tell the register usage collector about our constraints:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;Inst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;ShiftR&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; {&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; num_bits&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; dst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;, .. } =&amp;gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;    if&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; num_bits&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;is_none&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;() {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;        &#x2F;&#x2F; if the shift count is dynamic, mark RCX as used.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;        collector&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;add_use&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #0DB9D7;&quot;&gt;regs&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;rcx&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;());&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;    }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; In all the cases, the destination operand is modified.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;    collector&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;add_mod&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(*&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;dst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;);&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Only the modified register needs to be mapped to its allocated physical register:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;Inst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;ShiftR&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; { ref&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9D7CD8;font-style: italic;&quot;&gt; mut&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; dst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;, .. } =&amp;gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;    map_mod&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;mapper&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; dst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;);&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;h3 id=&quot;virtual-registers-copies-and-performance&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#virtual-registers-copies-and-performance&quot; aria-label=&quot;Anchor link for: virtual-registers-copies-and-performance&quot;&gt;🔗&lt;&#x2F;a&gt;Virtual registers copies and performance&lt;&#x2F;h3&gt;
&lt;p&gt;Do these virtual register copies sound costly to you? In theory, they could lead to the generation of move instructions, increasing the size of the generated code and causing a small runtime cost. In practice,
register allocation, through its interface, knows how to identify move instructions, their source and their destination. By analyzing them, it can see when a source isn’t used after a given move instruction, and thus allocate the same register for the source and the destination of the move. Then, when Cranelift generates the code, it will avoid generating a move from a physical register to the same one&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-7-1&quot;&gt;&lt;a href=&quot;#fn-7&quot;&gt;6&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;. As a matter of fact, creating a VCode copy doesn’t necessarily mean that it will generate a machine code move instruction later: it is present just in case regalloc &lt;em&gt;needs&lt;&#x2F;em&gt; it, but it can be avoided when it’s spurious.&lt;&#x2F;p&gt;
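&lt;p&gt;The last step, skipping a move whose source and destination ended up in the same physical register, can be sketched as a simple filter. This is only a model of the idea, assuming a made-up &lt;code&gt;Inst&lt;&#x2F;code&gt; type with numbered physical registers and a hypothetical &lt;code&gt;drop_redundant_moves&lt;&#x2F;code&gt; helper; it does not reflect how Cranelift structures this internally.&lt;&#x2F;p&gt;

```rust
// Hypothetical post-regalloc instructions, using physical register numbers.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Inst {
    Mov { src: u8, dst: u8 },
    Add { src: u8, dst: u8 },
}

// Keeps every instruction except moves from a register to itself.
fn drop_redundant_moves(insts: &[Inst]) -> Vec<Inst> {
    insts
        .iter()
        .copied()
        .filter(|inst| !matches!(inst, Inst::Mov { src, dst } if src == dst))
        .collect()
}

fn main() {
    let insts = [
        Inst::Mov { src: 0, dst: 0 }, // source and destination got the same register
        Inst::Add { src: 1, dst: 0 },
        Inst::Mov { src: 0, dst: 2 }, // a real move: must be kept
    ];
    let kept = drop_redundant_moves(&insts);
    assert_eq!(
        kept,
        vec![Inst::Add { src: 1, dst: 0 }, Inst::Mov { src: 0, dst: 2 }]
    );
}
```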
&lt;h2 id=&quot;code-generation&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#code-generation&quot; aria-label=&quot;Anchor link for: code-generation&quot;&gt;🔗&lt;&#x2F;a&gt;Code generation&lt;&#x2F;h2&gt;
&lt;p&gt;Oh my, we’re getting closer to actually being able to run the code! Once register allocation has run, we can generate the actual machine code for the VCode instructions. Cool kids call this step of the pipeline &lt;em&gt;codegen&lt;&#x2F;em&gt;, for code generation. This is the part where we decipher the architecture manuals provided by the CPU vendors, and generate the raw machine bytes for our machine instructions. In Cranelift, this means filling a code buffer (there’s a &lt;code&gt;MachBuffer&lt;&#x2F;code&gt; sink interface for this!), returned along with some internal relocations&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-6-1&quot;&gt;&lt;a href=&quot;#fn-6&quot;&gt;7&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt; and additional metadata. Let’s see what happens for our integer addition, when the time comes to generate the code for its VCode equivalent &lt;code&gt;AluRRR&lt;&#x2F;code&gt; on &lt;code&gt;aarch64&lt;&#x2F;code&gt; (in &lt;code&gt;cranelift&#x2F;codegen&#x2F;src&#x2F;isa&#x2F;aarch64&#x2F;inst&#x2F;emit.rs&lt;&#x2F;code&gt;):&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;&#x2F;&#x2F; We match on the VCode&amp;#39;s identity here:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;&amp;amp;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;Inst&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;AluRRR&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; {&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; alu_op&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rd&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rn&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rm&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; } =&amp;gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; First select the top 11 bits based on the ALU subopcode.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;    let&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; top11&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; =&lt;&#x2F;span&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt; match&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; alu_op&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;        ALUOp&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;Add32&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; =&amp;gt;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt; 0b00001011_000&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;        ALUOp&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;::&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;Add64&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; =&amp;gt;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt; 0b10001011_000&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;        &#x2F;&#x2F; etc&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;    };&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; Then decide the bits 10 to 15, based on the ALU subopcode as well.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;    let&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; bit15_10&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; =&lt;&#x2F;span&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt; match&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; alu_op&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;        &#x2F;&#x2F; other cases&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;        _&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; =&amp;gt;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt; 0b000000&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;    };&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; Then use a helper and pass forward the allocated physical registers&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; values.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;    sink&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;put4&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;enc_arith_rrr&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;top11&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; bit15_10&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rd&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rn&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rm&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;));&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
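&lt;p&gt;To make the first match more concrete, here is a small standalone sketch showing where those 11 bits land in the 32-bit instruction word. It is not taken from Cranelift: the constant names and the &lt;code&gt;place_top11&lt;&#x2F;code&gt; function are made up for illustration. It also shows that the 32-bit and 64-bit additions differ by a single bit, the one that ends up as bit 31 of the instruction (named &lt;code&gt;sf&lt;&#x2F;code&gt; in the Arm manual).&lt;&#x2F;p&gt;

```rust
// Same bit patterns as in the match above; the names are hypothetical.
const ADD32_TOP11: u32 = 0b00001011_000;
const ADD64_TOP11: u32 = 0b10001011_000;

// Moves an 11-bit pattern into bits 31..21 of a 32-bit instruction word.
fn place_top11(top11: u32) -> u32 {
    top11 << 21
}

fn main() {
    // The two patterns differ only in their highest bit.
    assert_eq!(ADD32_TOP11 ^ ADD64_TOP11, 1 << 10);

    // Once shifted into place, that bit is bit 31 of the instruction.
    assert_eq!(place_top11(ADD32_TOP11), 0x0B00_0000);
    assert_eq!(place_top11(ADD64_TOP11), 0x8B00_0000);
    assert_eq!(place_top11(ADD32_TOP11) ^ place_top11(ADD64_TOP11), 1 << 31);
}
```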
&lt;p&gt;And what’s this &lt;code&gt;enc_arith_rrr&lt;&#x2F;code&gt; doing, then?&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;fn&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt; enc_arith_rrr&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;bits_31_21&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; u32&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; bits_15_10&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; u32&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rd&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; Writable&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;&amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;Reg&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;&amp;gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rn&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; Reg&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; rm&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; Reg&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;) -&amp;gt;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; u32&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;    (&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;bits_31_21&lt;&#x2F;span&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt; &amp;lt;&amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt; 21&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;        |&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; (&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;bits_15_10&lt;&#x2F;span&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt; &amp;lt;&amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt; 10&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;        |&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt; machreg_to_gpr&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;rd&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;to_reg&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;())&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;        |&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; (&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;machreg_to_gpr&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;rn&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;)&lt;&#x2F;span&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt; &amp;lt;&amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt; 5&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;        |&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; (&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;machreg_to_gpr&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;rm&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;)&lt;&#x2F;span&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt; &amp;lt;&amp;lt;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt; 16&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Encoding the instruction parts (operands, register mentions) is a lot of bit twiddling and fun. We do so for each VCode instruction, until we’ve generated the whole function’s body. As you may remember, register allocation may have added some spill&#x2F;reload&#x2F;move instructions by this point. From the codegen’s point of view, these are just regular instructions with precomputed operands (either real registers, or memory operands involving the stack pointer), so they get no special treatment: they’re generated the same way as any other VCode instruction.&lt;&#x2F;p&gt;
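&lt;p&gt;To make the bit twiddling concrete, here is a minimal, self-contained sketch of the same packing. It is a simplification and not Cranelift’s actual code: the hypothetical &lt;code&gt;enc_arith_rrr_sketch&lt;&#x2F;code&gt; takes plain hardware register numbers instead of &lt;code&gt;Reg&lt;&#x2F;code&gt; values, and the opcode bits passed in &lt;code&gt;main&lt;&#x2F;code&gt; are assumed to be those of a 64-bit register-register &lt;code&gt;add&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;rust&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&#x2F;&#x2F; Simplified stand-in for `enc_arith_rrr` (assumption: registers are plain&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;&#x2F;&#x2F; hardware numbers in 0..=31 rather than Cranelift&amp;#39;s register types).&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;fn enc_arith_rrr_sketch(bits_31_21: u32, bits_15_10: u32, rd: u32, rn: u32, rm: u32) -&amp;gt; u32 {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &#x2F;&#x2F; Each register field is 5 bits wide.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    assert!(rd &amp;lt; 32 &amp;amp;&amp;amp; rn &amp;lt; 32 &amp;amp;&amp;amp; rm &amp;lt; 32);&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    (bits_31_21 &amp;lt;&amp;lt; 21) | (bits_15_10 &amp;lt;&amp;lt; 10) | rd | (rn &amp;lt;&amp;lt; 5) | (rm &amp;lt;&amp;lt; 16)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;fn main() {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    &#x2F;&#x2F; 64-bit `add x0, x1, x2`: bits 31..21 hold the opcode, bits 15..10 are zero.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    let word = enc_arith_rrr_sketch(0b100_0101_1000, 0, 0, 1, 2);&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    println!(&amp;quot;{:#010x}&amp;quot;, word); &#x2F;&#x2F; prints 0x8b020020&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The destination register lands in bits 4..0, the first source in bits 9..5 and the second source in bits 20..16; everything else comes from the two constant bit fields chosen for the opcode.&lt;&#x2F;p&gt;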
&lt;p&gt;The codegen backend then does more work to optimize block placement, compute final branch offsets, etc. If you’re interested in this, I strongly encourage you to go read &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;cfallin.org&#x2F;blog&#x2F;2021&#x2F;01&#x2F;22&#x2F;cranelift-isel-2&#x2F;&quot;&gt;this blog post&lt;&#x2F;a&gt; by Chris Fallin. After this, we’re finally done: we’ve produced a code buffer, as well as external relocations (to other functions, memory addresses, etc.) for a single function. The code generator’s task is complete: the remaining steps consist of linking and, optionally, producing an executable binary.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;mission-accomplished&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#mission-accomplished&quot; aria-label=&quot;Anchor link for: mission-accomplished&quot;&gt;🔗&lt;&#x2F;a&gt;Mission accomplished!&lt;&#x2F;h2&gt;
&lt;p&gt;So, we’re done for today! Thanks for reading this far; I hope it has been a useful and pleasant read! Feel free to reach out to me on the &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;twitter.com&#x2F;bnjbvr&quot;&gt;twitterz&lt;&#x2F;a&gt; if you have additional remarks&#x2F;questions, and to go contribute to &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;bytecodealliance&#x2F;wasmtime&quot;&gt;Wasmtime&#x2F;Cranelift&lt;&#x2F;a&gt; if this sort of thing interests you 😇. Until next time, take care of yourselves!&lt;&#x2F;p&gt;
&lt;p&gt;Thanks to &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;cfallin.org&quot;&gt;Chris Fallin&lt;&#x2F;a&gt; for reading and suggesting improvements to this blog post.&lt;&#x2F;p&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;Really, Rust &lt;em&gt;is&lt;&#x2F;em&gt; the DSL. It used to be Python code, which had the advantage of being faster to update. Yet it was doing a lot of magic behind the curtain, which wasn’t very friendly for new people trying to learn and use Cranelift. Although a statically typed language helps with exploration through tooling, this meta-language is set to partially disappear in the long run; see Chris’ &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;cfallin.org&#x2F;blog&#x2F;2020&#x2F;09&#x2F;18&#x2F;cranelift-isel-1&#x2F;&quot;&gt;blog post&lt;&#x2F;a&gt; on this topic. &lt;a href=&quot;#fr-2-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-4&quot;&gt;
&lt;p&gt;Aarch64 connoisseurs may notice that there are other ways to encode an addition. Say, if one of the input operands was the result of a bit shift instruction by an immediate value, then it’s possible to &lt;em&gt;embed&lt;&#x2F;em&gt; the shift within the add, so we end up with fewer machine instructions (and lower the register pressure). This other possible encoding is sufficiently different in terms of register allocation and code generation that it justifies having its own VCode instruction. &lt;code&gt;AluRRR&lt;&#x2F;code&gt; is simpler in the sense that it’s only concerned with register inputs and outputs, thus a perfect example for this post. &lt;a href=&quot;#fr-4-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;What’s an integer overflow for signed integer division? Consider a two’s complement integer value represented on &lt;code&gt;N&lt;&#x2F;code&gt; bits. If you try to divide the smallest integer value &lt;code&gt;-2**(N-1)&lt;&#x2F;code&gt; by &lt;code&gt;-1&lt;&#x2F;code&gt;, it should return &lt;code&gt;2**(N-1)&lt;&#x2F;code&gt;, but this is out of range, since the biggest signed integer value we can represent on &lt;code&gt;N&lt;&#x2F;code&gt; bits is &lt;code&gt;2**(N-1) - 1&lt;&#x2F;code&gt;! For instance, with &lt;code&gt;N = 8&lt;&#x2F;code&gt;, &lt;code&gt;-128 &#x2F; -1&lt;&#x2F;code&gt; should be &lt;code&gt;128&lt;&#x2F;code&gt;, but the largest representable value is &lt;code&gt;127&lt;&#x2F;code&gt;. So this will overflow and be set to &lt;code&gt;-2**(N-1)&lt;&#x2F;code&gt;, which is the initial value, but not the correct result. Good luck debugging this without a software trap! &lt;a href=&quot;#fr-3-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-5&quot;&gt;
&lt;p&gt;Register moves may be introduced because a successor block (in the control flow graph) expects a given virtual register to live in a particular real register, or because a particular instruction requires a virtual register to be allocated to a &lt;em&gt;fixed&lt;&#x2F;em&gt; real register that’s busy: regalloc can then temporarily divert the busy register into another unused register. &lt;a href=&quot;#fr-5-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-8&quot;&gt;
&lt;p&gt;The &lt;code&gt;c&lt;&#x2F;code&gt; in &lt;code&gt;rcx&lt;&#x2F;code&gt; actually stands for &lt;code&gt;count&lt;&#x2F;code&gt;; this is a property inherited from former CPU designs. &lt;a href=&quot;#fr-8-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-7&quot;&gt;
&lt;p&gt;Unless this move carries sign- or zero-extending semantics, which is the case for e.g. x86’s 32-bit &lt;code&gt;mov&lt;&#x2F;code&gt; instructions on a 64-bit architecture. &lt;a href=&quot;#fr-7-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-6&quot;&gt;
&lt;p&gt;Relocations are placeholders for information we don’t have access to &lt;em&gt;yet&lt;&#x2F;em&gt;. For instance, when we’re generating jump instructions, the jump target offsets are not known yet. So we record where the jump instruction is in the code stream, as well as which control flow block it should jump into, so we can &lt;em&gt;patch it&lt;&#x2F;em&gt; later when the final offsets are known: that’s the content of our relocation. &lt;a href=&quot;#fr-6-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;&#x2F;section&gt;
</content>
    </entry>
    <entry xml:lang="en">
        <title>Making calls to WebAssembly fast and implementing anyref</title>
        <published>2018-07-04T18:00:42+00:00</published>
        <updated>2018-07-04T18:00:42+00:00</updated>
        
        <author>
          <name>
            
              Benjamin Bouvier
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://bouvier.cc/tech/mozilla-2018-faster-calls-and-anyref/"/>
        <id>https://bouvier.cc/tech/mozilla-2018-faster-calls-and-anyref/</id>
        <content type="html" xml:base="https://bouvier.cc/tech/mozilla-2018-faster-calls-and-anyref/">&lt;p&gt;Since this is the end of the first half-year, I think it is a good time to
reflect and show some work I’ve been doing over the last few months, apart from
the regular batch of random issues, security bugs, reviews and the fixing of 24
bugs found by our &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Fuzzing&quot;&gt;fuzzers&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;span id=&quot;continue-reading&quot;&gt;&lt;&#x2F;span&gt;&lt;h2 id=&quot;bug-1319203-make-js-to-webassembly-calls-blazingly-fast&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#bug-1319203-make-js-to-webassembly-calls-blazingly-fast&quot; aria-label=&quot;Anchor link for: bug-1319203-make-js-to-webassembly-calls-blazingly-fast&quot;&gt;🔗&lt;&#x2F;a&gt;Bug 1319203: Make JS to WebAssembly calls &lt;em&gt;blazingly&lt;&#x2F;em&gt; fast&lt;&#x2F;h2&gt;
&lt;p&gt;If we want more WebAssembly (wasm) adoption, there shouldn’t be a big costly
barrier between the two universes. That is, calls from one world to the other
should be fast. For a very long time, calls from JS to asm.js&#x2F;WebAssembly have
been quite slow in Firefox. In fact, we didn’t optimize them at all. For ease
and speed of implementation at the time, asm.js call activations (data
structures recording information about the function being currently called in
the VM) were very different from the JS ones. This had significant structural
consequences for features such as reconstructing the call stack information
used by &lt;code&gt;Error()&lt;&#x2F;code&gt; stack traces, or tracing the stack for garbage
collection purposes. After a lot of hard work on refactoring and low-level
changes over the last year, SpiderMonkey was finally ripe for an
optimization.&lt;&#x2F;p&gt;
&lt;p&gt;When we call from JS to asm.js&#x2F;wasm, the call passes through C++, does a bunch of
work and then calls into a piece of glue code directly written in assembly: the
&lt;em&gt;interpreter entry stub&lt;&#x2F;em&gt;. This stub is quite small: it just copies out the C++
arguments into the right places the wasm function being called expects, sets up
some small machine state, calls into the function, then does error checking and
eventually returns to the C++ caller. The critical case is JIT-compiled code,
that is, JS code that has been compiled to machine code by the just-in-time
compiler, IonMonkey. When a JIT-compiled JS function called into wasm, it had
to go back to C++ first, before control flow was redirected to
WebAssembly.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;bouvier.cc&#x2F;tech&#x2F;mozilla-2018-faster-calls-and-anyref&#x2F;2018-07-interpreter-stub.png&quot; alt=&quot;Diagram showing interpreter entry stub&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Starting with Firefox 60, the JIT compiler makes no distinction between
calling a JavaScript function or a WebAssembly function, meaning it uses the
same call optimizations for both kinds of function. A new piece of glue code,
the &lt;em&gt;JIT entry stub&lt;&#x2F;em&gt;, is generated for each exported function: it converts and
unboxes the arguments read from the JIT-compiled JS caller into the right
primitive types as expressed in the wasm function’s signature, sets up some
machine registers, calls into the wasm function being called and then converts
the result into a format the JS caller will understand.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;bouvier.cc&#x2F;tech&#x2F;mozilla-2018-faster-calls-and-anyref&#x2F;2018-07-jit-stub.png&quot; alt=&quot;Diagram showing JIT entry stub&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;As you can see, the C++ step that was originally required to call wasm from JS
has been completely eliminated!&lt;&#x2F;p&gt;
&lt;p&gt;This resulted in massive speedups over a variety of different situations: when
a wasm function is directly &#x2F; indirectly &#x2F; polymorphically called, or used as a
getter&#x2F;setter, or called by &lt;code&gt;Function.prototype.call&#x2F;apply&lt;&#x2F;code&gt;, when the call is
missing required arguments, etc. Here’s a brief summary of the results, but
there might be a full-blown blog post about these optimizations coming on
&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;hacks.mozilla.org&#x2F;&quot;&gt;Mozilla Hacks&lt;&#x2F;a&gt; at some point in the future.
(calling 1 billion times into very simple functions, lower is better)&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;bouvier.cc&#x2F;tech&#x2F;mozilla-2018-faster-calls-and-anyref&#x2F;2018-07-wasm-calls.png&quot; alt=&quot;Charts showing evolution of performance&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;This work is not entirely done yet: we can optimize even further in the
case of a function call from JS when the called wasm function is definitely
known to be a unique wasm target; see the &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;bugzilla.mozilla.org&#x2F;show_bug.cgi?id=1437065&quot;&gt;tracking
bug&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;bug-1422043-lazy-entry-stub-generation&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#bug-1422043-lazy-entry-stub-generation&quot; aria-label=&quot;Anchor link for: bug-1422043-lazy-entry-stub-generation&quot;&gt;🔗&lt;&#x2F;a&gt;Bug 1422043: Lazy entry stub generation&lt;&#x2F;h2&gt;
&lt;p&gt;Fixing the previous bug came with a significant memory issue: every exported
function now generates a rather big chunk of code for the JIT entry, having an
impact on the memory occupied by the code itself. This would be fine in most
situations where the number of exported functions is generally low. But when
the wasm module exports a Table (think of the equivalent of a C++ function
table with signature checks), we have to assume that every single function,
including those not explicitly exported, needs entry stubs. Indeed, each
function can be eventually called through the Table, after calls to
&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;http:&#x2F;&#x2F;webassembly.github.io&#x2F;spec&#x2F;js-api&#x2F;index.html#dom-table-set&quot;&gt;WebAssembly.Table.set&lt;&#x2F;a&gt;.
In fact, the existing code already suffered from this because of the
interpreter entries, but the problem was greatly amplified by the much larger
JIT entry stubs.&lt;&#x2F;p&gt;
&lt;p&gt;To fix this, we’ve decided to lazily generate all the entry stubs for functions
exported through a table. That is, if a function is &lt;em&gt;explicitly&lt;&#x2F;em&gt; exported, its
stubs will be generated at wasm compile time, but other functions won’t have
stubs yet. If a non-exported function is called through a Table, we’ll generate
the entry stubs the first time it is called. This involves some fun
interactions with our &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;hacks.mozilla.org&#x2F;2018&#x2F;01&#x2F;making-webassembly-even-faster-firefoxs-new-streaming-and-tiering-compiler&#x2F;&quot;&gt;tiered
compilation&lt;&#x2F;a&gt;
mechanism, which can compile functions and create new entry stubs in the
background while the running thread will generate lazy ones.&lt;&#x2F;p&gt;
&lt;p&gt;Not only did this fix the memory regression introduced by bug 1319203, but it
actually made the situation even better than the baseline, because we didn’t
need to generate those interpreter entries for table-exported functions by
default anymore:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;bouvier.cc&#x2F;tech&#x2F;mozilla-2018-faster-calls-and-anyref&#x2F;2018-07-wasm-stubs-memory.png&quot; alt=&quot;Charts showing evolution of memory usage&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Since it’s not entirely readable from the chart: after the patches, the
AngryBots and ZenGarden entry stubs’ memory usage went down to 262 KB and
362 KB respectively. This was also a relatively large win in compilation time,
but on such a small scale that it didn’t make a big difference to total compile time.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;bug-1447591-remove-wasm-binarytotext&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#bug-1447591-remove-wasm-binarytotext&quot; aria-label=&quot;Anchor link for: bug-1447591-remove-wasm-binarytotext&quot;&gt;🔗&lt;&#x2F;a&gt;Bug 1447591: Remove wasm::BinaryToText&lt;&#x2F;h2&gt;
&lt;p&gt;WebAssembly is a binary format, and there is an equivalent human-readable and
debuggable text format: the WebAssembly Text format, or &lt;em&gt;WAT&lt;&#x2F;em&gt; format. While
SpiderMonkey once directly produced WAT for display in C++, it’s now easier for
&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;devtools-html&#x2F;debugger.html&quot;&gt;debugger.html&lt;&#x2F;a&gt; to do so in JS.
This also made the mapping between bytecode offsets and text offsets (source
maps) more consistent with the display, and it could be useful in other places
where this project is being used. Recently, after confirming that the C++
implementation wasn’t used anymore, I was able to remove it. It’s not every day
that you get a net loss of around 5,500 lines of code, which is always nice:
less code means fewer bugs and less maintenance burden, especially when the code
is dead.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;bug-1445272-1450261-implement-basic-anyref-support&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#bug-1445272-1450261-implement-basic-anyref-support&quot; aria-label=&quot;Anchor link for: bug-1445272-1450261-implement-basic-anyref-support&quot;&gt;🔗&lt;&#x2F;a&gt;Bug 1445272 &#x2F; 1450261: Implement basic &lt;code&gt;anyref&lt;&#x2F;code&gt; support&lt;&#x2F;h2&gt;
&lt;p&gt;A new proposal was made to the WebAssembly specification committee a few
months ago: to add &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;WebAssembly&#x2F;reference-types&quot;&gt;reference
types&lt;&#x2F;a&gt; to the type system.
Reference types are a new way to represent a reference to any &lt;em&gt;host&lt;&#x2F;em&gt; value. In
a Web environment, this means being capable of playing with JavaScript values
within WebAssembly. This is a huge difference from the existing type system,
which only contains primitive types: integers represented on 32 or 64 bits,
IEEE754 floating-point numbers represented on 32 or 64 bits. This is also a
first step for implementing &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;http:&#x2F;&#x2F;github.com&#x2F;webassembly&#x2F;gc&quot;&gt;garbage
collection&lt;&#x2F;a&gt; (GC) integration within
WebAssembly: since these reference values have been allocated on the GC heap in
JavaScript, they need to be traced during wasm execution.&lt;&#x2F;p&gt;
&lt;p&gt;The basic implementation of this feature in the first bug allows one to use a
new type, called &lt;code&gt;anyref&lt;&#x2F;code&gt;, as part of a function’s signature or in local
variables, be it in a function definition or an imported function. This allows
using JS variables within wasm and passing them around to other JS functions. The
second bug implemented the capability to read and write &lt;code&gt;anyref&lt;&#x2F;code&gt; values in wasm
Globals [1]. Since Globals can be manipulated outside of the wasm Module thanks
to their JS API, and garbage collections can happen at any time in JS, we
needed to implement GC barriers to make sure that the stored value would not be
marked as unused during tracing. There is good literature explaining why these
barriers are needed and what they do, so I will not expand too much on the
topic.&lt;&#x2F;p&gt;
&lt;p&gt;Here’s an example of usage according to latest spec drafts (and therefore
subject to change for now):&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;common-lisp&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;(module&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    (func $alert (&lt;&#x2F;span&gt;&lt;span style=&quot;color: #0DB9D7;&quot;&gt;import&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ECE6A;&quot;&gt;env&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;&amp;quot; &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ECE6A;&quot;&gt;alert&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;) (param anyref))&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    (global $global_ref (mut anyref) (ref.null anyref))&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    (func (&lt;&#x2F;span&gt;&lt;span style=&quot;color: #0DB9D7;&quot;&gt;export&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ECE6A;&quot;&gt;set_and_alert&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;&amp;quot;&lt;&#x2F;span&gt;&lt;span&gt;) (param $param anyref) (result anyref)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;        ;; Put the previous value of $global_ref on the virtual value stack.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        get_global $global_ref&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;        ;; Get the argument anyref value and store it in $global_ref.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        get_local $param&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        set_global $global_ref&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;        ;; Call the $alert method with the argument anyref value.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        get_local $param&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;        call $alert&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;        ;; The previous value of $global_ref is still on the stack and will be&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;        ;; returned.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    )&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;em&gt;Example of wasm text format using &lt;code&gt;anyref&lt;&#x2F;code&gt;.&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;javascript&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9D7CD8;font-style: italic;&quot;&gt;async&lt;&#x2F;span&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt; function&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;() {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #9D7CD8;font-style: italic;&quot;&gt;    let&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; {&lt;&#x2F;span&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt; instance&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; } =&lt;&#x2F;span&gt;&lt;span style=&quot;color: #BB9AF7;font-style: italic;&quot;&gt; await&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; WebAssembly&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;instantiate&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;wasmBinary&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #73DACA;&quot;&gt;        env&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt; {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;            alert&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #E0AF68;&quot;&gt;obj&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;) {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;                alert&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;`&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ECE6A;&quot;&gt;Hello, &lt;&#x2F;span&gt;&lt;span style=&quot;color: #7DCFFF;&quot;&gt;${&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;obj&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7DCFFF;&quot;&gt;name}&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ECE6A;&quot;&gt;!&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;`&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;)&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;            }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;        }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;    })&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;    console&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;log&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;instance&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7DCFFF;&quot;&gt;exports&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;set_and_alert&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;({&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #73DACA;&quot;&gt;        name&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;: &amp;#39;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ECE6A;&quot;&gt;world&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;&amp;#39;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #73DACA;&quot;&gt;        secretVal&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #FF9E64;&quot;&gt; 42&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;    }))&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; alerts &amp;quot;Hello, world!&amp;quot;, logs null&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;    console&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;log&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #0DB9D7;&quot;&gt;JSON&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;stringify&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;instance&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7DCFFF;&quot;&gt;exports&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;set_and_alert&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;({&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #73DACA;&quot;&gt;        name&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;: &amp;#39;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ECE6A;&quot;&gt;there&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;&amp;#39;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;    })))&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;    &#x2F;&#x2F; alerts &amp;quot;Hello, there!&amp;quot;, logs { name: &amp;#39;world!&amp;#39;, secretVal: 42 }&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;})()&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;em&gt;Example of JavaScript using the module defined above, passing JS values and
reading them from WebAssembly.&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;p&gt;This is a very preliminary prototype and it might change in the next few
months. If you feel adventurous, you can try it on Firefox Nightly by setting
the &lt;code&gt;about:config&lt;&#x2F;code&gt; pref &lt;code&gt;javascript.options.wasm_gc&lt;&#x2F;code&gt; to &lt;code&gt;true&lt;&#x2F;code&gt;; note that we
haven’t fully hooked this up to garbage collection yet, so your experimentation
might occasionally throw out-of-memory exceptions. In any case, if you see
something, &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;bugzilla.mozilla.org&#x2F;enter_bug.cgi?product=Core&amp;amp;component=Javascript%3A%20Web%20Assembly&quot;&gt;say
something&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;Say you are a compiler developer, and you would like to port your language to
WebAssembly, and your language uses a GC. At the moment, the only way you can
do this is by compiling your garbage collector to WebAssembly, and it would be
backed by the wasm Module’s memory. This works, but it won’t be very efficient.
Plus, there’s already a very efficient, solidly tested, constantly improving
garbage collector in your browser that uses all the possible dirty low-level
tricks known to mankind, which is the GC being used for JavaScript. What if we
could give you access to the garbage collector directly? Then you’d just need
a way to define structures, and could then use a set of opcodes to
allocate them, read and write fields on them, etc. At the moment, the reference
types proposal only allows you to move garbage-collected values around. There’s
also code in Firefox Nightly to experiment with defining your own data
structures and using them, but it is at a very early stage. If you’re interested in
following us implementing more parts, this &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;bugzilla.mozilla.org&#x2F;show_bug.cgi?id=1444925&quot;&gt;tracking
issue&lt;&#x2F;a&gt; might be of
interest.&lt;&#x2F;p&gt;
&lt;p&gt;[1] Think of a C++ “global” value, not a JavaScript “global”.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;future-work&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#future-work&quot; aria-label=&quot;Anchor link for: future-work&quot;&gt;🔗&lt;&#x2F;a&gt;Future work&lt;&#x2F;h2&gt;
&lt;p&gt;There is still much more work to be done on the implementation of WebAssembly
in SpiderMonkey, to implement other new proposals, to make it faster, or to
have even better generated code.&lt;&#x2F;p&gt;
&lt;p&gt;A big thank you for the proofreading to &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;http:&#x2F;&#x2F;whereswalden.com&#x2F;&quot;&gt;Waldo&lt;&#x2F;a&gt;,
&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;steveklabnik&quot;&gt;steveklabnik&lt;&#x2F;a&gt; and
&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;twitter.com&#x2F;ag_dubs&quot;&gt;ashleygwilliams&lt;&#x2F;a&gt;. Extra thanks go to
Ashley who also drew the two diagrams showing how stubs evolved.&lt;&#x2F;p&gt;
</content>
    </entry>
    <entry xml:lang="en">
        <title>Making asm.js&#x2F;WebAssembly compilation more parallel in Firefox</title>
        <published>2016-04-22T15:00:42+00:00</published>
        <updated>2016-04-22T15:00:42+00:00</updated>
        
        <author>
          <name>
            
              Benjamin Bouvier
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://bouvier.cc/tech/making-asmjs-webassembly-compilation-more-parallel/"/>
        <id>https://bouvier.cc/tech/making-asmjs-webassembly-compilation-more-parallel/</id>
        <content type="html" xml:base="https://bouvier.cc/tech/making-asmjs-webassembly-compilation-more-parallel/">&lt;p&gt;In December 2015, I worked on reducing the startup time of asm.js programs in
Firefox by making compilation more parallel. As our
JavaScript engine, SpiderMonkey, uses the same compilation pipeline for both
asm.js and WebAssembly, this also benefitted WebAssembly compilation. Now is a
good time to talk about what it meant, how it was achieved, and what the
next ideas are to make it even faster.&lt;&#x2F;p&gt;
&lt;span id=&quot;continue-reading&quot;&gt;&lt;&#x2F;span&gt;&lt;h1 id=&quot;what-does-it-mean-to-make-a-program-more-parallel&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#what-does-it-mean-to-make-a-program-more-parallel&quot; aria-label=&quot;Anchor link for: what-does-it-mean-to-make-a-program-more-parallel&quot;&gt;🔗&lt;&#x2F;a&gt;What does it mean to make a program “more parallel”?&lt;&#x2F;h1&gt;
&lt;p&gt;Parallelization consists of splitting a sequential program into smaller
independent tasks, then having them run on different CPU cores. If your program
is using &lt;code&gt;N&lt;&#x2F;code&gt; cores, it can be up to &lt;code&gt;N&lt;&#x2F;code&gt; times faster.&lt;&#x2F;p&gt;
&lt;p&gt;Well, in theory. Let’s say you’re in a car, driving on a 100 km long road.
You’ve already driven the first 50 km in one hour. Let’s say your car can
have unlimited speed from now on. What is the maximal average speed you can
reach, once you get to the end of the road?&lt;&#x2F;p&gt;
&lt;p&gt;People intuitively answer “If it can go as fast as I want, then something close to
the speed of light sounds plausible”. But this is not true! In fact, even if you could teleport from
your current position to the end of the road, you’d have traveled 100 km in one
hour, so your maximal theoretical average speed is 100 km per hour. This result is a
consequence of &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Amdahl%27s_law&quot;&gt;Amdahl’s law&lt;&#x2F;a&gt;.
When we get back to our initial problem, this means you can expect an &lt;code&gt;N&lt;&#x2F;code&gt; times
speedup when running your program with &lt;code&gt;N&lt;&#x2F;code&gt; cores if, and only if, your
program can be &lt;strong&gt;entirely&lt;&#x2F;strong&gt; run in parallel. This is usually not the case, and
that is why most wording refers to &lt;em&gt;speedups &lt;strong&gt;up to&lt;&#x2F;strong&gt; N times faster&lt;&#x2F;em&gt;, when it
comes to parallelization.&lt;&#x2F;p&gt;
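&lt;p&gt;To put numbers on this, here is a minimal sketch of Amdahl’s law in JavaScript. The function name &lt;code&gt;amdahlSpeedup&lt;&#x2F;code&gt; and its parameters are made up for this illustration, and the model ignores any overhead introduced by parallelization itself.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;javascript&quot;&gt;// Amdahl's law: upper bound on the speedup of a program when only a fraction
// `parallelFraction` of its work can be spread over `cores` cores.
// (Hypothetical helper, written for this example only.)
function amdahlSpeedup(parallelFraction, cores) {
    const sequentialFraction = 1 - parallelFraction;
    return 1 / (sequentialFraction + parallelFraction / cores);
}

// The car example: the first half of the trip is stuck at the original
// speed, the second half can go arbitrarily fast.
console.log(amdahlSpeedup(0.5, Infinity)); // 2: at best 100 km/h instead of 50 km/h

// Only an entirely parallel program scales by N on N cores.
console.log(amdahlSpeedup(1, 8)); // 8

// A program that is 90% parallel, on 8 cores.
console.log(amdahlSpeedup(0.9, 8)); // about 4.7
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Even with 90% of the work running in parallel, 8 cores give a speedup of only about 4.7: the remaining sequential part quickly dominates.&lt;&#x2F;p&gt;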
&lt;p&gt;Now, say your program is already running some portions in parallel. To make it
faster, one can identify some parts of the program that are sequential, and make
them independent so that you can run them in parallel. With respect to our car
metaphor, this means lengthening the portion of the road on which you can drive at
unlimited speed.&lt;&#x2F;p&gt;
&lt;p&gt;This is exactly what we have done with parallel compilation of asm.js programs
under Firefox.&lt;&#x2F;p&gt;
&lt;h1 id=&quot;a-quick-look-at-the-asm-js-compilation-pipeline&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#a-quick-look-at-the-asm-js-compilation-pipeline&quot; aria-label=&quot;Anchor link for: a-quick-look-at-the-asm-js-compilation-pipeline&quot;&gt;🔗&lt;&#x2F;a&gt;A quick look at the asm.js compilation pipeline&lt;&#x2F;h1&gt;
&lt;p&gt;I recommend reading this &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;blog.mozilla.org&#x2F;luke&#x2F;2014&#x2F;01&#x2F;14&#x2F;asm-js-aot-compilation-and-startup-performance&#x2F;&quot;&gt;blog
post&lt;&#x2F;a&gt;.
It clearly explains the differences between JIT (Just In Time) and AOT (Ahead
Of Time) compilation, and elaborates on the different parts of the engines
involved in the compilation pipeline.&lt;&#x2F;p&gt;
&lt;p&gt;As a TL;DR, keep in mind that &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;http:&#x2F;&#x2F;asmjs.org&#x2F;&quot;&gt;asm.js&lt;&#x2F;a&gt; is a strictly
validated, highly optimizable, typed subset of JavaScript. Once
validated, it guarantees high performance and stability (no garbage collector
involved!). That is ensured by
mapping every single JavaScript instruction of this subset to a few CPU
instructions, if not only a single instruction. This means an asm.js program
needs to get &lt;em&gt;compiled&lt;&#x2F;em&gt; to machine code, that is, translated from JavaScript to
the language your CPU directly manipulates (like what GCC would do for a C++
program). If you haven’t heard, the results are impressive and you can run
&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;http:&#x2F;&#x2F;beta.unity3d.com&#x2F;jonas&#x2F;DT2&#x2F;&quot;&gt;video&lt;&#x2F;a&gt;
&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.unrealengine.com&#x2F;html5&quot;&gt;games&lt;&#x2F;a&gt; directly in your browser, without
needing to install anything. No plugins. Nothing more than your usual, everyday
browser.&lt;&#x2F;p&gt;
&lt;p&gt;Because asm.js programs can be gigantic in size (in number of functions as well
as in number of lines of code), the first compilation of the entire program is
going to take some time. Afterwards, Firefox uses a caching mechanism that
prevents the need for recompilation and almost instantaneously loads the code, so
subsequent loads matter less&lt;strong&gt;*&lt;&#x2F;strong&gt;. The end user will mostly wait for the
first compilation, thus this one needs to be fast.&lt;&#x2F;p&gt;
&lt;p&gt;Before the work explained below, the pipeline for compiling a single function
(out of an asm.js module) would look like this:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;parse the function, and as we parse, emit intermediate representation (IR)
nodes for the compiler infrastructure. SpiderMonkey has several IRs,
including the MIR (middle-level IR, mostly carrying semantic information) and the LIR
(low-level IR, closer to the CPU memory representation: registers, stack,
etc.). The one generated here is the MIR. All of this happens on the main
thread.&lt;&#x2F;li&gt;
&lt;li&gt;once the entire IR graph is generated for the function, optimize the MIR
graph (i.e. apply a few optimization passes). Then, generate the LIR graph
before carrying out register allocation (probably the most costly task of the
pipeline). This can be done on supplementary helper threads, as the MIR
optimization and LIR generation for a given function don’t depend on other
functions.&lt;&#x2F;li&gt;
&lt;li&gt;since functions within an asm.js module can call each other, they
need references to each other. In assembly, a reference is merely an offset
to somewhere else in memory. In this initial implementation, code generation
is carried out on the main thread, at the cost of speed but for the sake of
simplicity.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;So far, only the MIR optimization passes, register allocation and LIR
generation were done in parallel. Wouldn’t it be nice to be able to do more?&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;*&lt;&#x2F;strong&gt; There are conditions for benefitting from the caching mechanism. In
particular, the script should be loaded
&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;developer.mozilla.org&#x2F;en-US&#x2F;docs&#x2F;Games&#x2F;Techniques&#x2F;Async_scripts&quot;&gt;asynchronously&lt;&#x2F;a&gt;
and it should be large enough.&lt;&#x2F;p&gt;
&lt;h1 id=&quot;doing-more-in-parallel&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#doing-more-in-parallel&quot; aria-label=&quot;Anchor link for: doing-more-in-parallel&quot;&gt;🔗&lt;&#x2F;a&gt;Doing more in parallel&lt;&#x2F;h1&gt;
&lt;p&gt;Our goal is to do more work in parallel: can we take MIR generation
off the main thread? And can we take code generation off it as well?&lt;&#x2F;p&gt;
&lt;p&gt;The answer happens to be &lt;em&gt;yes&lt;&#x2F;em&gt; to both questions.&lt;&#x2F;p&gt;
&lt;p&gt;For the former, instead of emitting a MIR graph as we parse the function’s
body, we emit a small, compact, pre-order representation of the function’s
body. In short, a new IR. As work was starting on
&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;webassembly&#x2F;design&quot;&gt;WebAssembly&lt;&#x2F;a&gt; (wasm) at this time, and
since asm.js semantics and wasm semantics mostly match, the IR could just be
the wasm
&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;WebAssembly&#x2F;design&#x2F;blob&#x2F;master&#x2F;BinaryEncoding.md&quot;&gt;encoding&lt;&#x2F;a&gt;,
consisting of the wasm opcodes plus a few specific asm.js ones&lt;strong&gt;*&lt;&#x2F;strong&gt;. Then, wasm
is translated to MIR in another thread.&lt;&#x2F;p&gt;
&lt;p&gt;Instead of parsing and generating MIR in a single pass, we now parse
and generate wasm IR in one pass, and generate the MIR out of the wasm IR in
another pass. The wasm IR is very compact and much cheaper to generate than a
full MIR graph, because generating a MIR graph needs some algorithmic work,
including the creation of Phi nodes (which join values after any form of branching).
As a result, it is expected that compilation time won’t suffer. This was a
large refactoring: taking every single asm.js instruction, encoding it
in a compact way, and later decoding it into the equivalent MIR nodes.&lt;&#x2F;p&gt;
&lt;p&gt;For the second part, could we generate code on other threads? One structure in
the code base, the &lt;em&gt;MacroAssembler&lt;&#x2F;em&gt;, is used to generate all the code and it
contains all necessary metadata about offsets. By adding more metadata there to
abstract internal calls &lt;strong&gt;**&lt;&#x2F;strong&gt;, we can describe the new scheme in terms of a
classic functional &lt;code&gt;map&lt;&#x2F;code&gt;&#x2F;&lt;code&gt;reduce&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;the wasm IR is sent to a thread, which will return a MacroAssembler. That
is a &lt;code&gt;map&lt;&#x2F;code&gt; operation, transforming an array of wasm IR into an array of
MacroAssemblers.&lt;&#x2F;li&gt;
&lt;li&gt;When a thread is done compiling, we merge its MacroAssembler into one big
MacroAssembler. Most of the merge consists in taking all the offset metadata
in the thread MacroAssembler, fixing up all the offsets, and concatenating the
two generated code buffers. This is equivalent to a &lt;code&gt;reduce&lt;&#x2F;code&gt; operation,
merging each MacroAssembler into the module’s one.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;At the end of the compilation of the entire module, there is still some light
work to be done: offsets of internal calls need to be translated to their
actual locations. All this work has been done in &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;bugzilla.mozilla.org&#x2F;show_bug.cgi?id=1181612&quot;&gt;this bugzilla
bug&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;*&lt;&#x2F;strong&gt; In fact, at the time when this was being done, we used a different
superset of wasm. Since then, work has been done so that our asm.js frontend is
really just another wasm emitter.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;**&lt;&#x2F;strong&gt; That is, referencing functions by their index in order of appearance in the module,
rather than by an offset to the actual start of the function. This order is indeed
stable from one function to the other.&lt;&#x2F;p&gt;
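&lt;p&gt;The &lt;code&gt;map&lt;&#x2F;code&gt;&#x2F;&lt;code&gt;reduce&lt;&#x2F;code&gt; scheme described above can be sketched in a few lines of JavaScript. This is a toy model and not the actual SpiderMonkey code, which is written in C++: the names &lt;code&gt;compileFunction&lt;&#x2F;code&gt;, &lt;code&gt;mergeInto&lt;&#x2F;code&gt; and &lt;code&gt;patchCalls&lt;&#x2F;code&gt;, as well as the shape of the objects standing in for a MacroAssembler, are assumptions made for the example.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;javascript&quot;&gt;// Hypothetical stand-in for a per-thread MacroAssembler: a code buffer plus
// metadata about the calls it contains.
function compileFunction(func) {
    // `map` step: each function is compiled independently of the others.
    // Calls are recorded by callee index, since the final location of the
    // callee is not known yet.
    return {
        index: func.index,
        code: func.body.slice(),
        calls: func.calls.slice(),
    };
}

function mergeInto(moduleMasm, funcMasm) {
    // `reduce` step: append the code buffer and fix up the recorded offsets.
    const base = moduleMasm.code.length;
    moduleMasm.funcStarts[funcMasm.index] = base;
    for (const call of funcMasm.calls) {
        moduleMasm.calls.push({ offset: base + call.offset, calleeIndex: call.calleeIndex });
    }
    moduleMasm.code = moduleMasm.code.concat(funcMasm.code);
    return moduleMasm;
}

function patchCalls(moduleMasm) {
    // Final light pass: translate callee indices into actual locations.
    for (const call of moduleMasm.calls) {
        moduleMasm.code[call.offset] = moduleMasm.funcStarts[call.calleeIndex];
    }
    return moduleMasm;
}

// Function 0 calls function 1; the call target is left empty for now.
const functions = [
    { index: 0, body: ['call', null, 'ret'], calls: [{ offset: 1, calleeIndex: 1 }] },
    { index: 1, body: ['nop', 'ret'], calls: [] },
];

const compiled = functions.map(compileFunction);
// Threads may finish in any order: here function 1 is merged first.
const merged = compiled.reverse().reduce(mergeInto, { code: [], calls: [], funcStarts: [] });
const moduleCode = patchCalls(merged).code;
console.log(moduleCode); // [ 'nop', 'ret', 'call', 0, 'ret' ]
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Because calls refer to a function index rather than to an address, the merge order does not matter: the call in function 0 ends up pointing at offset 0, where function 1 happened to land.&lt;&#x2F;p&gt;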
&lt;h1 id=&quot;results&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#results&quot; aria-label=&quot;Anchor link for: results&quot;&gt;🔗&lt;&#x2F;a&gt;Results&lt;&#x2F;h1&gt;
&lt;p&gt;Benchmarking has been done on a Linux x64 machine with 8 cores clocked at 4.2
GHz.&lt;&#x2F;p&gt;
&lt;p&gt;First, compilation times of a few massive asm.js games:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;http:&#x2F;&#x2F;beta.unity3d.com&#x2F;jonas&#x2F;DT2&#x2F;&quot;&gt;DeadTrigger2&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;http:&#x2F;&#x2F;beta.unity3d.com&#x2F;jonas&#x2F;AngryBots&#x2F;&quot;&gt;AngryBots&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;lukewagner&#x2F;PlatformerGamePacked&quot;&gt;Platformer game&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.unrealengine.com&#x2F;html5&quot;&gt;Tappy Chicken&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;The X scale is the compilation time in seconds, so lower is better. Each value
point is the best one of three runs. For the new scheme, the corresponding
relative speedup (in percentage) has been added:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;bouvier.cc&#x2F;tech&#x2F;making-asmjs-webassembly-compilation-more-parallel&#x2F;2016-04-22_parallelization-times.png&quot; alt=&quot;Compilation times of various benchmarks&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;For all games, compilation is much faster with the new parallelization scheme.&lt;&#x2F;p&gt;
&lt;p&gt;Now, let’s go a bit deeper. The Linux CLI tool &lt;code&gt;perf&lt;&#x2F;code&gt; has a &lt;code&gt;stat&lt;&#x2F;code&gt; command
that gives you the average number of CPUs utilized during the program
execution. This is a great measure of threading efficiency: the more a CPU is
utilized, the less time it spends idle, waiting for other results to come, and thus
the more useful it is. For a constant task execution time, the more CPUs are utilized, the more
likely the program is to execute quickly.&lt;&#x2F;p&gt;
&lt;p&gt;The X scale is the number of utilized CPUs, according to the &lt;code&gt;perf stat&lt;&#x2F;code&gt;
command, so higher is better. Again, each value point is the best one of three
runs.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;bouvier.cc&#x2F;tech&#x2F;making-asmjs-webassembly-compilation-more-parallel&#x2F;2016-04-22_parallelization-cpu-utilized.png&quot; alt=&quot;CPU utilized on DeadTrigger2&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;With the older scheme, the number of utilized CPUs quickly rises from 1 to 4
cores, then more slowly from 5 cores and beyond. Intuitively, this means that
with 8 cores, we almost reached the theoretical limit of the portion of the
program that can be made parallel (not considering the overhead introduced by
parallelization or altering the scheme).&lt;&#x2F;p&gt;
&lt;p&gt;But with the newer scheme, we get much more CPU usage even after 6 cores! Then
it slows down a bit, although it is still more significant than the slow rise
of the older scheme. So it is likely that with even more threads, we could have
even better speedups than the one mentioned beforehand. In fact, we have moved
the theoretical limit mentioned above a bit further: we have expanded the
portion of the program that can be made parallel. Or to keep on using the
initial car&#x2F;road metaphor, we’ve shortened the constant speed portion of the
road to the benefit of the unlimited speed portion of the road, resulting in a
shorter trip overall.&lt;&#x2F;p&gt;
&lt;h1 id=&quot;future-steps&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#future-steps&quot; aria-label=&quot;Anchor link for: future-steps&quot;&gt;🔗&lt;&#x2F;a&gt;Future steps&lt;&#x2F;h1&gt;
&lt;p&gt;Despite these improvements, compilation time can still be a pain, especially on
mobile. This is mostly due to the fact that we’re running a whole multi-million
line codebase through the backend of a compiler to generate optimized code.
Following this work, the next bottleneck during the compilation process is
parsing, which matters for asm.js in particular, whose source is plain text.
Decoding WebAssembly is an order of magnitude faster though, and it can be made
even faster. Moreover, we have even more load-time optimizations coming down
the pipeline!&lt;&#x2F;p&gt;
&lt;p&gt;In the meantime, we keep on improving the WebAssembly backend. Keep track of
our progress on &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;bugzilla.mozilla.org&#x2F;show_bug.cgi?id=1188259&quot;&gt;bug
1188259&lt;&#x2F;a&gt;!&lt;&#x2F;p&gt;
</content>
    </entry>
    <entry xml:lang="en">
        <title>Previous writings about Mozilla work</title>
        <published>2016-03-09T18:00:42+00:00</published>
        <updated>2016-03-09T18:00:42+00:00</updated>
        
        <author>
          <name>
            
              Benjamin Bouvier
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://bouvier.cc/tech/previous-writing-about-mozilla-work/"/>
        <id>https://bouvier.cc/tech/previous-writing-about-mozilla-work/</id>
        <content type="html" xml:base="https://bouvier.cc/tech/previous-writing-about-mozilla-work/">&lt;p&gt;I am currently a compiler engineer at Mozilla Corporation, the company making
the Firefox browser, among other things. Our JavaScript virtual machine, SpiderMonkey,
is split into several tiers, including a highly optimizing Just-In-Time (JIT)
compiler able to compile JavaScript to assembly at runtime. My previous work
has involved efficiently compiling Float32 arithmetic to hardware instructions
and implementing a new SIMD API for the Web.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;about-float32-optimizations&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#about-float32-optimizations&quot; aria-label=&quot;Anchor link for: about-float32-optimizations&quot;&gt;🔗&lt;&#x2F;a&gt;About Float32 optimizations&lt;&#x2F;h2&gt;
&lt;p&gt;The full blog post is
&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;blog.mozilla.org&#x2F;javascript&#x2F;2013&#x2F;11&#x2F;07&#x2F;efficient-float32-arithmetic-in-javascript&#x2F;&quot;&gt;there&lt;&#x2F;a&gt;.
It was written in November 2013.&lt;&#x2F;p&gt;
&lt;p&gt;The main idea is that if you have float32 inputs to an operation; and you cast
them to doubles; and you apply an arithmetic operation to these inputs; and you
cast the result back to a float32, then you’d have the same result as if you
did the entire computation with float32 values and operations.&lt;&#x2F;p&gt;
&lt;p&gt;So we’ve introduced an operation in JavaScript that converts a Number to its
closest float32 IEEE754 representation: &lt;code&gt;Math.fround&lt;&#x2F;code&gt;. Said differently, the
above equivalence says that:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;javascript&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;function&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt; f&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #E0AF68;&quot;&gt;x&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #E0AF68;&quot;&gt; y&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;) {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;font-style: italic;&quot;&gt;    return&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; x&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; +&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; y&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt;function&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt; g&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #E0AF68;&quot;&gt;x&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #E0AF68;&quot;&gt; y&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;) {&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #9D7CD8;font-style: italic;&quot;&gt;    var&lt;&#x2F;span&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt; xf&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; =&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; Math&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;fround&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;x&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;)&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #9D7CD8;font-style: italic;&quot;&gt;    var&lt;&#x2F;span&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt; yf&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; =&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; Math&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;fround&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;y&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;)&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #BB9AF7;font-style: italic;&quot;&gt;    return&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; Math&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;fround&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;xf&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt; +&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; yf&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;)&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #51597D;font-style: italic;&quot;&gt;&#x2F;&#x2F; For all x, y that can be represented exactly as float32:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;assert&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt;f&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;x&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; y&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;)&lt;&#x2F;span&gt;&lt;span style=&quot;color: #BB9AF7;&quot;&gt; ===&lt;&#x2F;span&gt;&lt;span style=&quot;color: #7AA2F7;&quot;&gt; g&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt;x&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;,&lt;&#x2F;span&gt;&lt;span style=&quot;color: #C0CAF5;&quot;&gt; y&lt;&#x2F;span&gt;&lt;span style=&quot;color: #9ABDF5;&quot;&gt;))&lt;&#x2F;span&gt;&lt;span style=&quot;color: #89DDFF;&quot;&gt;;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Yes, &lt;code&gt;===&lt;&#x2F;code&gt;. The same &lt;code&gt;===&lt;&#x2F;code&gt; you’ve been told &lt;strong&gt;not&lt;&#x2F;strong&gt; to use for floating-point
Numbers. But here, we have &lt;em&gt;bitwise&lt;&#x2F;em&gt; equality, so we can use strict equality*.&lt;&#x2F;p&gt;
&lt;p&gt;Processors have special instructions for carrying out float32 arithmetic, which
have higher throughput than the equivalent double ones. With this result in
mind, we could add a pass that would spot opportunities where the computations
are equivalent (thanks to &lt;code&gt;Math.fround&lt;&#x2F;code&gt; hints) and emit float32 instructions
instead of double instructions. This sped up some numerical applications and
game engines by a few percent.&lt;&#x2F;p&gt;
&lt;p&gt;* A careful reader would object that this is wrong for &lt;code&gt;x = y = NaN&lt;&#x2F;code&gt;, which
I’ve set aside for the sake of simplicity.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;about-simd-js&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#about-simd-js&quot; aria-label=&quot;Anchor link for: about-simd-js&quot;&gt;🔗&lt;&#x2F;a&gt;About SIMD.js&lt;&#x2F;h2&gt;
&lt;p&gt;The full blog post is
&lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;blog.mozilla.org&#x2F;javascript&#x2F;2015&#x2F;03&#x2F;10&#x2F;state-of-simd-js-performance-in-firefox&#x2F;&quot;&gt;there&lt;&#x2F;a&gt;.
It was written in March 2015.&lt;&#x2F;p&gt;
&lt;p&gt;Nowadays, processors have instruction sets that allow them to execute several
simple arithmetic operations at once. For instance, let’s say you have two
arrays of integers and you want to add each element to the corresponding one in
the other array. If both arrays have size &lt;code&gt;N&lt;&#x2F;code&gt;, this means you’ll have to carry
out &lt;code&gt;N&lt;&#x2F;code&gt; scalar additions. But processors can actually group these into bundles
of several additions, with SIMD; for the case of 32-bit wide integers, on most
modern processors, you need at most &lt;code&gt;Math.ceil(N &#x2F; 4)&lt;&#x2F;code&gt; instructions. The blog
post details what SIMD.js is and what bottlenecks we hit during implementation.&lt;&#x2F;p&gt;
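&lt;p&gt;As a rough illustration of the instruction count, the following sketch simulates 4-lane additions in plain JavaScript. It does not use the SIMD.js API itself: &lt;code&gt;addByBundles&lt;&#x2F;code&gt; is a hypothetical helper written for this example, and it assumes both arrays have the same length and that four 32-bit integers fit in one vector register.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #A9B1D6; background-color: #1A1B26;&quot;&gt;&lt;code data-lang=&quot;javascript&quot;&gt;// Adds two Int32Arrays four lanes at a time and reports how many
// (simulated) vector additions were needed.
function addByBundles(a, b) {
    const LANES = 4; // four 32-bit integers per 128-bit register
    const result = new Int32Array(a.length);
    const bundles = Math.ceil(a.length / LANES);
    for (let bundle = 0; bundle !== bundles; bundle++) {
        const start = bundle * LANES;
        const end = Math.min(start + LANES, a.length);
        // On SIMD hardware, this inner loop would be one vector addition.
        for (let i = start; i !== end; i++) {
            result[i] = a[i] + b[i];
        }
    }
    return { result: result, vectorAdds: bundles };
}

const a = Int32Array.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
const b = Int32Array.of(10, 20, 30, 40, 50, 60, 70, 80, 90, 100);
const out = addByBundles(a, b);
console.log(out.vectorAdds); // 3, instead of 10 scalar additions
console.log(Array.from(out.result));
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;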
&lt;h2 id=&quot;conclusion&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#conclusion&quot; aria-label=&quot;Anchor link for: conclusion&quot;&gt;🔗&lt;&#x2F;a&gt;Conclusion&lt;&#x2F;h2&gt;
&lt;p&gt;This was a small reminder about previously written blog posts. If you’re into
JavaScript, compilers or low-level optimization, I can only recommend that you go
read &lt;a rel=&quot;noopener noreferrer external&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;blog.mozilla.org&#x2F;javascript&#x2F;&quot;&gt;Mozilla’s JavaScript blog&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
</content>
    </entry>
</feed>

