the avatar image of Benjamin Bouvier

cargo-machete, find unused dependencies quickly

cargo-machete is a new Cargo tool that detects unused dependencies in Rust projects, in a fast (yet imprecise) way. As of today you can install it with cargo install cargo-machete and then run it with cargo machete from any folder that contains a workspace or crate, to find if you have potentially unused dependencies. Beware, it can report a few false positives!

Problem statement

When developers hack on code, it’s pretty common to reuse software that already exists and has been written, optimized, and battle-tested by many others. In fact, that’s a core idea of the open-source movement, and one historical reason for its existence.

Zooming in on the Rust programming language, my opinion is that this is also a key reason why Rust has been so successful: plenty of crates doing everything you might need are already implemented for you and within hand’s reach on crates.io. Plus, there’s the wonder of a one-does-it-all Cargo tool that makes it very easy to use those crates as dependencies in your project. [1]

However, this comes with a price: sometimes you add a dependency because it’s useful at a particular point in time. Much later, it’s not useful anymore, but you may have forgotten about it. And then, the dependency remains as a zombie in your Cargo.toml file. Cargo will include it in the compilation graph, despite the compilation artifacts not being used at all. The unused dependency will just stay there, silently weeping, waiting for you to recall it exists.

Of course, the problem can even become worse: maybe you maintain several crates that have unused dependencies. Or maybe you work with many crates as part of a workspace, and each may have unused dependencies. Or simply you use many dependencies yourself, and some may include unused dependencies. If you’ve published your crates and others use those, then everyone could also compile unused dependencies. At the scale of the entire Rust crates ecosystem, it can have a huge impact on the compile times, produced heat and wasted energy.

Have you heard about our lord and savior, cargo-udeps

There’s already a nice tool for this in the ecosystem: cargo-udeps. It will compile your crate (or workspace) and then infer from the compiled artifacts what dependencies are used by your project, and thus show you which dependencies are unused.

That’s great, but the way it works forces a few tradeoffs:

Let’s dive a bit deeper into the last item, which I’ll refer to as the transitively-used dependencies problem. Say you have your project AAA that contains a dependency to serde in its Cargo.toml file, while it’s not directly used by your code. In fact, if you did a text-search of serde in AAA’s code with grep, you wouldn’t find a single match[2]. But now AAA is using another crate, AB, that itself depends on serde. cargo-udeps will see that serde is used overall, so it cannot let you know that AAA’s Cargo.toml file references an unused dependency to serde.

Graph of crates containing one unused crate

How is this a problem? After all, if the workspace uses serde even indirectly, then we will have to compile it at some point, so it’s not like it’s really unused.

First of all, the AAA crate might be using a different version of serde than the AB crate, and this could result in different copies of the same crate in your workspace. Note there are other nice tools that automatically detect this kind of situation (hi there cargo-deny).

Second, the order in which crates are compiled has an impact on compilation parallelism, and having unused dependencies may add spurious synchronization points in the compilation graph. When a Rust crate gets compiled by Cargo, Cargo proceeds in two phases:

The advantage of this two-phase scheme is that once Cargo is done with phase 1 for a particular crate, it can kick off the same process for other crates up the dependency tree, while it runs phase 2 concurrently. With a multi-core machine, as is the norm on desktop computers, it’s almost certain that this will bring speedups!

For instance, consider the following Cargo.toml file from our previous example project:

[dependencies]
serde = "1.0"

Then a possible compilation graph could look like this:

Compilation graph showing phases

In this case, ab phase 1 can start as soon as serde phase 1 has finished, while serde’s compilation phase 2 happens in the background.

If you’re interested in reducing the overall compile times of your Rust project, I strongly suggest reading Rust’s documentation on timings visualization. Crates which spend lots of time in the first phase (or more generally, in both phases) are basically pipelining bottlenecks, so identifying, removing, or working around them speeds up compile times overall.

Back to our small unused dependency problem: an unused dependency in your Cargo.toml may block the compilation of other crates up the dependency tree, and thus may slow down the whole compilation process by creating useless synchronization points.

Consider a crate C that depends on crates A and B, with B actually unused:

Pipeline stall

Here, the compilation of crate C could start way earlier, but it’s blocked waiting for the compilation of B to finish first, even though B isn’t used at all!
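To put rough numbers on this stall, here is a minimal sketch of the A/B/C example. It is a toy model, not how Cargo schedules work internally: it assumes unlimited cores, the durations are made up, and the names CrateNode, earliest_start and example_graph are hypothetical. The only rule it encodes is the one described above: a crate can start once phase 1 of every dependency it declares has finished.

```rust
use std::collections::HashMap;

/// A crate in a toy build graph. Durations are made-up numbers, in seconds.
struct CrateNode {
    /// Time spent in phase 1, after which dependents are allowed to start.
    phase1_secs: u32,
    /// Dependencies declared in Cargo.toml, whether the code uses them or not.
    deps: Vec<&'static str>,
}

/// Earliest time at which `name` can start compiling, assuming unlimited cores:
/// every declared dependency must have finished its own phase 1 first.
fn earliest_start(graph: &HashMap<&'static str, CrateNode>, name: &str) -> u32 {
    graph[name]
        .deps
        .iter()
        .map(|&dep| earliest_start(graph, dep) + graph[dep].phase1_secs)
        .max()
        .unwrap_or(0)
}

/// Builds the A/B/C example; `declare_b` toggles C's unused dependency on B.
fn example_graph(declare_b: bool) -> HashMap<&'static str, CrateNode> {
    let mut graph = HashMap::new();
    graph.insert("a", CrateNode { phase1_secs: 2, deps: vec![] });
    graph.insert("b", CrateNode { phase1_secs: 10, deps: vec![] });
    let c_deps = if declare_b { vec!["a", "b"] } else { vec!["a"] };
    graph.insert("c", CrateNode { phase1_secs: 3, deps: c_deps });
    graph
}
```

With these invented durations, earliest_start(&example_graph(true), "c") is 10 seconds, while dropping the unused B from C's dependencies brings it down to 2 seconds, the time A needs for its phase 1.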

Solving this, the naive way

So when I was trying to confirm whether crates found by cargo-udeps were actually used or not in my Rust projects, the thing I’d do would be to grep (or better, use the blazingly fast Rust replacement ripgrep) for the crate’s name in the project. After all, the crate’s name is in the source directory if and only if the crate is used, right?

The answer is… mostly, yes. If we exclude dynamic code loading via mechanisms like dlopen or WebAssembly, then there aren’t so many ways to use other crates directly in Rust code. In fact, we can exhaustively enumerate all the syntactic forms for using a dependency in Rust:

use my_crate;
use your_crate as my_crate;
use { your_crate as my_crate };
extern crate my_crate;

fn main() {
    my_crate::something();
}

I’ve looked at a bit of Rust code now, and I haven’t seen other direct forms; if I am missing any, please let me know! Now, these are the most frequent ways to use a dependency, but there are in fact other ways:

And then, there’s still a bit of room for some false positives:

But that would do most of the job, wouldn’t it? In particular, compared to cargo-udeps, this approach doesn’t suffer from the transitively-used dependencies problem. If you look for a crate’s name in the src/ directory and it’s not there, it’s likely not used by your crate. The End.
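As an illustration of the idea, here is a minimal sketch of that grep-like check using only the standard library. It is not the regular expression cargo-machete actually uses, and seems_used and word_ends are hypothetical names. It assumes each import sits on a single line, and it is happily fooled by comments and string literals.

```rust
/// Byte offsets just past each whole-word occurrence of `ident` in `line`.
fn word_ends<'a>(line: &'a str, ident: &'a str) -> impl Iterator<Item = usize> + 'a {
    let is_word = |c: char| c.is_alphanumeric() || c == '_';
    line.match_indices(ident).filter_map(move |(start, _)| {
        let end = start + ident.len();
        // Reject matches glued to a longer identifier, e.g. `log` in `catalog`.
        let glued_before = line[..start].chars().next_back().map_or(false, is_word);
        let glued_after = line[end..].chars().next().map_or(false, is_word);
        (!glued_before && !glued_after).then_some(end)
    })
}

/// Returns true if `source` seems to reference `crate_name` through one of the
/// direct forms: an import line mentioning it, or a `my_crate::` path.
pub fn seems_used(source: &str, crate_name: &str) -> bool {
    // Package names may contain hyphens; Rust code refers to them with underscores.
    let ident = crate_name.replace('-', "_");
    source.lines().any(|line| {
        let line = line.trim_start();
        let is_import = ["use ", "pub use ", "extern crate "]
            .iter()
            .any(|keyword| line.starts_with(*keyword));
        word_ends(line, &ident).any(|end| is_import || line[end..].starts_with("::"))
    })
}
```

For example, seems_used("use { serde_json as json };", "serde-json") returns true, while seems_used("use serde_json;", "serde") returns false, because serde only appears there as part of a longer identifier.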

A tedious process calls for automation, so I made a tool

And I’ve called it cargo-machete. Like a machete, it is very useful for quickly weeding out things, but it is very imprecise and you wouldn’t trust it 100%.

The gist of it is:

This tool is fast, because it combines the core library behind ripgrep for matching regular expressions with rayon for running it in parallel across all the dependencies of a project. On my machine, the problem is CPU-bound, because of the execution of the regular expression (and maybe thanks to my NVMe storage too). That’s only one data point, but on this particular beefy desktop I use, it scans the entirety of the rust-lang/rust repository in 1.08 seconds, or all of BytecodeAlliance/wasmtime in 0.58 seconds.

The tool is open source, of course.

As is the tradition for Cargo tools, it can be installed with:

cargo install cargo-machete

and then can be used, from any directory that contains Rust code (be it a workspace, a single project, or a directory on top of many Rust projects), with the following line:

cargo machete

Here’s an output example:

> cargo machete
Looking for crates in this directory and analyzing their dependencies...
/home/ben/code/cargo-machete/integration-tests/with-bench/Cargo.toml -- no package, must be a workspace
just-unused -- /home/ben/code/cargo-machete/integration-tests/just-unused/Cargo.toml:
	log
unused-transitive -- /home/ben/code/cargo-machete/integration-tests/unused-transitive/Cargo.toml:
	lib1
Done!

There are false positives: code generated via macros or build scripts isn’t inspected, as it’s not in the src/ directory and cargo-machete doesn’t run any compile step. For instance, if a project depends on log, but uses it only through log_once, then cargo-machete will incorrectly flag log as an unused dependency.

The good news is that, thanks to a contribution from @daniel5151, you can specify known false positives in the Cargo.toml file of your crate, allowing use of cargo-machete in CI setups:

[package.metadata.cargo-machete]
ignored = ["log"] # false positive, used by log_once! macro

As far as I know, the risk for false negatives (i.e. crates that are unused, but the tool thinks they’re used) is pretty low. One such instance would be a multi-line string containing one of the use forms, but that seems rather unlikely to be present in most Rust projects.

The tool is still a bit rough, but it’s already been quite useful for some projects I’ve been working on! In a particular work project, most unused dependencies were transitively used and compiled, but the rejiggering of the compilation graph led to a 5% compile time speedup overall. Good impact over effort ratio.

What about other languages?

What makes this possible in Rust, and could it be extended to other languages?

Dynamic languages by nature dynamically load code, but there are still ways to try to automate detecting unused dependencies the same way cargo-machete does. Consider JavaScript and its require function, which can dynamically evaluate a string that’s a path to a file with code we want to import. Since there are infinitely many ways to create a string, we can’t just rely on finding require("abc") and assume that if it’s not present, then abc isn’t used. Ditto with import statements, which can evaluate dynamic sources. That being said, if JS code is restricted to use require statements with only static strings or static import statements, then this may work too! Although when restricting to static requires, even just loading the code in Node.js would be sufficient to find unused dependencies with perfect accuracy.
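To make the static versus dynamic distinction concrete, here is a minimal sketch, written in Rust, that scans JavaScript source text for require calls. It is a plain text scan with no real JavaScript parsing, RequireScan and scan_requires are hypothetical names, and it ignores import statements, template literals, and comments. Any call whose argument isn’t a lone string literal is counted as dynamic, which means the list of static specifiers can’t be trusted to be complete.

```rust
/// Result of scanning JavaScript source text for `require(...)` calls.
#[derive(Debug, PartialEq)]
struct RequireScan<'a> {
    /// Module specifiers passed as a lone string literal, e.g. `require("abc")`.
    static_specifiers: Vec<&'a str>,
    /// Calls whose argument is anything else; if non-zero, all bets are off.
    dynamic_calls: usize,
}

fn scan_requires(js: &str) -> RequireScan<'_> {
    let mut scan = RequireScan { static_specifiers: Vec::new(), dynamic_calls: 0 };
    for (start, matched) in js.match_indices("require(") {
        let rest = &js[start + matched.len()..];
        let literal = rest
            .chars()
            .next()
            .filter(|&c| c == '"' || c == '\'')
            .and_then(|quote| {
                // Both quote characters are one byte long, so slicing at 1 is safe.
                let body = &rest[1..];
                let end = body.find(quote)?;
                // Only a literal directly followed by `)` counts as static;
                // something like `"x" + y` is treated as dynamic.
                body[end + 1..].trim_start().starts_with(')').then_some(&body[..end])
            });
        match literal {
            Some(specifier) => scan.static_specifiers.push(specifier),
            None => scan.dynamic_calls += 1,
        }
    }
    scan
}
```

A dependency could then be reported as potentially unused only when dynamic_calls is zero and its name is absent from static_specifiers.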

Back to static languages, where I constrain the problem by excluding dynamic dependencies (those loaded via dlopen, etc.). In a language like C or C++, there is no unified module system or package description (yet! although cmake might be a de-facto standard). We can still apply this to header files, and look for their inclusion via #include statements. Macros and preprocessed code would also throw a wrench in the process. Then some human intervention would still be required to eliminate the .c files, but I haven’t thought about it too much.

Static analysis of compiled binaries might be simpler, for that matter. If we consider the problem for WebAssembly, we can frame it as “which imported functions are not used in the module”, potentially eliminating an entire range of host functions. In the simplest case, we could just look at the code section, through the function bodies, and see if there’s any reference to the index of each imported function in call opcodes. Then, there can be function tables referencing those, so we have to make sure no table elements reference the function. And if any table is mutable and publicly exposed via an export, then a user of the wasm module may reference any function declared in the wasm module, including imported functions, so all bets are off. Note dead-code elimination in wasm would be pretty similar and suffer from the same limitations: after all, a function dependency is just another kind of function, in wasm! Each format may have idiosyncrasies like that. Static analysis of final binaries (as opposed to libraries) might be possible and reliable, though.
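Here is a minimal sketch of that wasm analysis over an already-decoded module. ModuleSummary is a hypothetical, heavily simplified data model, not the API of any real wasm parsing crate. It relies on the fact that imported functions come first in the function index space, and it only looks at the three cases discussed above; other ways to reference a function (exports, the start function, ref.func) are deliberately left out.

```rust
use std::collections::BTreeSet;

/// Hypothetical, heavily simplified summary of a decoded WebAssembly module.
struct ModuleSummary {
    /// Number of imported functions; they occupy indices 0..imported_funcs.
    imported_funcs: u32,
    /// Function indices targeted by `call` opcodes in the code section.
    call_targets: Vec<u32>,
    /// Function indices placed into tables by element segments.
    table_elements: Vec<u32>,
    /// True if a table is exported, so the embedder can reach its entries.
    exports_table: bool,
}

/// Indices of imported functions that nothing refers to, or `None` when an
/// exported table makes the answer unknowable ("all bets are off").
fn unused_imports(module: &ModuleSummary) -> Option<BTreeSet<u32>> {
    if module.exports_table {
        return None;
    }
    let mut unused: BTreeSet<u32> = (0..module.imported_funcs).collect();
    for index in module.call_targets.iter().chain(&module.table_elements) {
        // Indices at or above `imported_funcs` are locally defined functions;
        // removing them from the set is simply a no-op.
        unused.remove(index);
    }
    Some(unused)
}
```

For a module with three imports, direct calls to functions 0 and 5, and a table element pointing at function 2, this reports import 1 as the only unused one.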

Closing thoughts

For the sake of completeness, I should mention the existence of a rustc crate-wide lint for this, since May 2020 or so: #![warn(unused_crate_dependencies)]. This reports unused crate dependencies directly as a Rust warning, which in my opinion would be the ideal end goal! Unfortunately, some GitHub comments suggest it suffers from having too many false positives, and it still requires compiling the code.

@est31, of cargo-udeps fame, has been working on a better solution. It seems to be not so far from completion, so between this and the Rust lint, I’m hopeful that there could be a time when we have a solution that is perfectly precise, with neither false positives nor false negatives.

In the meantime, I hope that cargo-machete can be useful to some of you, or that it inspires others to make similar quick-and-dirty tools, in Rust or in other languages. Thanks for reading this far, and please get in touch if you have any thoughts about this!


  1. If you don’t know about the cargo-edit tool that allows you to add a dependency in one line with cargo add serde to your project: now you do.

  2. Wait for it…