cargo-machete, find unused dependencies quickly
cargo-machete is a new Cargo tool that detects unused dependencies in Rust projects, in a fast (yet imprecise) way. As of today you can install it with cargo install cargo-machete and then run it with cargo machete from any folder that contains a workspace or crate, to find out if you have potentially unused dependencies. Beware, it can report a few false positives!
Problem statement
When developers hack on code, it’s pretty common to reuse software that already exists and has been written, optimized, and battle-tested by many others. In fact, that’s a core idea of the open-source movement, and one historical reason for its existence.
Zooming in on the Rust programming language, my opinion is that this is also a key reason why Rust has been so successful: plenty of crates doing everything you might need are already implemented for you and within hand’s reach on crates.io. Plus, there is the wonder of a one-does-it-all Cargo tool that makes it very easy to use those crates as dependencies in your project.[1]
However, this comes with a price: sometimes you add a dependency because it’s useful at a particular point in time. Much later, it’s not useful anymore, but you may have forgotten about it. And then, the dependency remains as a zombie in your Cargo.toml file. Cargo will include it in the compilation graph, despite the compilation artifacts not being used at all. The unused dependency will just stay there, silently weeping, waiting for you to recall it exists.
Of course, the problem can even become worse: maybe you maintain several crates that have unused dependencies. Or maybe you work with many crates as part of a workspace, and each may have unused dependencies. Or simply you use many dependencies yourself, and some may include unused dependencies. If you’ve published your crates and others use those, then everyone could also compile unused dependencies. At the scale of the entire Rust crates ecosystem, it can have a huge impact on the compile times, produced heat and wasted energy.
Have you heard about our lord and savior, cargo-udeps
There’s already a nice tool for this in the ecosystem: cargo-udeps. It will compile your crate (or workspace) and then infer from the compiled artifacts which dependencies are used by your project, and thus show you which dependencies are unused.
That’s great, but the way it works forces a few tradeoffs:
- it requires compiling the whole crate with the rustc nightly compiler. For me that means recompiling the whole project from scratch, most of the time, since I’m mostly using stable rustc as my daily driver.
- if you compile for multiple targets (i.e. different combinations of CPU flavor, OS, environment, etc.), you’d need to run cargo-udeps on each of those to find per-target unused dependencies. For instance, if a dependency is only configured when compiling for x86_64 machines, then it may be flagged as unused on every other configuration.
- most of all, since it looks at compilation artifacts, it cannot know if a specific dependency is used directly by your crate or only indirectly, leading to somewhat mystifying results in the case of workspaces.
Let’s dive a bit deeper into the last item, which I’ll refer to as the transitively-used dependencies problem. Say your project AAA declares a dependency on serde in its Cargo.toml file, while serde is not directly used by your code. In fact, if you did a text search for serde in AAA’s code with grep, you wouldn’t find a single match[2]. But now AAA is using another crate, AB, that itself depends on serde. cargo-udeps will see that serde is used overall, so it cannot let you know that AAA’s Cargo.toml file references an unused dependency on serde.
How is this a problem? After all, if the workspace uses serde
even
indirectly, then we will have to compile it at some point, so it’s not like
it’s really unused.
First of all, the AAA
crate might be using a different version of serde
than the AB
crate, and this could result in different copies of the same
crate in your workspace. Note there are other nice tools that automatically
detect this kind of situation (hi there
cargo-deny).
Second, the order in which crates are compiled has an impact on compilation parallelism, and having unused dependencies may add spurious synchronization points in the compilation graph. When a Rust crate gets compiled by Cargo, Cargo proceeds in two phases:
- first, it collects information so as to unlock the compilation of other crates further down the road that may depend on this particular one. I don’t know precisely what it entails, but one can make educated guesses: parse the code, analyze which items are public, compute memory layouts for public types, collect type information and so on and so forth.
- then, it does the actual compilation: optimize and generate the actual machine code for that particular crate, which will later be linked with other artifacts to form the final executable program.
The advantage of this two-phase scheme is that once Cargo is done with phase 1 for a particular crate, it can kick off the same process for other crates up the dependency tree, while it runs phase 2 concurrently. With a multi-core machine, as is the norm on desktop computers, it’s almost certain that this will bring speedups!
For instance, consider the following Cargo.toml
file from our previous
example project:
[]
= "1.0"
Then a possible compilation graph could look like that:
In this case, ab
phase 1 can start as soon as serde
phase 1 has finished,
while serde
’s compilation phase 2 happens in the background.
If you’re interested in reducing the overall compile times of your Rust project, I strongly suggest reading Rust’s documentation on timings visualization. Crates which spend lots of time in the first phase (or more generally, in both phases) are basically pipelining bottlenecks, so identifying, removing, or working around them speeds up compile times overall.
Back to our small unused dependency problem: an unused dependency in your Cargo.toml may block the compilation of other crates up the dependency tree, and thus may slow down the whole compilation process by creating useless checkpoints.
Consider a crate C
that depends on crates A
and B
, with B
actually
unused:
Here, the compilation of the crate C could start way earlier, but it’s blocked, waiting for the compilation of B to finish first, while B is not even used!
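To put rough numbers on this, here is a minimal sketch of a toy scheduling model in Rust. It assumes unlimited cores, made-up durations, and the simplified rule described above: phase 1 of a crate can start once phase 1 of every declared dependency is done (phase 2 is ignored). The Crate struct, the phase1_end and toy_graph functions, and all the numbers are hypothetical; this is not how Cargo schedules work internally.

```rust
use std::collections::HashMap;

/// Hypothetical description of a crate in a toy build graph.
struct Crate {
    /// Time spent in phase 1, in arbitrary units (made-up values).
    phase1: u32,
    /// Names of the direct dependencies declared in Cargo.toml.
    deps: Vec<&'static str>,
}

/// Returns the time at which phase 1 of `name` finishes, assuming unlimited
/// cores and that phase 1 only waits for phase 1 of each declared dependency.
fn phase1_end(graph: &HashMap<&'static str, Crate>, name: &str) -> u32 {
    let krate = &graph[name];
    // A crate can only start once all of its declared dependencies are ready,
    // whether or not its code actually uses them.
    let start = krate
        .deps
        .iter()
        .map(|dep| phase1_end(graph, dep))
        .max()
        .unwrap_or(0);
    start + krate.phase1
}

/// Builds the toy graph for crates A, B and C. `c_declares_b` controls
/// whether C lists the (unused) crate B among its dependencies.
fn toy_graph(c_declares_b: bool) -> HashMap<&'static str, Crate> {
    let mut graph = HashMap::new();
    graph.insert("A", Crate { phase1: 2, deps: vec![] });
    graph.insert("B", Crate { phase1: 10, deps: vec![] });
    let c_deps = if c_declares_b { vec!["A", "B"] } else { vec!["A"] };
    graph.insert("C", Crate { phase1: 3, deps: c_deps });
    graph
}
```

In this toy model, phase1_end(&toy_graph(true), "C") is 13 while phase1_end(&toy_graph(false), "C") is 5: the only difference is the unused entry for B in C's dependency list.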
Solving this, the naive way
So when I was trying to confirm whether crates found by cargo-udeps were actually used or not in my Rust projects, I would grep (or better, use the blazingly fast Rust replacement ripgrep) for the crate’s name in the project. After all, the crate’s name appears in the source directory if and only if the crate is used, right?
The answer is… mostly, yes. If we exclude dynamic code loading via mechanisms
like dlopen
or WebAssembly, then there aren’t so many ways to use other
crates directly, in Rust code. In fact, we can exhaustively enumerate all the
syntax items to use other dependencies in Rust:
use my_crate;
use your_crate as my_crate;
use ;
extern crate my_crate;
I’ve looked at a bit of Rust code now, and I haven’t seen other direct forms; if I am missing any, please let me know! Now, these are the most frequent ways to use a dependency, but there are in fact other ways:
- build.rs scripts can generate code that could use other crates, and that would not be visible through a text search in the src/ directory, as the generated code is somewhere inside the target/build/ directory.
- macros (procedural or not) can expand to code that’s using other crates, while the source code doesn’t explicitly mention them. For instance, the log_once crate uses the log macros in its own macros, but log_once doesn’t depend on log explicitly. It’s a bold and smart move: it breaks the coupling with the specific version of log, and as long as the high-level API of log is stable (which is the case), then log_once works with any version of log.
And then, there’s still a bit of room for some false positives:
- raw text submatches: e.g. if a crate is named bar, then foobar:: would be a match if we’re doing a raw grep search
- text search isn’t syntactic analysis, and we wouldn’t know if a match is in a comment (// use foo;), or a string (String::from("use foo;")).
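Both pitfalls are easy to reproduce. The following is a minimal sketch in Rust, using only the standard library; the naive_is_used and naive_matches helpers are hypothetical and simply stand in for a raw grep.

```rust
/// Hypothetical stand-in for a raw `grep`: the crate counts as used as soon
/// as its name appears anywhere in the source text.
fn naive_is_used(source: &str, crate_name: &str) -> bool {
    source.contains(crate_name)
}

/// Returns the lines that a raw text search would report for `crate_name`,
/// including submatches and matches inside comments or strings.
fn naive_matches<'a>(source: &'a str, crate_name: &str) -> Vec<&'a str> {
    source
        .lines()
        .filter(|line| line.contains(crate_name))
        .collect()
}
```

With the source text use foobar::Baz; the call naive_is_used(source, "bar") returns true although no crate named bar is imported, and a commented-out // use foo; line is enough to make foo look used.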
But that would do most of the job, wouldn’t it? In particular, compared to
cargo-udeps
, this approach doesn’t suffer from the transitively-used
dependencies problem. If you look for a crate’s name in the src/
directory
and it’s not there, it’s likely not used by your crate. The End.
A tedious process calls for automation, so I made a tool
And I’ve called it cargo-machete. Like a machete, it is very useful for quickly weeding out things, but it is very imprecise and you wouldn’t trust it 100%.
The gist of it is:
- find directories that might contain Rust projects, as indicated by the presence of a Cargo.toml file
- for each dependency, create an absolutely ugly regular expression that matches any of the syntactic forms presented above. The regular expression does better than just raw text search, in particular it doesn’t run into the text submatch issue.
- then, for each source file in the project, try to match the regular expression against each line, and stop at the first successful match (which means the dependency is used)
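The matching step can be sketched roughly as follows. This is a simplified illustration and not cargo-machete's actual code: it swaps the regular expression for plain standard-library string checks, covers only some of the forms listed earlier, and all function names (is_ident_char, line_uses_crate, is_crate_used) are hypothetical. It also assumes that a hyphen in a package name maps to an underscore in Rust code.

```rust
/// Returns true if `c` can be part of a Rust identifier (ASCII only, to keep
/// the sketch short).
fn is_ident_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Hypothetical, simplified stand-in for the regular expression. A line
/// counts as a use of the crate when `name` appears as a whole identifier
/// and either the line is a `use` / `extern crate` item or the name is
/// followed by `::`.
fn line_uses_crate(line: &str, name: &str) -> bool {
    let trimmed = line.trim_start();
    let is_import = ["use ", "pub use ", "extern crate "]
        .iter()
        .any(|prefix| trimmed.starts_with(*prefix));
    line.match_indices(name).any(|(start, _)| {
        let rest = &line[start + name.len()..];
        let before = line[..start].chars().next_back();
        // Reject submatches such as `foobar` when looking for `bar`.
        let whole_ident = !before.map_or(false, is_ident_char)
            && !rest.chars().next().map_or(false, is_ident_char);
        whole_ident && (is_import || rest.starts_with("::"))
    })
}

/// Scans source lines and stops at the first line that uses the crate.
fn is_crate_used<'a>(lines: impl IntoIterator<Item = &'a str>, package_name: &str) -> bool {
    // Assumption: `-` in a Cargo package name becomes `_` in Rust code.
    let name = package_name.replace('-', "_");
    // `any` short-circuits, so scanning stops at the first successful match.
    lines.into_iter().any(|line| line_uses_crate(line, &name))
}
```

For example, is_crate_used("use foobar::Baz;".lines(), "bar") returns false, while is_crate_used("let v = serde_json::to_string(&x);".lines(), "serde-json") returns true. Like any line-based approach, this sketch can still be fooled by paths inside comments or strings, and by an import such as use std::bar; when looking for a crate named bar.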
This tool is fast, because it combines the core library behind ripgrep for matching regular expressions, with rayon for running it in parallel across all the dependencies of a project. On my machine, the problem is CPU-bound, because of the execution of the regular expression (and maybe thanks to my NVMe storage too). That’s only one data point, but on this particular beefy desktop I use, it scans the entirety of the rust-lang/rust repository in 1.08 seconds, or all of BytecodeAlliance/wasmtime in 0.58 seconds.
The tool is open source, of course.
As is the tradition for Cargo tools, it can be installed with:
cargo install cargo-machete
and then can be used, from any directory that contains Rust code (be it a workspace, a single project, or a directory on top of many Rust projects), with the following line:
cargo machete
Here’s an output example:
> cargo
There are false positives: code generated via macros or build scripts isn’t inspected, as it’s not in the src/ directory and cargo-machete doesn’t run any compile step. For instance, if a project depends on log, but uses it only through log_once, then cargo-machete will incorrectly flag log as an unused dependency.
The good news is that, thanks to a contribution from @daniel5151, you can specify known false positives in the Cargo.toml file of your crate, allowing use of cargo-machete in CI setups:
[]
= ["log"] # false positive, used by log_once! macro
As far as I know, the risk for false negatives (i.e. crates that are unused,
but the tool thinks they’re used) is pretty low. One such instance would be a
multi-line string containing one of the use
forms, but that seems rather
unlikely to be present in most Rust projects.
The tool is still a bit rough, but it’s already been quite useful for some projects I’ve been working on! In a particular work project, most unused dependencies were transitively used and compiled, but the rejiggering of the compilation graph led to a 5% compile time speedup overall. Good impact over effort ratio.
What about other languages?
What makes this possible in Rust, and could it be extended to other languages?
Dynamic languages by nature dynamically load code, but there are still ways to try to automate detecting unused dependencies, the same way cargo-machete does.
Consider JavaScript and its require
function, that can dynamically evaluate a
string that’s a path to a file with code we want to import. Since there’s an
infinity of ways to create a string, we can’t just perfectly rely on finding
require("abc")
and assume that if not present, then abc
isn’t used. Ditto
with import
statements, which can evaluate dynamic sources. That being said,
if JS code is restricted to use require statements with only static strings or static import statements, then this may work too! Although when restricting to static requires, even just loading the code in Node.js would be sufficient to find unused dependencies with perfect accuracy.
Back to static languages, where I constrain the problem to non-dynamic dependencies (that is, excluding those loaded via dlopen etc.). In a language like C or C++, there is no unified module system or package description (yet! although cmake might be a de-facto standard). We can still apply this to header files, and look for
their inclusion via #include
statements. Macros and preprocessed code would
also throw a wrench in the process. Then some human intervention would still be
required to eliminate the .c files, but I haven’t thought about it too much.
Static analysis of compiled binaries might be simpler, for that matter. If we
consider the problem for WebAssembly, we can frame it as “which imported
functions are not used in the module”, potentially eliminating an entire range
of host functions. In the simplest case, we could just look at the code
section, through the function bodies, and see if there’s any reference to
indices of every single imported function in call
opcodes. Then, there can be
function Table
s referencing those, so we have to make sure no table elements
reference the function. And if any table is mutable and publicly exposed via an
export, then a user of the wasm module may reference any function declared in
the wasm module, including imported functions, so all bets are off. Note
dead-code elimination in wasm would be pretty similar and suffer from the same
limitations: after all, a function dependency is just another kind of function,
in wasm! Each format may have idiosyncrasies like that. Static analysis of final binaries (as opposed to libraries) might be possible and reliable, though.
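The WebAssembly variant described above can be sketched on a heavily simplified model of a module. Everything here is hypothetical: the Module struct is not a real parser's data structure, it only records what the reasoning needs, and it ignores other ways to reference a function (direct function exports, the start function, ref.func instructions) that a real analysis would also have to handle.

```rust
use std::collections::HashSet;

/// Hypothetical, heavily simplified view of a WebAssembly module. As in the
/// wasm function index space, imported functions come first.
struct Module {
    /// Number of imported functions; they occupy indices 0..num_imported.
    num_imported: u32,
    /// For each locally defined function, the indices targeted by its `call` opcodes.
    calls_per_body: Vec<Vec<u32>>,
    /// Function indices placed into tables by element segments.
    table_elements: Vec<u32>,
    /// True if a table is exported, in which case users of the module may
    /// reach functions in ways the analysis cannot see.
    has_exported_table: bool,
}

/// Returns the indices of imported functions that nothing references, or
/// `None` when the analysis has to give up ("all bets are off").
fn unused_imports(module: &Module) -> Option<Vec<u32>> {
    if module.has_exported_table {
        return None;
    }
    // Collect every function index referenced by a call or a table element.
    let referenced: HashSet<u32> = module
        .calls_per_body
        .iter()
        .flatten()
        .chain(module.table_elements.iter())
        .copied()
        .collect();
    Some(
        (0..module.num_imported)
            .filter(|index| !referenced.contains(index))
            .collect(),
    )
}
```

For a module with three imports where function bodies call indices 0, 3 and 4 and a table element references index 2, unused_imports returns Some(vec![1]); as soon as a table is exported, it returns None.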
Closing thoughts
For the sake of completeness, I should mention the existence of a rustc crate-wide lint for this, since May 2020 or so: #![warn(unused_crate_dependencies)]. This reports unused crate dependencies directly as a Rust warning, which in my opinion would be the ideal end goal! Unfortunately, some GitHub comments suggest it suffers from having too many false positives, and it still requires compiling the code.
@est31, of cargo-udeps fame, has been working on a better solution. It seems to be not so far from completion, so between this and the Rust lint, I’m hopeful that there could be a time where we have a solution that is perfectly precise, with neither false positives nor false negatives.
In the meantime, I hope that cargo-machete can be useful to some of you, or that it inspires others to make similar quick-and-dirty tools, in Rust or in other languages. Thanks for reading this far, and please get in touch if you have any thoughts about this!
[1] If you don’t know about the cargo-edit tool that allows you to add a dependency to your project in one line with cargo add serde: now you do.
[2] Wait for it…