Introduction
Scripting languages provide immediate feedback: they are usually interpreted or just-in-time compiled, enabling a rapid edit-test cycle. On the other hand, languages like C++ and Rust generate all code upfront before anything gets executed. While this can produce very efficient code, it slows down development: for every change you make, you have to wait a certain time before you can test it. For C++ specifically, this is exacerbated by these traits:
- It is a complicated language with lots of baggage – hard, and therefore slow, to parse.
- Modules still haven’t “arrived” – preprocessor inclusion of a large amount of header code is still necessary and can lead to very large translation units that must be processed.
- Template instantiation can take a long time and runs single-threaded in all major compilers.
- Templates produce large binaries and debug info, which costs time during both compilation and linking. More bytes = more time needed!
- There are two steps to building an executable: Compiling and linking.
Depending on the size of the codebase and the hardware building it, turnaround times can be substantial, forcing developers to switch to other tasks and lose flow and context.
Best practices for writing C++ code and a distributed build system can go a long way in reducing compile times. The clang compiler has also focused on fast compilation speeds and beats gcc in most cases. But in this post we want to focus on speeding up the linking step, which comes after building the object files of a library or executable.
Linking
In the typical edit-build-test cycle, one or a few source files will be built, and afterwards linked into an executable or shared library. Linking can easily take up a majority of the time: Even if only one file changed, the entire binary is recreated (there have been attempts at incremental linking, but in practice they are brittle and hardly used, at least on Unix).
Overview of Unix Linkers
Here’s a brief overview of the major linkers I have used on Linux over the past years, in chronological order of release:
- ld: The GNU linker (sometimes called bfd linker or ld.bfd since it uses the BFD libraries for accessing the object files). Part of binutils.
- gold: A linker developed at Google to reduce link times of large applications. Also part of binutils.
- lld: The linker of the LLVM project. It is usually faster than gold.
At work we have a large C++ codebase that clocs in at nearly 12 million lines. It contains lots of templated code to effortlessly generate fast code across distinct data types and for special cases. Binaries can get quite large, especially with full debug info. To deal with this huge beast and keep developers productive we made these adjustments over the years:
- June 2012: Port to clang. In addition to quicker compilation, this also enabled better tooling (e.g. sanitizers that weren’t on par in gcc at that time). Developers can use either gcc or clang. For testing and production gcc remains king.
- August 2013: Integrate gold. This provided a noticeable reduction of link times.
- December 2017: Enable -gsplit-dwarf. This outsources the debug info from the object file into an adjacent file and therefore reduces the work the linker has to perform (more details).
- March 2017: Integrate lld, which was faster than gold, but caused some issues in conjunction with -gsplit-dwarf so in the end we didn’t use it.
Then in May 2021 a new linker was released: mold. Its main goal was to be a faster (much faster) replacement for all the existing linkers. To be honest, with all the development that had been going on with lld and gold and the speedups they provided I wasn’t expecting any major breakthroughs on the linker front anymore. But benchmarks looked promising. So I moved to the fun part – integrating a new tool into our build, analyzing the resulting bugs and, finally, benchmarking!
Using mold
Building mold is easy – just follow the instructions. You will end up with the mold executable and an ld symlink pointing to it. But how do we make gcc or clang use it? When the compiler generates a shared library or an executable, it searches for the ld command in some predefined locations, which can be overridden by -B/path/to/mold/install/dir (see the gcc docs):
    # Uses ld from standard location
    $ gcc f1.cpp.o f2.cpp.o -shared -o lib1.so

    # Uses ld from -B override
    $ gcc -B/path/to/mold/install/dir f1.cpp.o f2.cpp.o -shared -o lib1.so
That’s it, next up is a scratch make and analyzing the errors 😉
Issues encountered
    mold: warning: libsse_icc.a(searchbv.cpp.o):(.gnu.linkonce.d.DW.ref.__gxx_personality_v0): R_X86_64_64 relocation against symbol `__gxx_personality_v0' can not be used; recompile with -fPIC
Some of our object files are built by Intel’s ICC compiler for better optimization. This triggered an error during linking. Apparently ICC emits a legacy .gnu.linkonce section, which can be ignored. A workaround was quickly added to mold.
We had something like this in our codebase:
    $ mold main.cpp.o -l/path/to/lib1.a -o exe
    mold: library not found: /path/to/lib1.a
This worked in gold, but works with neither ld nor lld. It was trivial to fix in our codebase by passing the archive without -l:
    $ mold main.cpp.o /path/to/lib1.a -o exe
This issue turned out to be two distinct ones. First, we had a crash in a binary linked by mold. This was fixed by a later version, though I don’t know what exactly fixed it and did not have time to investigate. The second issue was very slow debugging: it took ages to attach to or start mold-linked binaries, which is a tough sell if you want to speed up developer workflows. Analysis showed that the binaries had no gdb index section, since mold did not support this yet (the gdb index is optionally constructed at link time when passing -Wl,--gdb-index and allows the debugger to look up symbols quickly). For our very large binaries it was essential. Eventually the feature was implemented (very quickly!) and this problem was solved. To construct the index, the object files need to be built with -ggnu-pubnames.
When taking the new gdb index feature for a test drive, mold crashed during linking of some binaries. After providing a reproducer the issue was again fixed quickly!
- Executables failing to link with undefined symbols
There were a handful of targets that failed to link with undefined symbols. All of these linker invocations involved static archives. The reason is that mold handles archives slightly differently from gold and ld, but identically to lld. That handling is described here. In practice that means mold can use different object files to resolve symbols, and these different object files in turn have dependencies of their own which need to be fulfilled.
Here’s an example:
- main.cpp.o references function f1
- Both a.cpp.o and b.cpp.o define function f1 (e.g. it’s an inline function so it is not an ODR violation)
- a.cpp.o is part of a static library tools.a and references lots of other symbols from another library, let’s call it libext.a
- b.cpp.o has no external references
- The link command is gcc -o exe tools.a main.cpp.o b.cpp.o
The “standard” linker semantics of ld and gold use objects from an archive only to resolve undefined symbols from objects previously encountered on the command line. They would look at main.cpp.o, see the reference to f1, and look for it in all objects that come after, b.cpp.o in our case, and pull that in. In contrast, mold and lld remember the contents of the archive and the definition of f1 in a.cpp.o, and pull that in. When you consider that a.cpp.o references symbols from libext.a which is not linked here, it becomes clear that this will lead to linker errors with mold or lld only.
While it’s possible to refactor the targets so that they do link with mold, we still need to retain gold compatibility. So for such targets we just fall back to linking with gold.
Conclusion
While looking at these issues please keep in mind:
- mold is still in development and barely one year old! Yet there were only two bugs.
- Our codebase is large and definitely has some edge cases. It’s likely even easier to get this running for smaller projects.
- Rui, the creator of mold, is incredibly responsive and helpful, fixing bugs quickly and implementing features with amazing quality and speed! Thanks Rui!
Benchmarks
So what have we gained by integrating mold? Let’s take a look! The charts below compare several predefined build profiles on the y-axis that can be used by our developers. Some explanations:
- gcc still is our default compiler and gold the default linker. So when there is neither clang nor mold in the profile name, it’s a gcc+gold build.
- ASan refers to AddressSanitizer. The ClangASanOptimized profile is just for comparison, and there is no corresponding mold profile.
- -gsplit-dwarf is enabled in all profiles.
- Measurements were performed three times and the minimum was used.
- The bars represent the percentage of the maximum value of each chart.
Scratch make
These numbers should be taken with a grain of salt since a compile cluster was used, which can lead to fluctuations. Although the scratch make is heavily dominated by compilation (not linking), and therefore not the target of our improvements, there is still a visible speedup. Note that OptimizedMold takes longer than Optimized, while all others are faster with mold. This was not analyzed further but the incremental link times below indicate this might have been such a fluctuation.
Build size
As expected, mold does not make much difference here. On the compiler side, clang generates noticeably smaller artifacts which I believe is due to smaller debug info.
Linking large shared libraries
[Charts: link time of lib1.so and lib2.so]
Here we are measuring the link time of the two largest shared libraries in our system. This is a major bottleneck in the edit-build-test cycle and as you can see, mold performs very well here (speedup as measured over corresponding gold profiles).
Build Profile | Link Speedup lib1.so | Link Speedup lib2.so |
---|---|---|
DebugMold | 6.4x | 3.7x |
ClangDebugMold | 6.5x | 5.7x |
OptimizedMold | 8.4x | 4.5x |
Attaching gdb to a running process and setting breakpoints
In this benchmark mold-linked binaries perform just as quickly as those from gold, which shows that the gdb index works. Interestingly clang shines again here with much faster gdb attach times.
Other observations
As expected, mold-linked binaries had the same runtime performance.
On another positive note, the peak memory consumption of mold is roughly half of gold’s; for some libraries I’ve also seen 30%. This can be very helpful when linking many libraries in parallel.
Integrating mold into a Rust build is also quite simple and showed similar improvements on an internal project. Scratch make times didn’t improve dramatically but the more important incremental builds did.
Takeaways
- There is still innovation in linkers. Thanks Rui for developing this great tool!
- If you are a C++ developer, I recommend integrating mold into your projects. It is straightforward and the benefits are huge!