Large- and medium-sized C++ projects often suffer from long build times. We can distinguish these two scenarios:
- Scratch build
After pulling in the latest changes from upstream, or after a major refactoring that affects central headers, many source files need to be rebuilt. This can take a long time. To a large extent this is caused by the insufficient module concept of C++: each source file includes many headers, so after preprocessing the compiler may have to process thousands or even millions of lines of C++ code. Each source file therefore takes seconds to compile, and a large application can have thousands of source files. Compile clusters can speed things up by distributing the compile jobs across multiple machines.
- Incremental build
Most builds are incremental. You have already done a full build, then you make a small change to one file to fix a bug or add an enhancement, and you build and test it. Such an incremental build compiles only a handful of source files and then links the libraries and the application. This is much less work than a scratch build, but since you do it many times a day, making it fast is even more critical. Compared to scripting languages, the turnaround time for making a change and testing it is quite long in C++.
In this article we will discuss a great way of speeding up incremental builds that also benefits scratch builds. The goal is to increase developer productivity: shorter turnaround times allow for quicker iterations. You don’t have to switch to other tasks while the build runs, but can keep your focus on the problem at hand.
Prerequisites
- Your incremental build should be “minimal”: No unneeded steps should be performed when invoking the build. A repeated build without any changes should not perform any actions at all. Sometimes this is not easy to achieve, and if the redundant actions only take a short time it is tolerable. But a large overhead here means your developers are already wasting time.
- You should have a somewhat recent toolchain supporting split DWARF (introduced below) across the board. The versions below are the absolute minimum. More recent versions are recommended, especially for gdb.
  - gcc >= 4.8
  - clang >= 3.3
  - gdb >= 7.7
  - binutils >= 2.24 (including gold)
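The first prerequisite can be checked mechanically. A minimal sketch, assuming a Make-based build with a placeholder target `all`: run the build, then dry-run it again and confirm the repeated invocation would perform no actions.

```shell
# Run the build, then dry-run a second invocation; a minimal build
# has nothing left to do the second time around.
make all
make -n all   # GNU make prints "Nothing to be done for 'all'." when minimal
```

If the dry run still lists commands, those are the redundant steps costing your developers time on every incremental build.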
What Makes Incremental Builds Slow?
For an incremental build, only a few source files have to be recompiled. Most of the time is spent linking the application. And here we can find lots of overhead:
- The libraries or executables containing the changed source files need to be rebuilt. This means creating them from scratch: all the contained objects need to be read again, even if unchanged, then processed by the linker, and the new binary must be written to disk.¹
- All other binaries that depend on binaries that were rebuilt must also be relinked. Although a smarter approach seems possible, in most build systems this means recreating these binaries from scratch.
Note that the linker also needs to process all debug information contained in the object files. Duplicate information gets removed, and the merged debug information is written to the generated binary. The debug information is thus duplicated on disk, since it is still contained in the object files as well. And debug information tends to be very large:
In a large C++ application compiled with -O2 and -g, the debug information accounts for 87% of the total size of the object files sent as inputs to the link step, and 84% of the total size of the output binary.
So a large bottleneck for an incremental build is processing of debug info. Ironically, debug info is most important when analyzing and fixing bugs, during which you are doing lots of incremental builds! For release builds without debug info, linking can be surprisingly fast, and sometimes developers working on large projects use them as a last resort.
Introducing Split DWARF
Linking, and therefore incremental builds, could be much faster if the linker didn’t have to process all the debug information. Split DWARF² makes this possible: it generates a separate file for the debug info, which the linker can ignore. This file has the suffix .dwo (DWARF object file). DWARF is the debugging file format generally used on Unix; it is the default on most Linux distributions. The only special thing here is that the DWARF info is split from the code.
The binaries generated by the linker will not contain debug information, but references to the .dwo files that are already on disk. Let’s examine how this works in detail:
```cpp
#include <iostream>

int main()
{
    int a = 1;
    std::cout << "Split DWARF test" << std::endl;
    return 0;
}
```
We compile this simple program in two ways, with and without split DWARF. First, compiling with debug information only (-g):
```shell
$ g++ -c -g main.cpp -o main.o
$ g++ main.o -o app
```
Now we also enable split DWARF by adding -gsplit-dwarf to the compiler invocation:
```shell
$ g++ -c -g -gsplit-dwarf main.cpp -o main_splitdwarf.o
$ g++ main_splitdwarf.o -o app_splitdwarf
```
The program is not interesting here, but let’s take a look at the files generated:
```shell
-rwxrwxr-x 1 prodcpp prodcpp 20256 Oct  7 23:39 app*
-rwxrwxr-x 1 prodcpp prodcpp 12728 Oct  7 23:39 app_splitdwarf*
-rw-r--r-- 1 prodcpp prodcpp   110 Oct  7 22:36 main.cpp
-rw-rw-r-- 1 prodcpp prodcpp 22112 Oct  7 23:39 main.o
-rw-rw-r-- 1 prodcpp prodcpp 12296 Oct  7 23:39 main_splitdwarf.dwo
-rw-rw-r-- 1 prodcpp prodcpp  6968 Oct  7 23:39 main_splitdwarf.o
```
No surprises for the regular build, which produces main.o and app. The split DWARF compilation creates two files, main_splitdwarf.o and main_splitdwarf.dwo. app_splitdwarf takes up only 12728 bytes, in contrast to app, which is 20256 bytes. The reason is that it references the debug info instead of containing it:
```shell
$ readelf -wi app_splitdwarf | grep dwo
 <20>   DW_AT_GNU_dwo_name: (indirect string, offset: 0x0): main_splitdwarf.dwo
```
That reference is already present in the object file, so all the linker had to do with regard to debug information was copy that reference:
```shell
$ readelf -wi main_splitdwarf.o | grep dwo
 <20>   DW_AT_GNU_dwo_name: (indirect string, offset: 0x0): main_splitdwarf.dwo
 <2c>   DW_AT_GNU_dwo_id  : 0xae0d75cbd6671bc1
```
This also means you need to keep the .dwo files as long as you want to debug your application. Although I couldn’t get gdb to trace the loading of .dwo files, you can see via strace that it pulls them in:
```shell
$ strace -o log gdb --batch-silent --eval-command=quit app_splitdwarf
$ grep dwo log
stat("/projects/prodcpp/splitdwarf/main_splitdwarf.dwo", {st_mode=S_IFREG|0664, st_size=12256, ...}) = 0
open("/projects/prodcpp/splitdwarf/main_splitdwarf.dwo", O_RDONLY|O_CLOEXEC) = 8
lstat("/projects/prodcpp/splitdwarf/main_splitdwarf.dwo", {st_mode=S_IFREG|0664, st_size=12256, ...}) = 0
```
A Real-Life Example: llvm
In the previous toy example, gains are minimal and speedup for incremental builds would be non-existent since we only have one source file. So let’s take a look at a real application and perform some measurements to gauge the benefits of the split DWARF approach.
We will be building llvm 7.0.0 with and without split DWARF. llvm in its latest incarnation is a rather large C++ project, clocking in at 22838 C/C++ files. On top of that, the clang compiler is linked statically against the llvm libraries, so a lot of work has to be redone even if only one file changes.
First, let’s do a scratch build. I’m using a clone of the git monorepo with the tag RELEASE_700/final checked out. The root of the cmake project is in the llvm directory. To also build all the other projects, I have symlinked them into the tree as follows:
```shell
$ pwd
/h/sources/llvm-project-20170507
$ cd llvm/tools
$ ln -s ../../lld lld
$ ln -s ../../lldb lldb
$ ln -s ../../clang clang
$ cd ../projects
$ ln -s ../../compiler-rt compiler-rt
```
First, let’s use the defaults, which means a Debug build without split DWARF.
```shell
$ mkdir llvm
$ cd llvm
$ cmake /h/sources/llvm-project-20170507/llvm/
$ /usr/bin/time -v make -j 80
Percent of CPU this job got: 5072%
Elapsed (wall clock) time (h:mm:ss or m:ss): 13:00.54
...
Maximum resident set size (kbytes): 11222228
$ du -shL llvm
55G
```
Now, with split DWARF:
```shell
$ mkdir llvm_sd
$ cd llvm_sd
$ cmake /h/sources/llvm-project-20170507/llvm/ -DLLVM_USE_SPLIT_DWARF=ON
$ /usr/bin/time -v make -j 80
Percent of CPU this job got: 5939%
Elapsed (wall clock) time (h:mm:ss or m:ss): 11:01.42
...
Maximum resident set size (kbytes): 4940236
$ du -shL .
36G
```
Let’s look at the numbers:
- Elapsed time goes down from 13:00 to 11:01 minutes (about 15%). The second build uses more CPU on average, which probably means the linker completes some blocking links faster, so parallelism during the build can increase.
- Maximum resident set size halves (10.7GB to 4.7GB). The linker does not have to process debug info, therefore it needs much less memory. This is significant on constrained machines.
- Disk consumption goes down by about a third (55GB to 36GB). There is no duplication of debug info in the binaries; only a reference to the .dwo file is stored. For you as a developer this means you can keep more builds around.
These improvements are nice, considering the low effort needed to obtain them. But what about an incremental build? Let’s change one file, and then rebuild clang. First, without split DWARF:
```shell
$ echo "//" >> ./llvm/lib/Target/X86/X86FlagsCopyLowering.cpp
$ /usr/bin/time -v make -j 80 clang
Percent of CPU this job got: 150%
Elapsed (wall clock) time (h:mm:ss or m:ss): 3:16.90
...
Maximum resident set size (kbytes): 11222232
```
With split DWARF:
```shell
$ echo "//" >> ./llvm/lib/Target/X86/X86FlagsCopyLowering.cpp
$ /usr/bin/time -v make -j 80 clang
Percent of CPU this job got: 195%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:42.74
...
Maximum resident set size (kbytes): 4940236
```
Elapsed time nearly halves from 3min 17 sec to 1min 43 sec. Resident set size shows the same behavior as above, which makes sense considering that clang is probably the largest executable in llvm, linking in all needed llvm libraries statically.
All in all, split DWARF is a huge win for development workflows. At the cost of adding a flag, you get significant improvements for everybody building the code base.
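For your own CMake-based project there is usually no dedicated option like llvm’s LLVM_USE_SPLIT_DWARF, but passing the flag through the standard CMake flag variables works just as well. A minimal sketch (the source path is a placeholder):

```shell
# Enable split DWARF for a generic CMake project via the standard
# flag variables; only meaningful for builds with debug info.
cmake /path/to/source \
      -DCMAKE_BUILD_TYPE=Debug \
      -DCMAKE_CXX_FLAGS=-gsplit-dwarf \
      -DCMAKE_C_FLAGS=-gsplit-dwarf
```

For plain Makefile projects, appending -gsplit-dwarf to CXXFLAGS achieves the same.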
Packaging a Release from a Split DWARF Build
While split DWARF is great for developers, it is less convenient when building a release that needs to work on another machine. The debug info is spread over many files, and the dwo references stored in the binaries will expose all your source file names and hierarchy. To solve this, a new tool called dwp was added to binutils. It operates on an executable or shared library and produces a .dwp file with all relevant info to debug that file. gdb in turn will look for .dwp files and load debug info from them.
Continuing our example:
```shell
$ dwp -e app_splitdwarf
$ ll app_*
-rwxrwxr-x 1 prodcpp prodcpp 20256 Oct  7 23:39 app*
-rwxrwxr-x 1 prodcpp prodcpp 12728 Oct  7 23:39 app_splitdwarf*
-rw-rw-r-- 1 prodcpp prodcpp 12440 Oct  7 23:41 app_splitdwarf.dwp
```
We now have a new file app_splitdwarf.dwp containing all the debug info we need, so we can delete the .dwo file. Let’s verify that debugging still works afterwards:
```shell
$ rm *dwo
$ gdb app_splitdwarf
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
...
Reading symbols from app_splitdwarf...done.
(gdb) b main
Breakpoint 1 at 0x40084e: file main.cpp, line 5.
(gdb) r
Starting program: app_splitdwarf
Breakpoint 1, main () at main.cpp:5
5           int a = 1;
(gdb) p a
$1 = 0
```
The variable can be printed, so debug information is available. Without the .dwp file you will get a warning as follows:
```shell
$ rm *dwp
$ gdb app_splitdwarf
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
...
Reading symbols from app_splitdwarf...
warning: Could not find DWO CU main_splitdwarf.dwo(0xae0d75cbd6671bc1) referenced by CU at offset 0x0 [in module app_splitdwarf]
done.
```
You will get the same warning when removing the .dwo file (provided there is no .dwp file either).
When a .dwp file is present for a shared library or executable, gdb will not look for .dwo files. So when your binaries are more recent than your .dwp file, you will not be able to debug the changed files until you remove or update the .dwp file.
That wraps up our discussion of split DWARF. Hopefully you can make use of it in your projects and reduce your build times!
Remarks
Since you want to speed up your link as much as possible, you should also use the fastest linker available. gold is faster than the default ld.bfd linker, and lld is even faster. lld is still under development, so it may have more issues; gold is more mature. Add -fuse-ld=&lt;linker&gt; to the link step. Example:
```shell
-fuse-ld=gold
-fuse-ld=/path/to/lld
```
Also, you may want to use -Wl,--gdb-index. This creates the .gdb_index section in binaries, which speeds up debugging a bit.
Limitations
Split DWARF is not used by that many projects, so some friction with tooling is possible. If you encounter any problems, please let me know in the comments.
- icecream supports split DWARF, distcc doesn’t (I have tested neither). In a build cluster you need to ship two files as the result of a compilation. Other than that, only one piece of information needs to be adjusted: the reference to the .dwo file encoded in the object file. To make it fit the node running the build, the compiler options -fdebug-prefix-map (gcc) or -fdebug-compilation-dir (clang) can be used.
- clang 7.0.0 recently implemented partial support for DWARF5, which does not cover split DWARF yet. But it is not the default.
Notes
¹ Incremental linking is another solution to this problem, but it is not discussed here. In my experience it does not work as reliably as the split DWARF approach.
² Debug Fission is another name for this technique.
References
DebugFission – DWARF Extensions for Separate Debug Information Files
Comments
Absolutely love this article! I implemented it in the build system at work and got these results:
- A full build (“ninja install”) with a clean ccache is 1% slower (10m12s to 10m19s).
- A full build (“ninja install”) with a fully populated ccache is 25% faster (35s to 26s).
- Build folder size is reduced by 25% (from 5.6 GiB to 4.2 GiB).
- Finally and most importantly, an incremental build where a single source file in a core shared library is the only change is 41% faster (from 17s to 10s).
For the incremental build a significant portion of the time is now spent in CMake automoc (our software uses Qt), so if it weren’t for that, the improvement would be even more significant.

Q: Does split DWARF help when we use a shared-library build of LLVM?
A: It should also help. The effect won’t be as dramatic, but it’s worth the effort.