Large- and medium-sized C++ projects often suffer from long build times. We can distinguish these two scenarios:
- Scratch build
After pulling in the latest changes from upstream, or after a major refactoring that affects central headers, many source files need to be rebuilt. This can take a long time. To a large extent this is caused by the insufficient module concept of C++: each source file includes many headers, so after preprocessing the compiler may have to process thousands or even millions of lines of C++ code. Each source file therefore takes seconds to compile, and a large application can have thousands of source files. Compile clusters can speed things up by distributing the compile jobs across multiple machines.
- Incremental build
Most builds are incremental. You have already done a full build, then you make a small change to one file to fix a bug or add an enhancement, and you build and test it. Such an incremental build compiles only a handful of source files and then links the libraries and the application. This is much less work than a scratch build, but since you do it many times a day, making it fast is even more critical. Compared to scripting languages, the turnaround time for making a change and testing it is quite long in C++.
In this article we will discuss a great way of speeding up incremental builds that also benefits scratch builds. The goal is to increase developer productivity: shorter turnaround times allow for quicker iterations. You don’t have to switch to other tasks while the build runs, but can keep your focus on the problem at hand.
Prerequisites
- Your incremental build should be “minimal”: No unneeded steps should be performed when invoking the build. A repeated build without any changes should not perform any actions at all. Sometimes this is not easy to achieve, and if the redundant actions only take a short time it is tolerable. But a large overhead here means your developers are already wasting time.
- You should have a somewhat recent toolchain supporting split DWARF (introduced below) across the board. The versions below are the absolute minimum. More recent versions are recommended, especially for gdb.
  - gcc >= 4.8
  - clang >= 3.3
  - gdb >= 7.7
  - binutils >= 2.24 (including gold)
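The first prerequisite can be checked mechanically. A minimal sketch, assuming a Make-based build with a placeholder target `all`: run the build, then dry-run it again and confirm the repeated invocation would perform no actions.

```shell
# Run the build, then dry-run a second invocation; a minimal build
# has nothing left to do the second time around.
make all
make -n all   # GNU make prints "Nothing to be done for 'all'." when minimal
```

If the dry run still lists commands, those are the redundant steps costing your developers time on every incremental build.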
What Makes Incremental Builds Slow?
For an incremental build, only a few source files have to be recompiled. Most of the time is spent linking the application. And here we can find lots of overhead:
- The libraries or executables containing the changed source files need to be rebuilt. This means creating them from scratch: all the contained objects need to be read again, even if unchanged, then processed by the linker, and the new binary must be written to disk.¹
- All other binaries that depend on binaries that were rebuilt must also be relinked. Although a smarter approach seems possible, in most build systems this means recreating these binaries from scratch.
Note that the linker also needs to process all debug information contained in the object files. Duplicate information gets removed, and the merged debug information is written to the generated binary. The debug information is thus duplicated on disk, since it is still contained in the object files as well. And debug information tends to be very large:
In a large C++ application compiled with -O2 and -g, the debug information accounts for 87% of the total size of the object files sent as inputs to the link step, and 84% of the total size of the output binary.
So a large bottleneck for an incremental build is processing of debug info. Ironically, debug info is most important when analyzing and fixing bugs, during which you are doing lots of incremental builds! For release builds without debug info, linking can be surprisingly fast, and sometimes developers working on large projects use them as a last resort.
Introducing Split DWARF
Linking, and therefore incremental builds, could be much faster if the linker didn’t have to process all the debug information. Split DWARF² makes this possible: it generates a separate file for the debug info, which the linker can ignore. This file has the suffix .dwo (DWARF object file). DWARF is the debugging file format generally used on Unix; it is the default on most Linux distributions. The only special thing here is that the DWARF info is split from the code.
The binaries generated by the linker will not contain debug information, but references to the .dwo files that are already on disk. Let’s examine how this works in detail:
```cpp
#include <iostream>

int main()
{
    int a = 1;
    std::cout << "Split DWARF test" << std::endl;
    return 0;
}
```
We compile this simple program in two ways, with and without split DWARF. First, compiling with debug information only (-g):
```shell
$ g++ -c -g main.cpp -o main.o
$ g++ main.o -o app
```
Now we also enable split DWARF by adding -gsplit-dwarf to the compiler invocation:
```shell
$ g++ -c -g -gsplit-dwarf main.cpp -o main_splitdwarf.o
$ g++ main_splitdwarf.o -o app_splitdwarf
```
The program is not interesting here, but let’s take a look at the files generated:
```shell
-rwxrwxr-x 1 prodcpp prodcpp 20256 Oct  7 23:39 app*
-rwxrwxr-x 1 prodcpp prodcpp 12728 Oct  7 23:39 app_splitdwarf*
-rw-r--r-- 1 prodcpp prodcpp   110 Oct  7 22:36 main.cpp
-rw-rw-r-- 1 prodcpp prodcpp 22112 Oct  7 23:39 main.o
-rw-rw-r-- 1 prodcpp prodcpp 12296 Oct  7 23:39 main_splitdwarf.dwo
-rw-rw-r-- 1 prodcpp prodcpp  6968 Oct  7 23:39 main_splitdwarf.o
```
No surprises for the regular build, which produces main.o and app. The split DWARF compilation creates two files, main_splitdwarf.o and main_splitdwarf.dwo. app_splitdwarf takes up only 12728 bytes, in contrast to app, which is 20256 bytes. The reason is that it references the debug info instead of containing it:
```shell
$ readelf -wi app_splitdwarf | grep dwo
 <20>   DW_AT_GNU_dwo_name: (indirect string, offset: 0x0): main_splitdwarf.dwo
```
That reference is already present in the object file, so all the linker had to do with regard to debug information was copy that reference:
```shell
$ readelf -wi main_splitdwarf.o | grep dwo
 <20>   DW_AT_GNU_dwo_name: (indirect string, offset: 0x0): main_splitdwarf.dwo
 <2c>   DW_AT_GNU_dwo_id  : 0xae0d75cbd6671bc1
```
This also means you need to keep the .dwo files as long as you want to debug your application. Although I couldn’t get gdb to trace the loading of .dwo files, you can see via strace that it pulls them in:
```shell
$ strace -o log gdb --batch-silent --eval-command=quit app_splitdwarf
$ grep dwo log
stat("/projects/prodcpp/splitdwarf/main_splitdwarf.dwo", {st_mode=S_IFREG|0664, st_size=12256, ...}) = 0
open("/projects/prodcpp/splitdwarf/main_splitdwarf.dwo", O_RDONLY|O_CLOEXEC) = 8
lstat("/projects/prodcpp/splitdwarf/main_splitdwarf.dwo", {st_mode=S_IFREG|0664, st_size=12256, ...}) = 0
```
A Real-Life Example: llvm
In the previous toy example, gains are minimal and speedup for incremental builds would be non-existent since we only have one source file. So let’s take a look at a real application and perform some measurements to gauge the benefits of the split DWARF approach.
We will be building llvm 7.0.0 with and without split DWARF. llvm in its latest incarnation is a rather large C++ project, clocking in at 22838 C/C++ files. On top of that, the clang compiler is linked statically against the llvm libraries, so a lot of work has to be redone even if only one file changes.
First, let’s do a scratch build. I’m using a clone of the git monorepo with the tag RELEASE_700/final checked out. The root of the cmake project is in the llvm directory. To also build all the other projects, I have symlinked them into the tree as follows:
```shell
$ pwd
/h/sources/llvm-project-20170507
$ cd llvm/tools
$ ln -s ../../lld lld
$ ln -s ../../lldb lldb
$ ln -s ../../clang clang
$ cd ../projects
$ ln -s ../../compiler-rt compiler-rt
```
First, let’s use the defaults, which means a Debug build without split DWARF.
```shell
$ mkdir llvm
$ cd llvm
$ cmake /h/sources/llvm-project-20170507/llvm/
$ /usr/bin/time -v make -j 80
Percent of CPU this job got: 5072%
Elapsed (wall clock) time (h:mm:ss or m:ss): 13:00.54
...
Maximum resident set size (kbytes): 11222228
$ du -shL llvm
55G
```
Now, with split DWARF:
```shell
$ mkdir llvm_sd
$ cd llvm_sd
$ cmake /h/sources/llvm-project-20170507/llvm/ -DLLVM_USE_SPLIT_DWARF=ON
$ /usr/bin/time -v make -j 80
Percent of CPU this job got: 5939%
Elapsed (wall clock) time (h:mm:ss or m:ss): 11:01.42
...
Maximum resident set size (kbytes): 4940236
$ du -shL .
36G
```
Let’s look at the numbers:
- Elapsed time goes down from 13:00 to 11:01 minutes (about 15%). The second build uses more CPU on average, which probably means the linker completes some blocking links faster, so parallelism during the build can increase.
- Maximum resident set size halves (10.7GB to 4.7GB). The linker does not have to process debug info, therefore it needs much less memory. This is significant on constrained machines.
- Disk consumption goes down by about a third (55GB to 36GB). There is no duplication of debug info in the binaries; only a reference to the .dwo file is stored. For you as a developer this means you can keep more builds around.
These improvements are nice, considering the low effort needed to obtain them. But what about an incremental build? Let’s change one file, and then rebuild clang. First, without split DWARF:
```shell
$ echo "//" >> ./llvm/lib/Target/X86/X86FlagsCopyLowering.cpp
$ /usr/bin/time -v make -j 80 clang
Percent of CPU this job got: 150%
Elapsed (wall clock) time (h:mm:ss or m:ss): 3:16.90
...
Maximum resident set size (kbytes): 11222232
```
With split DWARF:
```shell
$ echo "//" >> ./llvm/lib/Target/X86/X86FlagsCopyLowering.cpp
$ /usr/bin/time -v make -j 80 clang
Percent of CPU this job got: 195%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:42.74
...
Maximum resident set size (kbytes): 4940236
```
Elapsed time nearly halves from 3min 17 sec to 1min 43 sec. Resident set size shows the same behavior as above, which makes sense considering that clang is probably the largest executable in llvm, linking in all needed llvm libraries statically.
All in all, split DWARF is a huge win for development workflows. At the cost of adding a flag, you get significant improvements for everybody building the code base.
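For your own CMake-based project there is usually no dedicated option like llvm’s LLVM_USE_SPLIT_DWARF, but passing the flag through the standard CMake flag variables works just as well. A minimal sketch (the source path is a placeholder):

```shell
# Enable split DWARF for a generic CMake project via the standard
# flag variables; only meaningful for builds with debug info.
cmake /path/to/source \
      -DCMAKE_BUILD_TYPE=Debug \
      -DCMAKE_CXX_FLAGS=-gsplit-dwarf \
      -DCMAKE_C_FLAGS=-gsplit-dwarf
```

For plain Makefile projects, appending -gsplit-dwarf to CXXFLAGS achieves the same.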
Packaging a Release from a Split DWARF Build
While split DWARF is great for developers, it is less convenient when building a release that needs to work on another machine. The debug info is spread over many files, and the dwo references stored in the binaries will expose all your source file names and hierarchy. To solve this, a new tool called dwp was added to binutils. It operates on an executable or shared library and produces a .dwp file with all relevant info to debug that file. gdb in turn will look for .dwp files and load debug info from them.
Continuing our example:
```shell
$ dwp -e app_splitdwarf
$ ll app_*
-rwxrwxr-x 1 prodcpp prodcpp 20256 Oct  7 23:39 app*
-rwxrwxr-x 1 prodcpp prodcpp 12728 Oct  7 23:39 app_splitdwarf*
-rw-rw-r-- 1 prodcpp prodcpp 12440 Oct  7 23:41 app_splitdwarf.dwp
```
We now have a new file app_splitdwarf.dwp containing all the debug info we need, so we can delete the .dwo file. Let’s verify that debugging still works afterwards:
```shell
$ rm *dwo
$ gdb app_splitdwarf
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
...
Reading symbols from app_splitdwarf...done.
(gdb) b main
Breakpoint 1 at 0x40084e: file main.cpp, line 5.
(gdb) r
Starting program: app_splitdwarf
Breakpoint 1, main () at main.cpp:5
5           int a = 1;
(gdb) p a
$1 = 0
```
The variable can be printed, so debug information is available. Without the .dwp file you will get a warning as follows:
```shell
$ rm *dwp
$ gdb app_splitdwarf
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
...
Reading symbols from app_splitdwarf...
warning: Could not find DWO CU main_splitdwarf.dwo(0xae0d75cbd6671bc1) referenced by CU at offset 0x0 [in module app_splitdwarf]
done.
```
You will get the same warning when removing the .dwo file (provided there is no .dwp file either).
When a .dwp file is present for a shared library or executable, gdb will not look for .dwo files. So when your binaries are more recent than your .dwp file, you will not be able to debug the changed files until you remove or update the .dwp file.
That wraps up our discussion of split DWARF. Hopefully you can make use of it in your projects and reduce your build times!
Remarks
Since you want to speed up your link as much as possible, you should also use the fastest linker available. gold is faster than the default ld.bfd linker, and lld is even faster. lld is still under development, so it may have more issues; gold is more mature. Add -fuse-ld=&lt;linker&gt; to the link step. Example:
```shell
-fuse-ld=gold
-fuse-ld=/path/to/lld
```
Also, you may want to use -Wl,--gdb-index. This creates the .gdb_index section in binaries, which speeds up debugging a bit.
Limitations
Split DWARF is not used by that many projects, so some friction with tooling is possible. If you encounter any problems, please let me know in the comments.
- icecream supports split DWARF, distcc doesn’t (I have tested neither). In a build cluster you need to ship two files as the result of a compilation. Other than that, only one piece of information needs to be adjusted: the reference to the .dwo file encoded in the object file. To make it fit the node running the build, the compiler options -fdebug-prefix-map (gcc) or -fdebug-compilation-dir (clang) can be used.
- clang 7.0.0 recently implemented partial support for DWARF5, which does not cover split DWARF yet. But it is not the default.
Notes
¹ Incremental linking is another solution to this problem, but it is not discussed here. In my experience it does not work as reliably as the split DWARF approach.
² Debug Fission is another name for this technique.
References
DebugFission – DWARF Extensions for Separate Debug Information Files
Comments
Absolutely love this article! I implemented it in the build system at work and got these results:
- A full build (“ninja install”) with a clean ccache is 1% slower (10m12s to 10m19s).
- A full build (“ninja install”) with a fully populated ccache is 25% faster (35s to 26s).
- Build folder size is reduced by 25% (from 5.6 GiB to 4.2 GiB).
- Finally and most importantly, an incremental build where a single source file in a core shared library is the only change is 41% faster (from 17s to 10s).
For the incremental build a significant portion of the time is now spent in CMake automoc (our software uses Qt), so if it weren’t for that, the improvement would be even more significant.

Q: Does split DWARF help when we use a shared-library build of LLVM?
A: It should also help. The effect won’t be as dramatic, but it’s worth the effort.