2020-06-02

Misadventures in CMake

The Goal

At some point, I decided to learn a bit of CMake for my pet projects. CMake is a meta-build system in widespread use and, although I like other build systems more, I figured getting some hands-on experience with it wouldn't hurt.
In the end, it didn't hurt badly, but I found CMake far from approachable and usable. The language is far too flexible. CMake has a legacy way of doing things and a new way, but no bright red line between the two. In fact, there are very few bright lines between anything. You're on a 6-lane highway with almost no lane markers. The overall system allows too many ways of achieving similar goals. The documentation is sometimes unclear, is not always internally consistent and frequently lacks examples. I couldn't discern an idiomatic way of writing CMake scripts and I couldn't find a static analyzer that could warn me about trivial mistakes.

The Laundry List

When learning new things, I usually seek the key concepts I need to understand, how to achieve a few basic tasks and how to satisfy more advanced requirements with time.
A good build system should do a few things well:
  1. Each input is named explicitly. For C++, this includes headers. The build should fail if a header or a translation unit under the project directory is used in the build without having been named.
  2. Each input is named exactly once and feeds into a single build target.
  3. The build process is purely functional: No build action ever mutates its inputs in-place. No build action mutates the output of a previous action after it was produced. In other words, inputs are read-only and outputs are write-once at creation and read-only thereafter.
  4. The build is hermetic: Absolute source paths, intermediate paths, extraneous environment variables, build machine configuration or hardware details don't leak into the outputs.
  5. The build is deterministic: Running the same build command with the same inputs and environment produces the exact same output.
  6. The build graph is acyclic: Libraries and binaries can't form dependency cycles.
  7. The language compiler and the build system should flag unused dependencies as build errors.
  8. The build system should scale to various ways of organizing projects. A repository with a large number of small libraries should work just as well as a repository with a small number of large libraries.
  9. Build rules should declare their dependencies, not their dependents. You'd think this goes without saying, but some build systems (MSBuild) allow some hacky forms of the latter.
  10. The build system should distinguish between interface items (e.g. the public headers of a library) and implementation items (e.g. internal headers of the implementation). Ideally, this should work together with the language compiler to ensure that private symbols don't leak into the public interface of that build target.
  11. The build system should distinguish between public and private targets. Languages have public and private symbols (e.g. class members). Build languages should have private targets (for consumption in the same project, repository or even more specific scopes) and public targets (for consumption by anyone).
  12. Build outputs are outside the source directory (out-of-source builds, also known as VPATH builds for GNU tools). The build never creates or modifies anything under the source path. It's possible to build from read-only mounts, network shares etc.
  13. Incremental builds work correctly: Changing any input should cause the transitive closure of dependent targets to get rebuilt correctly - not more (spurious rebuild) and not less (stale output). In addition to source files, things such as build scripts themselves, build settings, environment variables that feed into the build process, toolchain versions etc. also count as inputs. Upgrading GCC should result in a full rebuild without manual intervention. The build system must handle clock adjustments correctly. Ideally, rebuilds should be triggered by a change in an input's fingerprint, not by timestamps. Dependency tracking needs to be fine-grained; otherwise, minor changes can trigger full project rebuilds (precompiled headers on MSVC are a classic example, especially when used incorrectly).
  14. Partial builds work correctly: It should be possible to build only a specific target and its corresponding tests, without building the entire project or repository.
  15. The build system should work easily with multiple toolchains (e.g. GCC and Clang).
  16. The build system should work easily with multiple flavors (debug, optimized debug, release, various instrumented builds for e.g. code coverage, profilers, sanitizers). Adding custom build flavors should be straightforward.
  17. The build system should make it easy to apply a base template for all projects (language version, compiler flags) without requiring all projects (possibly in multiple repositories) to refer to a base configuration explicitly.
  18. The build system should work easily for cross-compilation (e.g. targeting ARM from an x86_64 machine).
  19. For C++, the build system should make it possible to swap in replacement libraries (e.g. using libc++ instead of libstdc++, using jemalloc or tcmalloc etc.).
  20. The build system should make it straightforward to depend on other projects, possibly by integrating with a source package manager.
  21. To the extent that a project deals with versioned dependencies, the build system should cooperate with a package manager to feed correct dependencies to the toolchain.
  22. The build system should make it trivial to package and deploy output artifacts (binaries, libraries, headers etc) to the /usr hierarchy, to the local hierarchy and to private work directories for those who wish to avoid containers, chroot jails and fakeroot.
  23. Builds should be inert: It's bad practice to activate any code that just got built in a subsequent action of the same build.
  24. The build system should aid ancillary tools that benefit from understanding a project's build graph (e.g. static analysis, formatters).
  25. The build system can exploit machine-level parallelism.
  26. The build system can run distributed on multiple machines.
  27. The build should be transparent and errors should present the underlying problem unambiguously.
  28. To the extent that build systems allow custom actions, the custom actions should have all these properties, as well. These requirements also apply to compiler and tool writers.
That was a mouthful. A list with this many entries is probably missing a few. I also omitted some advanced features deliberately.
Furthermore, I'd like my code to compile at high warning levels. It's easier to start with tight rules and maybe relax them in exceptional cases than to start with lax rules and deal with accumulated debris.
With GCC, I've generally used something like:
Common flags (both languages):
  -Wall -Wextra -Wcast-align -Wcast-qual -Wconversion -Wsign-conversion \
  -Wdate-time -Wduplicated-cond -Wfloat-equal -Wformat=2 -Wformat-signedness \
  -Winit-self -Wmissing-declarations -Wmissing-include-dirs -Wmultichar \
  -Wnull-dereference -Wpacked -Wpointer-arith -Wredundant-decls \
  -Wsuggest-final-types -Wsuggest-final-methods -Wwrite-strings -Wshadow \
  -fstack-protector-strong -Wstack-protector \
  -Werror -g
C11 flags:
  -Wbad-function-cast -Wjump-misses-init \
  -Wmissing-prototypes -Wnested-externs -Wold-style-definition \
  -Wstrict-prototypes
C++17 flags:
  -Wsign-promo -Wctor-dtor-privacy \
  -Wdelete-non-virtual-dtor -Wnoexcept -Wnon-virtual-dtor -Wold-style-cast \
  -Woverloaded-virtual -Wstrict-null-sentinel -Wsuggest-override \
  -Wzero-as-null-pointer-constant \
  -D_GLIBCXX_USE_CXX11_ABI
Debug builds (both languages):
  -ftrapv -D_GLIBCXX_DEBUG
Optimized builds (both languages):
  -O3 -DNDEBUG -D_FORTIFY_SOURCE=2
With Clang, my setup uses:
Common flags (both languages):
  -Wall -Wextra -Weverything -Wno-c++98-compat-pedantic \
  -Wno-disabled-macro-expansion -Wno-padded \
  -fstack-protector-strong \
  -Werror -g
C11 flags: No additional flags.
C++17 flags: No additional flags.
Debug builds (both languages):
  -D_LIBCPP_DEBUG
Optimized builds (both languages):
  -O3 -DNDEBUG -D_FORTIFY_SOURCE=2

The Devil Is In The Details

So, how does CMake fare? Not too badly, but you'll have to do some of the work yourself and you'll get lost or stuck at times.
Pretend you wrote a short test program and you want to build it.
You spend a bit of time learning the basic concepts. You'll write a list file. Everything in a list file is a command. You'll need to require a minimum version of CMake. You'll have a project. You learn that targets are the idiomatic way starting with CMake 3.0 and you do everything by manipulating target properties. This renders obsolete all older materials about manipulating strings. The CMake tutorial helps.
You don't want to spend your time adding boilerplate to your CMakeLists.txt. You write this:
cmake_minimum_required (VERSION 3.17)

project (hello
VERSION 0.1
DESCRIPTION "Hello, world. The legendary demo"
LANGUAGES CXX)

add_executable (hello)
target_sources (hello PRIVATE hello.cc)
So far, so good. What's a good way to switch between GCC and Clang? You learn about CMAKE_<LANG>_COMPILER, so your configure commands are going to look something like this:
c_compiler=gcc # or clang
cxx_compiler=g++ # or clang++
flavor=dbg
build_type=Debug # Needs to match ${flavor}
src_dir=...
build_dir=~/.build-${c_compiler}-$(uname -m)-${flavor}/...
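# The configure step. Pass -G Ninja instead of "Unix Makefiles" to use Ninja.
# (A comment after a trailing backslash would end the command early.)
cmake \
-Werror=dev \
-G "Unix Makefiles" \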
-DCMAKE_C_COMPILER:STRING="${c_compiler}" \
-DCMAKE_CXX_COMPILER:STRING="${cxx_compiler}" \
-DCMAKE_BUILD_TYPE:STRING="${build_type}" \
-S "${src_dir}" \
-B "${build_dir}"
# The build step
cmake --build "${build_dir}" -j "$(nproc)"
This is enough to get a small program off the ground. A few additional requirements are easy to meet:
  • Adding dependencies for more complex projects is typically easy. I was happy with find_package and with the existing family of Find modules. For instance, it was trivial to enable testing and write unit tests with Google Test.
    • Because I sometimes build dependencies from source, I had to understand a bit of pkg-config on my own. It was surprisingly smooth overall.
  • For debugging the configuration and build steps, you can use a combination of -DCMAKE_VERBOSE_MAKEFILE=TRUE, --debug-output, --trace or --trace-expand.
  • If you want to search for dependencies in a custom location, use -DCMAKE_PREFIX_PATH:PATH="prefix-path". There's a family of related variables that I haven't explored in depth.
  • If you want the build outputs to go to a custom install path, use -DCMAKE_INSTALL_PREFIX:PATH="install-path". This defaults to /usr/local, so I always customize it for on-going work to avoid changing machine state. My install prefix is usually something like "~/usr-${c_compiler}-$(uname -m)-${flavor}" or some appropriate variant.
  • You can build individual targets instead of the whole project with --target. You still drive the build through CMake (cmake --build "${build_dir}" --target hello), not through the underlying build tool.
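To illustrate the find_package and pkg-config items above, here is a minimal sketch that could be appended to the CMakeLists.txt from earlier. The test file hello_test.cc and the pkg-config module foo are hypothetical. GTest::GTest and GTest::Main are the imported targets that the FindGTest module provides in CMake 3.17. Newer releases also offer GTest::gtest and GTest::gtest_main.

```cmake
# Unit tests with Google Test through the FindGTest module.
# Assumes GTest is installed where find_package() can see it
# (e.g. under a prefix listed in CMAKE_PREFIX_PATH).
enable_testing ()

find_package (GTest REQUIRED)
include (GoogleTest)

add_executable (hello_test)
target_sources (hello_test PRIVATE hello_test.cc)  # hypothetical test source
target_link_libraries (hello_test PRIVATE GTest::GTest GTest::Main)

# Registers each test with CTest by asking the built binary to list them.
gtest_discover_tests (hello_test)

# A dependency that only ships a pkg-config (.pc) file.
# "foo" is a hypothetical module name; pkg-config must be able to find foo.pc.
find_package (PkgConfig REQUIRED)
pkg_check_modules (FOO REQUIRED IMPORTED_TARGET foo)
target_link_libraries (hello PRIVATE PkgConfig::FOO)
```

After a build, the discovered tests run with ctest from the build directory.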
If your needs are more extravagant, things get very confusing fast.
First, I really wanted to share compiler flags and other build settings between unrelated projects without resorting to include() commands that reference paths outside the project tree. It turns out you'll need to understand 3 concepts and ignore one of them just to get started:
  • You might be tempted to customize baseline build flags with -DCMAKE_USER_MAKE_RULES_OVERRIDE:PATH=.... This works, but it will apply your compiler flags to everything CMake does, including the test programs it builds to determine target system features.
  • Instead, you'll feed the flags by pre-loading a script to populate the CMake cache with -C "preload.cmake".
  • If you're cross-compiling, a toolchain file will also be involved with -DCMAKE_TOOLCHAIN_FILE="toolchain.cmake".
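For the preload route, a sketch of what such a script might contain follows. The values are examples rather than a recommended baseline. The entries use set(... CACHE ...) because -C expects a script of cache-populating commands, not a file in CMakeCache.txt format.

```cmake
# preload.cmake: a sketch, passed as "cmake -C preload.cmake ..." at configure time.
set (CMAKE_CXX_STANDARD 17
  CACHE STRING "C++ standard for targets that don't set CXX_STANDARD")
set (CMAKE_CXX_STANDARD_REQUIRED ON
  CACHE BOOL "Fail instead of falling back to an older standard")
set (CMAKE_CXX_EXTENSIONS OFF
  CACHE BOOL "Use -std=c++17 rather than -std=gnu++17")

# A small subset of the warning flags listed earlier, shared by GCC and Clang.
set (CMAKE_CXX_FLAGS "-Wall -Wextra -Wconversion -Wshadow"
  CACHE STRING "Baseline C++ flags")

# Writes compile_commands.json for tools that need the build graph
# (Makefile and Ninja generators only).
set (CMAKE_EXPORT_COMPILE_COMMANDS ON
  CACHE BOOL "Emit compile_commands.json")
```

The configure step would then become something like cmake -C preload.cmake -S "${src_dir}" -B "${build_dir}".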
I found this part confusing largely because I couldn't find a single description of which variables are primary and which are derived from other values. For example, I know CMAKE_<LANG>_COMPILER_ID is derived from CMAKE_<LANG>_COMPILER, not the other way. But I don't have a clear picture of how compiler and linker flags get from A to B.
  • Should I use the _INIT flags or the other ones?
  • The _INIT flags should be set in toolchain files. Do I need a toolchain file even when I'm not cross-compiling?
  • CMAKE_<LANG>_FLAGS takes the corresponding environment variables (e.g. ${CFLAGS}) into account. How do they relate to the corresponding CMAKE_<LANG>_FLAGS_INIT? Do they get merged or overwritten?
  • If I set the _INIT flags in a toolchain file and feed a preload script with -C, what will happen?
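For reference, a toolchain file that sets the _INIT flags while cross-compiling might look like the sketch below. Two things in it are assumptions about the build machine: the compiler names follow the Debian/Ubuntu cross-toolchain packages, and the sysroot path is a placeholder. The CMake documentation describes CMAKE_<LANG>_FLAGS_INIT as the value used to initialize the CMAKE_<LANG>_FLAGS cache entry the first time a build tree is configured for that language.

```cmake
# aarch64-linux-gnu.cmake: a sketch, passed with -DCMAKE_TOOLCHAIN_FILE=...
set (CMAKE_SYSTEM_NAME Linux)
set (CMAKE_SYSTEM_PROCESSOR aarch64)

# Assumed cross-compiler names (Debian/Ubuntu packaging).
set (CMAKE_C_COMPILER aarch64-linux-gnu-gcc)
set (CMAKE_CXX_COMPILER aarch64-linux-gnu-g++)

# Initial values for the CMAKE_<LANG>_FLAGS cache entries.
set (CMAKE_C_FLAGS_INIT "-fstack-protector-strong")
set (CMAKE_CXX_FLAGS_INIT "-fstack-protector-strong")

# Hypothetical sysroot with the target's headers and libraries.
set (CMAKE_SYSROOT /path/to/aarch64-sysroot)

# Find target libraries, headers and packages only in the sysroot,
# but run helper programs from the build machine.
set (CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
set (CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
set (CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
set (CMAKE_FIND_ROOT_PATH_MODE_PACKAGE ONLY)
```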
If you're tempted to say "try it out and see what happens" or "just use include()", you're not wrong, but that's not the point. The point is that a simpler model wouldn't raise all these questions, and a complicated but clear, intuitive or at least documented model would answer them. Oh, and all these things are variables, but people more experienced with CMake than me strongly suggest relying on target properties (and other types of properties) instead (e.g. "target_compile_options").
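Here is a minimal sketch of the property-based alternative, applied to the hello target from earlier. It encodes only a subset of the GCC and Clang flag lists above. $<COMPILE_LANG_AND_ID:...> requires CMake 3.15 or newer. Apple's Clang reports the compiler ID AppleClang rather than Clang, so both IDs are listed.

```cmake
# Flags understood by both GCC and Clang.
target_compile_options (hello PRIVATE -Wall -Wextra -Werror)

# Compiler-specific C++ warnings. The quotes keep each generator expression
# a single argument; the semicolons inside it yield separate flags.
target_compile_options (hello PRIVATE
  "$<$<COMPILE_LANG_AND_ID:CXX,GNU>:-Wconversion;-Wsign-conversion;-Wold-style-cast;-Wsuggest-override>"
  "$<$<COMPILE_LANG_AND_ID:CXX,Clang,AppleClang>:-Weverything;-Wno-c++98-compat-pedantic;-Wno-padded>")

# Debug-only standard library checks, expressed as definitions.
target_compile_definitions (hello PRIVATE
  "$<$<AND:$<CONFIG:Debug>,$<CXX_COMPILER_ID:GNU>>:_GLIBCXX_DEBUG>"
  "$<$<AND:$<CONFIG:Debug>,$<CXX_COMPILER_ID:Clang>>:_LIBCPP_DEBUG>")
```

These properties stay attached to one target, so they don't reach unrelated targets or CMake's own feature checks. The catch is that every project has to carry these lines itself, which is exactly what sharing flags across unrelated projects was meant to avoid.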
Similarly, adding custom build types also relies on variables. I couldn't find an approach that uses properties.
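As an example of the variable-based approach, here is a sketch for a hypothetical Coverage build type that uses the --coverage flag accepted by GCC and Clang. It would go in the top-level CMakeLists.txt after project(). CMake builds the variable names from the upper-cased configuration name.

```cmake
# Compiler and linker flags for the hypothetical "Coverage" build type.
set (CMAKE_CXX_FLAGS_COVERAGE "-O0 -g --coverage"
  CACHE STRING "C++ flags for the Coverage build type")
set (CMAKE_EXE_LINKER_FLAGS_COVERAGE "--coverage"
  CACHE STRING "Executable linker flags for the Coverage build type")
set (CMAKE_SHARED_LINKER_FLAGS_COVERAGE "--coverage"
  CACHE STRING "Shared library linker flags for the Coverage build type")

# Multi-config generators (e.g. "Ninja Multi-Config") also need the
# name in the list of available configurations.
get_property (is_multi_config GLOBAL PROPERTY GENERATOR_IS_MULTI_CONFIG)
if (is_multi_config AND NOT "Coverage" IN_LIST CMAKE_CONFIGURATION_TYPES)
  list (APPEND CMAKE_CONFIGURATION_TYPES Coverage)
endif ()
```

With a single-configuration generator such as Unix Makefiles, this flavor is selected with -DCMAKE_BUILD_TYPE:STRING=Coverage.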
Other than that, CMake is fairly OK. I was able to deal with bigger projects easily. Intra-project dependencies work satisfactorily. I avoided cross-project dependencies as much as I could so far. I found "ExternalProject" unusable because it doesn't do incremental builds at all. I just wrote a shell script for non-CMake dependencies I want to build from source.
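For illustration, here is an intra-project dependency that keeps the interface/implementation split from item 10 of the laundry list. The greeting library, its file names and the include/src layout are all hypothetical.

```cmake
# A library whose internal header stays private to it.
add_library (greeting STATIC)
target_sources (greeting
  PRIVATE
    src/greeting.cc
    src/greeting_impl.h)      # internal header, not exposed to dependents

target_include_directories (greeting
  PUBLIC include              # public headers, e.g. include/greeting/greeting.h
  PRIVATE src)                # visible only while compiling greeting itself

# hello declares its dependency and inherits greeting's PUBLIC include path.
target_link_libraries (hello PRIVATE greeting)
```

CMake does not reject a header that is reachable through an include directory but never listed, so item 1 of the laundry list still comes down to convention.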
Adding support for clang-tidy was a piece of cake.
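Here is a sketch of that integration through the CXX_CLANG_TIDY target property, which CMake has supported since 3.6 for the Makefile and Ninja generators. The check selection is an arbitrary example. find_program() only gained a REQUIRED option in 3.18, so with a 3.17 minimum the sketch tests the result explicitly.

```cmake
find_program (CLANG_TIDY_EXE NAMES clang-tidy)

if (CLANG_TIDY_EXE)
  # Run clang-tidy next to the compiler for each C++ source of hello.
  set_target_properties (hello PROPERTIES
    CXX_CLANG_TIDY "${CLANG_TIDY_EXE};-checks=-*,bugprone-*,modernize-*;-warnings-as-errors=*")
else ()
  message (WARNING "clang-tidy not found; static analysis disabled")
endif ()
```

Alternatively, passing -DCMAKE_CXX_CLANG_TIDY=clang-tidy at configure time initializes the same property for every target.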
I was a bit bothered by the mix of low-level variables and properties that allow you to change compiler flags directly, as opposed to the higher-level variables and properties that deal with preprocessor definitions and compiler options, but it's a minor grievance in comparison.
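To make the contrast concrete, here is a sketch of the same hypothetical HELLO_VERBOSE definition expressed at both levels. They appear side by side only for comparison; a project would pick one.

```cmake
# Low level: append to a raw flag string. This variable feeds every C++
# target in the directory scope, and the -D spelling is compiler-specific.
string (APPEND CMAKE_CXX_FLAGS " -DHELLO_VERBOSE=1")

# Higher level: properties on one target. CMake emits the right -D (or /D)
# itself, and PUBLIC/INTERFACE entries propagate to dependents.
target_compile_definitions (hello PRIVATE HELLO_VERBOSE=1)
target_compile_options (hello PRIVATE -Wshadow)
```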
Checking some items off the laundry list is trivial, others require following a convention on your own, some are unattainable.

Edge Cases

Sometimes, software developers seem tempted to treat the part they like about a problem as the important part, and everything else they must...