One set of requirements, zillions of SBOMs
In this post, we explain how dependency resolution works in package managers, using the npm ecosystem as an example. We also explain how it directly affects the accuracy of the SBOMs you generate and ingest.
Let’s take the package d3 as an example. Version d3@7.8.5 declares 30 dependencies in its package.json. How many different Node.js applications can be generated that respect these 30 requirements? One? Quite a few, actually: at the time of writing, about 1.9 × 10^81. That makes roughly as many possible SBOMs for this application as the estimated number of atoms in the observable universe.
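To see how the count grows so fast, here is a minimal TypeScript sketch under a deliberately simplified, hypothetical model. Each requirement can be met by a fixed number of published versions, and each choice is made independently of the others. The model ignores transitive dependencies, and those are what push the real figure so high, so it is not meant to reproduce the number above. It only shows how independent choices multiply.

```typescript
// Simplified model (assumption): requirement i can be satisfied by
// candidateCounts[i] published versions, chosen independently of the others.
// The number of distinct installations is then the product of the counts.
// Transitive dependencies are ignored, so real counts are far larger.
function log10OfInstallCount(candidateCounts: number[]): number {
  // log10 of a product is the sum of the log10s; summing avoids overflow.
  return candidateCounts.reduce(
    (acc: number, n: number) => acc + Math.log(n) / Math.LN10,
    0
  );
}

// Hypothetical input: 30 direct requirements, each matched by 10 versions.
const candidateCounts: number[] = [];
for (let i = 0; i < 30; i++) {
  candidateCounts.push(10);
}

const exponent = log10OfInstallCount(candidateCounts);
console.log(`about 10^${exponent.toFixed(1)} possible installations`);
```

With only 10 candidates for each of the 30 direct requirements, this already yields about 10^30 combinations. Every transitive dependency that has its own set of candidates multiplies the total further.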
Composition in package managers
Open source ecosystems thrive on sharing and reusing components. When developers build an application, they compose their own code with components created by other developers, and those components may in turn rely on components created by yet other developers. To facilitate this composition, each ecosystem provides tools to install libraries and applications.
Among these tools, the dependency resolver is of particular interest to us. It goes beyond direct dependencies and ensures that transitive requirements are also satisfied. For example, suppose an application A depends on two libraries B and C, and both depend on a library D but with conflicting version requirements. Which version of D should be chosen?
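A small TypeScript sketch makes this conflict concrete. The published versions of D and the requirements of B and C are hypothetical. To keep the sketch short, each requirement is reduced to "any version with this major number". If no single version satisfies both B and C, a resolver must either fail or install more than one copy of D. npm takes the second route by nesting copies in its node_modules tree.

```typescript
// A requirement from a dependent package, simplified (assumption) to
// "any published version whose major number equals `major`".
interface Requirement {
  requiredBy: string;
  major: number;
}

// Hypothetical published versions of library D.
const publishedD: string[] = ["1.0.0", "1.4.2", "2.0.0", "2.3.1"];

function majorOf(version: string): number {
  return parseInt(version.split(".")[0], 10);
}

// Returns the versions of D that satisfy every requirement at once.
function commonVersions(published: string[], reqs: Requirement[]): string[] {
  return published.filter((v) => reqs.every((r) => majorOf(v) === r.major));
}

const requirements: Requirement[] = [
  { requiredBy: "B", major: 1 },
  { requiredBy: "C", major: 2 },
];

const shared = commonVersions(publishedD, requirements);
if (shared.length > 0) {
  console.log(`one copy of D suffices: any of ${shared.join(", ")}`);
} else {
  // No single version works: either fail, or give each dependent its own copy.
  for (const r of requirements) {
    const candidates = commonVersions(publishedD, [r]);
    console.log(`${r.requiredBy} needs its own copy of D from: ${candidates.join(", ")}`);
  }
}
```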
Let’s dive deeper. In the rest of this post, we focus on npm as a concrete example, but the insights generally apply to other ecosystems as well.
Dependency requirements
In the npm ecosystem, the npm registry is the de facto public registry and the npm CLI is the de facto tooling. Alternative CLIs exist, including yarn and pnpm.
Developers express their dependency requirements in a manifest file, “package.json”, which lists each dependency along with its set of acceptable versions. For example, let’s create an application with four dependencies:
$ cat >package.json <<EOF
{
  "name": "example",
  "version": "0.0.0-alpha.0",
  "dependencies": {
    "d3-time": "^3",
    "d3-array": "<2",
    "array": "npm:d3-array@^2",
    "color": "npm:d3-color@^3"
  }
}
EOF
This manifest declares four dependencies named “d3-time”, “d3-array”, “array”, and “color”, each with a version range. For example, “d3-time” is declared with the range “^3”, which means that any 3.x version may be installed. (More information about version ranges is available in the official npm documentation.) Besides version ranges, dependencies can also be aliased: “array” and “color” are aliases for “d3-array” and “d3-color” respectively.
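The manifest contains two kinds of declarations, and a small parser can tell them apart. The TypeScript sketch below uses hypothetical helpers, not npm's own code. It handles only plain ranges and the `npm:<package>@<range>` alias form used above. Its caret check covers only ranges whose major version is 1 or more, where “^3” means >=3.0.0 and <4.0.0. npm's caret rules differ for 0.x versions, and the sketch ignores prereleases.

```typescript
// Hypothetical helper, not npm's parser: describes one "dependencies" entry.
interface ParsedDependency {
  key: string;         // name used in package.json and under node_modules
  packageName: string; // package actually fetched from the registry
  range: string;       // version range to satisfy
}

// Supports only plain ranges and the "npm:<package>@<range>" alias form.
function parseDependency(key: string, spec: string): ParsedDependency {
  const aliasPrefix = "npm:";
  if (spec.indexOf(aliasPrefix) !== 0) {
    return { key: key, packageName: key, range: spec };
  }
  const target = spec.slice(aliasPrefix.length);
  // Split on the last "@" so scoped targets such as "@scope/pkg@^1" also work.
  const at = target.lastIndexOf("@");
  if (at <= 0) {
    // Assumption for this sketch: an alias without a range accepts any version.
    return { key: key, packageName: target, range: "*" };
  }
  return { key: key, packageName: target.slice(0, at), range: target.slice(at + 1) };
}

// Caret check restricted to ranges whose major version is >= 1, such as "^3"
// or "^2.1.0": accepts versions >= the given one and below the next major.
// Prerelease versions are not handled.
function satisfiesCaret(version: string, range: string): boolean {
  if (range.charAt(0) !== "^") {
    throw new Error(`unsupported range in this sketch: ${range}`);
  }
  const want = range.slice(1).split(".").map((p) => parseInt(p, 10));
  const have = version.split(".").map((p) => parseInt(p, 10));
  if (want[0] < 1) {
    throw new Error(`0.x caret ranges are not handled here: ${range}`);
  }
  if (have[0] !== want[0]) return false;
  // Missing minor or patch parts in the range (as in "^3") count as 0.
  const wantMinor = want.length > 1 ? want[1] : 0;
  const wantPatch = want.length > 2 ? want[2] : 0;
  if (have[1] !== wantMinor) return have[1] > wantMinor;
  return have[2] >= wantPatch;
}

// The "dependencies" section of the example manifest.
const manifestDeps: Record<string, string> = {
  "d3-time": "^3",
  "d3-array": "<2",
  array: "npm:d3-array@^2",
  color: "npm:d3-color@^3",
};

for (const key of Object.keys(manifestDeps)) {
  const dep = parseDependency(key, manifestDeps[key]);
  console.log(`${dep.key} -> ${dep.packageName} ${dep.range}`);
}
console.log(satisfiesCaret("3.1.0", "^3")); // true
console.log(satisfiesCaret("4.0.0", "^3")); // false
```

Separating the local key from the fetched package name matters later. The key decides where the package lands under node_modules, while the package name and version identify what was actually installed.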
Dependency resolution
Given this set of dependency requirements, the ecosystem tooling selects and installs the versions it deems adequate (often the latest version in the matching set) to create an application. This process is called “dependency resolution”. With npm, the selected packages are physically installed as a file tree under node_modules, from which Node.js imports them at runtime. Below is an example of an installation by npm@6.14.13 for the manifest above:
$ npm -v && npm install && npm ls
6.14.13
+-- array@npm:d3-array@2.12.1 // d3-array@2.12.1 installed under an alias.
| `-- internmap@1.0.1
+-- color@npm:d3-color@3.1.0
+-- d3-array@1.2.4 // d3-array@1.2.4 installed at the root.
`-- d3-time@3.1.0
`-- d3-array@3.2.4 // d3-array@3.2.4 installed locally, shadowing the root.
`-- internmap@1.0.1 deduped
In the snippet above, “d3-array” is installed three times, each time with a different version:
- Once under the alias “array” for the main application’s code, as “d3-array@2.12.1”
- Once for the main application’s code, as “d3-array@1.2.4”
- Once for “d3-time”, at version “d3-array@3.2.4”. This means “d3-time” will use version 3.2.4.
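The shadowing comes from the way Node.js looks up a bare import such as `require("d3-array")`. Starting from the importing package's directory, Node.js checks each enclosing node_modules folder, innermost first, and uses the first match it finds. The TypeScript sketch below simulates that lookup over the installation shown above. Its flat path-to-package table is a hypothetical stand-in for the file system. The full Node.js algorithm has more steps (package.json "exports", file extensions, and so on).

```typescript
// Snapshot of the installation above (hypothetical stand-in for the disk):
// path relative to the application root -> package installed there.
const installed: Record<string, string> = {
  "node_modules/array": "d3-array@2.12.1",
  "node_modules/color": "d3-color@3.1.0",
  "node_modules/d3-array": "d3-array@1.2.4",
  "node_modules/d3-time": "d3-time@3.1.0",
  "node_modules/d3-time/node_modules/d3-array": "d3-array@3.2.4",
};

// node_modules directories searched, innermost first, for an import made
// from `fromDir` (relative to the application root; "" is the root itself).
function lookupPaths(fromDir: string): string[] {
  const parts: string[] = fromDir === "" ? [] : fromDir.split("/");
  const dirs: string[] = [];
  for (let i = parts.length; i >= 0; i--) {
    // Node.js never looks for node_modules/node_modules.
    if (i > 0 && parts[i - 1] === "node_modules") continue;
    const base = parts.slice(0, i).join("/");
    dirs.push(base === "" ? "node_modules" : `${base}/node_modules`);
  }
  return dirs;
}

// Returns the first installed match, which shadows any copy further out.
function resolveImport(fromDir: string, request: string): string | undefined {
  for (const dir of lookupPaths(fromDir)) {
    const candidate = `${dir}/${request}`;
    if (Object.prototype.hasOwnProperty.call(installed, candidate)) {
      return installed[candidate];
    }
  }
  return undefined;
}

console.log(resolveImport("", "d3-array"));                     // d3-array@1.2.4
console.log(resolveImport("", "array"));                        // d3-array@2.12.1
console.log(resolveImport("node_modules/d3-time", "d3-array")); // d3-array@3.2.4
```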
Different tools may install different dependency versions from the same manifest file. On our example manifest, npm, yarn, and pnpm produce three different installations.
All three dependency resolutions are valid: each is one of the graphs that satisfy the requirements. Factors other than the toolchain can also affect the resolution. Time is one of them: if we install this manifest today, the result may differ from the installation we made yesterday, because resolutions change as new package versions are published to, or removed from, the npm registry.
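The “latest matching version” policy and its dependence on time can be sketched together in TypeScript. The resolver below is a simplified, hypothetical model. It looks at the registry as it stood on a given date, keeps the versions that match a major-version requirement (equivalent to “^M” for M >= 1, ignoring prereleases), and picks the highest one. The version numbers and publication dates are made up for illustration.

```typescript
interface Published {
  version: string;     // release version "MAJOR.MINOR.PATCH"
  publishedAt: string; // ISO date, e.g. "2023-01-15"
}

// Hypothetical publication history of one package.
const releaseHistory: Published[] = [
  { version: "3.0.0", publishedAt: "2022-01-10" },
  { version: "3.1.0", publishedAt: "2022-06-01" },
  { version: "3.2.0", publishedAt: "2023-03-20" },
  { version: "4.0.0", publishedAt: "2023-05-02" },
];

function toNumbers(version: string): number[] {
  return version.split(".").map((p) => parseInt(p, 10));
}

// Negative if a < b, positive if a > b, 0 if equal (release versions only).
function compareVersions(a: string, b: string): number {
  const x = toNumbers(a);
  const y = toNumbers(b);
  for (let i = 0; i < 3; i++) {
    if (x[i] !== y[i]) return x[i] - y[i];
  }
  return 0;
}

// Highest version with the required major number among those already
// published on `date`. ISO dates compare correctly as plain strings.
function resolveAt(entries: Published[], major: number, date: string): string | undefined {
  let best: string | undefined = undefined;
  for (const e of entries) {
    if (e.publishedAt > date) continue;
    if (toNumbers(e.version)[0] !== major) continue;
    if (best === undefined || compareVersions(e.version, best) > 0) {
      best = e.version;
    }
  }
  return best;
}

console.log(resolveAt(releaseHistory, 3, "2023-01-01")); // 3.1.0
console.log(resolveAt(releaseHistory, 3, "2023-06-01")); // 3.2.0
```

The same manifest requirement yields 3.1.0 or 3.2.0 depending only on when the installation runs. Removing an entry from the history, as a registry deletion would, changes the result in the same way.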
Downstream consumers control the composition
It is worth noting that the composition (aka version resolution) is triggered by the downstream user who creates an application. In other words, dependency packages (libraries) are oblivious to the composition. They express requirements for their own direct dependencies as abstract strings (like “d3-time@^3”) that define the set of functionally compatible versions. But it is the tooling run by the downstream user creating the application that selects and installs a concrete version from this set. The selected version may differ from the one the dependency package itself would have selected, because the selection is made in a different context, using different tooling, at a different time. For example, the dependencies of d3-array@3.2.4 can resolve differently in different contexts. In the installation above, its internmap dependency was deduplicated to internmap@1.0.1, the same copy used by d3-array@2.12.1.
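Deduplication is one way the context leaks into resolution, as the npm ls output above shows: internmap@1.0.1 is marked “deduped” under d3-time's copy of d3-array. The TypeScript sketch below is a simplified, hypothetical model of a dedupe-first policy, not npm's actual algorithm. It reuses a copy already in the tree if that copy satisfies the requirement, and otherwise takes the latest matching version. The requirement is modeled as a set of accepted major versions, here assumed to be 1 or 2, and the published versions are made up.

```typescript
// Simplified requirement (assumption): the set of accepted major versions.
interface MajorSet {
  majors: number[];
}

function majorOf(version: string): number {
  return parseInt(version.split(".")[0], 10);
}

function accepts(req: MajorSet, version: string): boolean {
  return req.majors.indexOf(majorOf(version)) >= 0;
}

// Orders release versions "MAJOR.MINOR.PATCH" numerically.
function compareVersions(a: string, b: string): number {
  const x = a.split(".").map((p) => parseInt(p, 10));
  const y = b.split(".").map((p) => parseInt(p, 10));
  for (let i = 0; i < 3; i++) {
    if (x[i] !== y[i]) return x[i] - y[i];
  }
  return 0;
}

// Dedupe-first policy (hypothetical model): reuse a copy that is already in
// the tree if it satisfies the requirement, otherwise take the latest match.
function pick(req: MajorSet, alreadyInTree: string[], published: string[]): string | undefined {
  const reusable = alreadyInTree.filter((v) => accepts(req, v));
  if (reusable.length > 0) return reusable[0];
  const matching = published.filter((v) => accepts(req, v)).sort(compareVersions);
  return matching.length > 0 ? matching[matching.length - 1] : undefined;
}

// Hypothetical published versions and requirement.
const publishedVersions: string[] = ["1.0.1", "2.0.0", "2.0.3"];
const requirement: MajorSet = { majors: [1, 2] };

console.log(pick(requirement, [], publishedVersions));        // 2.0.3 (resolved on its own)
console.log(pick(requirement, ["1.0.1"], publishedVersions)); // 1.0.1 (reuses the existing copy)
```

The library's requirement is identical in both calls. Only the surrounding application, meaning which copies already sit in the tree, changes the version that gets installed.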
It is impossible for the maintainer of a package to enforce dependency versions in downstream users’ applications. For example, maintainers may try to “pin” their own dependency requirement to a specific version, hoping to force downstream users to use that particular version transitively. But a downstream user can still overrule the pin by:
- Using an overrides directive to override the dependency.
- Bundling packages.
- Using a custom alias. In the example manifest, if “d3-array” were defined as an alias for “d3-color”, npm would install d3-color in place of d3-array, to the surprise of the library.
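The first of these mechanisms can be pictured with a simplified TypeScript model. Whatever range a library declares for a dependency, an entry in the application's overrides map takes precedence. This is a hypothetical sketch, not npm's implementation. Real npm overrides also support nested rules that apply only under a given parent package, and the library name and versions below are made up.

```typescript
interface DependencyRequest {
  requestedBy: string; // library declaring the dependency
  dependency: string;  // name of the dependency package
  range: string;       // range the library declared, possibly an exact pin
}

// Range the installer would actually try to satisfy in this model:
// an application-level override wins over the library's declaration.
function effectiveRange(req: DependencyRequest, overrides: Record<string, string>): string {
  return Object.prototype.hasOwnProperty.call(overrides, req.dependency)
    ? overrides[req.dependency]
    : req.range;
}

// Hypothetical: library "some-lib" pins d3-array to 3.2.0,
// while the application overrides d3-array to 3.2.4.
const pinned: DependencyRequest = {
  requestedBy: "some-lib",
  dependency: "d3-array",
  range: "3.2.0",
};
const appOverrides: Record<string, string> = { "d3-array": "3.2.4" };

console.log(effectiveRange(pinned, appOverrides)); // 3.2.4
console.log(effectiveRange(pinned, {}));           // 3.2.0
```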
Furthermore, pinning a dependency in a library (through a strict requirement or a bundle) is considered bad practice because it prevents downstream users from upgrading that dependency independently when they need to, for example to fix a vulnerability.
So how does all this affect SBOMs? Read on!
Impact of composition on SBOMs
A Software Bill of Materials (SBOM) is a document listing the dependencies used to build a piece of software, along with their relationships to one another. For more details, see the NTIA’s “The Minimum Elements For a Software Bill of Materials (SBOM)”.
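To make the definition concrete, here is a TypeScript sketch of a minimal SBOM-like structure. It is loosely modeled on CycloneDX's lists of components and dependencies, with simplified field names, so it is not a schema-valid CycloneDX document. It records the npm installation shown earlier. Each component is identified by a package URL (purl) such as `pkg:npm/d3-array@2.12.1`, which names the package actually installed rather than the local alias “array”.

```typescript
// Simplified, CycloneDX-inspired shapes (not schema-valid CycloneDX).
interface Component {
  ref: string;     // identifier used by dependency edges
  name: string;    // registry package name, not the local alias
  version: string;
  purl: string;    // package URL
}

interface DependencyEdge {
  ref: string;
  dependsOn: string[];
}

interface MiniSbom {
  root: string;
  components: Component[];
  dependencies: DependencyEdge[];
}

function npmComponent(pkg: string, version: string): Component {
  const purl = `pkg:npm/${pkg}@${version}`;
  return { ref: purl, name: pkg, version: version, purl: purl };
}

const rootRef = "pkg:npm/example@0.0.0-alpha.0";

// Built by hand from the npm ls output shown earlier.
const sbom: MiniSbom = {
  root: rootRef,
  components: [
    npmComponent("d3-array", "2.12.1"), // installed under the alias "array"
    npmComponent("d3-array", "1.2.4"),
    npmComponent("d3-color", "3.1.0"),  // installed under the alias "color"
    npmComponent("d3-time", "3.1.0"),
    npmComponent("d3-array", "3.2.4"),  // nested under d3-time
    npmComponent("internmap", "1.0.1"),
  ],
  dependencies: [
    {
      ref: rootRef,
      dependsOn: [
        "pkg:npm/d3-array@2.12.1",
        "pkg:npm/d3-array@1.2.4",
        "pkg:npm/d3-color@3.1.0",
        "pkg:npm/d3-time@3.1.0",
      ],
    },
    { ref: "pkg:npm/d3-array@2.12.1", dependsOn: ["pkg:npm/internmap@1.0.1"] },
    { ref: "pkg:npm/d3-time@3.1.0", dependsOn: ["pkg:npm/d3-array@3.2.4"] },
    { ref: "pkg:npm/d3-array@3.2.4", dependsOn: ["pkg:npm/internmap@1.0.1"] },
  ],
};

const d3ArrayVersions = sbom.components
  .filter((c) => c.name === "d3-array")
  .map((c) => c.version);
console.log(d3ArrayVersions.join(", ")); // 2.12.1, 1.2.4, 3.2.4
```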
As we saw in the first part of this post, libraries have dependencies, but the versions of those dependencies are resolved by the final application, not by the library itself. When library maintainers generate an SBOM at publication time, dependency resolution happens in a different context from the one in which the final application is built (package manager CLI and version, packages available on the registry, etc.). As a result, the dependency versions listed in a library SBOM cannot be relied on to describe what downstream applications actually install.
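The mismatch can be detected mechanically by comparing the package URLs listed in a library's SBOM with those actually installed in an application. In this hypothetical TypeScript sketch, assume d3-time's maintainers generated an SBOM at publication time that lists d3-array@3.2.0 and internmap@2.0.3 (made-up values). The installed side is taken from the npm ls output above.

```typescript
// Entries of `declared` that do not appear in `installed`.
function missingFrom(declared: string[], installed: string[]): string[] {
  return declared.filter((purl) => installed.indexOf(purl) < 0);
}

// Hypothetical library SBOM for d3-time, generated at publication time.
const librarySbom: string[] = [
  "pkg:npm/d3-array@3.2.0",
  "pkg:npm/internmap@2.0.3",
];

// What the example application actually installed for d3-time's subtree,
// taken from the npm ls output above.
const installedForD3Time: string[] = [
  "pkg:npm/d3-array@3.2.4",
  "pkg:npm/internmap@1.0.1",
];

console.log("listed but not installed:", missingFrom(librarySbom, installedForD3Time));
console.log("installed but not listed:", missingFrom(installedForD3Time, librarySbom));
```

Both lists come back non-empty. The library SBOM names versions the application never installed, and it omits the versions the application actually runs.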
Conclusion
In this post, we saw that one set of requirements yields a vast number of possible applications: the decision about which concrete dependencies are installed lies in the hands of consumers. So library SBOMs cannot list the exact dependencies used, but application SBOMs can. Furthermore, this composition relies on complex dependency resolution algorithms. Given the room for error and the nuances of dependency resolution, it may be beneficial to develop tooling that ensures the application SBOM faithfully describes what has been installed.