After the Advisory
This blog is based on a presentation given by Nicky Ringland at Google Open Source Live.
Open source software powers the world. Open source libraries allow developers to build things faster, organizations to be more nimble, and all of us to be more productive.
But dependencies bring complexity. Popular open source packages are often used directly or indirectly by a significant portion of the packages within an ecosystem. As a result, a vulnerability in a popular package can have a massive impact across an entire ecosystem.
Different software ecosystems have different conventions for specifying dependency requirements and different algorithms for resolving them. We will take a look at a couple of high-profile incidents that illustrate some of these differences.
The amplification of vulnerability impact
To measure the potential impact of a vulnerability, we can look at how many dependents the affected package has: that is, how many other packages use a specific version that is affected by the vulnerability. We can get a view of an ecosystem by looking at all package versions that are affected, either directly or indirectly, by a vulnerability.
First off: packages that are directly affected. At the time of writing, across all the packaging systems supported by deps.dev, over 200 thousand package versions (0.4%) are directly named as vulnerable by a known advisory.
In contrast, almost 15 million package versions (33%) are affected only indirectly, by having an affected package in their dependency graph. That’s two orders of magnitude difference!
That underscores just how hard it is to fix a vulnerability in an ecosystem. When a package explicitly named by an advisory publishes a fix for the issue, the story is far from over. Many users of the packaging ecosystem will still be at risk, because they depend on vulnerable versions of the package deep within their dependency graphs. Fixing the directly affected package is often only the tip of the iceberg.
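The direct versus indirect distinction can be sketched as a graph traversal. The following is a minimal illustration, not how deps.dev computes its figures: the package names and the `DEPENDENCIES` map are hypothetical, and it works on whole packages rather than individual package versions.

```python
from collections import deque

# Hypothetical toy ecosystem: each package maps to the packages it depends on.
DEPENDENCIES = {
    "app-a": ["web-framework"],
    "app-b": ["web-framework", "logging-lib"],
    "web-framework": ["http-client"],
    "http-client": ["logging-lib"],
    "logging-lib": [],
    "cli-tool": ["arg-parser"],
    "arg-parser": [],
}

def affected_packages(dependencies, vulnerable):
    """Split packages into directly named and indirectly affected sets."""
    # Invert the edges so we can walk from a package to its dependents.
    dependents = {name: [] for name in dependencies}
    for package, deps in dependencies.items():
        for dep in deps:
            dependents[dep].append(package)

    direct = set(vulnerable)
    indirect = set()
    queue = deque(vulnerable)
    while queue:
        current = queue.popleft()
        for parent in dependents[current]:
            if parent not in direct and parent not in indirect:
                indirect.add(parent)
                queue.append(parent)
    return direct, indirect

direct, indirect = affected_packages(DEPENDENCIES, {"logging-lib"})
print(sorted(direct))    # ['logging-lib']
print(sorted(indirect))  # ['app-a', 'app-b', 'http-client', 'web-framework']
```

Even in this tiny graph, one advisory against a single package reaches four others that never mention it in an advisory of their own.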
Addressing vulnerabilities in your dependencies
There are several ways an application maintainer could mitigate a vulnerability affecting one of their dependencies. Let’s be kind to our hypothetical maintainer and consider a simple dependency tree with two layers of dependencies.
If this maintainer is lucky, they depend on the affected package directly. That means as soon as the affected package publishes a fixed version they can update their project or application to depend on the fixed version.
But if the vulnerable package is among their indirect dependencies the situation could be much more complex.
In the best case scenario, the intermediate packages already depend on the patched version.
If this is not the case, our hypothetical maintainer may still have a course of action. To update to the fixed version of the indirect dependency the maintainer may be able to specify the fixed version as a minimum for the entire dependency graph. For this to work, however, the fixed version of the affected package and its direct dependents must be compatible. If not, the maintainer may have to wait for a new release of the intermediate dependent.
Another alternative is to remove the dependency on the affected package. But this often involves considerable effort; you would never have added a dependency without good reason, right?
In practice, dependency trees are rarely so simple and clean. Usually they are complex, interconnected graphs. Just take a look at the dependency graphs for popular frameworks and tools like express or kubernetes.
These complex graphs can make remediating a vulnerability far more difficult than the simple examples given above. There may be many paths through which a fix must propagate before it gets to you. Or, in order to remove a dependency, you might need to remove a significant portion of your dependency graph.
For example, consider the many paths by which one package depends on a vulnerable version of log4j:
With this in mind, perhaps you can imagine why it often takes a long time for a patched version of a popular package to roll out to the ecosystem.
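To make the "many paths" point concrete, here is a small sketch that enumerates every route from an application to a target dependency. The graph is invented for illustration; only the artifact name log4j-core refers to a real package.

```python
def dependency_paths(graph, root, target):
    """Return every simple path from root to target in a dependency graph."""
    paths = []

    def walk(node, path):
        path = path + [node]
        if node == target:
            paths.append(path)
            return
        for dep in graph.get(node, []):
            if dep not in path:  # never revisit a node on the current path
                walk(dep, path)

    walk(root, [])
    return paths

# Hypothetical application graph: package -> direct dependencies.
GRAPH = {
    "my-service": ["web-framework", "metrics-client", "log4j-core"],
    "web-framework": ["template-engine", "log4j-core"],
    "template-engine": ["log4j-core"],
    "metrics-client": ["log4j-core"],
}

for path in dependency_paths(GRAPH, "my-service", "log4j-core"):
    print(" -> ".join(path))
```

Each printed path is a separate chain of maintainers who may need to publish an update before the fix reaches the application by default.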
log4shell in the Maven Central ecosystem
On December 9th, 2021, over 17,000 of the Java packages available from Maven Central were impacted by the log4j vulnerabilities, known as log4shell, resulting in widespread fallout across the software industry. The vulnerabilities allowed an attacker to perform remote code execution by exploiting the insecure JNDI lookups feature exposed by the logging library log4j. This exploitable feature was present in multiple versions, and was enabled by default in many versions of the library. We wrote about this incident shortly after it occurred in a previous blog.
A new version of log4j with the vulnerability patched (albeit with a few false starts due to incomplete fixes) was available almost immediately. So once that patched version was published, had the ecosystem freed itself of log4shell? Unfortunately not. Part of what makes fixing log4shell hard is Java’s conventions on how dependency requirements are specified, and Maven’s dependency resolution algorithm itself.
In the Java ecosystem, it’s common practice to specify “soft” version requirements. That is, the dependent package specifies a single version for each dependency, which is usually the version that ends up being used. (The dependency resolution algorithm may choose a different version under certain rare conditions – for example, when a different version is already in the graph.) While it is possible to specify ranges of suitable versions, this is unusual. More than 99% of dependency requirements in the Maven Central ecosystem are specified using soft requirements.
Here’s where Maven’s dependency resolution algorithm comes in. Because a specific version is almost always specified, that is almost always the version dependency resolution picks. So if a newer version with an important bug fix is released, it won’t be included automatically. It usually requires explicit action by the maintainer to update the dependency requirements to a patched version.
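The effect of soft requirements can be sketched with a simplified model of Maven's "nearest definition wins" mediation, where the declaration closest to the root decides the version and ties go to the first declaration. This is an approximation using a breadth-first walk, not Maven's actual implementation. The package names other than log4j-core are hypothetical, with 2.14.1 standing in for an affected release and 2.17.1 for a patched one.

```python
from collections import deque

# Hypothetical metadata: (artifact, version) -> soft requirements it declares.
REQUIREMENTS = {
    ("my-app", "1.0"): [("web-framework", "3.1"), ("report-lib", "2.0")],
    ("web-framework", "3.1"): [("log4j-core", "2.14.1")],
    ("report-lib", "2.0"): [("pdf-lib", "1.2")],
    ("pdf-lib", "1.2"): [("log4j-core", "2.17.1")],
}

def resolve_nearest_wins(root, requirements):
    """Pick one version per artifact: the declaration nearest the root wins."""
    selected = {root[0]: root[1]}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for name, version in requirements.get(node, []):
            if name in selected:
                continue  # a nearer (or earlier) declaration already won
            selected[name] = version
            queue.append((name, version))
    return selected

# The patched release exists in the graph, but the nearer declaration wins.
print(resolve_nearest_wins(("my-app", "1.0"), REQUIREMENTS)["log4j-core"])
# 2.14.1

# Explicit action by the maintainer: declare the patched version at the root.
PINNED = dict(REQUIREMENTS)
PINNED[("my-app", "1.0")] = REQUIREMENTS[("my-app", "1.0")] + [("log4j-core", "2.17.1")]
print(resolve_nearest_wins(("my-app", "1.0"), PINNED)["log4j-core"])
# 2.17.1
```

Publishing a newer release changes nothing in the first result; only an edit to someone's declared requirements does.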
In this case, consumers of any one of the 17,000-odd packages affected by the log4j vulnerabilities would likely still depend on an affected version of log4j, even after the first fix was published. Ideally the maintainers of around 4,000 packages that directly depend on log4j would promptly release a new version of their package that explicitly requires a fixed version of log4j. Then the maintainers of packages that depend on those packages can update their version requirements, and then maintainers of those packages, and so on. There are methods to pin the version of indirect dependencies, accelerating this process, but many consumers rely on the default behavior of their tools.
It’s been over six months since the log4j advisory was disclosed. How well has the underlying fix to log4shell propagated throughout the ecosystem? A little less than a week after the disclosure around 13% of affected packages had remediated the issue by releasing a new version. 10 days after disclosure this number had risen to around 25%. Now a few months after that we see around 40% of the affected packages have remediated the problem. Considering how widespread the problem was, and the complexity of the dependencies between packages, this is amazing progress, but there’s clearly a lot more to go.
Default versions: new or old?
Package managers differ in which versions they choose to install by default. For example, systems like Maven or Go err on the side of choosing earlier matching versions, while npm and pip tend to choose later versions. This design choice can have a big impact on how a fix rolls out or, conversely, how quickly an exploit can propagate.
Choosing the earlier versions has the benefit of stability; dependency graphs remain stable whether you install today or tomorrow, even if new versions are released. The downside is that the consumer must be conscientious in updating their dependencies when security issues arise.
Choosing the later versions has the benefit of currency; you get the latest fixes automatically just by reinstalling. The downside here is that your dependencies can change underfoot, sometimes in dramatic and unexpected ways.
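The trade-off between the two defaults can be shown with a toy selector. This is a deliberately simplified model, assuming a requirement of "at least this version, same major version"; the real algorithms in Maven, Go, npm and pip are more involved, and the version numbers are made up.

```python
def parse(version):
    """Turn '1.2.3' into (1, 2, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))

def select(available, minimum, prefer):
    """Pick a version >= minimum within the same major version."""
    floor = parse(minimum)
    candidates = [v for v in available
                  if parse(v)[0] == floor[0] and parse(v) >= floor]
    if not candidates:
        return None
    chooser = min if prefer == "earliest" else max
    return chooser(candidates, key=parse)

before_fix = ["1.2.0", "1.2.3", "1.3.0"]
after_fix = before_fix + ["1.3.1"]  # a hypothetical security fix is published

for label, available in (("before", before_fix), ("after", after_fix)):
    print(label,
          select(available, "1.2.3", "earliest"),
          select(available, "1.2.3", "latest"))
# before 1.2.3 1.3.0
# after 1.2.3 1.3.1
```

The "earliest" column never moves, which is the stability benefit; the "latest" column picks up the new release on the next install, for better or worse.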
With this in mind, if log4shell had occurred in the npm or PyPI ecosystems the story would have been quite different. In these ecosystems, packages typically ask for the most recent compatible versions of their dependencies.
Looking at the dependency requirements across all versions of all packages in npm we find around three quarters use the caret (^) or tilde (~) allowing a new patch or minor version of the dependency to be automatically selected when available. When adhering to semantic versioning, this means that many users will use the newest release with a compatible API by default.
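As a rough sketch of what those two operators allow, the following implements caret and tilde matching for plain three-part versions. It is a simplification of npm's semver rules (no prerelease tags, partial versions or compound ranges), and the function names are our own.

```python
def parse(version):
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch

def upper_bound(operator, base):
    """Exclusive upper bound implied by a caret or tilde requirement."""
    major, minor, patch = base
    if operator == "~":
        return (major, minor + 1, 0)   # patch updates only
    # Caret: changes are allowed below the left-most non-zero component.
    if major > 0:
        return (major + 1, 0, 0)
    if minor > 0:
        return (0, minor + 1, 0)
    return (0, 0, patch + 1)

def satisfies(version, requirement):
    """Check a version against a requirement such as '^1.4.0' or '~1.4.0'."""
    operator, base = requirement[0], parse(requirement[1:])
    if operator not in ("^", "~"):
        raise ValueError("only ^ and ~ requirements are handled in this sketch")
    return base <= parse(version) < upper_bound(operator, base)

def max_satisfying(available, requirement):
    """Newest available version that satisfies the requirement, if any."""
    matches = [v for v in available if satisfies(v, requirement)]
    return max(matches, key=parse) if matches else None

versions = ["1.3.9", "1.4.0", "1.4.7", "1.5.2", "2.0.0"]
print(max_satisfying(versions, "~1.4.0"))  # 1.4.7
print(max_satisfying(versions, "^1.4.0"))  # 1.5.2
```

Neither requirement names 1.4.7 or 1.5.2 explicitly, yet a fresh install would select them as soon as they are published.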
This practice would likely have been a substantial benefit in remediating a log4shell-like event, where a vulnerability is discovered in widely used versions of a popular package.
But as we shall see, sometimes we really, really don’t want to use the latest version.
The case of colors
In early January 2022, the developer of the popular npm packages colors and faker intentionally published several releases containing breaking changes. These were picked up rapidly due to the npm resolution algorithm preferring recent releases, and the norm in JavaScript of using dependency requirements that allow the use of new compatible versions automatically.
At the time of the incident, more than 100,000 packages’ most recent releases depended on a version of colors, and around half of them had a dependency on a problematic version. The following graph shows the dependency flow in the ecosystem over the 72 hours where the action happened.
About half the packages depending on colors remained unaffected throughout the incident because they depended on earlier versions of colors. But the other half of packages had some rapid and widespread changes in the exact version of colors that would have been used depending on the time at which their dependencies were resolved.
The first problematic version was 1.4.44-liberty-2. Due to version naming conventions this isn’t considered a stable version and as a result it wasn’t depended on by many packages.
A few hours later version 1.4.1 was released, and almost all packages using the 1.4 minor version immediately began to depend on this problematic version. Several hours later, 1.4.2 was released, and again most packages affected by the incident immediately depended on this new problematic version. After a few more hours npm stepped in and removed all the bad versions of colors, at which point all dependents moved back to safe versions.
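The sequence of events above can be replayed with a small simulation of how a caret requirement on the 1.4 line would have resolved after each step. This is a sketch under stated assumptions: 1.4.0 is assumed to be the last unaffected release, prerelease-tagged versions are simply skipped (a simplification of npm's rule), and the events are ordered but carry no real timestamps.

```python
def parse(version):
    """Split '1.4.44-liberty-2' into ((1, 4, 44), 'liberty-2')."""
    core, _, prerelease = version.partition("-")
    return tuple(int(part) for part in core.split(".")), prerelease

def resolve_caret(published, base):
    """Newest stable version with the same major version that is >= base."""
    base_core, _ = parse(base)
    candidates = []
    for version in published:
        core, prerelease = parse(version)
        if prerelease:
            continue  # prerelease versions are not picked up by a plain range
        if core[0] == base_core[0] and core >= base_core:
            candidates.append((core, version))
    return max(candidates)[1] if candidates else None

# Ordered events taken from the narrative; 1.4.0 as the starting point is an assumption.
EVENTS = [
    ("publish", ["1.4.44-liberty-2"]),
    ("publish", ["1.4.1"]),
    ("publish", ["1.4.2"]),
    ("remove", ["1.4.44-liberty-2", "1.4.1", "1.4.2"]),
]

published = {"1.4.0"}
timeline = [resolve_caret(published, "1.4.0")]
for action, versions in EVENTS:
    if action == "publish":
        published |= set(versions)
    else:
        published -= set(versions)
    timeline.append(resolve_caret(published, "1.4.0"))

print(timeline)  # ['1.4.0', '1.4.0', '1.4.1', '1.4.2', '1.4.0']
```

No dependent changed its own requirements at any point in this replay; the resolved version moved purely because of what was published and removed upstream.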
The incident spread through the ecosystem rapidly, but the response from maintainers was just as fast. Between the initial release of bad versions and their removal from npm, a period of less than 72 hours, nearly half of all affected packages were able to mitigate the issue. A small number of packages, about 4% of those affected, removed their dependency on colors, seen as a drop in the total number of dependent packages. Many more packages, 40% of those affected, pinned the version of colors being used to a safe version. This can be seen in the gradual increase of packages depending on an unaffected 1.4.x version.
Interestingly, this rapid mitigation was the work of very few people. Just a little over 1% of the affected packages actually made a release during this time period. But their work resulted in 43% of the total affected packages mitigating the issue. The same open dependency requirements that allowed the issue to spread rapidly also enabled its rapid mitigation.
Every dependency is a trust relationship
The colors and log4shell incidents were very similar in terms of wide-reaching impact, but quite different in onset and response. In the case of log4shell, a new vulnerability was discovered in old and widely used versions, resulting in a need for dependents to move to a new release of the package. In the case of colors, a new release introduced breaking changes. This resulted in an initial automated surge to the problematic version, followed by a concerted effort for dependents to move to an older release of the package.
While the widespread use of open dependency constraints in npm led to a rapid and widespread impact of colors, it was also helpful in its mitigation. Conversely, Maven’s approach of favoring stability resulted in difficulty resolving log4shell, but also means Maven is much less susceptible to a colors-type incident. Neither approach is obviously superior, just different.
While there is no silver bullet solution, there are best practices that consumers, maintainers, and packaging system developers can observe to reduce risk. Always understand your dependencies and why they were chosen, and always make sure your dependency requirements are well maintained.