Understanding the Impact of Apache Log4j Vulnerability

James Wetter and Nicky Ringland, Open Source Insights Team

More than 17,000 Java packages, amounting to over 4% of the Maven Central repository (the most significant Java package repository), have been impacted by the recently disclosed log4j vulnerabilities (1, 2), with widespread fallout across the software industry.1 The vulnerabilities allow an attacker to perform remote code execution by exploiting the insecure JNDI lookups feature exposed by the logging library log4j. This exploitable feature was enabled by default in many versions of the library.

This vulnerability has captivated the information security ecosystem since its disclosure on December 9th because of both its severity and widespread impact. As a popular logging tool, log4j is used by tens of thousands of software packages (known as artifacts in the Java ecosystem) and projects across the software industry. User’s lack of visibility into their dependencies and transitive dependencies has made patching difficult; it has also made it difficult to determine the full blast radius of this vulnerability. Using Open Source Insights, a project to help understand open source dependencies, we surveyed all versions of all artifacts in the Maven Central Repository to determine the scope of the issue in the open source ecosystem of JVM based languages, and to track the ongoing efforts to mitigate the affected packages.

How widespread is the log4j vulnerability?

As of December 16, 2021, we found that over 17,000 of the available Java artifacts from Maven Central depend on the affected log4j code. This means that more than 4% of all packages on Maven Central have at least one version that is impacted by this vulnerability.1 (These numbers do not encompass all Java packages, such as directly distributed binaries, but Maven Central is a strong proxy for the state of the ecosystem.)

As far as ecosystem impact goes, 4% is enormous. The average ecosystem impact of advisories affecting Maven Central is 2%, with the median less than 0.1%.

Maven ecosystem affected by the vulnerability
Maven ecosystem affected by the vulnerability

Direct dependencies account for around 3,500 of the affected artifacts, meaning that any of its versions depend upon an affected version of log4j-core as described in the CVEs. The majority of affected artifacts come from indirect dependencies (that is, the dependencies of one’s own dependencies), meaning log4j is not explicitly defined as a dependency of the artifact, but gets pulled in as a transitive dependency.

Direct dependencies vs. indirect dependencies
Direct dependencies vs. indirect dependencies

What is the current progress in fixing the open source JVM ecosystem?

We counted an artifact as fixed if the artifact had at least one version affected and has released a greater stable version (according to semantic versioning) that is unaffected. An artifact affected by log4j is considered fixed if it has updated to 2.16.0 or removed its dependency on log4j altogether.

At the time of writing, nearly five thousand of the affected artifacts have been fixed. This represents a rapid response and mammoth effort both by the log4j maintainers and the wider community of open source consumers. That leaves over 12,000 artifacts affected, many of which are dependent on another artifact to patch (the transitive dependency) and are likely blocked.

An example of affected artifact blocked by an intermediate dependency
An example of affected artifact blocked by an intermediate dependency

Why is fixing the JVM ecosystem hard?

Most artifacts that depend on log4j do so indirectly. The deeper the vulnerability is in a dependency chain, the more steps that may be required for it to be fixed. The following diagram shows a histogram of how deeply an affected log4j package (core or api) first appears in consumers dependency graphs. For greater than 80% of the packages, the vulnerability is more than one level deep, with a majority affected five levels down (and some as many as nine levels down). These packages may require fixes throughout all parts of the tree, starting from the deepest dependencies first.

Depth of log4j dependencies
Depth of log4j dependencies

Another difficulty is caused by ecosystem-level choices in the dependency resolution algorithm and requirement specification conventions.

In the Java ecosystem, it’s common practice to specify “soft” version requirements — exact versions that are used by the resolution algorithm if no other version of the same package appears earlier in the dependency graph. Propagating a fix often requires explicit action by the maintainers to update the dependency requirements to a patched version.

This practice is in contrast to other ecosystems, such as npm, where it’s common for developers to specify open ranges for dependency requirements. Open ranges allow the resolution algorithm to select the most recently released version that satisfies dependency requirements, thereby pulling in new fixes. Consumers can get a patched version on the next build after the patch is available, which propagates up the dependencies quickly. (This approach is not without its drawbacks; pulling in new fixes can also pull in new problems.)

How long will it take for this vulnerability to be fixed across the entire ecosystem?

It’s hard to say. We looked at all publicly disclosed critical advisories affecting Maven packages to get a sense of how quickly other vulnerabilities have been fully addressed. Less than half (48%) of the artifacts affected by a vulnerability have been fixed, so we might be in for a long wait, likely years.

But things are looking promising on the log4j front. After less than a week, around 25% of affected artifacts have been fixed. This, more than any other stat, speaks to the massive effort by open source maintainers, information security teams and consumers across the globe.

Where to focus next?

Thanks and congratulations are due to the open source maintainers and consumers who have already upgraded their versions of log4j. As part of our investigation, we pulled together a list of 500 affected packages with some of the highest transitive usage. If you are a maintainer or user helping with the patching effort, prioritizing these packages could maximize your impact and unblock more of the community.

We encourage the open source community to continue to strengthen security in these packages by enabling automated dependency updates and adding security mitigations. Improvements such as these could qualify for financial rewards from the Secure Open Source Rewards program.

You can explore your package dependencies and their vulnerabilities by using Open Source Insights.

  1. When this blog post was initially published, count numbers included all packages dependent on either log4j-core or log4j-api, as both were listed as affected in the CVE. The numbers have been updated to account for only packages dependent on log4j-core.  2