James Wetter and Nicky Ringland, Open Source Insights Team
How can a user of open source software (OSS) assess their risk of exposure to a
future vulnerability when taking on a new dependency?
Vulnerabilities will always find their way into software, and in an ideal world
those vulnerabilities will be fixed in a reasonable amount of time. This is a
critical factor for building trust between OSS maintainers and the users of that software.
This blog post looks at the events around the remediation of a vulnerability,
and a few ways that trust can be established between maintainers and users of
OSS. In particular, we examine how often OSS packages remediate known vulnerabilities and whether their users were left exposed after the vulnerability was publicly disclosed.
An ideal remediation
next-auth is an npm package that provides
tools to help implement authentication for the web development framework
Next.js. next-auth is popular, with almost 200,000
weekly downloads according to npm. Recently an advisory was published detailing a critical vulnerability in the next-auth package. Due to this vulnerability, an attacker could potentially gain access to another user’s account.
Fortunately for the users of next-auth, the reporter of the vulnerability and
package maintainer practiced coordinated vulnerability disclosure. As a result, a fixed version of next-auth was already available when the advisory was published. Both versions 4.10.3 and 3.29.10 include a patch remediating the vulnerability.
The advisory itself contains a brief timeline of key events. The vulnerability
was discovered by Socket, and privately disclosed to the
maintainers of next-auth on the 26th of July. The maintainers acknowledged the
private disclosure within 1 hour, and released remediating versions on the 1st
of August. Two days later, an advisory disclosing the vulnerability was
published. The time between private disclosure and the release of a fix, the
time to remediation, was approximately 5 working days.
This situation is ideal. Both the private disclosure of the vulnerability and
rapid response of the package maintainers meant that the two most recent major
versions both had patched releases available for users before the publication of the advisory.
By the time the advisory was published, most users of the next-auth package
would be able to move to a patched version immediately with little effort. This
virtually eliminated the post-advisory exposure time for the package’s many users.
What can go wrong?
Things don’t always work out so well, though. There are a few ways in which the process can go awry, such as the discovery of a zero-day exploit or a vulnerability in an unmaintained package.
A zero-day exploit
A zero-day exploit is when a vulnerability is being actively exploited by the
time the package maintainers become aware of the issue. In these situations it
may be better to publish an advisory before the maintainers have developed a
patch in order to raise awareness as quickly as possible. This was the case for
the well-publicized remote code injection vulnerability in the log4j library, known as log4shell.
In this scenario, it is not reasonable to expect the maintainers to remediate
the vulnerability before the advisory is published; increased awareness is the higher priority. As a result, users of the package will be exposed to a
publicly known vulnerability until a remediation is made available, or they
remove their dependency on the affected package.
An unmaintained package
When a vulnerability is discovered in a package that is no longer maintained
there will be no response to private disclosure, leaving the reporter no choice but to publish an advisory without a fix available.
An example of this situation is the once popular npm package
parsejson. Its most recent release has an
unremedied, high severity
vulnerability that was
publicly disclosed in 2018. But the package hasn’t seen a new release since 2016.
Its GitHub repository has been archived
and clearly states that it is no longer maintained. Worryingly, the package is
still widely used: npm reports that the package still gets almost 250,000 weekly downloads.
It’s clear that users of OSS should not introduce new dependencies on an
unmaintained package like parsejson. Existing users should remove such
dependencies from their libraries and applications as quickly as they can. But
it can be hard for a developer to know when one of their dependencies is no
longer maintained or less actively maintained. Signals to help identify changes
in the maintenance status are critical.
What usually happens after an advisory?
For our discussion here, we consider a package to have remediated an advisory
when it has a release that
is not affected by the advisory, and
has a greater version number than all affected releases.
The semantics of versions and releases differ between systems. For example, PyPI uses PEP 440 versioning, while npm uses semantic versioning.
This definition of remediation means that if the greatest major version of a
package has a fix available, the package is considered to have remediated the
vulnerability even if lesser major versions remain affected. There is more to be
said about packages that have multiple major versions, each of which may be
fixed independently, but we will leave a discussion of the nuance of
vulnerabilities and multiple major versions for another time.
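To make the definition concrete, here is a minimal Python sketch of the check, using the packaging library for version comparison. The release list and the set of affected versions are hypothetical inputs; real data would come from an advisory database such as OSV.

```python
# A minimal sketch of the remediation check defined above.
# The inputs are hypothetical illustrations, not real advisory data.
from packaging.version import Version

def is_remediated(all_versions: list[str], affected: set[str]) -> bool:
    """Remediated: some release is unaffected AND has a greater
    version number than every affected release."""
    greatest_affected = max(Version(v) for v in affected)
    return any(
        Version(v) > greatest_affected and v not in affected
        for v in all_versions
    )

# The fix shipped in 2.0.1, so the advisory counts as remediated even
# though the 1.x line remains affected.
print(is_remediated(["1.0.0", "1.0.1", "2.0.0", "2.0.1"],
                    {"1.0.1", "2.0.0"}))  # True
```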
First let’s take a look at how many known vulnerabilities are remediated.
Across every package management system supported by deps.dev, we see that most
package maintainers do respond to vulnerability advisories in their packages.
There is considerable variation between ecosystems. The lower clearance rate
seen in the Cargo ecosystem is expected. Within that ecosystem, there is a
practice of publishing an advisory that a package is unmaintained, such as this
advisory and this one. Such advisories are not expected to be remediated, but publishing them helps raise
awareness of the package’s unmaintained state amongst its users.
Taking a closer look at individual packages, the clearance rate of vulnerabilities gives an indication of the health of a package, and the consequent risk of using it. Some packages have a very high number of known vulnerabilities in older versions, but all of those vulnerabilities have been remediated. Such packages are healthy and well maintained, and their high clearance rates are a good indication of that.
Post-advisory exposure time
Now let’s consider how long users are exposed to a known vulnerability without a
fix. That is, the interval between the publication of an advisory and the
publication of a release to remediate it. We call this the post-advisory exposure time.
The PyPI, Cargo, and npm packaging systems expose the publication times for each
version. Using this data we can examine the post-advisory exposure time.
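In code, the metric is just a signed difference between two publication timestamps. A small sketch, assuming both times are already known (the dates below are hypothetical):

```python
# Post-advisory exposure time as a signed difference of two timestamps.
from datetime import datetime, timedelta

def post_advisory_exposure(advisory_published: datetime,
                           fix_released: datetime) -> timedelta:
    # Negative values mean coordinated disclosure worked: the fix was
    # available before the advisory went public.
    return fix_released - advisory_published

exposure = post_advisory_exposure(datetime(2022, 8, 3),
                                  datetime(2022, 8, 1))
print(exposure.days)  # -2: the fix preceded the advisory
```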
At a glance, these graphs paint a positive picture. Each ecosystem appears healthy, with the majority of vulnerabilities disclosed in an advisory being remediated very quickly. This demonstrates that security is a priority for most maintainers.
But it should be noted that vulnerabilities where coordinated disclosure was
successful will have zero post-advisory exposure time (or even negative
time!). In npm and PyPI almost 60% of the vulnerabilities in our database were
remediated before the publication of the corresponding advisory. Cargo has a
much lower percentage, around 16%; more on that shortly.
Let’s direct our attention to cases that did not see a coordinated vulnerability
disclosure. The following histograms show the post-advisory exposure time,
excluding successfully coordinated disclosures.
In all three systems, many vulnerabilities are remediated within 30 days of
advisory publication. This includes many zero-day exploits, such as
log4shell, that were fixed as quickly
as possible, even without the more ideal option of a coordinated vulnerability disclosure.
In the case of Cargo, the number resolved in the first 30 days is a staggering
70% of all vulnerabilities remediated after advisory publication. This is
because many maintainers choose to release the remediation on the same day the
advisory is published, resulting in non-zero but very brief post-advisory exposure times.
The long tail of vulnerabilities with significant post-advisory exposure time is
a valuable signal on the health of the corresponding packages. For developers
taking on new dependencies, knowing that they will not be left exposed for long
periods of time is critical to their security posture. For existing users of a dependency, awareness of how likely future vulnerabilities are to be remediated is equally important.
Currently it is hard to know how a given package has previously performed
according to this metric. Ideally this information would be easily accessible,
allowing potential and existing users to make informed decisions about their dependencies.
Mean time to remediation
The number of known vulnerabilities that a package maintainer has remediated in
the past can be used to help build trust between maintainers and users of
OSS. Additionally, the length of time users of a package were left exposed to
known, unremedied vulnerabilities in the past can provide a more detailed
characterization of a package maintainer’s response.
In addition to these signals, Mean Time to Remediation (MTTR) has been proposed
as a useful indicator of the quality of a package’s maintenance.
However, the available data about advisories rarely contains timestamps for
critical events in the remediation process. For example, most advisory
databases, including GitHub Advisories and OSV, do not provide a timestamp field
for the private disclosure of the vulnerability or the maintainers’
acknowledgement. And while some advisory write-ups do include an event timeline,
these are quite rare.
These missing timestamps make it impossible to compute the time that elapsed
between a maintainer being notified of a vulnerability and the release of a
remediation, leaving MTTR, for now, a hypothetical metric.
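For illustration only, here is how MTTR might be computed if private-disclosure timestamps were routinely published. The event data below is entirely invented and stands in for the records that, as noted above, rarely exist today.

```python
# Hypothetical MTTR computation: the mean of (fix released - privately
# disclosed) over known vulnerabilities. No real advisory data is used.
from datetime import datetime

events = [
    # (privately disclosed, remediating release published)
    (datetime(2022, 7, 26), datetime(2022, 8, 1)),
    (datetime(2022, 3, 10), datetime(2022, 3, 14)),
]

mttr_days = sum((fix - disclosed).days
                for disclosed, fix in events) / len(events)
print(f"MTTR: {mttr_days:.1f} days")  # MTTR: 5.0 days
```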
Vulnerabilities are an inevitable part of software development. The code reuse
and efficiency gains made possible by OSS broaden the potential impact of those vulnerabilities.
But cooperation between parties that discover vulnerabilities and package
maintainers reduces the time that users are left exposed to publicly known
vulnerabilities. Thanks to the hard work of OSS maintainers, there is no
post-advisory exposure for the majority of vulnerabilities in our advisory database.
Developers should still prepare for less ideal outcomes. Every dependency they
introduce increases the risk of exposure to future vulnerabilities. The
clearance rate and post-advisory exposure time for past advisories can provide
users of OSS assurance about the quality of maintenance their dependencies
receive. While past performance may not always predict future behavior, it can
be used as a valuable signal to help make informed decisions.
Open source software powers the world. Open source libraries allow
developers to build things faster, organizations to be more nimble,
and all of us to be more productive.
But dependencies bring complexity. Popular open source packages are
often used directly or indirectly by a significant portion of the
packages within an ecosystem. As a result, a vulnerability in a
popular package can have a massive impact across an entire ecosystem.
Different software ecosystems have different conventions for
specifying dependency requirements and different algorithms for
resolving them. We will take a look at a couple of high-profile incidents that illustrate some of these differences.
The amplification of vulnerability impact
To measure the potential impact of a vulnerability, we can look at how
many dependents it has: that is, how many other packages use a version affected by the vulnerability. We can get a
view of an ecosystem by looking at all package versions that are
affected - either directly or indirectly - by a vulnerability.
First off: packages that are directly affected. At the time of
writing, across all the packaging systems supported by deps.dev, over
200 thousand package versions (0.4%) are directly named as vulnerable
by a known advisory.
In contrast, almost 15 million package versions (33%) are affected
only indirectly, by having an affected package in their dependency
graph. That’s two orders of magnitude difference!
That underscores just how hard it is to fix a vulnerability in an
ecosystem. When a package explicitly named by an advisory publishes a
fix for the issue, the story is far from over. Many users of the
packaging ecosystem will still be at risk, because they depend on
vulnerable versions of the package deep within their dependency
graphs. Fixing the directly affected package is often only the tip of the iceberg.
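Counting direct versus indirect exposure amounts to a traversal of the reverse dependency graph, which maps each package to the packages that depend on it. A sketch over an invented four-package graph; deps.dev performs this over fully resolved graphs at ecosystem scale.

```python
# Counting direct vs. indirect dependents of a vulnerable package.
from collections import deque

reverse_deps = {
    "vulnerable-pkg": {"middleware-a", "middleware-b"},
    "middleware-a": {"app-1"},
    "middleware-b": {"app-1", "app-2"},
    "app-1": set(),
    "app-2": set(),
}

def all_dependents(pkg: str) -> set[str]:
    seen, queue = set(), deque([pkg])
    while queue:
        for dep in reverse_deps.get(queue.popleft(), ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen

direct = reverse_deps["vulnerable-pkg"]
indirect = all_dependents("vulnerable-pkg") - direct
print(len(direct), len(indirect))  # 2 direct, 2 indirect dependents
```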
Addressing vulnerabilities in your dependencies
There are several ways an application maintainer could mitigate a
vulnerability affecting one of their dependencies. Let’s be kind to
our hypothetical maintainer and consider a simple dependency tree with
two layers of dependencies.
If this maintainer is lucky, they depend on the affected package
directly. That means as soon as the affected package publishes a fixed
version, they can update their project or application to depend on the fixed version.
But if the vulnerable package is among their indirect dependencies the
situation could be much more complex.
In the best case scenario, the intermediate packages already depend on
the patched version.
If this is not the case, our hypothetical maintainer may still have a
course of action. To update to the fixed version of the indirect
dependency the maintainer may be able to specify the fixed version as
a minimum for the entire dependency graph. For this to work, however,
the fixed version of the affected package and its direct dependents
must be compatible. If not, the maintainer may have to wait for a new
release of the intermediate dependent.
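A sketch of that compatibility check, using PEP 440 specifiers from Python's packaging library as a stand-in for whichever ecosystem's constraint syntax applies. The package names and requirements are hypothetical.

```python
# Hypothetical check: forcing a fixed version of an indirect dependency
# only works if every intermediate requirement still accepts it.
from packaging.specifiers import SpecifierSet

fixed_version = "1.2.5"  # the release that remediates the vulnerability
intermediate_requirements = {
    "middleware-a": SpecifierSet(">=1.2,<2.0"),  # accepts the fix
    "middleware-b": SpecifierSet("==1.2.3"),     # needs its own release first
}

for pkg, spec in intermediate_requirements.items():
    verdict = "can be forced" if fixed_version in spec else "blocked"
    print(f"{pkg}: {verdict}")
```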
Another alternative is to remove the dependency on the affected
package. But this often involves considerable effort; you would never
have added a dependency without good reason, right?
In practice, dependency trees are rarely so simple and clean. Usually
they are complex, interconnected graphs. Just take a look at the
dependency graph of any popular framework or tool.
These complex graphs can make remediating a vulnerability far more
difficult than the simple examples given above. There may be many
paths through which a fix must propagate before it gets to you. Or, in
order to remove a dependency, you might need to remove a significant
portion of your dependency graph.
For example, consider the many paths by which one package can depend on a vulnerable version of log4j.
With this in mind, perhaps you can imagine why it often takes a long
time for a patched version of a popular package to roll out to the whole ecosystem.
log4shell in the Maven Central ecosystem
On December 9th last year, over 17,000 of the Java packages available
from Maven Central were impacted by the log4j vulnerabilities, known
as log4shell, resulting in widespread fallout across the software
industry. The vulnerabilities allowed an attacker to perform remote
code execution by exploiting the insecure JNDI lookups feature exposed
by the logging library log4j. This exploitable feature was present in
multiple versions, and was enabled by default in many versions of the
library. We wrote about this incident in a previous post shortly after it occurred.
A new version of log4j with the vulnerability patched (albeit with a few false starts due to incomplete fixes) was available almost
immediately. So once that patched version was published, had the
ecosystem freed itself of log4shell? Unfortunately not. Part of what
makes fixing log4shell hard is Java’s conventions on how dependency
requirements are specified, and Maven’s dependency resolution algorithm.
In the Java ecosystem, it’s common practice to specify “soft” version
requirements. That is, the dependent package specifies a single
version for each dependency, which is usually the version that ends up
being used. (The dependency resolution algorithm may choose a different version under certain rare conditions – for example, if a different version of the package is already present in the graph.) While it is possible to
specify ranges of suitable versions, this is unusual. More than 99% of
dependency requirements in the Maven Central ecosystem are specified
using soft requirements.
Here’s where Maven’s dependency resolution algorithm comes in. Since a specific version is almost always specified, that’s almost always the version that the dependency resolution will pick. So
if a newer version with that important new bug fix is released, it
won’t be included automatically. It usually requires explicit action
by the maintainer to update the dependency requirements to a patched version.
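The contrast can be illustrated with a toy resolver; this is a deliberate simplification, not Maven's or npm's actual algorithm.

```python
# Toy illustration of soft requirements vs. open ranges. Real resolvers
# are far more involved than this.
from packaging.version import Version

available = ["2.14.0", "2.14.1", "2.17.1"]  # 2.17.1 is the patched release

def resolve_soft(requirement: str) -> str:
    # Maven-style soft requirement: use exactly the version named.
    return requirement

def resolve_range(minimum: str) -> str:
    # Range-style: newest available version satisfying the constraint.
    return max((v for v in available if Version(v) >= Version(minimum)),
               key=Version)

print(resolve_soft("2.14.0"))   # 2.14.0: still vulnerable after the fix ships
print(resolve_range("2.14.0"))  # 2.17.1: the fix is picked up automatically
```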
In this case, consumers of any one of the 17,000-odd packages affected
by the log4j vulnerabilities would likely still depend on an affected
version of log4j, even after the first fix was published. Ideally the
maintainers of around 4,000 packages that directly depend on log4j
would promptly release a new version of their package that explicitly
requires a fixed version of log4j. Then the maintainers of packages
that depend on those packages can update their version requirements,
and then maintainers of those packages, and so on. There are methods
to pin the versions of indirect dependencies, accelerating this process,
but many consumers rely on the default behavior of their tools.
It’s been over six months since the log4j advisory was disclosed. How
well has the underlying fix to log4shell propagated throughout the
ecosystem? A little less than a week after the disclosure around 13%
of affected packages had remediated the issue by releasing a new
version. 10 days after disclosure this number had risen to around
25%. Now a few months after that we see around 40% of the affected
packages have remediated the problem. Considering how widespread the
problem was, and the complexity of the dependencies between packages,
this is amazing progress, but there’s clearly a lot more to go.
Default versions: new or old?
Package managers differ in which versions they choose to install by
default. For example, systems like Maven or Go err on the side of
choosing earlier matching versions, while npm and Pip tend to choose
later versions. This design choice can have a big impact on how a fix
rolls out or, conversely, how quickly an exploit can propagate.
Choosing the earlier versions has the benefit of stability; dependency
graphs remain stable whether you install today or tomorrow, even if
new versions are released. The downside is that the consumer must be
conscientious in updating their dependencies when security issues arise.
Choosing the later versions has the benefit of currency; you get the
latest fixes automatically just by reinstalling. The downside here is
that your dependencies can change underfoot, sometimes in dramatic and unexpected ways.
With this in mind, if log4shell had occurred in the npm or PyPI
ecosystems the story would have been quite different. In these
ecosystems, packages typically ask for the most recent compatible
versions of their dependencies.
Looking at the dependency requirements across all versions of all
packages in npm we find around three quarters use the caret (^) or
tilde (~) operators, allowing a new patch or minor version of the dependency to
be automatically selected when available. When adhering to semantic
versioning, this means that many users will use
the newest release with a compatible API by default.
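A rough sketch of those two operators for versions with a non-zero major component (npm's 0.x rules have extra wrinkles that are skipped here), approximating npm's semver semantics with Python's packaging library:

```python
# Approximate caret and tilde range matching for x.y.z versions.
from packaging.version import Version

def satisfies(version: str, requirement: str) -> bool:
    op, base = requirement[0], Version(requirement[1:])
    v = Version(version)
    if op == "^":  # caret: same major, e.g. >=1.2.3 <2.0.0
        return base <= v < Version(f"{base.major + 1}.0.0")
    if op == "~":  # tilde: same major.minor, e.g. >=1.2.3 <1.3.0
        return base <= v < Version(f"{base.major}.{base.minor + 1}.0")
    raise ValueError(f"unsupported requirement: {requirement}")

print(satisfies("1.4.2", "^1.2.3"))  # True: new minors arrive automatically
print(satisfies("2.0.0", "^1.2.3"))  # False: majors signal breaking changes
print(satisfies("1.3.0", "~1.2.3"))  # False: tilde allows only patch updates
```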
This practice would likely have been a substantial benefit in
remediating a log4shell-like event, where a vulnerability is
discovered in widely used versions of a popular package.
But as we shall see, sometimes we really, really don’t want to use the newest version.
The case of colors
In early January 2022, the developer of the popular npm packages
colors and faker intentionally published several releases containing
breaking changes. These were picked up rapidly due to the npm resolution algorithm’s preference for recent releases, and the ecosystem norm of using open version requirements that accept new compatible versions automatically.
At the time of the incident, more than 100,000 packages’ most recent
releases depended on a version of colors, and around half of them had
a dependency on a problematic version. The following graph shows the
dependency flow in the ecosystem over the 72 hours in which the action took place.
About half the packages depending on colors remained unaffected
throughout the incident because they depended on earlier versions of
colors. But the other half of packages had some rapid and widespread
changes in the exact version of colors that would have been used
depending on the time at which their dependencies were resolved.
The first problematic version was 1.4.44-liberty-2. Due to version naming conventions this isn’t considered a stable version, and as a result few packages depended on it.
A few hours later version 1.4.1 was released, and almost all packages
using the 1.4 minor version immediately began to depend on this
problematic version. Several hours later, 1.4.2 was released, and
again most packages affected by the incident immediately depended on
this new problematic version. After a few more hours npm stepped in
and removed all the bad versions of colors, at which point all
dependents moved back to safe versions.
The incident swept through the ecosystem rapidly, but the response of maintainers was just as fast. Between the initial release
of bad versions and their removal from npm, a period of less than 72
hours, nearly half of all affected packages were able to mitigate the
issue. A small number of packages, about 4% of those affected, removed their dependency on colors entirely, visible as a drop in the total number of dependent packages. Many more, 40% of those affected, pinned the version of colors in use to a safe version. This can be seen in the gradual increase of packages depending on an unaffected 1.4.x version.
Interestingly, this rapid mitigation was the work of very few
people. Just a little over 1% of the affected packages actually made a
release during this time period. But their work resulted in 43% of the
total affected packages mitigating the issue. The same open dependency requirements that allowed the issue to spread rapidly also enabled its rapid mitigation.
Every dependency is a trust relationship
The colors and log4shell incidents were very similar in terms of
wide-reaching impact, but quite different in onset and response. In
the case of log4shell, a new vulnerability was discovered in old and
widely used versions, resulting in a need for dependents to move to a
new release of the package. In the case of colors, a new release
introduced breaking changes. This resulted in an initial automated
surge to the problematic version, followed by a concerted effort for
dependents to move to an older release of the package.
While the widespread use of open dependency constraints in npm led to
a rapid and widespread impact of colors, it was also helpful in its
mitigation. Conversely, Maven’s approach of favoring stability resulted
in difficulty resolving log4shell, but also means Maven is much less
susceptible to a colors-type incident. Neither approach is obviously
superior, just different.
While there is no silver bullet solution, there are best practices
that consumers, maintainers, and packaging system developers can
observe to reduce risk. Always understand your dependencies and why
they were chosen, and always make sure your dependency requirements
are well maintained.
James Wetter and Nicky Ringland, Open Source Insights Team
More than 17,000 Java packages, amounting to over 4% of the Maven
Central repository (the most significant Java
package repository), have been impacted by the recently disclosed log4j vulnerabilities, resulting in widespread fallout across the software industry.1 The
vulnerabilities allow an attacker to perform remote code execution by
exploiting the insecure JNDI lookups feature exposed by the logging
library log4j. This exploitable feature was enabled by default in many
versions of the library.
This vulnerability has captivated the information security ecosystem
since its disclosure on December 9th because of both its severity and
widespread impact. As a popular logging tool, log4j is used by tens of
thousands of software packages (known as artifacts in the Java
ecosystem) and projects across the software industry. Users’ lack of
visibility into their dependencies and transitive dependencies has
made patching difficult; it has also made it difficult to determine
the full blast radius of this vulnerability. Using Open Source
Insights, a project to help understand open source
dependencies, we surveyed all versions of all artifacts in the Maven
Central Repository to determine the scope of
the issue in the open source ecosystem of JVM based languages, and to
track the ongoing efforts to mitigate the affected packages.
How widespread is the log4j vulnerability?
As of December 16, 2021, we found that over 17,000 of the available
Java artifacts from Maven Central depend on the affected log4j
code. This means that more than 4% of all packages on Maven Central
have at least one version that is impacted by this vulnerability.1
(These numbers do not encompass all Java packages, such as directly
distributed binaries, but Maven Central is a strong proxy for the
state of the ecosystem.)
As far as ecosystem impact goes, 4% is enormous. The average ecosystem
impact of advisories affecting Maven Central is 2%, with the median
less than 0.1%.
Direct dependencies account for around 3,500 of the affected artifacts, meaning that at least one of their versions depends upon an affected version of log4j-core as described in the CVEs. The majority of
affected artifacts come from indirect dependencies (that is, the
dependencies of one’s own dependencies), meaning log4j is not
explicitly defined as a dependency of the artifact, but gets pulled in
as a transitive dependency.
What is the current progress in fixing the open source JVM ecosystem?
We counted an artifact as fixed if the artifact had at least one
version affected and has released a greater stable version (according
to semantic versioning) that is unaffected. An
artifact affected by log4j is considered fixed if it has updated to
2.16.0 or removed its dependency on log4j altogether.
At the time of writing, nearly five thousand of the affected artifacts
have been fixed. This represents a rapid response and mammoth effort
both by the log4j maintainers and the wider community of open source
consumers. That leaves over 12,000 artifacts affected, many of which
are dependent on another artifact to patch (the transitive dependency)
and are likely blocked.
Why is fixing the JVM ecosystem hard?
Most artifacts that depend on log4j do so indirectly. The deeper the
vulnerability is in a dependency chain, the more steps that may be
required for it to be fixed. The following diagram shows a histogram
of how deeply an affected log4j package (core or api) first appears in
consumers’ dependency graphs. For more than 80% of affected packages, the vulnerability is more than one level deep, with a majority affected five levels down (and some as many as nine levels down). These
packages may require fixes throughout all parts of the tree, starting
from the deepest dependencies first.
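Measuring that depth amounts to a shortest-path search from each consumer to the affected artifact in its resolved dependency graph. A sketch over an invented graph:

```python
# Depth of the first affected artifact in a consumer's dependency
# graph, via breadth-first search. The graph below is made up.
from collections import deque

deps = {
    "my-app": ["web-framework"],
    "web-framework": ["rpc-lib"],
    "rpc-lib": ["log4j-core"],
    "log4j-core": [],
}

def depth_of(root: str, target: str) -> int | None:
    queue, seen = deque([(root, 0)]), {root}
    while queue:
        node, depth = queue.popleft()
        if node == target:
            return depth
        for dep in deps.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append((dep, depth + 1))
    return None  # target not present in this graph

print(depth_of("my-app", "log4j-core"))  # 3: fixes must land in two intermediaries first
```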
Another difficulty is caused by ecosystem-level choices in the
dependency resolution algorithm and requirement specification conventions.
In the Java ecosystem, it’s common practice to specify “soft” version requirements — exact versions that are used by the resolution
algorithm if no other version of the same package appears earlier in
the dependency graph. Propagating a fix often requires explicit action
by the maintainers to update the dependency requirements to a patched version.
This practice is in contrast to other ecosystems, such as npm, where
it’s common for developers to specify open ranges for dependency
requirements. Open ranges allow the resolution algorithm to select the
most recently released version that satisfies dependency requirements,
thereby pulling in new fixes. Consumers can get a patched version on
the next build after the patch is available, which propagates up the
dependencies quickly. (This approach is not without its drawbacks;
pulling in new fixes can also pull in new problems.)
How long will it take for this vulnerability to be fixed across the entire ecosystem?
It’s hard to say. We looked at all publicly disclosed critical
advisories affecting Maven packages to get a sense of how quickly
other vulnerabilities have been fully addressed. Less than half (48%)
of the artifacts affected by a vulnerability have been fixed, so we
might be in for a long wait, likely years.
But things are looking promising on the log4j front. After less than a
week, around 25% of affected artifacts have been fixed. This, more
than any other stat, speaks to the massive effort by open source maintainers, information security teams, and consumers across the industry.
Where to focus next?
Thanks and congratulations are due to the open source maintainers and
consumers who have already upgraded their versions of log4j. As part
of our investigation, we pulled together a list of 500 affected packages with some of the highest transitive usage. If
you are a maintainer or user helping with the patching effort,
prioritizing these packages could maximize your impact and unblock
more of the community.
We encourage the open source community to continue to strengthen
security in these packages by enabling automated dependency updates
and adding security mitigations. Improvements such as these could
qualify for financial rewards from the Secure Open Source Rewards program.
When this blog post was initially published, count numbers
included all packages dependent on either log4j-core or log4j-api,
as both were listed as affected in the CVE. The numbers have been
updated to account for only packages dependent on log4j-core.
We’re pleased to announce deps.dev now has support
for Python packages hosted on the Python Package Index (PyPI). That
means we have over 300k—and counting—Python packages for your perusal, from popular packages like boto3 to the longest of long tails.
Where does the data come from?
We use PyPI’s RSS Feeds to
stay abreast of new and updated packages, with an occasional full sync from the
Simple Repository API.
For each package version, we fetch metadata from the
JSON API and analyze it
to resolve its dependencies, determine the license, and so on.
Dependency resolution is complex in any language, and Python is no
exception. Sometimes you might see an error message about a particular version
of a package. The most common reason for this is packages that only provide a source distribution that specifies the dependencies in a setup.py—which is hard to run safely and may not even be deterministic. This is not a problem with wheels, as they do not require executing arbitrary Python code to understand the
dependencies. Of course there are any number of other things that can go wrong,
and Python has a long history of packaging formats, so if you find anything not
working as expected, don’t hesitate to
get in touch.
Where do the dependencies come from?
We periodically resolve the full dependencies of every package version we know
about. In pip terms, the graph we show for version 1.0.0 of package a consists
of the packages that would be installed by running pip install a==1.0.0 in a
clean environment with recent versions of setuptools and wheel available.
These graphs are dependent on the versions of both Python and pip, as well as
the operating system, CPU architecture, and so on. It’s not uncommon for
packages to publish different wheels for various different combinations of all
of these, and for each release to have its own metadata with potentially
distinct dependencies. Currently we perform resolution as if we were running
pip 21.1.3 with Python 3.9 on an x86_64
manylinux compatible platform, with more
combinations on the way. We think it’s an accurate reproduction but if you see
anything unexpected, please let us know!
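If you want to compare against your own environment, one rough way to reproduce a resolution is to install the package into a throwaway virtual environment and record what arrived. This is an approximation for comparison, not how deps.dev computes its graphs, and the results will reflect your local Python, pip, and platform.

```python
# Approximate a resolution by installing into a fresh virtualenv and
# freezing the result. POSIX paths assumed; on Windows the pip
# executable lives under Scripts\ instead of bin/.
import subprocess
import tempfile
import venv

def resolve(requirement: str) -> str:
    with tempfile.TemporaryDirectory() as env_dir:
        venv.create(env_dir, with_pip=True)
        pip = f"{env_dir}/bin/pip"
        subprocess.run([pip, "install", "--quiet", requirement], check=True)
        frozen = subprocess.run([pip, "freeze"], check=True,
                                capture_output=True, text=True)
        return frozen.stdout

print(resolve("requests==2.28.1"))  # requests plus its resolved dependencies
```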
We’re excited to add PyPI to our set of supported language ecosystems, and
especially keen to start digging into the data and do some comparative
analysis. From our first look, there are plenty of interesting things to
uncover, for instance:
4 of the 5 most depended on packages are all dependencies of the 6th most
depended on package: requests
more than half of all package versions on PyPI have zero dependencies,
compared to ≈15-25% across Go, npm, Cargo and Maven
one small package has one of the lowest ratios of direct to indirect dependents we’ve seen across all ecosystems.
We’re also working on improving our license recognition and figuring out how to
show the differences that enabling various extras makes to the resulting dependency graph.
So slither on in and start exploring! We’ll keep digging into the data and keep
you posted on what we discover!
Modern software is more than just some lines of code checked into a repository.
To build almost any program, one must also install packages from other
developers. These external dependencies are critical components of today’s
software environment, and tooling has been created to make it easy to install
dependencies and update them as required. As a result, the past few years have
witnessed a phenomenal growth in the open source ecosystem as well as a marked
increase in the average number of dependencies for a given package. Meanwhile,
many of these packages are being changed—fixed, expanded, or deleted.
The rate of change is significant. Our analysis shows that roughly 15% of the
packages in npm see changes to their dependency sets each day, and much of that change is in widely used packages.
This activity affects not just your own software, and not even just the
software you call upon, but the entire set of your software’s dependencies,
which may be much larger than those listed explicitly by your project. It is
common to see one package use a handful of other packages that in turn have a
hundred or more dependencies of their own. Many of the most commonly used
packages in open source have large dependency trees that will be pulled in by
the installation process.
Today’s software is therefore built upon a constantly-changing foundation, and
keeping track of that churn is challenging. Your package changes, your
dependencies change, their dependencies change, and so on. Even the most
diligent developers struggle to keep up beyond letting the tooling download
updates to all the dependencies from time to time. Tooling helps manage the
updates, but cannot guarantee what the right update is, or when the time is
right to apply it.
It’s easy to miss important problems deep in the dependencies, such as security
vulnerabilities, license conflicts, or other issues. The tools just do what
they are told, and if a nested dependency has an issue, it will be installed
regardless. Systems have been compromised or exploited by dependencies that
acquired malicious changes that were undetected, sometimes for long periods.
The Open Source Insights project aims to help. It collects
information about open source projects—source code, licenses, releases,
vulnerabilities, owners, and more—and gathers it into a single location,
making it accessible. These interfaces help developers and project owners see the full dependency graph of their projects and use it to track release activity, vulnerabilities, and other information, such as the licenses used by components, regardless of how deeply they are nested inside the dependency graph.
In short, all the information about a package is connected to all the other
packages that depend upon it, and Insights shows the connections. For instance,
if your code depends on a package that has a security vulnerability, even if
that vulnerability is in a package 10 dependency hops away in a package that
you don’t even know about, the Insights page for your package will tell you about it.
The Insights project also helps developers see the importance of their project
by showing the projects that depend on them—their dependents. Even a small
project is important if a large number of other projects depend on it, either
directly or through transitive dependencies.
To build deps.dev we dove deep into the fundamentals of
several different package management systems, collecting and organizing the
metadata of millions of packages, and implementing our own bug-for-bug
compatible semver parsers, constraint matchers, and dependency resolvers.
Along the way, we’ve learnt about the wider problem space, and the varied
challenges that await the unsuspecting programmer. The tools are changing, inconsistent, and often poorly understood. Many package management systems
were not designed with today’s security concerns in mind. Semver constraints
are not formally specified, and are implemented arbitrarily by different
package managers. There is not widespread agreement on foundational questions
such as whether it is better to select the newest or oldest matching version,
or whether the ability to “pin” versions is a good or bad thing (and what that
even means), or whether it is good or bad (or possible or impossible) to include multiple versions of the same package in a given program.
In future articles we will explore these issues in detail, comparing the
approaches of various communities, so as to contribute to the conversations
that push the open source ecosystem forward. We hope that we can converge, as
an industry, on some fundamental “good ideas” in the space of managing software dependencies.