Introducing Open Source Insights
Modern software is more than just some lines of code checked into a repository. To build almost any program, one must also install packages from other developers. These external dependencies are critical components of today’s software environment, and tooling has been created to make it easy to install dependencies and update them as required. As a result, the past few years have witnessed a phenomenal growth in the open source ecosystem as well as a marked increase in the average number of dependencies for a given package. Meanwhile, many of these packages are being changed—fixed, expanded, updated—regularly.
The rate of change is significant. Our analysis shows that roughly 15% of the packages in npm see changes to their dependency sets each day, while a majority of the change is in packages that are widely used.
This activity affects not just your own software, and not even just the software you call upon, but the entire set of your software’s dependencies, which may be much larger than those listed explicitly by your project. It is common to see one package use a handful of other packages that in turn have a hundred or more dependencies of their own. Many of the most commonly used packages in open source have large dependency trees that will be pulled in by the installation process.
Today’s software is therefore built upon a constantly-changing foundation, and keeping track of that churn is challenging. Your package changes, your dependencies change, their dependencies change, and so on. Even the most diligent developers struggle to keep up beyond letting the tooling download updates to all the dependencies from time to time. Tooling helps manage the updates, but cannot guarantee what the right update is, or when the time is right to apply it.
It’s easy to miss important problems deep in the dependencies, such as security vulnerabilities, license conflicts, or other issues. The tools just do what they are told, and if a nested dependency has an issue, it will be installed regardless. Systems have been compromised or exploited by dependencies that acquired malicious changes that were undetected, sometimes for long periods.
The Open Source Insights project aims to help. It collects information about open source projects—source code, licenses, releases, vulnerabilities, owners, and more—and gathers it into a single location, making it accessible. These interfaces help developers and project owners see the full dependency graph of their projects and can use it to track release activity, vulnerabilities, and other information such as licenses that are used by the components, regardless of how deeply they are nested inside the dependencies.
In short, all the information about a package is connected to all the other packages that depend upon it, and Insights shows the connections. For instance, if your code depends on a package that has a security vulnerability, even if that vulnerability is in a package 10 dependency hops away in a package that you don’t even know about, the Insights page for your package will tell you about it.
The Insights project also helps developers see the importance of their project by showing the projects that depend on them—their dependents. Even a small project is important if a large number of other projects depend on it, either directly or through transitive dependencies.
This blog
To build deps.dev we dove deep into the fundamentals of several different package management systems, collecting and organizing the metadata of millions of packages, and implementing our own bug-for-bug compatible semver parsers, constraint matchers, and dependency resolvers.
Along the way, we’ve learnt about the wider problem space, and the varied challenges that await the unsuspecting programmer. The tools are changing, and inconsistent, and often poorly understood. Many package management systems were not designed with today’s security concerns in mind. Semver constraints are not formally specified, and are implemented arbitrarily by different package managers. There is not widespread agreement on foundational questions such as whether it is better to select the newest or oldest matching version, or whether the ability to “pin” versions is a good or bad thing (and what that even means). Also whether it is good or bad (or possible or impossible) to be able to include multiple versions of the same package in a given program.
In future articles we will explore these issues in detail, comparing the approaches of various communities, so as to contribute to the conversations that push the open source ecosystem forward. We hope that we can converge, as an industry, on some fundamental “good ideas” in the space of managing software dependencies.