Announcing the API

Jesper Särnesjö and Nicky Ringland, Open Source Insights Team

This post was originally published on the Google Security Blog

Today, we are excited to announce the API, which provides free access to the dataset of security metadata, including dependencies, licenses, advisories, and other critical health and security signals for more than 50 million open source package versions.

Software supply chain attacks are increasingly common and harmful, with high profile incidents such as Log4Shell, Codecov, and the recent 3CX hack. The overwhelming complexity of the software ecosystem causes trouble for even the most diligent and well-resourced developers.

We hope the API will help the community make sense of complex dependency data that allows them to respond to—or even prevent—these types of attacks. By integrating this data into tools, workflows, and analyses, developers can more easily understand the risks in their software supply chains.

The power of dependency data

As part of Google’s ongoing efforts to improve open source security, the Open Source Insights team has built a reliable view of software metadata across 5 packaging ecosystems. The data set is continuously updated from a range of sources: package registries, the Open Source Vulnerability database, code hosts such as GitHub and GitLab, and the software artifacts themselves. This includes 5 million packages, more than 50 million versions, from the Go, Maven, PyPI, npm, and Cargo ecosystems—and you’d better believe we’re counting them!

We collect and aggregate this data and derive transitive dependency graphs, advisory impact reports, OpenSSF Security Scorecard information, and more. Where the website allows human exploration and examination, and the BigQuery dataset supports large-scale bulk data analysis, this new API enables programmatic, real-time access to the corpus for integration into tools, workflows, and analyses.

The API is used by a number of teams internally at Google to support the security of our own products. One of the first publicly visible uses is the GUAC integration, which uses the data to enrich SBOMs. We have more exciting integrations in the works, but we’re most excited to see what the greater open source community builds!

We see the API as being useful for tool builders, researchers, and tinkerers who want to answer questions like:

  • What versions are available for this package?
  • What are the licenses that cover this version of a package—or all the packages in my codebase?
  • How many dependencies does this package have? What are they?
  • Does the latest version of this package include changes to dependencies or licenses?
  • What versions of what packages correspond to this file?

Taken together, this information can help answer the most important overarching question: how much risk would this dependency add to my project?

The API can help surface critical security information where and when developers can act. This data can be integrated into:

  • IDE Plugins, to make dependency and security information immediately available.
  • CI/CD integrations to prevent rolling out code with vulnerability or license problems).
  • Build tools and policy engine integrations to help ensure compliance.
  • Post-release analysis tools to detect newly discovered vulnerabilities in your codebase.
  • Tools to improve inventory management and mystery file identification.
  • Visualizations to help you discover what your dependency graph actually looks like:
What you think your dependency graph looks like vs what your dependency graph actually looks like
What you think your dependency graph looks like vs what your dependency graph actually looks like

Unique features

The API has a couple of great features that aren’t available through the website.

Hash queries

A unique feature of the API is hash queries: you can look up the hash of a file’s contents and find all the package versions that contain that file. This can help figure out what version of which package you have even absent other build metadata, which is useful in areas such as SBOMs, container analysis, incident response, and forensics.

Real dependency graphs

The dependency data is not just what a package declares (its manifests, lock files, etc.), but rather a full dependency graph computed using the same algorithms as the packaging tools (Maven, npm, Pip, Go, Cargo). This gives a real set of dependencies similar to what you would get by actually installing the package, which is useful when a package changes but the developer doesn’t update the lock file. With the API, tools can assess, monitor, or visualize expected (or unexpected!) dependencies.

API in action

For a demonstration of how the API can help software supply chain security efforts, consider the questions it could answer in a situation like the Log4Shell discovery:

  • Am I affected? - A CI/CD integration powered by the free API would automatically detect that a new, critical vulnerability is affecting your codebase, and alert you to act.
  • Where? - A dependency visualization tool pulling from the API transitive dependency graphs would help you identify whether you can update one of your direct dependencies to fix the issue. If you were blocked, the tool would point you at the package(s) that are yet to be patched, so you could contribute a PR and help unblock yourself further up the tree.
  • Where else? - You could query the API with hashes of vendored JAR files to check if vulnerable log4j versions were unexpectedly hiding therein.
  • How much of the ecosystem is impacted? - Researchers, package managers, and other interested observers could use the API to understand how their ecosystem has been affected, as we did in this blog post about Log4Shell’s impact.

Getting started

The API service is globally replicated and highly available, meaning that you and your tools can depend on it being there when you need it.

It’s also free and immediately available—no need to register for an API key. It’s just a simple, unauthenticated HTTPS API that returns JSON objects:

# List the advisories affecting log4j 1.2.17
$ curl \
    | jq '.advisoryKeys[].id'

A single API call to list all the GHSA advisories affecting a specific version of log4j

Check out the API Documentation to get started, or jump straight into the code with some examples.

Dependency graph for react 15.0.0, fetched from the API and rendered using GraphViz
Dependency graph for react 15.0.0, fetched from the API and rendered using GraphViz

Securing supply chains

Software supply chain security is hard, but it’s in all our interests to make it easier. Every day, Google works hard to create a safer internet, and we’re proud to be releasing this API to help do just that, and make this data universally accessible and useful to everyone.

We look forward to seeing what you might do with the API, and would appreciate your feedback. (What works? What doesn’t? What makes it better?) You can reach us at, or by filing an issue on our GitHub repo.