Introducing PyPI Support for

Paul Mathews and the Open Source Insights Team

We’re pleased to announce now has support for Python packages hosted on the Python Package Index (PyPI). That means we have over 300k—and counting—Python packages for your perusal, from boto3 to pandas.

Where does the data come from?

We use PyPI’s RSS Feeds to stay abreast of new and updated packages, with an occasional full sync from the Simple Repository API. For each package version, we fetch metadata from the JSON API and analyze it to resolve its dependencies, determine the license, and so on.

Dependency resolution is complex in any language, and Python is no exception. Sometimes you might see an error message about a particular version of a package. The most common reason for this is packages that only provide a source distribution that specifies the dependencies in a—which is hard to run safely and may not even be deterministic. This is not a problem with wheels as they do not require executing arbitrary Python code to understand the dependencies. Of course there are any number of other things that can go wrong, and Python has a long history of packaging formats, so if you find anything not working as expected, don’t hesitate to get in touch.

Where do the dependencies come from?

We periodically resolve the full dependencies of every package version we know about. In pip terms, the graph we show for version 1.0.0 of package a consists of the packages that would be installed by running pip install a==1.0.0 in a clean environment with recent versions of setuptools and wheel available.

These graphs are dependent on the versions of both Python and pip, as well as the operating system, CPU architecture, and so on. It’s not uncommon for packages to publish different wheels for various different combinations of all of these, and for each release to have its own metadata with potentially distinct dependencies. Currently we perform resolution as if we were runnning pip 21.1.3 with Python 3.9 on an x86_64 manylinux compatible platform, with more combinations on the way. We think it’s an accurate reproduction but if you see anything unexpected, please let us know!

What’s next

We’re excited to add PyPI to our set of supported language ecosystems, and epecially keen to start digging into the data and do some comparative analysis. From our first look, there are plenty of interesting things to uncover, for instance:

  • 4 of the 5 most depended on packages are all dependencies of the 6th most depended on package: requests
  • more than half of all package versions on PyPI have zero dependencies, compared to ≈15-25% across Go, npm, Cargo and Maven
  • this small package has one of the lowest ratios of direct to indirect dependents we’ve seen across all package ecosystems.

We’re also working on improving our license recognition and figuring out how to show the differences enabling various extras makes to the dependency graphs.

So slither on in and start exploring! We’ll keep digging into the data and keep you posted on what we discover!