A software supply chain reading list
The concept of software supply chains is so hot right now. Attackers are attacking, defenders defending, bloggers blogging and academics … academing?
A giant rush of formal papers, blog posts, published attacks and startups has swarmed to this topic. To a newcomer the field may already seem overwhelming. In this blog post I aim to give you a reading list that will help you orient yourself in the software supply chain landscape.
But first: what is a software supply chain? Basically, the term reflects that (1) you rely on external dependencies, (2) those dependencies rely on still other dependencies and (3) all these dependencies, direct and indirect, rely on people and systems outside your control. You don’t write the software, you don’t host the source code, you probably don’t build it yourself and you probably don’t manage the final artifacts yourself. Ultimately the software supply chain is about the trustworthiness of dependencies, and about the fact that, for now, you have little choice but to blindly trust everyone and everything in the supply chain for your own software.
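To make points (1) and (2) concrete, here is a minimal Python sketch that walks a dependency graph and separates direct from indirect dependencies. The graph and every package name in it are hypothetical, invented purely for illustration; a real resolver would read this information from a lockfile or a repository API.

```python
from collections import deque

# Hypothetical dependency graph: package -> its direct dependencies.
DEPENDENCIES = {
    "my-app": ["web-framework", "http-client"],
    "web-framework": ["template-engine", "http-client"],
    "http-client": ["tls-lib"],
    "template-engine": [],
    "tls-lib": [],
}


def transitive_dependencies(graph, root):
    """Return every package reachable from root, excluding root itself."""
    seen = set()
    queue = deque(graph.get(root, []))
    while queue:
        package = queue.popleft()
        if package in seen:
            continue  # already visited via another path
        seen.add(package)
        queue.extend(graph.get(package, []))
    seen.discard(root)  # guard against cycles that lead back to root
    return seen


direct = set(DEPENDENCIES["my-app"])
everything = transitive_dependencies(DEPENDENCIES, "my-app")
indirect = everything - direct

print("direct:  ", sorted(direct))
print("indirect:", sorted(indirect))
```

Even in this toy graph, half of what my-app ends up trusting was never chosen by its author.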
SLSA
A good place to start your journey is to learn about “Supply chain Levels for Software Artifacts”, better known as SLSA (pronounced “Salsa”). SLSA introduces a simple model for how software is written, stored, built, distributed and consumed. It then lays out attacks on various points in this model, along with countermeasures to the attacks. These are bundled into 4 levels of increasing implementation difficulty.
To learn about SLSA, start with the Introduction. Then read the Terminology, Threats and Mitigations and the level Requirements. There are other documents on the SLSA site you can read, but I think these are the essential ones.
Becoming familiar with SLSA will give you a helpful mental model of a basic software supply chain. Now let’s zoom in on one topic area: package repositories.
Package repositories
In Ruby-land, this means rubygems.org, the beloved and long-running repository of RubyGems. But it shares many problems, threats and risks with its cousins in other language ecosystems, such as PyPI (Python), npm (JavaScript) or Maven Central (Java). If you use open source software, you are almost certainly using such a repository. The security of the repository, and the integrity and authenticity of individual packages, affect your own security.
Threats and Attacks
What are some of these threats? Academics have begun to till the soil and produced very helpful literature for understanding the threats facing repositories.
A recent and very thorough publication is Taxonomy of Attacks on Open-Source Software Supply Chains by Ladisa et al. In this paper the authors enumerate a complete attack tree for software packages distributed through software repositories, running from social engineering attacks on package authors (e.g., talking them into handing over maintainership to a malicious party) through to attacks on the repository itself. There is a lot of conceptual overlap with SLSA, but this paper really gets into the nitty-gritty. The taxonomy itself is starting to be used for efforts to collect statistics on attacks, so expect to see more of it.
An earlier paper from the same research group was the entertainingly-named Backstabber’s Knife Collection: A Review of Open Source Software Supply Chain Attacks by Ohm et al. Here the emphasis was more on data about attacks than on a taxonomy derived from both attacks and threat modeling exercises. This paper was influential in forming Shopify’s supply chain strategy and is still worth reading.
Also influential was Towards Measuring Supply Chain Attacks on Package Managers for Interpreted Languages by Duan et al. This paper set out to study the incidence of malicious packages that had been uploaded to software package repositories, using a custom analysis pipeline that looked for patterns of malicious behavior. It shaped our thinking about the key attacks that occur in practice. The first is typosquatting or combosquatting, where a package is named so that it can be confused with a legitimate package (e.g., rials instead of rails). The second is account takeover, where a legitimate maintainer’s account is attacked, allowing the attacker to then upload a malicious release of legitimate packages. The third and final attack is compromise of the package repository itself. Without drowning you in detail, Shopify has been working on all of these risks.
Countermeasures
So what does one do about these attacks? For typo/combosquatting things are still raw. Multiple repositories have tried various schemes for detecting malicious packages, but results are spotty and false-positive rates are high. This is very much TBD.
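To give a feel for why this is hard, here is a deliberately naive sketch of one possible scheme: flag any new package name that is within a small edit distance of a popular one. This is my own illustration, not a description of what any repository actually runs; the POPULAR list and the max_distance threshold are assumptions chosen for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


# Hypothetical shortlist of popular package names to protect.
POPULAR = ["rails", "rack", "nokogiri"]


def possible_typosquats(name, popular=POPULAR, max_distance=2):
    """Return the popular names that `name` could be confused with."""
    return [
        target for target in popular
        if target != name and levenshtein(name, target) <= max_distance
    ]


print(possible_typosquats("rials"))  # close to "rails"
print(possible_typosquats("rails"))  # the real thing is not flagged
```

A check like this flags plenty of innocent short names that happen to sit near a popular one, and it does nothing for combosquatting, where the malicious name contains the legitimate one plus extra words. That is roughly the false-positive problem in miniature.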
For account takeovers, the main answer is: enable Multi-factor Authentication (MFA)! MFA means that you add an additional way of proving your identity alongside username and password (e.g. providing a timed code from an app on your phone, or using a hardware security token). Shopify has invested heavily in improving the state of MFA in rubygems.org. We helped drive a policy that required MFA for the owners of gems with more than 180 million total downloads. You can set up MFA for your account easily, and you can even require all owners of a gem to use MFA when pushing a gem. Please consider doing so — in fact, turn on MFA everywhere you can.
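If you are curious what the "timed code from an app on your phone" actually is, it is usually a time-based one-time password (TOTP, RFC 6238). Below is a minimal sketch using only the Python standard library. It illustrates the algorithm and is not how rubygems.org implements it; the secret in the usage lines is the published RFC test secret, never a real one.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, digits: int = 6, step: int = 30) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1 variant)."""
    # The moving factor is the number of `step`-second windows since the epoch.
    counter = unix_time // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low 4 bits of the last byte pick
    # a 4-byte window, and the top bit of that window is masked off.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


# Test secret and timestamp taken from RFC 6238, Appendix B.
rfc_secret = b"12345678901234567890"
print(totp(rfc_secret, 59, digits=8))  # RFC 6238 lists 94287082 for this input
```

Both your phone and the server hold the same secret and the same clock, so they compute the same code without ever sending the secret over the wire. A real verifier would also accept adjacent time windows to tolerate clock drift and would compare codes in constant time.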
Our latest focus in MFA is to introduce WebAuthn support. WebAuthn is now the basis for the Passkey effort being rolled out by Google, Apple and Microsoft. The Guide to Web Authentication page is a decent introduction to the topic, all the way to explaining how to implement it yourself.
But the big scary thing to worry about is repository compromise. In this scenario an attacker either takes control of the repository application (and by extension its database), or they take control of the bucket which contains the raw package files, or both. In any case, they are now in a position to upload malicious packages which look legitimate, or to modify legitimate packages to be malicious, or to delete packages entirely. None of these are good outcomes.
The first countermeasure folks are thinking about is signing packages and publishing those signatures. The distinction between the two is important: package signing has been around for a long time but very few people bother to do it because it’s finicky and fragile. The big development in this space has been the emergence of Sigstore, which provides “keyless signing”.
Sigstore is notorious for being difficult to grok until the “aha!” moment happens and you get why it’s such a big deal. You can glean some amount of information from the website but the best in-depth introduction I’ve seen is Sigstore: Software Signing for Everybody by Newman, Meyers & Torres-Arias. Once you’ve gotten the basics down you can choose to dig more deeply into the underlying concepts, particularly OAuth2/OIDC and transparency logs. For OAuth2/OIDC the best source I’ve come across is OAuth 2 in Action by Richer & Sanso. It’s book-length but does a great job of walking through the protocols and how you can implement them. But probably more interesting is the concept of a transparency log, for which the first major application was certificate transparency. How CT fits into the wider Web PKI ecosystem is a good guide on how it works for certificates; the ideas and mechanisms are very similar for software signing.
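The core data structure behind a transparency log is a Merkle tree: the log publishes a single root hash, and anyone can be handed a short proof that a particular entry is included under that root. Here is a minimal sketch in the style of RFC 6962 (the certificate transparency spec). It is a teaching aid under simplifying assumptions: each proof step carries an explicit left/right flag, whereas the RFC derives direction from the leaf index and tree size, and the log entries are made-up strings.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    # RFC 6962 prefixes leaves with 0x00 and interior nodes with 0x01
    # so that a leaf can never be mistaken for an interior node.
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def split_point(n: int) -> int:
    """Largest power of two strictly less than n (for n >= 2)."""
    k = 1
    while k * 2 < n:
        k *= 2
    return k


def merkle_root(entries):
    if not entries:
        raise ValueError("this sketch does not handle the empty tree")
    if len(entries) == 1:
        return leaf_hash(entries[0])
    k = split_point(len(entries))
    return node_hash(merkle_root(entries[:k]), merkle_root(entries[k:]))


def inclusion_proof(entries, index):
    """Return [(sibling_hash, sibling_is_left), ...] from leaf up to root."""
    if len(entries) == 1:
        return []
    k = split_point(len(entries))
    if index < k:
        return inclusion_proof(entries[:k], index) + [(merkle_root(entries[k:]), False)]
    return inclusion_proof(entries[k:], index - k) + [(merkle_root(entries[:k]), True)]


def verify_inclusion(entry, proof, root):
    current = leaf_hash(entry)
    for sibling, sibling_is_left in proof:
        current = node_hash(sibling, current) if sibling_is_left else node_hash(current, sibling)
    return current == root


# Hypothetical log entries, e.g. one record per published signature.
log = [b"gem-a 1.0", b"gem-b 2.3", b"gem-c 0.1", b"gem-a 1.1", b"gem-d 4.0"]
root = merkle_root(log)
proof = inclusion_proof(log, 3)
print(verify_inclusion(b"gem-a 1.1", proof, root))       # genuine entry
print(verify_inclusion(b"gem-a 1.1-evil", proof, root))  # tampered entry
```

The verifier only needs the entry, the proof and the root, not the whole log, which is what makes it practical for clients to check that a signature really was logged publicly.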
But package signing doesn’t give all the guarantees you might want. In particular it only protects the integrity and authenticity of a single version of a single package at a time. If you want to protect the whole repo as a single unit, you need to reach for The Update Framework, aka TUF. Like Sigstore, it is a little notorious for being difficult to grok at first glance. My advice is to bang your head against the official documentation for a while, try to sketch out how the key hierarchy works, then finally read Dan Lorenc’s The Update Framework and You to pull the pieces together. TUF is tough, so don’t worry if you need to make a few passes at it.
Conclusion
This reading list is wildly incomplete. There are many topics I haven’t touched on (e.g., dependency confusion) and many great readings have been left out. But hopefully I’ve given you enough of a taste, and enough background knowledge, to continue your supply chain journey on your own. If you’d like to keep reading, startup Chainguard have published their own reading list.
I’ll leave you, though, with one more recommendation: Security Engineering, 3rd Ed by Ross Anderson. It’s a long read, but very rewarding. It provides the best high-level survey of security as a total subject found anywhere. If you do no other security reading in your career, at least read this one.