It sounds neat, but I am uncomfortable with a central CA (Fulcio) and central log (Rekor). And I trust OIDC providers about as far as I can throw them. Granted, the whole point of a central audit log is to make misbehaviour apparent, but it still strikes me as the wrong direction.
I don’t have a useful proposal for a decentralised version, so I’m just kvetching at this point.
Also, neither X.509 nor JSON is great. We can do better. We should do better.
you can deploy your own fulcio & rekor.
Has anyone implemented this end-to-end? This seems production ready for smaller shops where it's feasible for developers to sign artifacts individually. For a system where you'd want CI to publish artifacts, and then use the k8s policy controller to only run verified artifacts, it seems incomplete.
It appears the reason to include this system in a toolchain would be to meet compliance requirements, but even the GCP, AWS, and Azure implementations of artifact signing & verification are in beta.
> Has anyone implemented this end-to-end?
Yes; I (along with a bunch of other fantastic folks) implemented it end-to-end for both Homebrew[1] and PyPI[2]. This is at a "lower" level than most corporate uses, however: the goal with integrating Sigstore into these OSS ecosystems is not to build up complex verifiable policies (which OSS maintainers don't want to deal with), but to enable signing with misuse-resistant machine identities.
[1]: https://blog.trailofbits.com/2023/11/06/adding-build-provena...
[2]: https://blog.pypi.org/posts/2024-11-14-pypi-now-supports-dig...
Are GitHub ID tokens misuse-resistant, though? They seem like a very weak machine identity.
1. It's a replayable bearer token, not a proof of possession. A compromised action (which 99% of people do not pin to specific hashes) could exfiltrate an ID token and use it for 15 minutes.
2. There is no proof of provenance of the build machine attached to it at all. No attestation about machine state. The only thing you know is "signed by GitHub", which doesn't really tell you anything interesting. Given Microsoft's track record of security vulnerabilities in Azure regarding cross-tenant contamination, I'd like to see a stronger attestation statement here.
Minimally, this provenance stuff should be built on top of some proof-of-possession mechanism where a challenge is generated and the builder signs the challenge with its machine identity key.
Then, ideally, have an attestation statement that shows you the provenance of the entire machine (what packages, dm-verity hashes, whatever) together with that public key.
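To illustrate the proof-of-possession part, here's a minimal sketch assuming the builder holds an Ed25519 machine identity key and the verifier hands it a fresh challenge (this is not an existing Sigstore or GitHub API, just the shape of the idea):

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"fmt"
)

func main() {
	// Builder's machine identity key; in practice this would live in a
	// TPM or other hardware-backed store rather than in memory.
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}

	// Verifier generates a fresh random challenge (nonce).
	challenge := make([]byte, 32)
	if _, err := rand.Read(challenge); err != nil {
		panic(err)
	}

	// Builder proves possession of the key by signing the challenge;
	// a bearer token can be exfiltrated, but this signature cannot be
	// produced without the key.
	sig := ed25519.Sign(priv, challenge)

	// Verifier checks the signature against the builder's public key.
	// The missing piece is an attestation statement binding pub to a
	// measured machine state, as described above.
	fmt.Println("proof of possession ok:", ed25519.Verify(pub, challenge, sig))
}
```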
Sure, what GitHub has now is better than nothing. And an attack is obvious in hindsight and clearly visible in the transparency log. But it's definitely not a misuse-resistant machine identity. We need something better for this, IMO.
it's not just a random token "signed by github"; it's a token containing the runtime context it was requested in (repo, branch / commit, build, etc)
see the fields in https://docs.github.com/en/actions/security-for-github-actio...
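for illustration, a rough sketch that just decodes the payload of such a token and prints a few of the documented claims — a real consumer must verify the JWT signature against github's JWKS before trusting any of this:

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
	"os"
	"strings"
)

// a few of the claims github puts in its actions oidc tokens; see the
// docs linked above for the full list.
type actionsClaims struct {
	Issuer         string `json:"iss"`
	Subject        string `json:"sub"`
	Repository     string `json:"repository"`
	Ref            string `json:"ref"`
	SHA            string `json:"sha"`
	Workflow       string `json:"workflow"`
	JobWorkflowRef string `json:"job_workflow_ref"`
}

func main() {
	// token passed as the first argument; NOTE: no signature verification
	// here, this only inspects the payload.
	parts := strings.Split(os.Args[1], ".")
	if len(parts) != 3 {
		panic("not a JWT")
	}
	payload, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		panic(err)
	}
	var c actionsClaims
	if err := json.Unmarshal(payload, &c); err != nil {
		panic(err)
	}
	fmt.Printf("issuer=%s repo=%s sha=%s job_workflow_ref=%s\n",
		c.Issuer, c.Repository, c.SHA, c.JobWorkflowRef)
}
```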
the level of attestation you want (essentially bound to tpms) would probably be very difficult to provide given how all sorts of images run in a typical ci pipeline.
Oh, so this is where the idea of signing Python packages with GitHub Actions comes from...
From the bottom of my heart, I wish only the worst things for you in your programming career. Yes, I know that it's still possible to publish packages w/o GitHub, but the technical aspect alone will not convince people in organizations which set policies for how to do this stuff. So, the technical possibility alone doesn't matter. Now a bunch of organizations which advertise themselves as free / OSS have to eat MS proprietary garbage and be grateful... thanks to heroes like you.
What you all should have done was figured out how to use that very mysterious and new technology called "sudo", so that Homebrew doesn't add user-writable directories to user's paths, thus enabling anything they might run on their computer to modify binaries the user might run later, or anyone who sits down at their system unsupervised to do the same.
Apart from the snark (which is unwarranted), I can't even parse what you're saying.
(Mentioning sudo in the context of Homebrew suggests that you're one of those incoherent threat model people, so I'm going to assume it's that. So I'll say what Homebrew's maintainers have been saying for years: having a user writable Homebrew prefix is no more or less exploitable in the presence of attacker code execution than literally anything else. The attacker can always modify your shell initialization script, or your local Python bin directory, or anything else.)
I'm not much of a Mac user, but I'm guessing the parent comment wanted a read-only install path so an accidental/malicious rm -rf only affects the user's data and not the installed programs?
Nothing to do with exploits as part of Homebrew etc.
> I'm not much of a Mac user, but I'm guessing the parent comment wanted a read-only install path so an accidental/malicious rm -rf only affects the user's data and not the installed programs?
That's one possible interpretation, but it's at odds with how most people use the `$PATH` anyways -- there's lots of local read-writable stuff on it in typical cases. And of course, even without that, persistence is trivial for an attacker with local code execution.
End-to-end it would require something like a web-of-trust or similar. There is little benefit in knowing that your package was definitely built by GitHub Actions definitely from the code that definitely came from the fingers of the random guy who maintains that particular tool.
Unless you have some trust relationship with the author, or with someone that audited the code, the whole cryptographically-authenticated chain hangs from nothing.
Tools like Crev did a lot of work in that area, but it never really took off; people don't want to think about trust: https://github.com/crev-dev/cargo-crev
yes, i've implemented it in multiple companies. cosign supports using generated keys and kms services, that's been pretty stable and usable for a long time. keyless signing is different and you need to think a bit more carefully about what you're trusting.
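as a rough sketch of that key-based flow (the wrapper, file names, and artifact are made up; with a kms you'd pass e.g. an awskms:// or gcpkms:// reference as the key instead of a file):

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// signAndVerify signs an artifact with a file-based cosign key and then
// verifies the detached signature -- roughly what a ci job would run.
// The key password is expected in the COSIGN_PASSWORD environment variable.
// Depending on your cosign version and whether you use the public
// transparency log, extra flags (e.g. --tlog-upload=false on signing,
// --insecure-ignore-tlog on verification) may be needed.
func signAndVerify(artifact string) error {
	sign := exec.Command("cosign", "sign-blob",
		"--yes", // skip interactive confirmation prompts in ci
		"--key", "cosign.key",
		"--output-signature", artifact+".sig",
		artifact)
	sign.Stdout, sign.Stderr = os.Stdout, os.Stderr
	if err := sign.Run(); err != nil {
		return fmt.Errorf("signing failed: %w", err)
	}

	verify := exec.Command("cosign", "verify-blob",
		"--key", "cosign.pub",
		"--signature", artifact+".sig",
		artifact)
	verify.Stdout, verify.Stderr = os.Stdout, os.Stderr
	return verify.Run()
}

func main() {
	if err := signAndVerify("artifact.tar.gz"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```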
I recently implemented a software updating system using [The Update Framework](https://theupdateframework.io/) directly, with [go-tuf](https://github.com/theupdateframework/go-tuf). It required a lot of design work around how we were going to do package management on top of using it for a secure updating system. This is because TUF is designed so that existing package management systems can adopt it and integrate it into what they already have. So TUF is very unopinionated and flexible.
Given how TUF made it particularly hard to implement a system from scratch: how was your experience using Sigstore? Is it designed more around building systems from scratch? I.e., is it more opinionated?
Thanks.
TUF is much more comprehensive than what sigstore/cosign offers. at the core, sigstore/cosign just provides the primitives for signing a blob / container and maybe some extra metadata, and for verifying the blob / container / metadata. there are other integrations that will also attach and sign an SBOM etc, but that's not necessary, so you can build something very simple such as: artifacts are only signed by this key when the build runs via ci on master, and deployments must run using artifacts signed with said key.
sigstore also includes a transparency log which you can check the signature against, but it's not really necessary: it's good for public things, and you probably don't need it for private / company stuff.
I designed a system using Sigstore where the signing key is in a secret store, and the CI shells out to the cosign CLI to perform the signing. Is this an antipattern?
For verification, did you use the policy controller in kubernetes? Or are you manually performing the verification at runtime?
i used OPA in one org, and kyverno in another for verifying (reused whichever was already in place for other purposes).
our teams always chose to go with cloud kms services for the signing keys; we thought they offered stronger access controls and less need to revoke / rotate keys when access changes (e.g. a team member leaves).
It seems really difficult to actually use it. For instance, a standard Linux distro probably has thousands of packages, components, etc. How can you verify all of them? Even if you can, does it defend against attacks like xz, where the trusted source itself is compromised?
It protects against someone making rogue builds - it should be obvious when a build is made using valid keys. So if you steal my keys, you won't be able to covertly make a build and get one user of mine to trust it without publishing the build. If you publish it, everyone knows, and can try and see where it came from. It won't prevent another xz, but it can help against directed attacks.
it's not much more difficult (maybe even easier) than the gpg signing / checking that distros generally like to do.
with gpg, you get a root set of public keys that you want to trust. with sigstore, depending on the signing method, you either trust public keys, or identities (some oauth2 identity provider, like email, or your ci system).
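for example, identity-based ("keyless") verification of a container built by github actions looks roughly like this — the org, repo, and image are placeholders:

```go
package main

import (
	"os"
	"os/exec"
)

func main() {
	// instead of a public key, the trust anchor here is an identity:
	// a workflow in a particular github repo, attested by github's
	// oidc issuer. values below are placeholders.
	cmd := exec.Command("cosign", "verify",
		"--certificate-identity-regexp", "^https://github.com/example-org/example-repo/",
		"--certificate-oidc-issuer", "https://token.actions.githubusercontent.com",
		"ghcr.io/example-org/example-image:latest")
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		os.Exit(1)
	}
}
```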
Somewhat adjacent question: are there people working on ways to verify that a particular server or API backend is running the specific signed release that is open sourced? Can a company somehow cryptographically prove to its users that the running build is derived from the source, unmodified?
Yes. My colleagues and I have been working on it (and related concepts) for six years.
glasklarteknik.se
system-transparency.org
sigsum.org
tillitis.se
This presentation explains the idea and lists similar projects.
https://youtu.be/Lo0gxBWwwQE
You can get most of the way there with something like the SLSA/BCID framework, with the final artifact including some trusted provenance from an attested builder. You could go further and aim for full reproducibility on top of the provenance, but reproducible builds across different environments can get messy fast if you're looking to independently build and achieve the same result. Either way the end result is you have some artifact that you reasonably trust to represent some specific source input (ignoring the potential for backdoored compiler or other malicious intermediate code generation step).
Now for the last mile, I'll admit I'm not particularly well-versed on the confidential compute side of things, so bridging the gap from trusted binary to trusted workload is something I can only speculate wildly on. Assuming you have a confidential compute environment that allows for workload attestation, I imagine that you could deploy this trusted binary and record the appropriate provenance information as part of the initial environment attestation report, then provide that to customers on demand (assuming they trust your attestation service).
Azure's attestation service could be an example of a confidential service where you can find the binary measurement. It is a good example, as you need to trust that service to attest your confidential workloads. The problem, obviously, is that it is not open source, but they have some samples for dealing with their measurements.
https://github.com/Azure-Samples/microsoft-azure-attestation...
^^^ note the "Looks up for the MAA x509 extension;" part in the readme.
You can see their attestations in the JWT signing certificates.
In addition to the enclave routes, I have a proposal to build this with AWS Lambda as a poor man’s attestation: https://github.com/captn3m0/ideas?tab=readme-ov-file#verifia...
You can do this with e.g. EC2 enclaves. Of course that's kind of begging the question, since you need to trust the enclaves.
That's what remote attestation in Intel SGX does. There's similar features in other platforms as well.
Yes, Intel SGX gives you the enclave measurement, which you could reproduce (if the source code builds reproducibly); the measurement can be verified against Intel keys to prove it came from that hardware. Similarly, AMD SEV-SNP gives you that; it is preferred to SGX due to the ability to run whole VMs as opposed to smaller applications.
AWS publishes their firmware image as open source so you can reproduce it and compare the measurements on their Nitro instances: https://github.com/aws/uefi
Azure has confidential compute offerings as well; their attestation, MHSM, and ledger services rely on it.
But it is easy to talk about confidential compute and the link between the measurement and the source code. Such a link does not exist in regular non-encrypted services, and you basically need to trust the service provider about the proofs they give you.
https://xcancel.com/_markel___/status/1828112469010596347#m
https://sgx.fail
See Keylime for this.
Detecting physical ingress in a co-location server is not uncommon after contacting political representatives in some countries. It is wise to have password-protected SSL certs as the bare minimum non-resettable tripwire, close monitoring of the HDD/SSD drives' S.M.A.R.T. firmware power-cycle counter, and of course an encrypted partition for logs and other mutable/sensitive content. Note that, for performance, a "sudo debsums -sac" command along with other tripwire software can audit unencrypted system binaries efficiently. Most modern ephemeral malware (on Android especially) is not written to disk, to avoid forensic audits assigning accountability, as the chance of re-infection is higher if you hide the exploit methodology.
Folks should operate like someone already has a leaked instance of their key files. In general, an offline self-signing certificate authority issuing client/peer certs is also important, as on rare occasion one can't trust 3rd parties not to re-issue certs for Google/Facebook/GitHub etc. to jack users.
Eventually, one should localize the database design to specific users and embed user-action telemetry into the design. I.e., damage or hostile activity is inherently limited to a specific user's content, sanity-checking quota systems limit the damage they can cause, and a windowed data lifecycle limits the credentials to read-only or does garbage collection after some time.
In general, the RabbitMQ AMQP-over-SSL client signed-cert credential system has proven rather reliable. Erlang/Elixir is far from perfect, but it can be made fairly robust with firewall rules.
Good luck, YMMV of course... =3
Does this help when a project changes ownership, or in cases like the xz backdoor?
Transparency does not prevent it, but rather adds an additional anchor that makes it harder to spoof packages/binaries and deters it, because it will be publicly logged. Somebody still needs to verify that all is good; e.g., if a PR in the official repo adds some malicious code (think xz), it might still get published and logged in this transparency log system.
JSR supports sigstore https://jsr.io/docs/trust