-
Notifications
You must be signed in to change notification settings - Fork 428
CI: Perform a best-effort scan of the debug artifacts during release validation #3374
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds a best-effort scan of debug artifacts to detect potential GitHub tokens before they are uploaded. The scan runs only in test mode and is designed to catch sensitive data leakage early during release validation, though it is explicitly not reliable enough for production use.
Key Changes:
- New
artifact-scanner.tsmodule that recursively scans files and archives for GitHub token patterns - Integration with debug artifact upload to scan artifacts when in test mode
- Testing utilities updated to support optional console logging
- GitHub workflow steps added to verify scan completion
- Custom action to validate that the scan ran successfully
Reviewed changes
Copilot reviewed 10 out of 14 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
src/artifact-scanner.ts |
New module implementing token pattern scanning with recursive archive extraction |
src/debug-artifacts.ts |
Integrates artifact scanner before upload when in test mode |
src/testing-utils.ts |
Adds optional logToConsole parameter to getRecordingLogger |
src/artifact-scanner.test.ts |
Unit tests for the artifact scanner functionality |
src/util.ts |
Adds isInTestMode() helper function |
.github/workflows/*.yml |
Adds verification steps to test workflows |
.github/actions/verify-debug-artifact-scan-completed/* |
Custom action to verify scan completion |
lib/*.js |
Generated JavaScript from TypeScript compilation |
esbena
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have skimmed the code. I don't think it warrants a deep review given its best-effort intentions.
I hoped more of this could have been offloaded to a library.
I worry a bit about the extract/scan/.../delete sequence and limited disk space, but since we are the creators of the asset, we are probably fine in practice. A streaming scan would be preferable.
Hacky solution proposal: pipe all assets through <source> | tar x...... | grep <patterns>. But if we already have something that works, then I can approve without it.
The changes to the generated files seem benign, but they are unexpected. Is that the norm due to non-determinism these days?
|
|
||
| // Check if it's an archive file and recursively scan it | ||
| const fileName = path.basename(fullPath).toLowerCase(); | ||
| const isArchive = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have you looked at all extensions in our assets in practice?
Haven't we recently played around with something like .xz for instance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
find <debug artifacts> -type f | sed 's/.*\.//' | sort | uniq -c | sort -rn
3284 gz
2872 dep
222 js
111 log
52 h
49 yml
39 json
9 qls
8 zst
8 zip
4 trap
4 set
4 odep
3 properties
3 orig
3 env
3 args
2 yaml
2 s
2 rb
2 py
2 manifest
2 list
2 link
2 kt
2 jsonl
2 java
2 go
2 codeql-lock
2 c
I pushed a commit to extract zstd archives too.
For extracting archives, I wanted to use system utilities since this is only going to be run in CI. I'm open to suggestions if you have a library in mind to do the job here, but I think the implementation is fairly slim.
If we start seeing failures, it would probably be a good idea to convert to a streaming scan, or ignore things like the source archive. I think since this is CI-only it doesn't necessarily make sense front-loading that work.
There are some features like tracking the file in which the token was found, what type of token it was, and not relying on having bsdtar to extract zips. Once we add these, I think we're at the point where I'd rather maintain a JS program than a shell script :)
Unfortunately we are bundling this artifact scanner in order to run it before we upload debug artifacts in CI. It's behind the |
c2d74f5 to
1b34478
Compare
1b34478 to
ac6c41b
Compare
Produce a best effort warning if sensitive environment variables are detected in debug artifacts. This is not reliable enough for production but the intent is that the test will run as part of release validation when the bundle update PR is opened.
Risk assessment
For internal use only. Please select the risk level of this change:
Which use cases does this change impact?
Workflow types: n/a
Products: n/a
Environments:
How did/will you validate this change?
.test.tsfiles).pr-checks).If something goes wrong after this change is released, what are the mitigation and rollback strategies?
How will you know if something goes wrong after this change is released?
The CI check should fail if the artifact scan starts failing.
Are there any special considerations for merging or releasing this change?
Merge / deployment checklist