CI: Perform a best-effort scan of the debug artifacts during release validation #3374

henrymercer · 2025-12-17T11:44:42Z

Produce a best effort warning if sensitive environment variables are detected in debug artifacts. This is not reliable enough for production but the intent is that the test will run as part of release validation when the bundle update PR is opened.

Risk assessment

For internal use only. Please select the risk level of this change:

Low risk: Changes are fully under feature flags, or have been fully tested and validated in pre-production environments and are highly observable, or are documentation or test only.

Which use cases does this change impact?

Workflow types: n/a
Products: n/a

Environments:

Testing/None - This change does not impact any CodeQL workflows in production.

How did/will you validate this change?

Unit tests - I am depending on unit test coverage (i.e. tests in .test.ts files).
End-to-end tests - I am depending on PR checks (i.e. tests in pr-checks).

If something goes wrong after this change is released, what are the mitigation and rollback strategies?

Rollback - Change can only be disabled by rolling back the release or releasing a new version with a fix.

How will you know if something goes wrong after this change is released?

The CI check should fail if the artifact scan starts failing.

Are there any special considerations for merging or releasing this change?

No special considerations - This change can be merged at any time.

Merge / deployment checklist

Confirm this change is backwards compatible with existing workflows.
Consider adding a changelog entry for this change.
Confirm the readme and docs have been updated if necessary.

Copilot

Pull request overview

This PR adds a best-effort scan of debug artifacts to detect potential GitHub tokens before they are uploaded. The scan runs only in test mode and is designed to catch sensitive data leakage early during release validation, though it is explicitly not reliable enough for production use.

Key Changes:

New artifact-scanner.ts module that recursively scans files and archives for GitHub token patterns
Integration with debug artifact upload to scan artifacts when in test mode
Testing utilities updated to support optional console logging
GitHub workflow steps added to verify scan completion
Custom action to validate that the scan ran successfully

Reviewed changes

Copilot reviewed 10 out of 14 changed files in this pull request and generated no comments.

Show a summary per file

File	Description
`src/artifact-scanner.ts`	New module implementing token pattern scanning with recursive archive extraction
`src/debug-artifacts.ts`	Integrates artifact scanner before upload when in test mode
`src/testing-utils.ts`	Adds optional `logToConsole` parameter to `getRecordingLogger`
`src/artifact-scanner.test.ts`	Unit tests for the artifact scanner functionality
`src/util.ts`	Adds `isInTestMode()` helper function
`.github/workflows/*.yml`	Adds verification steps to test workflows
`.github/actions/verify-debug-artifact-scan-completed/*`	Custom action to verify scan completion
`lib/*.js`	Generated JavaScript from TypeScript compilation

esbena

I have skimmed the code. I don't think it warrants a deep review given its best-effort intentions.

I hoped more of this could have been offloaded to a library.

I worry a bit about the extract/scan/.../delete sequence and limited disk space, but since we are the creators of the asset, we are probably fine in practice. A streaming scan would be preferable.

Hacky solution proposal: pipe all assets through <source> | tar x...... | grep <patterns>. But if we already have something that works, then I can approve without it.

The changes to the generated files seem benign, but they are unexpected. Is that the norm due to non-determinism these days?

esbena · 2025-12-17T13:04:27Z

src/artifact-scanner.ts

+
+  // Check if it's an archive file and recursively scan it
+  const fileName = path.basename(fullPath).toLowerCase();
+  const isArchive =


have you looked at all extensions in our assets in practice?
Haven't we recently played around with something like .xz for instance?

find <debug artifacts> -type f | sed 's/.*\.//' | sort | uniq -c | sort -rn 3284 gz 2872 dep 222 js 111 log 52 h 49 yml 39 json 9 qls 8 zst 8 zip 4 trap 4 set 4 odep 3 properties 3 orig 3 env 3 args 2 yaml 2 s 2 rb 2 py 2 manifest 2 list 2 link 2 kt 2 jsonl 2 java 2 go 2 codeql-lock 2 c

I pushed a commit to extract zstd archives too.

henrymercer · 2025-12-17T15:19:40Z

I hoped more of this could have been offloaded to a library.

For extracting archives, I wanted to use system utilities since this is only going to be run in CI. I'm open to suggestions if you have a library in mind to do the job here, but I think the implementation is fairly slim.

I worry a bit about the extract/scan/.../delete sequence and limited disk space, but since we are the creators of the asset, we are probably fine in practice. A streaming scan would be preferable.

If we start seeing failures, it would probably be a good idea to convert to a streaming scan, or ignore things like the source archive. I think since this is CI-only it doesn't necessarily make sense front-loading that work.

Hacky solution proposal: pipe all assets through <source> | tar x...... | grep <patterns>. But if we already have something that works, then I can approve without it.

There are some features like tracking the file in which the token was found, what type of token it was, and not relying on having bsdtar to extract zips. Once we add these, I think we're at the point where I'd rather maintain a JS program than a shell script :)

The changes to the generated files seem benign, but they are unexpected. Is that the norm due to non-determinism these days?

Unfortunately we are bundling this artifact scanner in order to run it before we upload debug artifacts in CI. It's behind the isInTestMode() check, which requires a specific environment variable to be set, but it is being included in the production artifacts.

henrymercer added 11 commits December 17, 2025 10:25

Add artifact scanner

0c8bfea

Add simple artifact scanner for tests only

5459b98

Use artifact scanner in debug artifacts PR checks

f28848a

Ensure .gz files are extracted too

f2ccf3b

Add regression test for artifact scanner

488c1f1

Slim down test debug artifacts

de17262

Suppress debug logs for artifact scanner test

da77f9f

Avoid logging each extract call

3b94cfe

Verify using post step

faf6d35

Merge branch 'main' into henrymercer/scan-debug-artifacts

6bc6217

Bump timeout on Windows

3322491

Copilot AI review requested due to automatic review settings December 17, 2025 11:44

henrymercer requested a review from a team as a code owner December 17, 2025 11:44

github-actions bot added the size/L May be hard to review label Dec 17, 2025

Copilot started reviewing on behalf of henrymercer December 17, 2025 11:45 View session

Skip slow test on Windows

6b5763e

Copilot AI reviewed Dec 17, 2025

View reviewed changes

Run testing Action using Node 24

7673a2d

esbena reviewed Dec 17, 2025

View reviewed changes

henrymercer force-pushed the henrymercer/scan-debug-artifacts branch from c2d74f5 to 1b34478 Compare December 17, 2025 15:21

Extract zstd files too

ac6c41b

henrymercer force-pushed the henrymercer/scan-debug-artifacts branch from 1b34478 to ac6c41b Compare December 17, 2025 15:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

CI: Perform a best-effort scan of the debug artifacts during release validation #3374

CI: Perform a best-effort scan of the debug artifacts during release validation #3374

Uh oh!

henrymercer commented Dec 17, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

esbena left a comment

Uh oh!

esbena Dec 17, 2025

Uh oh!

henrymercer Dec 17, 2025

Uh oh!

henrymercer commented Dec 17, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

CI: Perform a best-effort scan of the debug artifacts during release validation #3374

Are you sure you want to change the base?

CI: Perform a best-effort scan of the debug artifacts during release validation #3374

Uh oh!

Conversation

henrymercer commented Dec 17, 2025

Risk assessment

Which use cases does this change impact?

How did/will you validate this change?

If something goes wrong after this change is released, what are the mitigation and rollback strategies?

How will you know if something goes wrong after this change is released?

Are there any special considerations for merging or releasing this change?

Merge / deployment checklist

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Key Changes:

Reviewed changes

Uh oh!

esbena left a comment

Choose a reason for hiding this comment

Uh oh!

esbena Dec 17, 2025

Choose a reason for hiding this comment

Uh oh!

henrymercer Dec 17, 2025

Choose a reason for hiding this comment

Uh oh!

henrymercer commented Dec 17, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants