From b12a1e99c895a7de01d26b50960eac6e01c14fde Mon Sep 17 00:00:00 2001 From: Chad Bentz <1760475+felickz@users.noreply.github.com> Date: Fri, 12 Dec 2025 17:26:12 -0500 Subject: [PATCH 1/7] Create CodeQL performance review prompt Added a prompt for reviewing CodeQL log output focusing on performance issues, including key aspects to analyze and hardware recommendations. --- ...ql_action_log_performance_review.prompt.md | 58 +++++++++++++++++++ 1 file changed, 58 insertions(+) create mode 100644 .github/prompts/codeql_action_log_performance_review.prompt.md diff --git a/.github/prompts/codeql_action_log_performance_review.prompt.md b/.github/prompts/codeql_action_log_performance_review.prompt.md new file mode 100644 index 0000000..c3c8a6e --- /dev/null +++ b/.github/prompts/codeql_action_log_performance_review.prompt.md @@ -0,0 +1,58 @@ +--- +mode: agent +--- +You are reviewing CodeQL log output for performance issues. + +It is critical that you understand key aspects of CodeQL log output that can flag performance issues. Understanding the language being scanned is critical to the performance review process. You should be able to identify the language being scanned, the number of files in the database, and the number of lines of code in the database. You should also be able to identify the time taken to extract the code from the database, build the database, and analyze the code. + +In general, look for the following key aspects in the log output: +- The time taken to extract the code from into the CodeQL database insert format (`Extracting ..` and `Done extracting ..` will be logged for each file) +- The time taken to create/optimize the database indicates size/complexity (`TRAP import`) +- The time taken to analyze the code (each query: `[##/## eval ###ms] Evaluation done; writing results to... `) +- The number of files in the database vs the number of files in the baseline (`CodeQL scanned <# in DB> out of <# in baseline > files ... 
in this invocation`) +- The number of lines of code in the database (if in debug mode `Total lines of user written code in the database`) + +## Agent AI Instructions + +These log files will be huge, instead of reading them line by line - run grep style commands in the cli to investigate the file. + + +## Review Areas + +### Excluding Code + +This is one of the most important aspects of CodeQL performance. Excluding a file from analysis will speed up extraction, database creation, query execution, and result generation. We would expect to see some number of files excluded from the scan. Scanning unit tests or vendored dependencies is often not useful, and can slow down the scan. Any interpreted language or compiled that utilizes `build-mode: none` can take advantage of a `paths-ignore` array in a CodeQL configuration file. + +To analyze this aspect, look for the following key aspects in the log output: +- In general, the number of files in the database vs the number of files in the baseline should not match (`CodeQL scanned <# in DB> out of <# in baseline > files ... in this invocation`) - this indicates no exclusions were made. +- Extractor output `Done extracting /home/runner/work///src/public/static/3rd-party-static/.js (11164 ms)` + - Identifying common 3rd party libraries by name and version can be a good indicator of files that should be excluded from the scan. For example `jquery.3.5.1.js` or `react.16.8.6.js`. These are commonly in a parent folder that indicates all files contained are vendored and should be completely excluded from the scan using a `paths-ignore` array entry in the `codeql.yml` file. For example, `paths-ignore: [ '**/public/static/3rd-party-static/**' ]`. + - Call out any timings > 1000ms for extraction `(11164 ms)` - often times this indicates a large bundled JS file (and other files int he same folder are often Generated or vendored). 
+ + +See also: https://docs.github.com/en/code-security/code-scanning/troubleshooting-code-scanning/analysis-takes-too-long#reduce-the-amount-of-code-being-analyzed-in-a-single-workflow + +### Hardware Recommendations + +The default GitHub runner is 8GB of RAM and 2 CPUs. This is often not enough power for extracting code from large repos or scanning through complex databases. A RAM ~7GB `CODEQL_RAM: 6914` and 2 cores `CODEQL_THREADS: 2` will likely indicate this is running on the default runner. + +The recommended hardware sizes for running CodeQL are based off of lines of code: +- Small (<100 K lines of code) = 8 GB or higher 2 cores +- Medium (100 K to 1 M lines of code) = 16 GB or higher 4 or 8 cores +- Large (>1 M lines of code) = 64 GB or higher 8 cores + +See also: https://docs.github.com/en/code-security/code-scanning/troubleshooting-code-scanning/analysis-takes-too-long#increase-the-memory-or-cores + +`Compiling in one thread due to RAM limits.` is an indication that there is limited RAM available. This is not often critical as the CodeQL bundle is used that includes precompiled queries. + +### Breaking apart monorepos + +CodeQL can detect data flows through the code but once it reaches a process boundary the flow is stopped. This creates a natural separation point for CodeQL scans based on data flows. Creating a CodeQL scan configuration that separates applications by front end (ex: Web.sln) and back end code(ex: API.sln) that are separated by process/network boundaries would be optimial for performance. This would allow for a smaller database to be created and analyzed. The time taken to extract the code from the database, build the database, and analyze the code would all be reduced. This would further enable a decrease in wall-clock scan time by using parallel per-solution scans using an Actions matrix strategy (such that each gets its own runtime and resources). 
It will be important to include your common framework code in each solution so that you get a successful compilation while you further analyze other ways to share code. + +Consider utilizing the https://github.com/advanced-security/monorepo-code-scanning-action that builds scan filters based on the monorepo structure as defined in a `projects.json` to describe the monorepo project structure. Further this will optimize scanning by detectiong which projects have changed on a PR and only scanning those projects. Each project will be analyzed in parallel and the results will be combined into a single report. This will further reduce the time taken to scan the monorepo. + + +To find this scenario - review the extractor logs and identify common project structures that might indicate indivdidual applications that would not have any cross method calls OR data flows. Commonly applications will be organized by various techniques - if any of these appear like good candidates for separation, please call them out: +- monorepo structure (ex: `apps/` or `services/`) +- front end web/api / middle tier api / back end data access +- common project structures (ex: `src/` or `lib/` or `framework/` or `common/`) From 618198c86c951018502422b28cae39c5547b29fc Mon Sep 17 00:00:00 2001 From: Chad Bentz <1760475+felickz@users.noreply.github.com> Date: Fri, 12 Dec 2025 17:29:10 -0500 Subject: [PATCH 2/7] Update .github/prompts/codeql_action_log_performance_review.prompt.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- .github/prompts/codeql_action_log_performance_review.prompt.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/prompts/codeql_action_log_performance_review.prompt.md b/.github/prompts/codeql_action_log_performance_review.prompt.md index c3c8a6e..1115cb6 100644 --- a/.github/prompts/codeql_action_log_performance_review.prompt.md +++ b/.github/prompts/codeql_action_log_performance_review.prompt.md @@ -27,7 +27,7 @@ To analyze 
this aspect, look for the following key aspects in the log output: - In general, the number of files in the database vs the number of files in the baseline should not match (`CodeQL scanned <# in DB> out of <# in baseline > files ... in this invocation`) - this indicates no exclusions were made. - Extractor output `Done extracting /home/runner/work///src/public/static/3rd-party-static/.js (11164 ms)` - Identifying common 3rd party libraries by name and version can be a good indicator of files that should be excluded from the scan. For example `jquery.3.5.1.js` or `react.16.8.6.js`. These are commonly in a parent folder that indicates all files contained are vendored and should be completely excluded from the scan using a `paths-ignore` array entry in the `codeql.yml` file. For example, `paths-ignore: [ '**/public/static/3rd-party-static/**' ]`. - - Call out any timings > 1000ms for extraction `(11164 ms)` - often times this indicates a large bundled JS file (and other files int he same folder are often Generated or vendored). + - Call out any timings > 1000ms for extraction `(11164 ms)` - often times this indicates a large bundled JS file (and other files in the same folder are often Generated or vendored). 
See also: https://docs.github.com/en/code-security/code-scanning/troubleshooting-code-scanning/analysis-takes-too-long#reduce-the-amount-of-code-being-analyzed-in-a-single-workflow From 9af314f5b6dcca43474e19d5735a0ba88e10c77f Mon Sep 17 00:00:00 2001 From: Chad Bentz <1760475+felickz@users.noreply.github.com> Date: Fri, 12 Dec 2025 17:29:19 -0500 Subject: [PATCH 3/7] Update .github/prompts/codeql_action_log_performance_review.prompt.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- .github/prompts/codeql_action_log_performance_review.prompt.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/prompts/codeql_action_log_performance_review.prompt.md b/.github/prompts/codeql_action_log_performance_review.prompt.md index 1115cb6..ebcdd30 100644 --- a/.github/prompts/codeql_action_log_performance_review.prompt.md +++ b/.github/prompts/codeql_action_log_performance_review.prompt.md @@ -47,7 +47,7 @@ See also: https://docs.github.com/en/code-security/code-scanning/troubleshooting ### Breaking apart monorepos -CodeQL can detect data flows through the code but once it reaches a process boundary the flow is stopped. This creates a natural separation point for CodeQL scans based on data flows. Creating a CodeQL scan configuration that separates applications by front end (ex: Web.sln) and back end code(ex: API.sln) that are separated by process/network boundaries would be optimial for performance. This would allow for a smaller database to be created and analyzed. The time taken to extract the code from the database, build the database, and analyze the code would all be reduced. This would further enable a decrease in wall-clock scan time by using parallel per-solution scans using an Actions matrix strategy (such that each gets its own runtime and resources). It will be important to include your common framework code in each solution so that you get a successful compilation while you further analyze other ways to share code. 
+CodeQL can detect data flows through the code but once it reaches a process boundary the flow is stopped. This creates a natural separation point for CodeQL scans based on data flows. Creating a CodeQL scan configuration that separates applications by front end (ex: Web.sln) and back end code(ex: API.sln) that are separated by process/network boundaries would be optimal for performance. This would allow for a smaller database to be created and analyzed. The time taken to extract the code from the database, build the database, and analyze the code would all be reduced. This would further enable a decrease in wall-clock scan time by using parallel per-solution scans using an Actions matrix strategy (such that each gets its own runtime and resources). It will be important to include your common framework code in each solution so that you get a successful compilation while you further analyze other ways to share code. Consider utilizing the https://github.com/advanced-security/monorepo-code-scanning-action that builds scan filters based on the monorepo structure as defined in a `projects.json` to describe the monorepo project structure. Further this will optimize scanning by detectiong which projects have changed on a PR and only scanning those projects. Each project will be analyzed in parallel and the results will be combined into a single report. This will further reduce the time taken to scan the monorepo. 
From 049d9814e95ec58cf47e5c603c1f48b09e2495bd Mon Sep 17 00:00:00 2001 From: Chad Bentz <1760475+felickz@users.noreply.github.com> Date: Fri, 12 Dec 2025 17:29:30 -0500 Subject: [PATCH 4/7] Update .github/prompts/codeql_action_log_performance_review.prompt.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- .github/prompts/codeql_action_log_performance_review.prompt.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/prompts/codeql_action_log_performance_review.prompt.md b/.github/prompts/codeql_action_log_performance_review.prompt.md index ebcdd30..4261c48 100644 --- a/.github/prompts/codeql_action_log_performance_review.prompt.md +++ b/.github/prompts/codeql_action_log_performance_review.prompt.md @@ -52,7 +52,7 @@ CodeQL can detect data flows through the code but once it reaches a process boun Consider utilizing the https://github.com/advanced-security/monorepo-code-scanning-action that builds scan filters based on the monorepo structure as defined in a `projects.json` to describe the monorepo project structure. Further this will optimize scanning by detectiong which projects have changed on a PR and only scanning those projects. Each project will be analyzed in parallel and the results will be combined into a single report. This will further reduce the time taken to scan the monorepo. -To find this scenario - review the extractor logs and identify common project structures that might indicate indivdidual applications that would not have any cross method calls OR data flows. Commonly applications will be organized by various techniques - if any of these appear like good candidates for separation, please call them out: +To find this scenario - review the extractor logs and identify common project structures that might indicate individual applications that would not have any cross method calls OR data flows. 
Commonly applications will be organized by various techniques - if any of these appear like good candidates for separation, please call them out: - monorepo structure (ex: `apps/` or `services/`) - front end web/api / middle tier api / back end data access - common project structures (ex: `src/` or `lib/` or `framework/` or `common/`) From 9deb232e87609fbef45025b769a10787a40164e2 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Fri, 12 Dec 2025 22:33:01 +0000 Subject: [PATCH 5/7] Initial plan From 3d445a55d1a3de567d941f26aebad398cdcb4602 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Fri, 12 Dec 2025 22:39:53 +0000 Subject: [PATCH 6/7] Fix prettier formatting issues in codeql_action_log_performance_review.prompt.md Co-authored-by: felickz <1760475+felickz@users.noreply.github.com> --- ...ql_action_log_performance_review.prompt.md | 28 ++++++++++--------- 1 file changed, 15 insertions(+), 13 deletions(-) diff --git a/.github/prompts/codeql_action_log_performance_review.prompt.md b/.github/prompts/codeql_action_log_performance_review.prompt.md index 4261c48..110c4fc 100644 --- a/.github/prompts/codeql_action_log_performance_review.prompt.md +++ b/.github/prompts/codeql_action_log_performance_review.prompt.md @@ -1,11 +1,13 @@ --- mode: agent --- + You are reviewing CodeQL log output for performance issues. -It is critical that you understand key aspects of CodeQL log output that can flag performance issues. Understanding the language being scanned is critical to the performance review process. You should be able to identify the language being scanned, the number of files in the database, and the number of lines of code in the database. You should also be able to identify the time taken to extract the code from the database, build the database, and analyze the code. 
+It is critical that you understand key aspects of CodeQL log output that can flag performance issues. Understanding the language being scanned is critical to the performance review process. You should be able to identify the language being scanned, the number of files in the database, and the number of lines of code in the database. You should also be able to identify the time taken to extract the code from the database, build the database, and analyze the code. In general, look for the following key aspects in the log output: + - The time taken to extract the code from into the CodeQL database insert format (`Extracting ..` and `Done extracting ..` will be logged for each file) - The time taken to create/optimize the database indicates size/complexity (`TRAP import`) - The time taken to analyze the code (each query: `[##/## eval ###ms] Evaluation done; writing results to... `) @@ -16,43 +18,43 @@ In general, look for the following key aspects in the log output: These log files will be huge, instead of reading them line by line - run grep style commands in the cli to investigate the file. - ## Review Areas ### Excluding Code -This is one of the most important aspects of CodeQL performance. Excluding a file from analysis will speed up extraction, database creation, query execution, and result generation. We would expect to see some number of files excluded from the scan. Scanning unit tests or vendored dependencies is often not useful, and can slow down the scan. Any interpreted language or compiled that utilizes `build-mode: none` can take advantage of a `paths-ignore` array in a CodeQL configuration file. +This is one of the most important aspects of CodeQL performance. Excluding a file from analysis will speed up extraction, database creation, query execution, and result generation. We would expect to see some number of files excluded from the scan. Scanning unit tests or vendored dependencies is often not useful, and can slow down the scan. 
Any interpreted language or compiled that utilizes `build-mode: none` can take advantage of a `paths-ignore` array in a CodeQL configuration file. To analyze this aspect, look for the following key aspects in the log output: + - In general, the number of files in the database vs the number of files in the baseline should not match (`CodeQL scanned <# in DB> out of <# in baseline > files ... in this invocation`) - this indicates no exclusions were made. - Extractor output `Done extracting /home/runner/work///src/public/static/3rd-party-static/.js (11164 ms)` - - Identifying common 3rd party libraries by name and version can be a good indicator of files that should be excluded from the scan. For example `jquery.3.5.1.js` or `react.16.8.6.js`. These are commonly in a parent folder that indicates all files contained are vendored and should be completely excluded from the scan using a `paths-ignore` array entry in the `codeql.yml` file. For example, `paths-ignore: [ '**/public/static/3rd-party-static/**' ]`. + - Identifying common 3rd party libraries by name and version can be a good indicator of files that should be excluded from the scan. For example `jquery.3.5.1.js` or `react.16.8.6.js`. These are commonly in a parent folder that indicates all files contained are vendored and should be completely excluded from the scan using a `paths-ignore` array entry in the `codeql.yml` file. For example, `paths-ignore: [ '**/public/static/3rd-party-static/**' ]`. - Call out any timings > 1000ms for extraction `(11164 ms)` - often times this indicates a large bundled JS file (and other files in the same folder are often Generated or vendored). - See also: https://docs.github.com/en/code-security/code-scanning/troubleshooting-code-scanning/analysis-takes-too-long#reduce-the-amount-of-code-being-analyzed-in-a-single-workflow ### Hardware Recommendations -The default GitHub runner is 8GB of RAM and 2 CPUs. 
This is often not enough power for extracting code from large repos or scanning through complex databases. A RAM ~7GB `CODEQL_RAM: 6914` and 2 cores `CODEQL_THREADS: 2` will likely indicate this is running on the default runner. +The default GitHub runner is 8GB of RAM and 2 CPUs. This is often not enough power for extracting code from large repos or scanning through complex databases. A RAM ~7GB `CODEQL_RAM: 6914` and 2 cores `CODEQL_THREADS: 2` will likely indicate this is running on the default runner. The recommended hardware sizes for running CodeQL are based off of lines of code: -- Small (<100 K lines of code) = 8 GB or higher 2 cores -- Medium (100 K to 1 M lines of code) = 16 GB or higher 4 or 8 cores -- Large (>1 M lines of code) = 64 GB or higher 8 cores + +- Small (<100 K lines of code) = 8 GB or higher 2 cores +- Medium (100 K to 1 M lines of code) = 16 GB or higher 4 or 8 cores +- Large (>1 M lines of code) = 64 GB or higher 8 cores See also: https://docs.github.com/en/code-security/code-scanning/troubleshooting-code-scanning/analysis-takes-too-long#increase-the-memory-or-cores -`Compiling in one thread due to RAM limits.` is an indication that there is limited RAM available. This is not often critical as the CodeQL bundle is used that includes precompiled queries. +`Compiling in one thread due to RAM limits.` is an indication that there is limited RAM available. This is not often critical as the CodeQL bundle is used that includes precompiled queries. ### Breaking apart monorepos -CodeQL can detect data flows through the code but once it reaches a process boundary the flow is stopped. This creates a natural separation point for CodeQL scans based on data flows. Creating a CodeQL scan configuration that separates applications by front end (ex: Web.sln) and back end code(ex: API.sln) that are separated by process/network boundaries would be optimal for performance. This would allow for a smaller database to be created and analyzed. 
The time taken to extract the code from the database, build the database, and analyze the code would all be reduced. This would further enable a decrease in wall-clock scan time by using parallel per-solution scans using an Actions matrix strategy (such that each gets its own runtime and resources). It will be important to include your common framework code in each solution so that you get a successful compilation while you further analyze other ways to share code. - -Consider utilizing the https://github.com/advanced-security/monorepo-code-scanning-action that builds scan filters based on the monorepo structure as defined in a `projects.json` to describe the monorepo project structure. Further this will optimize scanning by detectiong which projects have changed on a PR and only scanning those projects. Each project will be analyzed in parallel and the results will be combined into a single report. This will further reduce the time taken to scan the monorepo. +CodeQL can detect data flows through the code but once it reaches a process boundary the flow is stopped. This creates a natural separation point for CodeQL scans based on data flows. Creating a CodeQL scan configuration that separates applications by front end (ex: Web.sln) and back end code(ex: API.sln) that are separated by process/network boundaries would be optimal for performance. This would allow for a smaller database to be created and analyzed. The time taken to extract the code from the database, build the database, and analyze the code would all be reduced. This would further enable a decrease in wall-clock scan time by using parallel per-solution scans using an Actions matrix strategy (such that each gets its own runtime and resources). It will be important to include your common framework code in each solution so that you get a successful compilation while you further analyze other ways to share code. 
+Consider utilizing the https://github.com/advanced-security/monorepo-code-scanning-action that builds scan filters based on the monorepo structure as defined in a `projects.json` to describe the monorepo project structure. Further this will optimize scanning by detectiong which projects have changed on a PR and only scanning those projects. Each project will be analyzed in parallel and the results will be combined into a single report. This will further reduce the time taken to scan the monorepo. To find this scenario - review the extractor logs and identify common project structures that might indicate individual applications that would not have any cross method calls OR data flows. Commonly applications will be organized by various techniques - if any of these appear like good candidates for separation, please call them out: + - monorepo structure (ex: `apps/` or `services/`) - front end web/api / middle tier api / back end data access - common project structures (ex: `src/` or `lib/` or `framework/` or `common/`) From a6b80a073722fb67dd65fb7483916d990313e5eb Mon Sep 17 00:00:00 2001 From: Chad Bentz <1760475+felickz@users.noreply.github.com> Date: Sat, 13 Dec 2025 22:35:27 -0500 Subject: [PATCH 7/7] Update .github/prompts/codeql_action_log_performance_review.prompt.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- .github/prompts/codeql_action_log_performance_review.prompt.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/prompts/codeql_action_log_performance_review.prompt.md b/.github/prompts/codeql_action_log_performance_review.prompt.md index 110c4fc..ee41dfe 100644 --- a/.github/prompts/codeql_action_log_performance_review.prompt.md +++ b/.github/prompts/codeql_action_log_performance_review.prompt.md @@ -8,7 +8,7 @@ It is critical that you understand key aspects of CodeQL log output that can fla In general, look for the following key aspects in the log output: -- The time taken to extract the code from 
into the CodeQL database insert format (`Extracting ..` and `Done extracting ..` will be logged for each file) +- The time taken to extract the code into the CodeQL database insert format (`Extracting ..` and `Done extracting ..` will be logged for each file) - The time taken to create/optimize the database indicates size/complexity (`TRAP import`) - The time taken to analyze the code (each query: `[##/## eval ###ms] Evaluation done; writing results to... `) - The number of files in the database vs the number of files in the baseline (`CodeQL scanned <# in DB> out of <# in baseline > files ... in this invocation`)
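Note on the series: the prompt asks the agent to use grep-style commands and to call out extraction timings above 1000 ms. A minimal Python sketch of that check follows, assuming extractor lines have the `Done extracting <path> (<n> ms)` shape quoted in the prompt. The function names, the sample paths, and the idea of summing time per parent folder are illustrative assumptions, not part of CodeQL or of this patch, and real log lines may carry prefixes or formats this pattern does not cover.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

# Assumed line shape, taken from the example quoted in the prompt:
#   Done extracting <path> (<n> ms)
DONE_EXTRACTING = re.compile(r"Done extracting (?P<path>\S.*?) \((?P<ms>\d+) ms\)")


def slow_extractions(lines, threshold_ms=1000):
    """Return (path, ms) pairs slower than threshold_ms, slowest first."""
    hits = []
    for line in lines:
        match = DONE_EXTRACTING.search(line)
        if match and int(match.group("ms")) > threshold_ms:
            hits.append((match.group("path"), int(match.group("ms"))))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


def slow_folders(hits):
    """Sum slow extraction time per parent folder (possible paths-ignore candidates)."""
    totals = defaultdict(int)
    for path, ms in hits:
        totals[str(PurePosixPath(path).parent)] += ms
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical sample lines; a real run would iterate over the log file instead.
    sample = [
        "Done extracting /repo/src/vendor/jquery.3.5.1.js (11164 ms)",
        "Done extracting /repo/src/app/index.js (42 ms)",
    ]
    for path, ms in slow_extractions(sample):
        print(f"{ms:>8} ms  {path}")
```

Grouping by parent folder reflects the prompt's observation that slow files often sit in a folder whose other files are also generated or vendored, so the folder rather than the single file is usually the thing to review for a `paths-ignore` entry.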