⚡️ Speed up function `check_formatter_installed` by 29% in PR #1550 (`omni-java-refactor-java-support`) #1554
The optimized code achieves a **28% runtime improvement** (105ms → 81.9ms) by targeting the hot paths in `check_formatter_installed` and `get_language_support_by_common_formatters`, which are called frequently during formatter validation workflows.
## Key Optimizations
### 1. **Eliminated Repeated Import Overhead (64% → 72.6% of time in one hot path)**
The original code imported `get_language_support_by_common_formatters` on every call to `check_formatter_installed`. The optimization caches this import in a module-level variable `_get_language_support_by_common_formatters`, performing the import only once. This transforms a ~596ms import cost spread across 1005 calls into a single ~615ms import, saving significant time in repeated invocations.
### 2. **Fast-Path String Splitting (11.3% → ~0% of total time)**
Replaced expensive `shlex.split()` with a simple `.split()` for commands without quotes or backslashes. Line profiler shows the original `shlex.split(first_cmd)` took 105ms (11.3% of total time). The optimized version checks for special characters first and only falls back to `shlex.split()` when necessary, dramatically reducing parsing overhead for common simple commands.
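The fast path can be sketched as a standalone helper. The function name `split_command` is an assumption for illustration; the condition mirrors the check quoted in the review comments on this PR.

```python
import shlex


def split_command(cmd: str) -> list[str]:
    """Split a command string, using shlex only when quoting or escaping is present."""
    if '"' not in cmd and "'" not in cmd and "\\" not in cmd:
        # Fast path: no quotes or backslashes, so plain whitespace splitting suffices.
        return cmd.split()
    # Slow path: let shlex handle quoted arguments and escapes.
    return shlex.split(cmd)


print(split_command("black --check $file"))
print(split_command('prettier --config "my config.json" $file'))
```

One caveat of this sketch: `str.split()` splits on any Unicode whitespace, while `shlex` treats only space, tab, carriage return, and newline as separators, so the two paths can differ on unusual whitespace characters.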
### 3. **Set-Based Language Detection (30.1% + 22.8% → 12.5% + 8.5% of function time)**
Converted formatter lists to module-level `frozenset` constants (`_PY_FORMATTERS`, `_JS_TS_FORMATTERS`) and used set intersection (`tokens_set & _PY_FORMATTERS`) instead of nested generator expressions with `any()`. This reduces the original 1.84ms + 1.39ms of `any()` overhead (52.9% of `get_language_support_by_common_formatters` time) to just 332μs + 226μs (21% of time).
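A rough illustration of the set-intersection lookup, assuming exact token matches. The constant names come from the description above, but their contents and the `detect_language` helper are assumptions, not the PR's actual values.

```python
from typing import Optional

# Assumed contents; the real constants in the PR may list different formatters.
_PY_FORMATTERS = frozenset({"black", "isort", "ruff"})
_JS_TS_FORMATTERS = frozenset({"prettier", "eslint"})


def detect_language(cmd_tokens: list[str]) -> Optional[str]:
    """Return a language name if any token is a known formatter, else None."""
    tokens_set = set(cmd_tokens)
    # A non-empty intersection is truthy, replacing a nested any(...) scan.
    if tokens_set & _PY_FORMATTERS:
        return "python"
    if tokens_set & _JS_TS_FORMATTERS:
        return "javascript"
    return None


print(detect_language(["black", "--check", "$file"]))
```

Building the frozensets once at import time means each call pays only for one `set()` construction and two intersections.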
### 4. **Lazy Registry Initialization**
Deferred `_ensure_languages_registered()` call until after determining if language detection is possible. When `.js` or `.py` is already in `_EXTENSION_REGISTRY` (common case), the expensive registration is skipped entirely. This prevents the ~694μs registration overhead on every call.
### 5. **Reduced Temporary String Allocations**
Moved `command_str = " ".join(formatter_cmds).replace(" $file", "")` computation from always executing (856μs) to only executing in error paths, eliminating this cost for the success path (~99% of calls).
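As a sketch of moving the string construction into the error path: the `check_installed` helper, its injectable `which` parameter, and the error message text are assumptions for illustration, while the `command_str` expression is the one quoted above.

```python
import shutil
from typing import Callable, Optional


def check_installed(
    formatter_cmds: list[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> tuple[bool, str]:
    """Hypothetical helper: report whether the first command's executable is found."""
    exe = formatter_cmds[0].split()[0]
    if which(exe) is not None:
        # Success path: the message string is never built.
        return True, ""
    # Error path only: build the human-readable command string here.
    command_str = " ".join(formatter_cmds).replace(" $file", "")
    return False, f"Could not find formatter: {command_str}"


print(check_installed(["black --check $file"], which=lambda exe: None))
```

The `which` parameter exists only so the example can be exercised without depending on what is installed on the machine.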
## Test Case Performance
The optimizations excel across diverse scenarios:
- **Simple commands**: Empty/disabled formatters see 5-13% improvements
- **Complex parsing**: Commands with 100+ arguments show **69.6-491% speedups** (e.g., `test_formatter_with_many_arguments`: 1.65ms → 974μs)
- **Repeated calls**: Stress test with 1000 calls benefits from cached import and faster parsing
- **Mixed workloads**: Real-world sequences of formatter checks (50 different formatters) improved 2.85-4.58%
These optimizations are particularly valuable when `check_formatter_installed` is called in CI/CD pipelines, IDE integrations, or batch processing scenarios where formatter validation happens repeatedly across many files.

    def get_language_support_by_common_formatters(formatter_cmd: str | list[str]) -> LanguageSupport | None:
        _ensure_languages_registered()
        language: Language | None = None
        _ensure_called = False

Dead variable: `_ensure_called` is assigned but never read. Remove it.
Suggested change: delete the line `_ensure_called = False`.

    try:
        from codeflash.languages.registry import \
            get_language_support_by_common_formatters as _cached_helper

This catches `Exception` where only `ImportError` is expected. If the import succeeds but the module has a runtime bug, the handler silently swallows the error and the function returns `True`, claiming the formatter is installed. Consider narrowing to `except ImportError:`.
Suggested change: `except ImportError:`

    if isinstance(first_cmd, str):
        if ('"' not in first_cmd) and ("'" not in first_cmd) and ("\\" not in first_cmd):
            cmd_tokens = first_cmd.split()
        else:
            cmd_tokens = shlex.split(first_cmd)
    else:
        cmd_tokens = [first_cmd]

Since `formatter_cmds: list[str]`, `first_cmd` is always a `str`, making the `isinstance` check always true and the `else` branch unreachable (mypy flags this as `[unreachable]`). The original code had the same dead branch in ternary form; consider removing it for clarity:

    if ('"' not in first_cmd) and ("'" not in first_cmd) and ("\\" not in first_cmd):
        cmd_tokens = first_cmd.split()
    else:
        cmd_tokens = shlex.split(first_cmd)

## PR Review Summary
### Prek Checks
✅ Passed. Auto-fixed 3 issues (1 unsorted import, 2 formatting) and committed the fixes.
### Mypy
### Code Review
3 inline comments posted:
No critical bugs, security vulnerabilities, or breaking API changes found. The optimization logic (lazy import caching, fast-path string splitting, set intersection for formatter lookup, deferred …
### Test Coverage
Changed lines coverage:
The optimized functions themselves have low direct test coverage, though this is consistent with the pre-optimization state: these functions rely on integration/E2E tests that exercise them indirectly.
Last updated: 2026-02-20
⚡️ This pull request contains optimizations for PR #1550.
If you approve this dependent PR, these changes will be merged into the original PR branch `omni-java-refactor-java-support`.
📄 29% (0.29x) speedup for `check_formatter_installed` in `codeflash/code_utils/env_utils.py`
⏱️ Runtime: 105 milliseconds → 81.9 milliseconds (best of 25 runs)
To edit these changes, run `git checkout codeflash/optimize-pr1550-2026-02-19T23.45.37` and push.