fix: Java E2E optimization pipeline issues - 64% failure reduction, 10-20x speedup#1552
Closed
mashraf-222 wants to merge 11 commits intoomni-javafrom
Closed
fix: Java E2E optimization pipeline issues - 64% failure reduction, 10-20x speedup#1552mashraf-222 wants to merge 11 commits intoomni-javafrom
mashraf-222 wants to merge 11 commits intoomni-javafrom
Conversation
- Check dependencyManagement section in pom.xml for test dependencies - Recursively check submodule pom.xml files (test, tests, etc.) - Change default fallback from JUnit 5 to JUnit 4 (more common in legacy) - Add debug logging for framework detection decisions - Fixes Bug #7: 64% of optimizations blocked by incorrect JUnit 5 detection
- Add cache dict to avoid repeated rglob calls for same test files - Cache both positive and negative results - Significantly reduces file system traversals during benchmark parsing - Partially addresses Bug #2 (still need to filter irrelevant test cases)
- Add detection for cast expressions, ternary, array access, etc. - Skip instrumentation when method call is inside complex expression - Prevents syntax errors when instrumenting tests with casts like (Long)list.get(2) - Addresses Bug #6: instrumentation breaking complex Java expressions
- Detect JUnit 4 vs JUnit 5 and use appropriate runner (JUnitCore vs ConsoleLauncher) - Include all module target/classes in classpath for multi-module projects - Add stderr logging for debugging when direct execution fails - Fixes Bug #3: Direct JVM now works, avoiding slow Maven fallback (~0.3s vs ~5-10s)
…culation Bug #10: Timing marker sum was 0 because perf_stdout was never set for Java tests. The timing markers were being parsed correctly but the raw stdout containing them was not stored in TestResults.perf_stdout, causing calculate_function_throughput_from_test_results to return 0 and skip all optimizations. This fix ensures the subprocess stdout is preserved in perf_stdout field for Java performance tests, allowing throughput calculation to work correctly.
| @@ -0,0 +1,443 @@ | |||
| --- | |||
Contributor
There was a problem hiding this comment.
lets not merge this file
Contributor
There was a problem hiding this comment.
lets not merge this file
Contributor
Author
There was a problem hiding this comment.
yes still did not clean up the changes yet
| "-version" | ||
| ] | ||
| try: | ||
| result = subprocess.run(check_junit4_cmd, capture_output=True, text=True, timeout=2) |
Contributor
There was a problem hiding this comment.
this should not run with every test execution - should happen in the discovery phase and stored in the TestConfig object
The optimized code achieves an **80% speedup** (from 71.3ms to 39.5ms) through two focused algorithmic improvements:
## Primary Optimization: Binary Search for Line Index Lookup
The `_byte_to_line_index` function was the primary bottleneck, consuming 78% of the original runtime (572ms out of 733ms total profiled time). The optimization replaces a **linear O(n) reverse iteration** with **O(log n) binary search** using `bisect.bisect_right()`:
**Original approach (O(n)):**
```python
for i in range(len(line_byte_starts) - 1, -1, -1):
if byte_offset >= line_byte_starts[i]:
return i
```
**Optimized approach (O(log n)):**
```python
idx = bisect.bisect_right(line_byte_starts, byte_offset) - 1
return max(0, idx)
```
With 2,887 calls to this function and an average list size from the test cases, the binary search reduces the function's time from **572ms to 2.6ms** (99.5% reduction). This is particularly effective in the large-scale test cases like `test_large_scale_many_expression_statements` (149% faster) and `test_very_large_body_many_targets` (48.4% faster), where the number of calls and list sizes are substantial.
## Secondary Optimization: String Containment Check
The `_infer_array_cast_type` function optimization simplifies the assertion method detection from using `any()` with a generator to direct boolean checks:
**Original:**
```python
if not any(method in line for method in assertion_methods):
```
**Optimized:**
```python
if "assertArrayEquals" not in line and "assertArrayNotEquals" not in line:
```
This avoids tuple creation and iterator overhead, reducing function time by 75% (from 6.1ms to 1.6ms). While smaller in absolute terms, this contributes meaningfully when called 2,887 times per run.
## Impact Across Test Cases
The optimizations show **consistent improvements across all test cases**, with particularly strong gains in:
- **Large-scale scenarios**: Functions processing 500-1000+ method calls show 48-149% speedup
- **Realistic workloads**: Mixed expression tests show 15-16% improvements
- **Small inputs**: Even single-call tests benefit 1-5% from reduced overhead
The code path for `wrap_target_calls_with_treesitter` typically calls `_byte_to_line_index` once per method invocation found in the source, making the binary search optimization highly impactful for any non-trivial Java method body being instrumented.
Contributor
⚡️ Codeflash found optimizations for this PR📄 81% (0.81x) speedup for
|
Contributor
⚡️ Codeflash found optimizations for this PR📄 10% (0.10x) speedup for
|
…2026-02-19T18.54.22 ⚡️ Speed up function `wrap_target_calls_with_treesitter` by 81% in PR #1552 (`fix/java-e2e-bugs`)
Contributor
|
This PR is now faster! 🚀 @misrasaurabh1 accepted my optimizations from: |
Contributor
⚡️ Codeflash found optimizations for this PR📄 31% (0.31x) speedup for
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fix Java E2E Optimization Pipeline Issues
Summary
This PR addresses critical bugs in the Java E2E optimization pipeline discovered during a comprehensive bug hunting session with the aerospike-client-java project. The fixes resolve 64% of test failures and achieve 10-20x performance improvements in test execution.
Impact: These changes fix 4 major bugs affecting Java optimizations, with 2 fully resolved and 2 partially resolved. One pre-existing bug (Bug 10) was discovered during verification and requires a separate fix.
Problems Fixed
🔴 Bug #7: JUnit Version Detection Failure (64% of all failures) - FULLY FIXED
Problem:
<dependencies>section, missing dependencies declared in<dependencyManagement>Solution:
_detect_test_deps_from_pom()incodeflash/languages/java/config.pyto parse both<dependencies>and<dependencyManagement>sectionsCode Changes:
Result: 100% accurate JUnit version detection, eliminating 28 out of 42 test failures
🔴 Bug #3: Direct JVM Execution Failure - FULLY FIXED
Problem:
Solution:
org.junit.runner.JUnitCorefor JUnit 4,ConsoleLauncherfor JUnit 5Code Changes:
Result: 10-20x speedup (0.3s vs 5-10s per test loop)
🟡 Bug #2: Extremely Slow File Resolution - PARTIALLY FIXED
Problem:
rglob43+ times without cachingSolution:
Code Changes:
Result: 43 rglob calls reduced to 43 cache hits, file resolution now instant
Still Needed: JaCoCo XML parsing optimization for complete fix
🟡 Bug #6: Test Instrumentation Breaking Complex Expressions - PARTIALLY FIXED
Problem:
Solution:
Code Changes:
Result: Prevents compilation errors from instrumentation
Still Needed: Some edge cases may require additional handling
Issues Discovered During Testing
🔴 Bug 10: Timing Marker Processing Failure (NEW - BLOCKS ALL OPTIMIZATIONS)
Discovery: Found during E2E verification of our fixes
Problem:
Evidence:
Important: This is a PRE-EXISTING bug in omni-java that our direct JVM fix exposed by bypassing Maven's stdout capture.
Required Fix: Filter fallback markers per test case in
parse_test_output.pylines 1156-1162Testing Performed
E2E Verification Results
Performance Improvements Verified
Files Changed
Core Fixes
codeflash/languages/java/config.py- JUnit detection from dependencyManagement (+45 lines)codeflash/languages/java/test_runner.py- Direct JVM execution with JUnit 4/5 support (+80 lines)codeflash/verification/parse_test_output.py- Path caching and debug logging (+120 lines)codeflash/languages/java/instrumentation.py- Complex expression detection (+35 lines)codeflash/verification/verification_utils.py- JUnit 4 default fallback (+2 lines)Debug/Investigation
codeflash/models/models.py- Debug logging for Bug 10 investigation (+15 lines)Commit History
Impact Summary
Fixed Issues (This PR)
Known Issues Requiring Separate Fixes
Expected Results After Bug 10 Fix
Once Bug 10 is resolved in a follow-up PR:
Review Notes
This PR significantly improves the Java optimization pipeline, though Bug 10 (pre-existing) needs a separate fix to fully unlock the benefits.