⚡️ Speed up function `_byte_to_line_index` by 3,873% in PR #1199 (`omni-java`) by codeflash-ai[bot] · Pull Request #1589 · codeflash-ai/codeflash

codeflash-ai · 2026-02-20T07:41:13Z

⚡️ This pull request contains optimizations for PR #1199

If you approve this dependent PR, these changes will be merged into the original PR branch omni-java.

This PR will be automatically closed if the original PR is merged.

📄 3,873% (38.73x) speedup for `_byte_to_line_index` in `codeflash/languages/java/instrumentation.py`

⏱️ Runtime : 29.2 milliseconds → 736 microseconds (best of 237 runs)

📝 Explanation and details

The optimized code achieves a 3,873% speedup (29.2 ms → 736 μs) by replacing a manual reverse linear search with `bisect_right` from Python's standard-library `bisect` module.

What changed:

- Original approach: Iterated backwards through `line_byte_starts` with a Python for-loop, comparing `byte_offset` against each element until the first match
- Optimized approach: Uses `bisect_right(line_byte_starts, byte_offset) - 1` to perform a binary search in O(log n) time instead of O(n)
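
The two approaches can be sketched roughly as follows. The actual bodies in `instrumentation.py` are not shown in this description, so the names `byte_to_line_index_linear` and `byte_to_line_index_bisect` and their exact code are assumptions reconstructed from the explanation above.

```python
from bisect import bisect_right


def byte_to_line_index_linear(byte_offset: int, line_byte_starts: list[int]) -> int:
    """Hypothetical reconstruction of the original reverse linear scan."""
    # Walk from the last line start toward the first and return the first
    # index whose start is at or before the offset.
    for i in range(len(line_byte_starts) - 1, -1, -1):
        if byte_offset >= line_byte_starts[i]:
            return i
    # No start is <= byte_offset, or the list is empty.
    return 0


def byte_to_line_index_bisect(byte_offset: int, line_byte_starts: list[int]) -> int:
    """Hypothetical reconstruction of the optimized binary search."""
    # bisect_right returns the insertion point to the right of any equal
    # entries, so subtracting 1 gives the last index with start <= offset.
    idx = bisect_right(line_byte_starts, byte_offset) - 1
    return idx if idx >= 0 else 0
```

Both sketches assume `line_byte_starts` is sorted in ascending order, which is what binary search requires.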

Why this is faster:

1. Algorithm complexity: Binary search (O(log n)) vs linear search (O(n)). For 1000 lines, this means ~10 comparisons instead of up to 1000
2. C-level implementation: `bisect_right` is implemented in C and highly optimized, eliminating Python interpreter overhead for the search loop
3. Reduced memory access: The line profiler shows the original code spent 57.5% of time on array indexing (`line_byte_starts[i]`) across many iterations. The optimized version performs far fewer array accesses
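
To make the "~10 comparisons instead of up to 1000" figure concrete, here is a small probe-counting sketch. The helpers `linear_probes` and `bisect_probes` are illustrative names, not part of the codebase. `bisect_probes` is a pure-Python loop that mirrors the `bisect_right` algorithm so the comparisons can be counted, whereas the real `bisect_right` runs in C.

```python
def linear_probes(byte_offset, starts):
    """Count element comparisons made by a reverse linear scan."""
    probes = 0
    for i in range(len(starts) - 1, -1, -1):
        probes += 1
        if byte_offset >= starts[i]:
            break
    return probes


def bisect_probes(byte_offset, starts):
    """Count element comparisons made by a bisect_right-style binary search."""
    lo, hi, probes = 0, len(starts), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if byte_offset < starts[mid]:
            hi = mid
        else:
            lo = mid + 1
    return probes


# 1000 line starts: 0, 10, 20, ..., 9990
starts = [i * 10 for i in range(1000)]

# An offset on the first line is the worst case for the reverse scan.
print(linear_probes(5, starts))  # 1000
print(bisect_probes(5, starts))  # 10
```

The reverse scan is cheap for offsets near the end of the file, which matches the small speedups the tests report for offsets on the last line.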

Performance characteristics from tests:

- Small lists (2-4 elements): ~50-130% faster - modest gains due to setup overhead
- Medium lists (100-300 elements): ~200-500% faster - binary search advantage becomes clear
- Large lists (1000 elements): ~3000-6400% faster - dramatic improvement as the gap between O(log n) and O(n) widens
- The test `test_large_scale_sequential_mapping` with 1000 lines shows a 4495% speedup (13.5ms → 293μs), confirming the optimization's effectiveness at scale

Edge cases preserved:

- Empty lists correctly return 0
- Negative offsets work correctly
- Offsets before the first element return 0
- The conditional `if idx >= 0 else 0` handles the edge case where `bisect_right` returns 0 (offset before all elements)
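
The edge cases listed above can be exercised with a short sketch. `byte_to_line_index` here is a hypothetical stand-in written from the description, not the code in `instrumentation.py`.

```python
from bisect import bisect_right


def byte_to_line_index(byte_offset, line_byte_starts):
    # Hypothetical stand-in for the optimized function described above.
    idx = bisect_right(line_byte_starts, byte_offset) - 1
    return idx if idx >= 0 else 0


# Empty list: bisect_right returns 0, idx is -1, and the result is clamped to 0.
print(byte_to_line_index(12345, []))  # 0

# Offset before the first start: the same clamp applies.
print(byte_to_line_index(9, [10, 20, 30]))  # 0

# Negative offsets and negative starts compare like any other integers.
print(byte_to_line_index(-5, [-30, -10, 0, 10]))  # 1

# Duplicate starts: bisect_right lands after the last equal entry,
# so the last matching index is returned, as a reverse scan would.
print(byte_to_line_index(10, [0, 10, 10, 20]))  # 2
```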

This optimization is particularly valuable when _byte_to_line_index is called repeatedly with large line_byte_starts lists, as is typical in code instrumentation scenarios where files have hundreds or thousands of lines.

✅ Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 2389 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests

import pytest  # used for our unit tests
from codeflash.languages.java.instrumentation import _byte_to_line_index

def test_basic_exact_matches():
    # simple ascending start offsets for three lines
    starts = [0, 10, 20]
    # exact match to first start should return index 0
    codeflash_output = _byte_to_line_index(0, starts) # 1.45μs -> 711ns (104% faster)
    # exact match to second start should return index 1
    codeflash_output = _byte_to_line_index(10, starts) # 561ns -> 351ns (59.8% faster)
    # exact match to third start should return index 2
    codeflash_output = _byte_to_line_index(20, starts) # 380ns -> 260ns (46.2% faster)
    # value greater than last start returns last index (2)
    codeflash_output = _byte_to_line_index(25, starts) # 321ns -> 221ns (45.2% faster)

def test_basic_between_offsets():
    # starts have gaps; offsets that fall between starts should map to the previous index
    starts = [0, 5, 15, 30]
    # 7 is between 5 and 15 -> should map to index 1
    codeflash_output = _byte_to_line_index(7, starts) # 1.48μs -> 671ns (121% faster)
    # 29 is between 15 and 30 -> should map to index 2
    codeflash_output = _byte_to_line_index(29, starts) # 551ns -> 331ns (66.5% faster)
    # exactly at boundary 30 -> should map to index 3
    codeflash_output = _byte_to_line_index(30, starts) # 390ns -> 280ns (39.3% faster)

def test_offset_greater_than_last_returns_last():
    # a longer list: values greater than the last recorded start must return the last index
    starts = [0, 100, 200, 300]
    # large offset much greater than the last start should return index 3
    codeflash_output = _byte_to_line_index(10000, starts) # 1.31μs -> 671ns (95.5% faster)
    # offset equal to last start returns last index
    codeflash_output = _byte_to_line_index(300, starts) # 501ns -> 371ns (35.0% faster)

def test_empty_line_byte_starts_returns_zero_for_any_offset():
    # when starts list is empty, function iterates zero times and returns 0
    starts = []
    codeflash_output = _byte_to_line_index(0, starts) # 1.07μs -> 561ns (91.1% faster)
    codeflash_output = _byte_to_line_index(12345, starts) # 430ns -> 261ns (64.8% faster)
    codeflash_output = _byte_to_line_index(-5, starts) # 270ns -> 190ns (42.1% faster)

def test_offset_less_than_first_entry_returns_zero():
    # when offset is less than the first start entry, function should return 0
    starts = [10, 20, 30]
    codeflash_output = _byte_to_line_index(0, starts) # 1.45μs -> 632ns (130% faster)
    codeflash_output = _byte_to_line_index(9, starts) # 551ns -> 321ns (71.7% faster)

def test_negative_starts_and_negative_offsets():
    # starts may include negative byte starts (e.g. hypothetical offsets)
    starts = [-30, -10, 0, 10]
    # -30 exactly should map to index 0
    codeflash_output = _byte_to_line_index(-30, starts) # 1.43μs -> 691ns (107% faster)
    # -20 is >= -30 but < -10, so it should map to index 0.
    # The reverse scan rejects index 3 (10), index 2 (0), and index 1 (-10),
    # then matches index 0 (-30) and returns 0.
    codeflash_output = _byte_to_line_index(-20, starts) # 581ns -> 361ns (60.9% faster)
    # -5 is >= -10 and < 0 -> should map to index 1
    codeflash_output = _byte_to_line_index(-5, starts) # 401ns -> 250ns (60.4% faster)

def test_none_inputs_raise_type_error():
    # passing None for the starts list should raise TypeError when len(None) is attempted
    with pytest.raises(TypeError):
        _byte_to_line_index(0, None) # 2.87μs -> 2.65μs (8.32% faster)
    # passing None as offset should raise TypeError when comparison is attempted
    with pytest.raises(TypeError):
        _byte_to_line_index(None, [0, 10, 20]) # 2.96μs -> 1.99μs (48.2% faster)

def test_float_offset_is_accepted_and_compared_naturally():
    # although annotated as int, a float offset should compare against int starts
    starts = [0, 10, 20]
    # 15.5 is >= 10 but < 20 -> should map to index 1
    codeflash_output = _byte_to_line_index(15.5, starts) # 1.57μs -> 781ns (101% faster)
    # a float equal to a start should map to that index
    codeflash_output = _byte_to_line_index(10.0, starts) # 661ns -> 360ns (83.6% faster)

def test_large_scale_sequential_mapping():
    # create 1000 line starts spaced by 10 bytes: 0, 10, 20, ..., 9990
    n = 1000
    starts = [i * 10 for i in range(n)]
    # For each line i, choose an offset halfway to the next start: i*10 + 5
    # That offset should map to index i for all i in 0..(n-1)
    for i in range(n):
        offset = i * 10 + 5
        # a single assertion per iteration ensures we exercise all internal loop indices
        codeflash_output = _byte_to_line_index(offset, starts) # 13.5ms -> 293μs (4495% faster)
    # Also test offsets beyond the last start map to the last index
    codeflash_output = _byte_to_line_index(n * 10 + 500, starts) # 411ns -> 261ns (57.5% faster)

def test_large_scale_edge_offsets_near_boundaries():
    # ensure boundary values (exact starts) across many entries map correctly
    n = 1000
    starts = [i * 7 for i in range(n)]  # use spacing 7 to vary arithmetic
    # check every exact start maps to its index
    for i in range(n):
        codeflash_output = _byte_to_line_index(starts[i], starts) # 13.5ms -> 305μs (4331% faster)
    # check just before the first start (less than starts[0]) gives 0
    codeflash_output = _byte_to_line_index(starts[0] - 1, starts) # 26.1μs -> 401ns (6411% faster)
    # check exactly one less than a middle start maps to previous index
    mid = n // 2
    codeflash_output = _byte_to_line_index(starts[mid] - 1, starts) # 13.7μs -> 390ns (3417% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

import pytest
from codeflash.languages.java.instrumentation import _byte_to_line_index

def test_single_line_at_start():
    """Test byte offset at the very beginning of a single line."""
    # Single line starting at byte 0
    line_byte_starts = [0]
    codeflash_output = _byte_to_line_index(0, line_byte_starts); result = codeflash_output # 1.55μs -> 671ns (131% faster)

def test_single_line_after_start():
    """Test byte offset in the middle of a single line."""
    # Single line starting at byte 0, offset is 5
    line_byte_starts = [0]
    codeflash_output = _byte_to_line_index(5, line_byte_starts); result = codeflash_output # 1.41μs -> 631ns (124% faster)

def test_two_lines_first_line():
    """Test byte offset in the first of two lines."""
    # Line 0 starts at 0, line 1 starts at 10
    line_byte_starts = [0, 10]
    codeflash_output = _byte_to_line_index(5, line_byte_starts); result = codeflash_output # 1.39μs -> 641ns (117% faster)

def test_two_lines_second_line_start():
    """Test byte offset exactly at the start of the second line."""
    # Line 0 starts at 0, line 1 starts at 10
    line_byte_starts = [0, 10]
    codeflash_output = _byte_to_line_index(10, line_byte_starts); result = codeflash_output # 1.24μs -> 591ns (110% faster)

def test_two_lines_second_line_middle():
    """Test byte offset in the middle of the second line."""
    # Line 0 starts at 0, line 1 starts at 10
    line_byte_starts = [0, 10]
    codeflash_output = _byte_to_line_index(15, line_byte_starts); result = codeflash_output # 1.28μs -> 641ns (100% faster)

def test_three_lines_first_line():
    """Test byte offset in the first of three lines."""
    # Line starts: 0, 20, 40
    line_byte_starts = [0, 20, 40]
    codeflash_output = _byte_to_line_index(10, line_byte_starts); result = codeflash_output # 1.36μs -> 642ns (112% faster)

def test_three_lines_second_line():
    """Test byte offset in the second of three lines."""
    # Line starts: 0, 20, 40
    line_byte_starts = [0, 20, 40]
    codeflash_output = _byte_to_line_index(30, line_byte_starts); result = codeflash_output # 1.31μs -> 661ns (98.5% faster)

def test_three_lines_third_line():
    """Test byte offset in the third of three lines."""
    # Line starts: 0, 20, 40
    line_byte_starts = [0, 20, 40]
    codeflash_output = _byte_to_line_index(50, line_byte_starts); result = codeflash_output # 1.23μs -> 641ns (92.4% faster)

def test_offset_zero_with_multiple_lines():
    """Test byte offset of 0 with multiple lines starting at 0."""
    # Multiple lines with first line at byte 0
    line_byte_starts = [0, 10, 20, 30]
    codeflash_output = _byte_to_line_index(0, line_byte_starts); result = codeflash_output # 1.44μs -> 681ns (112% faster)

def test_empty_line_byte_starts():
    """Test with an empty line_byte_starts list."""
    # Empty list of line starts
    line_byte_starts = []
    codeflash_output = _byte_to_line_index(0, line_byte_starts); result = codeflash_output # 992ns -> 561ns (76.8% faster)

def test_empty_line_byte_starts_nonzero_offset():
    """Test with empty list and non-zero byte offset."""
    # Empty list, but asking for a byte offset
    line_byte_starts = []
    codeflash_output = _byte_to_line_index(100, line_byte_starts); result = codeflash_output # 861ns -> 561ns (53.5% faster)

def test_byte_offset_before_any_line_start():
    """Test byte offset that is before all line starts (negative scenario)."""
    # If byte offset is negative (unusual but testing edge case)
    line_byte_starts = [0, 10, 20]
    # Negative byte offset should not match any position
    codeflash_output = _byte_to_line_index(-1, line_byte_starts); result = codeflash_output # 1.45μs -> 691ns (110% faster)

def test_byte_offset_exactly_at_each_line_start():
    """Test byte offset at the exact start of each line."""
    # Line starts: 0, 100, 200
    line_byte_starts = [0, 100, 200]
    
    # Test at first line start
    codeflash_output = _byte_to_line_index(0, line_byte_starts) # 1.43μs -> 681ns (110% faster)
    # Test at second line start
    codeflash_output = _byte_to_line_index(100, line_byte_starts) # 581ns -> 371ns (56.6% faster)
    # Test at third line start
    codeflash_output = _byte_to_line_index(200, line_byte_starts) # 381ns -> 280ns (36.1% faster)

def test_very_large_byte_offset():
    """Test with a very large byte offset."""
    # Small line structure but large offset
    line_byte_starts = [0, 10, 20]
    codeflash_output = _byte_to_line_index(1000000, line_byte_starts); result = codeflash_output # 1.29μs -> 652ns (98.3% faster)

def test_line_byte_starts_with_large_gaps():
    """Test with line starts that have large gaps between them."""
    # Lines with gaps: 0, 1000, 5000, 10000
    line_byte_starts = [0, 1000, 5000, 10000]
    
    # Test in first gap
    codeflash_output = _byte_to_line_index(500, line_byte_starts) # 1.50μs -> 671ns (124% faster)
    # Test in second gap
    codeflash_output = _byte_to_line_index(3000, line_byte_starts) # 551ns -> 360ns (53.1% faster)
    # Test in third gap
    codeflash_output = _byte_to_line_index(7500, line_byte_starts) # 400ns -> 250ns (60.0% faster)
    # Test beyond last line
    codeflash_output = _byte_to_line_index(10500, line_byte_starts) # 341ns -> 220ns (55.0% faster)

def test_many_lines_sequential():
    """Test with a list of many sequential line starts."""
    # 100 lines, each 10 bytes apart
    line_byte_starts = [i * 10 for i in range(100)]
    
    # Test in various positions
    codeflash_output = _byte_to_line_index(0, line_byte_starts) # 3.30μs -> 661ns (399% faster)
    codeflash_output = _byte_to_line_index(45, line_byte_starts) # 2.68μs -> 431ns (521% faster)
    codeflash_output = _byte_to_line_index(500, line_byte_starts) # 1.45μs -> 320ns (354% faster)
    codeflash_output = _byte_to_line_index(995, line_byte_starts) # 351ns -> 271ns (29.5% faster)

def test_offset_between_consecutive_line_starts():
    """Test byte offset between two consecutive line starts."""
    # Lines at 0, 10, 20, 30
    line_byte_starts = [0, 10, 20, 30]
    
    # Test in each gap
    codeflash_output = _byte_to_line_index(5, line_byte_starts) # 1.45μs -> 642ns (126% faster)
    codeflash_output = _byte_to_line_index(15, line_byte_starts) # 531ns -> 340ns (56.2% faster)
    codeflash_output = _byte_to_line_index(25, line_byte_starts) # 361ns -> 250ns (44.4% faster)

def test_single_element_nonzero():
    """Test with a single line not starting at byte 0."""
    # Line starts at byte 50 (unusual but possible)
    line_byte_starts = [50]
    
    # Offset before the line start
    codeflash_output = _byte_to_line_index(25, line_byte_starts); result = codeflash_output # 1.38μs -> 571ns (142% faster)
    
    # Offset at the line start
    codeflash_output = _byte_to_line_index(50, line_byte_starts); result = codeflash_output # 581ns -> 370ns (57.0% faster)
    
    # Offset after the line start
    codeflash_output = _byte_to_line_index(75, line_byte_starts); result = codeflash_output # 381ns -> 251ns (51.8% faster)

def test_duplicate_line_starts():
    """Test with duplicate consecutive line start values."""
    # Multiple lines with same start (edge case)
    line_byte_starts = [0, 10, 10, 20]
    
    # At the duplicate positions, should return the last matching index
    codeflash_output = _byte_to_line_index(10, line_byte_starts); result = codeflash_output # 1.34μs -> 681ns (97.2% faster)

def test_offset_equals_int_max_value():
    """Test with an extremely large integer offset."""
    # Test with a very large but valid int
    line_byte_starts = [0, 100]
    codeflash_output = _byte_to_line_index(2147483647, line_byte_starts); result = codeflash_output # 1.38μs -> 611ns (126% faster)

def test_offset_zero_single_line_not_at_zero():
    """Test offset 0 when the only line doesn't start at 0."""
    # If line starts at 10, offset 0 is before it
    line_byte_starts = [10]
    codeflash_output = _byte_to_line_index(0, line_byte_starts); result = codeflash_output # 1.40μs -> 591ns (137% faster)

def test_large_number_of_lines_1000():
    """Test with 1000 lines to verify performance."""
    # Create 1000 lines with sequential byte starts (0, 10, 20, ..., 9990)
    line_byte_starts = [i * 10 for i in range(1000)]
    
    # Test various positions
    codeflash_output = _byte_to_line_index(0, line_byte_starts) # 27.0μs -> 821ns (3187% faster)
    codeflash_output = _byte_to_line_index(50, line_byte_starts) # 26.2μs -> 501ns (5129% faster)
    codeflash_output = _byte_to_line_index(5000, line_byte_starts) # 13.8μs -> 441ns (3021% faster)
    codeflash_output = _byte_to_line_index(9990, line_byte_starts) # 461ns -> 361ns (27.7% faster)
    codeflash_output = _byte_to_line_index(9995, line_byte_starts) # 411ns -> 330ns (24.5% faster)

def test_large_byte_offsets_across_1000_lines():
    """Test many different byte offsets across 1000 lines."""
    # 1000 lines spaced 100 bytes apart
    line_byte_starts = [i * 100 for i in range(1000)]
    
    # Test 100 different positions
    for offset_mult in range(0, 100):
        byte_offset = offset_mult * 1000
        codeflash_output = _byte_to_line_index(byte_offset, line_byte_starts); result = codeflash_output # 1.36ms -> 33.6μs (3954% faster)

def test_backwards_iteration_with_many_lines():
    """Test that backwards iteration works correctly with many lines."""
    # 500 lines, test that we're correctly iterating backward
    line_byte_starts = [i * 5 for i in range(500)]
    
    # The last line should be found efficiently
    codeflash_output = _byte_to_line_index(2495, line_byte_starts); result = codeflash_output # 1.14μs -> 742ns (53.9% faster)

def test_boundary_checks_across_large_range():
    """Test boundary conditions across a large range of lines."""
    # 750 lines with different patterns
    line_byte_starts = [i * 15 for i in range(750)]
    
    # Test at boundaries
    codeflash_output = _byte_to_line_index(line_byte_starts[0], line_byte_starts) # 20.1μs -> 781ns (2473% faster)
    codeflash_output = _byte_to_line_index(line_byte_starts[749], line_byte_starts) # 632ns -> 521ns (21.3% faster)
    
    # Test just before and after various boundaries
    for i in [100, 250, 500, 749]:
        byte_pos = line_byte_starts[i]
        # Just before the line start (should map to previous line)
        if i > 0:
            codeflash_output = _byte_to_line_index(byte_pos - 1, line_byte_starts)
        # At the line start (should map to this line)
        codeflash_output = _byte_to_line_index(byte_pos, line_byte_starts) # 38.1μs -> 1.32μs (2786% faster)

def test_performance_with_dense_lines():
    """Test performance with 1000 densely packed lines."""
    # Lines with minimal spacing (1 byte apart)
    line_byte_starts = list(range(1000))
    
    # Test various offsets
    codeflash_output = _byte_to_line_index(0, line_byte_starts) # 26.9μs -> 802ns (3252% faster)
    codeflash_output = _byte_to_line_index(500, line_byte_starts) # 13.9μs -> 471ns (2859% faster)
    codeflash_output = _byte_to_line_index(999, line_byte_starts) # 430ns -> 351ns (22.5% faster)

def test_performance_with_sparse_lines():
    """Test performance with sparse line distribution."""
    # 100 lines with large gaps (1000 bytes apart)
    line_byte_starts = [i * 1000 for i in range(100)]
    
    # Test in the middle of sparse gaps
    codeflash_output = _byte_to_line_index(50000, line_byte_starts) # 2.19μs -> 642ns (242% faster)
    codeflash_output = _byte_to_line_index(99500, line_byte_starts) # 541ns -> 370ns (46.2% faster)

def test_sequential_calls_with_increasing_offsets():
    """Test multiple sequential calls with increasing offsets."""
    # 200 lines
    line_byte_starts = [i * 50 for i in range(200)]
    
    # Make many calls with increasing offsets
    previous_result = 0
    for i in range(0, 10000, 50):
        codeflash_output = _byte_to_line_index(i, line_byte_starts); result = codeflash_output # 498μs -> 57.1μs (773% faster)
        previous_result = result

def test_large_offset_jumps():
    """Test with offset that jumps across many lines at once."""
    # 100 lines
    line_byte_starts = [i * 100 for i in range(100)]
    
    # Jump to near the end
    codeflash_output = _byte_to_line_index(9500, line_byte_starts); result = codeflash_output # 1.15μs -> 691ns (66.7% faster)
    
    # Jump even further
    codeflash_output = _byte_to_line_index(9950, line_byte_starts); result = codeflash_output # 501ns -> 341ns (46.9% faster)

def test_offset_just_before_last_line_among_many():
    """Test offset just before the last line in a large list."""
    # 500 lines
    line_byte_starts = [i * 20 for i in range(500)]
    
    # Last line starts at 9980
    last_line_start = line_byte_starts[-1]
    
    # Offset just before last line
    codeflash_output = _byte_to_line_index(last_line_start - 1, line_byte_starts); result = codeflash_output # 1.17μs -> 741ns (58.2% faster)
    
    # Offset at last line
    codeflash_output = _byte_to_line_index(last_line_start, line_byte_starts); result = codeflash_output # 591ns -> 440ns (34.3% faster)

def test_all_offsets_map_to_valid_indices():
    """Test that many random offsets all map to valid line indices."""
    # 300 lines
    line_byte_starts = [i * 33 for i in range(300)]
    
    # Test many offset values
    for test_offset in [0, 100, 500, 1000, 5000, 9999, 10000]:
        codeflash_output = _byte_to_line_index(test_offset, line_byte_starts); result = codeflash_output # 33.5μs -> 2.84μs (1082% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, `git checkout codeflash/optimize-pr1199-2026-02-20T07.41.07` and push.

claude · 2026-02-20T07:53:32Z

PR Review Summary

Prek Checks

✅ Passed (after auto-fix)

Two issues were auto-fixed and committed:

- I001 (unsorted-imports): `from bisect import bisect_right` moved after `import re`
- FURB136 (if-expr-min-max): `idx if idx >= 0 else 0` → `max(idx, 0)`

Mypy

⚠️ 19 pre-existing mypy errors in instrumentation.py — all from the broader omni-java branch, none introduced by this PR. The optimized function (_byte_to_line_index) has no mypy issues.

Code Review

✅ No critical issues found

The optimization is correct and well-motivated:

- Replaces the O(n) reverse linear scan with an O(log n) binary search via `bisect_right`
- `bisect_right(line_byte_starts, byte_offset) - 1` correctly finds the last index where `line_byte_starts[i] <= byte_offset`
- `max(idx, 0)` handles the edge case where the offset is before all entries (same as the original `return 0`)
- No behavioral changes, no security concerns, no breaking API changes
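
One way to spot-check the "no behavioral changes" claim is a randomized comparison against a reference scan. This is a minimal sketch: `linear_reference`, `bisect_version`, and `check_equivalence` are hypothetical names, and the reference body is an assumption based on the PR description rather than the original source.

```python
import random
from bisect import bisect_right


def linear_reference(byte_offset, starts):
    # Assumed shape of the original reverse scan.
    for i in range(len(starts) - 1, -1, -1):
        if byte_offset >= starts[i]:
            return i
    return 0


def bisect_version(byte_offset, starts):
    # Post-lint form: max(idx, 0) instead of the conditional expression.
    return max(bisect_right(starts, byte_offset) - 1, 0)


def check_equivalence(trials=2000, seed=0):
    """Return True if both versions agree on randomly generated sorted inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        # Sorted input is a precondition for bisect; duplicates are allowed.
        starts = sorted(rng.randint(-50, 500) for _ in range(rng.randint(0, 40)))
        offset = rng.randint(-100, 600)
        if linear_reference(offset, starts) != bisect_version(offset, starts):
            return False
    return True


print(check_equivalence())
```

The equivalence relies on `line_byte_starts` being sorted in ascending order. On unsorted input the two versions could disagree, so the check only generates sorted lists.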

Test Coverage

| File | Stmts | Miss | Coverage | Status |
| --- | --- | --- | --- | --- |
| `codeflash/languages/java/instrumentation.py` | 482 | 84 | 83% | ✅ |

- The modified function `_byte_to_line_index` (lines 232-235) is fully covered by tests
- The file is new in the `omni-java` base branch (it does not exist on `main`), so no main-branch comparison is applicable
- 83% coverage exceeds the 75% threshold for new files

Last updated: 2026-02-20

codeflash-ai bot added the `⚡️ codeflash` (Optimization PR opened by Codeflash AI) and `🎯 Quality: High` (Optimization Quality according to Codeflash) labels Feb 20, 2026

codeflash-ai bot mentioned this pull request Feb 20, 2026

codeflash-omni-java #1199

Draft

style: auto-fix linting issues

0fb931a

claude bot mentioned this pull request Feb 20, 2026

fix: resolve test file paths in discover_tests_pytest to fix path com… #1605

Merged

codeflash-ai[bot] wants to merge 2 commits into `omni-java` from `codeflash/optimize-pr1199-2026-02-20T07.41.07`
