⚡️ Speed up function `function_has_return_statement` by 147% in PR #1460 (`call-graphee`) #1535
The optimized code achieves a **147% speedup** (from 1.47ms to 595μs) by eliminating the overhead of `ast.iter_child_nodes()` and replacing it with direct field access on AST nodes.

**Key optimizations:**

1. **Direct stack initialization**: Instead of starting with `[function_node]` and then traversing into its body, the stack is initialized directly with `list(function_node.body)`. This skips one iteration and avoids processing the function definition wrapper itself.
2. **Manual field traversal**: Rather than calling `ast.iter_child_nodes(node)`, a generator that yields all child nodes, the code directly accesses `node._fields` and uses `getattr()` to inspect each field. This eliminates the generator overhead and function call costs associated with `ast.iter_child_nodes()`.
3. **Targeted statement filtering**: By checking `isinstance(child, ast.stmt)` or `isinstance(item, ast.stmt)` only on relevant fields (handling both single statements and lists of statements), the traversal focuses on statement nodes where `ast.Return` can appear, avoiding unnecessary checks on expression nodes.

**Why this is faster:**

- **Reduced function call overhead**: `ast.iter_child_nodes()` is a generator function that incurs call/yield overhead on every iteration. Direct attribute access via `getattr()` is faster for small numbers of fields.
- **Fewer iterations**: The line profiler shows the original code's `ast.iter_child_nodes()` line hit 5,453 times (69% of runtime), while the optimized version's field iteration hits only 3,290 times (17.4% of runtime).
- **Better cache locality**: Direct field access patterns may benefit from better CPU cache utilization compared to generator state management.
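The traversal change described above can be sketched as follows. This is a minimal illustration, not the PR's actual diff: the names `has_return_baseline`, `has_return_direct`, `parse_function`, and `_STATEMENT_CONTAINERS` are hypothetical. One deliberate deviation from the description: `ast.ExceptHandler` and `ast.match_case` are not subclasses of `ast.stmt`, yet their bodies can contain `return`, so this sketch widens the filter to include them. Whether the PR's code needs or has the same handling is not visible in this report.

```python
import ast

# Node types whose fields can hold statements. ast.ExceptHandler and
# ast.match_case are NOT ast.stmt subclasses, but their bodies may contain
# `return`, so this sketch includes them (an assumption, not the PR's code).
_STATEMENT_CONTAINERS = (ast.stmt, ast.ExceptHandler) + (
    (ast.match_case,) if hasattr(ast, "match_case") else ()
)


def has_return_baseline(function_node):
    """Baseline sketch: generic traversal via ast.iter_child_nodes()."""
    stack = [function_node]
    while stack:
        node = stack.pop()
        if isinstance(node, ast.Return):
            return True
        stack.extend(ast.iter_child_nodes(node))
    return False


def has_return_direct(function_node):
    """Optimized sketch: direct field access, statement-like nodes only."""
    # Start from the body, skipping the function definition wrapper itself.
    stack = list(function_node.body)
    while stack:
        node = stack.pop()
        if isinstance(node, ast.Return):
            return True
        for field in node._fields:
            child = getattr(node, field, None)
            if isinstance(child, list):
                # e.g. `body`, `orelse`, `handlers`, `cases`
                for item in child:
                    if isinstance(item, _STATEMENT_CONTAINERS):
                        stack.append(item)
            elif isinstance(child, _STATEMENT_CONTAINERS):
                stack.append(child)
    return False


def parse_function(source):
    """Hypothetical helper: return the first top-level node of `source`."""
    return ast.parse(source).body[0]


fn = parse_function("def f(x):\n    if x:\n        return 1\n    y = 2\n")
print(has_return_baseline(fn), has_return_direct(fn))  # True True
```

Both sketches, like any traversal that descends into every statement, also report a `return` that belongs to a nested function definition. With a strict `ast.stmt` filter instead of `_STATEMENT_CONTAINERS`, a function whose only `return` sits inside an `except` block or a `match` case would be reported as having no return, so that case is worth covering in regression tests.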
**Test case performance:**

The optimization shows dramatic improvements particularly for:

- **Functions with many sequential statements** (2365% faster for 1000 statements, 1430% faster for 1000 nested functions)
- **Simple functions** (234-354% faster for basic return detection)
- **Moderately complex control flow** (80-125% faster for nested conditionals/loops)

The speedup is consistent across all test cases, with early-return scenarios benefiting the most as the optimization allows faster discovery of the return statement before processing unnecessary nodes.
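For reading the percentages in this report, the headline figure appears consistent with a speedup computed as (old − new) / new. That formula is inferred from the reported runtimes rather than stated in the report, and the function names below are hypothetical.

```python
def speedup_percent(original_seconds, optimized_seconds):
    """Relative speedup as a percentage: (old - new) / new * 100."""
    return (original_seconds - optimized_seconds) / optimized_seconds * 100


def runtime_ratio(original_seconds, optimized_seconds):
    """How many times longer the original takes than the optimized version."""
    return original_seconds / optimized_seconds


# Reported runtimes: 1.47 milliseconds -> 595 microseconds.
print(round(speedup_percent(1.47e-3, 595e-6)))   # 147
print(round(runtime_ratio(1.47e-3, 595e-6), 2))  # 2.47
```

Under this reading, "147% faster" means the original takes roughly 2.47 times as long as the optimized version, and a figure such as "2365% faster" corresponds to a runtime ratio of about 24.65.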
PR Review Summary

Prek Checks
✅ All prek checks pass; no formatting or linting issues found.

Mypy

Code Review
✅ No critical issues found. The PR makes two clean optimizations:
Test Coverage
Changed lines analysis:
The codeflash bot reports 100% test coverage on the optimization, with 80 generated regression tests passing.

Test suite: 2392 passed, 57 skipped, 8 failed (all failures in

Codeflash Optimization PRs
No mergeable optimization PRs targeting

Last updated: 2026-02-18
⚡️ This pull request contains optimizations for PR #1460
If you approve this dependent PR, these changes will be merged into the original PR branch `call-graphee`.

📄 147% (1.47x) speedup for `function_has_return_statement` in `codeflash/discovery/functions_to_optimize.py`

⏱️ Runtime: 1.47 milliseconds → 595 microseconds (best of 58 runs)

📝 Explanation and details
✅ Correctness verification report:
To edit these changes, run `git checkout codeflash/optimize-pr1460-2026-02-18T22.34.56` and push.