
feat: add reference graph for Python#1460

Merged
KRRT7 merged 57 commits into main from call-graphee on Feb 19, 2026

Conversation

KRRT7 (Collaborator) commented on Feb 12, 2026

Summary

  • Add a persistent SQLite-backed reference graph that indexes function call edges using Jedi, with file-hash-based caching and parallel indexing
  • Expose ReferenceGraph in codeflash/languages/python/ behind a DependencyResolver protocol, removing is_python() gating from the optimizer
  • Rich Live display for index building with project-relative paths and dependency summary
  • Two flat human-readable DB tables (indexed_files, call_edges) with full text keys
  • Skip reference graph in CI where the cache DB doesn't persist
  • Simplify compat.py by removing unnecessary class wrapper
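
The file-hash-based caching in the first bullet can be sketched roughly as follows. The `indexed_files` table name comes from the PR description, but the `file_path`/`file_hash` columns and the `file_needs_reindex` helper are illustrative assumptions, not the merged code:

```python
import hashlib
import sqlite3
from pathlib import Path

def file_needs_reindex(db: sqlite3.Connection, path: Path) -> bool:
    """Return True if the file's content hash differs from the cached one.

    Hypothetical sketch: the real ReferenceGraph schema and helper names
    in codeflash may differ.
    """
    content_hash = hashlib.sha256(path.read_bytes()).hexdigest()
    row = db.execute(
        "SELECT file_hash FROM indexed_files WHERE file_path = ?", (str(path),)
    ).fetchone()
    # Reindex when the file was never indexed or its content changed.
    return row is None or row[0] != content_hash
```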

Test plan

  • 16 unit tests in tests/test_reference_graph.py covering indexing, caching, cross-file edges, persistence
  • uv run prek run --from-ref origin/main passes

KRRT7 and others added 19 commits February 10, 2026 04:57
Store only the type string instead of the full Jedi Name object,
removing the need for arbitrary_types_allowed and the runtime
dependency on jedi in the model layer.
Introduces CallGraph that uses Jedi infer()+goto() to build call edges,
stores them in codeflash_cache.db with content-hash invalidation, and
serves as a drop-in replacement for get_function_sources_from_jedi().
Create CallGraph in Optimizer.run() for Python runs, pass it through
FunctionOptimizer to code_context_extractor where it replaces
get_function_sources_from_jedi() calls when available.
Covers same-file calls, cross-file calls, class instantiation,
nested function exclusion, module-level exclusion, site-packages
exclusion, empty/syntax-error files, and cache persistence.
Replace the simple progress bar with a Live + Tree + Panel display
that shows files being analyzed, call edges discovered, cache hits,
and summary stats during call graph indexing.
…cy summary

Add cross-file edge detection to IndexResult, replace tree sub-entries
with flat per-file dependency labels using plain language, and add a
post-indexing summary panel showing per-function dependency stats.
Use the call graph to sort functions by callee count (most dependencies
first) in --all mode without benchmarks, replacing arbitrary ordering.
Separate Jedi analysis (CPU-bound) from DB persistence so uncached files
can be analyzed across multiple worker processes. Files are dispatched to
a pool of up to 8 workers when >= 8 need indexing, with sequential
fallback for small batches or on pool failure.
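
A rough sketch of that dispatch policy, with a placeholder `analyze_file` standing in for the CPU-bound Jedi analysis step (both the function name and return shape are assumptions, not the PR's API):

```python
from concurrent.futures import ProcessPoolExecutor

def analyze_file(path: str) -> tuple[str, int]:
    # Placeholder for the real per-file Jedi analysis; returns a dummy result.
    return (path, len(path))

def index_files(paths: list[str], min_parallel: int = 8, max_workers: int = 8) -> list[tuple[str, int]]:
    # Small batches are analyzed sequentially, mirroring the commit message.
    if len(paths) < min_parallel:
        return [analyze_file(p) for p in paths]
    try:
        with ProcessPoolExecutor(max_workers=min(max_workers, len(paths))) as pool:
            return list(pool.map(analyze_file, paths))
    except Exception:
        # Sequential fallback on pool failure.
        return [analyze_file(p) for p in paths]
```

Keeping DB persistence out of `analyze_file` (as the commit describes) is what makes the workers safe: only the pure analysis runs in child processes, and results are written to SQLite from the parent.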
Use bounded deque for results, batch updates every 8 results with manual
refresh to reduce flicker, and filter source_files to Python-only before
passing to the call graph indexer.
Add DependencyResolver protocol and IndexResult to base.py, move
call_graph.py to languages/python/, and use factory method in optimizer
instead of is_python() gating.
…ve paths in call graph

Display file paths relative to project root in the call graph live
display for easier navigation. Filter indexed files by the language
support's file extensions to avoid processing irrelevant file types.
…h sections

Split the runtime estimate and PR message into separate log lines to
avoid awkward line wrapping. Add console rules between sections for
clearer visual separation.
…bles

Replace the normalized relational hierarchy (cg_projects → cg_languages →
cg_indexed_files/cg_call_edges) with two self-describing tables (indexed_files,
call_edges) where every row includes project_root and language as text columns.
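
As a sketch, the self-describing layout might look like the DDL below. Only the `indexed_files`/`call_edges` table names and the `project_root`/`language` text columns come from the commit message; every other column name is an assumption for illustration:

```python
import sqlite3

# Hypothetical DDL for the two flat tables; not the PR's exact schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS indexed_files (
    project_root TEXT NOT NULL,
    language     TEXT NOT NULL,
    file_path    TEXT NOT NULL,
    file_hash    TEXT NOT NULL,
    PRIMARY KEY (project_root, language, file_path)
);
CREATE TABLE IF NOT EXISTS call_edges (
    project_root    TEXT NOT NULL,
    language        TEXT NOT NULL,
    caller_file     TEXT NOT NULL,
    caller_qualname TEXT NOT NULL,
    callee_file     TEXT NOT NULL,
    callee_qualname TEXT NOT NULL
);
"""

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
```

Because every row carries `project_root` and `language`, each table can be read on its own without joining through project/language lookup tables, which is what makes the rows human-readable.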
Skip dependency resolver creation in CI environments where the cache DB
doesn't persist between runs. Also apply ruff formatting to call_graph.py.
@KRRT7 KRRT7 changed the title from "feat: add persistent call graph with language support layer" to "feat: add call graph for python" on Feb 12, 2026

claude bot (Contributor) commented on Feb 12, 2026

PR Review Summary

Prek Checks

✅ All checks pass (ruff check and ruff format both pass).

Mypy

✅ No new mypy errors introduced by this PR. The new reference_graph.py has zero mypy errors. All errors found in changed files are pre-existing.

Code Review

Still-open bug (from prior review):

  • codeflash/languages/python/support.py:37-38 — fs.jedi_definition was removed from FunctionSource (replaced with definition_type: str | None), but function_sources_to_helpers() still accesses fs.jedi_definition.line. This will raise AttributeError at runtime when the reference graph resolver is enabled. (See existing comment)

Resolved from prior reviews:

  • FunctionSource constructor calls in code_context_extractor.py — jedi_definition= keyword args removed
  • get_code_optimization_context() now accepts call_graph keyword argument
  • ✅ Token limits updated to 64K
  • call_graph_summary uses batch count_callees_per_function
  • count_callees_per_function uses (file_path, qualified_name) tuple keys

No new critical issues found in the latest changes. The reference graph feature is currently disabled in optimizer.py (commented out), so the support.py bug won't trigger at runtime until it's enabled.

Test Coverage

| File | Stmts | Miss | Cover | Status |
| --- | ---: | ---: | ---: | --- |
| cli_cmds/console.py | 179 | 133 | 26% | Modified |
| code_utils/code_replacer.py | 410 | 71 | 83% | Modified |
| code_utils/compat.py | 12 | 0 | 100% | Modified |
| code_utils/config_consts.py | 58 | 7 | 88% | Modified |
| discovery/functions_to_optimize.py | 549 | 165 | 70% | Modified |
| languages/__init__.py | 20 | 14 | 30% | Modified |
| languages/base.py | 127 | 2 | 98% | Modified |
| languages/javascript/support.py | 956 | 249 | 74% | Modified |
| languages/python/__init__.py | 3 | 0 | 100% | Modified |
| languages/python/context/code_context_extractor.py | 634 | 47 | 93% | Modified |
| languages/python/context/unused_definition_remover.py | 483 | 28 | 94% | Modified |
| languages/python/reference_graph.py | 274 | 76 | 72% | New file |
| languages/python/support.py | 286 | 140 | 51% | Modified |
| models/models.py | 626 | 138 | 78% | Modified |
| optimization/function_optimizer.py | 1169 | 950 | 19% | Modified |
| optimization/optimizer.py | 446 | 361 | 19% | Modified |

Overall project coverage: 79%

Coverage notes:

  • New file reference_graph.py has 72% coverage (close to the 75% threshold) — the main uncovered paths are the parallel indexing worker functions and some error handling branches
  • console.py (26%) has low coverage but most of the new code is Rich UI display logic (call_graph_live_display, call_graph_summary) which is difficult to unit test
  • optimizer.py and function_optimizer.py have low coverage (19%) but this is pre-existing — they are integration-heavy modules
  • 475 new test lines were added in tests/test_reference_graph.py with thorough unit and integration tests for the new ReferenceGraph class

Test results: 2411 passed, 8 failed (all in test_tracer.py — pre-existing, unrelated to this PR), 57 skipped


Last updated: 2026-02-19T07:50:00Z

KRRT7 and others added 4 commits February 18, 2026 21:51
Deeply nested expression trees (e.g. large dict/list literals) at module
or class level caused the recursive ast.NodeVisitor to exceed Python's
default recursion limit. Replace the FunctionWithReturnStatement visitor
class with an iterative stack-based traversal.
The optimized code achieves a **26% runtime improvement** by making the AST traversal in `function_has_return_statement` more targeted and efficient.

**Key Optimization:**

The critical change is in how `function_has_return_statement` traverses the AST when searching for `Return` nodes:

**Original approach:**
```python
stack.extend(ast.iter_child_nodes(node))
```
This visits *all* child nodes including expressions, names, constants, and other non-statement nodes.

**Optimized approach:**
```python
for child in ast.iter_child_nodes(node):
    if isinstance(child, ast.stmt):
        stack.append(child)
```
This only pushes statement nodes onto the stack, since `Return` is a statement type (`ast.stmt`).

**Why This Is Faster:**

1. **Reduced Node Traversal**: In typical Python functions, there are many more expression nodes (variable references, literals, operators, etc.) than statement nodes. For example, a simple `return x + y` has 1 Return statement but multiple Name and BinOp expression nodes underneath. The optimization skips all the expression-level nodes.

2. **Lower Python Overhead**: Fewer nodes in the stack means fewer loop iterations, fewer `isinstance` checks on non-Return nodes, and less list manipulation overhead.

3. **Preserved Correctness**: Since `Return` nodes are always statements in Python's AST (they inherit from `ast.stmt`), filtering to only statement nodes cannot miss any Return nodes.

**Performance Impact by Test Case:**

The optimization shows particularly strong gains for:
- **Functions without returns** (up to 91% faster): Early termination without traversing deep expression trees
- **Large codebases** (34-41% faster on tests with 1000+ functions): The cumulative effect across many function bodies
- **Functions with complex expressions but no returns** (82% faster): Avoiding expensive traversal of unused expression subtrees
- **Generator functions without explicit returns** (64% faster): Skipping yield expression internals

The optimization maintains correctness across all test cases including nested classes, async functions, properties, and various control structures, while delivering consistent runtime improvements.
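
Putting the pieces together, a self-contained version of the statement-filtered traversal might look like the sketch below (an illustration of the approach described above, not necessarily the exact merged code):

```python
import ast

def function_has_return_statement(function_node: ast.AST) -> bool:
    """Iteratively search a function body for a Return statement."""
    stack = [function_node]
    while stack:
        node = stack.pop()
        if isinstance(node, ast.Return):
            return True
        # Only statements can contain a Return, so skip expression subtrees.
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.stmt):
                stack.append(child)
    return False
```

Because the traversal is an explicit stack rather than a recursive `NodeVisitor`, it also sidesteps the recursion-limit failure on deeply nested literals described in the earlier commit.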
codeflash-ai bot (Contributor) commented on Feb 18, 2026

⚡️ Codeflash found optimizations for this PR

📄 26% (0.26x) speedup for find_functions_with_return_statement in codeflash/discovery/functions_to_optimize.py

⏱️ Runtime: 12.0 milliseconds → 9.48 milliseconds (best of 46 runs)

A dependent PR with the suggested changes has been created. Please review:

If you approve, it will be merged into this PR (branch call-graphee).


KRRT7 and others added 2 commits February 18, 2026 17:24
Replace per-function SQL loops in get_callees() and count_callees_per_function()
with temp table JOINs, and thread resolved path strings through to avoid
redundant resolve() calls.
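
The temp-table JOIN pattern can be sketched as below; the `call_edges` column names and the `wanted` temp table are assumptions for illustration, not the PR's exact code:

```python
import sqlite3

def count_callees(
    db: sqlite3.Connection, functions: list[tuple[str, str]]
) -> dict[tuple[str, str], int]:
    """Batch-count callees for (file_path, qualified_name) pairs in one query."""
    db.execute("CREATE TEMP TABLE IF NOT EXISTS wanted (file_path TEXT, qualname TEXT)")
    db.execute("DELETE FROM wanted")
    db.executemany("INSERT INTO wanted VALUES (?, ?)", functions)
    # One JOIN replaces a per-function loop of SELECTs; LEFT JOIN keeps
    # functions with zero callees in the result.
    rows = db.execute(
        """
        SELECT w.file_path, w.qualname, COUNT(e.callee_qualname)
        FROM wanted w
        LEFT JOIN call_edges e
          ON e.caller_file = w.file_path AND e.caller_qualname = w.qualname
        GROUP BY w.file_path, w.qualname
        """
    ).fetchall()
    return {(f, q): n for f, q, n in rows}
```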
…2026-02-18T22.22.36

⚡️ Speed up function `find_functions_with_return_statement` by 26% in PR #1460 (`call-graphee`)
codeflash-ai bot (Contributor) commented on Feb 18, 2026

This PR is now faster! 🚀 @KRRT7 accepted my optimizations from:

The optimized code achieves a **146% speedup** (from 1.47ms to 595μs) by eliminating the overhead of `ast.iter_child_nodes()` and replacing it with direct field access on AST nodes.

**Key optimizations:**

1. **Direct stack initialization**: Instead of starting with `[function_node]` and then traversing into its body, the stack is initialized directly with `list(function_node.body)`. This skips one iteration and avoids processing the function definition wrapper itself.

2. **Manual field traversal**: Rather than calling `ast.iter_child_nodes(node)` which is a generator that yields all child nodes, the code directly accesses `node._fields` and uses `getattr()` to inspect each field. This eliminates the generator overhead and function call costs associated with `ast.iter_child_nodes()`.

3. **Targeted statement filtering**: By checking `isinstance(child, ast.stmt)` or `isinstance(item, ast.stmt)` only on relevant fields (handling both single statements and lists of statements), the traversal focuses on statement nodes where `ast.Return` can appear, avoiding unnecessary checks on expression nodes.

**Why this is faster:**

- **Reduced function call overhead**: `ast.iter_child_nodes()` is a generator function that incurs call/yield overhead on every iteration. Direct attribute access via `getattr()` is faster for small numbers of fields.
- **Fewer iterations**: The line profiler shows the original code's `ast.iter_child_nodes()` line hit 5,453 times (69% of runtime), while the optimized version's field iteration hits only 3,290 times (17.4% of runtime).
- **Better cache locality**: Direct field access patterns may benefit from better CPU cache utilization compared to generator state management.

**Test case performance:**

The optimization shows dramatic improvements particularly for:
- **Functions with many sequential statements** (2365% faster for 1000 statements, 1430% faster for 1000 nested functions)
- **Simple functions** (234-354% faster for basic return detection)
- **Moderately complex control flow** (80-125% faster for nested conditionals/loops)

The speedup is consistent across all test cases, with early-return scenarios benefiting the most as the optimization allows faster discovery of the return statement before processing unnecessary nodes.
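
A simplified stand-in for the field-level traversal described above (the real merged function may differ; the `ast.excepthandler` check is added here so the sketch doesn't miss returns inside `except` blocks):

```python
import ast

def function_has_return_statement(function_node: ast.FunctionDef) -> bool:
    """Search for a Return via direct _fields access instead of iter_child_nodes."""
    # Start from the body statements, skipping the def wrapper itself.
    stack = list(function_node.body)
    while stack:
        node = stack.pop()
        if isinstance(node, ast.Return):
            return True
        for field in node._fields:
            value = getattr(node, field, None)
            # Handle both single statements and lists of statements;
            # excepthandler is included because except bodies hold statements.
            if isinstance(value, (ast.stmt, ast.excepthandler)):
                stack.append(value)
            elif isinstance(value, list):
                for item in value:
                    if isinstance(item, (ast.stmt, ast.excepthandler)):
                        stack.append(item)
    return False
```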
codeflash-ai bot (Contributor) commented on Feb 18, 2026

⚡️ Codeflash found optimizations for this PR

📄 147% (1.47x) speedup for function_has_return_statement in codeflash/discovery/functions_to_optimize.py

⏱️ Runtime: 1.47 milliseconds → 595 microseconds (best of 58 runs)

A dependent PR with the suggested changes has been created. Please review:

If you approve, it will be merged into this PR (branch call-graphee).


…2026-02-18T22.34.56

⚡️ Speed up function `function_has_return_statement` by 147% in PR #1460 (`call-graphee`)
codeflash-ai bot (Contributor) commented on Feb 18, 2026

This PR is now faster! 🚀 @KRRT7 accepted my optimizations from:

# Conflicts:
#	.codex/skills/.gitignore
#	.gemini/skills/.gitignore
#	codeflash/languages/python/context/code_context_extractor.py
Add DependencyResolver parameter back to get_code_optimization_context()
that was lost during file move from codeflash/context/ to
codeflash/languages/python/context/. When call_graph is available, use it
for helper discovery instead of Jedi-based fallback.
@KRRT7 KRRT7 changed the title from "feat: add call graph for python" to "feat: add reference graph for Python" on Feb 19, 2026
@KRRT7 KRRT7 merged commit 3dabd44 into main on Feb 19, 2026
26 of 28 checks passed
@KRRT7 KRRT7 deleted the call-graphee branch February 19, 2026 07:52
KRRT7 added a commit that referenced this pull request Feb 19, 2026
feat: add reference graph for Python
