
Conversation

@windyriki

Problem 1

Currently, the built-in feature bins are computed correctly for the output but are not saved in their respective programs, so they are logged in the program as:
"complexity": 0.0,
"diversity": 0.0,

Solution 1

In addition to appending the feature bins to coords, also store them on the program so they persist.
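
A minimal sketch of the idea (method and field names taken from the database excerpts reviewed below; the surrounding loop is abbreviated):

    # In the coordinate calculation, write the bin back onto the program before appending.
    bin_idx = self._calculate_complexity_bin(len(program.code))
    program.complexity = bin_idx  # persisted when the program is saved
    coords.append(bin_idx)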

Problem 2

Although llm_feedback_weight is documented as being reflected in the final score, the final score uses a fixed weighting (0.3) and does not consider llm_feedback_weight at all.

Solution 2

Removed the fixed weighting and used llm_feedback_weight to weight the LLM feedback in the final score.
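
With w = llm_feedback_weight, the recomputed score becomes (per the evaluator diff reviewed below):

    # Sketch: configurable weighting instead of the hard-coded 0.7/0.3 split.
    w = self.config.llm_feedback_weight
    eval_result.metrics["combined_score"] = accuracy * (1 - w) + llm_average * w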

Files changed

  • openevolve/database.py - Storing built-in feature bins in the program
  • openevolve/evaluator.py - Using llm_feedback_weight in the final score

Copilot AI review requested due to automatic review settings February 9, 2026 10:52

Copilot AI left a comment


Pull request overview

This PR fixes two inconsistencies in OpenEvolve’s evolution bookkeeping: (1) built-in MAP-Elites feature bins (complexity/diversity) were being computed for coordinates but not persisted on the Program object, and (2) the final combined_score weighting ignored llm_feedback_weight and used a fixed constant.

Changes:

  • Persist built-in MAP-Elites complexity and diversity bin indices onto the Program before saving.
  • Update combined_score recomputation to weight LLM feedback using self.config.llm_feedback_weight instead of a fixed 0.3.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

File | Description
openevolve/database.py | Stores computed built-in feature bin indices (complexity, diversity) into the Program so they persist to disk.
openevolve/evaluator.py | Uses llm_feedback_weight when recomputing combined_score after LLM evaluation.

Comment on lines 212 to 214
    eval_result.metrics["combined_score"] = (
-       accuracy * 0.7 + llm_average * 0.3
+       accuracy * (1-self.config.llm_feedback_weight) + llm_average * self.config.llm_feedback_weight
    )

Copilot AI Feb 9, 2026


combined_score now depends on llm_feedback_weight, but there’s no guard ensuring the weight is within [0.0, 1.0]. If a user misconfigures this, the score can become negative or exceed expected bounds; consider clamping or raising a clear config error before using it here.
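
One way to fail fast, as a sketch; the exact placement (config validation vs. this call site) is a design choice, and the message text is illustrative:

    w = self.config.llm_feedback_weight
    if not 0.0 <= w <= 1.0:
        raise ValueError(f"llm_feedback_weight must be in [0.0, 1.0], got {w}")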

Comment on lines +213 to 214
+       accuracy * (1-self.config.llm_feedback_weight) + llm_average * self.config.llm_feedback_weight
    )

Copilot AI Feb 9, 2026


This line exceeds the configured Black line length (100) and is missing spaces around operators (e.g., 1 - weight). Reformatting will improve readability and avoid formatting/lint churn in future diffs.
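
A reformatting that addresses both points, sketched with an illustrative local variable name:

    weight = self.config.llm_feedback_weight
    eval_result.metrics["combined_score"] = accuracy * (1 - weight) + llm_average * weight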

@windyriki (Author)


@copilot open a new pull request to apply changes based on this feedback

Comment on lines 208 to 214
if "combined_score" in eval_result.metrics:
# Original combined_score is just accuracy
accuracy = eval_result.metrics["combined_score"]
# Combine with LLM average (70% accuracy, 30% LLM quality)
eval_result.metrics["combined_score"] = (
accuracy * 0.7 + llm_average * 0.3
accuracy * (1-self.config.llm_feedback_weight) + llm_average * self.config.llm_feedback_weight
)

Copilot AI Feb 9, 2026


Test coverage: the combined_score weighting behavior changed to depend on llm_feedback_weight, but there doesn’t appear to be a unit test asserting the new formula. Adding a focused test (including edge weights like 0.0/1.0) would prevent regressions.
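
A focused test could pin the formula down. This is only a sketch: it mirrors the new expression in a local helper rather than driving the real evaluator, whose fixture setup is project-specific:

    import pytest

    def combine_score(accuracy: float, llm_average: float, weight: float) -> float:
        """Stand-in mirroring the PR's recomputation of combined_score."""
        return accuracy * (1 - weight) + llm_average * weight

    @pytest.mark.parametrize(
        "weight, expected",
        [
            (0.0, 0.8),                    # weight 0.0: LLM feedback ignored entirely
            (1.0, 0.4),                    # weight 1.0: accuracy ignored entirely
            (0.3, 0.8 * 0.7 + 0.4 * 0.3),  # previous hard-coded behavior as a special case
        ],
    )
    def test_combined_score_respects_llm_feedback_weight(weight, expected):
        assert combine_score(accuracy=0.8, llm_average=0.4, weight=weight) == pytest.approx(expected)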

Comment on lines 861 to 865
    # Use code length as complexity measure
    complexity = len(program.code)
    bin_idx = self._calculate_complexity_bin(complexity)
+   program.complexity = bin_idx  # Store complexity bin in program
    coords.append(bin_idx)

Copilot AI Feb 9, 2026


Assigning the bin index into Program.complexity/Program.diversity is semantically ambiguous (the dataclass defines these as derived feature values, currently typed as float). Consider either casting to float for consistency, or introducing explicit fields like complexity_bin/diversity_bin to avoid confusing bins with raw feature values.
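
One shape the explicit-fields option could take, assuming the Program dataclass keeps its existing float-typed feature fields; the bin field names below are illustrative, not the project's current API, and other Program fields are omitted:

    from dataclasses import dataclass

    @dataclass
    class Program:
        code: str = ""
        complexity: float = 0.0   # raw derived feature value, as currently documented
        diversity: float = 0.0
        complexity_bin: int = 0   # MAP-Elites bin index, kept separate from the raw value
        diversity_bin: int = 0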

@windyriki (Author)


@copilot open a new pull request to apply changes based on this feedback

Comment on lines 867 to 874
    # Use cached diversity calculation with reference set
    if len(self.programs) < 2:
        bin_idx = 0
    else:
        diversity = self._get_cached_diversity(program)
        bin_idx = self._calculate_diversity_bin(diversity)
+       program.diversity = bin_idx  # Store diversity bin in program
    coords.append(bin_idx)

Copilot AI Feb 9, 2026


In the cold-start branch (len(self.programs) < 2) diversity’s bin_idx is forced to 0 but program.diversity is not updated, so saved programs may still show the default value rather than the computed bin. Set program.diversity in this branch as well for consistency with the complexity handling.
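
A sketch of the suggested fix, with the assignment hoisted out of the else branch so both paths record the bin:

    elif dim == "diversity":
        # Use cached diversity calculation with reference set
        if len(self.programs) < 2:
            bin_idx = 0
        else:
            diversity = self._get_cached_diversity(program)
            bin_idx = self._calculate_diversity_bin(diversity)
        program.diversity = bin_idx  # now recorded in the cold-start branch too
        coords.append(bin_idx)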

@windyriki (Author)


@copilot open a new pull request to apply changes based on this feedback

Comment on lines 860 to 874
elif dim == "complexity":
# Use code length as complexity measure
complexity = len(program.code)
bin_idx = self._calculate_complexity_bin(complexity)
program.complexity = bin_idx # Store complexity bin in program
coords.append(bin_idx)
elif dim == "diversity":
# Use cached diversity calculation with reference set
if len(self.programs) < 2:
bin_idx = 0
else:
diversity = self._get_cached_diversity(program)
bin_idx = self._calculate_diversity_bin(diversity)
program.diversity = bin_idx # Store diversity bin in program
coords.append(bin_idx)

Copilot AI Feb 9, 2026


Test coverage: this change is intended to persist built-in feature bins into the saved Program, but there’s no test asserting that Program.complexity/diversity are updated after coordinate calculation/add(). Add a unit test that loads/saves a program and verifies these fields are non-default when built-in dimensions are used.
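
A rough shape for such a test; the setup helpers below are hypothetical placeholders to adapt to the project's actual config/database API, while the bin-calculation helpers come from the excerpts above:

    def test_builtin_feature_bins_persist_on_program():
        # Hypothetical factories; adapt to the real ProgramDatabase/Program constructors.
        db = make_database(feature_dimensions=["complexity", "diversity"])
        first = make_program(code="x = 1\n")
        second = make_program(code="y = 2\n" * 50)
        db.add(first)
        db.add(second)  # with two programs, diversity is past the cold-start branch
        # After add(), the bin indices should be written back onto the Program.
        assert second.complexity == db._calculate_complexity_bin(len(second.code))
        # The diversity field should likewise hold a bin index, not the dataclass default.
        assert isinstance(second.diversity, int)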

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

