This directory contains a collection of examples demonstrating how to use OpenEvolve for various tasks including optimization, algorithm discovery, and code evolution. Each example showcases different aspects of OpenEvolve's capabilities and provides templates for creating your own evolutionary coding projects.
To create your own OpenEvolve example, you need three essential components:
Your initial program must contain exactly one EVOLVE-BLOCK:
```python
# EVOLVE-BLOCK-START
def your_function():
    # Your initial implementation here
    # This is the only section OpenEvolve will modify
    pass
# EVOLVE-BLOCK-END

# Helper functions and other code outside the evolve block
def helper_function():
    # This code won't be modified by OpenEvolve
    pass
```

Critical Requirements:
- ✅ Exactly one EVOLVE-BLOCK (not multiple blocks)
- ✅ Use `# EVOLVE-BLOCK-START` and `# EVOLVE-BLOCK-END` markers
- ✅ Put only the code you want evolved inside the block
- ✅ Helper functions and imports go outside the block
Your evaluator can return either a dictionary or an EvaluationResult object:
```python
def evaluate(program_path: str) -> Dict:
    """
    Evaluate the program and return metrics.
    Can return either a dict or EvaluationResult object.
    Use EvaluationResult if you want to include artifacts for debugging.
    """
    try:
        # Import and run your program
        # Calculate metrics
        return {
            'combined_score': 0.8,  # PRIMARY METRIC for evolution (required)
            'accuracy': 0.9,        # Your custom metrics
            'speed': 0.7,
            'robustness': 0.6,
            # Add any other metrics you want to track
        }
    except Exception as e:
        return {
            'combined_score': 0.0,  # Always return combined_score, even on error
            'error': str(e)
        }

# Or use EvaluationResult for artifacts support:
from openevolve.evaluation_result import EvaluationResult

def evaluate(program_path: str) -> EvaluationResult:
    return EvaluationResult(
        metrics={'combined_score': 0.8, 'accuracy': 0.9},
        artifacts={'debug_info': 'useful debugging data'}
    )
```

Critical Requirements:
- ✅ Return a dictionary or `EvaluationResult` - both are supported
- ✅ Must include `'combined_score'` - this is the primary metric OpenEvolve uses
- ✅ Higher `combined_score` values should indicate better programs
- ✅ Handle exceptions and return `combined_score: 0.0` on failure
- ✅ Use `EvaluationResult` with artifacts for richer debugging feedback
Essential configuration structure:
```yaml
# Evolution settings
max_iterations: 100
checkpoint_interval: 10
parallel_evaluations: 1

# LLM configuration
llm:
  api_base: "https://api.openai.com/v1"  # Or your LLM provider
  models:
    - name: "gpt-4"
      weight: 1.0
  temperature: 0.7
  max_tokens: 4000
  timeout: 120

# Database configuration (MAP-Elites algorithm)
database:
  population_size: 50
  num_islands: 3
  migration_interval: 10
  feature_dimensions:  # MUST be a list, not an integer
    - "score"
    - "complexity"

# Evaluation settings
evaluator:
  timeout: 60
  max_retries: 3

# Prompt configuration
prompt:
  system_message: |
    You are an expert programmer. Your goal is to improve the code
    in the EVOLVE-BLOCK to achieve better performance on the task.
    Focus on algorithmic improvements and code optimization.
  num_top_programs: 3
  num_diverse_programs: 2

# Logging
log_level: "INFO"
```

Critical Requirements:
- ✅ `feature_dimensions` must be a list (e.g., `["score", "complexity"]`), not an integer
- ✅ Set appropriate timeouts for your use case
- ✅ Configure LLM settings for your provider
- ✅ Use a meaningful `system_message` to guide evolution
- ❌ Wrong: `feature_dimensions: 2`
- ✅ Correct: `feature_dimensions: ["score", "complexity"]`
- ❌ Wrong: Using `'total_score'` metric name
- ✅ Correct: Using `'combined_score'` metric name
- ❌ Wrong: Multiple EVOLVE-BLOCK sections
- ✅ Correct: Exactly one EVOLVE-BLOCK section
💡 Tip: Both {'combined_score': 0.8, ...} dict and EvaluationResult(metrics={...}, artifacts={...}) are valid return types
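Several of these mistakes can be caught with a quick pre-flight check on the loaded config before launching a run. Here is a small sketch operating on a config dict (e.g., the result of `yaml.safe_load`); the key names follow the structure shown above, and the checks are intentionally minimal:

```python
def check_config(cfg: dict) -> list:
    """Return a list of problems found in an OpenEvolve config dict.

    Only covers a few common mistakes; extend as needed.
    """
    problems = []

    # feature_dimensions must be a list of dimension names, not an integer
    dims = cfg.get("database", {}).get("feature_dimensions")
    if dims is not None and not isinstance(dims, list):
        problems.append(f"database.feature_dimensions must be a list, got {dims!r}")

    # At least one model should be configured
    if not cfg.get("llm", {}).get("models"):
        problems.append("llm.models should list at least one model")

    return problems
```

Running this against your config before a long evolution run costs nothing and avoids discovering a mis-typed `feature_dimensions` hours in.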
When using custom feature dimensions, your evaluator must return raw continuous values, not pre-computed bin indices:
```python
def evaluate(program_path: str) -> Dict:
    # Calculate actual measurements
    prompt_length = len(generated_prompt)  # Actual character count
    execution_time = measure_runtime()     # Time in seconds
    memory_usage = get_peak_memory()       # Bytes used

    return {
        "combined_score": accuracy_score,
        "prompt_length": prompt_length,    # Raw count, not bin index
        "execution_time": execution_time,  # Raw seconds, not bin index
        "memory_usage": memory_usage       # Raw bytes, not bin index
    }
```

```python
def evaluate(program_path: str) -> Dict:
    prompt_length = len(generated_prompt)

    # DON'T DO THIS - pre-computing bins
    if prompt_length < 100:
        length_bin = 0
    elif prompt_length < 500:
        length_bin = 1
    # ... more binning logic

    return {
        "combined_score": accuracy_score,
        "prompt_length": length_bin,  # ❌ This is a bin index, not raw value
    }
```

Why raw values matter:
- OpenEvolve uses min-max scaling internally
- Bin indices get incorrectly scaled as if they were raw values
- Grid positions become unstable as new programs change the min/max range
- This violates MAP-Elites principles and leads to poor evolution
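To see why pre-binned values misbehave, consider how min-max scaling maps a raw feature value onto a grid. The sketch below illustrates the idea only; it is not OpenEvolve's internal code:

```python
def to_bin(value: float, lo: float, hi: float, num_bins: int) -> int:
    """Min-max scale a raw value into [0, 1], then bucket it into a bin index."""
    if hi == lo:
        return 0
    scaled = (value - lo) / (hi - lo)
    return min(int(scaled * num_bins), num_bins - 1)

# Raw prompt lengths observed across the population
lengths = [80, 240, 950]
lo, hi = min(lengths), max(lengths)
bins = [to_bin(v, lo, hi, num_bins=10) for v in lengths]
# Each program gets a meaningful grid position from its raw value.

# If the evaluator had instead returned bin indices (0, 1, 2), those indices
# would themselves be min-max scaled, and every new program that shifts the
# observed min/max would silently move existing programs to different cells.
```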
Good feature dimensions to track include:
- Counts: Token count, line count, character count
- Performance: Execution time, memory usage, throughput
- Quality: Accuracy, precision, recall, F1 score
- Complexity: Cyclomatic complexity, nesting depth, function count
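Several of these dimensions can be computed directly from the program source with the standard library. A sketch, with illustrative metric names (the AST-depth proxy is an assumption, not how OpenEvolve measures complexity):

```python
import ast

def raw_code_features(source: str) -> dict:
    """Compute raw (unbinned) feature values from Python source code."""
    tree = ast.parse(source)
    return {
        "char_count": len(source),
        "line_count": len(source.splitlines()),
        "function_count": sum(
            isinstance(node, ast.FunctionDef) for node in ast.walk(tree)
        ),
        # Maximum AST nesting depth as a cheap complexity proxy
        "ast_depth": _max_depth(tree),
    }

def _max_depth(node, depth=0):
    children = list(ast.iter_child_nodes(node))
    if not children:
        return depth
    return max(_max_depth(c, depth + 1) for c in children)
```

All four values are raw measurements, so they can be returned from the evaluator as-is and binned by OpenEvolve.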
```bash
# Basic run
python openevolve-run.py path/to/initial_program.py path/to/evaluator.py --config path/to/config.yaml --iterations 100

# Resume from checkpoint
python openevolve-run.py path/to/initial_program.py path/to/evaluator.py \
  --config path/to/config.yaml \
  --checkpoint path/to/checkpoint_directory \
  --iterations 50

# View results
python scripts/visualizer.py --path path/to/openevolve_output/checkpoints/checkpoint_100/
```

```yaml
llm:
  models:
    - name: "gpt-4"
      weight: 0.7
    - name: "claude-3-sonnet"
      weight: 0.3
```

```yaml
database:
  num_islands: 5          # More islands = more diversity
  migration_interval: 15  # How often islands exchange programs
  population_size: 100    # Larger population = more exploration
```

```yaml
evaluator:
  cascade_stages:
    - stage1_timeout: 30   # Quick validation
    - stage2_timeout: 120  # Full evaluation
```

Task: Find global minimum of complex non-convex function
Achievement: Evolved from random search to sophisticated simulated annealing
Key Lesson: Shows automatic discovery of optimization algorithms
```bash
cd examples/function_minimization
python ../../openevolve-run.py initial_program.py evaluator.py --config config.yaml
```

Task: Pack 26 circles in unit square to maximize sum of radii
Achievement: Matched AlphaEvolve paper results (2.634/2.635)
Key Lesson: Demonstrates evolution from geometric heuristics to mathematical optimization
```bash
cd examples/circle_packing
python ../../openevolve-run.py initial_program.py evaluator.py --config config_phase_1.yaml
```

Task: Design digital filters for audio processing
Achievement: Discovered novel filter designs with superior characteristics
Key Lesson: Shows evolution of domain-specific algorithms
```bash
cd examples/signal_processing
python ../../openevolve-run.py initial_program.py evaluator.py --config config.yaml
```

Task: Create sorting algorithm that adapts to data patterns
Achievement: Evolved sorting strategies beyond traditional algorithms
Key Lesson: Multi-language support (Rust) and algorithm adaptation
```bash
cd examples/rust_adaptive_sort
python ../../openevolve-run.py initial_program.rs evaluator.py --config config.yaml
```

Task: Optimize attention mechanisms for Apple Silicon
Achievement: 2-3x speedup over baseline implementation
Key Lesson: Hardware-specific optimization and performance tuning
```bash
cd examples/mlx_metal_kernel_opt
python ../../openevolve-run.py initial_program.py evaluator.py --config config.yaml
```

Task: Extract API documentation from HTML pages
Achievement: Demonstrates optillm integration with readurls and MoA
Key Lesson: Shows integration with LLM proxy systems and test-time compute
```bash
cd examples/web_scraper_optillm
python ../../openevolve-run.py initial_program.py evaluator.py --config config.yaml
```

Task: Solve competitive programming problems
Achievement: Automated solution generation and submission
Key Lesson: Integration with external evaluation systems
```bash
cd examples/online_judge_programming
python ../../openevolve-run.py initial_program.py evaluator.py --config config.yaml
```

Task: Evolve prompts for better LLM performance
Achievement: Discovered effective prompt engineering techniques
Key Lesson: Self-improving AI systems and prompt evolution
```bash
cd examples/llm_prompt_optimazation
python ../../openevolve-run.py initial_prompt.txt evaluator.py --config config.yaml
```

Task: Integrate with language model evaluation harness
Achievement: Automated benchmark improvement
Key Lesson: Integration with standard ML evaluation frameworks
Task: Discover mathematical expressions from data
Achievement: Automated discovery of scientific equations
Key Lesson: Scientific discovery and mathematical modeling
Task: Develop robust statistical regression methods
Achievement: Novel statistical algorithms resistant to outliers
Key Lesson: Multi-language support (R) and statistical algorithm evolution
```bash
cd examples/r_robust_regression
python ../../openevolve-run.py initial_program.r evaluator.py --config config.yaml
```

Task: Circle packing with detailed execution feedback
Achievement: Advanced debugging and artifact collection
Key Lesson: Using OpenEvolve's artifact system for detailed analysis
```bash
cd examples/circle_packing_with_artifacts
python ../../openevolve-run.py initial_program.py evaluator.py --config config_phase_1.yaml
```

- Use meaningful metrics that reflect your goals
- Include both quality and efficiency measures
- Handle edge cases and errors gracefully
- Provide informative feedback for debugging
- Start with smaller populations and fewer iterations for testing
- Increase `num_islands` for more diverse exploration
- Adjust `temperature` based on how creative you want the LLM to be
- Set appropriate timeouts for your compute environment
- Use multiple phases with different configurations
- Begin with exploration, then focus on exploitation
- Consider cascade evaluation for expensive tests
- Monitor progress and adjust configuration as needed
- Check logs in `openevolve_output/logs/`
- Examine failed programs in checkpoint directories
- Use artifacts to understand program behavior
- Test your evaluator independently before evolution
- 📖 See individual example READMEs for detailed walkthroughs
- 🔍 Check the main OpenEvolve documentation
- 💬 Open issues on the GitHub repository
Each example is self-contained and includes all necessary files to get started. Pick an example similar to your use case and adapt it to your specific problem!