OpenEvolve Examples

This directory contains a collection of examples demonstrating how to use OpenEvolve for various tasks including optimization, algorithm discovery, and code evolution. Each example showcases different aspects of OpenEvolve's capabilities and provides templates for creating your own evolutionary coding projects.

Quick Start Template

To create your own OpenEvolve example, you need three essential components:

1. Initial Program (initial_program.py)

Your initial program must contain exactly one EVOLVE-BLOCK:

# EVOLVE-BLOCK-START
def your_function():
    # Your initial implementation here
    # This is the only section OpenEvolve will modify
    pass
# EVOLVE-BLOCK-END

# Helper functions and other code outside the evolve block
def helper_function():
    # This code won't be modified by OpenEvolve
    pass

Critical Requirements:

  • ✅ Exactly one EVOLVE-BLOCK (not multiple blocks)
  • ✅ Use # EVOLVE-BLOCK-START and # EVOLVE-BLOCK-END markers
  • ✅ Put only the code you want evolved inside the block
  • ✅ Helper functions and imports go outside the block
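
Before starting a run, it can help to confirm that a file satisfies the single-block requirement. The following is a minimal sketch; `has_single_evolve_block` is a hypothetical helper written for this README, not part of OpenEvolve, and it assumes each marker sits on its own line.

```python
import re

# Markers are assumed to appear on their own lines, optionally indented.
START = re.compile(r"^[ \t]*#[ \t]*EVOLVE-BLOCK-START[ \t]*$", re.MULTILINE)
END = re.compile(r"^[ \t]*#[ \t]*EVOLVE-BLOCK-END[ \t]*$", re.MULTILINE)


def has_single_evolve_block(source: str) -> bool:
    """Return True if source has exactly one START marker followed by one END marker."""
    starts = [m.start() for m in START.finditer(source)]
    ends = [m.start() for m in END.finditer(source)]
    return len(starts) == 1 and len(ends) == 1 and starts[0] < ends[0]
```

Reading `initial_program.py` into a string and passing it to this function gives a quick pre-flight check.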

2. Evaluator (evaluator.py)

Your evaluator can return either a dictionary or an EvaluationResult object:

from typing import Dict

def evaluate(program_path: str) -> Dict:
    """
    Evaluate the program and return metrics.

    Can return either a dict or EvaluationResult object.
    Use EvaluationResult if you want to include artifacts for debugging.
    """
    try:
        # Import and run your program
        # Calculate metrics
        
        return {
            'combined_score': 0.8,  # PRIMARY METRIC for evolution (required)
            'accuracy': 0.9,        # Your custom metrics
            'speed': 0.7,
            'robustness': 0.6,
            # Add any other metrics you want to track
        }
    except Exception as e:
        return {
            'combined_score': 0.0,  # Always return combined_score, even on error
            'error': str(e)
        }

# Or use EvaluationResult for artifacts support:
from openevolve.evaluation_result import EvaluationResult

def evaluate(program_path: str) -> EvaluationResult:
    return EvaluationResult(
        metrics={'combined_score': 0.8, 'accuracy': 0.9},
        artifacts={'debug_info': 'useful debugging data'}
    )

Critical Requirements:

  • ✅ Return a dictionary or EvaluationResult - both are supported
  • ✅ Must include 'combined_score' - this is the primary metric OpenEvolve uses
  • ✅ Higher combined_score values should indicate better programs
  • ✅ Handle exceptions and return combined_score: 0.0 on failure
  • ✅ Use EvaluationResult with artifacts for richer debugging feedback
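
The template above leaves the "import and run your program" step as a comment. One way to fill it in, sketched under assumptions: the candidate file exposes a function named `your_function` (as in the initial program template), and the task's target value of 42 is purely illustrative.

```python
import importlib.util
from typing import Dict


def load_program(program_path: str):
    """Load the candidate file as a module from its path."""
    spec = importlib.util.spec_from_file_location("candidate_program", program_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def evaluate(program_path: str) -> Dict:
    try:
        module = load_program(program_path)
        result = module.your_function()          # assumed entry point
        accuracy = 1.0 if result == 42 else 0.0  # hypothetical target value
        return {"combined_score": accuracy, "accuracy": accuracy}
    except Exception as e:
        # Any failure (missing file, syntax error, crash) scores zero.
        return {"combined_score": 0.0, "error": str(e)}
```

Because loading happens inside the `try`, a candidate that fails to import is scored 0.0 instead of crashing the evaluator.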

3. Configuration (config.yaml)

Essential configuration structure:

# Evolution settings
max_iterations: 100
checkpoint_interval: 10
parallel_evaluations: 1

# LLM configuration
llm:
  api_base: "https://api.openai.com/v1"  # Or your LLM provider
  models:
    - name: "gpt-4"
      weight: 1.0
  temperature: 0.7
  max_tokens: 4000
  timeout: 120

# Database configuration (MAP-Elites algorithm)
database:
  population_size: 50
  num_islands: 3
  migration_interval: 10
  feature_dimensions:  # MUST be a list, not an integer
    - "score"
    - "complexity"

# Evaluation settings
evaluator:
  timeout: 60
  max_retries: 3

# Prompt configuration
prompt:
  system_message: |
    You are an expert programmer. Your goal is to improve the code
    in the EVOLVE-BLOCK to achieve better performance on the task.
    
    Focus on algorithmic improvements and code optimization.
  num_top_programs: 3
  num_diverse_programs: 2

# Logging
log_level: "INFO"

Critical Requirements:

  • ✅ feature_dimensions must be a list (e.g., ["score", "complexity"]), not an integer
  • ✅ Set appropriate timeouts for your use case
  • ✅ Configure LLM settings for your provider
  • ✅ Use meaningful system_message to guide evolution
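
These requirements can be checked mechanically once the YAML has been parsed into a dictionary. The sketch below assumes the config is already loaded (for example with a YAML parser of your choice); `check_config` is a hypothetical helper, not an OpenEvolve function, and it covers only the rules stated above.

```python
from typing import Any, Dict, List


def check_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems = []

    dims = config.get("database", {}).get("feature_dimensions")
    if dims is not None:
        is_name_list = isinstance(dims, list) and all(isinstance(d, str) for d in dims)
        if not is_name_list:
            problems.append(
                "database.feature_dimensions must be a list of names, "
                "e.g. ['score', 'complexity']"
            )

    for section in ("llm", "evaluator"):
        timeout = config.get(section, {}).get("timeout")
        if timeout is not None and (not isinstance(timeout, (int, float)) or timeout <= 0):
            problems.append(f"{section}.timeout must be a positive number")

    return problems
```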

Common Configuration Mistakes

❌ Wrong: feature_dimensions: 2 ✅ Correct: feature_dimensions: ["score", "complexity"]

❌ Wrong: Using 'total_score' metric name ✅ Correct: Using 'combined_score' metric name

❌ Wrong: Multiple EVOLVE-BLOCK sections ✅ Correct: Exactly one EVOLVE-BLOCK section

💡 Tip: Both {'combined_score': 0.8, ...} dict and EvaluationResult(metrics={...}, artifacts={...}) are valid return types

MAP-Elites Feature Dimensions Best Practices

When using custom feature dimensions, your evaluator must return raw continuous values, not pre-computed bin indices:

✅ Correct: Return Raw Values

def evaluate(program_path: str) -> Dict:
    # Calculate actual measurements
    prompt_length = len(generated_prompt)  # Actual character count
    execution_time = measure_runtime()     # Time in seconds
    memory_usage = get_peak_memory()       # Bytes used
    
    return {
        "combined_score": accuracy_score,
        "prompt_length": prompt_length,    # Raw count, not bin index
        "execution_time": execution_time,  # Raw seconds, not bin index  
        "memory_usage": memory_usage       # Raw bytes, not bin index
    }

❌ Wrong: Return Bin Indices

def evaluate(program_path: str) -> Dict:
    prompt_length = len(generated_prompt)
    
    # DON'T DO THIS - pre-computing bins
    if prompt_length < 100:
        length_bin = 0
    elif prompt_length < 500:
        length_bin = 1
    # ... more binning logic
    
    return {
        "combined_score": accuracy_score,
        "prompt_length": length_bin,  # ❌ This is a bin index, not raw value
    }

Why This Matters

  • OpenEvolve uses min-max scaling internally
  • Bin indices get incorrectly scaled as if they were raw values
  • Grid positions become unstable as new programs change the min/max range
  • This violates MAP-Elites principles and leads to poor evolution
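
The effect can be seen with a small model of min-max scaling followed by binning. This is a simplified illustration, not OpenEvolve's actual implementation; `grid_bins` and the 10-bin grid are assumptions made for the example.

```python
from typing import List


def grid_bins(values: List[float], num_bins: int = 10) -> List[int]:
    """Min-max scale values to [0, 1], then map each one to a grid bin."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    bins = []
    for v in values:
        scaled = (v - lo) / (hi - lo)
        bins.append(min(int(scaled * num_bins), num_bins - 1))
    return bins


raw_lengths = [50, 250, 900]  # actual character counts
prebinned = [0, 1, 2]         # the same programs after manual binning

print(grid_bins(raw_lengths))    # [0, 2, 9]
print(grid_bins(prebinned))      # [0, 5, 9]
print(grid_bins(prebinned + [7]))  # [0, 1, 2, 9]
```

With raw values, the 250-character program lands near the low end of the grid, matching its measurement. With pre-computed indices it lands in the middle, and it jumps to a different cell as soon as a program with a higher index appears.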

Examples of Good Feature Dimensions

  • Counts: Token count, line count, character count
  • Performance: Execution time, memory usage, throughput
  • Quality: Accuracy, precision, recall, F1 score
  • Complexity: Cyclomatic complexity, nesting depth, function count

Running Your Example

# Basic run
python openevolve-run.py path/to/initial_program.py path/to/evaluator.py --config path/to/config.yaml --iterations 100

# Resume from checkpoint
python openevolve-run.py path/to/initial_program.py path/to/evaluator.py \
  --config path/to/config.yaml \
  --checkpoint path/to/checkpoint_directory \
  --iterations 50

# View results
python scripts/visualizer.py --path path/to/openevolve_output/checkpoints/checkpoint_100/
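
When runs are launched from a script, the same commands can be assembled programmatically. A minimal sketch using only the flags shown above; `build_run_command` is a hypothetical helper, and the paths are placeholders.

```python
import shlex
from typing import List, Optional


def build_run_command(
    initial_program: str,
    evaluator: str,
    config: str,
    iterations: Optional[int] = None,
    checkpoint: Optional[str] = None,
) -> List[str]:
    """Build the argument list for openevolve-run.py from the flags in this README."""
    cmd = ["python", "openevolve-run.py", initial_program, evaluator, "--config", config]
    if checkpoint is not None:
        cmd += ["--checkpoint", checkpoint]
    if iterations is not None:
        cmd += ["--iterations", str(iterations)]
    return cmd


cmd = build_run_command(
    "path/to/initial_program.py",
    "path/to/evaluator.py",
    "path/to/config.yaml",
    iterations=100,
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root.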

Advanced Configuration Options

LLM Ensemble (Multiple Models)

llm:
  models:
    - name: "gpt-4"
      weight: 0.7
    - name: "claude-3-sonnet"
      weight: 0.3
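
A plausible reading of the weights is as relative sampling probabilities: with the values above, roughly 70% of generations would go to the first model. The sketch below illustrates that interpretation only; it is an assumption about the behavior, not OpenEvolve's code.

```python
import random
from typing import Dict, List


def pick_model(models: List[Dict], rng: random.Random) -> str:
    """Pick one model name with probability proportional to its weight."""
    names = [m["name"] for m in models]
    weights = [m["weight"] for m in models]
    return rng.choices(names, weights=weights, k=1)[0]


models = [
    {"name": "gpt-4", "weight": 0.7},
    {"name": "claude-3-sonnet", "weight": 0.3},
]
rng = random.Random(0)
picks = [pick_model(models, rng) for _ in range(10_000)]
share = picks.count("gpt-4") / len(picks)
print(f"gpt-4 share: {share:.2f}")
```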

Island Evolution (Population Diversity)

database:
  num_islands: 5        # More islands = more diversity
  migration_interval: 15  # How often islands exchange programs
  population_size: 100   # Larger population = more exploration
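
To make the role of migration_interval concrete, here is a toy model of island migration. The ring topology and the "copy each island's best program to its neighbor" rule are assumptions chosen for illustration; programs are represented only by their scores.

```python
from typing import List


def should_migrate(iteration: int, migration_interval: int) -> bool:
    """Migration fires every migration_interval iterations, never at iteration 0."""
    return iteration > 0 and iteration % migration_interval == 0


def migrate(islands: List[List[float]]) -> List[List[float]]:
    """Copy the best score of each island to the next island in a ring."""
    bests = [max(island) for island in islands]
    migrated = [list(island) for island in islands]
    for i, best in enumerate(bests):
        migrated[(i + 1) % len(islands)].append(best)
    return migrated


islands = [[0.1, 0.5], [0.3], [0.9, 0.2]]
print(migrate(islands))  # [[0.1, 0.5, 0.9], [0.3, 0.5], [0.9, 0.2, 0.3]]
```

Between migrations each island evolves on its own, which is what preserves diversity. A shorter interval spreads good programs faster at the cost of making the islands more alike.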

Cascade Evaluation (Multi-Stage Testing)

evaluator:
  cascade_stages:
    - stage1_timeout: 30   # Quick validation
    - stage2_timeout: 120  # Full evaluation
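
The idea behind cascade evaluation is to run a cheap check first and spend time on the full evaluation only for candidates that pass it. A minimal sketch of that control flow follows; `cascade_evaluate` and the per-stage thresholds are hypothetical and are not taken from OpenEvolve.

```python
from typing import Callable, Dict, List, Tuple

# Each stage is (evaluation function, minimum combined_score needed to continue).
Stage = Tuple[Callable[[str], Dict[str, float]], float]


def cascade_evaluate(program_path: str, stages: List[Stage]) -> Dict[str, float]:
    """Run stages in order and stop as soon as a stage scores below its threshold."""
    metrics: Dict[str, float] = {"combined_score": 0.0}
    for stage_fn, threshold in stages:
        metrics.update(stage_fn(program_path))
        if metrics["combined_score"] < threshold:
            break  # Skip the remaining, more expensive stages.
    return metrics
```

Later stages overwrite metrics from earlier ones, so the returned combined_score always comes from the last stage that actually ran.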

Example Directory

🧮 Mathematical Optimization

Task: Find global minimum of complex non-convex function
Achievement: Evolved from random search to sophisticated simulated annealing
Key Lesson: Shows automatic discovery of optimization algorithms

cd examples/function_minimization
python ../../openevolve-run.py initial_program.py evaluator.py --config config.yaml

Task: Pack 26 circles in unit square to maximize sum of radii
Achievement: Matched AlphaEvolve paper results (2.634/2.635)
Key Lesson: Demonstrates evolution from geometric heuristics to mathematical optimization

cd examples/circle_packing
python ../../openevolve-run.py initial_program.py evaluator.py --config config_phase_1.yaml

🔧 Algorithm Discovery

Task: Design digital filters for audio processing
Achievement: Discovered novel filter designs with superior characteristics
Key Lesson: Shows evolution of domain-specific algorithms

cd examples/signal_processing
python ../../openevolve-run.py initial_program.py evaluator.py --config config.yaml

Task: Create sorting algorithm that adapts to data patterns
Achievement: Evolved sorting strategies beyond traditional algorithms
Key Lesson: Multi-language support (Rust) and algorithm adaptation

cd examples/rust_adaptive_sort
python ../../openevolve-run.py initial_program.rs evaluator.py --config config.yaml

🚀 Performance Optimization

Task: Optimize attention mechanisms for Apple Silicon
Achievement: 2-3x speedup over baseline implementation
Key Lesson: Hardware-specific optimization and performance tuning

cd examples/mlx_metal_kernel_opt
python ../../openevolve-run.py initial_program.py evaluator.py --config config.yaml

🌐 Web and Data Processing

Task: Extract API documentation from HTML pages
Achievement: Demonstrates optillm integration with readurls and MoA
Key Lesson: Shows integration with LLM proxy systems and test-time compute

cd examples/web_scraper_optillm
python ../../openevolve-run.py initial_program.py evaluator.py --config config.yaml

💻 Programming Challenges

Task: Solve competitive programming problems
Achievement: Automated solution generation and submission
Key Lesson: Integration with external evaluation systems

cd examples/online_judge_programming
python ../../openevolve-run.py initial_program.py evaluator.py --config config.yaml

📊 Machine Learning and AI

Task: Evolve prompts for better LLM performance
Achievement: Discovered effective prompt engineering techniques
Key Lesson: Self-improving AI systems and prompt evolution

cd examples/llm_prompt_optimazation
python ../../openevolve-run.py initial_prompt.txt evaluator.py --config config.yaml

Task: Integrate with language model evaluation harness
Achievement: Automated benchmark improvement
Key Lesson: Integration with standard ML evaluation frameworks

Task: Discover mathematical expressions from data
Achievement: Automated discovery of scientific equations
Key Lesson: Scientific discovery and mathematical modeling

🔬 Scientific Computing

Task: Develop robust statistical regression methods
Achievement: Novel statistical algorithms resistant to outliers
Key Lesson: Multi-language support (R) and statistical algorithm evolution

cd examples/r_robust_regression
python ../../openevolve-run.py initial_program.r evaluator.py --config config.yaml

🎯 Advanced Features

Task: Circle packing with detailed execution feedback
Achievement: Advanced debugging and artifact collection
Key Lesson: Using OpenEvolve's artifact system for detailed analysis

cd examples/circle_packing_with_artifacts
python ../../openevolve-run.py initial_program.py evaluator.py --config config_phase_1.yaml

Best Practices

🎯 Design Effective Evaluators

  • Use meaningful metrics that reflect your goals
  • Include both quality and efficiency measures
  • Handle edge cases and errors gracefully
  • Provide informative feedback for debugging

🔧 Configuration Tuning

  • Start with smaller populations and fewer iterations for testing
  • Increase num_islands for more diverse exploration
  • Adjust temperature based on how creative you want the LLM to be
  • Set appropriate timeouts for your compute environment

📈 Evolution Strategy

  • Use multiple phases with different configurations
  • Begin with exploration, then focus on exploitation
  • Consider cascade evaluation for expensive tests
  • Monitor progress and adjust configuration as needed

🐛 Debugging

  • Check logs in openevolve_output/logs/
  • Examine failed programs in checkpoint directories
  • Use artifacts to understand program behavior
  • Test your evaluator independently before evolution
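
For the last point, a small check on the evaluator's return value catches the most common mistakes before any LLM calls are made. This is a sketch; `check_metrics` is a hypothetical helper and only covers the dictionary return type.

```python
import math
from typing import Any, Dict, List


def check_metrics(metrics: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in an evaluator's metrics dictionary."""
    problems = []
    score = metrics.get("combined_score")
    if score is None:
        problems.append("missing 'combined_score'")
    elif isinstance(score, bool) or not isinstance(score, (int, float)):
        problems.append("'combined_score' must be a number")
    elif not math.isfinite(score):
        problems.append("'combined_score' must be finite")
    return problems
```

Calling `check_metrics(evaluate("initial_program.py"))` on the unmodified initial program should return an empty list.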

Getting Help

Each example is self-contained and includes all necessary files to get started. Pick an example similar to your use case and adapt it to your specific problem!