## How it Works
### Stage 1: Trace Generation
- Runs the existing program with training examples
- Captures input/output pairs along with evaluation scores
- Creates a dataset of execution traces that show current behavior
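A minimal sketch of this stage, assuming a `Program` trait with an async `forward` method and a caller-supplied metric; every name here is illustrative, not the crate's actual API:

```rust
use anyhow::Result;

/// Hypothetical program interface; the real module trait will differ.
pub trait Program {
    async fn forward(&self, input: &str) -> Result<String>;
}

/// One captured execution: the example's input, the program's output,
/// and the metric score for that output.
pub struct Trace {
    pub input: String,
    pub output: String,
    pub score: f64,
}

/// Stage 1: run the unoptimized program over the training set and
/// record one trace per example.
pub async fn collect_traces<P: Program>(
    program: &P,
    trainset: &[(String, String)],
    metric: impl Fn(&str, &str) -> f64, // (expected, actual) -> score
) -> Result<Vec<Trace>> {
    let mut traces = Vec::with_capacity(trainset.len());
    for (input, expected) in trainset {
        let output = program.forward(input).await?; // async, no blocking calls
        let score = metric(expected, &output);
        traces.push(Trace { input: input.clone(), output, score });
    }
    Ok(traces)
}
```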
### Stage 2: Candidate Prompt Generation
This stage has two sub-steps. First, an LLM analyzes the signature and traces to generate a program description. Then it uses that description, along with prompting tips, to generate candidate instructions (see the sketch after this list). The prompting tips library includes:

- Use clear, specific language
- Consider chain-of-thought for complex tasks
- Specify output formats
- Use role-playing when appropriate
- Handle edge cases explicitly
- Request structured outputs when needed
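A sketch of both sub-steps, reusing the `Trace` type from the Stage 1 sketch; the `llm` helper and the prompt wording are assumptions standing in for the crate's real client:

```rust
use anyhow::Result;

/// Stand-in for the crate's LLM client; an assumption for this sketch.
async fn llm(_prompt: &str) -> Result<String> {
    unimplemented!("call your LLM backend here")
}

/// The tips library, seeded with the practices listed above (abridged).
const PROMPTING_TIPS: &[&str] = &[
    "Use clear, specific language",
    "Consider chain-of-thought for complex tasks",
    "Specify output formats",
];

/// Stage 2: describe the program from its signature and traces, then
/// generate `n` candidate instructions seeded with the tips library.
pub async fn generate_candidates(
    signature: &str,
    traces: &[Trace],
    n: usize,
) -> Result<Vec<String>> {
    // Sub-step 1: program description from signature + traces.
    let shown: Vec<String> = traces
        .iter()
        .map(|t| format!("in: {} | out: {} | score: {:.2}", t.input, t.output, t.score))
        .collect();
    let description = llm(&format!(
        "Signature:\n{signature}\n\nTraces:\n{}\n\nDescribe what this program does.",
        shown.join("\n")
    ))
    .await?;

    // Sub-step 2: candidate instructions from description + tips.
    let mut candidates = Vec::with_capacity(n);
    for _ in 0..n {
        candidates.push(
            llm(&format!(
                "Program description:\n{description}\n\nTips:\n- {}\n\nWrite one improved instruction.",
                PROMPTING_TIPS.join("\n- ")
            ))
            .await?,
        );
    }
    Ok(candidates)
}
```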
### Stage 3: Evaluation and Selection
- Evaluates each candidate on a minibatch of examples
- Computes performance scores
- Selects the best performing candidate
- Applies it to the module
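A sketch of this stage; `eval_with_instruction` is a hypothetical hook standing in for "run the module with the candidate instruction installed, then score the output":

```rust
use anyhow::{anyhow, Result};

/// Assumed hook: run the program with `instruction` applied to the module
/// and return the metric score against `expected`.
async fn eval_with_instruction(_instruction: &str, _input: &str, _expected: &str) -> Result<f64> {
    unimplemented!("run the module with the candidate instruction, then apply the metric")
}

/// Stage 3: score each candidate on a minibatch and keep the best one.
pub async fn select_best(
    candidates: Vec<String>,
    minibatch: &[(String, String)],
) -> Result<(String, f64)> {
    let mut best: Option<(String, f64)> = None;
    for cand in candidates {
        let mut total = 0.0;
        for (input, expected) in minibatch {
            total += eval_with_instruction(&cand, input, expected).await?;
        }
        let mean = total / minibatch.len() as f64;
        // Keep the highest mean score seen so far.
        if best.as_ref().map_or(true, |(_, s)| mean > *s) {
            best = Some((cand, mean));
        }
    }
    best.ok_or_else(|| anyhow!("no candidates were evaluated"))
}
```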
## Configuration
Default settings cover (a builder-style sketch follows this list):

- Number of candidates to generate
- Minibatch size for evaluation
- Temperature for generation diversity
- Whether to display progress stats
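The field names and default values below are assumptions for illustration, not the crate's documented defaults:

```rust
/// Illustrative configuration; fields and defaults are assumptions.
#[derive(Clone, Debug)]
pub struct MIPROv2Config {
    pub num_candidates: usize,
    pub minibatch_size: usize,
    pub temperature: f32,
    pub track_stats: bool,
}

impl Default for MIPROv2Config {
    fn default() -> Self {
        Self {
            num_candidates: 10,
            minibatch_size: 25,
            temperature: 1.0,
            track_stats: false,
        }
    }
}

impl MIPROv2Config {
    // Builder-style setters: consume and return Self so calls chain.
    pub fn num_candidates(mut self, n: usize) -> Self { self.num_candidates = n; self }
    pub fn minibatch_size(mut self, n: usize) -> Self { self.minibatch_size = n; self }
    pub fn temperature(mut self, t: f32) -> Self { self.temperature = t; self }
    pub fn track_stats(mut self, on: bool) -> Self { self.track_stats = on; self }
}
```

For example, `MIPROv2Config::default().num_candidates(20).temperature(0.7)` overrides two settings and keeps the rest at their defaults.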
## Usage Example
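A hedged end-to-end sketch wiring together the three stage functions from the sketches above; `EchoProgram`, the Tokio runtime, and the toy two-example dataset are assumptions, not part of the crate:

```rust
/// Toy program for illustration: just echoes its input.
struct EchoProgram;

impl Program for EchoProgram {
    async fn forward(&self, input: &str) -> anyhow::Result<String> {
        Ok(input.to_string())
    }
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let trainset = vec![
        ("2 + 2".to_string(), "4".to_string()),
        ("3 * 5".to_string(), "15".to_string()),
    ];
    // Exact-match metric: 1.0 if the output equals the expected answer.
    let metric = |expected: &str, actual: &str| if expected == actual { 1.0 } else { 0.0 };

    let traces = collect_traces(&EchoProgram, &trainset, metric).await?; // Stage 1
    let candidates = generate_candidates("question -> answer", &traces, 5).await?; // Stage 2
    let (best, score) = select_best(candidates, &trainset).await?; // Stage 3

    println!("best instruction (mean score {score:.2}): {best}");
    Ok(())
}
```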
## Comparison: COPRO vs MIPROv2 vs GEPA
| Feature | COPRO | MIPROv2 | GEPA |
|---|---|---|---|
| Approach | Iterative refinement | LLM-guided generation | Reflective evolution |
| Feedback | Score only | Score only | Score + text |
| Selection | Best candidate | Batch evaluation | Pareto frontier |
| LLM calls | Moderate | High | Medium-high |
| Speed | Fast | Slow | Medium |
| Diversity | Low | Medium | High |
| Best for | Quick iteration | Best results | Complex tasks |
### When to Use MIPROv2
- You have decent training data (15+ examples recommended)
- Quality matters more than speed
- Task benefits from prompting best practices
- Need LLM-generated program understanding
### When to Use COPRO
- You need fast iteration
- Compute budget is limited
- Task is straightforward
### When to Use GEPA
- Complex tasks with subtle failure modes
- You can provide rich feedback
- Multi-objective optimization
- Need diverse solutions
## Implementation Notes
The code follows standard Rust practices (illustrated after this list):

- No `unsafe` blocks
- `Result` for error handling, with context via `anyhow`
- Strong types (`Trace`, `PromptCandidate`, `PromptingTips`)
- Builder pattern for configuration
- Async throughout, no blocking calls
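For instance, error handling with `anyhow` context plus fully async I/O might look like this (the `read_trainset` helper and its tab-separated file format are assumptions for this sketch):

```rust
use anyhow::{Context, Result};

/// Illustrative async loader: reads tab-separated (input, expected) pairs.
pub async fn read_trainset(path: &str) -> Result<Vec<(String, String)>> {
    let raw = tokio::fs::read_to_string(path)
        .await
        .with_context(|| format!("failed to read training data from {path}"))?;
    Ok(raw
        .lines()
        .filter_map(|line| line.split_once('\t'))
        .map(|(q, a)| (q.to_string(), a.to_string()))
        .collect())
}
```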
Key types:

- `Trace`: input/output pair with evaluation score
- `PromptCandidate`: instruction text with score
- `PromptingTips`: library of best practices
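Illustrative shapes for the latter two types (`Trace` appears in the Stage 1 sketch above); the field names are assumptions:

```rust
/// A candidate instruction plus the score it earned on the minibatch
/// (None until evaluated).
pub struct PromptCandidate {
    pub instruction: String,
    pub score: Option<f64>,
}

/// The best-practices library consulted during Stage 2.
pub struct PromptingTips {
    pub tips: Vec<String>,
}
```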
## Testing
Run tests (assuming a standard Cargo layout):
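```sh
# Standard Cargo test invocation (assumes the usual crate layout).
cargo test
```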
## Example

### MIPROv2 Example
A complete working example with HuggingFace data loading.