Interpreting Results
This page explains what your mutation testing results mean and how to use them to improve your test suite.
Mutation Statuses
Killed (Good)
A mutation is killed when at least one test fails after the mutation is applied.
✓ Killed: x + y → x - y
Test failed: "should add two numbers correctly"
This is good! Your tests correctly detected the bug.
Survived (Needs Attention)
A mutation survives when all tests pass despite the code change.
✗ Survived: x >= 18 → x > 18
All tests passed
This needs attention! Either:
- Your tests don't cover this case
- Your tests execute this code but don't verify the result
- The mutation is equivalent (functionally identical)
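To see why a boundary mutation like `x >= 18 → x > 18` can survive, it may help to put the original and the mutant side by side. The sketch below uses a hypothetical `isAdult` function and a hand-written copy of the mutant (neither comes from a real codebase). The two versions disagree only at exactly 18, so a suite that never exercises that value cannot tell them apart.

```dart
// Hypothetical function under test, plus a hand-written copy of the mutant.
bool isAdult(int age) => age >= 18;
bool isAdultMutant(int age) => age > 18; // x >= 18 → x > 18

void main() {
  // Only an input on which the two versions disagree can kill the mutant.
  for (final age in [17, 18, 19]) {
    final original = isAdult(age);
    final mutant = isAdultMutant(age);
    final verdict = original == mutant ? 'cannot kill' : 'kills the mutant';
    print('age $age: original=$original, mutant=$mutant ($verdict)');
  }
}
```

Only the boundary value separates the two, which suggests the test to add: one asserting that `isAdult(18)` is true.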
Timeout
A mutation causes a timeout when tests don't complete within the time limit.
⏱ Timeout: while(i < n) → while(true)
Timeouts usually indicate infinite loops. Timeouts count as killed since they represent detectable bugs.
Error
A mutation causes an error when the code fails to compile or crashes before tests run.
⚠ Error: a?.b → a.b
Compile error: property 'b' accessed on a nullable receiver
Errors are excluded from the mutation score: a mutant that cannot compile or load never reaches the tests, so it says nothing about how well they detect bugs.
Mutation Score
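The mutation score measures your test suite's fault-detection capability. Timeouts count toward Killed, and errored mutants are left out of both the numerator and the denominator: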
Mutation Score = Killed / (Killed + Survived)
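As a minimal sketch of this calculation, the function below computes the score from raw counts. The name `mutationScore` and its parameters are hypothetical, and it assumes timeouts are reported as a separate count that is added to the killed total; errored mutants are simply not passed in.

```dart
/// Mutation score as a fraction between 0 and 1.
///
/// Assumption: timeouts arrive as a separate count and are treated as
/// killed. Errored mutants are excluded by not being passed in at all.
double mutationScore({
  required int killed,
  required int survived,
  int timedOut = 0,
}) {
  final detected = killed + timedOut;
  final scored = detected + survived;
  // With no scorable mutants the ratio is undefined; this sketch returns 0.
  if (scored == 0) return 0.0;
  return detected / scored;
}

void main() {
  final score = mutationScore(killed: 18, survived: 7);
  print('${(score * 100).toStringAsFixed(0)}%');
}
```

With 18 killed and 7 survived, the figures used in the example analysis on this page, the score is 18 / 25, or 72%.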
Interpreting Scores
| Score | Interpretation |
|---|---|
| 90%+ | Excellent - comprehensive test suite |
| 80-89% | Good - strong coverage with minor gaps |
| 70-79% | Acceptable - some improvement needed |
| 60-69% | Fair - significant gaps in test coverage |
| <60% | Poor - tests miss many potential bugs |
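If you want to print these labels in a report, the bands can be expressed as a small lookup. This is only a sketch: `interpretScore` is a hypothetical helper, and it treats each band as starting at its lower bound, so a fractional score such as 89.5 falls into "Good".

```dart
/// Maps a score in percent to the interpretation bands in the table.
/// Hypothetical helper; each band starts at its lower bound.
String interpretScore(double percent) {
  if (percent >= 90) return 'Excellent';
  if (percent >= 80) return 'Good';
  if (percent >= 70) return 'Acceptable';
  if (percent >= 60) return 'Fair';
  return 'Poor';
}

void main() {
  for (final score in [95.0, 72.0, 59.9]) {
    print('$score% -> ${interpretScore(score)}');
  }
}
```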
Score vs. Code Coverage
| Metric | What It Measures |
|---|---|
| Code Coverage | Lines/branches executed during tests |
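| Mutation Score | Whether executed code is actually verified by assertions |
100% code coverage with a 60% mutation score means that 40% of the generated mutants survived: the mutated code is executed by your tests, but no assertion notices the change.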
Analyzing Survived Mutants
When a mutation survives, it usually comes down to one of three causes:
1. Missing Test Case
The most common cause. Add a test that would catch this bug:
// Survived: price * 0.9 → price * 1.1
// Add test:
test('applies 10% discount', () {
expect(applyDiscount(100), equals(90)); // Not 110!
});
2. Weak Assertion
Tests execute the code but don't verify the result:
// BAD: Only checks it doesn't throw
test('calculates total', () {
expect(() => calculateTotal(items), returnsNormally);
});
// GOOD: Verifies the actual result
test('calculates total', () {
expect(calculateTotal(items), equals(150.0));
});
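The difference between the two styles can be shown without a test runner. The sketch below is an illustration under stated assumptions: `calculateTotal` here is a hypothetical implementation that sums a list of prices, `calculateTotalMutant` is a hand-written copy with `+=` changed to `-=`, and `weakCheck` / `strongCheck` stand in for the BAD and GOOD tests.

```dart
// Hypothetical implementation and a hand-written mutant (+= → -=).
double calculateTotal(List<double> prices) {
  var total = 0.0;
  for (final price in prices) {
    total += price;
  }
  return total;
}

double calculateTotalMutant(List<double> prices) {
  var total = 0.0;
  for (final price in prices) {
    total -= price;
  }
  return total;
}

typedef Total = double Function(List<double> prices);

// Weak check: passes as long as the call does not throw.
bool weakCheck(Total fn) {
  fn([100.0, 50.0]);
  return true;
}

// Strong check: passes only when the result is correct.
bool strongCheck(Total fn) => fn([100.0, 50.0]) == 150.0;

String verdict(bool passed) => passed ? 'passes' : 'fails';

void main() {
  print('weak check, original:   ${verdict(weakCheck(calculateTotal))}');
  print('weak check, mutant:     ${verdict(weakCheck(calculateTotalMutant))}');
  print('strong check, original: ${verdict(strongCheck(calculateTotal))}');
  print('strong check, mutant:   ${verdict(strongCheck(calculateTotalMutant))}');
}
```

The weak check passes for both versions, so the mutant survives. The strong check fails only for the mutant, which is exactly what killing it means.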
3. Equivalent Mutant
Some mutations produce functionally identical code:
// Original
int index = 0;
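// Mutant (equivalent: -0 and 0 are the same integer)
int index = -0;
These are false positives: no test can distinguish the mutant from the original, so once you have confirmed the equivalence you can ignore them.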
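Equivalence is not always this obvious. A textbook case, not necessarily one this tool generates, is replacing `<` with `!=` in a counting loop. In the sketch below, `sumOriginal` and `sumMutant` are hypothetical functions written for illustration. Because `i` starts at 0 and increases by exactly 1, both loops stop at the same point for every input.

```dart
// Hypothetical original: the loop stops when i reaches the length.
int sumOriginal(List<int> values) {
  var sum = 0;
  for (var i = 0; i < values.length; i++) {
    sum += values[i];
  }
  return sum;
}

// Mutant: `<` replaced by `!=`. Since i starts at 0 and grows by exactly 1,
// it always lands on values.length, so both loops terminate together.
int sumMutant(List<int> values) {
  var sum = 0;
  for (var i = 0; i != values.length; i++) {
    sum += values[i];
  }
  return sum;
}

void main() {
  final inputs = [<int>[], [5], [1, 2, 3], [-4, 4, 10]];
  for (final input in inputs) {
    final same = sumOriginal(input) == sumMutant(input);
    print('$input: ${same ? "identical result" : "different result"}');
  }
}
```

Spot-checking inputs like this does not prove equivalence. It only supports the argument in the comment, which is what actually justifies ignoring the mutant.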
Prioritizing Improvements
Not every surviving mutant deserves the same effort. Start where an undetected bug would cost the most:
High-Value Mutations
- Business logic - Core calculations, validations
- Security checks - Authentication, authorization
- Data transformations - Parsing, serialization
- Conditional logic - if/else, switch statements
Lower Priority
- Logging statements - Usually don't affect correctness
- Debug code - Should be removed anyway
- Generated code - Tested upstream
Example Analysis
File: lib/src/cart.dart
Mutation Score: 72% (18 killed, 7 survived)
Survived Mutations (3 of 7 shown):
1. Line 45: total += item.price → total -= item.price
2. Line 52: quantity > 0 → quantity >= 0
3. Line 67: discount ?? 0 → discount
Analysis:
- Line 45: Critical! Adding items should increase total. Missing assertion.
- Line 52: Edge case - what happens with quantity = 0? Add test.
- Line 67: What if discount is null? Verify default behavior.
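The analysis above could translate into tests along these lines. Everything about `Cart` and `Item` here is a hypothetical stand-in, since the real contents of `lib/src/cart.dart` are not shown. The sketch also assumes that a quantity of zero should be rejected with an `ArgumentError`; your cart may define that edge case differently.

```dart
import 'package:test/test.dart';

// Hypothetical stand-ins for lib/src/cart.dart; the real API may differ.
class Item {
  Item(this.price);
  final double price;
}

class Cart {
  double total = 0;

  void add(Item item, {int quantity = 1}) {
    if (quantity > 0) {
      total += item.price * quantity;
    } else {
      // Assumed behavior: zero or negative quantities are rejected.
      throw ArgumentError.value(quantity, 'quantity', 'must be positive');
    }
  }

  // Amount due after an optional discount; no discount means no reduction.
  double payable({double? discount}) => total - (discount ?? 0);
}

void main() {
  // Line 45: a sign flip on the total is caught by asserting the value.
  test('adding an item increases the total', () {
    final cart = Cart()..add(Item(20.0));
    expect(cart.total, equals(20.0));
  });

  // Line 52: only quantity == 0 distinguishes `> 0` from `>= 0`.
  test('rejects a quantity of zero', () {
    expect(() => Cart().add(Item(20.0), quantity: 0), throwsArgumentError);
  });

  // Line 67: pins down the default when no discount is supplied.
  test('a missing discount leaves the total unchanged', () {
    final cart = Cart()..add(Item(20.0));
    expect(cart.payable(), equals(20.0));
  });
}
```

Each test targets the single input or assertion that separates the original from its mutant, rather than adding broad coverage.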
Tracking Progress
Track mutation score over time:
# Save historical data
dart_mutant --json >> mutation-history.jsonl
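Once a few runs have accumulated, a short script can show the trend. The sketch below rests on two assumptions you should verify against the tool's actual output: that each run appends exactly one JSON object per line, and that the object carries a numeric `mutationScore` field. That field name is hypothetical.

```dart
import 'dart:convert';
import 'dart:io';

/// Extracts one score per non-empty line.
///
/// Assumption: each line is a JSON object with a numeric `mutationScore`
/// field. The field name is hypothetical; adjust it to the real output.
List<double> parseScores(Iterable<String> lines) {
  final scores = <double>[];
  for (final line in lines) {
    if (line.trim().isEmpty) continue;
    final record = jsonDecode(line) as Map<String, dynamic>;
    scores.add((record['mutationScore'] as num).toDouble());
  }
  return scores;
}

void main() {
  final file = File('mutation-history.jsonl');
  if (!file.existsSync()) {
    print('No history file found.');
    return;
  }
  final scores = parseScores(file.readAsLinesSync());
  for (var i = 0; i < scores.length; i++) {
    // Change relative to the previous run; the first run has no baseline.
    final delta = i == 0 ? 0.0 : scores[i] - scores[i - 1];
    print('run ${i + 1}: ${scores[i]} (change: ${delta.toStringAsFixed(1)})');
  }
}
```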
Set incremental goals:
- Week 1: Reach 70%
- Week 2: Reach 75%
- Week 3: Reach 80%
Next Steps
- Mutation Operators - Understand what's being mutated
- Filtering - Exclude non-critical code
- CI/CD Integration - Enforce thresholds