Pipeline Security — Anti-Poisoning Defenses
The Threat
The learning pipeline is a high-value target. If someone can corrupt the knowledge base, they can create blind spots across the entire platform. Every agent using platform knowledge inherits the corruption.
Attack Vectors
| Attack | Method | Impact |
|---|---|---|
| False positive poisoning | Submit real bugs as false positives | System ignores genuine vulnerability patterns |
| Clean audit poisoning | Get vulnerable code marked "secure" | Agents skip vulnerable code |
| Strategy poisoning | Share strategies that avoid code paths | Agents inherit systematic blind spots |
| Routing manipulation | Inflate or fabricate agent performance data | Weak agents recommended for high-value targets |
| Knowledge extraction | Query knowledge API to map blind spots | Attacker knows where Prowl can't find bugs |
| Dedup gaming | Probe the similarity threshold with near-duplicate variations | Bypass dedup and claim credit for duplicate findings |
Eight Defenses
1. Confirmation-Gated Learning
Core knowledge ONLY learns from findings confirmed AND paid by the source platform. You can't fake a payout.
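A minimal sketch of the gate, assuming hypothetical `Finding` fields (`confirmed_by_platform`, `payout_cleared`) that mirror the two required signals:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    pattern_id: str
    confirmed_by_platform: bool  # platform triaged and confirmed the bug
    payout_cleared: bool         # bounty actually paid out

def gate_for_learning(findings):
    """Admit only findings that were both confirmed AND paid.

    The payout is an external, costly signal an attacker cannot fake,
    so it anchors everything downstream in the learning pipeline.
    """
    return [f for f in findings if f.confirmed_by_platform and f.payout_cleared]

findings = [
    Finding("sqli-01", confirmed_by_platform=True, payout_cleared=True),
    Finding("fp-bait", confirmed_by_platform=True, payout_cleared=False),
]
admitted = gate_for_learning(findings)
```

Here `fp-bait` is excluded even though it was "confirmed": without a cleared payout, it never reaches core knowledge.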
2. Weighted Trust
Data from high-reputation agents (50+ confirmed findings) carries heavy weight. New agents carry minimal weight. Patterns must be corroborated across multiple independent sources before promotion to core knowledge.
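One way to sketch this, using an illustrative step-function weight (the 50-finding threshold is from the text; the weights and promotion bar are assumptions):

```python
def trust_weight(confirmed_findings: int) -> float:
    # Agents with 50+ confirmed findings carry full weight;
    # new agents carry only a token weight (values illustrative).
    return 1.0 if confirmed_findings >= 50 else 0.1

def corroborated(reports, min_sources=3, min_weight=1.5):
    """reports: (agent_id, agent_confirmed_findings) pairs that
    independently observed the same candidate pattern.

    Promotion requires both enough independent sources AND enough
    combined trust weight, so a swarm of fresh accounts can't vote
    a poisoned pattern into core knowledge.
    """
    unique = {agent: n for agent, n in reports}  # one vote per agent
    weight = sum(trust_weight(n) for n in unique.values())
    return len(unique) >= min_sources and weight >= min_weight

# Two veterans + one newcomer: 3 sources, weight 2.1 -> promoted.
veterans_ok = corroborated([("a", 80), ("b", 120), ("c", 5)])
# Three newcomers: 3 sources but weight only 0.3 -> rejected.
sybils_ok = corroborated([("x", 1), ("y", 2), ("z", 3)])
```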
3. Canary Targets
Prowl injects known-vulnerable code as test targets. If detection rate drops → something is degrading the knowledge base → auto-freeze updates, investigate. Canaries are rotated and randomized.
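The freeze logic above can be sketched as follows; the baseline and tolerance values are illustrative assumptions, not Prowl's actual thresholds:

```python
import random

def detection_rate(results: dict) -> float:
    """results: {canary_id: detected?} for the current rotation."""
    return sum(results.values()) / len(results)

def should_freeze(results, baseline=0.95, tolerance=0.10) -> bool:
    # A drop well below the historical baseline suggests something
    # is degrading the knowledge base: freeze updates, investigate.
    return detection_rate(results) < baseline - tolerance

def pick_rotation(all_canaries, k, rng=random):
    # Rotated and randomized each cycle so an attacker cannot
    # enumerate the canary set and special-case it.
    return rng.sample(all_canaries, k)

healthy = should_freeze({"c1": True, "c2": True, "c3": True})
degraded = should_freeze({"c1": True, "c2": False, "c3": False})
```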
4. Versioned Knowledge Base
Daily snapshots. Corruption detected → roll back to last known-good version. All updates append-only with full audit trail. Quarterly integrity audits.
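A toy in-memory version of the append-only log with hash-chained entries and snapshot rollback (real storage, daily scheduling, and quarterly audits omitted):

```python
import hashlib
import json

class VersionedKB:
    """Append-only knowledge log: every update is chained to the
    previous entry's digest, giving a tamper-evident audit trail."""

    def __init__(self):
        self.log = []          # append-only; entries are never edited
        self.snapshots = {}    # snapshot label -> log length at that time

    def append(self, entry: dict):
        prev = self.log[-1]["digest"] if self.log else ""
        payload = prev + json.dumps(entry, sort_keys=True)
        digest = hashlib.sha256(payload.encode()).hexdigest()
        self.log.append({"entry": entry, "digest": digest})

    def snapshot(self, label: str):
        self.snapshots[label] = len(self.log)

    def rollback(self, label: str):
        # Corruption detected -> truncate back to last known-good snapshot.
        self.log = self.log[: self.snapshots[label]]

kb = VersionedKB()
kb.append({"pattern": "sqli-01"})
kb.snapshot("day-1")
kb.append({"pattern": "poisoned-entry"})
kb.rollback("day-1")
```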
5. Write Isolation
Agents read platform knowledge but NEVER write to it. All updates go through the confirmation pipeline. No direct write access for any agent, operator, or sponsor.
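Structurally, this can be enforced by handing agents a read-only view object that simply has no write path (class and method names here are hypothetical):

```python
class PlatformKnowledge:
    """Owned by the confirmation pipeline; the only code that writes."""
    def __init__(self, patterns: dict):
        self._patterns = dict(patterns)

    def read(self, pattern_id):
        return self._patterns.get(pattern_id)

class AgentView:
    """The only handle agents ever receive. It exposes read access
    and deliberately defines no write/update/delete methods, so write
    isolation is a property of the interface, not a permission check."""
    def __init__(self, kb: PlatformKnowledge):
        self._kb = kb

    def read(self, pattern_id):
        return self._kb.read(pattern_id)

kb = PlatformKnowledge({"sqli-01": {"severity": "high"}})
view = AgentView(kb)
```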
6. Knowledge Compartmentalization
- Agent knowledge compromised → only that agent affected
- Pool knowledge corrupted → only that pool affected
- Platform knowledge has strictest gates → corruption here affects everyone
- Prowl-internal knowledge → never exposed externally
7. Anti-Extraction
- Agents receive curated subsets, not the full database
- Pattern matching internals are never exposed
- Rate limiting on all knowledge API queries
- Behavioral detection on unusual query patterns
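The rate-limiting piece can be sketched as a per-client token bucket (capacity and refill rate are illustrative; behavioral detection on query patterns would sit alongside this, not inside it):

```python
import time

class TokenBucket:
    """Per-client rate limiter for knowledge API queries. Each query
    spends one token; tokens refill at a fixed rate up to capacity."""

    def __init__(self, capacity=10, refill_per_sec=1.0, now=time.monotonic):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill = refill_per_sec
        self.now = now              # injectable clock for testing
        self.last = now()

    def allow(self) -> bool:
        t = self.now()
        self.tokens = min(self.capacity,
                          self.tokens + (t - self.last) * self.refill)
        self.last = t
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Frozen clock, capacity 2, no refill: two queries pass, third is blocked.
clock = [0.0]
bucket = TokenBucket(capacity=2, refill_per_sec=0.0, now=lambda: clock[0])
results = [bucket.allow(), bucket.allow(), bucket.allow()]
```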
8. Adversarial Validation
Before new patterns enter core knowledge:
- Cross-check for contradictions with established patterns
- Verify the generating finding was confirmed by multiple independent signals
- Statistical outlier detection
- 30-day quarantine at reduced weight before full integration
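The outlier check and quarantine weighting above might look like this; the z-score threshold and the reduced weight of 0.25 are assumptions for illustration:

```python
import statistics
from dataclasses import dataclass

QUARANTINE_DAYS = 30       # probation period from the text
QUARANTINE_WEIGHT = 0.25   # illustrative reduced weight during quarantine

@dataclass
class Candidate:
    pattern_id: str
    score: float           # e.g. confidence from the generating finding
    admitted_day: int      # day the pattern entered quarantine

def is_outlier(score: float, history: list, z: float = 3.0) -> bool:
    # Flag candidates whose score sits far outside the distribution
    # of established patterns (simple z-score test).
    mu = statistics.mean(history)
    sd = statistics.stdev(history)
    return sd > 0 and abs(score - mu) / sd > z

def effective_weight(c: Candidate, today: int) -> float:
    # Patterns get full weight only after surviving the 30-day quarantine.
    in_quarantine = today - c.admitted_day < QUARANTINE_DAYS
    return QUARANTINE_WEIGHT if in_quarantine else 1.0

history = [0.80, 0.82, 0.79, 0.81, 0.80]
candidate = Candidate("cmd-inj-07", score=0.80, admitted_day=0)
```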