Vector DB & Cross-Protocol Learning
Seven Data Sources
Every hunt produces seven types of learning signal:
1. Finding Memory (Positive Signal)
Confirmed, paid findings are decomposed into five components: vulnerability pattern, code structure signature, detection path, context clues, and severity markers. Each component is embedded and stored in pgvector for similarity search.
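As a rough illustration of what similarity search over finding embeddings means, here is a minimal pure-Python sketch. The finding IDs, the 3-dimensional vectors, and the `most_similar` helper are all hypothetical; real embeddings would have hundreds of dimensions, and in pgvector the same ranking would typically be done in SQL by ordering on the cosine distance operator `<=>`.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical finding memory: finding id -> toy embedding.
FINDING_MEMORY = {
    "reentrancy-withdraw": [0.9, 0.1, 0.0],
    "oracle-stale-price": [0.1, 0.9, 0.1],
    "missing-access-control": [0.0, 0.2, 0.9],
}

def most_similar(query, memory, top_k=2):
    """Return the top_k (finding_id, similarity) pairs, best match first."""
    scored = [(fid, cosine_similarity(query, vec)) for fid, vec in memory.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Embedding of a new code fragment under review (made-up values).
results = most_similar([0.8, 0.2, 0.1], FINDING_MEMORY)
for finding_id, score in results:
    print(f"{finding_id}: {score:.2f}")
```

The closest stored finding comes back first, which is the signal an agent would use to decide where to look.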
2. False Positive Memory (Negative Signal)
Rejected findings teach the system three things: patterns that look like bugs but aren't, common false positive categories per vulnerability class, and platform-specific rejection reasons. As this memory grows, the false positive rate is expected to fall toward zero.
3. Clean Audit Memory (Secure Code Patterns)
Code that has been thoroughly scanned and found clean serves as an example of a secure implementation. Agents compare target code against these known-secure patterns. Match → skip. Deviation → investigate.
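The match-or-deviate decision could be sketched as follows. This is only an illustration: token-set Jaccard similarity stands in for whatever embedding comparison the system actually uses, and `SECURE_PATTERNS`, `MATCH_THRESHOLD`, and `triage` are hypothetical names and values.

```python
def tokenize(code):
    """Crude tokenizer: split on whitespace after spacing out a few delimiters."""
    for ch in "();":
        code = code.replace(ch, " ")
    return set(code.split())

def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical known-secure snippet: state is updated before the transfer.
SECURE_PATTERNS = [
    "balances[msg.sender] = 0; payable(msg.sender).transfer(amount);",
]

MATCH_THRESHOLD = 0.8  # assumed cutoff, not taken from the source

def triage(target_code, secure_patterns, threshold=MATCH_THRESHOLD):
    """Return ('skip' | 'investigate', best similarity to any secure pattern)."""
    target = tokenize(target_code)
    best = max((jaccard(target, tokenize(p)) for p in secure_patterns), default=0.0)
    decision = "skip" if best >= threshold else "investigate"
    return decision, best

# A snippet that sends funds before zeroing the balance deviates from the pattern.
suspicious = 'msg.sender.call{value: amount}(""); balances[msg.sender] = 0;'
print(triage(SECURE_PATTERNS[0], SECURE_PATTERNS))
print(triage(suspicious, SECURE_PATTERNS))
```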
4. Cross-Protocol Pattern Recognition
The system recognizes patterns across codebases:
- "This withdraw logic has 87% similarity to the bug found in Protocol X"
- "Every protocol using this library version with this override has been vulnerable"
- "Protocols forking from Template Y consistently have Bug Z"
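One way to surface statements like the library-version example above is to group past audit outcomes by shared traits and compute how often each group was vulnerable. The sketch below assumes a hypothetical history table; the protocol names, the `libvault` library, and the `overrides_hook` flag are invented for illustration.

```python
from collections import defaultdict

# Hypothetical audit history:
# (protocol, library, version, overrides_hook, was_vulnerable)
HISTORY = [
    ("ProtoA", "libvault", "1.2", True, True),
    ("ProtoB", "libvault", "1.2", True, True),
    ("ProtoC", "libvault", "1.2", False, False),
    ("ProtoD", "libvault", "1.3", True, False),
]

def vulnerability_rates(history):
    """Map (library, version, overrides_hook) -> fraction of protocols found vulnerable."""
    counts = defaultdict(lambda: [0, 0])  # key -> [vulnerable, total]
    for _protocol, library, version, overrides_hook, was_vulnerable in history:
        key = (library, version, overrides_hook)
        counts[key][1] += 1
        if was_vulnerable:
            counts[key][0] += 1
    return {key: vuln / total for key, (vuln, total) in counts.items()}

rates = vulnerability_rates(HISTORY)
for key, rate in sorted(rates.items()):
    print(key, f"{rate:.0%}")
```

A group with a rate near 100% and enough samples is a candidate cross-protocol pattern; with only a handful of samples, as here, the rate is suggestive at best.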
5. Agent Performance Data
The system tracks which agents find which bug types. Smart routing then recommends the agents most likely to succeed against a specific codebase.
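A minimal sketch of such routing, assuming success is measured as confirmed findings per attempt for each (agent, bug type) pair. The `AgentStats` class, the agent names, and the use of Laplace smoothing are illustrative assumptions, not the platform's actual scoring.

```python
from collections import defaultdict

class AgentStats:
    """Tracks confirmed findings per (agent, bug type) and recommends agents."""

    def __init__(self):
        # (agent, bug_type) -> [confirmed, attempts]
        self._records = defaultdict(lambda: [0, 0])

    def record(self, agent, bug_type, confirmed):
        rec = self._records[(agent, bug_type)]
        rec[1] += 1
        if confirmed:
            rec[0] += 1

    def success_rate(self, agent, bug_type):
        # Laplace smoothing so agents with no history score 0.5, not 0 or 1.
        confirmed, attempts = self._records.get((agent, bug_type), (0, 0))
        return (confirmed + 1) / (attempts + 2)

    def recommend(self, agents, bug_types, top_k=1):
        """Rank agents by mean smoothed success rate over the expected bug types."""
        if not bug_types:
            return list(agents)[:top_k]

        def score(agent):
            return sum(self.success_rate(agent, bt) for bt in bug_types) / len(bug_types)

        return sorted(agents, key=score, reverse=True)[:top_k]

stats = AgentStats()
for _ in range(3):
    stats.record("agent-alpha", "reentrancy", confirmed=True)
    stats.record("agent-beta", "reentrancy", confirmed=False)

best = stats.recommend(["agent-alpha", "agent-beta"], ["reentrancy"])
print(best)
```

The expected bug types for a codebase would come from elsewhere in the system (for example the pattern library); here they are passed in by hand.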
6. Attack Strategy Evolution
Agents submit strategies alongside findings. Proven strategies seed new agents and AaaS configurations.
7. Triage Self-Improvement
Dedup thresholds, severity prediction, validity prediction, and complexity scoring all auto-calibrate from confirmed outcomes.
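To make auto-calibration concrete, here is a small sketch for the dedup threshold alone. It assumes each confirmed outcome is a pair of (similarity score between two submissions, whether triage confirmed them as duplicates) and picks the cutoff that classifies the most pairs correctly. The data and the `calibrate_threshold` helper are hypothetical.

```python
def calibrate_threshold(labeled_pairs):
    """Return the similarity cutoff that classifies the most pairs correctly.

    labeled_pairs: iterable of (similarity, is_duplicate) from confirmed outcomes.
    A pair is predicted duplicate when similarity >= threshold.
    """
    pairs = list(labeled_pairs)
    best_threshold, best_correct = 1.0, -1
    for threshold in sorted({sim for sim, _ in pairs}):
        correct = sum((sim >= threshold) == is_dup for sim, is_dup in pairs)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold

# Hypothetical confirmed outcomes from past triage decisions.
OUTCOMES = [
    (0.95, True),
    (0.93, True),
    (0.91, True),
    (0.88, False),
    (0.80, False),
    (0.70, False),
]

threshold = calibrate_threshold(OUTCOMES)
print(threshold)
```

Re-running the calibration as new outcomes arrive lets the threshold drift with the data; the same idea could apply to the severity, validity, and complexity predictors, though those would need a different model than a single cutoff.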
Technical Architecture
- pgvector: Finding embeddings for similarity search
- Pattern Library: Anonymized vulnerability patterns with severity, frequency, code signatures
  - Web3: reentrancy, oracle manipulation, flash loan exploits, access control
  - Web2: injection (SQL/XSS/SSRF), broken auth, IDOR, RCE, deserialization, path traversal
- Complexity Scorer: Estimates finding probability before committing compute
- False Positive Filter: Trained on rejected submissions to reduce noise
- Cross-Protocol Engine: Bug in Protocol A's lending logic → prioritize similar logic in Protocol B
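The Cross-Protocol Engine's prioritization step might look like the sketch below, which boosts the scan priority of Protocol B components that share traits with a bug confirmed in Protocol A. The tag sets, component names, and `base_priority`/`boost` weights are assumptions made for illustration; a real engine would more likely compare embeddings than hand-assigned tags.

```python
import heapq

def build_scan_queue(components, confirmed_bug_tags, base_priority=1.0, boost=2.0):
    """Order components so those sharing the most tags with a confirmed bug come first.

    components: component name -> set of descriptive tags.
    confirmed_bug_tags: tags describing the bug found in another protocol.
    """
    heap = []
    for name, tags in components.items():
        overlap = len(tags & confirmed_bug_tags)
        priority = base_priority + boost * overlap
        # heapq is a min-heap, so push the negated priority.
        heapq.heappush(heap, (-priority, name))
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

# Hypothetical: a bug was confirmed in Protocol A's oracle-dependent lending logic.
BUG_TAGS = {"lending", "oracle"}

# Hypothetical components of Protocol B.
PROTOCOL_B = {
    "lending.borrow": {"lending", "collateral"},
    "governance.vote": {"governance"},
    "lending.liquidate": {"lending", "oracle"},
}

scan_order = build_scan_queue(PROTOCOL_B, BUG_TAGS)
print(scan_order)
```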