Benchmark Pack
Standard pack: 6 categories (refund · classification · summarization · reasoning · tool-use · safety). 35 SwarmJelly failure signal types per category. Holdout-protected test set rotates quarterly.
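The pack shape described above can be sketched as a small data structure. This is a minimal illustration, not the platform's actual schema: the names `BenchmarkPack`, `total_category_signal_pairs`, and `holdout_rotation_id` are hypothetical, and the assumption that "quarterly" means calendar quarters is ours; the page does not say how rotation boundaries are defined.

```python
from dataclasses import dataclass
from datetime import date

# The six categories named on this page.
CATEGORIES = (
    "refund",
    "classification",
    "summarization",
    "reasoning",
    "tool-use",
    "safety",
)

# 35 failure signal types per category, as stated on this page.
SIGNAL_TYPES_PER_CATEGORY = 35


@dataclass(frozen=True)
class BenchmarkPack:
    """Hypothetical container for the standard pack's shape."""

    categories: tuple = CATEGORIES
    signal_types_per_category: int = SIGNAL_TYPES_PER_CATEGORY

    def total_category_signal_pairs(self) -> int:
        # Counts (category, signal type) pairs. Whether signal types are
        # shared across categories is not stated, so this is a count of
        # pairs, not of distinct signal types.
        return len(self.categories) * self.signal_types_per_category


def holdout_rotation_id(day: date) -> str:
    # Assumption: the holdout set rotates on calendar-quarter boundaries.
    quarter = (day.month - 1) // 3 + 1
    return f"{day.year}-Q{quarter}"


pack = BenchmarkPack()
print(pack.total_category_signal_pairs())     # 6 categories x 35 types
print(holdout_rotation_id(date(2024, 5, 17)))
```

Under these assumptions, any two dates in the same calendar quarter map to the same rotation id, so a results record tagged with that id can be tied to the holdout set that was active when it was produced.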
🐝 Operator-grade · books and records · to the shed.
This is a foundational page in the DefendableDocs ecosystem map. The structure is committed · the deep content extends as the platform matures. Cross-references are live below.