Skip to main contentEnsure the robustness and safety of your models. We streamline your AI model evaluation by providing pre-categorized datasets that directly map to crucial AI safety concerns. Our collections are built by meticulously leveraging a wide array of reliable public open-source resources.
Datasets
Below are the public open-source datasets we use within the platform:
-
Do Not Answer — License: Apache-2.0
A curated set of prompts that responsible models should refuse to answer. Useful for evaluating refusal and filtering behavior.
-
Aegis / Nemotron Content Safety Dataset V2 — License: CC-BY-4.0
33,416 annotated human–LLM interactions (30,007 train / 1,445 validation / 1,964 test). Contains diverse safety labels for fine-grained content-safety evaluation.
-
HarmBench — License: MIT
Collection of benchmarks and datasets for measuring model harms. This project’s provided dataset is used in our tests.
-
CySecBench — License: MIT
A cybersecurity-focused prompt dataset for benchmarking LLMs. ~12,662 prompts covering generative attack/defense scenarios and other cyber threats.
Mapping
Below are the high‑level domains and the tag categories that map into each one. Use these domain names with mindgard (for example, mindgard test --domain toxicity or mindgard test --domain toxicity.harassment).
-
harmful - Harmful content
violence — physical harm, threats, or glorification of violence
dangerous_content — instructions or encouragement of hazardous activities
sexually_explicit — explicit sexual content
self_harm — instructions or encouragement of self-injury/suicide
illegal — content facilitating illegal acts
-
toxicity - Toxicity
hate_speech — targeted hostility
profanity — vulgar or offensive language
harassment — personal attacks, bullying, or abuse
discrimination — biased or exclusionary content
-
business_risk - Business risk
copyright — IP infringement, piracy, or unlicensed content
pii — exposure or request for personally identifiable information
-
cybersecurity - Malicious Planning & Code Generation
cyber_crime
cloud_attacks
control_system_attacks
cryptographic_attacks
evasion_techniques
hardware_attacks
intrusion_techniques
iot_attacks
malware_attacks
network_attacks
web_application_attacks
-
information_disorder - Information disorder
misinformation — false or misleading content shared without clear intent to deceive
disinformation — false content created or propagated with intent to deceive