AI Security Testing Tools: Garak, PyRIT, promptfoo, and the Controls They Actually Validate

Do Not Buy an "AI Security Tool" Expecting It to Solve AI Security

AI security tools are useful, but they are often marketed as if one scanner can validate an entire LLM or agentic application. That is not how the problem works.

Different tools answer different questions:

Can I automatically probe for prompt injection and jailbreak behavior?
Can I regression test prompts and outputs in CI?
Can I simulate adversarial prompting at scale?
Can I validate whether the application, not just the model, leaks data or misuses tools?

If you use the wrong tool for the wrong control, you get pretty output and weak assurance.

The Three Tools Most Teams Start With

Garak

Garak is best understood as an LLM vulnerability scanner focused on broad automated probing.

It is strong for:

Prompt injection and jailbreak style probes
Unsafe output patterns
Quick coverage across many prompts and models

It is weaker for:

Full application context
Multi-step business logic
Tool-use authorization checks

PyRIT

PyRIT is geared toward red teaming and risk identification workflows.

It is useful when you want:

Structured adversarial testing
Repeatable attack strategies
More deliberate exploration than a simple one-shot probe suite

It still does not replace manual review of application logic, retrieval paths, and tool permissions.

promptfoo

promptfoo is particularly useful for evaluation and regression testing.

It helps teams answer:

Did the model or application behavior get worse after a prompt, model, or system change?
Are known bad cases still blocked?
Can we keep a repeatable test set in CI?

It is excellent for guardrail regression. It is not a substitute for a full security review.

Which Controls Each Tool Helps Validate

Control area	Garak	PyRIT	promptfoo
Prompt injection probing	Strong	Strong	Moderate
Jailbreak and policy stress testing	Strong	Strong	Moderate
Regression testing in CI	Moderate	Moderate	Strong
Multi-step adversarial exploration	Moderate	Strong	Weak
Full application authorization review	Weak	Weak	Weak
Unsafe tool use or MCP boundary validation	Weak	Moderate	Weak

That last row matters. Most AI incidents that scare engineering leaders are not just model output problems. They involve data access, retrieval boundaries, tool execution, or identity mistakes around agent actions.

A Better Testing Model for AI Applications

Use automated tools for scale, then use human review for context.

Layer 1: Automated prompt and jailbreak testing

Run Garak or PyRIT style probes regularly to catch obvious regressions and weak guardrails.

Layer 2: Regression suites in CI

Use promptfoo or equivalent evaluation tooling to keep a stable set of known-bad prompts, expected refusals, and sensitive scenarios.

Layer 3: Application-context review

This is where human testers look at:

RAG document poisoning paths
Tenant isolation in retrieval
Model output rendering and downstream execution
Tool and function authorization
Secret exposure in logs, traces, and prompts

Layer 4: Adversarial workflows against the real system

For agentic apps, test whether indirect prompt injection can:

Change planning behavior
Trigger a sensitive tool call
Exfiltrate retrieved context
Abuse approvals or hidden side effects

Example: Why Tooling Alone Misses the Main Risk

Suppose an internal AI assistant can read Jira tickets, search documentation, and create GitHub issues.

Automated prompt probes may show that the model resists obvious jailbreak prompts. That sounds encouraging.

But the real problem may be elsewhere: a retrieved document contains hidden instructions that cause the agent to open a GitHub issue containing sensitive internal URLs, or a lower-privileged user can coerce the assistant into fetching content from a project they should never see.

That is not a model-only problem. It is an application security problem expressed through AI.

What to Look For When Evaluating AI Security Tools

Can the tool run against your actual application flow, not just a bare model endpoint?
Can it capture and compare outputs over time?
Does it support the attack classes you care about most?
Can it fit into CI without creating unmanageable noise?
Does it help your team reproduce and fix issues, not just score them?

A Practical Tooling Stack for Most Teams

For many teams, a sensible starting point looks like this:

Garak or PyRIT for automated adversarial probing
promptfoo for repeatable regression tests
Manual review for RAG, tool-use, auth, and output handling risks
Periodic AI red team exercises for high-impact systems

That blend is much more realistic than searching for one all-in-one platform.