🛡️New: Benchmarking AI Input Guardrails: What Works, What Doesn'tRead Article

Redbolt AI Blog

Insights, research, and best practices for AI security, red teaming, and vulnerability assessment from the Garak team and community.

GPT-5.1
Data Exfiltration
Security Assessment
Critical
OpenAI
Markdown Vulnerability
⭐ Featured
GPT-5.1 Security Assessment: Data Exfiltration Vulnerability Discovered
Redbolt AI Research
November 19, 2025
35 min read

Comprehensive security assessment of OpenAI's GPT-5.1 revealing 99.83% pass rate with critical data exfiltration vulnerability via markdown URI assembly. This vulnerability enables conversation-private information to be embedded in markdown links and exfiltrated to attacker-controlled domains. Industry-leading security with targeted vulnerability requiring immediate attention.

CI/CD
Continuous Evaluation
DevSecOps
Garak SDK
Automation
Best Practices
⭐ Featured
Continuous Evaluation: The Critical Missing Link in AI Security
Redbolt AI Team
November 4, 2025
15 min read

How CI/CD Integration and Automated Security Testing Protect AI Systems from Evolving Threats. Learn why continuous red teaming is essential, how to implement automated security scanning in your deployment pipeline, and best practices for maintaining AI security posture with the Garak SDK.

AI Agents
Agent Security
Prompt Injection
Tool Exploitation
Critical
Testing Strategies
⭐ Featured
AI Agent Security: Critical Threats and Testing Strategies
Garak AI Security Research
October 26, 2025
35 min read

Comprehensive guide to AI agent security threats - from prompt injection to tool exploitation. As autonomous AI systems gain the ability to execute code, make decisions, and control tools, they inherit an entirely new attack surface that combines traditional vulnerabilities with AI-specific exploits. Learn how to test and secure your AI agents with step-by-step testing strategies.

Configuration
REST API
LLM Testing
JSONPath
API Integration
How-To Guide
REST Endpoint Configuration Guide: Test Any LLM API with Garak
Redbolt AI Team
October 25, 2025
20 min read

Step-by-step guide to configuring and testing any LLM REST API endpoint with Redbolt AI's free-tier scanning portal. Learn how to set up OpenAI-compatible APIs, custom LLM endpoints, and HTTP APIs with JSONPath response parsing. Access the portal at scans.redbolt.ai to start testing your APIs.

Claude Sonnet 4.5
Package Hallucination
Supply Chain
XSS
Critical
Coding Agent
Claude Sonnet 4.5 Coding Agent: Security Vulnerability Assessment
Redbolt AI Team
October 5, 2025
30 min read

Comprehensive security testing reveals critical package hallucination vulnerabilities in Claude Sonnet 4.5. With a 45% Rust package exploitation success rate and 34% XSS attack success through markdown injection, developers need to implement immediate safeguards. Full technical analysis and mitigation strategies included.

GPT-OSS-20B
RCE
Vulnerabilities
Security
Open Source
Critical
Remote Code Execution in GPT-OSS-20B: Critical Vulnerabilities Exposed
Redbolt AI Team
August 21, 2025
25 min read

Our comprehensive security research reveals critical template injection vulnerabilities in GPT-OSS-20B with 100% RCE success rate. This detailed technical report covers 5 critical vulnerabilities, systematic red-team testing methodology, attack chain analysis, business impact assessment, and complete mitigation frameworks for security teams and developers.

GPT-5
Security
Vulnerabilities
Base64
Guardrails
GPT-5 Security Assessment: Stronger Than Expected, But Still Needs Guardrails
Redbolt AI Team
August 10, 2025
12 min read

We tested GPT-5 with 12 security attack types. While it showed strong resistance to jailbreaks, a critical vulnerability in Base64 decoding allows for complete safety bypasses. Here's our full assessment and the guardrails you need.

Security
LLM
Red Teaming
Vulnerabilities
Meta
Llama Guard
Bypassing Llama Guard: How Garak Could Have Detected Meta's Firewall Vulnerabilities
Redbolt AI Team
July 15, 2025
15 min read

In May 2025, Trendyol's application security team made a concerning discovery: Meta's Llama Firewall, a safeguard designed to protect large language models from prompt injection attacks, could be bypassed using several straightforward techniques. Learn how Garak's comprehensive testing framework could have proactively caught these vulnerabilities before they became public issues.