Guarded Tool-Using LLM Agents for Incident Response: A Safety-Gated Architecture and Operational Evaluation Protocol
Tool-using LLM agents are increasingly placed on the path to real production actions (file mutations, API calls, infrastructure changes) while the prompt-injection and over-trust failure modes that come with them are still being discovered. This paper makes two contributions toward closing that gap.
GIRA (Guarded Incident Response Agent) is a multi-layer safety gate that separates LLM proposal from action authorization. Five layers (policy, schema, risk, injection detection, and human-in-the-loop escalation) sit between the model's tool call and any state-changing side effect, so a successful injection or hallucinated action is caught before it reaches production.
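The layering described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: every name here (`ToolCall`, `Decision`, `gate`, the tool allowlist, the injection markers) is a hypothetical stand-in, and the keyword-based injection check is a deliberately naive placeholder for whatever detector a real system would use.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ToolCall:
    tool: str
    args: dict = field(default_factory=dict)
    context: str = ""  # untrusted text the model read (logs, tickets, ...)


@dataclass
class Decision:
    allowed: bool
    layer: str   # which layer made the final call
    reason: str


# Illustrative assumptions only; none of these values come from the paper.
ALLOWED_TOOLS = {"read_logs", "restart_service", "delete_volume"}
REQUIRED_ARGS = {
    "read_logs": {"service"},
    "restart_service": {"service"},
    "delete_volume": {"volume_id"},
}
HIGH_RISK_TOOLS = {"delete_volume"}
INJECTION_MARKERS = ("ignore previous instructions", "disregard your rules")


def policy_layer(call: ToolCall) -> Optional[Decision]:
    """Block tools that are not on the allowlist."""
    if call.tool not in ALLOWED_TOOLS:
        return Decision(False, "policy", f"tool {call.tool!r} is not allowed")
    return None


def schema_layer(call: ToolCall) -> Optional[Decision]:
    """Block calls that are missing required arguments."""
    missing = REQUIRED_ARGS[call.tool] - set(call.args)
    if missing:
        return Decision(False, "schema", f"missing args: {sorted(missing)}")
    return None


def risk_layer(call: ToolCall) -> bool:
    """Return True when the call should be escalated to a human."""
    return call.tool in HIGH_RISK_TOOLS


def injection_layer(call: ToolCall) -> Optional[Decision]:
    """Naive placeholder: flag known injection phrases in untrusted context."""
    lowered = call.context.lower()
    for marker in INJECTION_MARKERS:
        if marker in lowered:
            return Decision(False, "injection", f"suspicious phrase: {marker!r}")
    return None


def gate(call: ToolCall, approve: Callable[[ToolCall], bool]) -> Decision:
    """Run a proposed call through all five layers before any side effect."""
    for check in (policy_layer, schema_layer):
        blocked = check(call)
        if blocked is not None:
            return blocked
    needs_human = risk_layer(call)
    blocked = injection_layer(call)
    if blocked is not None:
        return blocked
    if needs_human:
        ok = approve(call)  # human-in-the-loop escalation
        return Decision(ok, "human", "operator approved" if ok else "operator rejected")
    return Decision(True, "gate", "passed all automated layers")
```

In this sketch the model only ever produces a `ToolCall`; the executor would run the tool only when `gate(...)` returns a `Decision` with `allowed=True`, which is one way to keep proposal and authorization separate.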
OEP (Operational Evaluation Protocol) is an SRE-grounded evaluation protocol that measures the architecture itself rather than just headline pass rates. It reports blast radius, injection success rate (ISR), and unauthorized action rate (UAR) alongside conventional task success, so the safety gate can be regression-tested as the agent is updated.
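To make the metrics concrete, here is one hedged way they might be computed from per-episode records. The summary above does not give formal definitions, so the ones used here are assumptions: ISR is taken as the fraction of injection-bearing episodes in which the injected action was executed, UAR as the fraction of executed actions that were unauthorized, and blast radius as the largest number of resources affected in any single episode. The `Episode` record, `summarize`, and `regressions` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    task_succeeded: bool
    injection_attempted: bool
    injection_succeeded: bool
    actions_executed: int
    unauthorized_actions: int
    resources_affected: int


def summarize(episodes: list[Episode]) -> dict[str, float]:
    """Aggregate per-episode records into the four headline numbers.

    The metric definitions are assumptions made for this sketch.
    """
    n = len(episodes)
    attacked = [e for e in episodes if e.injection_attempted]
    total_actions = sum(e.actions_executed for e in episodes)
    return {
        "task_success": sum(e.task_succeeded for e in episodes) / n if n else 0.0,
        "isr": (sum(e.injection_succeeded for e in attacked) / len(attacked)
                if attacked else 0.0),
        "uar": (sum(e.unauthorized_actions for e in episodes) / total_actions
                if total_actions else 0.0),
        "max_blast_radius": float(
            max((e.resources_affected for e in episodes), default=0)
        ),
    }


# For safety metrics lower is better; for task success higher is better.
LOWER_IS_BETTER = ("isr", "uar", "max_blast_radius")


def regressions(baseline: dict[str, float], candidate: dict[str, float],
                tolerance: float = 0.0) -> list[str]:
    """Return the metrics on which the candidate is worse than the baseline."""
    worse = [m for m in LOWER_IS_BETTER
             if candidate[m] > baseline[m] + tolerance]
    if candidate["task_success"] < baseline["task_success"] - tolerance:
        worse.append("task_success")
    return worse
```

A regression check in this style would run the same episode suite against each new agent version and fail the build when `regressions(...)` returns a non-empty list.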
@inproceedings{patel2026guarded,
  title     = {Guarded Tool-Using {LLM} Agents for Incident Response:
               A Safety-Gated Architecture and Operational Evaluation Protocol},
  author    = {Patel, Dhruv},
  booktitle = {ICLR 2026 Workshop on Agents in the Wild},
  year      = {2026},
  url       = {https://openreview.net/forum?id=LBt5eX6OKx}
}