Tooling

Separate read-only connectors from write connectors with explicit elevation.

UX

Show users when their inputs were sanitized—transparency about what was stripped reduces attempts to phish users into working around the controls.

Drills

Red-team monthly with rotating intern prompts—fresh eyes find stale patterns.

Assume hostile inputs at boundaries

Prompt injection is not only “funny jailbreak tweets”—it is untrusted text riding beside trusted instructions in support tickets, emails, and web forms. Guardrails start at parsing and tool authorization, not at witty system prompts.

Separate system instructions from user content with delimiters the pipeline enforces, not delimiters the model is merely asked to respect.

Least-privilege tools

Agents should not inherit a user’s full API scope by default. Scope tokens to the ticket, time-bound them, and log every mutation with a correlation ID.
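As an added example (not the page's or SignalSpring's own code), here is a tool-authorization gate that applies the points above and the read/write split under Tooling: tokens are bound to one ticket and an expiry, write connectors refuse unless a human approver has explicitly elevated, and every mutation is logged with a correlation ID. The tool names, the demo key and the function names are assumptions for illustration.

```python
import hashlib, hmac, json, uuid

# Assumption: a constructed demo key; a real deployment keeps this in a KMS/HSM.
SECRET = b"constructed-demo-key"
READ_TOOLS = {"ticket.read", "kb.search"}        # read-only connectors (hypothetical names)
WRITE_TOOLS = {"ticket.update", "refund.issue"}  # write connectors: require "write" scope
AUDIT_LOG = []                                   # every elevation and mutation lands here

def _sign(claims):
    body = json.dumps(claims, sort_keys=True).encode()
    return hmac.new(SECRET, body, hashlib.sha256).hexdigest()

def issue_token(ticket_id, scopes, ttl, now):
    """Mint a token bound to one ticket, a scope set and an expiry (seconds since epoch)."""
    claims = {"ticket": ticket_id, "scopes": sorted(scopes), "exp": now + ttl}
    return {"claims": claims, "sig": _sign(claims)}

def verify(token, now):
    """Return (True, claims) or (False, reason); tampered or stale tokens never reach a tool."""
    claims = token["claims"]
    if not hmac.compare_digest(_sign(claims), token["sig"]):
        return False, "bad signature"
    if now > claims["exp"]:
        return False, "expired"
    return True, claims

def elevate(read_token, approver, ttl, now):
    """Explicit elevation: a named human approver mints a short-lived write token for the same ticket."""
    ok, claims = verify(read_token, now)
    if not ok or not approver:
        return None
    AUDIT_LOG.append({"corr": str(uuid.uuid4()), "event": "elevate", "ticket": claims["ticket"],
                      "approver": approver, "at": now})
    return issue_token(claims["ticket"], ["read", "write"], ttl, now)

def call_tool(token, tool, args, now, correlation_id=None):
    """Authorize one agent tool call and return a decision dict instead of silently proceeding."""
    corr = correlation_id or str(uuid.uuid4())
    ok, claims = verify(token, now)
    if not ok:
        return {"allowed": False, "reason": claims, "corr": corr}
    if tool not in READ_TOOLS | WRITE_TOOLS:
        return {"allowed": False, "reason": "unknown tool", "corr": corr}
    if args.get("ticket") != claims["ticket"]:          # token never crosses ticket boundaries
        return {"allowed": False, "reason": "ticket mismatch", "corr": corr}
    if tool in WRITE_TOOLS:
        if "write" not in claims["scopes"]:
            return {"allowed": False, "reason": "elevation required", "corr": corr}
        AUDIT_LOG.append({"corr": corr, "event": "mutate", "ticket": claims["ticket"],
                          "tool": tool, "args": args, "at": now})
    return {"allowed": True, "reason": "ok", "corr": corr}
```

The decision dict is what the agent loop sees; a denied call with reason "elevation required" is the hook for routing the action to a human rather than retrying with broader scope.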

Run red-team fixtures in CI whenever tool schemas change; regressions hide in small JSON edits.

Operational response

Define an incident runbook: how to freeze an agent, how to notify customers if data was touched, how to preserve evidence for security review.

Practice the runbook twice a year; muscle memory beats policy PDFs at 2 a.m.

Defense in depth

Combine input classifiers, output filters, and human approval for irreversible actions. No single layer is sufficient; overlapping controls reduce surprise.

SignalSpring’s security note: document known gaps honestly—executives prefer transparent residual risk to silent optimism.