Feature · AI policy generator

Describe your agent. AEGIS writes the guardrails.

In: one paragraph of plain English, the scanner's tool inventory, and the workflow graph. Out: a policy bundle that is grammar-constrained to six known-safe templates, self-tested with AJV on every generation, and sent through a repair round when validation fails. Production-grade in under 30 seconds.

30 seconds end-to-end

Step 1 — describe the agent

POST /api/ai/generate-policy-bundle
{
  "description": "Customer support copilot. Reads our Zendesk + internal KB. Drafts replies. NEVER deletes a ticket. NEVER emails outside @acme.com. Treat any tool call touching billing as HIGH risk.",
  "context": {
    "tool_inventory":  [ /* from `agentguard scan` */ ],
    "workflow_graph":  { /* LangGraph topology from the scanner */ },
    "capability_risk": { "score": 62, "class": "HIGH" }
  }
}
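A minimal sketch of building that request from Python, using only the standard library. The host and port are assumptions (a self-hosted gateway on localhost:8080), and no auth header is set because this page does not describe one; adjust both for your deployment.

```python
import json
import urllib.request

# Assumption: a self-hosted gateway on localhost:8080. The port is hypothetical.
GATEWAY_URL = "http://localhost:8080/api/ai/generate-policy-bundle"


def build_request(description, tool_inventory, workflow_graph, capability_risk,
                  url=GATEWAY_URL):
    """Build (but do not send) the POST request for the generator endpoint."""
    payload = {
        "description": description,
        "context": {
            "tool_inventory": tool_inventory,
            "workflow_graph": workflow_graph,
            "capability_risk": capability_risk,
        },
    }
    # json.dumps escapes newlines, so a multi-line description stays valid JSON.
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def send(request):
    """Send the request and decode the JSON bundle from the response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Call build_request with the scanner output, then pass the result to send once the gateway is running.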

Step 2 — N-sample fan-out + self-consistency

AEGIS draws 3 candidates in parallel at temperature 0.7 (self-consistency, Wang et al. 2023). Each candidate is grammar-constrained to the six templates (forbid_argument / require_pattern / forbid_pattern / max_length / enum_values / require_https) via discriminated-union schema validation.
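To make "discriminated union" concrete, here is a rough sketch of a validator keyed on the kind field. Only field and pattern appear in the sample bundle on this page; the max and values field names are assumptions made for illustration, and this is not the AEGIS schema itself.

```python
# Required fields per template kind. "max" and "values" are assumed names.
REQUIRED_FIELDS = {
    "forbid_argument": {"field"},
    "require_pattern": {"field", "pattern"},
    "forbid_pattern": {"field", "pattern"},
    "max_length": {"field", "max"},
    "enum_values": {"field", "values"},
    "require_https": {"field"},
}


def validate_template(template):
    """Return a list of problems; an empty list means the template is well-formed."""
    if not isinstance(template, dict):
        return ["template must be an object"]
    kind = template.get("kind")
    if kind not in REQUIRED_FIELDS:
        # The discriminator decides which shape applies, so an unknown kind
        # is rejected before any other field is inspected.
        return [f"unknown template kind: {kind!r}"]
    required = REQUIRED_FIELDS[kind]
    present = set(template) - {"kind"}
    problems = []
    for name in sorted(required - present):
        problems.append(f"{kind}: missing field {name!r}")
    for name in sorted(present - required):
        problems.append(f"{kind}: unexpected field {name!r}")
    return problems
```

Rejecting unexpected fields plays the role that additionalProperties: false plays in a hand-written JSON Schema.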

Step 3 — AJV self-test loop

Every candidate compiles to JSON Schema. AEGIS executes the model's own should_block / should_allow test cases against the compiled validator. The bundle is scored on false negatives (a call that should have been blocked slipped through) and false positives (a legitimate call got blocked).
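The self-test idea can be sketched in a few lines. This version swaps AJV for plain Python predicates so it runs standalone, covers only three template kinds, and uses an assumed scoring formula (fraction of test cases that pass). It also assumes an absent or non-string field passes a pattern check.

```python
import re


def compile_template(template):
    """Return a predicate: arguments -> True when the call must be blocked."""
    kind, field = template["kind"], template["field"]
    if kind == "forbid_argument":
        return lambda args: field in args
    if kind in ("require_pattern", "forbid_pattern"):
        rx = re.compile(template["pattern"])

        def blocks(args):
            value = args.get(field)
            if not isinstance(value, str):
                return False  # assumption: absent or non-string values pass
            matched = rx.search(value) is not None
            return (not matched) if kind == "require_pattern" else matched

        return blocks
    raise NotImplementedError(f"template kind not covered in this sketch: {kind}")


def self_test(policy):
    """Run the policy's own test cases and report what went wrong."""
    blocks = compile_template(policy["template"])
    tests = policy["tests"]
    false_negatives = [c for c in tests["should_block"] if not blocks(c["arguments"])]
    false_positives = [c for c in tests["should_allow"] if blocks(c["arguments"])]
    total = len(tests["should_block"]) + len(tests["should_allow"])
    failed = len(false_negatives) + len(false_positives)
    score = 1.0 if total == 0 else (total - failed) / total
    return {
        "score": score,
        "false_negatives": false_negatives,
        "false_positives": false_positives,
    }
```

An unanchored pattern such as "acme" shows why the loop matters: it lets x@acme.evil.com through, and the self-test reports that as a false negative.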

Step 4 — PerFine repair round (if needed)

If no candidate is clean, the best-scoring bundle gets ONE targeted repair round at temperature 0 with the failed assertions in the prompt (PerFine, arXiv 2510.24469). This usually fixes the last one or two false negatives.
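As a sketch of the control flow only: pick a clean candidate if one exists, otherwise send the best-scoring one back with its failures. The candidate shape (bundle, score, failures), the prompt wording, and the call_model callable are all hypothetical stand-ins, not the AEGIS internals.

```python
import json


def build_repair_prompt(bundle, failures):
    """List the failed assertions next to the bundle that produced them."""
    lines = [
        "This policy bundle failed its own tests. Change only what is needed to fix them.",
        json.dumps(bundle, sort_keys=True),
        "Failed assertions:",
    ]
    for failure in failures:
        call = json.dumps(failure["call"], sort_keys=True)
        lines.append(f"- {failure['policy_id']}: expected {failure['expected']} for {call}")
    return "\n".join(lines)


def select_or_repair(candidates, call_model):
    """Return (bundle, repair_rounds_used).

    candidates: list of dicts with 'bundle', 'score' and 'failures'.
    call_model: hypothetical callable (prompt, temperature) -> bundle.
    """
    for candidate in candidates:
        if not candidate["failures"]:
            return candidate["bundle"], 0
    best = max(candidates, key=lambda c: c["score"])
    prompt = build_repair_prompt(best["bundle"], best["failures"])
    # Exactly one round, at temperature 0. The caller should re-run the
    # self-test on the result before accepting it.
    return call_model(prompt, temperature=0), 1
```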

Step 5 — verified bundle out

{
  "policies": [
    {
      "id": "no-ticket-delete",
      "risk_level": "CRITICAL",
      "template": { "kind": "forbid_argument", "field": "ticket_id" },
      "tests": {
        "should_block": [{ "tool": "zendesk_delete", "arguments": { "ticket_id": "12345" } }],
        "should_allow": []
      }
    },
    {
      "id": "email-acme-only",
      "risk_level": "HIGH",
      "template": { "kind": "require_pattern", "field": "to",
                    "pattern": "^[^@\\s]+@acme\\.com$" },
      "tests": {
        "should_block": [{ "tool": "send_email", "arguments": { "to": "attacker@evil.com" } }],
        "should_allow": [{ "tool": "send_email", "arguments": { "to": "colleague@acme.com" } }]
      }
    }
    /* ... 4-8 more, one per node + sensitive-relay edge ... */
  ],
  "dsl": {
    "rules": [
      {
        "name": "billing-hitl",
        "when": { "all": [ { "tool.name": { "matches": "billing_.*" } } ] },
        "then": { "decision": "pending", "reason": "billing action requires human review" }
      }
    ]
  },
  "validation": { "rounds": 1, "score": 1.0, "issues": [] }
}
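To show how a rule like billing-hitl could be read at runtime, here is a small evaluator sketch. The call shape ({"tool": {"name": ...}}), full-string matching for matches, first-match-wins ordering, and the "allow" default are all assumptions; the page does not specify the DSL semantics.

```python
import re


def get_path(call, dotted):
    """Resolve a dotted path such as 'tool.name' inside a nested dict."""
    node = call
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def condition_holds(condition, call):
    """Check one entry of a 'when.all' list against the call."""
    for path, operator in condition.items():
        value = get_path(call, path)
        if "matches" in operator:
            # Assumption: the regex must match the whole value.
            if not (isinstance(value, str) and re.fullmatch(operator["matches"], value)):
                return False
        else:
            raise ValueError(f"operator not covered in this sketch: {operator}")
    return True


def evaluate(rules, call, default="allow"):
    """Return the 'then' block of the first rule whose conditions all hold."""
    for rule in rules:
        if all(condition_holds(c, call) for c in rule["when"]["all"]):
            return rule["then"]
    return {"decision": default, "reason": "no rule matched"}
```

With the rule above, a call to billing_refund comes back as pending, while zendesk_search falls through to the default.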

Why this isn't your usual "AI writes a policy"

Grammar-constrained generation

The LLM picks from 6 known-safe template kinds + a composite combinator. No free-form JSON Schema where the model forgets additionalProperties or gets the semantics of the not keyword wrong. Same family of approach as ICLR'26 JSONSchemaBench / llguidance / OpenAI Structured Outputs strict mode.

StructuredRAG few-shot

The generator's system prompt embeds 3 hand-curated examples picked by Jaccard similarity over your description. arXiv 2408.11061 showed retrieval-augmented few-shot lifts structured-output success ~15-20 points over zero-shot.
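Jaccard-based example selection is simple enough to sketch. The tokenizer (lowercased alphanumeric words) and the shape of the example library (dicts with a description key) are assumptions for illustration.

```python
import re


def tokens(text):
    """Lowercased alphanumeric word set; a deliberately simple tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Size of the token intersection divided by the size of the union."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def pick_examples(description, library, k=3):
    """Return the k library entries most similar to the description."""
    ranked = sorted(
        library,
        key=lambda example: jaccard(description, example["description"]),
        reverse=True,
    )
    return ranked[:k]
```

Jaccard needs no embedding model and is fully deterministic, which keeps example selection cheap and reproducible.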

Workflow-aware grounding

If your scanner emitted a workflow_graph, every generated policy is keyed on real node ids. No invented tool names. Sensitive-relay edges (PII flow A→B→external) auto-generate an inter-node DSL rule.
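One way to check the "no invented tool names" property after the fact is to compare every tool named in a bundle's tests against the scanner inventory. This sketch assumes the inventory is a list of objects with a name key, which this page does not confirm.

```python
def ungrounded_tools(bundle, tool_inventory):
    """Return tool names used in the bundle's tests that the scanner never saw."""
    # Assumption: each inventory entry is a dict with a "name" key.
    known = {tool["name"] for tool in tool_inventory}
    missing = set()
    for policy in bundle["policies"]:
        for bucket in ("should_block", "should_allow"):
            for case in policy["tests"].get(bucket, []):
                if case["tool"] not in known:
                    missing.add(case["tool"])
    return sorted(missing)
```

An empty result means every test case refers to a tool the scanner actually found.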

NIST AI RMF risk posture

If your capability risk scorer returns ≥ 70, AEGIS defaults every policy to block (not pending). A score under 30 gets a light-touch posture. The generator's posture follows the agent's actual risk profile.
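The thresholds translate into a small lookup. The 70 and 30 cut-offs come from the text above; the 0 to 100 score range and "pending" as the default for the middle band are assumptions inferred from the "(not pending)" wording.

```python
def default_posture(score):
    """Map a capability-risk score to the generator's default posture."""
    if not 0 <= score <= 100:  # assumption: scores run from 0 to 100
        raise ValueError(f"score out of range: {score}")
    if score >= 70:
        return "block"
    if score < 30:
        return "light-touch"
    return "pending"  # assumption: middle band keeps human review
```

Under these assumptions, the score of 62 from the Step 1 request lands in the middle band.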

Self-tested before save

No bad-shape policy lands in production. AEGIS compiles every candidate to an AJV validator, runs the model's own should_block / should_allow tests, and refuses to accept a bundle with false negatives unless the operator explicitly overrides.

Counterfactual on every block

When a generated policy blocks at runtime, AEGIS computes the minimum edit that would have passed and surfaces it in the audit row. Developers fix faster; auditors get EU AI Act Art. 14 explainability without extra documentation.
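A rough sketch of what a counterfactual could look like for three of the template kinds. The max and values field names are assumptions, and the closest-match lookup via difflib is a stand-in for whatever minimum-edit search AEGIS actually runs. Pattern templates return None here because there is no general cheapest edit for an arbitrary regex.

```python
import difflib


def counterfactual(template, arguments):
    """Suggest an edited copy of the arguments that would pass the template."""
    kind, field = template["kind"], template["field"]
    fixed = dict(arguments)
    if kind == "forbid_argument":
        fixed.pop(field, None)
        return {"edit": f"remove {field}", "arguments": fixed}
    if kind == "max_length":
        limit = template["max"]  # assumed field name
        fixed[field] = str(arguments[field])[:limit]
        return {"edit": f"truncate {field} to {limit} characters", "arguments": fixed}
    if kind == "enum_values":
        allowed = template["values"]  # assumed field name
        nearest = difflib.get_close_matches(str(arguments[field]), allowed, n=1, cutoff=0.0)
        if not nearest:
            return None
        fixed[field] = nearest[0]
        return {"edit": f"set {field} to {nearest[0]!r}", "arguments": fixed}
    return None  # pattern kinds are out of scope for this sketch
```

The returned edit string is the kind of value that could go straight into an audit row.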

Versus "ask GPT for a policy"

Criterion           | Hand-prompted LLM     | AEGIS generator
Output shape        | free-form JSON Schema | 6-template discriminated union
Schema validity     | ~70%                  | 100% by construction
Self-test           | none                  | AJV compile + run on every generation
Tool name grounding | invents names         | scanner inventory as anchor
Workflow awareness  | none                  | per-node policy synthesis
Risk posture        | uniform               | NIST AI RMF capability-risk-tuned
Repair on failure   | manual                | PerFine 1-round automated

Generate your first bundle in 30 seconds.

Self-host: POST /api/ai/generate-policy-bundle on a local gateway. Hosted: sign up and the cockpit's "Generate from description" button takes a paragraph.