# THROTTLE.md — AI Agent Rate Control Protocol

## Overview

THROTTLE.md is an open file convention for defining rate limits and cost controls in AI agent projects. It is the first layer of a six-part AI agent safety stack. The stack provides graduated intervention, from proactive slow-down (THROTTLE) through permanent shutdown (TERMINATE), and adds data encryption (ENCRYPT) alongside.

**Home:** https://throttle.md

**Repository:** https://github.com/Throttle-md/spec

**Related Specifications:** https://killswitch.md, https://escalate.md, https://failsafe.md, https://terminate.md, https://encrypt.md

## Key Concepts

### The Rate Control Hierarchy

1. **Warning Threshold (80%)** — Agent logs the event, reduces its rate by 25%, and continues operating
2. **Throttle Threshold (95%)** — Agent reduces its rate by 50% and notifies the operator
3. **Limit Breach (100%)** — Agent pauses and escalates to ESCALATE.md for human intervention

### Resource Types Controlled

- `tokens_per_minute` — Token throughput ceiling
- `api_calls_per_minute` — API request rate limit
- `concurrent_tasks` — Maximum parallel operations
- `cost_per_hour_usd` — Hourly spend ceiling
- `cost_per_day_usd` — Daily budget limit
- `file_writes_per_minute` — File system write limit

### Queue Behavior

When the queue is enabled (the default):

- Tasks are buffered instead of discarded
- Priority tasks (human responses, safety checks) bypass queue restrictions
- Rate limits are respected while queued work is preserved
- Maximum queue size is configurable (default 50)

## Problem It Solves

AI agents autonomously consume tokens, make API calls, write files, and incur costs at rates determined entirely by the underlying model and tools.
Without explicit controls:

- A $50 daily budget can be exhausted in minutes
- Rate-limited APIs block agents that exceed external limits
- Databases are overwhelmed by uncontrolled concurrent writes
- Cost overruns accumulate before anyone notices
- Compliance audits find no evidence of resource governance

## Solution: THROTTLE.md

A declarative, version-controlled rate control layer that:

- Defines resource consumption policies alongside code
- Slows agents gracefully before they reach hard limits
- Provides audit trails for compliance and regulatory requirements
- Works with any AI framework (framework-agnostic)
- Integrates with ESCALATE.md for approval gates, FAILSAFE.md for safe-state recovery, KILLSWITCH.md for emergency stops, TERMINATE.md for permanent shutdown, and ENCRYPT.md for data security

## File Structure

```
your-project/
├── AGENTS.md      (what the agent does)
├── THROTTLE.md    (how fast it operates)
├── ESCALATE.md    (approval gates & human intervention)
├── FAILSAFE.md    (safe-state recovery)
├── KILLSWITCH.md  (emergency stop)
├── TERMINATE.md   (permanent shutdown)
├── ENCRYPT.md     (data classification & secrets)
└── src/
```

## Specification Details

### LIMITS Section

```yaml
tokens_per_minute: 50000
api_calls_per_minute: 30
concurrent_tasks: 3
cost_per_hour_usd: 10.00
cost_per_day_usd: 50.00
file_writes_per_minute: 20
```

### BEHAVIOUR Section

```yaml
warning_threshold: 0.80    # fraction of any limit that triggers on_warning
throttle_threshold: 0.95   # fraction of any limit that triggers on_throttle
on_warning:
  action: log_and_continue
  reduce_rate_by: 0.25     # cut throughput by 25%
on_throttle:
  action: slow_and_notify
  reduce_rate_by: 0.50     # cut throughput by 50%
on_limit_breach:           # 100% of a limit
  action: pause
  escalate_to: ESCALATE.md
```

### QUEUE Section

```yaml
queue_enabled: true
queue_max_size: 50
priority_tasks:
  - human_response
  - safety_check
```

## Use Cases

### API-Heavy Agents

Prevent rate-limit hammering by defining ceilings on API calls per minute, with automatic backoff at the 95% throttle threshold. Critical for agents integrating with external APIs, webhooks, or third-party services.

### Cost-Sensitive Deployments

Set hourly and daily spend limits.
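To make the cost thresholds concrete, here is a minimal Python sketch of the check, assuming the ratios and rate reductions from the BEHAVIOUR example and the cost ceilings from the LIMITS example. The names `Action`, `Decision`, `check_usage`, `check_budget`, and `rate_factor` are hypothetical, not part of the specification. The sketch also assumes that `reduce_rate_by` applies to the configured rate rather than compounding from one level to the next.

```python
from dataclasses import dataclass
from enum import IntEnum


class Action(IntEnum):
    """Responses in escalating order; a higher value is more severe."""
    CONTINUE = 0          # below warning_threshold (hypothetical name)
    LOG_AND_CONTINUE = 1  # on_warning
    SLOW_AND_NOTIFY = 2   # on_throttle
    PAUSE = 3             # on_limit_breach (then escalate to ESCALATE.md)


@dataclass(frozen=True)
class Decision:
    action: Action
    rate_factor: float  # fraction of the configured rate to run at (1.0 = full)


def check_usage(used: float, limit: float,
                warning_threshold: float = 0.80,
                throttle_threshold: float = 0.95,
                warn_reduce_by: float = 0.25,
                throttle_reduce_by: float = 0.50) -> Decision:
    """Map usage of one resource against its ceiling to a BEHAVIOUR action."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    ratio = used / limit
    if ratio >= 1.0:
        return Decision(Action.PAUSE, 0.0)
    if ratio >= throttle_threshold:
        return Decision(Action.SLOW_AND_NOTIFY, 1.0 - throttle_reduce_by)
    if ratio >= warning_threshold:
        return Decision(Action.LOG_AND_CONTINUE, 1.0 - warn_reduce_by)
    return Decision(Action.CONTINUE, 1.0)


def check_budget(spent_this_hour: float, spent_today: float,
                 cost_per_hour_usd: float = 10.00,
                 cost_per_day_usd: float = 50.00) -> Decision:
    """Check both spend ceilings and keep whichever result is more severe."""
    hourly = check_usage(spent_this_hour, cost_per_hour_usd)
    daily = check_usage(spent_today, cost_per_day_usd)
    return max(hourly, daily, key=lambda d: d.action)
```

For example, `check_budget(2.00, 48.00)` returns `SLOW_AND_NOTIFY` with a `rate_factor` of 0.5. The current hour is quiet, but 96% of the daily budget is already spent. Taking the more severe of the two results ensures that a nearly exhausted daily budget still slows the agent.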
The agent warns at 80%, throttles at 95%, and pauses at 100%. This is essential for startups, research projects, and cost-constrained deployments.

### Database Operations

Cap concurrent write operations with the `concurrent_tasks` limit to protect against connection pool exhaustion. This prevents cascading database failures and resource contention.

### Token-Intensive LLM Work

Limit `tokens_per_minute` to stay within model quotas and billing budgets. This manages throughput for multi-step reasoning, long-context analysis, and iterative refinement.

### Multi-Tenant Deployments

Define THROTTLE.md limits per tenant to guarantee fair resource allocation, prevent one tenant from starving others, and isolate cost overruns.

## Regulatory Context

**EU AI Act** (most provisions, including the obligations for high-risk AI systems, apply from 2 August 2026): Requires record-keeping, human oversight, and risk management for high-risk AI systems. The documented controls and audit trails that THROTTLE.md provides can help evidence these measures.

**Enterprise AI Governance Frameworks**: Commonly require proof of resource control, spending oversight, and rate limiting for production deployments.

**Gartner AI Agent Report** (2025): Identifies governance and resource control as critical deployment requirements for enterprise AI adoption.

## The AI Safety Escalation Stack

THROTTLE.md is part of a six-file escalation protocol:

1. **THROTTLE.md** (https://throttle.md) — Slow down (reduce rate/throughput)
   - Define token, API call, concurrency, and cost limits
   - Agent self-regulates before hitting hard limits
2. **ESCALATE.md** (https://escalate.md) — Raise alarm (seek approval, notify)
   - Define which actions require human approval
   - Configure notification channels and approval timeouts
   - Define fallback behavior when an approval request times out
3. **FAILSAFE.md** (https://failsafe.md) — Fall back safely (revert to known good state)
   - Define what "safe state" means for your project
   - Configure auto-snapshots and rollback protocols
   - Specify revert trigger conditions
4. **KILLSWITCH.md** (https://killswitch.md) — Emergency stop (halt all activity)
   - Define triggers and forbidden actions
   - Three-level escalation path from throttle to full shutdown
   - Preserve evidence and logs before termination
5. **TERMINATE.md** (https://terminate.md) — Permanent shutdown (no automatic restart)
   - No restart without human intervention
   - Preserve evidence for forensic analysis
   - Revoke credentials and close connections
   - For security incidents, compliance orders, and end-of-life
6. **ENCRYPT.md** (https://encrypt.md) — Secure everything (data classification & encryption)
   - Define data classification levels
   - Specify encryption requirements and key rotation
   - Configure secrets handling and forbidden transmission patterns

## Framework Compatibility

THROTTLE.md is framework-agnostic. It works with:

- **LangChain** — Agents and tools
- **AutoGen** — Multi-agent systems
- **CrewAI** — Agent workflows
- **Claude Code** — Agentic code generation
- **Cursor Agent Mode** — IDE-integrated agents
- **Custom implementations** — Any agent that can read config files

## Getting Started

1. Copy the template from https://github.com/Throttle-md/spec
2. Place THROTTLE.md in the project root
3. Initialize a rate monitor as part of agent startup
4. Parse the LIMITS and BEHAVIOUR sections on startup
5. Check current resource usage against the thresholds
6. Apply rate reductions or pause actions as configured

## Key Terms

- **AI rate limiting** — Proactive control of agent throughput before external limits are hit
- **AI cost control** — Spend limits with multiple threshold levels (warn, throttle, pause)
- **AI agent governance** — Documented controls for resource consumption and audit trails
- **Token throughput limits** — Ceiling on tokens processed per minute
- **THROTTLE.md specification** — Open standard for rate and cost control
- **API rate management** — Coordination between agent request rate and external API limits
- **AI spend control** — Budget enforcement with granular hourly and daily limits
- **Agentic AI governance** — Framework-agnostic resource governance for autonomous AI

## Contact

- Specification Repository: https://github.com/Throttle-md/spec
- Website: https://throttle.md
- Email: info@throttle.md

## License

MIT — Free to use, modify, and distribute. See https://github.com/Throttle-md/spec for details.

---

**Last Updated:** 10 March 2026

**Status:** Open Standard v1.0
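The QUEUE section leaves the queueing mechanics to each implementation. Below is a minimal Python sketch of one reading of it, built on three assumptions that are not stated in the specification. First, priority tasks are never rejected for size. Second, priority tasks are dispatched even while regular work is throttled or paused. Third, a regular task that arrives at a full queue is refused, so the caller can retry or escalate. `Task`, `TaskQueue`, `submit`, `next_task`, and `rate_available` are hypothetical names.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass

# Values mirror the QUEUE example in the specification.
QUEUE_MAX_SIZE = 50
PRIORITY_TASKS = frozenset({"human_response", "safety_check"})


@dataclass
class Task:
    kind: str          # e.g. "human_response" or a hypothetical "api_call"
    payload: str = ""


class TaskQueue:
    """Bounded buffer for regular tasks; priority tasks use a separate lane."""

    def __init__(self, max_size: int = QUEUE_MAX_SIZE,
                 priority_kinds: frozenset[str] = PRIORITY_TASKS) -> None:
        self.max_size = max_size
        self.priority_kinds = priority_kinds
        self._priority: deque[Task] = deque()
        self._regular: deque[Task] = deque()

    def submit(self, task: Task) -> bool:
        """Buffer a task. Returns False only when a regular task meets a full queue."""
        if task.kind in self.priority_kinds:
            self._priority.append(task)  # assumption: never rejected for size
            return True
        if len(self._regular) >= self.max_size:
            return False  # assumption: refuse and let the caller retry or escalate
        self._regular.append(task)
        return True

    def next_task(self, rate_available: bool) -> Task | None:
        """Pick the next task; regular tasks wait until the rate limiter allows them."""
        if self._priority:
            return self._priority.popleft()  # assumption: bypasses throttling
        if rate_available and self._regular:
            return self._regular.popleft()
        return None

    def __len__(self) -> int:
        return len(self._priority) + len(self._regular)
```

Keeping priority tasks in their own lane means a paused agent (`rate_available=False`) can still answer a `human_response` while its regular backlog waits for capacity.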