Open Standard · v1.0 · 2026

THROTTLE.md

// AI Agent Rate Control Protocol

A plain-text file convention for defining rate limits and cost controls in AI agent projects. Define token throughput ceilings, API call rates, spend limits, and automatic slow-down behaviour — before your agent hits a hard wall.

THROTTLE.md
```
# THROTTLE

> Rate control protocol.
> Spec: https://throttle.md

---

## LIMITS

tokens_per_minute: 50000
api_calls_per_minute: 30
concurrent_tasks: 3
cost_per_hour_usd: 10.00
cost_per_day_usd: 50.00
file_writes_per_minute: 20

## BEHAVIOUR

warning_threshold: 0.80
throttle_threshold: 0.95
on_warning:
  action: log_and_continue
  reduce_rate_by: 0.25
on_throttle:
  action: slow_and_notify
  reduce_rate_by: 0.50
on_limit_breach:
  action: pause
  escalate_to: ESCALATE.md

## QUEUE

queue_enabled: true
queue_max_size: 50
priority_tasks:
  - human_response
  - safety_check
```
- 80% — warning threshold: the agent alerts at 80% of a configured limit
- 50% — rate reduction applied when the throttle threshold (95%) is reached
- $50/day — default daily cost ceiling in the THROTTLE.md spec template
- 0 — tasks dropped while the queue is enabled: requests are buffered, not discarded

AGENTS.md tells it what to do.
THROTTLE.md controls how fast.

THROTTLE.md is a plain-text Markdown file you place in the root of any repository that contains an AI agent. It defines the rate limits and cost controls your agent must respect — and what to do when it approaches them.

The problem it solves

AI agents consume tokens, make API calls, write files, and spend money — at whatever rate the underlying model and tools allow. Without explicit rate controls, a busy agent can exhaust a daily budget in minutes, hammer a rate-limited API until it's blocked, or overwhelm a database with concurrent writes.

How it works

Drop THROTTLE.md in your repo root and define: token and API call rate ceilings, hourly and daily cost limits, concurrency caps, and the behaviour at each threshold — warn at 80%, slow at 95%, pause at 100% and hand off to ESCALATE.md. The agent reads it on startup. Your compliance team reads it in the audit.
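The startup read can be sketched in a few lines. This is a minimal illustrative parser, not part of the spec: it assumes the `## SECTION` / `key: value` layout of the template above and ignores nested and list-valued keys.

```python
def parse_throttle(text: str) -> dict:
    """Parse '## SECTION' headings and flat 'key: value' lines into nested dicts."""
    config, section = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            section = line[3:].strip().lower()
            config[section] = {}
        elif section and ":" in line and not line.startswith("#"):
            key, _, value = line.partition(":")
            value = value.strip()
            try:
                # numeric values become int/float; everything else stays a string
                value = float(value) if "." in value else int(value)
            except ValueError:
                pass
            config[section][key.strip()] = value
    return config

sample = """## LIMITS
tokens_per_minute: 50000
cost_per_day_usd: 50.00
"""
limits = parse_throttle(sample)["limits"]
print(limits["tokens_per_minute"])  # 50000
```

A real implementation would also handle the indented `on_warning:` blocks and the `priority_tasks` list; the point is only that the file is trivially machine-readable at startup.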

The regulatory context

Enterprise AI governance frameworks require documented resource controls. The EU AI Act (effective 2 August 2026) mandates resource consumption reporting and control mechanisms for high-risk AI systems. Gartner's AI Agent Report identifies governance and resource control as critical deployment requirements. THROTTLE.md gives you the documented controls and the audit trail.

How to use it

Copy the template from GitHub and place it in your project root:

your-project/
├── AGENTS.md
├── CLAUDE.md
├── THROTTLE.md ← add this
├── README.md
└── src/

What it replaces

Before THROTTLE.md, rate control rules were scattered: hardcoded in the system prompt, buried in config files, missing entirely, or documented in a Notion page no one reads. THROTTLE.md makes rate controls version-controlled, auditable, and co-located with your code.

Who reads it

The AI agent reads it on startup. Your engineer reads it during code review. Your compliance team reads it during audits. Your regulator reads it if something goes wrong. One file serves all four audiences.

A complete protocol.
From slow down to shut down.

THROTTLE.md is one file in a complete open specification for AI agent safety. Each file addresses a different level of intervention.

Frequently asked questions.

What is THROTTLE.md?

A plain-text Markdown file defining rate limits and cost controls for AI agents. It sets ceilings on token throughput, API call rates, concurrent tasks, and spend per hour and per day. When an agent approaches a limit, it slows automatically. When it hits a limit, it pauses and hands off to the escalation protocol.

How is THROTTLE.md different from API rate limits?

API rate limits are enforced externally by the service provider — they cut your agent off without warning. THROTTLE.md is your own proactive control layer. It slows the agent gracefully before an external limit is hit, preserves queued work, and notifies you before things go wrong rather than after.

What happens to queued tasks during throttling?

With queue enabled (the default), tasks are buffered — not dropped. The agent processes them at the reduced rate. Priority tasks (human responses, safety checks) skip the queue entirely. Tasks older than the configured timeout are dropped and logged.
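That queue behaviour can be sketched with a small priority buffer. The priority task types come from the QUEUE section of the template; the timeout value and class shape are illustrative assumptions:

```python
import heapq
import itertools
import time

class ThrottleQueue:
    """Buffer tasks during throttling. Priority task types jump the
    backlog; tasks older than the timeout are dropped and logged."""

    PRIORITY = {"human_response", "safety_check"}  # from the QUEUE section

    def __init__(self, max_size: int = 50, timeout_s: float = 300.0):
        self.max_size, self.timeout_s = max_size, timeout_s
        self._heap: list = []
        self._count = itertools.count()  # tie-breaker preserving FIFO order
        self.dropped: list[str] = []     # audit log of timed-out tasks

    def put(self, task_type: str, payload) -> bool:
        if len(self._heap) >= self.max_size:
            return False  # queue full: caller must back off
        rank = 0 if task_type in self.PRIORITY else 1
        heapq.heappush(self._heap, (rank, next(self._count), time.monotonic(), task_type, payload))
        return True

    def get(self):
        """Pop the next task, skipping (and logging) expired ones."""
        now = time.monotonic()
        while self._heap:
            rank, _, enqueued, task_type, payload = heapq.heappop(self._heap)
            if now - enqueued > self.timeout_s:
                self.dropped.append(task_type)  # dropped and logged, never run
                continue
            return task_type, payload
        return None
```

During throttling the agent drains this queue at the reduced rate; a `human_response` enqueued last still comes out first.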

Can I set different limits for different task types?

Yes. The spec supports priority task lists that bypass queue restrictions, and the limit fields cover distinct resource types (tokens, API calls, file writes, database queries, cost). You can tune each independently per project.

What is the difference between warning and throttle thresholds?

Warning (default 80%) — agent logs the event and reduces rate by 25%, but continues. Throttle (default 95%) — agent cuts rate by 50% and notifies the operator. Limit breach (100%) — agent pauses all new tasks and hands off to ESCALATE.md for human intervention.
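The three tiers reduce to a small mapping from usage ratio to action and adjusted rate. A sketch using the default thresholds and reductions quoted above (the function name and return shape are illustrative):

```python
def respond(ratio: float, base_rate: float) -> tuple[str, float]:
    """Map a usage ratio to (action, adjusted rate) using the defaults:
    warn at 0.80 (-25%), throttle at 0.95 (-50%), pause at 1.0."""
    if ratio >= 1.0:
        return "pause_and_escalate", 0.0          # hand off to ESCALATE.md
    if ratio >= 0.95:
        return "slow_and_notify", base_rate * 0.50
    if ratio >= 0.80:
        return "log_and_continue", base_rate * 0.75
    return "continue", base_rate
```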

Does THROTTLE.md work with all AI frameworks?

Yes — it is framework-agnostic. It defines the policy; your agent implementation enforces it. Works with LangChain, AutoGen, CrewAI, Claude Code, custom agents, or any AI system that can read its own configuration files.
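Framework-agnostic enforcement usually means a thin wrapper at the call boundary. A hypothetical sketch: a decorator that gates any callable behind a rate-control object exposing `check()` and `record()` (the state names and back-off delays are assumptions, not part of the spec):

```python
import functools
import time

def enforce(limiter):
    """Gate any callable (a LangChain tool, an HTTP call, a file write)
    behind a limiter object with check() -> state and record()."""
    BACKOFF = {"warning": 0.25, "throttled": 1.0}  # seconds; illustrative values

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            state = limiter.check()
            if state == "breach":
                # at 100% the spec pauses and hands off to ESCALATE.md
                raise RuntimeError("limit breached: escalate to a human")
            if state in BACKOFF:
                time.sleep(BACKOFF[state])  # slow down before the call
            limiter.record()
            return fn(*args, **kwargs)
        return wrapper
    return decorator
```

Because the policy lives in THROTTLE.md and the enforcement in one wrapper, swapping frameworks changes neither.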

// Domain Acquisition

Own the standard.
Own throttle.md.

This domain is available for acquisition. It is the canonical home of the THROTTLE.md specification — the rate control layer of the AI agent safety stack, essential for any production AI deployment.

Inquire About Acquisition

Or email directly: info@throttle.md

THROTTLE.md is an open specification for AI agent rate and cost control. Defines LIMITS (tokens/min, API calls/min, concurrent tasks, cost/hour, cost/day), BEHAVIOUR thresholds (warn at 80%, throttle at 95%, pause at 100%), QUEUE management (buffer tasks, priority bypass for safety checks and human responses), and AUDIT logging. First layer of the AI safety stack: THROTTLE → ESCALATE → FAILSAFE → KILLSWITCH → TERMINATE → ENCRYPT. MIT licence.
Last Updated
10 March 2026