Open Standard · v1.0 · 2026

THROTTLE.md

// AI Agent Rate Control Protocol

A plain-text file convention for defining rate limits and cost controls in AI agent projects. Define token throughput ceilings, API call rates, spend limits, and automatic slow-down behaviour — before your agent hits a hard wall.

THROTTLE.md
```
# THROTTLE

> Rate control protocol.
> Spec: https://throttle.md

---

## LIMITS

tokens_per_minute: 50000
api_calls_per_minute: 30
concurrent_tasks: 3
cost_per_hour_usd: 10.00
cost_per_day_usd: 50.00
file_writes_per_minute: 20

## BEHAVIOUR

warning_threshold: 0.80
throttle_threshold: 0.95
on_warning:
  action: log_and_continue
  reduce_rate_by: 0.25
on_throttle:
  action: slow_and_notify
  reduce_rate_by: 0.50
on_limit_breach:
  action: pause
  escalate_to: ESCALATE.md

## QUEUE

queue_enabled: true
queue_max_size: 50
priority_tasks:
  - human_response
  - safety_check
```
- 80% — warning threshold: the agent alerts at 80% of a configured limit
- 50% — rate reduction applied when the throttle threshold (95%) is reached
- $50/day — default daily cost ceiling in the THROTTLE.md spec template
- 0 — tasks dropped while the queue is enabled: requests are buffered, not discarded

AGENTS.md tells it what to do.
THROTTLE.md controls how fast.

THROTTLE.md is a plain-text Markdown file you place in the root of any repository that contains an AI agent. It defines the rate limits and cost controls your agent must respect — and what to do when it approaches them.

The problem it solves

AI agents consume tokens, make API calls, write files, and spend money — at whatever rate the underlying model and tools allow. Without explicit rate controls, a busy agent can exhaust a daily budget in minutes, hammer a rate-limited API until it's blocked, or overwhelm a database with concurrent writes.

How it works

Drop THROTTLE.md in your repo root and define: token and API call rate ceilings, hourly and daily cost limits, concurrency caps, and the behaviour at each threshold — warn at 80%, slow at 95%, pause at 100% and hand off to ESCALATE.md. The agent reads it on startup. Your compliance team reads it in the audit.
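The startup read can be sketched in a few lines. This is a minimal illustrative parser, not part of the spec: it assumes the `## SECTION` / `key: value` layout of the template above and ignores nested and list-valued keys.

```python
def parse_throttle(text: str) -> dict:
    """Parse '## SECTION' headings and flat 'key: value' lines into nested dicts."""
    config, section = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            section = line[3:].strip().lower()
            config[section] = {}
        elif section and ":" in line and not line.startswith("#"):
            key, _, value = line.partition(":")
            value = value.strip()
            try:
                # numeric values become int/float; everything else stays a string
                value = float(value) if "." in value else int(value)
            except ValueError:
                pass
            config[section][key.strip()] = value
    return config

sample = """## LIMITS
tokens_per_minute: 50000
cost_per_day_usd: 50.00
"""
limits = parse_throttle(sample)["limits"]
print(limits["tokens_per_minute"])  # 50000
```

A real implementation would also handle the indented `on_warning:` blocks and the `priority_tasks` list; the point is only that the file is trivially machine-readable at startup.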

The regulatory context

Enterprise AI governance frameworks require documented resource controls. The EU AI Act (effective 2 August 2026) mandates resource consumption reporting and control mechanisms for high-risk AI systems. Gartner's AI Agent Report identifies governance and resource control as critical deployment requirements. THROTTLE.md gives you the documented controls and the audit trail.

How to use it

Copy the template from GitHub and place it in your project root:

your-project/
├── AGENTS.md
├── CLAUDE.md
├── THROTTLE.md ← add this
├── README.md
└── src/

What it replaces

Before THROTTLE.md, rate control rules were scattered: hardcoded in the system prompt, buried in config files, missing entirely, or documented in a Notion page no one reads. THROTTLE.md makes rate controls version-controlled, auditable, and co-located with your code.

Who reads it

The AI agent reads it on startup. Your engineer reads it during code review. Your compliance team reads it during audits. Your regulator reads it if something goes wrong. One file serves all four audiences.

A complete protocol.
From slow down to shut down.

THROTTLE.md is one file in a complete open specification for AI agent safety. Each file addresses a different level of intervention.

Frequently asked questions.

What is THROTTLE.md?

A plain-text Markdown file defining rate limits and cost controls for AI agents. It sets ceilings on token throughput, API call rates, concurrent tasks, and spend per hour and per day. When an agent approaches a limit, it slows automatically. When it hits a limit, it pauses and hands off to the escalation protocol.

How is THROTTLE.md different from API rate limits?

API rate limits are enforced externally by the service provider — they cut your agent off without warning. THROTTLE.md is your own proactive control layer. It slows the agent gracefully before an external limit is hit, preserves queued work, and notifies you before things go wrong rather than after.

What happens to queued tasks during throttling?

With queue enabled (the default), tasks are buffered — not dropped. The agent processes them at the reduced rate. Priority tasks (human responses, safety checks) skip the queue entirely. Tasks older than the configured timeout are dropped and logged.
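That queue behaviour can be sketched with a small priority buffer. The priority task types come from the QUEUE section of the template; the timeout value and class shape are illustrative assumptions:

```python
import heapq
import itertools
import time

class ThrottleQueue:
    """Buffer tasks during throttling. Priority task types jump the
    backlog; tasks older than the timeout are dropped and logged."""

    PRIORITY = {"human_response", "safety_check"}  # from the QUEUE section

    def __init__(self, max_size: int = 50, timeout_s: float = 300.0):
        self.max_size, self.timeout_s = max_size, timeout_s
        self._heap: list = []
        self._count = itertools.count()  # tie-breaker preserving FIFO order
        self.dropped: list[str] = []     # audit log of timed-out tasks

    def put(self, task_type: str, payload) -> bool:
        if len(self._heap) >= self.max_size:
            return False  # queue full: caller must back off
        rank = 0 if task_type in self.PRIORITY else 1
        heapq.heappush(self._heap, (rank, next(self._count), time.monotonic(), task_type, payload))
        return True

    def get(self):
        """Pop the next task, skipping (and logging) expired ones."""
        now = time.monotonic()
        while self._heap:
            rank, _, enqueued, task_type, payload = heapq.heappop(self._heap)
            if now - enqueued > self.timeout_s:
                self.dropped.append(task_type)  # dropped and logged, never run
                continue
            return task_type, payload
        return None
```

During throttling the agent drains this queue at the reduced rate; a `human_response` enqueued last still comes out first.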

Can I set different limits for different task types?

Yes. The spec supports priority task lists that bypass queue restrictions, and the limit fields cover distinct resource types (tokens, API calls, file writes, database queries, cost). You can tune each independently per project.

What is the difference between warning and throttle thresholds?

Warning (default 80%) — agent logs the event and reduces rate by 25%, but continues. Throttle (default 95%) — agent cuts rate by 50% and notifies the operator. Limit breach (100%) — agent pauses all new tasks and hands off to ESCALATE.md for human intervention.
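The three tiers reduce to a small mapping from usage ratio to action and adjusted rate. A sketch using the default thresholds and reductions quoted above (the function name and return shape are illustrative):

```python
def respond(ratio: float, base_rate: float) -> tuple[str, float]:
    """Map a usage ratio to (action, adjusted rate) using the defaults:
    warn at 0.80 (-25%), throttle at 0.95 (-50%), pause at 1.0."""
    if ratio >= 1.0:
        return "pause_and_escalate", 0.0          # hand off to ESCALATE.md
    if ratio >= 0.95:
        return "slow_and_notify", base_rate * 0.50
    if ratio >= 0.80:
        return "log_and_continue", base_rate * 0.75
    return "continue", base_rate
```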

Does THROTTLE.md work with all AI frameworks?

Yes — it is framework-agnostic. It defines the policy; your agent implementation enforces it. Works with LangChain, AutoGen, CrewAI, Claude Code, custom agents, or any AI system that can read its own configuration files.
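Framework-agnostic enforcement usually means a thin wrapper at the call boundary. A hypothetical sketch: a decorator that gates any callable behind a rate-control object exposing `check()` and `record()` (the state names and back-off delays are assumptions, not part of the spec):

```python
import functools
import time

def enforce(limiter):
    """Gate any callable (a LangChain tool, an HTTP call, a file write)
    behind a limiter object with check() -> state and record()."""
    BACKOFF = {"warning": 0.25, "throttled": 1.0}  # seconds; illustrative values

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            state = limiter.check()
            if state == "breach":
                # at 100% the spec pauses and hands off to ESCALATE.md
                raise RuntimeError("limit breached: escalate to a human")
            if state in BACKOFF:
                time.sleep(BACKOFF[state])  # slow down before the call
            limiter.record()
            return fn(*args, **kwargs)
        return wrapper
    return decorator
```

Because the policy lives in THROTTLE.md and the enforcement in one wrapper, swapping frameworks changes neither.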

// Domain Acquisition

Own the standard.
Own throttle.md.

This domain is available for acquisition. It is the canonical home of the THROTTLE.md specification — the rate control layer of the AI agent safety stack, essential for any production AI deployment.

Inquire About Acquisition

Or email directly: info@throttle.md

THROTTLE.md is an open specification for AI agent rate and cost control. Defines LIMITS (tokens/min, API calls/min, concurrent tasks, cost/hour, cost/day), BEHAVIOUR thresholds (warn at 80%, throttle at 95%, pause at 100%), QUEUE management (buffer tasks, priority bypass for safety checks and human responses), and AUDIT logging. First layer of the AI safety stack: THROTTLE → ESCALATE → FAILSAFE → KILLSWITCH → TERMINATE → ENCRYPT. MIT licence.
Last Updated
10 March 2026