Bench Integration for AI Agents

Securely connect your AI agents and chatbots (Claude, ChatGPT, Cursor, etc) with Bench MCP or direct API to run benchmarks, fetch performance metrics, generate comparative reports, and track historical results through natural language.
Bench Logo
Gradient Top
Gradient Middle
Gradient Bottom
divider

Supported Tools

Tools
SleepSleep

Connect Bench MCP Tool with your Agent

Python
TypeScript

Install Composio

python
pip install composio claude-agent-sdk
Install the Composio SDK and Claude Agent SDK

Create Tool Router Session

python
from composio import Composio
from claude_agent_sdk import ClaudeSDKClient, ClaudeAgentOptions

composio = Composio(api_key='your-composio-api-key')
session = composio.create(user_id='your-user-id')
url = session.mcp.url
Initialize the Composio client and create a Tool Router session

Connect to AI Agent

python
import asyncio

options = ClaudeAgentOptions(
    permission_mode='bypassPermissions',
    mcp_servers={
        'tool_router': {
            'type': 'http',
            'url': url,
            'headers': {
                'x-api-key': 'your-composio-api-key'
            }
        }
    },
    system_prompt='You are a helpful assistant with access to Bench tools.',
    max_turns=10
)

async def main():
    async with ClaudeSDKClient(options=options) as client:
        await client.query('Pause agent execution for 5 seconds')
        async for message in client.receive_response():
            if hasattr(message, 'content'):
                for block in message.content:
                    if hasattr(block, 'text'):
                        print(block.text)

asyncio.run(main())
Use the MCP server with your AI agent

Connect Bench API Tool with your Agent

Python
TypeScript

Install Composio

python
pip install composio_openai
Install the Composio SDK

Initialize Composio and Create Tool Router Session

python
from openai import OpenAI
from composio import Composio
from composio_openai import OpenAIResponsesProvider

composio = Composio(provider=OpenAIResponsesProvider())
openai = OpenAI()
session = composio.create(user_id='your-user-id')
Import and initialize Composio client, then create a Tool Router session

Execute Bench Tools via Tool Router with Your Agent

python
tools = session.tools
response = openai.responses.create(
  model='gpt-4.1',
  tools=tools,
  input=[{
    'role': 'user',
    'content': 'Run a benchmark sleep test for 5 seconds'
  }]
)
result = composio.provider.handle_tool_calls(
  response=response,
  user_id='your-user-id'
)
print(result)
Get tools from Tool Router session and execute Bench actions with your Agent

Why Use Composio?

AI Native Bench Integration

  • Supports both Bench MCP and direct API based integrations
  • Structured, LLM-friendly schemas for reliable tool execution
  • Rich coverage for running, tracking, and analyzing your Bench benchmarks

Managed Auth

  • No credentials required—Bench supports NO_AUTH for fast, frictionless setup
  • Central place to manage and scope Bench access if needed
  • Per user and per environment context for ultimate flexibility

Agent Optimized Design

  • Bench tools are tuned for LLM agents for reliable execution
  • Comprehensive logs so you always know what benchmarks ran, when, and why

Enterprise Grade Security

  • Fine-grained RBAC so you control which agents and users can access Bench
  • Scoped, least privilege access to benchmarking resources
  • Full audit trail of agent actions to support review and compliance

Frequently Asked Questions

Do I need my own developer credentials to use Bench with Composio?

No developer credentials are needed for Bench. You can get started right away—no setup required!

Can I use multiple toolkits together?

Yes! Composio's Tool Router enables agents to use multiple toolkits. Learn more.

Is Composio secure?

Composio is SOC 2 and ISO 27001 compliant with all data encrypted in transit and at rest. Learn more.

What if the API changes?

Composio maintains and updates all toolkit integrations automatically, so your agents always work with the latest API versions.

Used by agents from

Context
ASU
Letta
glean
HubSpot
Agent.ai
Altera
DataStax
Entelligence
Rolai
Context
ASU
Letta
glean
HubSpot
Agent.ai
Altera
DataStax
Entelligence
Rolai
Context
ASU
Letta
glean
HubSpot
Agent.ai
Altera
DataStax
Entelligence
Rolai

Never worry about agent reliability

We handle tool reliability, observability, and security so you never have to second-guess an agent action.