# Honeyhive

```json
{
  "name": "Honeyhive",
  "slug": "honeyhive",
  "url": "https://composio.dev/toolkits/honeyhive",
  "markdown_url": "https://composio.dev/toolkits/honeyhive.md",
  "logo_url": "https://www.honeyhive.ai/logo.png",
  "categories": [
    "ai & machine learning"
  ],
  "is_composio_managed": false,
  "updated_at": "2026-05-12T10:15:04.084Z"
}
```

![Honeyhive logo](https://www.honeyhive.ai/logo.png)

## Description

Securely connect your AI agents and chatbots (Claude, ChatGPT, Cursor, etc.) to Honeyhive through MCP or the direct API to log LLM outputs, fetch evaluation runs, analyze error traces, and create feedback reports using natural language.

## Summary

Honeyhive is an AI observability and evaluation platform for analyzing LLM apps. It helps teams monitor, debug, and improve AI system reliability faster.

## Categories

- ai & machine learning

## Toolkit Details

- Tools: 42

## Images

- Logo: https://www.honeyhive.ai/logo.png

## Authentication

- **API Key**
  - Type: `api_key`
  - Description: API key authentication for Honeyhive.
  - Setup:
    - Configure API key credentials for Honeyhive.
    - Use the credentials when creating an auth config in Composio.

## Suggested Prompts

- Add new datapoints to my evaluation dataset
- List all datasets in my Honeyhive project
- Log a batch of model events for analysis
- Mark my current evaluation run as completed

## Supported Tools

| Tool slug | Name | Description |
|---|---|---|
| `HONEYHIVE_ADD_DATAPOINTS_TO_DATASET` | Add datapoints to dataset | Tool to add datapoints to a dataset. Use when you need to append multiple entries with specified input, ground truth, and history mappings. |
| `HONEYHIVE_COMPARE_RUNS` | Compare Experiment Runs | Tool to retrieve experiment comparison between two evaluation runs. Use when you need to analyze the differences in metrics, datapoints, and events between two runs. |
| `HONEYHIVE_COMPARE_RUNS_EVENTS` | Compare Runs Events | Tool to compare events between two experiment runs side-by-side. Use when analyzing differences in model behavior, performance metrics, or outputs between evaluation runs. Returns matched event pairs with their respective data from both runs for comparison. |
| `HONEYHIVE_CREATE_BATCH_DATAPOINTS` | Batch Create Datapoints | Tool to create multiple datapoints in a single batch operation. Use when you need to bulk-import events into a dataset or create many datapoints at once. Supports filtering by date range, event IDs, or custom criteria. Efficient for migrating large numbers of events to evaluation datasets. |
| `HONEYHIVE_CREATE_BATCH_MODEL_EVENTS` | Create Batch Model Events | Tool to create multiple model events in a single request. Use when you need to log a batch of event interactions to HoneyHive. |
| `HONEYHIVE_CREATE_BATCH_TOOL_EVENTS` | Create Batch Tool Events | Tool to log a batch of external API calls as tool events. Use when you need to record multiple tool events in one request, after gathering all event data. |
| `HONEYHIVE_CREATE_CONFIGURATION` | Create Configuration | Creates a new configuration in HoneyHive for managing LLM or pipeline settings. Use this to define reusable configurations with specific models, prompts, and parameters that can be deployed across different environments (dev, staging, prod). Configurations enable version control and environment-specific management of your AI application settings. |
| `HONEYHIVE_CREATE_DATAPOINT` | Create Datapoint | Tool to create a new datapoint with input-output pairs. Use when you need to add a single datapoint with inputs, ground truth, conversation history, and metadata. |
| `HONEYHIVE_CREATE_DATASET` | Create Dataset | Tool to create a dataset. Use when you need to initialize a new dataset within a project. |
| `HONEYHIVE_CREATE_EVENT` | Create Event | Tool to create a new event in HoneyHive to track execution of different parts of your application. Use when you need to log a model call, tool execution, or chain step. Events can be grouped into sessions and nested hierarchically using parent_id and children_ids. |
| `HONEYHIVE_CREATE_METRIC` | Create Metric | Tool to create a new metric in HoneyHive. Use when you need to define how to evaluate model outputs, whether through code (PYTHON), AI evaluation (LLM), human review (HUMAN), or combining multiple metrics (COMPOSITE). Important: LLM metrics require both model_provider and model_name to be specified. |
| `HONEYHIVE_CREATE_MODEL_EVENT` | Create Model Event | Tool to create a new model event to log LLM call data. Use when you need to track a single model interaction including messages, responses, usage, and metadata. |
| `HONEYHIVE_CREATE_TOOL` | Create Tool | Creates a new tool definition in a HoneyHive project. Use this to register functions or plugins that can be invoked and tracked within HoneyHive. Tools are defined with a JSON Schema for their parameters, allowing HoneyHive to validate inputs and track tool usage in your AI workflows. Tool names must be unique within a project. |
| `HONEYHIVE_DELETE_DATAPOINT` | Delete Datapoint | Tool to delete a specific datapoint by its ID. Use when you need to remove a datapoint from HoneyHive after confirming its identifier. |
| `HONEYHIVE_DELETE_DATASET` | Delete Dataset | Tool to delete a dataset by ID. Use when you need to remove a dataset after confirming its ID. |
| `HONEYHIVE_END_EVALUATION_RUN` | End Evaluation Run | Tool to update an evaluation run's status and metadata. Use to mark a run as completed after finishing evaluations, or update run properties like name, metadata, configuration, and associated event/datapoint IDs. |
| `HONEYHIVE_GET_CONFIGURATIONS` | Get Configurations | Tool to retrieve a list of configurations. Use when you need to fetch all configurations for a specific project before making changes. |
| `HONEYHIVE_GET_DATASETS` | Get Datasets | Retrieve datasets from HoneyHive for a specified project. Use this tool to list all datasets within a project, find datasets by type (evaluation or fine-tuning), or retrieve a specific dataset by its ID. Returns dataset details including name, description, datapoints count, type, and timestamps. |
| `HONEYHIVE_GET_EVENTS` | Get Events | Tool to query events with filters and projections from HoneyHive. Use this action when you need to retrieve events with lightweight filtering (limit 1000 results). For bulk exports or more complex queries, use the Retrieve Events action instead. Supports filtering by date range, event properties, and field projections. |
| `HONEYHIVE_GET_EVENTS_BY_SESSION_ID` | Get Events By Session ID | Tool to retrieve the complete tree of nested events for a specific session. Use when you need to analyze all events (model calls, tool calls, chains) that occurred within a session, including their hierarchical relationships, inputs, outputs, metrics, and costs. Returns a tree structure with recursive children. |
| `HONEYHIVE_GET_EVENTS_CHART` | Get Events Chart | Tool to retrieve charting and analytics data for events over time. Use when you need aggregated metrics (duration, cost, token usage) grouped by time buckets or fields. Supports percentile analysis (p50, p95, p99) for latency monitoring and custom filters for targeted analytics. |
| `HONEYHIVE_GET_METRICS` | Get Metrics | Retrieves all metrics associated with a HoneyHive project. Returns a list of metrics including their configuration (name, type, description, thresholds, evaluator details) and metadata (creation/update timestamps, sampling settings). Use this tool to list all metrics configured for a project, get metric IDs for updating metrics via HONEYHIVE_UPDATE_METRIC, or understand what evaluations are set up for a project. Prerequisites: Obtain a valid project_name using HONEYHIVE_GET_PROJECTS first. |
| `HONEYHIVE_GET_PROJECTS` | Get Projects | Tool to retrieve all projects in the HoneyHive account. Use when you need to list available projects, get project IDs for use in other API calls, or search for a specific project by name. |
| `HONEYHIVE_GET_RUN` | Get Evaluation Run Details | Tool to get details of an evaluation run by its UUID. Use when you need to check the status, configuration, results, or metadata of a specific evaluation run. |
| `HONEYHIVE_GET_RUN_METRICS` | Get Run Metrics | Tool to get event metrics for an experiment run. Use when you need to retrieve metrics computed on events within a specific experiment run. Returns an array of event objects with their associated metrics, which can be filtered by date range or custom filters. |
| `HONEYHIVE_GET_RUNS` | Get Evaluation Runs | Tool to retrieve a list of evaluation runs from HoneyHive. Use when you need to list all evaluation runs for analysis; find runs by status, name, or dataset; get specific runs by their IDs; or paginate through large sets of evaluation runs. Returns evaluation details including status, results, configuration, and timestamps. |
| `HONEYHIVE_GET_RUNS_SCHEMA` | Get Runs Schema | Tool to retrieve the schema for experiment runs in HoneyHive. Use when you need to understand available fields, datasets, and mappings for experiment runs. |
| `HONEYHIVE_GET_SESSION` | Get Session | Retrieve a complete session tree by session ID from HoneyHive. Use this tool to fetch the full session hierarchy including all nested events (model calls, tool calls, chains) with their inputs, outputs, durations, and metadata. Returns a recursive tree structure with aggregated metrics. Prerequisites: You need a valid session ID from HONEYHIVE_START_SESSION or HONEYHIVE_RETRIEVE_EVENTS. |
| `HONEYHIVE_LIST_TOOLS` | List Tools | Tool to list all available Honeyhive tools. Use when you need to discover which functions or plugins are registered for use. |
| `HONEYHIVE_RETRIEVE_DATAPOINT` | Retrieve Datapoint | Retrieve a specific datapoint by its ID from HoneyHive. Use this tool when you need the full details of a single datapoint, including its inputs, ground truth, conversation history, linked datasets, and metadata. Prerequisites: You need a valid datapoint ID, which you can get from HONEYHIVE_RETRIEVE_DATAPOINTS (list datapoints by project/dataset) or HONEYHIVE_ADD_DATAPOINTS_TO_DATASET (returns IDs of newly created datapoints). |
| `HONEYHIVE_RETRIEVE_DATAPOINTS` | Retrieve Datapoints | Retrieve datapoints from a HoneyHive project. Use this tool to fetch evaluation datapoints containing inputs, ground truth, and metadata. Supports filtering by specific datapoint IDs or dataset name. Commonly used to review existing test cases before running evaluations, export datapoints for analysis, or verify datapoint contents after adding them to a dataset. |
| `HONEYHIVE_RETRIEVE_EVENTS` | Retrieve Events | Retrieve and export events from a HoneyHive project. Use this tool to query traced events (model calls, tool calls, sessions, chains) with optional filters by event_type, metadata, feedback scores, or date range. Returns events with their inputs, outputs, duration, and metrics. Supports pagination for large result sets (max 7500 per page). |
| `HONEYHIVE_RETRIEVE_EXPERIMENT_RESULT` | Retrieve Experiment Result | Tool to retrieve the result of a specific experiment run. Use when you need the status, metrics, and datapoint-level details of a completed experiment. |
| `HONEYHIVE_START_EVALUATION_RUN` | Start Evaluation Run | Creates a new evaluation run to group and track multiple session events for analysis. Use this action when you want to compare model performance across multiple sessions, create evaluation batches for quality assurance, or link existing events to datasets for structured evaluation. Prerequisites: get the project ID using the Get Projects action; get event IDs from the Start Session or Retrieve Events actions; optionally, get a dataset ID from the Get Datasets action. |
| `HONEYHIVE_START_SESSION` | Start Session | Start a new HoneyHive session for tracing and observability. Use this tool to initiate a tracking session that groups together related model, tool, and chain events. Returns a session_id that should be used to link subsequent events to this session. Common use cases: tracing a user conversation, logging an LLM pipeline execution, and initializing observability for a batch processing job. |
| `HONEYHIVE_UPDATE_CONFIGURATION` | Update Configuration | Tool to update an existing HoneyHive configuration. Use when you need to modify a configuration's name, provider, model parameters, environments, or other settings. You must provide the configuration ID (obtainable via Get Configurations action) and the name field. All other fields are optional and will only update if provided. |
| `HONEYHIVE_UPDATE_DATAPOINT` | Update Datapoint | Update an existing datapoint by ID. Use this to modify any combination of inputs, ground_truth, history, metadata, linked_datasets, or linked_evals for a datapoint. Requires a valid datapoint ID obtained from retrieve_datapoints or add_datapoints_to_dataset. |
| `HONEYHIVE_UPDATE_DATASET` | Update Dataset | Tool to update an existing dataset. Use when you need to modify a dataset's details (name, description, datapoints, linked evaluations, or metadata) after confirming its ID. |
| `HONEYHIVE_UPDATE_EVENT` | Update Event | Update an existing HoneyHive event by ID. Use to attach feedback, metrics, metadata, outputs, config, user properties, or update duration on events created via start_session or batch event creation. At least one optional field must be provided alongside the event_id. |
| `HONEYHIVE_UPDATE_METRIC` | Update Metric | Tool to update an existing metric. Use when you need to modify a metric’s properties after creation. Ensure you retrieve the metric first to verify its current state. |
| `HONEYHIVE_UPDATE_PROJECT` | Update Project | Updates an existing HoneyHive project's name or description. Use this action to modify project metadata after creation. You must provide the project_id and at least one field to update (name or description). To find project IDs, use the Get Projects action first. |
| `HONEYHIVE_UPDATE_TOOL` | Update Tool | Tool to update an existing tool in HoneyHive. Use when you need to modify a tool's name, description, parameters, or type after confirming its ID. At least one optional field must be provided alongside the required tool ID. |
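
Several tools above describe events that are nested with `parent_id` and returned as a tree with recursive children. As a rough illustration of that shape, the sketch below rebuilds such a tree from a flat list of events. It is not taken from the HoneyHive or Composio SDKs: the `build_event_tree` helper, the `children` key, and the sample records are hypothetical, and real events carry many more fields.

```python
from typing import Any


def build_event_tree(events: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Nest flat events under their parents using parent_id.

    Returns the root events (no parent_id, or a parent that is not in the
    list), each with a recursive "children" list. The field names are
    assumptions made for this illustration.
    """
    # Copy each event so the caller's dictionaries are not mutated.
    nodes = {e["event_id"]: {**e, "children": []} for e in events}
    roots = []
    for node in nodes.values():
        parent = nodes.get(node.get("parent_id"))
        if parent is None:
            roots.append(node)
        else:
            parent["children"].append(node)
    return roots


# Illustrative records only, not real HoneyHive output.
flat_events = [
    {"event_id": "e1", "parent_id": None, "event_type": "chain"},
    {"event_id": "e2", "parent_id": "e1", "event_type": "model"},
    {"event_id": "e3", "parent_id": "e1", "event_type": "tool"},
    {"event_id": "e4", "parent_id": "e3", "event_type": "model"},
]

tree = build_event_tree(flat_events)
print(tree[0]["event_id"], [c["event_id"] for c in tree[0]["children"]])
```

Here the chain event `e1` becomes the single root, with the model and tool events as its children and `e4` nested one level deeper under the tool event.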

## Supported Triggers

None listed.

## Installation and MCP Setup

### Path 1: SDK Installation

#### Path 1, Step 1: Install Composio

Install the Composio SDK
```python
pip install composio_openai
```

```typescript
npm install @composio/openai
```

#### Path 1, Step 2: Initialize Composio and Create Tool Router Session

Import and initialize the Composio client, then create a Tool Router session
```python
from openai import OpenAI
from composio import Composio
from composio_openai import OpenAIResponsesProvider

composio = Composio(provider=OpenAIResponsesProvider())
openai = OpenAI()
session = composio.create(user_id='your-user-id')
```

```typescript
import OpenAI from 'openai';
import { Composio } from '@composio/core';
import { OpenAIResponsesProvider } from '@composio/openai';

const composio = new Composio({
  provider: new OpenAIResponsesProvider(),
});
const openai = new OpenAI({});
const session = await composio.create('your-user-id');
```

#### Path 1, Step 3: Execute Honeyhive Tools via Tool Router with Your Agent

Get the tools from the Tool Router session and execute Honeyhive actions with your agent
```python
tools = session.tools
response = openai.responses.create(
  model='gpt-4.1',
  tools=tools,
  input=[{
    'role': 'user',
    'content': 'Add new datapoints to the "test-eval" dataset for today\'s experiment.'
  }]
)
result = composio.provider.handle_tool_calls(
  response=response,
  user_id='your-user-id'
)
print(result)
```

```typescript
const tools = session.tools;
const response = await openai.responses.create({
  model: 'gpt-4.1',
  tools: tools,
  input: [{
    role: 'user',
    content: 'Add new datapoints to the "test-eval" dataset for today\'s experiment.'
  }],
});
const result = await composio.provider.handleToolCalls(
  'your-user-id',
  response.output
);
console.log(result);
```

### Path 2: MCP Server Setup

#### Path 2, Step 1: Install Composio

Install the Composio SDK and an agent SDK (the Claude Agent SDK for Python, the Vercel AI SDK for TypeScript)
```python
pip install composio claude-agent-sdk
```

```typescript
npm install @composio/core ai @ai-sdk/openai @ai-sdk/mcp
```

#### Path 2, Step 2: Create Tool Router Session

Initialize the Composio client and create a Tool Router session
```python
from composio import Composio
from claude_agent_sdk import ClaudeSDKClient, ClaudeAgentOptions

composio = Composio(api_key='your-composio-api-key')
session = composio.create(user_id='your-user-id')
url = session.mcp.url
```

```typescript
import { Composio } from '@composio/core';

const composio = new Composio({ apiKey: 'your-composio-api-key' });

console.log("Creating Tool Router session...");
const { mcp } = await composio.create('your-user-id');
console.log(`Tool Router session created: ${mcp.url}`);
```
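
The snippets in this step pass the Composio API key as a literal placeholder string. In practice you may prefer to keep the key out of source code. The sketch below shows one way to do that by reading it from an environment variable and failing early if it is missing. The `load_composio_api_key` helper and the `COMPOSIO_API_KEY` variable name are assumptions for this example, not part of the steps above.

```python
import os


def load_composio_api_key(env_var: str = "COMPOSIO_API_KEY") -> str:
    """Return the API key stored in env_var, failing loudly if it is unset.

    The default variable name is an assumption for this sketch; use whatever
    name your deployment defines.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} before creating the Composio client.")
    return key
```

With this helper in place, the client from this step could be created as `Composio(api_key=load_composio_api_key())`, and the same value could be reused for the `x-api-key` header when connecting to the MCP server.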

#### Path 2, Step 3: Connect to AI Agent

Use the MCP server with your AI agent
```python
import asyncio

options = ClaudeAgentOptions(
    permission_mode='bypassPermissions',
    mcp_servers={
        'tool_router': {
            'type': 'http',
            'url': url,
            'headers': {
                'x-api-key': 'your-composio-api-key'
            }
        }
    },
    system_prompt='You are a helpful assistant with access to Honeyhive tools.',
    max_turns=10
)

async def main():
    async with ClaudeSDKClient(options=options) as client:
        await client.query('Get datasets for project ABC123')
        async for message in client.receive_response():
            if hasattr(message, 'content'):
                for block in message.content:
                    if hasattr(block, 'text'):
                        print(block.text)

asyncio.run(main())
```

```typescript
import { openai } from '@ai-sdk/openai';
import { experimental_createMCPClient as createMCPClient } from '@ai-sdk/mcp';
import { generateText, stepCountIs } from 'ai';

const client = await createMCPClient({
  transport: {
    type: 'http',
    url: mcp.url,
    headers: { 'x-api-key': 'your-composio-api-key' }
  }
});

const tools = await client.tools();

const { text } = await generateText({
  model: openai('gpt-4o'),
  tools,
  messages: [{ role: 'user', content: 'Get datasets for project ABC123' }],
  stopWhen: stepCountIs(5)
});

console.log(`Agent: ${text}`);
```

## Why Use Composio?

### 1. AI Native Honeyhive Integration

- Supports both Honeyhive MCP and direct API based integrations
- Structured, LLM-friendly schemas for reliable tool execution
- Rich coverage for reading, writing, and querying your Honeyhive data

### 2. Managed Auth

- Managed credential handling: Honeyhive authenticates with an API key, which Composio stores securely for you
- Central place to manage, scope, and revoke Honeyhive access
- Per user and per environment credentials instead of hard-coded keys

### 3. Agent Optimized Design

- Tools are tuned using real error and success rates to improve reliability over time
- Comprehensive execution logs so you always know what ran, when, and on whose behalf

### 4. Enterprise Grade Security

- Fine-grained RBAC so you control which agents and users can access Honeyhive
- Scoped, least privilege access to Honeyhive resources
- Full audit trail of agent actions to support review and compliance

## Use Honeyhive with any AI Agent Framework

Choose a framework you want to connect Honeyhive with:

- [OpenAI Agents SDK](https://composio.dev/toolkits/honeyhive/framework/open-ai-agents-sdk)
- [Claude Agent SDK](https://composio.dev/toolkits/honeyhive/framework/claude-agents-sdk)
- [Claude Code](https://composio.dev/toolkits/honeyhive/framework/claude-code)
- [Claude Cowork](https://composio.dev/toolkits/honeyhive/framework/claude-cowork)
- [Codex](https://composio.dev/toolkits/honeyhive/framework/codex)
- [OpenClaw](https://composio.dev/toolkits/honeyhive/framework/openclaw)
- [Hermes](https://composio.dev/toolkits/honeyhive/framework/hermes-agent)
- [Google ADK](https://composio.dev/toolkits/honeyhive/framework/google-adk)
- [LangChain](https://composio.dev/toolkits/honeyhive/framework/langchain)
- [Vercel AI SDK](https://composio.dev/toolkits/honeyhive/framework/ai-sdk)
- [Mastra AI](https://composio.dev/toolkits/honeyhive/framework/mastra-ai)
- [LlamaIndex](https://composio.dev/toolkits/honeyhive/framework/llama-index)
- [CrewAI](https://composio.dev/toolkits/honeyhive/framework/crew-ai)
- [Pydantic AI](https://composio.dev/toolkits/honeyhive/framework/pydantic-ai)
- [AutoGen](https://composio.dev/toolkits/honeyhive/framework/autogen)

## Related Toolkits

- [Composio](https://composio.dev/toolkits/composio) - Composio is an integration platform that connects AI agents with hundreds of business tools. It streamlines authentication and lets you trigger actions across services—no custom code needed.
- [Composio search](https://composio.dev/toolkits/composio_search) - Composio search is a unified web search toolkit spanning travel, e-commerce, news, financial markets, images, and more. It lets you and your apps tap into up-to-date web data from a single, easy-to-integrate service.
- [Perplexityai](https://composio.dev/toolkits/perplexityai) - Perplexityai delivers natural, conversational AI models for generating human-like text. Instantly get context-aware, high-quality responses for chat, search, or complex workflows.
- [Browser tool](https://composio.dev/toolkits/browser_tool) - Browser tool is a virtual browser integration that lets AI agents interact with the web programmatically. It enables automated browsing, scraping, and action-taking from any AI workflow.
- [Ai ml api](https://composio.dev/toolkits/ai_ml_api) - Ai ml api is a suite of AI/ML models for natural language and image tasks. It provides fast, scalable access to advanced AI capabilities for your apps and workflows.
- [Aivoov](https://composio.dev/toolkits/aivoov) - Aivoov is an AI-powered text-to-speech platform offering 1,000+ voices in over 150 languages. Instantly turn written content into natural, human-like audio for any application.
- [All images ai](https://composio.dev/toolkits/all_images_ai) - All-Images.ai is an AI-powered image generation and management platform. It helps you create, search, and organize images effortlessly with advanced AI capabilities.
- [Anthropic administrator](https://composio.dev/toolkits/anthropic_administrator) - Anthropic administrator is an API for managing Anthropic organizational resources like members, workspaces, and API keys. It helps you automate admin tasks and streamline resource management across your Anthropic organization.
- [Api labz](https://composio.dev/toolkits/api_labz) - Api labz is a platform offering a suite of AI-driven APIs and workflow tools. It helps developers automate tasks and build smarter, more efficient applications.
- [Apipie ai](https://composio.dev/toolkits/apipie_ai) - Apipie ai is an AI model aggregator offering a single API for accessing top AI models from multiple providers. It helps developers build cost-efficient, latency-optimized AI solutions without juggling multiple integrations.
- [Astica ai](https://composio.dev/toolkits/astica_ai) - Astica ai provides APIs for computer vision, NLP, and voice synthesis. Integrate advanced AI features into your app with a single API key.
- [Bigml](https://composio.dev/toolkits/bigml) - BigML is a machine learning platform that lets you build, train, and deploy predictive models from your data. Its intuitive interface and robust API make machine learning accessible and efficient.
- [Botbaba](https://composio.dev/toolkits/botbaba) - Botbaba is a platform for building, managing, and deploying conversational AI chatbots across messaging channels. It streamlines chatbot automation, making it easier to integrate AI into customer interactions.
- [Botpress](https://composio.dev/toolkits/botpress) - Botpress is an open-source platform for building, deploying, and managing chatbots. It helps teams automate conversations and deliver rich, interactive messaging experiences.
- [Chatbotkit](https://composio.dev/toolkits/chatbotkit) - Chatbotkit is a platform for building and managing AI-powered chatbots using robust APIs and SDKs. It lets you easily add conversational AI to your apps for better user engagement.
- [Cody](https://composio.dev/toolkits/cody) - Cody is an AI assistant built for businesses, trained on your company's knowledge and data. It delivers instant answers and insights, tailored for your team.
- [Context7 MCP](https://composio.dev/toolkits/context7_mcp) - Context7 MCP delivers live, version-specific code docs and examples right from the source. It helps developers and AI agents instantly retrieve authoritative programming info—no more out-of-date docs.
- [Customgpt](https://composio.dev/toolkits/customgpt) - CustomGPT.ai lets you build and deploy chatbots tailored to your own data and business needs. Get precise and context-aware AI conversations without writing code.
- [Datarobot](https://composio.dev/toolkits/datarobot) - Datarobot is a machine learning platform that automates model development, deployment, and monitoring. It empowers organizations to quickly gain predictive insights from large datasets.
- [Deepgram](https://composio.dev/toolkits/deepgram) - Deepgram is an AI-powered speech recognition platform for accurate audio transcription and understanding. It enables fast, scalable speech-to-text with advanced audio intelligence features.

## Frequently Asked Questions

### Do I need my own developer credentials to use Honeyhive with Composio?

Yes, Honeyhive requires you to configure your own API key credentials. Once set up, Composio handles secure credential storage and API request handling for you.

### Can I use multiple toolkits together?

Yes! Composio's Tool Router enables agents to use multiple toolkits. [Learn more](https://docs.composio.dev/tool-router/overview).

### Is Composio secure?

Composio is SOC 2 and ISO 27001 compliant with all data encrypted in transit and at rest. [Learn more](https://trust.composio.dev).

### What if the API changes?

Composio maintains and updates all toolkit integrations automatically, so your agents always work with the latest API versions.

---
[See all toolkits](https://composio.dev/toolkits) · [Composio docs](https://docs.composio.dev/llms.txt)