Firecrawl with LlamaIndex

How to integrate Firecrawl MCP with LlamaIndex


Introduction

This guide walks you through connecting Firecrawl to LlamaIndex using the Composio tool router. By the end, you'll have a working Firecrawl agent that, through natural-language commands, can extract all product prices from an e-commerce site, crawl competitor blogs for summaries of their latest articles, and map all subpages linked from a homepage URL.

This guide will help you understand how to give your LlamaIndex agent real control over a Firecrawl account through Composio's Firecrawl MCP server.

Before we dive in, let's take a quick look at the key ideas and tools involved.


TL;DR

Here's what you'll learn:

Set your OpenAI and Composio API keys
Install LlamaIndex and Composio packages
Create a Composio Tool Router session for Firecrawl
Connect LlamaIndex to the Firecrawl MCP server
Build a Firecrawl-powered agent using LlamaIndex
Interact with Firecrawl through natural language

What is LlamaIndex?

LlamaIndex is a data framework for building LLM applications. It provides tools for connecting LLMs to external data sources and services through agents and tools.

Key features include:

ReAct Agent: Reasoning and acting pattern for tool-using agents
MCP Tools: Native support for Model Context Protocol
Context Management: Maintain conversation context across interactions
Async Support: Built for async/await patterns

What is the Firecrawl MCP server, and what's possible with it?

The Firecrawl MCP server is an implementation of the Model Context Protocol that connects AI agents and assistants such as Claude and Cursor directly to your Firecrawl account. It provides structured, secure access to automated web crawling, scraping, and data extraction, so your agent can index sites, extract structured content, map URLs, and search the web on your behalf.

Automated web crawling and indexing: Let your agent launch and manage web crawl jobs to gather content or index entire websites efficiently.
Structured data extraction: Instruct your agent to extract targeted data from web pages using custom prompts or schemas, turning unstructured sites into actionable information.
URL mapping and discovery: Have the agent explore and map all URLs within a website, including options for subdomain inclusion, sitemap processing, or search-based discovery.
On-demand scraping and content retrieval: Enable your agent to scrape specific URLs, retrieve page content, and even extract structured JSON using LLM-powered methods.
Integrated web search and data collection: Task your agent with running web searches, scraping the top result pages, and returning the relevant details, all in one workflow.
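To make "custom prompts or schemas" concrete, here is a minimal sketch of what a structured-extraction request for the product-price example might contain. The argument names (`urls`, `prompt`, `schema`) are assumed to mirror Firecrawl's extract API; the exact parameter names the Composio tool expects may differ, so check the tool's input schema. `build_extract_args` and `PRODUCT_PRICE_SCHEMA` are hypothetical helpers, and the "prompt or schema" rule follows the Extract structured data tool description.

```python
import json
from typing import Any, Optional

# Hypothetical JSON Schema describing products and prices on a catalog page.
PRODUCT_PRICE_SCHEMA: dict = {
    "type": "object",
    "properties": {
        "products": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "price": {"type": "number"},
                    "currency": {"type": "string"},
                },
                "required": ["name", "price"],
            },
        }
    },
    "required": ["products"],
}


def build_extract_args(
    urls: list,
    prompt: Optional[str] = None,
    schema: Optional[dict] = None,
) -> dict:
    """Assemble extraction arguments; at least one of prompt/schema is required."""
    if not urls:
        raise ValueError("at least one URL is required")
    if prompt is None and schema is None:
        raise ValueError("provide a natural-language prompt, a JSON schema, or both")
    args: dict = {"urls": list(urls)}
    if prompt is not None:
        args["prompt"] = prompt
    if schema is not None:
        args["schema"] = schema
    return args


if __name__ == "__main__":
    example = build_extract_args(
        ["https://shop.example.com/catalog"],
        prompt="Extract every product name and price on the page.",
        schema=PRODUCT_PRICE_SCHEMA,
    )
    print(json.dumps(example, indent=2))
```

In practice you would phrase the request in natural language and let the agent fill these arguments; the sketch only shows the shape of the data involved.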

Supported Tools & Triggers

Tools

Cancel an agent job: Cancels an in-progress agent job by its ID.
Batch scrape multiple URLs: Scrapes multiple URLs in a batch with concurrent processing.
Cancel a batch scrape job: Cancels a running batch scrape job using its unique identifier.
Get batch scrape status: Retrieves the current status and results of a batch scrape job using the job ID.
Get errors from batch scrape job: Retrieves error details from a batch scrape job, including failed URLs and URLs blocked by robots.txt.
Start a web crawl: Initiates a Firecrawl web crawl from a given URL, applying filtering and content-extraction rules, and polls until the job is complete; make sure the URL is accessible and any path regex patterns are valid.
Cancel a crawl job: Cancels an active or queued web crawl job by its ID; canceling a job that has already completed, failed, or been canceled does not change its state.
Cancel a crawl job: Cancels a running crawl job by its ID.
Get crawl job status: Retrieves the status and results of a Firecrawl crawl job.
Get errors from a crawl job: Retrieves errors from a Firecrawl crawl job.
Get all active crawl jobs: Retrieves all active crawl jobs for the authenticated team.
Preview crawl parameters: Generates an optimal crawl configuration from natural-language instructions so you can preview it before starting a crawl.
Start a web crawl (v2) [NEW]: Initiates a Firecrawl v2 web crawl. Compared with v1, it adds natural-language prompts for automatic crawler configuration, crawlEntireDomain for discovering sibling and parent pages, finer depth control with maxDiscoveryDepth, subdomain support, and full webhook configuration.
Get team credit usage: Retrieves the team's current credit usage.
Get historical team credit usage: Retrieves the team's historical credit usage on a monthly basis.
Extract structured data: Extracts structured data from web pages by starting an extraction job and polling for completion; requires a natural-language `prompt` or a JSON `schema` (at least one must be provided).
Get extract job status: Retrieves the status and results of a previously submitted extract job.
Get agent job status: Retrieves the status and results of an agent job.
Get deep research status: Retrieves the status and results of a deep research job by its ID.
Get the status of a crawl job: Retrieves the current status, progress, and details of a web crawl job, using the job ID returned when the crawl was started.
Generate LLMs.txt for a website: Starts an asynchronous job that generates an LLMs.txt file for a website.
Get LLMs.txt generation job status: Retrieves the status and results of an LLMs.txt generation job.
Map multiple URLs: Maps a website by discovering URLs from a base URL, with options to refine discovery via a search query, subdomain inclusion, sitemap handling, and result limits; how well the search option works depends on the site.
Get team queue status: Retrieves metrics about the team's scrape queue.
Scrape URL: Scrapes a publicly accessible URL and returns its content in the requested formats, optionally running browser actions before scraping or extracting structured JSON with an LLM.
Search: Runs a web search for a query, scrapes the top results with Firecrawl, and returns their details in the requested formats.
Start an agent job: Starts an agent job for agentic web extraction with multi-page navigation and interaction.
Get team token usage: Retrieves the team's current token usage and balance for Firecrawl's Extract feature.
Get historical team token usage: Retrieves the team's historical token usage on a monthly basis.
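Several tools above, such as Start a web crawl and Extract structured data, start a job and then poll until it finishes, while the various status tools let the agent check on a job by ID. The sketch below illustrates that generic start-then-poll pattern. It assumes a `get_status(job_id)` callable that returns a dict with a `status` key; the terminal status strings (`completed`, `failed`, `cancelled`) are assumptions for illustration, not a documented Firecrawl contract, and `poll_until_done` is a hypothetical helper.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal states; adjust to whatever the real status endpoint returns.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def poll_until_done(
    get_status: Callable[[str], Dict[str, Any]],
    job_id: str,
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call get_status(job_id) until it reports a terminal state or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(job_id)
        if status.get("status") in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"job {job_id} still {status.get('status')!r} after {timeout}s"
            )
        # Wait before asking again; sleep is injectable so tests need not block.
        sleep(interval)
```

Passing `sleep` as a parameter keeps the helper easy to test with a fake status function; in real use the default `time.sleep` applies.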

What is the Composio tool router, and how does it fit here?

What is Composio SDK?

The Composio SDK helps agents find the right tools for a task at runtime. You can plug in multiple toolkits (such as Gmail, HubSpot, and GitHub), and the agent identifies the relevant app and action to complete multi-step workflows. This can reduce token usage and improve the reliability of tool calls. Read more here: Getting started with Composio SDK

The tool router generates a secure MCP URL that your agents can access to perform actions.

How the Composio SDK works

The Composio SDK follows a three-phase workflow:

Discovery: Searches for tools matching your task and returns relevant toolkits with their details.
Authentication: Checks for active connections. If missing, creates an auth config and returns a connection URL via Auth Link.
Execution: Executes the action using the authenticated connection.
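The three phases can be pictured with a toy, in-memory model. This is a hypothetical illustration of discover, authenticate, execute, not Composio's implementation: the real SDK uses its own tool search, auth configs, and hosted connections. `Tool`, `ToyRouter`, the keyword matching, and the connect-link URL are all made up for the sketch.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Set, Tuple


@dataclass
class Tool:
    name: str
    toolkit: str
    description: str
    run: Callable[..., Any]


@dataclass
class ToyRouter:
    """Hypothetical in-memory model of discover -> authenticate -> execute."""
    tools: List[Tool]
    connections: Set[Tuple[str, str]] = field(default_factory=set)  # (user_id, toolkit)

    def discover(self, task: str) -> List[Tool]:
        # Phase 1: naive keyword overlap stands in for real tool search
        words = set(task.lower().split())
        return [t for t in self.tools if words & set(t.description.lower().split())]

    def authenticate(self, user_id: str, toolkit: str) -> Optional[str]:
        # Phase 2: return a (fake) connect link if there is no active connection
        if (user_id, toolkit) in self.connections:
            return None
        return f"https://connect.example.invalid/{toolkit}?user={user_id}"

    def execute(self, user_id: str, tool: Tool, **kwargs: Any) -> Any:
        # Phase 3: run the tool only once its toolkit is connected
        link = self.authenticate(user_id, tool.toolkit)
        if link is not None:
            raise PermissionError(f"connect {tool.toolkit} first: {link}")
        return tool.run(**kwargs)
```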

Step-by-step Guide

Prerequisites

Before you begin, make sure you have:

Python 3.9 or higher installed (check each package's current requirements)
A Composio account with the API key
An OpenAI API key
A Firecrawl account and project
Basic familiarity with async Python

Getting API Keys for OpenAI and Composio

OpenAI API key (OPENAI_API_KEY)

Go to the OpenAI dashboard
Create an API key if you don't have one
Assign it to OPENAI_API_KEY in .env

Composio API key and user ID

Log into the Composio dashboard
Copy your API key from Settings
- Use this as COMPOSIO_API_KEY
Pick a stable user identifier (email or ID)
- Use this as COMPOSIO_USER_ID

Installing dependencies

Create a new Python project and install the necessary dependencies:

pip install composio-llamaindex llama-index llama-index-llms-openai llama-index-tools-mcp python-dotenv

composio-llamaindex: Composio's LlamaIndex integration
llama-index: Core LlamaIndex framework
llama-index-llms-openai: OpenAI LLM integration
llama-index-tools-mcp: MCP client for LlamaIndex
python-dotenv: Environment variable management
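Before moving on, you can optionally confirm the packages resolved. This sketch uses the standard library's `importlib.metadata` to report the installed version of each distribution from the pip command above; `installed_versions` is a hypothetical helper, not part of any of these packages.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Optional

# Distribution names from the pip command above
REQUIRED = [
    "composio-llamaindex",
    "llama-index",
    "llama-index-llms-openai",
    "llama-index-tools-mcp",
    "python-dotenv",
]


def installed_versions(names: List[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    result: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions(REQUIRED).items():
        print(f"{name:28} {ver or 'NOT INSTALLED'}")
```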

Set environment variables

Create a .env file in your project root:

OPENAI_API_KEY=your-openai-api-key
COMPOSIO_API_KEY=your-composio-api-key
COMPOSIO_USER_ID=your-user-id

These credentials will be used to:

Authenticate with OpenAI's GPT-5 model
Connect to Composio's Tool Router
Identify your Composio user session for Firecrawl access
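`dotenv.load_dotenv()`, called when the script starts, reads this file into the process environment. As a rough illustration of the file format, here is a simplified stdlib-only loader. It is a sketch, not python-dotenv's implementation, which handles more syntax and edge cases; `parse_env_lines` and `load_env` are hypothetical helpers. Like `load_dotenv()` with its default settings, it does not overwrite variables that are already set.

```python
import os
from typing import Dict, List


def parse_env_lines(lines: List[str]) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks, # comments, and lines without '='."""
    values: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one layer of matching surrounding quotes, e.g. KEY="abc"
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env(path: str = ".env", override: bool = False) -> Dict[str, str]:
    """Read a .env file (if present) into os.environ; existing vars win unless override=True."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as fh:
        values = parse_env_lines(fh.read().splitlines())
    for key, value in values.items():
        if override or key not in os.environ:
            os.environ[key] = value
    return values
```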

Import modules

Create a new file called firecrawl_llamaindex_agent.py and import the required modules:

import asyncio
import os
import signal
import dotenv

from composio import Composio
from composio_llamaindex import LlamaIndexProvider
from llama_index.core.agent.workflow import ReActAgent
from llama_index.core.workflow import Context
from llama_index.llms.openai import OpenAI
from llama_index.tools.mcp import BasicMCPClient, McpToolSpec

dotenv.load_dotenv()

Key imports:

asyncio: For async/await support
Composio: Main client for Composio services
LlamaIndexProvider: Adapts Composio tools for LlamaIndex
ReActAgent: LlamaIndex's reasoning and action agent
BasicMCPClient: Connects to MCP endpoints
McpToolSpec: Converts MCP tools to LlamaIndex format

Load and validate environment variables

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
COMPOSIO_API_KEY = os.getenv("COMPOSIO_API_KEY")
COMPOSIO_USER_ID = os.getenv("COMPOSIO_USER_ID")

if not OPENAI_API_KEY:
    raise ValueError("OPENAI_API_KEY is not set in the environment")
if not COMPOSIO_API_KEY:
    raise ValueError("COMPOSIO_API_KEY is not set in the environment")
if not COMPOSIO_USER_ID:
    raise ValueError("COMPOSIO_USER_ID is not set in the environment")

What's happening:

These checks fail fast with a clear error if a credential is missing, before the agent tries to initialize.
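If you prefer a single error that lists every missing variable at once, the three checks can be folded into one helper. This is a sketch with a hypothetical `require_env` function; as with the `if not ...` checks, an empty string counts as missing.

```python
import os
from typing import Dict, List, Mapping, Optional


def require_env(names: List[str], env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the requested variables, raising one ValueError that names every missing one."""
    source = os.environ if env is None else env
    missing = [name for name in names if not source.get(name)]
    if missing:
        raise ValueError(f"Missing environment variables: {', '.join(missing)}")
    return {name: source[name] for name in names}
```

Usage would look like `config = require_env(["OPENAI_API_KEY", "COMPOSIO_API_KEY", "COMPOSIO_USER_ID"])`, followed by reading `config["COMPOSIO_API_KEY"]` and so on.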

Create a Tool Router session and build the agent function

async def build_agent() -> ReActAgent:
    composio_client = Composio(
        api_key=COMPOSIO_API_KEY,
        provider=LlamaIndexProvider(),
    )

    session = composio_client.create(
        user_id=COMPOSIO_USER_ID,
        toolkits=["firecrawl"],
    )

    mcp_url = session.mcp.url
    print(f"Composio MCP URL: {mcp_url}")

    mcp_client = BasicMCPClient(mcp_url, headers={"x-api-key": COMPOSIO_API_KEY})
    mcp_tool_spec = McpToolSpec(client=mcp_client)
    tools = await mcp_tool_spec.to_tool_list_async()

    llm = OpenAI(model="gpt-5")

    description = "An agent that uses Composio Tool Router MCP tools to perform Firecrawl actions."
    system_prompt = """
    You are a helpful assistant connected to Composio Tool Router.
    Use the available tools to answer user queries and perform Firecrawl actions.
    """
    return ReActAgent(tools=tools, llm=llm, description=description, system_prompt=system_prompt, verbose=True)

What's happening here:

We create a Composio client using your API key and configure it with the LlamaIndex provider
We then create a tool router MCP session for your user, specifying the toolkits we want to use (in this case, firecrawl)
The session returns an MCP HTTP endpoint URL that acts as a gateway to all your configured tools
LlamaIndex will connect to this endpoint to dynamically discover and use the available Firecrawl tools.
The MCP tools are converted into LlamaIndex-compatible tools and passed to the ReActAgent
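Under the hood, MCP is built on JSON-RPC 2.0: a client lists a server's tools with a `tools/list` request and invokes one with `tools/call`. The sketch below builds those two request bodies to show roughly what `BasicMCPClient` exchanges with the Tool Router endpoint. It deliberately omits the `initialize` handshake, HTTP transport, and session headers, which the client library handles; the tool name `FIRECRAWL_SCRAPE` and its arguments are hypothetical.

```python
import itertools
import json
from typing import Any, Dict, Optional

_ids = itertools.count(1)


def jsonrpc_request(method: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request object with an auto-incrementing id."""
    request: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        request["params"] = params
    return request


# 1) Ask the server which tools it exposes
list_tools = jsonrpc_request("tools/list")

# 2) Invoke one tool by name with JSON arguments (name and arguments are hypothetical)
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "FIRECRAWL_SCRAPE", "arguments": {"url": "https://example.com"}},
)

if __name__ == "__main__":
    print(json.dumps(list_tools))
    print(json.dumps(call_tool))
```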

Create an interactive chat loop

async def chat_loop(agent: ReActAgent) -> None:
    ctx = Context(agent)
    print("Type 'quit', 'exit', or Ctrl+C to stop.")

    while True:
        try:
            user_input = input("\nYou: ").strip()
        except (KeyboardInterrupt, EOFError):
            print("\nBye!")
            break

        if not user_input or user_input.lower() in {"quit", "exit"}:
            print("Bye!")
            break

        try:
            print("Agent: ", end="", flush=True)
            handler = agent.run(user_input, ctx=ctx)

            async for event in handler.stream_events():
                # Stream token-by-token from LLM responses
                if hasattr(event, "delta") and event.delta:
                    print(event.delta, end="", flush=True)
                # Show tool calls as they happen
                elif hasattr(event, "tool_name"):
                    print(f"\n[Using tool: {event.tool_name}]", flush=True)

            # Get final response
            response = await handler
            print()  # Newline after streaming
        except KeyboardInterrupt:
            print("\n[Interrupted]")
            continue
        except Exception as e:
            print(f"\nError: {e}")

What's happening here:

This gives you a terminal interface for chatting with your Firecrawl-connected agent
The LLM's responses are streamed to the terminal token by token, so output appears as it is generated
The agent uses context to maintain conversation history
You can type 'quit' or 'exit' to stop the chat loop gracefully
Agent responses and any errors are displayed in a clear, readable format
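The dispatch inside the `async for` loop is duck-typed: any event with a non-empty `delta` is printed inline, and any event with a `tool_name` is announced. The sketch below pulls that logic into a pure function so it can be tested without a live agent. `StreamDelta` and `ToolCallEvent` are hypothetical stand-ins that model only the attributes the loop inspects; they are not LlamaIndex classes.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class StreamDelta:
    delta: str


@dataclass
class ToolCallEvent:
    tool_name: str


def render_events(events: Iterable[object]) -> str:
    """Mirror the chat loop's dispatch: append text deltas, announce tool calls."""
    out: List[str] = []
    for event in events:
        delta = getattr(event, "delta", None)
        if delta:
            out.append(delta)
        elif hasattr(event, "tool_name"):
            out.append(f"\n[Using tool: {event.tool_name}]")
        # Any other event type is ignored, as in the chat loop.
    return "".join(out)
```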

Define the main entry point

async def main() -> None:
    agent = await build_agent()
    await chat_loop(agent)

if __name__ == "__main__":
    # Handle Ctrl+C gracefully
    signal.signal(signal.SIGINT, lambda s, f: (print("\nBye!"), exit(0)))
    try:
        asyncio.run(main())
    except KeyboardInterrupt:
        print("\nBye!")

What's happening here:

main() builds the agent, creating the Tool Router session and loading the MCP tools
It then starts the interactive chat loop so you can start talking to Firecrawl
The SIGINT handler and the KeyboardInterrupt fallback let Ctrl+C end the session with a short "Bye!" message

Run the agent

python firecrawl_llamaindex_agent.py

When prompted, authenticate and authorize your agent's access to your Firecrawl account, then start asking questions.

Complete Code

Here's the complete code to get you started with Firecrawl and LlamaIndex:

import asyncio
import os
import signal
import dotenv

from composio import Composio
from composio_llamaindex import LlamaIndexProvider
from llama_index.core.agent.workflow import ReActAgent
from llama_index.core.workflow import Context
from llama_index.llms.openai import OpenAI
from llama_index.tools.mcp import BasicMCPClient, McpToolSpec

dotenv.load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
COMPOSIO_API_KEY = os.getenv("COMPOSIO_API_KEY")
COMPOSIO_USER_ID = os.getenv("COMPOSIO_USER_ID")

if not OPENAI_API_KEY:
    raise ValueError("OPENAI_API_KEY is not set")
if not COMPOSIO_API_KEY:
    raise ValueError("COMPOSIO_API_KEY is not set")
if not COMPOSIO_USER_ID:
    raise ValueError("COMPOSIO_USER_ID is not set")

async def build_agent() -> ReActAgent:
    composio_client = Composio(
        api_key=COMPOSIO_API_KEY,
        provider=LlamaIndexProvider(),
    )

    session = composio_client.create(
        user_id=COMPOSIO_USER_ID,
        toolkits=["firecrawl"],
    )

    mcp_url = session.mcp.url
    print(f"Composio MCP URL: {mcp_url}")

    mcp_client = BasicMCPClient(mcp_url, headers={"x-api-key": COMPOSIO_API_KEY})
    mcp_tool_spec = McpToolSpec(client=mcp_client)
    tools = await mcp_tool_spec.to_tool_list_async()

    llm = OpenAI(model="gpt-5")
    description = "An agent that uses Composio Tool Router MCP tools to perform Firecrawl actions."
    system_prompt = """
    You are a helpful assistant connected to Composio Tool Router.
    Use the available tools to answer user queries and perform Firecrawl actions.
    """
    return ReActAgent(
        tools=tools,
        llm=llm,
        description=description,
        system_prompt=system_prompt,
        verbose=True,
    )

async def chat_loop(agent: ReActAgent) -> None:
    ctx = Context(agent)
    print("Type 'quit', 'exit', or Ctrl+C to stop.")

    while True:
        try:
            user_input = input("\nYou: ").strip()
        except (KeyboardInterrupt, EOFError):
            print("\nBye!")
            break

        if not user_input or user_input.lower() in {"quit", "exit"}:
            print("Bye!")
            break

        try:
            print("Agent: ", end="", flush=True)
            handler = agent.run(user_input, ctx=ctx)

            async for event in handler.stream_events():
                # Stream token-by-token from LLM responses
                if hasattr(event, "delta") and event.delta:
                    print(event.delta, end="", flush=True)
                # Show tool calls as they happen
                elif hasattr(event, "tool_name"):
                    print(f"\n[Using tool: {event.tool_name}]", flush=True)

            # Get final response
            response = await handler
            print()  # Newline after streaming
        except KeyboardInterrupt:
            print("\n[Interrupted]")
            continue
        except Exception as e:
            print(f"\nError: {e}")

async def main() -> None:
    agent = await build_agent()
    await chat_loop(agent)

if __name__ == "__main__":
    # Handle Ctrl+C gracefully
    signal.signal(signal.SIGINT, lambda s, f: (print("\nBye!"), exit(0)))
    try:
        asyncio.run(main())
    except KeyboardInterrupt:
        print("\nBye!")

Conclusion

You've successfully connected Firecrawl to LlamaIndex through Composio's Tool Router MCP layer. Key takeaways:

Tool Router dynamically exposes Firecrawl tools through an MCP endpoint
LlamaIndex's ReActAgent handles reasoning and orchestration; Composio handles integrations
The agent becomes more capable without increasing prompt size
Async Python provides clean, efficient execution of agent workflows

You can easily extend this to other toolkits like Gmail, Notion, Stripe, GitHub, and more by adding them to the toolkits parameter.
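One way to make that extension configurable is to read the toolkit list from the environment instead of hard-coding it. In this sketch, the `COMPOSIO_TOOLKITS` variable and the `toolkits_from_env` helper are hypothetical, and toolkit slugs are assumed to be lowercase names like `firecrawl`.

```python
import os
from typing import List


def toolkits_from_env(var: str = "COMPOSIO_TOOLKITS", default: str = "firecrawl") -> List[str]:
    """Parse a comma-separated toolkit list, lower-casing and de-duplicating in order."""
    raw = os.getenv(var, default)
    slugs: List[str] = []
    for item in raw.split(","):
        slug = item.strip().lower()
        if slug and slug not in slugs:
            slugs.append(slug)
    return slugs
```

The session call would then become `composio_client.create(user_id=COMPOSIO_USER_ID, toolkits=toolkits_from_env())`, with `COMPOSIO_TOOLKITS=firecrawl,gmail` added to your .env file.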

How to build a Firecrawl MCP agent with another framework

Guides for using Firecrawl MCP are also available for ChatGPT, Antigravity, OpenAI Agents SDK, Claude Agent SDK, Claude Code, Claude Cowork, Codex, Cursor, OpenClaw, Hermes, CLI, Google ADK, LangChain, Vercel AI SDK, Mastra AI, CrewAI, OpenCode, and VS Code.

Explore Other Toolkits

Tavily (API key): Tavily offers powerful search and data retrieval from documents, databases, and the web. It helps teams locate and filter information instantly, saving hours on research.
Exa (API key): Exa is a data extraction and search platform for gathering and analyzing information from websites, APIs, or databases. It helps teams quickly surface insights and automate data-driven workflows.
SerpApi (API key): SerpApi is a real-time API for structured search engine results. It lets you automate SERP data collection, parsing, and analysis for SEO and research.

FAQ

What is the difference between Tool Router MCP and the Firecrawl MCP server?

With a standalone Firecrawl MCP server, the agents and LLMs can only access a fixed set of Firecrawl tools tied to that server. However, with the Composio Tool Router, agents can dynamically load tools from Firecrawl and many other apps based on the task at hand, all through a single MCP endpoint.

Can I use Tool Router MCP with LlamaIndex?

Yes. LlamaIndex supports MCP through the llama-index-tools-mcp package used in this guide. You get structured tool calling, message history handling, and model orchestration, while Tool Router takes care of discovering and serving the right Firecrawl tools.

Can I manage the permissions and scopes for Firecrawl while using Tool Router?

Yes, absolutely. You can configure which Firecrawl scopes and actions are allowed when connecting your account to Composio. You can also bring your own OAuth credentials or API configuration so you keep full control over what the agent can do.

How safe is my data with Composio Tool Router?

All sensitive data such as tokens, keys, and configuration is fully encrypted at rest and in transit. Composio is SOC 2 Type 2 compliant and follows strict security practices so your Firecrawl data and credentials are handled as safely as possible.


