How to integrate Firecrawl MCP with LlamaIndex

Trusted by
AWS
Glean
Zoom
Airtable

30 min · no commitment · see it on your stack

Firecrawl logo
LlamaIndex logo
divider

Introduction

This guide walks you through connecting Firecrawl to LlamaIndex using the Composio tool router. By the end, you'll have a working Firecrawl agent that can extract all product prices from this e-commerce site, crawl competitor blogs for latest article summaries, map all subpages linked from homepage url through natural language commands.

This guide will help you understand how to give your LlamaIndex agent real control over a Firecrawl account through Composio's Firecrawl MCP server.

Before we dive in, let's take a quick look at the key ideas and tools involved.

Also integrate Firecrawl with

TL;DR

Here's what you'll learn:
  • Set your OpenAI and Composio API keys
  • Install LlamaIndex and Composio packages
  • Create a Composio Tool Router session for Firecrawl
  • Connect LlamaIndex to the Firecrawl MCP server
  • Build a Firecrawl-powered agent using LlamaIndex
  • Interact with Firecrawl through natural language

What is LlamaIndex?

LlamaIndex is a data framework for building LLM applications. It provides tools for connecting LLMs to external data sources and services through agents and tools.

Key features include:

  • ReAct Agent: Reasoning and acting pattern for tool-using agents
  • MCP Tools: Native support for Model Context Protocol
  • Context Management: Maintain conversation context across interactions
  • Async Support: Built for async/await patterns

What is the Firecrawl MCP server, and what's possible with it?

The Firecrawl MCP server is an implementation of the Model Context Protocol that connects your AI agent and assistants like Claude, Cursor, etc directly to your Firecrawl account. It provides structured and secure access to automated web crawling, scraping, and data extraction, so your agent can perform actions like indexing sites, extracting structured content, mapping URLs, and searching the web on your behalf.

  • Automated web crawling and indexing: Let your agent launch and manage web crawl jobs to gather content or index entire websites efficiently.
  • Structured data extraction: Instruct your agent to extract targeted data from web pages using custom prompts or schemas, turning unstructured sites into actionable information.
  • URL mapping and discovery: Have the agent explore and map all URLs within a website, including options for subdomain inclusion, sitemap processing, or search-based discovery.
  • On-demand scraping and content retrieval: Enable your agent to scrape specific URLs, retrieve page content, and even extract structured JSON using LLM-powered methods.
  • Integrated web search and data collection: Task your agent with running web searches, scraping top result pages, and returning relevant details—all in one workflow.

Supported Tools & Triggers

Tools
Cancel an agent jobTool to cancel an in-progress agent job by its ID.
Batch scrape multiple URLsTool to scrape multiple URLs in batch with concurrent processing.
Cancel a batch scrape jobTool to cancel a running batch scrape job using its unique identifier.
Get batch scrape statusRetrieves the current status and results of a batch scrape job using the job ID.
Get errors from batch scrape jobTool to retrieve error details from a batch scrape job, including failed URLs and URLs blocked by robots.
Start a web crawlInitiates a Firecrawl web crawl from a given URL, applying various filtering and content extraction rules, and polls until the job is complete; ensure the URL is accessible and any regex patterns for paths are valid.
Cancel a crawl jobCancels an active or queued web crawl job using its ID; attempting to cancel completed, failed, or previously canceled jobs will not change their state.
Cancel a crawl jobTool to cancel a running crawl job by its ID.
Get crawl job statusTool to retrieve the status and results of a Firecrawl crawl job.
Get errors from a crawl jobTool to retrieve errors from a Firecrawl crawl job.
Get all active crawl jobsTool to retrieve all active crawl jobs for the authenticated team.
Preview crawl parametersPreview crawl parameters before starting a crawl by generating optimal configuration from natural language instructions.
Start a web crawl (v2) [NEW][NEW v2 API] Initiates a Firecrawl v2 web crawl with enhanced features over v1: natural language prompts for automatic crawler configuration, crawlEntireDomain for sibling/parent page discovery, better depth control with maxDiscoveryDepth, subdomain support, and full webhook configuration.
Get team credit usageTool to get current team credit usage information.
Get historical team credit usageTool to retrieve historical team credit usage on a monthly basis.
Extract structured dataExtracts structured data from web pages by initiating an extraction job and polling for completion; requires a natural language `prompt` or a JSON `schema` (one must be provided).
Get extract job statusTool to retrieve the status and results of a previously submitted extract job.
Get agent job statusTool to get the status and results of an agent job.
Get deep research statusRetrieves the status and results of a deep research job by its ID.
Get the status of a crawl jobRetrieves the current status, progress, and details of a web crawl job, using the job ID obtained when the crawl was initiated.
Generate LLMs.txt for a websiteInitiates an async job to generate an LLMs.
Get LLMs.txt generation job statusTool to get the status and results of an LLMs.
Map multiple URLsMaps a website by discovering URLs from a starting base URL, with options to customize the crawl via search query, subdomain inclusion, sitemap handling, and result limits; search effectiveness is site-dependent.
Get team queue statusTool to retrieve metrics about the team's scrape queue.
Scrape URLScrapes a publicly accessible URL, optionally performing pre-scrape browser actions or extracting structured JSON using an LLM, to retrieve content in specified formats.
SearchPerforms a web search for a query, scrapes content from the top search results using Firecrawl, and returns details in specified formats.
Start an agent jobTool to start an agent job for agentic web extraction with multi-page navigation and interaction capabilities.
Get team token usageTool to retrieve the current team's token usage and balance information for Firecrawl's Extract feature.
Get historical team token usageTool to retrieve historical team token usage on a monthly basis.

What is the Composio tool router, and how does it fit here?

What is Composio SDK?

Composio's Composio SDK helps agents find the right tools for a task at runtime. You can plug in multiple toolkits (like Gmail, HubSpot, and GitHub), and the agent will identify the relevant app and action to complete multi-step workflows. This can reduce token usage and improve the reliability of tool calls. Read more here: Getting started with Composio SDK

The tool router generates a secure MCP URL that your agents can access to perform actions.

How the Composio SDK works

The Composio SDK follows a three-phase workflow:

  1. Discovery: Searches for tools matching your task and returns relevant toolkits with their details.
  2. Authentication: Checks for active connections. If missing, creates an auth config and returns a connection URL via Auth Link.
  3. Execution: Executes the action using the authenticated connection.

Step-by-step Guide

Prerequisites

Before you begin, make sure you have:
  • Python 3.8/Node 16 or higher installed
  • A Composio account with the API key
  • An OpenAI API key
  • A Firecrawl account and project
  • Basic familiarity with async Python/Typescript

Getting API Keys for OpenAI, Composio, and Firecrawl

OpenAI API key (OPENAI_API_KEY)
  • Go to the OpenAI dashboard
  • Create an API key if you don't have one
  • Assign it to OPENAI_API_KEY in .env
Composio API key and user ID
  • Log into the Composio dashboard
  • Copy your API key from Settings
    • Use this as COMPOSIO_API_KEY
  • Pick a stable user identifier (email or ID)
    • Use this as COMPOSIO_USER_ID

Installing dependencies

pip install composio-llamaindex llama-index llama-index-llms-openai llama-index-tools-mcp python-dotenv

Create a new Python project and install the necessary dependencies:

  • composio-llamaindex: Composio's LlamaIndex integration
  • llama-index: Core LlamaIndex framework
  • llama-index-llms-openai: OpenAI LLM integration
  • llama-index-tools-mcp: MCP client for LlamaIndex
  • python-dotenv: Environment variable management

Set environment variables

bash
OPENAI_API_KEY=your-openai-api-key
COMPOSIO_API_KEY=your-composio-api-key
COMPOSIO_USER_ID=your-user-id

Create a .env file in your project root:

These credentials will be used to:

  • Authenticate with OpenAI's GPT-5 model
  • Connect to Composio's Tool Router
  • Identify your Composio user session for Firecrawl access

Import modules

import asyncio
import os
import dotenv

from composio import Composio
from composio_llamaindex import LlamaIndexProvider
from llama_index.core.agent.workflow import ReActAgent
from llama_index.core.workflow import Context
from llama_index.llms.openai import OpenAI
from llama_index.tools.mcp import BasicMCPClient, McpToolSpec

dotenv.load_dotenv()

Create a new file called firecrawl_llamaindex_agent.py and import the required modules:

Key imports:

  • asyncio: For async/await support
  • Composio: Main client for Composio services
  • LlamaIndexProvider: Adapts Composio tools for LlamaIndex
  • ReActAgent: LlamaIndex's reasoning and action agent
  • BasicMCPClient: Connects to MCP endpoints
  • McpToolSpec: Converts MCP tools to LlamaIndex format

Load environment variables and initialize Composio

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
COMPOSIO_API_KEY = os.getenv("COMPOSIO_API_KEY")
COMPOSIO_USER_ID = os.getenv("COMPOSIO_USER_ID")

if not OPENAI_API_KEY:
    raise ValueError("OPENAI_API_KEY is not set in the environment")
if not COMPOSIO_API_KEY:
    raise ValueError("COMPOSIO_API_KEY is not set in the environment")
if not COMPOSIO_USER_ID:
    raise ValueError("COMPOSIO_USER_ID is not set in the environment")

What's happening:

This ensures missing credentials cause early, clear errors before the agent attempts to initialise.

Create a Tool Router session and build the agent function

async def build_agent() -> ReActAgent:
    composio_client = Composio(
        api_key=COMPOSIO_API_KEY,
        provider=LlamaIndexProvider(),
    )

    session = composio_client.create(
        user_id=COMPOSIO_USER_ID,
        toolkits=["firecrawl"],
    )

    mcp_url = session.mcp.url
    print(f"Composio MCP URL: {mcp_url}")

    mcp_client = BasicMCPClient(mcp_url, headers={"x-api-key": COMPOSIO_API_KEY})
    mcp_tool_spec = McpToolSpec(client=mcp_client)
    tools = await mcp_tool_spec.to_tool_list_async()

    llm = OpenAI(model="gpt-5")

    description = "An agent that uses Composio Tool Router MCP tools to perform Firecrawl actions."
    system_prompt = """
    You are a helpful assistant connected to Composio Tool Router.
    Use the available tools to answer user queries and perform Firecrawl actions.
    """
    return ReActAgent(tools=tools, llm=llm, description=description, system_prompt=system_prompt, verbose=True)

What's happening here:

  • We create a Composio client using your API key and configure it with the LlamaIndex provider
  • We then create a tool router MCP session for your user, specifying the toolkits we want to use (in this case, firecrawl)
  • The session returns an MCP HTTP endpoint URL that acts as a gateway to all your configured tools
  • LlamaIndex will connect to this endpoint to dynamically discover and use the available Firecrawl tools.
  • The MCP tools are mapped to LlamaIndex-compatible tools and plug them into the Agent.

Create an interactive chat loop

async def chat_loop(agent: ReActAgent) -> None:
    ctx = Context(agent)
    print("Type 'quit', 'exit', or Ctrl+C to stop.")

    while True:
        try:
            user_input = input("\nYou: ").strip()
        except (KeyboardInterrupt, EOFError):
            print("\nBye!")
            break

        if not user_input or user_input.lower() in {"quit", "exit"}:
            print("Bye!")
            break

        try:
            print("Agent: ", end="", flush=True)
            handler = agent.run(user_input, ctx=ctx)

            async for event in handler.stream_events():
                # Stream token-by-token from LLM responses
                if hasattr(event, "delta") and event.delta:
                    print(event.delta, end="", flush=True)
                # Show tool calls as they happen
                elif hasattr(event, "tool_name"):
                    print(f"\n[Using tool: {event.tool_name}]", flush=True)

            # Get final response
            response = await handler
            print()  # Newline after streaming
        except KeyboardInterrupt:
            print("\n[Interrupted]")
            continue
        except Exception as e:
            print(f"\nError: {e}")

What's happening here:

  • We're creating a direct terminal interface to chat with your Firecrawl database
  • The LLM's responses are streamed to the CLI for faster interaction.
  • The agent uses context to maintain conversation history
  • You can type 'quit' or 'exit' to stop the chat loop gracefully
  • Agent responses and any errors are displayed in a clear, readable format

Define the main entry point

async def main() -> None:
    agent = await build_agent()
    await chat_loop(agent)

if __name__ == "__main__":
    # Handle Ctrl+C gracefully
    signal.signal(signal.SIGINT, lambda s, f: (print("\nBye!"), exit(0)))
    try:
        asyncio.run(main())
    except KeyboardInterrupt:
        print("\nBye!")

What's happening here:

  • We're orchestrating the entire application flow
  • The agent gets built with proper error handling
  • Then we kick off the interactive chat loop so you can start talking to Firecrawl

Run the agent

npx ts-node llamaindex-agent.ts

When prompted, authenticate and authorise your agent with Firecrawl, then start asking questions.

Complete Code

Here's the complete code to get you started with Firecrawl and LlamaIndex:

import asyncio
import os
import signal
import dotenv

from composio import Composio
from composio_llamaindex import LlamaIndexProvider
from llama_index.core.agent.workflow import ReActAgent
from llama_index.core.workflow import Context
from llama_index.llms.openai import OpenAI
from llama_index.tools.mcp import BasicMCPClient, McpToolSpec

dotenv.load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
COMPOSIO_API_KEY = os.getenv("COMPOSIO_API_KEY")
COMPOSIO_USER_ID = os.getenv("COMPOSIO_USER_ID")

if not OPENAI_API_KEY:
    raise ValueError("OPENAI_API_KEY is not set")
if not COMPOSIO_API_KEY:
    raise ValueError("COMPOSIO_API_KEY is not set")
if not COMPOSIO_USER_ID:
    raise ValueError("COMPOSIO_USER_ID is not set")

async def build_agent() -> ReActAgent:
    composio_client = Composio(
        api_key=COMPOSIO_API_KEY,
        provider=LlamaIndexProvider(),
    )

    session = composio_client.create(
        user_id=COMPOSIO_USER_ID,
        toolkits=["firecrawl"],
    )

    mcp_url = session.mcp.url
    print(f"Composio MCP URL: {mcp_url}")

    mcp_client = BasicMCPClient(mcp_url, headers={"x-api-key": COMPOSIO_API_KEY})
    mcp_tool_spec = McpToolSpec(client=mcp_client)
    tools = await mcp_tool_spec.to_tool_list_async()

    llm = OpenAI(model="gpt-5")
    description = "An agent that uses Composio Tool Router MCP tools to perform Firecrawl actions."
    system_prompt = """
    You are a helpful assistant connected to Composio Tool Router.
    Use the available tools to answer user queries and perform Firecrawl actions.
    """
    return ReActAgent(
        tools=tools,
        llm=llm,
        description=description,
        system_prompt=system_prompt,
        verbose=True,
    );

async def chat_loop(agent: ReActAgent) -> None:
    ctx = Context(agent)
    print("Type 'quit', 'exit', or Ctrl+C to stop.")

    while True:
        try:
            user_input = input("\nYou: ").strip()
        except (KeyboardInterrupt, EOFError):
            print("\nBye!")
            break

        if not user_input or user_input.lower() in {"quit", "exit"}:
            print("Bye!")
            break

        try:
            print("Agent: ", end="", flush=True)
            handler = agent.run(user_input, ctx=ctx)

            async for event in handler.stream_events():
                # Stream token-by-token from LLM responses
                if hasattr(event, "delta") and event.delta:
                    print(event.delta, end="", flush=True)
                # Show tool calls as they happen
                elif hasattr(event, "tool_name"):
                    print(f"\n[Using tool: {event.tool_name}]", flush=True)

            # Get final response
            response = await handler
            print()  # Newline after streaming
        except KeyboardInterrupt:
            print("\n[Interrupted]")
            continue
        except Exception as e:
            print(f"\nError: {e}")

async def main() -> None:
    agent = await build_agent()
    await chat_loop(agent)

if __name__ == "__main__":
    # Handle Ctrl+C gracefully
    signal.signal(signal.SIGINT, lambda s, f: (print("\nBye!"), exit(0)))
    try:
        asyncio.run(main())
    except KeyboardInterrupt:
        print("\nBye!")

Conclusion

You've successfully connected Firecrawl to LlamaIndex through Composio's Tool Router MCP layer. Key takeaways:
  • Tool Router dynamically exposes Firecrawl tools through an MCP endpoint
  • LlamaIndex's ReActAgent handles reasoning and orchestration; Composio handles integrations
  • The agent becomes more capable without increasing prompt size
  • Async Python provides clean, efficient execution of agent workflows
You can easily extend this to other toolkits like Gmail, Notion, Stripe, GitHub, and more by adding them to the toolkits parameter.

How to build Firecrawl MCP Agent with another framework

FAQ

What are the differences in Tool Router MCP and Firecrawl MCP?

With a standalone Firecrawl MCP server, the agents and LLMs can only access a fixed set of Firecrawl tools tied to that server. However, with the Composio Tool Router, agents can dynamically load tools from Firecrawl and many other apps based on the task at hand, all through a single MCP endpoint.

Can I use Tool Router MCP with LlamaIndex?

Yes, you can. LlamaIndex fully supports MCP integration. You get structured tool calling, message history handling, and model orchestration while Tool Router takes care of discovering and serving the right Firecrawl tools.

Can I manage the permissions and scopes for Firecrawl while using Tool Router?

Yes, absolutely. You can configure which Firecrawl scopes and actions are allowed when connecting your account to Composio. You can also bring your own OAuth credentials or API configuration so you keep full control over what the agent can do.

How safe is my data with Composio Tool Router?

All sensitive data such as tokens, keys, and configuration is fully encrypted at rest and in transit. Composio is SOC 2 Type 2 compliant and follows strict security practices so your Firecrawl data and credentials are handled as safely as possible.

Used by agents from

Context
Letta
glean
HubSpot
Agent.ai
Altera
DataStax
Entelligence
Rolai
Context
Letta
glean
HubSpot
Agent.ai
Altera
DataStax
Entelligence
Rolai
Context
Letta
glean
HubSpot
Agent.ai
Altera
DataStax
Entelligence
Rolai

Never worry about agent reliability

We handle tool reliability, observability, and security so you never have to second-guess an agent action.