Swarm: The Agentic Framework from OpenAI

•

Oct 16, 2024

AI Agents

OpenAI recently made an unexpected move by unveiling Swarm, an experimental and lightweight framework designed to simplify the creation of multi-agent workflows.

I’ve been playing with various frameworks for a while, so I checked this one out. Surprisingly, it was a minimal, bare-bones framework—refreshingly different from the more complex alternatives.

I went through the codebase, which you might feel is very small for an agentic framework, and also executed a few examples. So, here are my thoughts on the Swarm and where it stands among the existing frameworks.

TL;DR

If you are busy and have anywhere else to go, here’s the summary.

1. Swarm is a minimal, clean, lightweight framework for building agentic workflows.
2. The Agent in Swarm is simply an LLM with instructions and function calls. So, Can we finally ditch all the complicated explanations?
3. Unlike Assistants API, this is entirely client-side and does not explicitly store state or provide any retrieval.
4. OpenAI Swarm shares its characteristics with CrewAi, also inspired by the Swarm's repository.
5. It is an excellent resource for learning the basics of AI agent orchestration. However, it is not ideal for production apps.

If you are still here, check out how to integrate third-party services into your AI workflows.

Let’s Understand OpenAI Swarm

As I mentioned, it is a framework that allows you to build multi-agent systems to automate complex workflows. As per OpenAI, the framework is experimental and limited to educational purposes only. It is also for developers who are curious to learn about multi-agent orchestration. However, it tells a lot about an organization like OpenAI's philosophy about AI agents.

The core of Swarm is just a single file, which I liked. It’s simple and doesn't entertain unnecessary abstraction to make things look more complex than they are.

Get started with Swarm by initializing the Swarm client.

from swarm import Swarm
client = Swarm()

Client.run()

Under the hood, the client. run () wraps the chat. completions. create (), so most of us are familiar with the operations.

At its core, it implements the following functions.

Get a completion from the current Agent
Execute tool calls and append results
Switch Agent if necessary
Update context variables, if necessary
If no new function calls, return

It accepts the following parameters.

agent (Type: Agent): The (initial) agent to be called. (Required)
messages (Type: List): A list of message objects identical to Chat Completions messages. (Required)
context_variables (Type: dict): A dictionary of additional context variables, available to functions and Agent instructions. (Default: {})
max_turns (Type: int): The maximum number of conversational turns allowed. (Default: float("inf"))
model_override (Type: str): An optional string to override the model being used by an Agent. (Default: None)
execute_tools (Type: bool): If False, interrupt execution and immediately return tool_calls message when an Agent tries to call a function. (Default: True)
stream (Type: bool): If True, enables streaming responses. (Default: False)
debug (Type: bool): If True, enables debug logging. (Default: False)

Once the run is finished, it returns a Response object with the following fields.

messages (Type: List): A list of message objects generated during the conversation. Like Chat Completions messages, it includes a sender field indicating which Agent the message originated from.
agent (Type: Agent): The last agent to handle a message.
context_variables (Type: dict): The input variables, plus any changes made during the conversation.

Swarm Agent

The Swarm Agent is a simple Pydantic class with the following parameters.

class Agent(BaseModel):
    name: str = "Agent"
    model: str = "gpt-4o"
    instructions: Union[str, Callable[[], str]] = "You are a helpful agent."
    functions: List[AgentFunction] = []
    tool_choice: str = None
    parallel_tool_calls: bool = True

instructions: These are simple strings or executables that return a string value. These are used as the system prompt.
functions: List of functions to be called by LLM to execute tasks.
tool_choice: Simillar to ChatCompletions API, tool_choice, if set to required, forces the LLM to choose one of the provided functions. Setting it to auto will let the LLM decide if any tool needs to be called and none for no function calls.
parallel-tool_calls: This enables the agent to call multiple functions simultaneously.

So, let’s explore a simple example of Swarm.

from swarm import Agent
def process_refund(item_id, reason="NOT SPECIFIED"):
    """Refund an item. Refund an item. Make sure you have the item_id of the form item_... Ask for user confirmation before processing the refund."""
    print(f"[mock] Refunding item {item_id} because {reason}...")
    return "Success!"
def apply_discount():
    """Apply a discount to the user's cart."""
    print("[mock] Applying discount...")
    return "Applied discount of 11%"
triage_agent = Agent(
    name="Triage Agent",
    instructions="Determine which agent is best suited to handle the user's request, and transfer the conversation to that agent.",
)
sales_agent = Agent(
    name="Sales Agent",
    instructions="Be super enthusiastic about selling bees.",
)
refunds_agent = Agent(
    name="Refunds Agent",
    instructions="Help the user with a refund. If the reason is that it was too expensive, offer the user a refund code. If they insist, then process the refund.",
    functions=[process_refund, apply_discount],
)
def transfer_back_to_triage():
    """Call this function if a user is asking about a topic that is not handled by the current agent."""
    return triage_agent
def transfer_to_sales():
    return sales_agent
def transfer_to_refunds():
    return refunds_agent
triage_agent.functions = [transfer_to_sales, transfer_to_refunds]
sales_agent.functions.append(transfer_back_to_triage)
refunds_agent.functions.append(transfer_back_to_triage)

As you can observe above, we have three agents triage agents for selecting other agents based on prompts: a sales agent and a refund agent. We also have functions to hand off control to other agents.

Here is the overall workflow

The user starts with the Triage Agent to route the request.
The Triage Agent decides if the request is for sales or refunds.
Sales Agent handles sales queries and transfers non-sales back to triage.
Refunds Agent processes refunds and offers discounts for expensive items.
Refunds Agent transfers non-refund queries back to triage.
Agents can escalate or redirect conversations based on expertise.

These are all the fundamentals needed for a Swarm. You can use the same principle, add as many agents and tools as you like, and build complex workflows.

OpenAI Assistants Vs Swarm

Both the Assistant and Swarm APIs serve different purposes. The Assistants API makes sense if you are building for real customers in the real world. It has built-in memory management with retrieval and is fully hosted.

On the other hand, Swarm is wholly client-side and is a wrapper around Chat completion API. It does not store states or provide any retrieval to keep the agency intact for long.

Where does Swarm stand?

If you've used popular agentic frameworks, you'll notice that Swarm takes a similar approach to CrewAI. Digging a bit further, I found out that the OpenAI Swarm and CrewAI were both inspired by the original repository named Swarms.

You can check out this original thread on Twitter.

The Swarm provides a gateway for builders to understand multi-agent orchestration workflows. In its current state, it is usable only for demo applications. OpenAI has also clarified that it will not be actively supporting Swarms and will keep them for only educational purposes. You must look for other frameworks or use the Assistants API with Pydantic for anything serious.

Check out this article if you want to learn how to integrate third-party services like Gmail and Google Sheets to build real-life AI agents.