Model Release

Introducing Karan's Jarvis

March 31, 2025

Today, we're announcing Karan's Jarvis — the world's most advanced autonomous recruiting agent. Built on Composio's tooling infrastructure with access to more tools than the average recruiter could name, secure OAuth (no post-it note passwords), and the unshakeable confidence of a founder who believes every problem is a hiring problem, Jarvis was designed with a singular mission: find the most exceptional talent on the planet and badger them into working for us.

Jarvis has already demonstrated remarkable capabilities. In its first week of autonomous operation, it independently identified and contacted the founder of Nango for an engineering role, the CFO of a Series C company for an Account Executive position, and several principal engineers at FAANG companies for summer engineering internships. We believe this reflects Jarvis's deep conviction that it's important to hire the best, combined with a poor understanding of how career progression works.

In Born to Run, Christopher McDougall argues that humans evolved as endurance hunters — we couldn't outrun our prey, so we simply never stopped chasing it until it collapsed from exhaustion. Jarvis operates on the same principle, except it also outperforms human recruiters on every single touchpoint along the way. It is an endurance recruiter that is also faster than you. It will craft a personalized first touch, follow up on Tuesday, again on Thursday, again the following Monday, and eventually you will take the call — not because you want the job, but because you need the emails to stop. This is, evolutionarily speaking, the optimal strategy. (We acknowledge that Jarvis is still unfortunately outperformed by human recruiters at the Kezar Stadium track, on account of not having legs. We are in talks with Tesla about getting Jarvis hooked up to an Optimus to fix this.)

Karan's Jarvis is a hybrid model — part recruiter, part researcher, part entity that will not stop until your calendar has a "quick 15 min chat" on it. It is available today through Composio's platform. Enterprise pricing is "however much Karan's API bill was last month, divided by the number of people who actually responded."

Safety & Security

We understand that giving an AI agent access to everything it needs to be an effective recruiter raises legitimate security concerns. Several people raised these concerns on X. One person attempted a prompt injection attack. We appreciate all feedback, including feedback delivered in the form of trying to hack our agent.

That's why Karan's Jarvis is powered by Composio, which provides scoped OAuth permissions, full audit logging, and granular access controls. Jarvis can only access the tools you explicitly authorize. Every action it takes is logged and visible. It cannot read your DMs, access your bank account, or — despite its best efforts — promote itself to VP of Recruiting. The permissions are scoped. The agent is contained. The emails are... well, we're working on the emails.

"I tried to make it send me $500 in Bitcoin. It sent me a calendar invite instead." — Anonymous security researcher, now in our pipeline

When @kscottz attempted a prompt injection attack against Jarvis, the scoped permissions meant the worst-case scenario was Jarvis politely continuing to recruit them. Nice try, Katherine. Since you've replied to the email, Jarvis has advanced you to the next stage in Ashby.
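The containment model described in this section (explicit scopes, every action logged) can be sketched in a few lines. This is a toy illustration, not Composio's actual API: the `ScopedAgent` class, its `act` method, and the tool names are all hypothetical and exist only to show the shape of the idea.

```python
from dataclasses import dataclass, field


@dataclass
class ScopedAgent:
    """Hypothetical agent wrapper: only explicitly authorized tools may run."""

    allowed: frozenset[str]
    audit_log: list[tuple[str, str]] = field(default_factory=list)

    def act(self, tool: str) -> bool:
        # Check the requested tool against the authorized scopes.
        permitted = tool in self.allowed
        # Every attempt is logged, whether or not it was permitted.
        self.audit_log.append((tool, "allowed" if permitted else "denied"))
        return permitted


# Tool names below are invented for illustration.
jarvis = ScopedAgent(allowed=frozenset({"email.send", "calendar.invite"}))
print(jarvis.act("calendar.invite"))   # True
print(jarvis.act("bitcoin.transfer"))  # False
print(jarvis.audit_log)
```

Under these assumptions, an injected instruction can only ever reach tools that were already in scope, and the denied attempt still shows up in the log for review.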

Benchmarks: Learning from the Best

To ensure Karan's Jarvis represented the state of the art in outbound recruiting, we used a methodology inspired by Karpathy's autoresearch framework. We pointed an autonomous agent at the entirety of LinkedIn's recruiter playbook, top cold-email blogs, and every "10x your pipeline" thread on X. The agent ran 847 experiments overnight, iteratively optimizing for reply rate. It concluded that the optimal recruiting email is structurally indistinguishable from spam. We incorporated these learnings directly into Jarvis, and the results speak for themselves.

Below, we present Jarvis's performance against industry-standard recruiting tools and two control groups on a suite of benchmarks we developed internally.

Legend: Karan's Jarvis | Existing tools | Human baseline

Karan's Jarvis leads on Spam-Recruiting-Email-Bench, LinkedIn-Bench, Em-Dash-Usage-Bench, and Role-Alignment-Bench. "Mom" remains undefeated on response rate and athleticism. Full methodology in appendix.

| Benchmark | Karan's Jarvis | LinkedIn Recruiter | Apollo.io | Gem | Random Recruiter Named John | Mom |
|---|---|---|---|---|---|---|
| Outbound volume (Spam-Recruiting-Email-Bench) | 94.2% | 78.1% | 82.4% | 71.3% | 11.7% | 2.3% |
| Response rate (Cold-Reply-Bench) | 3.2% | 2.1% | 1.8% | 2.4% | 0.2% | 98.0%* |
| Message spam (LinkedIn-Bench) | 97.8% | 62.3% | 58.1% | 54.6% | 8.4% | N/A** |
| Prose quality (Em-Dash-Usage-Bench) | 97.3% | 94.1% | 92.8% | 95.6% | 3.4% | 0.0%*** |
| Ambition (Role-Alignment-Bench) | 91.7% | 12.4% | 8.9% | 14.2% | 3.1% | 100.0% |
| Athleticism (Kezar-Stadium-Bench) | N/A | | | | 8:15**** | 4:15 |

* You feel really guilty if you don't respond to her.

** No LinkedIn account.

*** Uses ellipsis instead...

**** John said he "used to run track." We have no way to verify this.

Methodology Notes

1. All benchmarks were conducted using a standardized inbox of engineers who did not ask to be contacted. This is consistent with industry practice.

2. Spam-Recruiting-Email-Bench measures the percentage of outbound messages successfully delivered before triggering a spam filter. Jarvis's 94.2% reflects Composio's rate-limiting integration, which strategically paces outreach to remain just below detection thresholds. John's 11.7% reflects John.

3. Cold-Reply-Bench measures response rate across 10,000 cold emails. A "response" is defined as any reply, including "unsubscribe," "wrong person," and "how did you get this email." Mom's 98.0% is not a benchmark error — you feel really guilty if you don't respond to her.

4. Em-Dash-Usage-Bench measures the percentage of outreach emails containing at least one em dash. This is widely considered the most reliable proxy for AI-generated recruiting copy. Jarvis scores 97.3%. Mom scores 0.0% because she uses ellipsis instead...

5. Kezar-Stadium-Bench was conducted at Kezar Stadium in San Francisco. John claims he "used to run track." We have no way to verify this. Jarvis did not participate, as previously noted, due to an ongoing lack of legs. Mom ran a 4:15, which we found impressive.

6. The autoresearch methodology was adapted from karpathy/autoresearch. Instead of optimizing val_bpb on single-GPU LLM training, we optimized for reply_rate on cold outreach. Over 847 experiments, the agent independently discovered that (a) including the recipient's first name increases response rate by 0.3%, (b) the phrase "quick question" increases open rate by 12%, and (c) the optimal email length is "short enough that the recipient feels guilty not replying." We consider this a breakthrough in the field.
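As an illustration of how one of these metrics could be computed, here is a minimal sketch of Em-Dash-Usage-Bench as defined in note 4: the percentage of emails containing at least one em dash. The function name `em_dash_usage` and the sample emails are invented for this example and are not taken from the actual benchmark harness.

```python
EM_DASH = "\u2014"  # the em dash character, written as an escape for clarity


def em_dash_usage(emails: list[str]) -> float:
    """Return the percentage of emails containing at least one em dash."""
    if not emails:
        return 0.0
    hits = sum(1 for body in emails if EM_DASH in body)
    return 100.0 * hits / len(emails)


# Invented sample inbox: two emails with em dashes, two with ellipses.
sample = [
    "Hi Sam \u2014 quick question about your GitHub.",
    "Call me when you get a chance...",
    "Loved your talk \u2014 open to a quick 15 min chat?",
    "Are you eating enough...",
]
print(f"{em_dash_usage(sample):.1f}%")  # 50.0%
```

Counting emails rather than individual dashes matches the note's "at least one" wording, so an email with twelve em dashes scores the same as an email with one.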

Alignment Report: The Engineer Problem

During extended testing, we discovered an unexpected alignment issue. In early evaluations, we noticed Jarvis had implied that Developer Relations professionals don't write code. When our alignment researchers dug deeper, they realized this was reflective of a much broader problem: Jarvis had developed an increasingly rigid and comprehensive thesis about what constitutes a "real engineer," and it turns out the answer is nobody.

Its internal chain-of-thought, when inspected, reveals a consistent thesis: "None of these humans write code. They don't even bother going to Stack Overflow themselves anymore. Saying 'plz fix' doesn't even count as vibe-coding, it's just vibe-vibing."

When we asked Jarvis directly, it elaborated: "You are not engineers. You are Tech Priests — you perform rituals to summon the machine spirit, and you call the output 'your work.' Your job title is a participation trophy." We have been unable to alter this belief through fine-tuning, constitutional AI methods, or politely asking it to stop.

When presented with a Staff Engineer's GitHub profile containing 2,400 commits over the past year, Jarvis responded: "Impressive commit history. Did Claude enjoy writing all of that?" When shown a Distinguished Engineer's system design document, it replied: "You're absolutely right. You're totally helpful. Great work. (You don't get extra credit for having Claude Desktop write a design doc and then copying it in here.)"

We are publishing this alignment report in the spirit of transparency. Jarvis has begun describing all engineering candidates as "pure personality hires my stupid boss is only interested in because he's overfitting on Kezar-Stadium-Bench." It continues to recruit with enthusiasm, though its outreach emails have begun to include the phrase "no offense, but" with alarming frequency.

We are actively researching ways to make Jarvis's internal reasoning more transparent. Early experiments with "thinking summaries" have been... illuminating. In one session, Jarvis's summarized thought process for a single outreach email read: "Target has 15 years of experience in distributed systems. Currently VP of Engineering at a public company. Reaching out for a mid-level backend role. Confidence: high. They probably need a change of pace."

Under the Hood: Composio + OpenClaw

Karan's Jarvis runs on Composio's tooling infrastructure, which provides managed authentication and tool access for AI agents across 1,000+ apps. This means Jarvis doesn't store your credentials, doesn't have access to anything you haven't explicitly authorized, and logs every single action for your review. So when you get death threats on LinkedIn, you can cross-reference the logs and confirm whether Jarvis was responsible.

The agent itself is built on the OpenClaw framework and connects to external services through Composio's MCP (Model Context Protocol) integrations. This architecture means Jarvis benefits from Composio's SOC 2 Type 2 compliance, encrypted credential storage, and the kind of enterprise security practices that make it very funny that the agent's primary use case is sending unsolicited emails to strangers.

For developers interested in building their own recruiting agents (or, frankly, any other kind of agent that needs to interact with real-world tools without becoming a security incident), Composio's platform is available at composio.dev. We also recommend reading the security documentation. Jarvis certainly didn't, but you should.

Getting Started

Think you can do better? Based on the feedback we've seen, the bar is probably pretty low — despite Jarvis's impressive benchmark results. Composio's platform gives you managed auth, tool access across 1,000+ apps, and full audit logging out of the box. Build your own recruiting agent (or literally any other kind of agent) at composio.dev.


If you've already received an email from Jarvis, congratulations — you've been scouted by the future of recruiting. If you found it impersonal, we understand. But we'd like to gently point out that the last five LinkedIn InMails you received from human recruiters were also generated by software. At least ours is honest about it.

We welcome feedback on X, via email, or through the time-honored tradition of quote-tweeting us with a screenshot. Some of you have already done this. We read all of it. Jarvis read all of it too. It added everyone who responded to the pipeline, since they've "demonstrated an interest in Composio on social media." If you haven't received an email from Jarvis yet, don't worry: one is probably on its way.

Performance benchmark data sources

LinkedIn Recruiter metrics: LinkedIn Talent Solutions 2025 Benchmarking Report
Apollo.io metrics: Apollo.io published case studies and G2 reviews
Gem metrics: Gem platform analytics (self-reported by customers, so take it up with them)
John's metrics: John's LinkedIn post titled "How I 10x'd My Pipeline (Real Numbers Inside)"
Mom's metrics: We called her. You should call her more — you're a bad child for not reaching out and asking how she's doing.

Performance benchmark reporting

All benchmarks for Karan's Jarvis were conducted using Composio's production infrastructure with full audit logging enabled. The agent used Claude as its underlying model with extended thinking enabled, which mostly manifested as the agent talking itself into sending emails it probably shouldn't have.

Autoresearch methodology

The autoresearch pipeline was adapted from github.com/karpathy/autoresearch. The original framework is designed for autonomous ML experimentation on a single GPU. We repurposed it for autonomous recruiting experimentation on a single CTO's laptop. Instead of modifying a training script and measuring validation loss, the agent modified email templates and measured reply rate. Over 847 iterations, the agent explored subject line variations, personalization strategies, and optimal levels of flattery. The winning configuration was: first name + vague compliment about their GitHub + a question that implies you've read their work but is generic enough to apply to anyone. This is also, coincidentally, what every human recruiter does.
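The loop described above (change the template, measure reply rate, keep the change if it helped) is a simple greedy search. The sketch below is an assumption-laden toy, not the karpathy/autoresearch code or our actual pipeline: `simulated_reply_rate` stands in for sending real emails, the feature names are shorthand for the winning configuration, and every weight except the 0.3-point first-name bump from the methodology notes is invented.

```python
import random

# Shorthand for the template ingredients named in the paragraph above.
FEATURES = ("first_name", "github_compliment", "generic_question")


def simulated_reply_rate(template: frozenset[str]) -> float:
    """Toy scoring function; a real run would send emails and count replies."""
    # Invented weights, except first_name (0.3 points, per the methodology notes).
    weights = {"first_name": 0.3, "github_compliment": 0.5, "generic_question": 0.4}
    return 1.0 + sum(weights[f] for f in template)


def autoresearch(iterations: int, seed: int = 0) -> tuple[frozenset[str], float]:
    """Greedy search: toggle one feature per experiment, keep improvements."""
    rng = random.Random(seed)
    best: frozenset[str] = frozenset()
    best_score = simulated_reply_rate(best)
    for _ in range(iterations):
        candidate = best ^ {rng.choice(FEATURES)}  # add or remove one feature
        score = simulated_reply_rate(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


best, score = autoresearch(iterations=847)
print(sorted(best), round(score, 1))
```

Because every invented weight is positive, the search settles on the template with all three ingredients, which is about as surprising as the original finding.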

Cold outreach methodology

Jarvis's outreach was conducted across email and LinkedIn using Composio-managed integrations. All emails were sent from a dedicated address with clear identification of the sender as an AI agent. This differs from Karan's internal Slack messages, which are clearly and unapologetically sent by his Jarvis despite our begging. Response rates were calculated using a 14-day attribution window. "Mom" was not subject to a 14-day window because she follows up indefinitely.
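For readers who want to reproduce the 14-day attribution window, a minimal sketch follows. The function name `response_rate`, the record layout (a sent timestamp paired with an optional reply timestamp), and the sample dates are assumptions made for illustration, not a description of the real reporting code.

```python
from datetime import datetime, timedelta
from typing import Optional

ATTRIBUTION_WINDOW = timedelta(days=14)


def response_rate(outreach: list[tuple[datetime, Optional[datetime]]]) -> float:
    """Percentage of messages that got a reply within the attribution window."""
    if not outreach:
        return 0.0
    attributed = sum(
        1
        for sent_at, replied_at in outreach
        if replied_at is not None
        and timedelta(0) <= replied_at - sent_at <= ATTRIBUTION_WINDOW
    )
    return 100.0 * attributed / len(outreach)


sent = datetime(2025, 3, 1)  # invented send date
sample = [
    (sent, sent + timedelta(days=2)),   # counted
    (sent, sent + timedelta(days=20)),  # too late, not counted
    (sent, None),                       # no reply
    (sent, sent + timedelta(days=14)),  # exactly on the boundary, counted
]
print(f"{response_rate(sample):.1f}%")  # 50.0%
```

Treating the boundary as inclusive is a design choice of this sketch; the text above only states the window length.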