SWE-bench Verified is a human-validated subset of the original dataset, consisting of 500 samples reviewed by software engineers.
SWE-bench Lite is a subset of the original dataset with 300 samples.
You can connect GitHub, Jira, Linear, and other tools to your AI agents to automate workflows.
Run your agents in secure, isolated environments. Use Docker for a complete local experience, or a cloud-hosted option such as E2B or Fly.io.
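To make the isolation idea concrete, here is a minimal sketch of running a single shell command in a throwaway Docker container from Python. It is not SweKit's API: the function names `build_docker_command` and `run_in_sandbox`, the `/workspace` mount point, and the choice to disable networking are all assumptions made for illustration. Only standard `docker run` flags are used.

```python
import subprocess
from typing import List, Optional


def build_docker_command(image: str, command: str,
                         workspace: Optional[str] = None) -> List[str]:
    """Build a `docker run` invocation for a disposable container.

    Hypothetical helper for illustration; not part of SweKit.
    """
    args = [
        "docker", "run",
        "--rm",               # delete the container once the command exits
        "--network", "none",  # assumption: the sandbox gets no network access
    ]
    if workspace is not None:
        # Bind-mount the host directory read-only. Docker expects an
        # absolute host path here; /workspace is an arbitrary mount point.
        args += ["--volume", f"{workspace}:/workspace:ro",
                 "--workdir", "/workspace"]
    args += [image, "sh", "-c", command]
    return args


def run_in_sandbox(image: str, command: str,
                   workspace: Optional[str] = None,
                   timeout: int = 60) -> str:
    """Run the command in the container and return its standard output.

    Requires a local Docker daemon; raises CalledProcessError on a
    non-zero exit status and TimeoutExpired if the timeout is exceeded.
    """
    result = subprocess.run(
        build_docker_command(image, command, workspace),
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return result.stdout
```

With Docker installed, `run_in_sandbox("python:3.12-slim", "python --version")` would execute inside the container rather than on the host. A hosted sandbox would replace the `subprocess` call with that provider's own client.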
Using SweKit, we built an automated GitHub code review bot. The GitHub PR Agent automates pull request reviews, identifying code quality issues and flagging potential bugs. You can also integrate Slack to get a summary of each PR right in your channel, making code review feel less like a chore.
We also created a Q&A bot that lets anyone ask questions about a codebase from Slack. The agent can explain specific functions, clarify dependencies, and offer insights into architecture. Because it runs in Slack, answers arrive right where your tech team already works.
This is the agent we used to top the SWE-bench rankings. It interacts with GitHub repositories through the GitHub integration, and it navigates and works with local codebases using CodeAnalysis, file management, and shell tools.