How to integrate Google Cloud Vision MCP with Codex

Introduction

Codex is one of the most popular coding harnesses, and MCP makes the experience even better. With the Google Cloud Vision MCP integration, you can create products, manage reference images, import product sets, and much more, all without leaving the terminal or app, whichever you prefer.

Composio takes authentication handling off your hands. We manage the entire integration lifecycle; all you need to do is copy the URL below, authenticate inside Codex, and start using it.

Why use Composio?

Apart from a managed and hosted MCP server, you will get:

  • CodeAct: A dedicated workbench that lets GPT write its own code to handle complex tool chaining. This reduces the back-and-forth with the LLM when tools are called frequently.
  • Large tool responses: Handled for you to minimise context rot.
  • Dynamic just-in-time access to 20,000 tools across 870+ other apps for cross-app workflows. It loads only the tools you need, so GPTs aren't overwhelmed by tools you don't need.

How to install Google Cloud Vision MCP in Codex

Codex CLI

Run the command in your terminal.

Terminal

This will auto-redirect you to the Rube authentication page.

Once you're authenticated, you will be able to access the tools.

Verify the installation by running:

codex mcp list

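If you prefer to edit config.toml directly, add the following entry to it. You can generate the bearer token at rube.app → Use Rube → MCP URL → Generate token. Export the token as an environment variable and put that variable's name in bearer_token_env_var.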

[projects."/home/user/composio"]
trust_level = "untrusted"

[mcp_servers.rube]
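# Name of the environment variable that holds the token, not the token itself
# (RUBE_BEARER_TOKEN is an illustrative name)
bearer_token_env_var = "RUBE_BEARER_TOKEN"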
enabled = true
url = "https://rube.app/mcp"

Codex in VS Code

If you have installed Codex in VS Code, go to ⚙️ → MCP Settings → + Add servers → Streamable HTTP.

Add the Rube MCP URL: https://rube.app/mcp and the bearer token.

To verify, click Open config.toml.

Make sure it's there:

[mcp_servers.composio_rube]
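# Name of the environment variable that holds the token, not the token itself
# (RUBE_BEARER_TOKEN is an illustrative name)
bearer_token_env_var = "RUBE_BEARER_TOKEN"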
enabled = true
url = "https://rube.app/mcp"

Codex App

Codex App follows the same approach as VS Code.

  1. Click ⚙️ on the bottom left → MCP Servers → + Add servers → Streamable HTTP.
  2. Restart and verify that the entry is in .codex/config.toml:
[mcp_servers.composio_rube]
# Name of the environment variable that holds the token, not the token itself
# (RUBE_BEARER_TOKEN is an illustrative name)
bearer_token_env_var = "RUBE_BEARER_TOKEN"
enabled = true
url = "https://rube.app/mcp"
  3. Save, restart the app, and start working.

What is the Google Cloud Vision MCP server, and what's possible with it?

The Google Cloud Vision MCP server is an implementation of the Model Context Protocol that connects AI agents and assistants such as Claude and Cursor directly to your Google Cloud Vision account. It provides structured and secure access to your image analysis resources, so your agent can perform actions like registering products, managing reference images, listing endpoints, and automating large-scale image operations on your behalf.

  • Product and reference image management: Easily create new products and add reference images for visual search, enabling your agent to organize and expand your vision datasets effortlessly.
  • Bulk import and product set operations: Let your agent import large numbers of reference images into product sets from Cloud Storage CSV files, streamlining dataset curation at scale.
  • Automated product cleanup and deletion: Direct your agent to purge unused or orphan products from your project, keeping your cloud resources tidy without manual effort.
  • Location and endpoint discovery: Quickly list available Vision AI service locations and existing IndexEndpoints, making it easy for your agent to select optimal regions and manage deployment targets.
  • Vision API operation tracking: Retrieve and review ongoing or past Vision API operations, so your agent can monitor processing jobs and ensure workflow transparency.
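
Operation tracking usually comes down to polling a long-running operation until it finishes. The sketch below shows that pattern, assuming the operation follows the standard Google long-running operation shape (a done flag plus either error or response). The get_operation callable is a hypothetical stand-in for whatever fetches the operation state, for example a wrapper around the operation lookup tool; it is not a Composio or Google API.

```python
import time
from typing import Callable


def wait_for_operation(
    get_operation: Callable[[str], dict],
    name: str,
    interval_s: float = 2.0,
    timeout_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the operation reports done, then return its response."""
    waited = 0.0
    while True:
        op = get_operation(name)
        if op.get("done"):
            if "error" in op:
                raise RuntimeError(f"operation {name} failed: {op['error']}")
            return op.get("response", {})
        if waited >= timeout_s:
            raise TimeoutError(f"operation {name} not done after {timeout_s}s")
        sleep(interval_s)
        waited += interval_s


# Fake fetcher for illustration: reports "running" twice, then done.
states = iter([
    {"done": False},
    {"done": False},
    {"done": True, "response": {"ok": True}},
])
result = wait_for_operation(
    lambda _name: next(states),
    "operations/example",   # hypothetical operation name
    sleep=lambda _s: None,  # skip real waiting in the demo
)
print(result)
```

Injecting sleep keeps the helper easy to test; in real use, leave the default so each poll waits interval_s seconds.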

Supported Tools & Triggers

Tools
  • Create Vision Product: Tool to create and return a new Product resource.
  • Create ReferenceImage: Tool to create a ReferenceImage under a product.
  • Delete Product: Tool to permanently delete a Product and its reference images.
  • Get Product: Tool to get information associated with a Product.
  • Get Product Set: Tool to get a ProductSet.
  • Import Product Sets: Tool to asynchronously import reference images into ProductSets from a CSV in GCS.
  • List IndexEndpoints: Tool to list IndexEndpoints in a project and location.
  • List Locations: Tool to list available Vision AI service locations for a project.
  • List Vision API Operations: Tool to list operations that match the specified filter.
  • Purge Products: Tool to asynchronously delete products in a ProductSet or orphan products.
  • Update Product: Tool to update a Product's mutable fields: displayName, description, and productLabels.
  • Update Product Set: Tool to update a ProductSet resource.
  • Add Product to ProductSet: Tool to add a Product to a specified ProductSet.
  • Cancel Vision Operation: Tool to cancel a long-running Vision API operation.
  • Delete Vision API Operation: Tool to delete a long-running Vision API operation.
  • Delete Product Set: Tool to permanently delete a ProductSet.
  • Delete Reference Image: Tool to permanently delete a reference image.
  • Get Vision API Operation: Tool to get the latest state of a long-running operation.
  • Get Reference Image: Tool to get information associated with a ReferenceImage.
  • List Products in ProductSet: Tool to list Products in a specified ProductSet.
  • List Projects: Tool to list Google Cloud projects accessible by the authenticated user.
  • List Reference Images: Tool to list reference images for a product.
  • Remove Product from ProductSet: Tool to remove a Product from a specified ProductSet.
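
The import tool above reads a CSV stored in GCS, so it can help to see what such a file might look like. The sketch below builds one, assuming the header-less column order described in the Vision Product Search bulk-import documentation (image-uri, image-id, product-set-id, product-id, product-category, product-display-name, labels, bounding-poly). Treat the column list as an assumption and confirm it against the current Google documentation. The bucket, IDs, and the helper name build_import_csv are made up for illustration.

```python
import csv
import io

# Assumed column order for the bulk-import CSV (no header row is written).
COLUMNS = [
    "image-uri", "image-id", "product-set-id", "product-id",
    "product-category", "product-display-name", "labels", "bounding-poly",
]


def build_import_csv(rows: list[dict]) -> str:
    """Render rows as CSV text; missing optional fields become empty cells."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        fields = dict(row)
        # Labels are flattened to "key=value,key=value"; the csv module
        # quotes the cell automatically because it contains commas.
        labels = fields.get("labels") or {}
        fields["labels"] = ",".join(f"{k}={v}" for k, v in labels.items())
        writer.writerow([fields.get(col, "") for col in COLUMNS])
    return buf.getvalue()


# Hypothetical example row.
rows = [{
    "image-uri": "gs://example-bucket/shoe1.jpg",
    "image-id": "img001",
    "product-set-id": "set01",
    "product-id": "prod01",
    "product-category": "apparel-v2",
    "product-display-name": "Running shoe",
    "labels": {"color": "black", "style": "sport"},
}]
print(build_import_csv(rows), end="")
```

The resulting text would be uploaded to a Cloud Storage bucket, and its gs:// path handed to the import tool.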

Conclusion

You've successfully integrated Google Cloud Vision with Codex using Composio's Rube MCP server. Now you can interact with Google Cloud Vision directly from your terminal, VS Code, or the Codex App using natural language commands.

Key benefits of this setup:

  • Seamless integration across CLI, VS Code, and standalone app
  • Natural language commands for Google Cloud Vision operations
  • Managed authentication through Composio's Rube
  • Access to 20,000+ tools across 870+ apps for cross-app workflows
  • CodeAct workbench for complex tool chaining

Next steps:

  • Try asking Codex to perform various Google Cloud Vision operations
  • Explore cross-app workflows by connecting more toolkits
  • Build automation scripts that leverage Codex's AI capabilities

FAQ

What are the differences between Tool Router MCP and Google Cloud Vision MCP?

With a standalone Google Cloud Vision MCP server, agents and LLMs can only access a fixed set of Google Cloud Vision tools tied to that server. With the Composio Tool Router, however, agents can dynamically load tools from Google Cloud Vision and many other apps based on the task at hand, all through a single MCP endpoint.

Can I use Tool Router MCP with Codex?

Yes, you can. Codex fully supports MCP integration. You get structured tool calling, message history handling, and model orchestration while Tool Router takes care of discovering and serving the right Google Cloud Vision tools.

Can I manage the permissions and scopes for Google Cloud Vision while using Tool Router?

Yes, absolutely. You can configure which Google Cloud Vision scopes and actions are allowed when connecting your account to Composio. You can also bring your own OAuth credentials or API configuration so you keep full control over what the agent can do.

How safe is my data with Composio Tool Router?

All sensitive data, such as tokens, keys, and configuration, is encrypted at rest and in transit. Composio is SOC 2 Type 2 compliant and follows strict security practices so your Google Cloud Vision data and credentials are handled as safely as possible.

Used by agents from

Context
Letta
glean
HubSpot
Agent.ai
Altera
DataStax
Entelligence
Rolai

Never worry about agent reliability

We handle tool reliability, observability, and security so you never have to second-guess an agent action.