Web Crawl
Initiates a web crawl starting from a specified URL, following links to discover and index web pages.
from composio_langchain import ComposioToolSet, Action
tool_set = ComposioToolSet()
tools = tool_set.get_tools(actions=[Action.SPIDER_WEB_CRAWL])
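The idea of following links from a starting URL can be sketched as a breadth-first traversal. This is a minimal, standard-library illustration and not Spider's or Composio's implementation; the `PAGES` dictionary, `LinkExtractor`, and `crawl` are hypothetical names, and the in-memory pages stand in for real HTTP requests.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "website" so the sketch runs without network access.
PAGES = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "https://example.com/a": '<a href="/b">B</a> <a href="/c">C</a>',
    "https://example.com/b": '<a href="/">Home</a>',
    "https://example.com/c": "",
}


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch):
    """Breadth-first crawl; returns URLs in the order they were discovered."""
    seen = [start_url]
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page could not be fetched; skip it
            continue
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.append(absolute)
                queue.append(absolute)
    return seen


order = crawl("https://example.com/", PAGES.get)
```

Passing the fetch function in as a parameter keeps the traversal logic separate from networking, so a real HTTP client could replace `PAGES.get`.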
Generate Sitemap
Creates a sitemap of a website by crawling and mapping its structure.
from composio_langchain import ComposioToolSet, Action
tool_set = ComposioToolSet()
tools = tool_set.get_tools(actions=[Action.SPIDER_GENERATE_SITEMAP])
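For orientation, a sitemap in the sitemaps.org XML format can be produced from a list of crawled URLs as in the sketch below. The output format of the action itself is not documented here, so this is an assumption-based illustration; `build_sitemap` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in sorted(set(urls)):  # de-duplicate and keep the output stable
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([
    "https://example.com/b",
    "https://example.com/",
    "https://example.com/b",  # duplicate, emitted only once
])
```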
Check Broken Links
Scans a website to identify and report broken links, such as those returning 404 errors.
from composio_langchain import ComposioToolSet, Action
tool_set = ComposioToolSet()
tools = tool_set.get_tools(actions=[Action.SPIDER_CHECK_BROKEN_LINKS])
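Conceptually, a broken-link check visits each discovered link and flags those that respond with an error status. The sketch below assumes that any status of 400 or above counts as broken; `find_broken_links`, `LINKS`, and `STATUSES` are hypothetical, and the status table replaces real HTTP requests.

```python
def find_broken_links(links_by_page, get_status):
    """Return (page, link, status) tuples for every link with status >= 400."""
    broken = []
    for page, links in links_by_page.items():
        for link in links:
            status = get_status(link)
            if status >= 400:
                broken.append((page, link, status))
    return broken


# Hypothetical crawl output and status table, standing in for real requests.
LINKS = {
    "/": ["/about", "/old-page"],
    "/about": ["/", "/missing"],
}
STATUSES = {"/": 200, "/about": 200, "/old-page": 404, "/missing": 410}

report = find_broken_links(LINKS, STATUSES.__getitem__)
```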
Capture Screenshots
Takes screenshots of specified web pages during the crawling process.
from composio_langchain import ComposioToolSet, Action
tool_set = ComposioToolSet()
tools = tool_set.get_tools(actions=[Action.SPIDER_CAPTURE_SCREENSHOTS])
Download Files
Downloads files of specified types (e.g., PDFs, images) encountered during web crawling.
from composio_langchain import ComposioToolSet, Action
tool_set = ComposioToolSet()
tools = tool_set.get_tools(actions=[Action.SPIDER_DOWNLOAD_FILES])
Analyze Page Speed
Measures and reports the loading speed of crawled web pages.
from composio_langchain import ComposioToolSet, Action
tool_set = ComposioToolSet()
tools = tool_set.get_tools(actions=[Action.SPIDER_ANALYZE_PAGE_SPEED])
Generate Word Cloud
Creates a word cloud visualization based on the most frequent terms found during crawling.
from composio_langchain import ComposioToolSet, Action
tool_set = ComposioToolSet()
tools = tool_set.get_tools(actions=[Action.SPIDER_GENERATE_WORD_CLOUD])
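The data behind a word cloud is a table of term frequencies. A minimal sketch of that counting step follows; the tokenization rule and the tiny stop-word list are assumptions for illustration, and `term_frequencies` is a hypothetical helper.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "and", "of", "to", "is", "in"}  # tiny illustrative list


def term_frequencies(texts, top_n=3):
    """Count lowercase word occurrences across texts, ignoring stop words."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)


top_terms = term_frequencies([
    "The spider crawls the web.",
    "A spider follows links and indexes the web.",
    "Links connect pages of the web.",
])
```

A rendering step would then scale each term's font size by its count.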
Detect Content Changes
Compares current crawl results with previous ones to identify content changes on websites.
from composio_langchain import ComposioToolSet, Action
tool_set = ComposioToolSet()
tools = tool_set.get_tools(actions=[Action.SPIDER_DETECT_CONTENT_CHANGES])
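One common way to compare two crawls is to fingerprint each page and diff the fingerprints. The sketch below assumes each crawl is available as a `{url: content}` dictionary; `fingerprint` and `diff_crawls` are hypothetical names, and this may differ from how the action compares results.

```python
import hashlib


def fingerprint(content):
    """Return a SHA-256 hex digest of the page content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def diff_crawls(previous, current):
    """Compare two {url: content} snapshots and classify each URL."""
    changes = {"added": [], "removed": [], "modified": []}
    for url in sorted(set(previous) | set(current)):
        if url not in previous:
            changes["added"].append(url)
        elif url not in current:
            changes["removed"].append(url)
        elif fingerprint(previous[url]) != fingerprint(current[url]):
            changes["modified"].append(url)
    return changes


# Hypothetical snapshots from two crawls of the same site.
previous = {"/": "Welcome", "/pricing": "$10/month", "/old": "Legacy"}
current = {"/": "Welcome", "/pricing": "$12/month", "/new": "Launch"}
changes = diff_crawls(previous, current)
```

Storing only the digests from the earlier crawl is enough for this comparison, which keeps the saved state small.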
Export Data To CSV
Exports extracted data or crawl results to a CSV file for further analysis.
from composio_langchain import ComposioToolSet, Action
tool_set = ComposioToolSet()
tools = tool_set.get_tools(actions=[Action.SPIDER_EXPORT_CSV])
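As a rough picture of what a CSV export of crawl results could contain, the sketch below writes one row per page with Python's `csv` module. The column names (`url`, `status`, `title`) and the `export_to_csv` helper are assumptions, not the action's documented schema.

```python
import csv
import io


def export_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    out = io.StringIO(newline="")
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical crawl results; the second title contains a comma on purpose.
ROWS = [
    {"url": "https://example.com/", "status": 200, "title": "Home"},
    {"url": "https://example.com/about", "status": 200, "title": "About, us"},
]
csv_text = export_to_csv(ROWS, ["url", "status", "title"])
```

Using `csv.DictWriter` rather than joining strings by hand means values containing commas or quotes are escaped correctly.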
Generate SEO Report
Creates a comprehensive SEO report based on the crawled data, including metadata analysis and keyword density.
from composio_langchain import ComposioToolSet, Action
tool_set = ComposioToolSet()
tools = tool_set.get_tools(actions=[Action.SPIDER_GENERATE_SEO_REPORT])
Detect Language
Automatically detects and reports the language of crawled web pages.
from composio_langchain import ComposioToolSet, Action
tool_set = ComposioToolSet()
tools = tool_set.get_tools(actions=[Action.SPIDER_DETECT_LANGUAGE])
Create Content Archive
Archives the content of crawled web pages for future reference or analysis.
from composio_langchain import ComposioToolSet, Action
tool_set = ComposioToolSet()
tools = tool_set.get_tools(actions=[Action.SPIDER_CREATE_CONTENT_ARCHIVE])
Analyze Internal Linking
Examines and reports on the internal linking structure of a website.
from composio_langchain import ComposioToolSet, Action
tool_set = ComposioToolSet()
tools = tool_set.get_tools(actions=[Action.SPIDER_ANALYZE_INTERNAL_LINKING])
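A typical internal-linking metric is the number of distinct pages that link to each page; pages with no inbound links are often called orphans. The sketch below computes that from a hypothetical `{page: [links]}` mapping and is only an illustration of the idea, not the action's actual report.

```python
def inbound_link_counts(links_by_page):
    """Count, for each known page, how many other pages link to it."""
    counts = {page: 0 for page in links_by_page}
    for source, targets in links_by_page.items():
        for target in set(targets):  # count each source page at most once
            if target != source and target in counts:
                counts[target] += 1
    return counts


# Hypothetical internal link graph.
SITE = {
    "/": ["/about", "/blog"],
    "/about": ["/"],
    "/blog": ["/", "/about", "/blog"],  # the self-link is ignored
    "/orphan": ["/"],
}
counts = inbound_link_counts(SITE)
orphans = [page for page, n in counts.items() if n == 0]
```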
Generate API Documentation
Automatically generates documentation for APIs discovered during web crawling.
from composio_langchain import ComposioToolSet, Action
tool_set = ComposioToolSet()
tools = tool_set.get_tools(actions=[Action.SPIDER_GENERATE_API_DOCS])
Perform Security Scan
Conducts a basic security scan of websites during crawling, identifying common vulnerabilities.
from composio_langchain import ComposioToolSet, Action
tool_set = ComposioToolSet()
tools = tool_set.get_tools(actions=[Action.SPIDER_SECURITY_SCAN])
Mobile Friendly Test
Performs a mobile-friendly test on crawled web pages and generates a report.
from composio_langchain import ComposioToolSet, Action
tool_set = ComposioToolSet()
tools = tool_set.get_tools(actions=[Action.SPIDER_MOBILE_FRIENDLY_TEST])
Generate Visual Sitemap
Creates a visual representation of the website structure based on crawl data.
from composio_langchain import ComposioToolSet, Action
tool_set = ComposioToolSet()
tools = tool_set.get_tools(actions=[Action.SPIDER_GENERATE_VISUAL_SITEMAP])
Crawl Depth Reached
Triggered when the spider reaches a specified crawl depth limit.
from composio_langchain import ComposioToolSet, Action
tool_set = ComposioToolSet()
tools = tool_set.get_tools(actions=[Action.SPIDER_CRAWL_DEPTH_REACHED])
New Domain Discovered
Fires when the spider encounters a link to a previously undiscovered domain.
from composio_langchain import ComposioToolSet, Action
tool_set = ComposioToolSet()
tools = tool_set.get_tools(actions=[Action.SPIDER_NEW_DOMAIN_DISCOVERED])
Crawl Completed
Triggered when a full web crawl operation is completed.
from composio_langchain import ComposioToolSet, Action
tool_set = ComposioToolSet()
tools = tool_set.get_tools(actions=[Action.SPIDER_CRAWL_COMPLETED])
Error Encountered
Fires when the spider encounters an error during crawling or data extraction.
from composio_langchain import ComposioToolSet, Action
tool_set = ComposioToolSet()
tools = tool_set.get_tools(actions=[Action.SPIDER_ERROR_ENCOUNTERED])
Data Threshold Reached
Triggered when the amount of extracted data reaches a specified threshold.
from composio_langchain import ComposioToolSet, Action
tool_set = ComposioToolSet()
tools = tool_set.get_tools(actions=[Action.SPIDER_DATA_THRESHOLD_REACHED])
New File Type Found
Fires when the spider discovers a new file type during crawling.
from composio_langchain import ComposioToolSet, Action
tool_set = ComposioToolSet()
tools = tool_set.get_tools(actions=[Action.SPIDER_NEW_FILE_TYPE_FOUND])
Duplicate Content Detected
Triggered when the spider identifies duplicate content across different pages.
from composio_langchain import ComposioToolSet, Action
tool_set = ComposioToolSet()
tools = tool_set.get_tools(actions=[Action.SPIDER_DUPLICATE_CONTENT_DETECTED])
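Exact-duplicate detection is often done by hashing a normalized form of each page's text and grouping pages that share a hash. The sketch below assumes normalization means lowercasing and collapsing whitespace; `normalize` and `find_duplicates` are hypothetical, and near-duplicate detection would need a different technique.

```python
import hashlib
from collections import defaultdict


def normalize(text):
    """Lowercase the text and collapse all runs of whitespace."""
    return " ".join(text.lower().split())


def find_duplicates(pages):
    """Group URLs whose normalized text is identical."""
    groups = defaultdict(list)
    for url, text in pages.items():
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        groups[digest].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical page texts; /a and /b differ only in case and whitespace.
duplicates = find_duplicates({
    "/a": "Hello  World",
    "/b": "hello world\n",
    "/c": "Something else",
})
```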
Crawl Rate Limit Reached
Fires when the spider reaches the configured crawl rate limit, which is set to avoid overloading servers.
from composio_langchain import ComposioToolSet, Action
tool_set = ComposioToolSet()
tools = tool_set.get_tools(actions=[Action.SPIDER_RATE_LIMIT_REACHED])
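To make the notion of a crawl rate limit concrete, here is a sliding-window limiter that allows at most a fixed number of requests per time window. It is a sketch under assumed semantics; the `RateLimiter` class and its parameters are hypothetical and do not describe how Spider enforces its limit. Timestamps are passed in explicitly so the example is deterministic.

```python
from collections import deque


class RateLimiter:
    """Allow at most max_requests within any window of per_seconds."""

    def __init__(self, max_requests, per_seconds):
        self.max_requests = max_requests
        self.per_seconds = per_seconds
        self._timestamps = deque()

    def allow(self, now):
        # Drop timestamps that have left the window.
        while self._timestamps and now - self._timestamps[0] >= self.per_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) < self.max_requests:
            self._timestamps.append(now)
            return True
        return False  # limit reached: the point where a trigger could fire


limiter = RateLimiter(max_requests=2, per_seconds=1.0)
decisions = [limiter.allow(t) for t in (0.0, 0.1, 0.2, 1.05, 1.2)]
```

The third request is rejected because two requests already fall inside the one-second window; later requests pass once the earlier ones age out.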
New Technology Detected
Triggered when the spider detects a new technology or framework used on a website.
from composio_langchain import ComposioToolSet, Action
tool_set = ComposioToolSet()
tools = tool_set.get_tools(actions=[Action.SPIDER_NEW_TECHNOLOGY_DETECTED])
Content Update Detected
Fires when changes in content are detected on previously crawled pages.
from composio_langchain import ComposioToolSet, Action
tool_set = ComposioToolSet()
tools = tool_set.get_tools(actions=[Action.SPIDER_CONTENT_UPDATE_DETECTED])
Crawl Budget Exceeded
Triggered when the spider exceeds the allocated crawl budget (e.g., number of pages or data volume).
from composio_langchain import ComposioToolSet, Action
tool_set = ComposioToolSet()
tools = tool_set.get_tools(actions=[Action.SPIDER_CRAWL_BUDGET_EXCEEDED])
New Structured Data Found
Fires when the spider discovers new structured data (e.g., Schema.org markup) on a web page.
from composio_langchain import ComposioToolSet, Action
tool_set = ComposioToolSet()
tools = tool_set.get_tools(actions=[Action.SPIDER_NEW_STRUCTURED_DATA_FOUND])
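Schema.org markup is frequently embedded as JSON-LD inside `<script type="application/ld+json">` tags. The sketch below extracts such blocks with the standard library; it covers JSON-LD only (not Microdata or RDFa), and `JsonLdExtractor` and the sample `HTML` are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects parsed JSON-LD objects from script tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # ignore malformed blocks


# Hypothetical page with one JSON-LD block and one ordinary script.
HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article", "headline": "Hello"}
</script>
<script>var x = 1;</script>
</head><body></body></html>
"""

extractor = JsonLdExtractor()
extractor.feed(HTML)
types = [item["@type"] for item in extractor.items]
```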