feat: add Tavily Extract as pluggable web scraper option #1303
Open
tavily-integrations wants to merge 1 commit into khoj-ai:master from
Summary
- Added `TAVILY` to the `WebScraper.WebScraperType` enum in the database model
- Added `read_webpage_with_tavily()`, which uses the Tavily Extract API (`/extract` endpoint) to extract content from URLs and returns raw markdown content
- Added routing in `scrape_webpage()` for the new TAVILY scraper type
- Updated the `aget_enabled_webscrapers()` adapter so Tavily is auto-discovered when `TAVILY_API_KEY` is set
- Added `TAVILY_API_KEY` and `TAVILY_API_URL` env var resolution in the model's `clean()` method (consistent with existing Exa/Firecrawl/Olostep handling)
- Added migration `0100_alter_webscraper_type` for the updated choices field
- Added `tavily-python >= 0.5.0` to `pyproject.toml` dependencies

Files changed
- `src/khoj/database/models/__init__.py`: Added TAVILY enum value and env var resolution
- `src/khoj/processor/tools/online_search.py`: Added `read_webpage_with_tavily()` and scraper routing
- `src/khoj/database/adapters/__init__.py`: Added TAVILY env-var fallback in adapter
- `src/khoj/database/migrations/0100_alter_webscraper_type.py`: New migration for updated choices
- `pyproject.toml`: Added `tavily-python` dependency

Environment variable changes
- `TAVILY_API_KEY`: Required to use the Tavily web scraper (shared with the tavily-web-search unit)
- `TAVILY_API_URL`: Optional, defaults to `https://api.tavily.com`

Dependency changes
- Added `tavily-python >= 0.5.0` to `pyproject.toml`

Notes for reviewers
- The `scrape_webpage_with_fallback()` function requires no changes, as it already iterates over all configured WebScraper DB records by priority

🤖 Generated with Claude Code
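For reviewers who want a mental model of how the env var resolution and the priority-ordered fallback interact, here is a minimal, self-contained sketch. It is not the code in this PR: `ScraperConfig`, `resolve_tavily_settings()`, `scrape_with_fallback()` and the `readers` mapping are hypothetical stand-ins, and the sketch assumes that a lower `priority` value is tried first.

```python
import asyncio
import os
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional

DEFAULT_TAVILY_API_URL = "https://api.tavily.com"


@dataclass
class ScraperConfig:
    """Hypothetical stand-in for a WebScraper DB record."""

    name: str
    type: str
    priority: int
    api_key: Optional[str] = None
    api_url: Optional[str] = None


def resolve_tavily_settings(config: ScraperConfig) -> ScraperConfig:
    """Fill unset Tavily settings from the environment (illustrative only)."""
    if config.type == "tavily":
        config.api_key = config.api_key or os.getenv("TAVILY_API_KEY")
        config.api_url = config.api_url or os.getenv("TAVILY_API_URL", DEFAULT_TAVILY_API_URL)
    return config


# A reader takes a URL and its scraper config and returns the page content.
Reader = Callable[[str, ScraperConfig], Awaitable[str]]


async def scrape_with_fallback(
    url: str, configs: list[ScraperConfig], readers: dict[str, Reader]
) -> str:
    """Try scrapers in ascending priority order; return the first non-empty result."""
    errors = []
    # Assumption: a lower priority number means the scraper is tried earlier.
    for config in sorted(configs, key=lambda c: c.priority):
        try:
            content = await readers[config.type](url, resolve_tavily_settings(config))
            if content:
                return content
            errors.append(f"{config.name}: empty content")
        except Exception as exc:  # Any failure moves on to the next scraper.
            errors.append(f"{config.name}: {exc}")
    raise RuntimeError(f"All scrapers failed for {url}: {errors}")
```

In this sketch, a new scraper type only needs an entry in `readers` and a config record; the fallback loop itself stays unchanged, which mirrors the note above about `scrape_webpage_with_fallback()`.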
Automated Review