Add natural language search for Premium Archive #2085

Open
samuelclay wants to merge 5 commits into main from natural-language-search
@samuelclay
Owner

Summary

  • Adds hybrid keyword + semantic search that combines Elasticsearch keyword results with OpenAI embedding vector similarity from the discover index
  • Search queries are embedded via text-embedding-3-small, projected to 256 dims, and ranked by a combined relevance score (60% keyword, 40% semantic); search falls back gracefully to keyword-only results if embedding fails
  • Gated to Premium Archive users (the discover index is required); shows an "AI" badge in the search header when semantic search is active
  • Adds "Natural language search by near terms" to the Premium Archive feature lists in the upgrade dialog, welcome page, iOS, and template tags; removes it from the Pro "Coming soon" list
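
As an illustration of the 60/40 ranking described above, here is a minimal sketch, not the code in this PR. The function names, the dict-of-scores inputs, and the max-normalization step are assumptions made for the example; the PR only states the weights and the keyword-only fallback.

```python
from typing import Dict, List, Optional, Tuple

KEYWORD_WEIGHT = 0.6   # weights as stated in the PR summary
SEMANTIC_WEIGHT = 0.4


def _max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores by the largest value (assumed scheme, not from the PR)."""
    top = max(scores.values(), default=0.0)
    if top <= 0:
        return {key: 0.0 for key in scores}
    return {key: value / top for key, value in scores.items()}


def rank_hybrid(
    keyword_scores: Dict[str, float],
    semantic_scores: Optional[Dict[str, float]] = None,
) -> List[Tuple[str, float]]:
    """Rank story ids by combined score, or by keyword score alone.

    Passing None (or an empty dict) for semantic_scores models the
    keyword-only fallback when the query embedding is unavailable.
    """
    keyword = _max_normalize(keyword_scores)
    if not semantic_scores:
        combined = keyword
    else:
        semantic = _max_normalize(semantic_scores)
        combined = {
            story_id: KEYWORD_WEIGHT * keyword.get(story_id, 0.0)
            + SEMANTIC_WEIGHT * semantic.get(story_id, 0.0)
            for story_id in set(keyword) | set(semantic)
        }
    # Highest score first; ties broken by story id for a stable order.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))
```

With this scheme, a story that matches moderately on both signals can outrank one that matches strongly on keywords alone, which is the behaviour the first test-plan item is looking for.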

Test plan

  • Search with a natural language query (e.g., "articles about rising sea levels") as an archive user and verify semantically related results appear
  • Search with the same query as a non-archive premium user and verify keyword-only results
  • Verify fallback: if OpenAI API is unreachable, keyword results still return
  • Verify "AI" badge appears in search header when semantic search is active
  • Check Premium Archive feature lists in upgrade dialog, welcome page, and iOS
  • Verify "Natural language search" is removed from Pro tier "Coming soon"
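
The gating and fallback checks in the test plan could be exercised against a small decision function like the one sketched below. This is a hypothetical helper written for illustration: `plan_search`, its arguments, and the returned tuple are assumptions, and the real code path in this PR may be structured differently.

```python
from typing import Callable, List, Optional, Tuple

SearchPlan = Tuple[str, Optional[List[float]], bool]


def plan_search(
    query: str,
    is_archive_user: bool,
    embed_query: Callable[[str], List[float]],
) -> SearchPlan:
    """Return (mode, query_vector, show_ai_badge) for a search request.

    embed_query is a stand-in for whatever calls the embedding API;
    any exception it raises triggers the keyword-only fallback.
    """
    if not is_archive_user:
        # Non-archive users never reach the embedding call.
        return ("keyword", None, False)
    try:
        vector = embed_query(query)
    except Exception:
        # Embedding API unreachable or erroring: keep keyword results.
        return ("keyword", None, False)
    if not vector:
        return ("keyword", None, False)
    return ("hybrid", vector, True)
```

Injecting `embed_query` as a parameter makes the "OpenAI API is unreachable" case easy to simulate with a stub that raises, without touching the network.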

Generated with Claude Code

samuelclay and others added 2 commits March 12, 2026 21:30
Combines Elasticsearch keyword results with OpenAI embedding vector
similarity from the discover index. Search queries are embedded via
text-embedding-3-small, projected to 256 dims, and matched against
indexed stories using cosine similarity. Results are ranked by combined
relevance score (60% keyword, 40% semantic). Falls back to keyword-only
if embedding fails or discover index isn't ready. Shows "AI" badge in
search header when semantic search is active.
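
For readers unfamiliar with the matching step, the sketch below shows one way a query vector might be reduced to 256 dimensions and compared to a stored story vector with cosine similarity. The commit message does not say how the projection is done; truncating to the first 256 components and re-normalizing is an assumption here, and `project` and `cosine_similarity` are illustrative names.

```python
import math
from typing import List, Sequence

TARGET_DIMS = 256  # dimension count stated in the commit message


def project(vector: Sequence[float], dims: int = TARGET_DIMS) -> List[float]:
    """Keep the first `dims` components and rescale to unit length.

    Assumed projection method; the PR may use a different one.
    """
    head = list(vector[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        return head
    return [x / norm for x in head]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

In production the similarity would more likely be computed inside Elasticsearch against the discover index rather than in Python; the functions above only illustrate the arithmetic.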

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add "Natural language search by near terms" to archive tier in the
upgrade dialog, welcome page, iOS premium view, and random upgrade
reasons. Remove "Natural language search" from Premium Pro coming soon
since it's now shipped as an archive feature.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@samuelclay changed the base branch from main to vlm-image-filter March 13, 2026 04:32
* main: (59 commits)
  Split Android backfill into per-user method, fix httplib2 pin
  Add "manually or by clicking in story" mark-as-read option
  Split Android backfill into per-user instance method
  Rewrite backfill_android_payments to use Google Play Orders API
  Fix right-click Mark as Read not working in manual mark-read mode
  Add backfill_android_payments to fill in missed Google Play renewals
  Add Google Play RTDN webhooks and fix Android subscription sync
  Fix Ansible conditional for cert backup download task
  Fix backdrop-filter stripped by LightningCSS on production
  Prepend featured image from wp:featuredmedia in WordPress JSON feeds
  Rebuild blog _site with Web Feeds post
  Publish Web Feeds blog post with March 13 date
  Move offsite backup scheduling from HA automation to SSH add-on cron
  Fix removing old title/text/url training that predates scope/is_regex fields
  Add WordPress REST API (wp-json) feed support
  Fix clicking in story detail bypassing manual mark-as-read preference
  Apply text classifiers against original_text on the frontend
  Stop auto-submitting Add Site when selecting an autocomplete result
  Skip retry and fallback requests for openrss.org feeds and enforce 3s rate limit
  Prioritize higher-ranked briefing sections in story allocation
  ...
Base automatically changed from vlm-image-filter to main March 15, 2026 00:10
samuelclay and others added 2 commits March 14, 2026 20:24
Resolve conflicts in PremiumView.swift, reader_premium_upgrade.js,
and welcome.xhtml keeping natural language filters as upcoming feature.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* main: (24 commits)
  Fix AttributeError in folder RSS feed when folder slug doesn't match
  Allow crawlers to access /media/img/ for OG image previews
  Add publish date to story clustering draft and enable incremental Jekyll builds
  Fix Open Graph and Twitter Card image previews on all static pages
  Rebuild blog _site with theme toggle fix
  Fix blog theme toggle not switching back to dark mode
  Use restart policy 'always' for node-exporter so it survives daemon restarts
  DRY up Docker container config for web and task deploys
  Fix AttributeError on MongoEngine queryset in feed limit notification task
  Fix ValueError from bracketed URLs in feed fetcher qurl() calls
  Fix ValueError from CRLF injection scans on /reader/feeds/ version param
  Fix MultiValueDictKeyError in rename_feed endpoint
  Fix UTF-8 encoding for web feed titles and allow link-free subscriptions
  Retry HAProxy delegated tasks on transient SSH failures during deploy
  Replace all MapReduce with aggregation pipelines to prevent JS lock cascade
  Disable Sentry tracing in flask_metrics to fix memory leak
  Skip attack detection on newsletter and push endpoints
  Add attack payload detection middleware with auto-ban
  Fix ValueError in URL normalization for malformed OPML imports
  Remove drafts from prod
  ...