
Commit f5abc94

[AI Search] Add new Workers AI models for text generation and embedding (#29704)
1 parent cfb4da3 commit f5abc94


2 files changed: +74 additions, -36 deletions

Lines changed: 31 additions & 0 deletions
```diff
@@ -0,0 +1,31 @@
+---
+title: New Workers AI models for text generation and embedding in AI Search
+description: AI Search adds four new Workers AI models including GLM, Qwen, and EmbeddingGemma.
+products:
+- ai-search
+date: 2026-04-08
+---
+
+[AI Search](/ai-search/) now supports four additional [Workers AI](/workers-ai/) models across text generation and embedding.
+
+### Text generation
+
+| Model                        | Context window (tokens) |
+| ---------------------------- | ----------------------- |
+| `@cf/zai-org/glm-4.7-flash`  | 131,072                 |
+| `@cf/qwen/qwen3-30b-a3b-fp8` | 32,000                  |
+
+GLM-4.7-Flash is a lightweight model from Zhipu AI with a 131,072 token context window, suitable for long-document summarization and retrieval tasks. Qwen3-30B-A3B is a mixture-of-experts model from Alibaba that activates only 3 billion parameters per forward pass, keeping inference fast while maintaining strong response quality.
+
+### Embedding
+
+| Model                            | Vector dims | Input tokens | Metric |
+| -------------------------------- | ----------- | ------------ | ------ |
+| `@cf/qwen/qwen3-embedding-0.6b`  | 1,024       | 4,096        | cosine |
+| `@cf/google/embeddinggemma-300m` | 768         | 512          | cosine |
+
+Qwen3-Embedding-0.6B supports up to 4,096 input tokens, making it a good fit for indexing longer text chunks. EmbeddingGemma-300M from Google produces 768-dimension vectors and is optimized for low-latency embedding workloads.
+
+All four models are available without additional provider keys since they run on Workers AI. Select them when creating or updating an AI Search instance in the dashboard or through the API.
+
+For the full list of supported models, refer to [Supported models](/ai-search/configuration/models/supported-models/).
```
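The two new text generation models differ mainly in context window: 131,072 tokens for GLM-4.7-Flash versus 32,000 for Qwen3-30B-A3B. As a rough illustration of what that gap means for a retrieval-augmented prompt, the TypeScript sketch below counts how many ranked chunks fit once a system prompt, the query, and an output reserve are accounted for. It assumes the caller already knows each token count (no tokenizer is included), and `CONTEXT_WINDOWS`, `PromptBudget`, and `chunksThatFit` are hypothetical names. How AI Search itself assembles prompts is not described in this commit.

```typescript
// Context windows (tokens) copied from the changelog's text generation table.
// This map and the helper below are hypothetical illustrations, not part of
// the AI Search or Workers AI APIs.
const CONTEXT_WINDOWS: Record<string, number> = {
  "@cf/zai-org/glm-4.7-flash": 131072,
  "@cf/qwen/qwen3-30b-a3b-fp8": 32000,
};

interface PromptBudget {
  systemTokens: number; // tokens in the system prompt
  queryTokens: number; // tokens in the user's query
  chunkTokens: number[]; // token count of each retrieved chunk, best match first
  reservedOutput: number; // tokens held back for the generated answer
}

// Counts how many retrieved chunks, taken in ranked order, fit in the model's
// context window. Token counts are assumed to be measured by the caller.
function chunksThatFit(model: string, budget: PromptBudget): number {
  const limit: number | undefined = CONTEXT_WINDOWS[model];
  if (limit === undefined) {
    throw new Error(`No context window recorded for ${model}`);
  }
  let used = budget.systemTokens + budget.queryTokens + budget.reservedOutput;
  let count = 0;
  for (const tokens of budget.chunkTokens) {
    if (used + tokens > limit) break; // the next chunk would overflow the window
    used += tokens;
    count++;
  }
  return count;
}

// Example: 20 retrieved chunks of 2,000 tokens each.
const budget: PromptBudget = {
  systemTokens: 500,
  queryTokens: 100,
  chunkTokens: new Array<number>(20).fill(2000),
  reservedOutput: 1000,
};
console.log(chunksThatFit("@cf/zai-org/glm-4.7-flash", budget)); // 20
console.log(chunksThatFit("@cf/qwen/qwen3-30b-a3b-fp8", budget)); // 15
```

With the same retrieval results, the larger GLM window holds all 20 chunks, while the 32,000-token Qwen window stops at 15 once the 1,600 tokens of overhead are reserved.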

src/content/docs/ai-search/configuration/models/supported-models.mdx

Lines changed: 43 additions & 36 deletions
```diff
@@ -11,50 +11,57 @@ This page lists all models supported by AI Search and their lifecycle status.
 If you would like to use a model that is not currently supported, reach out to us on [Discord](https://discord.gg/cloudflaredev) to request it.
 :::
 
-
 ## Production models
+
 Production models are the actively supported and recommended models that are stable, fully available.
 
 ### Text generation
-| Provider | Alias | Context window (tokens) |
-|---|---|---|
-| **Anthropic** | `anthropic/claude-3-7-sonnet` | 200,000 |
-| | `anthropic/claude-sonnet-4` | 200,000 |
-| | `anthropic/claude-opus-4` | 200,000 |
-| | `anthropic/claude-3-5-haiku` | 200,000 |
-| **Cerebras** | `cerebras/qwen-3-235b-a22b-instruct` | 64,000 |
-| | `cerebras/qwen-3-235b-a22b-thinking` | 65,000 |
-| | `cerebras/llama-3.3-70b` | 65,000 |
-| | `cerebras/llama-4-maverick-17b-128e-instruct` | 8,000 |
-| | `cerebras/llama-4-scout-17b-16e-instruct` | 8,000 |
-| | `cerebras/gpt-oss-120b` | 64,000 |
-| **Google AI Studio** | `google-ai-studio/gemini-2.5-flash` | 1,048,576 |
-| | `google-ai-studio/gemini-2.5-pro` | 1,048,576 |
-| **Grok (x.ai)** | `grok/grok-4` | 256,000 |
-| **Groq** | `groq/llama-3.3-70b-versatile` | 131,072 |
-| | `groq/llama-3.1-8b-instant` | 131,072 |
-| **OpenAI** | `openai/gpt-5` | 400,000 |
-| | `openai/gpt-5-mini` | 400,000 |
-| | `openai/gpt-5-nano` | 400,000 |
-| **Workers AI** | `@cf/meta/llama-3.3-70b-instruct-fp8-fast` | 24,000 |
-| | `@cf/meta/llama-3.1-8b-instruct-fast` | 60,000 |
-| | `@cf/meta/llama-3.1-8b-instruct-fp8` | 32,000 |
-| | `@cf/meta/llama-4-scout-17b-16e-instruct` | 131,000 |
+
+| Provider             | Alias                                         | Context window (tokens) |
+| -------------------- | --------------------------------------------- | ----------------------- |
+| **Anthropic**        | `anthropic/claude-3-7-sonnet`                 | 200,000                 |
+|                      | `anthropic/claude-sonnet-4`                   | 200,000                 |
+|                      | `anthropic/claude-opus-4`                     | 200,000                 |
+|                      | `anthropic/claude-3-5-haiku`                  | 200,000                 |
+| **Cerebras**         | `cerebras/qwen-3-235b-a22b-instruct`          | 64,000                  |
+|                      | `cerebras/qwen-3-235b-a22b-thinking`          | 65,000                  |
+|                      | `cerebras/llama-3.3-70b`                      | 65,000                  |
+|                      | `cerebras/llama-4-maverick-17b-128e-instruct` | 8,000                   |
+|                      | `cerebras/llama-4-scout-17b-16e-instruct`     | 8,000                   |
+|                      | `cerebras/gpt-oss-120b`                       | 64,000                  |
+| **Google AI Studio** | `google-ai-studio/gemini-2.5-flash`           | 1,048,576               |
+|                      | `google-ai-studio/gemini-2.5-pro`             | 1,048,576               |
+| **Grok (x.ai)**      | `grok/grok-4`                                 | 256,000                 |
+| **Groq**             | `groq/llama-3.3-70b-versatile`                | 131,072                 |
+|                      | `groq/llama-3.1-8b-instant`                   | 131,072                 |
+| **OpenAI**           | `openai/gpt-5`                                | 400,000                 |
+|                      | `openai/gpt-5-mini`                           | 400,000                 |
+|                      | `openai/gpt-5-nano`                           | 400,000                 |
+| **Workers AI**       | `@cf/meta/llama-3.3-70b-instruct-fp8-fast`    | 24,000                  |
+|                      | `@cf/meta/llama-3.1-8b-instruct-fast`         | 60,000                  |
+|                      | `@cf/meta/llama-3.1-8b-instruct-fp8`          | 32,000                  |
+|                      | `@cf/meta/llama-4-scout-17b-16e-instruct`     | 131,000                 |
+|                      | `@cf/zai-org/glm-4.7-flash`                   | 131,072                 |
+|                      | `@cf/qwen/qwen3-30b-a3b-fp8`                  | 32,000                  |
 
 ### Embedding
-| Provider | Alias | Vector dims | Input tokens | Metric |
-|---|---|---|---|---|
-| **Google AI Studio** | `google-ai-studio/gemini-embedding-001` | 1,536 | 2048 | cosine |
-| **OpenAI** | `openai/text-embedding-3-small` | 1,536 | 8192 | cosine |
-| | `openai/text-embedding-3-large` | 1,536 | 8192 | cosine |
-| **Workers AI** | `@cf/baai/bge-m3` | 1,024 | 512 | cosine |
-| | `@cf/baai/bge-large-en-v1.5` | 1,024 | 512 | cosine |
+
+| Provider             | Alias                                   | Vector dims | Input tokens | Metric |
+| -------------------- | --------------------------------------- | ----------- | ------------ | ------ |
+| **Google AI Studio** | `google-ai-studio/gemini-embedding-001` | 1,536       | 2048         | cosine |
+| **OpenAI**           | `openai/text-embedding-3-small`         | 1,536       | 8192         | cosine |
+|                      | `openai/text-embedding-3-large`         | 1,536       | 8192         | cosine |
+| **Workers AI**       | `@cf/baai/bge-m3`                       | 1,024       | 512          | cosine |
+|                      | `@cf/baai/bge-large-en-v1.5`            | 1,024       | 512          | cosine |
+|                      | `@cf/qwen/qwen3-embedding-0.6b`         | 1,024       | 4,096        | cosine |
+|                      | `@cf/google/embeddinggemma-300m`        | 768         | 512          | cosine |
 
 ### Reranking
-| Provider | Alias | Input tokens |
-|---|---|---|
-| **Workers AI** | `@cf/baai/bge-reranker-base` | 512 |
+
+| Provider       | Alias                        | Input tokens |
+| -------------- | ---------------------------- | ------------ |
+| **Workers AI** | `@cf/baai/bge-reranker-base` | 512          |
 
 ## Transition models
 
-There are currently no models marked for end-of-life.
+There are currently no models marked for end-of-life.
```
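The embedding tables list three properties per model: output vector dimensions, an input-token limit, and cosine as the similarity metric. AI Search performs embedding and retrieval itself, so the TypeScript sketch below only illustrates what those columns imply. The `EmbeddingModel` type, the `WORKERS_AI_EMBEDDINGS` list, and the `modelsForChunkSize` and `cosineSimilarity` helpers are hypothetical names, and the limits are copied from the table rather than queried from any API.

```typescript
// Workers AI embedding models with the limits listed in the supported-models
// table. The interface, array, and helpers are hypothetical illustrations,
// not part of the AI Search or Workers AI APIs.
interface EmbeddingModel {
  alias: string;
  vectorDims: number;
  inputTokens: number;
}

const WORKERS_AI_EMBEDDINGS: EmbeddingModel[] = [
  { alias: "@cf/baai/bge-m3", vectorDims: 1024, inputTokens: 512 },
  { alias: "@cf/baai/bge-large-en-v1.5", vectorDims: 1024, inputTokens: 512 },
  { alias: "@cf/qwen/qwen3-embedding-0.6b", vectorDims: 1024, inputTokens: 4096 },
  { alias: "@cf/google/embeddinggemma-300m", vectorDims: 768, inputTokens: 512 },
];

// Returns the aliases whose listed input-token limit is at least `chunkTokens`.
function modelsForChunkSize(chunkTokens: number): string[] {
  return WORKERS_AI_EMBEDDINGS.filter((m) => m.inputTokens >= chunkTokens).map(
    (m) => m.alias,
  );
}

// Cosine similarity: dot(a, b) / (|a| * |b|), the metric listed for every
// embedding model in the table. The length check catches mixing 1,024- and
// 768-dimension vectors; vectors from two different models that share a
// dimension are still not meaningfully comparable.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("Cosine similarity is undefined for a zero vector");
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(modelsForChunkSize(512)); // all four aliases
console.log(modelsForChunkSize(1000)); // only the Qwen3 embedding alias
console.log(cosineSimilarity([1, 0], [0, 1])); // 0
```

Under these table values, a 1,000-token chunk fits only `@cf/qwen/qwen3-embedding-0.6b`, which matches the changelog's point about indexing longer chunks.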
