Commit f970b2f

[AI Search] add changelog
1 parent 7591a02 commit f970b2f

2 files changed: +61 -21 lines changed

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+---
+title: AI Search CSS content selectors for precise content extraction
+description: Control which parts of crawled pages are indexed using CSS selectors.
+products:
+  - ai-search
+date: 2026-04-08
+---
+
+[AI Search](/ai-search/) now supports [CSS content selectors](/ai-search/configuration/data-source/website/#content-selectors) for website data sources. You can now define which parts of a crawled page are extracted and indexed by specifying CSS selectors paired with URL glob patterns.
+
+Content selectors solve the problem of indexing only relevant content while ignoring navigation, sidebars, footers, and other boilerplate. When a page URL matches a glob pattern, only elements matching the corresponding CSS selector are extracted and converted to Markdown for indexing.
+
+Configure content selectors via the dashboard or API:
+
+```bash
+curl "https://api.cloudflare.com/client/v4/accounts/{account_id}/ai-search/instances" \
+  -H "Authorization: Bearer {api_token}" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "id": "my-ai-search",
+    "source": "https://example.com",
+    "type": "web-crawler",
+    "source_params": {
+      "web_crawler": {
+        "parse_options": {
+          "content_selector": [
+            {
+              "path": "**/blog/**",
+              "selector": "article .post-body"
+            }
+          ]
+        }
+      }
+    }
+  }'
+```
+
+Selectors are evaluated in order, and the first matching pattern wins. You can define up to 10 content selector entries per instance.
+
+For configuration details and examples, refer to the [content selectors documentation](/ai-search/configuration/data-source/website/#content-selectors).
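
Reviewer note: the request body in the changelog example could be assembled and checked client-side before it is sent. The sketch below is illustrative only. The helper name `build_instance_body` and the constants are hypothetical, the field names are copied from the curl example, and the limits (10 entries, 200 characters per field) are taken from the Limits table in `website.mdx`. It assumes the API counts characters the way Python's `len()` does, which the commit does not confirm.

```python
import json

# Limits as stated in the docs touched by this commit (assumed to be enforced server-side).
MAX_ENTRIES = 10
MAX_FIELD_LENGTH = 200


def build_instance_body(instance_id, source, entries):
    """Return the JSON-serializable body for the instance request.

    `entries` is a list of {"path": ..., "selector": ...} dicts.
    Raises ValueError when a documented limit is exceeded.
    """
    if len(entries) > MAX_ENTRIES:
        raise ValueError(f"at most {MAX_ENTRIES} content selector entries are allowed")

    for index, entry in enumerate(entries):
        for field in ("path", "selector"):
            if len(entry[field]) > MAX_FIELD_LENGTH:
                raise ValueError(
                    f"entry {index}: {field} exceeds {MAX_FIELD_LENGTH} characters"
                )

    return {
        "id": instance_id,
        "source": source,
        "type": "web-crawler",
        "source_params": {
            "web_crawler": {
                "parse_options": {
                    # Order matters: the docs say the first matching path wins.
                    "content_selector": [
                        {"path": e["path"], "selector": e["selector"]}
                        for e in entries
                    ]
                }
            }
        },
    }


body = build_instance_body(
    "my-ai-search",
    "https://example.com",
    [{"path": "**/blog/**", "selector": "article .post-body"}],
)
payload = json.dumps(body, indent=2)
```

The resulting `payload` string has the same structure as the `-d` argument in the curl example, so it could be passed to any HTTP client.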

src/content/docs/ai-search/configuration/data-source/website.mdx

Lines changed: 21 additions & 21 deletions
@@ -105,58 +105,58 @@ curl "https://api.cloudflare.com/client/v4/accounts/{account_id}/ai-search/insta
 }'
 ```

-| Field | Type | Description |
-| ---------- | ------ | ------------------------------------------------------------------------------------------------------------------------------------------------ |
+| Field | Type | Description |
+| ---------- | ------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `path` | string | Glob pattern to match against the full page URL. Uses the same glob syntax as [path filtering](/ai-search/configuration/path-filtering/): `*` matches within a segment, `**` crosses directories. Maximum 200 characters. |
-| `selector` | string | CSS selector to extract content from pages matching the path pattern. Supports standard CSS selectors including element, class, ID, and attribute selectors. Maximum 200 characters. |
+| `selector` | string | CSS selector to extract content from pages matching the path pattern. Supports standard CSS selectors including element, class, ID, and attribute selectors. Maximum 200 characters. |

 ### Examples

 #### Extract main content from blog pages

 To index only the article body on blog pages and ignore navigation, sidebars, and footers:

-| Path | Selector |
-| -------------- | -------------------- |
-| `**/blog/**` | `article .post-body` |
+| Path | Selector |
+| ------------ | -------------------- |
+| `**/blog/**` | `article .post-body` |

 #### Target documentation content

 To index the main content area of a documentation site:

-| Path | Selector |
-| -------------- | -------------- |
-| `**/docs/**` | `main .content` |
+| Path | Selector |
+| ------------ | --------------- |
+| `**/docs/**` | `main .content` |

 #### Different selectors for different sections

 You can define multiple entries to apply different selectors to different parts of your site. The first matching path wins, so place more specific patterns first:

-| Path | Selector |
-| ---------------------- | -------------------- |
-| `**/blog/releases/**` | `.release-notes` |
-| `**/blog/**` | `article .post-body` |
-| `**/docs/**` | `main .content` |
+| Path | Selector |
+| --------------------- | -------------------- |
+| `**/blog/releases/**` | `.release-notes` |
+| `**/blog/**` | `article .post-body` |
+| `**/docs/**` | `main .content` |

 In this example, a page at `https://example.com/blog/releases/v2` matches the first pattern and uses the `.release-notes` selector. A page at `https://example.com/blog/my-post` skips the first pattern and matches the second.

 :::caution
-If a CSS selector does not match any elements on a page, the page is indexed with empty content. Verify that your selectors match the expected elements before applying them to a broad set of pages.
+If a CSS selector does not match any elements on a page, the resulting Markdown is empty and AI Search marks the item as errored. Verify that your selectors match the expected elements before applying them to a broad set of pages.
 :::

 ### Interaction with other features

 - **Path filtering**: [Path filtering](/ai-search/configuration/path-filtering/) takes priority over content selectors. Pages excluded by path filters are never crawled, so content selectors do not apply to them.
 - **Browser Rendering**: Content selectors apply to the HTML that AI Search receives. For sites that render content with JavaScript, turn on [Browser Rendering](#rendering-mode) so that selectors can target the fully rendered DOM.
-- **Future crawls only**: Changes to content selectors apply to pages crawled after the change. To apply new selectors to already-indexed pages, trigger a new [sync job](/ai-search/configuration/indexing/).
+- **Automatic re-indexing**: Updating content selectors triggers a new [sync job](/ai-search/configuration/indexing/) immediately, so changes are applied to all indexed pages.

 ### Limits

-| Limit | Value |
-| --------------------------------- | -------------- |
-| Maximum content selector entries | 10 |
-| Maximum path pattern length | 200 characters |
-| Maximum selector length | 200 characters |
+| Limit | Value |
+| -------------------------------- | -------------- |
+| Maximum content selector entries | 10 |
+| Maximum path pattern length | 200 characters |
+| Maximum selector length | 200 characters |

 ## Best practices for robots.txt and sitemap
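
Reviewer note: the first-match rule described under "Different selectors for different sections" in this hunk can be sketched in a few lines. This is a rough model, not AI Search's actual matcher. The functions `glob_to_regex` and `pick_selector` are hypothetical, and the glob translation only covers the two wildcards the docs mention (`*` within a segment, `**` across directories). What AI Search does when no pattern matches is not stated in this commit, so the sketch simply returns `None` in that case.

```python
import re


def glob_to_regex(pattern):
    """Translate a glob into a compiled regex (assumed semantics).

    `**` may cross `/` boundaries; a single `*` stays within one segment.
    Every other character is matched literally.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def pick_selector(url, entries):
    """Return the selector of the first entry whose path matches the full URL."""
    for entry in entries:
        if glob_to_regex(entry["path"]).fullmatch(url):
            return entry["selector"]
    return None  # no pattern matched; behavior in AI Search is not covered here


# Same ordering as the table in the docs: most specific pattern first.
ENTRIES = [
    {"path": "**/blog/releases/**", "selector": ".release-notes"},
    {"path": "**/blog/**", "selector": "article .post-body"},
    {"path": "**/docs/**", "selector": "main .content"},
]

release = pick_selector("https://example.com/blog/releases/v2", ENTRIES)
post = pick_selector("https://example.com/blog/my-post", ENTRIES)
```

Under these assumptions `release` resolves to `.release-notes` and `post` to `article .post-body`, matching the walkthrough in the docs. Swapping the first two entries would make the broader `**/blog/**` pattern shadow the release-notes entry, which is why the docs advise placing specific patterns first.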
