---
title: Website Source CSS content selectors for precise content extraction in AI Search
description: Control which parts of crawled pages are indexed using CSS selectors.
products:
- ai-search
date: 2026-04-08
---

[AI Search](/ai-search/) now supports [CSS content selectors](/ai-search/configuration/data-source/website/#content-selectors) for website data sources. You can define which parts of a crawled page are extracted and indexed by pairing CSS selectors with URL glob patterns.

Content selectors let you index only the relevant content of a page while ignoring navigation, sidebars, footers, and other boilerplate. When a page URL matches a glob pattern, only elements matching the corresponding CSS selector are extracted and converted to Markdown for indexing.

You can configure content selectors in the dashboard or through the API. Selectors are evaluated in order, and the first matching pattern wins. You can define up to 10 content selector entries per instance.
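To illustrate the API route, here is a minimal sketch that builds only the part of a create or update request body that holds content selectors, using the `source_params.web_crawler.parse_options.content_selector` field named in the website data source documentation. The `ContentSelector` interface and the `buildContentSelectorBody` helper are hypothetical names for this sketch; the endpoint, authentication, and every other instance field are intentionally left out.

```typescript
// Hypothetical type for one entry: a URL glob pattern paired with a CSS selector.
interface ContentSelector {
  path: string;
  selector: string;
}

// Sketch only: wraps the entries in the nested field that holds content selectors.
// The cap of 10 mirrors the documented per-instance limit.
function buildContentSelectorBody(entries: ContentSelector[]) {
  if (entries.length > 10) {
    throw new Error(`At most 10 content selector entries are allowed, got ${entries.length}`);
  }
  return {
    source_params: {
      web_crawler: {
        parse_options: {
          // Order matters: the first entry whose path matches a page URL wins.
          content_selector: entries,
        },
      },
    },
  };
}

const body = buildContentSelectorBody([
  { path: "**/blog/**", selector: "article .post-body" },
  { path: "**/docs/**", selector: "main .content" },
]);

// Print the JSON fragment that would be merged into a full request body.
console.log(JSON.stringify(body, null, 2));
```

Keeping the entries in a plain array preserves their order, which is what the first-match rule depends on.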
For configuration details and examples, refer to the [content selectors documentation](/ai-search/configuration/data-source/website/#content-selectors).

Refer to [Path filtering](/ai-search/configuration/path-filtering/) for pattern syntax, filtering behavior, and more examples.

## Content selectors

Content selectors let you control which parts of a crawled page are indexed. Each entry pairs a URL glob pattern with a CSS selector. When a page URL matches a glob pattern, only the elements matching the corresponding CSS selector (and their descendants) are extracted and converted to Markdown for indexing.

The list is ordered and the **first matching path wins**. If a page URL matches multiple glob patterns, only the selector from the first match is applied. Order your entries from most specific to least specific.

### Default behavior

Without content selectors, AI Search applies a default processing pipeline that removes elements such as `<header>`, `<footer>`, and `<head>` before converting the remaining content to Markdown. For details, refer to [How HTML is processed](/workers-ai/features/markdown-conversion/how-it-works/#html).

### Configure content selectors in the dashboard

<Steps>

1. Go to the [AI Search](https://dash.cloudflare.com/?to=/:account/ai/ai-search) page in the Cloudflare dashboard.

   <DashButton url="/?to=/:account/ai/ai-search" />

2. Select your AI Search instance, or select **Create** to create a new one with a **Website** data source.
3. Under the data source settings, locate the **Content selectors** section.
4. Select **Add selector**.
5. In the **Path** field, enter a glob pattern to match page URLs, for example `**/blog/**`.
6. In the **Selector** field, enter a CSS selector to extract content from matching pages, for example `article .post-body`.
7. To add more entries, select **Add selector** again. Entries are evaluated in order from top to bottom.

</Steps>

### Configure content selectors via the API

Content selectors are configured in the `source_params.web_crawler.parse_options.content_selector` field when creating or updating an AI Search instance. The field accepts an array of objects, each with a `path` and `selector` property.

| Property   | Type   | Description |
| ---------- | ------ | ----------- |
| `path`     | string | Glob pattern to match against the full page URL. Uses the same glob syntax as [path filtering](/ai-search/configuration/path-filtering/): `*` matches within a segment, and `**` crosses directories. Maximum 200 characters. |
| `selector` | string | CSS selector to extract content from pages matching the path pattern. Supports standard CSS selectors, including element, class, ID, and attribute selectors. Maximum 200 characters. |
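The limits in this table, together with the cap of 10 entries per instance, can be checked on the client before sending a request. The following is a minimal sketch that relies only on those documented limits; `validateContentSelectors` and its messages are hypothetical, and it does not check whether a glob or CSS selector is syntactically valid.

```typescript
interface ContentSelector {
  path: string;
  selector: string;
}

// Documented limits: up to 10 entries per instance, 200 characters per field.
const MAX_ENTRIES = 10;
const MAX_FIELD_LENGTH = 200;

// Returns human-readable problems; an empty array means the entries are within
// the documented limits. Glob and CSS syntax are not validated here.
function validateContentSelectors(entries: ContentSelector[]): string[] {
  const problems: string[] = [];
  if (entries.length > MAX_ENTRIES) {
    problems.push(`too many entries: ${entries.length} (max ${MAX_ENTRIES})`);
  }
  entries.forEach((entry, i) => {
    for (const field of ["path", "selector"] as const) {
      // .length counts UTF-16 code units, which equals characters for ASCII input.
      if (entry[field].length > MAX_FIELD_LENGTH) {
        problems.push(`entry ${i}: ${field} exceeds ${MAX_FIELD_LENGTH} characters`);
      }
    }
  });
  return problems;
}

const ok = validateContentSelectors([
  { path: "**/blog/**", selector: "article .post-body" },
]);
const bad = validateContentSelectors([
  { path: "**/docs/**", selector: "x".repeat(201) },
]);

console.log(ok); // []
console.log(bad);
```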
### Examples

#### Extract main content from blog pages

To index only the article body on blog pages and ignore navigation, sidebars, and footers:

| Path         | Selector             |
| ------------ | -------------------- |
| `**/blog/**` | `article .post-body` |

#### Target documentation content

To index the main content area of a documentation site:

| Path         | Selector        |
| ------------ | --------------- |
| `**/docs/**` | `main .content` |

#### Different selectors for different sections

You can define multiple entries to apply different selectors to different parts of your site. The first matching path wins, so place more specific patterns first:

| Path                  | Selector             |
| --------------------- | -------------------- |
| `**/blog/releases/**` | `.release-notes`     |
| `**/blog/**`          | `article .post-body` |
| `**/docs/**`          | `main .content`      |

In this example, a page at `https://example.com/blog/releases/v2` matches the first pattern and uses the `.release-notes` selector. A page at `https://example.com/blog/my-post` skips the first pattern and matches the second.

:::caution
If a CSS selector does not match any elements on a page, the resulting Markdown is empty and AI Search marks the item as errored. Verify that your selectors match the expected elements before applying them to a broad set of pages.
:::

### Interaction with other features

- **Path filtering**: [Path filtering](/ai-search/configuration/path-filtering/) takes priority over content selectors. Pages excluded by path filters are never crawled, so content selectors do not apply to them.
- **Browser Rendering**: Content selectors apply to the HTML that AI Search receives. For sites that render content with JavaScript, turn on [Browser Rendering](#rendering-mode) so that selectors can target the fully rendered DOM.
- **Automatic re-indexing**: Updating content selectors triggers a new [sync job](/ai-search/configuration/indexing/) immediately, so changes are applied to all indexed pages.
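The ordering rules on this page can be sketched for a single URL: path filtering is checked first, then the first content selector entry whose glob matches. This is an illustration, not AI Search's implementation. `globToRegExp`, `resolvePage`, and the exclude-only path filter are hypothetical; the glob translation handles only `*` and `**` as summarized in the API table, and the `default` outcome assumes that a page matching no entry goes through the default pipeline, which this page does not state explicitly.

```typescript
interface ContentSelector {
  path: string;
  selector: string;
}

type PageHandling =
  | { kind: "not-crawled" } // excluded by a path filter
  | { kind: "selector"; selector: string } // first matching content selector
  | { kind: "default" }; // no entry matched (assumed default pipeline)

// Simplified glob translation: "**" matches anything, including "/";
// "*" matches anything except "/"; every other character is literal.
function globToRegExp(glob: string): RegExp {
  let source = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === "*") {
      if (glob[i + 1] === "*") {
        source += ".*";
        i++; // consume the second "*"
      } else {
        source += "[^/]*";
      }
    } else {
      // Escape regex metacharacters so characters such as "." stay literal.
      source += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp(`^${source}$`);
}

function resolvePage(
  url: string,
  excludePatterns: string[], // hypothetical stand-in for path filtering
  selectors: ContentSelector[],
): PageHandling {
  // Path filtering runs first: excluded pages are never crawled.
  if (excludePatterns.some((p) => globToRegExp(p).test(url))) {
    return { kind: "not-crawled" };
  }
  // Content selectors: the first entry whose path matches wins.
  const match = selectors.find((entry) => globToRegExp(entry.path).test(url));
  return match ? { kind: "selector", selector: match.selector } : { kind: "default" };
}

const selectors: ContentSelector[] = [
  { path: "**/blog/releases/**", selector: ".release-notes" },
  { path: "**/blog/**", selector: "article .post-body" },
  { path: "**/docs/**", selector: "main .content" },
];
const exclude = ["**/blog/drafts/**"]; // hypothetical exclusion rule

for (const url of [
  "https://example.com/blog/releases/v2",
  "https://example.com/blog/my-post",
  "https://example.com/blog/drafts/wip",
  "https://example.com/about",
]) {
  console.log(url, resolvePage(url, exclude, selectors));
}
```

Because `Array.prototype.find` stops at the first match, moving `**/blog/**` above `**/blog/releases/**` would make the release notes entry unreachable, which is why more specific patterns belong first.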