Markdown indexing
To provide the most accurate answers, Ask AI relies on cleanly structured content. To achieve this, it’s best to create a separate text-only index for Ask AI. This is especially important for documentation sites where layout elements, such as navigation components, might dilute your content.
To split plain text into chunks, you can use the helpers.splitTextIntoRecords helper in your crawler configuration. This works with many plain-text formats, but Markdown is especially well suited for this purpose.
Update your crawler configuration
Add the following code to the actions array in your crawler configuration:
```javascript
// actions: [ ...,
{
  indexName: "my-markdown-index",
  pathsToMatch: ["https://example.com/docs/**"],
  recordExtractor: ({ $, url, helpers }) => {
    const text = helpers.markdown("main"); // Change "main" to match your content container (e.g., "main", "article")
    if (text === "") return [];
    // Optional: extract the language or other attributes as needed
    const language = $("html").attr("lang") || "en";
    return helpers.splitTextIntoRecords({
      text,
      baseRecord: {
        url,
        objectID: url,
        title: $("head > title").text(),
        lang: language, // Add more attributes as needed
      },
      maxRecordBytes: 100000, // Higher = fewer, larger records; lower = more, smaller records
      // Note: increasing this value may raise the token count for LLMs, which can affect context size and cost
      orderingAttributeName: "part",
    });
  },
},
// ...],
```
Update the index settings:
```javascript
// initialIndexSettings: { ...,
"my-markdown-index": {
  attributesForFaceting: ["lang"], // Add more if you extract more attributes
  ignorePlurals: true,
  minProximity: 4,
  removeStopWords: false,
  searchableAttributes: ["unordered(title)", "unordered(text)"],
  removeWordsIfNoResults: "allOptional", // Graceful fallback when a query returns no results
},
// ...},
```
Run the crawler
After updating the crawler configuration:
- Publish the configuration in the Crawler dashboard to save and activate it.
- Run the crawler to index your Markdown content.
Integrate the Markdown index with Ask AI
Once your crawler and index are configured, set up your frontend to use both your main keyword index and your Markdown index for Ask AI. Here's how you might configure DocSearch to use your main keyword index for search and your Markdown index for Ask AI:
```javascript
docsearch({
  indexName: 'YOUR_INDEX_NAME', // Main DocSearch keyword index
  apiKey: 'YOUR_SEARCH_API_KEY',
  appId: 'YOUR_APP_ID',
  askAi: {
    indexName: 'YOUR_INDEX_NAME-markdown', // Markdown index for Ask AI
    apiKey: 'YOUR_SEARCH_API_KEY', // (or a different key if needed)
    appId: 'YOUR_APP_ID',
    assistantId: 'YOUR_ALGOLIA_ASSISTANT_ID',
  },
})
```
- indexName refers to your main DocSearch index.
- askAi.indexName refers to the dedicated Markdown index.
Best practices
- Use clear, consistent titles and headings for better discoverability.
- Structure your content with headings and lists for better chunking.
- Add facets to support filtering in your search UI or the Ask AI assistant. For example, you can add attributes like lang, version, or tags to your records and declare them as attributesForFaceting.
- Adjust record size by changing maxRecordBytes:
  - If your answers seem too broad or fragmented, increase maxRecordBytes to create fewer, larger records. This might increase the token count for LLMs, which can affect the size of the context window and the cost of each Ask AI response.
  - If you have very large Markdown files, decrease maxRecordBytes to create smaller, more focused records.
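To build intuition for the maxRecordBytes tradeoff, here is a rough, back-of-the-envelope sketch of how many records a page might split into for a given budget. This is a hypothetical estimate only, not the crawler's actual splitting logic, which also accounts for record overhead and word boundaries:

```javascript
// Hypothetical estimate only: the real splitTextIntoRecords helper also
// respects word boundaries and includes the base record's own bytes.
function estimateRecordCount(text, maxRecordBytes) {
  const textBytes = Buffer.byteLength(text, "utf8"); // UTF-8 size of the extracted text
  return Math.max(1, Math.ceil(textBytes / maxRecordBytes));
}

// A ~450 KB page with a 100,000-byte budget splits into about 5 records:
console.log(estimateRecordCount("x".repeat(450_000), 100_000)); // 5

// Halving the budget roughly doubles the record count:
console.log(estimateRecordCount("x".repeat(450_000), 50_000)); // 9
```

Doubling maxRecordBytes roughly halves the number of records per page, which is why larger budgets mean fewer, bigger chunks reaching the LLM.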
Example configuration
```javascript
// In your Crawler config:
// actions: [ ...,
{
  indexName: "my-markdown-index",
  pathsToMatch: ["https://example.com/**"],
  recordExtractor: ({ $, url, helpers }) => {
    const text = helpers.markdown("main"); // Change "main" to match your content container (e.g., "main", "article")
    if (text === "") return [];
    // Optional: customize selectors or meta extraction as needed
    const language = $("html").attr("lang") || "en";
    return helpers.splitTextIntoRecords({
      text,
      baseRecord: {
        url,
        objectID: url,
        title: $("head > title").text(),
        // Add more optional attributes to the record
        lang: language,
      },
      maxRecordBytes: 100000, // Higher = fewer, larger records; lower = more, smaller records
      // Note: increasing this value may raise the token count for LLMs, which can affect context size and cost
      orderingAttributeName: "part",
    });
  },
},
// ...],
// initialIndexSettings: { ...,
"my-markdown-index": {
  attributesForFaceting: ["lang"], // Recommended if you add more attributes besides objectID
  ignorePlurals: true,
  minProximity: 4,
  removeStopWords: false,
  searchableAttributes: ["unordered(title)", "unordered(text)"],
  removeWordsIfNoResults: "allOptional", // Graceful fallback when a query returns no results
},
// ...},
```
For example configurations using DocSearch with Docusaurus, Vitepress, or Astro, see Crawler configuration examples by integration.