Ask AI is a beta feature under the Algolia Terms of Service (“Beta Services”). Use of this feature is subject to Algolia’s GenAI Addendum.


To provide the most accurate answers, Ask AI relies on cleanly structured content. To achieve this, it’s best to create a separate text-only index for Ask AI. This is especially important for documentation sites where layout elements, such as navigation components, might dilute your content.

To split plain text into chunks, you can use the helpers.splitTextIntoRecords helper in your crawler configuration. This works with many plain-text formats, but Markdown is especially well-suited for this purpose.
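
Conceptually, byte-budget chunking works like the sketch below. This is illustrative only, not the helper's actual implementation: helpers.splitTextIntoRecords also handles record ordering and objectID uniqueness for you.

// Conceptual sketch only, not the helper's actual implementation:
// split text into chunks that each stay under a byte budget.
function splitIntoChunks(text, maxBytes) {
  const chunks = [];
  let current = "";
  for (const paragraph of text.split("\n\n")) {
    const candidate = current ? current + "\n\n" + paragraph : paragraph;
    if (current && Buffer.byteLength(candidate, "utf8") > maxBytes) {
      chunks.push(current); // flush the chunk before it exceeds the budget
      current = paragraph;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}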

Update your crawler configuration

Add the following code to the actions array in your crawler configuration:

// actions: [ ...,
{
  indexName: "my-markdown-index",
  pathsToMatch: ["https://example.com/docs/**"],
  recordExtractor: ({ $, url, helpers }) => {
    const text = helpers.markdown("main"); // Change "main" to match your content tag (e.g., "main", "article", etc.)
    if (text === "") return [];

    // Optional: extract the language or other attributes as needed
    const language = $("html").attr("lang") || "en";

    return helpers.splitTextIntoRecords({
      text,
      baseRecord: {
        url,
        objectID: url,
        title: $("head > title").text(),
        lang: language, // Add more attributes as needed
      },
      maxRecordBytes: 100000, // Higher = fewer, larger records. Lower = more, smaller records.
      // Note: Increasing this value may increase the token count for LLMs, which can affect context size and cost.
      orderingAttributeName: "part",
    });
  },
},
// ...],
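
Each record returned by helpers.splitTextIntoRecords combines your baseRecord attributes with one chunk of text and the ordering attribute. A single record might look roughly like this (field contents are illustrative, and the exact objectID format is managed by the helper):

{
  url: "https://example.com/docs/getting-started",
  objectID: "https://example.com/docs/getting-started-0", // the helper derives a unique objectID per chunk
  title: "Getting started",
  lang: "en",
  text: "…first chunk of the page's Markdown…",
  part: 0, // the orderingAttributeName, incremented for each chunk of the same page
}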

Update the index settings

// initialIndexSettings: { ...,
"my-markdown-index": {
  attributesForFaceting: ["lang"], // Add more if you extract more attributes
  ignorePlurals: true,
  minProximity: 4,
  removeStopWords: false,
  searchableAttributes: ["unordered(title)", "unordered(text)"],
  removeWordsIfNoResults: "allOptional" // Graceful fallback when a query returns no results
},
// ...},

Run the crawler

After updating the crawler configuration:

  1. Publish the configuration in the Crawler dashboard to save and activate it.
  2. Run the crawler to index your Markdown content.
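
To confirm the records landed as expected, you can run a quick query against the new index with the JavaScript API client. This sketch assumes algoliasearch v4 and uses placeholder credentials:

const algoliasearch = require("algoliasearch");

const client = algoliasearch("YOUR_APP_ID", "YOUR_SEARCH_API_KEY");
const index = client.initIndex("my-markdown-index");

index
  .search("getting started", { filters: "lang:en" })
  .then(({ hits }) => {
    // Each hit should be one plain-text chunk carrying url, title, lang, and part
    console.log(hits.map(({ title, part }) => ({ title, part })));
  });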

Integrate the Markdown index with Ask AI

Once your Crawler and index are configured, set up your frontend to use both your main keyword index and your Markdown index for Ask AI. Here’s how you might configure DocSearch to use your main keyword index for search and your Markdown index for Ask AI:

docsearch({
  indexName: 'YOUR_INDEX_NAME', // Main DocSearch keyword index
  apiKey: 'YOUR_SEARCH_API_KEY',
  appId: 'YOUR_APP_ID',
  askAi: {
    indexName: 'YOUR_INDEX_NAME-markdown', // Markdown index for Ask AI
    apiKey: 'YOUR_SEARCH_API_KEY', // (or a different key if needed)
    appId: 'YOUR_APP_ID',
    assistantId: 'YOUR_ALGOLIA_ASSISTANT_ID',
  },
})
  • indexName refers to your main DocSearch index
  • askAi.indexName refers to the dedicated Markdown index

Best practices

  • Use clear, consistent titles and headings for better discoverability
  • Structure your content with headings and lists for better chunking
  • Add facets to support filtering in your search UI or the Ask AI assistant. For example, you can add attributes like lang, version, or tags to your records and declare them as attributesForFaceting (see the sketch after this list).
  • Adjust record size by changing maxRecordBytes.

    • If your answers seem too broad or fragmented, increase maxRecordBytes to create fewer, larger records. This might increase the token count for LLMs, which can affect the size of the context window and the cost of each Ask AI response.

    • If you have very large Markdown files, decrease maxRecordBytes to create smaller, more focused records.
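
As a sketch of the faceting tip above: DocSearch accepts a searchParameters option that is forwarded to the underlying Algolia queries. Assuming you extract lang and version attributes and declare them in attributesForFaceting, you could scope results like this (facet values are hypothetical):

docsearch({
  indexName: 'YOUR_INDEX_NAME',
  apiKey: 'YOUR_SEARCH_API_KEY',
  appId: 'YOUR_APP_ID',
  searchParameters: {
    facetFilters: ['lang:en', 'version:latest'], // hypothetical facet values
  },
})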

Example configuration

// In your Crawler config:

// actions: [ ...,
{
  indexName: "my-markdown-index",
  pathsToMatch: ["https://example.com/**"],
  recordExtractor: ({ $, url, helpers }) => {
    const text = helpers.markdown("main"); // Change "main" to match your content tag (e.g., "main", "article", etc.)
    if (text === "") return [];

    // Optional: customize selectors or metadata extraction as needed
    const language = $("html").attr("lang") || "en";

    return helpers.splitTextIntoRecords({
      text,
      baseRecord: {
        url,
        objectID: url,
        title: $("head > title").text(),
        // Add more optional attributes to the record
        lang: language
      },
      maxRecordBytes: 100000, // Higher = fewer, larger records. Lower = more, smaller records.
      // Note: Increasing this value may increase the token count for LLMs, which can affect context size and cost.
      orderingAttributeName: "part",
    });
  },
},
// ...],

// initialIndexSettings: { ...,
"my-markdown-index": {
  attributesForFaceting: ["lang"], // Add more if you extract more attributes
  ignorePlurals: true,
  minProximity: 4,
  removeStopWords: false,
  searchableAttributes: ["unordered(title)", "unordered(text)"],
  removeWordsIfNoResults: "allOptional" // Graceful fallback when a query returns no results
},
// ...},

For example configurations using DocSearch with Docusaurus, VitePress, or Astro, see Crawler configuration examples by integration.
