Markdown indexing
To provide the most accurate answers, Ask AI relies on cleanly structured content. To achieve this, it’s best to create a separate text-only index for Ask AI. This is especially important for documentation sites where layout elements, such as navigation components, might dilute your content.
To split plain text into chunks, you can use the helpers.splitTextIntoRecords helper in your crawler configuration. This works with many plain-text formats, but Markdown is especially well suited for this purpose.
Update your crawler configuration
Add the following code to the actions array in your crawler configuration:
```javascript
// actions: [ ...,
{
  indexName: "my-markdown-index",
  pathsToMatch: ["https://example.com/docs/**"],
  recordExtractor: ({ $, url, helpers }) => {
    const text = helpers.markdown("main"); // Change "main" to match your content container (e.g., "main", "article")
    if (text === "") return [];
    // Optional: extract the language or other attributes as needed
    const language = $("html").attr("lang") || "en";
    return helpers.splitTextIntoRecords({
      text,
      baseRecord: {
        url,
        objectID: url,
        title: $("head > title").text(),
        lang: language, // Add more attributes as needed
      },
      maxRecordBytes: 100000, // Higher = fewer, larger records; lower = more, smaller records
      // Note: increasing this value may raise the token count for LLMs, which can affect context size and cost
      orderingAttributeName: "part",
    });
  },
},
// ...],
```
Update the index settings:
```javascript
// initialIndexSettings: { ...,
"my-markdown-index": {
  attributesForFaceting: ["lang"], // Add more if you extract more attributes
  ignorePlurals: true,
  minProximity: 4,
  removeStopWords: false,
  searchableAttributes: ["unordered(title)", "unordered(text)"],
  removeWordsIfNoResults: "allOptional", // Graceful fallback when a query returns no results
},
// ...},
```
Run the crawler
After updating the crawler configuration:
- Publish the configuration in the Crawler dashboard to save and activate it.
- Run the crawler to index your Markdown content.
Integrate the Markdown index with Ask AI
Once your crawler and index are configured, set up your frontend to use both your main keyword index and your Markdown index for Ask AI. Here's how you might configure DocSearch to use your main keyword index for search and your Markdown index for Ask AI:
```javascript
docsearch({
  indexName: 'YOUR_INDEX_NAME', // Main DocSearch keyword index
  apiKey: 'YOUR_SEARCH_API_KEY',
  appId: 'YOUR_APP_ID',
  askAi: {
    indexName: 'YOUR_INDEX_NAME-markdown', // Markdown index for Ask AI
    apiKey: 'YOUR_SEARCH_API_KEY', // (or a different key if needed)
    appId: 'YOUR_APP_ID',
    assistantId: 'YOUR_ALGOLIA_ASSISTANT_ID',
  },
})
```
- indexName refers to your main DocSearch index.
- askAi.indexName refers to the dedicated Markdown index.
Best practices
- Use clear, consistent titles and headings for better discoverability.
- Structure your content with headings and lists for better chunking.
- Add facets to support filtering in your search UI or the Ask AI assistant. For example, you can add attributes like lang, version, or tags to your records and declare them as attributesForFaceting.
- Adjust record size by changing maxRecordBytes:
  - If your answers seem too broad or fragmented, increase maxRecordBytes to create fewer, larger records. This might increase the token count for LLMs, which can affect the size of the context window and the cost of each Ask AI response.
  - If you have very large Markdown files, decrease maxRecordBytes to create smaller, more focused records.
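To build intuition for the maxRecordBytes tradeoff, here is a rough, back-of-the-envelope sketch of how many records a page might split into for a given budget. This is a hypothetical estimate only, not the crawler's actual splitting logic, which also accounts for record overhead and word boundaries:

```javascript
// Hypothetical estimate only: the real splitTextIntoRecords helper also
// respects word boundaries and includes the base record's own bytes.
function estimateRecordCount(text, maxRecordBytes) {
  const textBytes = Buffer.byteLength(text, "utf8"); // UTF-8 size of the extracted text
  return Math.max(1, Math.ceil(textBytes / maxRecordBytes));
}

// A ~450 KB page with a 100,000-byte budget splits into about 5 records:
console.log(estimateRecordCount("x".repeat(450_000), 100_000)); // 5

// Halving the budget roughly doubles the record count:
console.log(estimateRecordCount("x".repeat(450_000), 50_000)); // 9
```

Doubling maxRecordBytes roughly halves the number of records per page, which is why larger budgets mean fewer, bigger chunks reaching the LLM.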
Example configuration
```javascript
// In your Crawler config:
// actions: [ ...,
{
  indexName: "my-markdown-index",
  pathsToMatch: ["https://example.com/**"],
  recordExtractor: ({ $, url, helpers }) => {
    const text = helpers.markdown("main"); // Change "main" to match your content container (e.g., "main", "article")
    if (text === "") return [];
    // Optional: customize selectors or meta extraction as needed
    const language = $("html").attr("lang") || "en";
    return helpers.splitTextIntoRecords({
      text,
      baseRecord: {
        url,
        objectID: url,
        title: $("head > title").text(),
        // Add more optional attributes to the record
        lang: language,
      },
      maxRecordBytes: 100000, // Higher = fewer, larger records; lower = more, smaller records
      // Note: increasing this value may raise the token count for LLMs, which can affect context size and cost
      orderingAttributeName: "part",
    });
  },
},
// ...],
// initialIndexSettings: { ...,
"my-markdown-index": {
  attributesForFaceting: ["lang"], // Recommended if you add more attributes besides objectID
  ignorePlurals: true,
  minProximity: 4,
  removeStopWords: false,
  searchableAttributes: ["unordered(title)", "unordered(text)"],
  removeWordsIfNoResults: "allOptional", // Graceful fallback when a query returns no results
},
// ...},
```
For example configurations using DocSearch with Docusaurus, Vitepress, or Astro, see Crawler configuration examples by integration.