Make measures and dimensions searchable

Users enter dimensions and measures in many different ways. For example, users might search for 30 x 50 or 30mm x 50mm, while your records may use a different format like 30x50mm. Algolia’s features support some formatting variations but may not handle all dimension formats reliably. This is because:

  • User query formats vary. A query like 30mm x 50mm won’t necessarily match a record with 30x50mm.
  • Unit differences cause mismatches. Queries may include ", inches, mm, cm, or ft, while records might use only one format.
  • Typo tolerance has limits. It can match slight variations such as 30by50 or 30 x 50, but not different units or separators.
  • AI doesn’t consistently interpret dimensions. Although NeuralSearch can identify dimensions, it doesn’t do so consistently, due to the ambiguity of the input.

To improve matching for dimension-based searches, standardize and expand these values using a transformation function before sending your data to Algolia.

Transform data into dimension-friendly formats

To address this issue, pre-process your data with a transformation function like the one below.

For each record you pass to it, the transform function returns the record with an added dimension_keywords attribute, which is an empty array if no dimensions are found.

To run this function, create a Push to Algolia connector, using the following transformation code.

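// Match one-part, two-part, or three-part dimensions.
// The trailing (?!\w) lets a match end on a quote mark, as in 3"x6".
const DIMENSIONS_RE =
  /\b\d+(?:[.,]\d+)?\s*(?:(?:mm|cm|m|in(?:ch(?:es)?)?|ft|'|["″])?(?:\s*(?:x|\*|by)\s*\d+(?:[.,]\d+)?\s*(?:mm|cm|m|in(?:ch(?:es)?)?|ft|'|["″])?){1,2}|(?:mm|cm|m|in(?:ch(?:es)?)?|ft|'|["″]))(?!\w)/gi;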
const PART_RE =
  /(\d+(?:[.,]\d+)?)(?:\s*(mm|cm|m|in(?:ch(?:es)?)?|ft|["″'])?)?/gi;

// Convert any raw unit symbol to a standard form
function normalizeUnit(raw, fallback) {
  if (!raw) return fallback.toLowerCase();

  switch (raw.toLowerCase()) {
    case '"':
    case "″":
    case "in":
    case "inch":
    case "inches":
      return "in";
    case "'":
    case "ft":
      return "ft";
    default:
      return raw.toLowerCase();
  }
}

// Return every spelling for a unit
function unitForms(unit) {
  if (unit === "in") return ["in", '"', "inch", "inches"];
  if (unit === "ft") return ["ft", "'"];
  return [unit];
}

function dimensionKeywords(attr, fallbackUnit) {
  if (!attr) return [];

  const out = new Set();
  const entries = [...attr.matchAll(DIMENSIONS_RE)].map((m) => m[0]); // Extract all matching dimensions

  for (const entry of entries) {
    // Extract number and unit pairs
    const parts = [];
    let match;
    while ((match = PART_RE.exec(entry)) !== null) {
      const number = match[1];
      const unit = normalizeUnit(match[2], fallbackUnit);
      parts.push({ n: number, u: unit });
    }

    if (!parts.length) continue;

    // Single numbers and number plus unit forms
    parts.forEach((p) => {
      out.add(p.n);
      unitForms(p.u).forEach((f) => out.add(p.n + f));
    });

    // Combined variants when there's more than one part
    if (parts.length > 1) {
      const nums = parts.map((p) => p.n); // ["30","50"]
      const nu = parts.map((p) => p.n + p.u); // ["30mm","50mm"]

      // Join dimensions without spaces (for example, 30mm50mm), including inch variants
      const tight = nu.join("");
      out.add(tight);
      if (tight.includes("in")) {
        unitForms("in").forEach((f) => out.add(tight.replace(/in/g, f)));
      }

      ["x", "by"].forEach((sep) => {
        // Raw numbers joined
        out.add(nums.join(sep));
        if (sep === "by") out.add(nums.join(` ${sep} `));

        // Numbers and units joined
        const joinedTight = nu.join(sep);
        const joinedSpc = nu.join(` ${sep} `);

        [joinedTight, joinedSpc].forEach((s) => {
          out.add(s);
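          if (s.includes("in")) {
            // Replace "in" after every number, including before a tight separator (30inx50in)
            unitForms("in").forEach((f) =>
              out.add(s.replace(/(\d)in/g, (_, d) => d + f)),
            );
          }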
        });
      });
    }
  }
  return [...out];
}

/**
 * Transforms a record by extracting dimension keyword variants.
 * @param {SourceRecord} record - A record from your dataset.
 * @param {Helper} helper - A helper for accessing secrets and metadata.
 * @returns {SourceRecord} - The record with a dimension_keywords attribute (an empty array if no dimensions are found).
 */
async function transform(record, helper) {
  record.dimension_keywords = dimensionKeywords(record.name, "mm");

  return record;
}

 

Customization

You can customize the function to support other measurement units or non-standard formats. To add new units (for example, yd, mil, µm, kg) or handle alternative patterns (for example, D30, H50, Ø20mm x 40mm), update the following:

  • DIMENSIONS_RE. Extend the regular expression to detect new unit symbols or structural patterns. Consider using AI-assisted tools to build and test regular expressions.
  • normalizeUnit(raw, fallback). Map any new unit symbol or abbreviation to a standard form. For example, ‘yard’ and ‘yd’ become ‘yd’.
  • dimensionKeywords(). The function defaults to mm if it doesn’t find a unit. To change this default (for example, to cm or in), update the second argument in the dimensionKeywords() call.
  • unitForms(unit). Add alternative spellings and symbols for each unit. For example, ["yd", "yard", "yards"].
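
As an illustration of these changes, the following sketch adds yards. It keeps every spelling in one lookup table so that normalization, the variant list, and the regular expression alternation stay in sync. The names UNIT_SPELLINGS, normalizeUnitExtended, unitFormsExtended, and UNIT_PATTERN are hypothetical and aren't part of the transformation code above.

```javascript
// Hypothetical lookup table: standard unit -> every accepted spelling.
// The "yd" entry is the illustrative addition.
const UNIT_SPELLINGS = {
  in: ["in", '"', "inch", "inches"],
  ft: ["ft", "'"],
  yd: ["yd", "yard", "yards"],
};

// Map any known spelling to its standard form, or use the fallback when no unit is given.
function normalizeUnitExtended(raw, fallback) {
  if (!raw) return fallback.toLowerCase();
  const lower = raw.toLowerCase();
  for (const [unit, spellings] of Object.entries(UNIT_SPELLINGS)) {
    if (spellings.includes(lower)) return unit;
  }
  return lower;
}

// Return every spelling for a standard unit.
function unitFormsExtended(unit) {
  return UNIT_SPELLINGS[unit] ?? [unit];
}

// Alternation for the regular expressions, longest spellings first
// so that "yards" isn't cut short at "yard".
const UNIT_PATTERN = ["mm", "cm", "m", ...Object.values(UNIT_SPELLINGS).flat()]
  .sort((a, b) => b.length - a.length)
  .join("|");

console.log(normalizeUnitExtended("Yards", "mm"));
console.log(unitFormsExtended("yd"));
console.log(UNIT_PATTERN);
```

You could then interpolate UNIT_PATTERN into DIMENSIONS_RE and PART_RE with the RegExp constructor instead of repeating the unit list in each pattern.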

How the transformation function works

The function improves search by extracting keyword variants from dimension patterns in the following steps:

Identify dimensions

The function uses the DIMENSIONS_RE regular expression to detect one-part, two-part, or three-part dimensions, such as:

  • One-part: 600mm, 2.4m
  • Two-part: 30x50, 3"x6", 20mm x 30mm
  • Three-part: 245x148x65mm, 30mm x 50mm x 2m

It recognizes various separators (x, *, by) and units (including mm, ", inches, and ft), with or without spaces.

The regular expression handles a wide range of edge cases, but test it against your data to confirm it captures the formats you use.
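
One way to run such a check is a small helper that lists the sample values in which a pattern finds nothing. This is a minimal sketch: unmatchedSamples is a hypothetical helper, SAMPLE_RE is a simplified stand-in that only covers two-part and three-part metric dimensions, and the product names are made up. Pass DIMENSIONS_RE and values from your own records instead.

```javascript
// Return the sample strings in which `re` finds no dimension.
function unmatchedSamples(re, samples) {
  // String.prototype.match starts from index 0 even for global regexes,
  // so the same regex object can be reused for every sample.
  return samples.filter((text) => text.match(re) === null);
}

// Simplified stand-in pattern (assumption): two-part or three-part metric dimensions only.
const SAMPLE_RE =
  /\b\d+(?:[.,]\d+)?\s*(?:mm|cm|m)?(?:\s*(?:x|\*|by)\s*\d+(?:[.,]\d+)?\s*(?:mm|cm|m)?){1,2}\b/gi;

// Hypothetical product names: replace them with values from your own records.
const samples = [
  "Shelf 30x50mm",
  "Brick 245 x 148 x 65 mm",
  "Frame 20cm by 30cm",
  "Pipe Ø20 L40",
];

console.log(unmatchedSamples(SAMPLE_RE, samples));
```

Any value the helper returns uses a format the pattern doesn't cover yet, which makes it a candidate for the customizations described earlier.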

Extract numbers and units

For each match, the function extracts the numbers and their associated units.

The function standardizes unit variants like ", inch, and inches to in.
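
The extraction step can be sketched in isolation as follows. SIMPLE_PART_RE, canonicalUnit, and extractParts are hypothetical, cut-down stand-ins for PART_RE, normalizeUnit, and the loop inside dimensionKeywords, and they only know a few units.

```javascript
// Simplified stand-in for PART_RE: a number followed by an optional unit.
const SIMPLE_PART_RE = /(\d+(?:[.,]\d+)?)\s*(mm|cm|inches|inch|in|")?/gi;

// Collapse the inch spellings to "in" and use the fallback when no unit was captured.
function canonicalUnit(raw, fallback) {
  if (!raw) return fallback;
  const unit = raw.toLowerCase();
  return ['"', "in", "inch", "inches"].includes(unit) ? "in" : unit;
}

// Turn one matched dimension string into { n: number, u: unit } pairs.
function extractParts(entry, fallback) {
  return [...entry.matchAll(SIMPLE_PART_RE)].map((m) => ({
    n: m[1],
    u: canonicalUnit(m[2], fallback),
  }));
}

console.log(extractParts('3" x 6 inches', "mm"));
console.log(extractParts("20 x 30", "mm"));
```

The first call yields two parts that both use the unit in, and the second falls back to mm for both numbers. The sketch assumes its input is already a matched dimension: on free text, a word that starts with a unit spelling would be misread as a unit.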

Generate variants

Each detected dimension expands into these keyword-friendly formats:

  • Bare numbers: 30, 50, 2
  • Normalized units: 30mm, 2m, 3in
  • Commonly accepted synonyms: 30in, 30", 30inch, 30inches
  • Joined forms:
    • Without spacing: 30mm50mm
    • With separators: 30x50, 30mm by 50mm, 30mmx50mm

For example, from 30mm x 50mm, the function generates:

[
 "30", "30mm",
 "50", "50mm",
 "30mm50mm",
 "30x50", "30mmx50mm", "30mm x 50mm",
 "30by50", "30 by 50",
 "30mmby50mm", "30mm by 50mm"
]

Attach keywords to an attribute

The function adds a new attribute, dimension_keywords, to each record it processes.

Add dimension keywords to your index

To use the generated keywords in Algolia:

  • With the taskID generated by the Push to Algolia connector, send your data to Algolia with the Ingestion API pushTask method, making sure each record includes the dimension_keywords attribute.
  • Configure dimension_keywords as a searchable attribute in your index settings.
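
For the settings step, a small helper can build the updated settings object before you send it with your API client or save it in the dashboard. This is a minimal sketch: withDimensionKeywords and the existing settings shown are hypothetical, and the check only recognizes the plain attribute name, not modifier forms such as unordered(dimension_keywords).

```javascript
// Return settings with dimension_keywords appended to searchableAttributes.
function withDimensionKeywords(settings = {}) {
  const current = settings.searchableAttributes ?? [];
  // An empty list means every attribute is already searchable, so leave it unchanged.
  // Setting the list to only dimension_keywords would stop other attributes from being searched.
  if (current.length === 0 || current.includes("dimension_keywords")) {
    return settings;
  }
  return { ...settings, searchableAttributes: [...current, "dimension_keywords"] };
}

// Hypothetical existing settings: retrieve the real ones from your index first.
const existing = { searchableAttributes: ["name", "description"] };

console.log(withDimensionKeywords(existing));
```

Appending the attribute at the end of the list gives it the lowest priority, because the order of searchable attributes affects ranking. Move it earlier if dimension matches should outrank matches in other attributes.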
