If your store's catalog is in Ukrainian, Russian, Polish, or German, standard search solutions (including popular US SaaS such as Klevu and Searchanise) often deliver lower quality than they do on English. The reason is simple: their models were trained mostly on English-language e-commerce data. This article looks at why that happens and how multilingual-e5-large handles it out of the box.
Problem: why US-trained models fail on CIS/EU
Most smart-search SaaS platforms were built for the US market: Shopify stores with English catalogs in fashion, electronics, and beauty. When these products later "expand to international markets", they add basic translations, but the core model stays English-trained.
Standard OpenCart LIKE search and its limits
OpenCart's standard search uses SQL LIKE '%query%', that is, a literal substring match. On UA/RU/PL catalogs it fails on five levels:
1. Morphology
Slavic languages are highly inflected, so one word can have 6-12 forms:
Morphology example
Чашка → Чашки, Чашці, Чашку, Чашкою, Чашок, Чашкам...
2. Latin/Cyrillic
A user can type "steklokeramika" or "склокераміка"; standard search won't understand that they mean the same thing.
3. Synonyms
"refrigerator" / "fridge", "trousers" / "pants": different words for the same products. LIKE knows nothing about synonyms.
4. Transliteration
"Айфон" / "iPhone", "Панасонік" / "Panasonic": UA users sometimes transliterate brand names.
5. Cross-language
A buyer using the Russian UI types "сковорода", but the product is indexed only in Ukrainian as "пательня", so the search fails.
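The failure modes above are easy to reproduce with any literal substring match. The sketch below uses an in-memory SQLite table as a stand-in for the OpenCart product table (the table name, column name, and sample rows are assumptions for illustration; OpenCart itself runs on MySQL, but LIKE '%query%' behaves the same way for these lowercase strings).

```python
import sqlite3

# In-memory stand-in for a product table; schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT)")
conn.executemany(
    "INSERT INTO product VALUES (?)",
    [
        ("чашки кавові 250мл",),
        ("пательня з антипригарним покриттям",),
        ("холодильник samsung",),
        ("iphone 15 pro max",),
    ],
)


def like_search(query):
    """Literal substring search, the same pattern as LIKE '%query%'."""
    rows = conn.execute(
        "SELECT name FROM product WHERE name LIKE ?", (f"%{query}%",)
    )
    return [row[0] for row in rows]


print(like_search("чашка"))      # morphology: the stored form is "чашки"
print(like_search("сковорода"))  # cross-language: the product is "пательня"
print(like_search("айфон"))      # transliteration: the stored name is "iphone"
print(like_search("iphone"))     # an exact substring still matches
```

Only the last query returns a row. The first three return empty lists because no stored name contains the query as a literal substring.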
Models compared: e5 vs US-models
| Model | Training data | UA/RU/PL quality |
|---|---|---|
| OpenAI text-embedding-ada-002 | ~93% English | mediocre |
| Klevu (proprietary) | US e-commerce | mediocre |
| Searchanise (proprietary) | US/UK e-commerce | limited |
| BGE-M3 (BAAI) | multilingual | good |
| multilingual-e5-large | 100+ languages parallel | excellent |
Why multilingual-e5 is better for UA/RU
multilingual-e5-large-instruct is an open-source model from Microsoft Research. It is trained on 100+ languages in parallel (not "English + translations"). This means:
- Morphology: the model treats "чашки" and "чашка" as close concepts without a dictionary
- Cross-language: "сковорода" (RU) and "пательня" (UA) end up close in vector space
- Synonyms: the model learns from training context that "холодильник" and "фрідж" are related
- Transliteration: "iPhone" and "Айфон" have a cosine similarity of ~0.9
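"Close in vector space" can be made concrete with cosine similarity. The sketch below is a minimal illustration: the 4-dimensional vectors are hand-picked stand-ins, not real model output (actual multilingual-e5-large embeddings have 1024 dimensions), and the `search` helper is hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings, chosen by hand for illustration only.
index = {
    "Пательня з антипригарним покриттям": [0.9, 0.1, 0.0, 0.2],
    "Чайник електричний 1.7л": [0.1, 0.9, 0.1, 0.0],
    "Холодильник Samsung": [0.0, 0.1, 0.9, 0.3],
}


def search(query_vec, top_k=2):
    """Rank indexed products by cosine similarity to the query vector."""
    scored = [(cosine(query_vec, vec), name) for name, vec in index.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


# Stand-in for the embedding of the Russian query "сковорода".
query_vec = [0.8, 0.2, 0.1, 0.2]
print(search(query_vec, top_k=1))
```

With these toy vectors the frying pan ranks first even though the query and the product name share no substring, which is the effect the bullets above describe.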
Cross-language matching in practice
Real examples from a Ukrainian store with a trilingual catalog (UA/RU/EN):
| Buyer query | UI language | Found product | Product language |
|---|---|---|---|
| steklokeramika | EN | Склокерамічна тарілка | UA |
| сковорода | RU | Пательня з антипригарним покриттям | UA |
| kettle | EN | Чайник електричний 1.7л | UA |
| чайнік | UA | Чайник Bosch (description in EN) | EN |
| iPhne | UA | iPhone 15 Pro Max | UA |
| фріж | UA | Холодильник Samsung | UA |
All these queries return relevant products in AI Search v1.0.5. With standard OpenCart LIKE search, every one of them returns 0 results.
Real examples from UA stores
Houseware store (~30k SKU, isklad.com.ua)
- Query "чашка з блюдцем" — finds "Чашка кавова з блюдцем 250мл"
- Query "тарілка для пасти" — finds "Глибока тарілка для першого 23см"
- Query "білий керамічний горщик" — finds "Кашпо керамічне біле для квітів"
Apparel store (5k SKU)
- Query "сорочка з довгим рукавом" — finds blouses + casual shirts
- Query "trousers black" (EN) — finds "Штани чорні класичні" (UA catalog)
- Query "плаття літне" (typo) — finds "Сукня літня"
Statistics: how much search improves
The data comes from 5 OpenCart stores that switched from LIKE to AI Search during 2025-2026. Details are in the "isklad.com.ua case study".
How to enable multilingual mode
In AI Search, multilingual mode is the default. There is nothing to configure:
- Install the module (about 5 minutes; guide here)
- Reindex: the module auto-indexes products in all active languages
- Done: search works multilingually out of the box
FAQ
How many languages does AI Search support?
100+ via multilingual-e5-large. It performs best on UA, RU, PL, DE, ES, IT, FR, and EN. Quality is slightly lower for Chinese, Korean, and Japanese, but search still works.
What if my catalog is English only?
The multilingual model comes close to US models on English: our benchmark shows 91-93% top-3 on EN catalogs vs ~94% for Klevu.
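The article does not spell out how the top-3 figure is measured. One common definition, assumed here, is the share of test queries whose expected product appears among the first three results. The query set and product ids below are made up for illustration.

```python
def top_k_hit_rate(results, expected, k=3):
    """Share of queries whose expected product id is in the first k results."""
    hits = sum(
        1 for query, ranked in results.items() if expected[query] in ranked[:k]
    )
    return hits / len(results)


# Hypothetical ranked results per query (product ids are invented).
results = {
    "kettle": ["p3", "p7", "p1"],
    "trousers black": ["p9", "p2", "p5", "p4"],
    "white ceramic pot": ["p6", "p8", "p0", "p11"],
    "cup with saucer": ["p12", "p10"],
}
# Hypothetical ground truth: the product a human judged correct.
expected = {
    "kettle": "p3",
    "trousers black": "p4",
    "white ceramic pot": "p0",
    "cup with saucer": "p12",
}

print(top_k_hit_rate(results, expected, k=3))  # 3 of 4 queries hit: 0.75
```

Under this definition, a 91-93% top-3 score would mean that roughly 9 out of 10 test queries surface the right product in the first three positions.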
Need separate indexing for each language?
No. AI Search auto-indexes all active store languages on reindex. For example, 3 languages × 30k SKU = 90k embeddings, which takes 30-90 min.
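The arithmetic behind these numbers can be checked in a few lines. The 1024-dimension figure comes from this article; float32 storage (4 bytes per value) is an assumption, and the real index may be compressed or carry extra overhead.

```python
LANGUAGES = 3
SKUS = 30_000
DIM = 1024            # embedding size of multilingual-e5-large
BYTES_PER_VALUE = 4   # assumption: vectors stored as float32

embeddings = LANGUAGES * SKUS                   # 90_000 vectors
raw_bytes = embeddings * DIM * BYTES_PER_VALUE  # 368_640_000 bytes
raw_mib = raw_bytes / 2**20                     # 351.5625 MiB of raw vectors


def throughput(minutes):
    """Embeddings per second implied by a given total reindex time."""
    return embeddings / (minutes * 60)


print(embeddings, raw_mib)
print(throughput(30), throughput(90))  # 50.0 and about 16.7 per second
```

So the quoted 30-90 minute reindex corresponds to roughly 17-50 embeddings per second, and the raw vectors take about 350 MiB under the float32 assumption.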
What if some products don't have translation?
The fallback works: a product indexed only in UA is still findable via cross-language matching.
Can I disable cross-language matching?
Yes. Go to AI Search → Settings → Strict Language Mode and set it to Enabled. A query in language X then returns only results in the same language.
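Conceptually, a strict mode like this is a language filter applied to the ranked results. The sketch below is a guess at the behavior, not the module's actual code: the `Hit` type, the `apply_language_mode` function, and the scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    name: str
    language: str  # language the product text was indexed in
    score: float   # similarity score from the vector search


def apply_language_mode(hits, query_language, strict):
    """Keep every hit in default mode; keep same-language hits in strict mode."""
    if not strict:
        return hits
    return [hit for hit in hits if hit.language == query_language]


# Hypothetical ranked hits for the Russian query "сковорода".
hits = [
    Hit("Пательня з антипригарним покриттям", "UA", 0.91),
    Hit("Сковорода чугунная 26 см", "RU", 0.89),
    Hit("Frying pan 28 cm", "EN", 0.84),
]

print([h.name for h in apply_language_mode(hits, "RU", strict=False)])
print([h.name for h in apply_language_mode(hits, "RU", strict=True)])
```

In default mode all three products are returned. With strict mode on, only the Russian-language hit remains.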
Does multilingual affect search speed?
No. Embedding size doesn't depend on the number of languages: every vector is 1024-dimensional. Speed stays at ~200 ms on 30k SKU regardless of locales.