2026-01-01 03:40:41 +08:00

@maigolabs/needle

Fuzzy search engine for small text pieces, with Chinese/Japanese pronunciation support.

See also the in-browser demo.

Install

Dictionaries are installed as dependencies of the package. If you don't use the indexer, your bundler can tree-shake them out.

pnpm install @maigolabs/needle

Usage

Indexing

NeedLe uses Kuromoji for Japanese tokenization, which loads dictionaries dynamically. You need to create a Kuromoji TokenizerBuilder first:

// In Node.js you can just load the dictionary from the file system.

import path from 'node:path';
import url from 'node:url';
import { TokenizerBuilder } from '@patdx/kuromoji';
import NodeDictionaryLoader from '@patdx/kuromoji/node';

const kuromojiDictPath = path.resolve(url.fileURLToPath(import.meta.resolve('@patdx/kuromoji')), '..', '..', 'dict');
const kuromoji = await new TokenizerBuilder({ loader: new NodeDictionaryLoader({ dic_path: kuromojiDictPath }) }).build();

// In the browser you need to provide a custom loader that fetches the dictionary files with fetch().

import { TokenizerBuilder } from '@patdx/kuromoji';

// You can load the dict files from a CDN (see also the README of https://github.com/patdx/kuromoji.js)
const kuromoji = await new TokenizerBuilder({
  loader: {
    loadArrayBuffer: async (url: string) => {
      url = `https://cdn.jsdelivr.net/npm/@aiktb/kuromoji@1.0.2/dict/${url.replace('.gz', '')}`;
      const res = await fetch(url);
      if (!res.ok) throw new Error(`Failed to fetch ${url}`);
      return await res.arrayBuffer();
    },
  },
}).build();
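
The inline loader above could also be factored into a small reusable helper. The following is only a sketch: `createCachedDictLoader` and `FetchLike` are hypothetical names, not part of `@patdx/kuromoji`, and the sketch assumes that the `loader` option needs nothing beyond the `loadArrayBuffer(url)` method shown in the example. It adds an in-memory cache so that repeated requests for the same dictionary file reuse one download.

```typescript
// Minimal shape of the fetch function this sketch relies on (hypothetical type).
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  arrayBuffer(): Promise<ArrayBuffer>;
}>;

// Hypothetical helper: returns an object with the same loadArrayBuffer(url)
// shape as the inline loader above, plus a per-URL in-memory cache.
function createCachedDictLoader(baseUrl: string, fetchImpl: FetchLike) {
  const cache = new Map<string, Promise<ArrayBuffer>>();
  return {
    loadArrayBuffer(file: string): Promise<ArrayBuffer> {
      // Same '.gz' stripping as in the inline loader above.
      const url = `${baseUrl}/${file.replace('.gz', '')}`;
      let pending = cache.get(url);
      if (!pending) {
        pending = fetchImpl(url).then((res) => {
          if (!res.ok) throw new Error(`Failed to fetch ${url} (HTTP ${res.status})`);
          return res.arrayBuffer();
        });
        // Forget failed requests so that a later call can retry.
        pending.catch(() => cache.delete(url));
        cache.set(url, pending);
      }
      return pending;
    },
  };
}
```

Under these assumptions, the result of `createCachedDictLoader('https://cdn.jsdelivr.net/npm/@aiktb/kuromoji@1.0.2/dict', fetch)` would be passed as the `loader` option in place of the inline object. Taking the fetch function as a parameter also makes the loader easy to exercise with a stub in tests.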

After creating the Kuromoji instance, you can build the inverted index:

import { buildInvertedIndex } from '@maigolabs/needle/indexer';

const documents = ['你好世界', 'こんにちは'];
const compressedIndex = buildInvertedIndex(documents, { kuromoji });

// The built index can be serialized and stored for later use.
const json = JSON.stringify(compressedIndex);
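
One way to store the serialized index is to write it to a file at build time and read it back later. The sketch below is not part of the package: `saveIndex` and `readIndex` are hypothetical helpers, and the only assumption they make about the index is that it survives a `JSON.stringify` / `JSON.parse` round trip, as the snippet above suggests.

```typescript
import { readFile, writeFile } from 'node:fs/promises';

// Hypothetical helper: serialize the compressed index and write it to disk.
async function saveIndex(filePath: string, compressedIndex: unknown): Promise<void> {
  await writeFile(filePath, JSON.stringify(compressedIndex), 'utf8');
}

// Hypothetical helper: read the file back and parse it. The return type is
// `unknown` because the exact index type is not spelled out here.
async function readIndex(filePath: string): Promise<unknown> {
  return JSON.parse(await readFile(filePath, 'utf8'));
}
```

The parsed value would then presumably be handed to `loadInvertedIndex` in place of `compressedIndex`, with a type assertion if the package exports a type for it. In a browser, the same JSON could be fetched over HTTP instead of read from disk.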

Searching

If you only import the searcher in your frontend code, indexer and dictionary-related dependencies will be tree-shaken.

import { loadInvertedIndex, searchInvertedIndex } from '@maigolabs/needle/searcher';

const loadedIndex = loadInvertedIndex(compressedIndex);
const results = searchInvertedIndex(loadedIndex, 'sekai');
for (const result of results) console.log(`${result.documentText} (${(result.matchRatio * 100).toFixed(0)}%)`);
// → 你好世界 (50%)

To highlight matches in the search results, see highlightSearchResult.
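
The search example prints every result. In a UI it is often useful to keep only the strongest matches. The sketch below relies on just the two fields that appear in that example, `documentText` and `matchRatio` (assumed to lie between 0 and 1, since the example formats it as a percentage). `ResultLike`, `topMatches` and `formatMatch` are hypothetical names, and the real result objects may carry more fields. Whether `searchInvertedIndex` already returns results in ranked order is not stated here, so the helper sorts them explicitly.

```typescript
// Only the fields used in the search example are modeled (hypothetical type).
interface ResultLike {
  documentText: string;
  matchRatio: number;
}

// Hypothetical helper: keep results at or above minRatio, best match first,
// and return at most `limit` of them.
function topMatches<T extends ResultLike>(results: Iterable<T>, minRatio = 0.5, limit = 10): T[] {
  return Array.from(results)
    .filter((r) => r.matchRatio >= minRatio)
    .sort((a, b) => b.matchRatio - a.matchRatio)
    .slice(0, limit);
}

// Same formatting as the console.log call in the search example.
function formatMatch(r: ResultLike): string {
  return `${r.documentText} (${(r.matchRatio * 100).toFixed(0)}%)`;
}
```

With these helpers, `topMatches(results).map(formatMatch)` would yield display strings for the best matches only. The default threshold of 0.5 is an arbitrary choice for illustration.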