# `@maigolabs/needle`
Fuzzy search engine for small text pieces, with Chinese/Japanese pronunciation support.
See also [in-browser demo](https://needle.maigo.dev).
## Install
The dictionaries are installed as dependencies of the package, but if you don't use the indexer, your bundler can tree-shake them away.
```bash
pnpm install @maigolabs/needle
```
## Usage
### Indexing
NeedLe uses Kuromoji for Japanese tokenization, which loads dictionaries dynamically. You need to create a Kuromoji `TokenizerBuilder` first:
```ts
// In Node.js, you can load the dictionary directly from the file system.
import path from 'node:path';
import url from 'node:url';
import { TokenizerBuilder } from '@patdx/kuromoji';
import NodeDictionaryLoader from '@patdx/kuromoji/node';
// Resolve the `dict` directory shipped inside the @patdx/kuromoji package.
const kuromojiDictPath = path.resolve(url.fileURLToPath(import.meta.resolve('@patdx/kuromoji')), '..', '..', 'dict');
const kuromoji = await new TokenizerBuilder({ loader: new NodeDictionaryLoader({ dic_path: kuromojiDictPath }) }).build();

// In the browser, use this instead of the Node.js snippet above:
// provide a custom loader that fetches the dictionary files with fetch().
import { TokenizerBuilder } from '@patdx/kuromoji';
// You can load the dict files from a CDN (see also the README of https://github.com/patdx/kuromoji.js).
const kuromoji = await new TokenizerBuilder({
  loader: {
    loadArrayBuffer: async (url: string) => {
      url = `https://cdn.jsdelivr.net/npm/@aiktb/kuromoji@1.0.2/dict/${url.replace('.gz', '')}`;
      const res = await fetch(url);
      if (!res.ok) throw new Error(`Failed to fetch ${url}`);
      return await res.arrayBuffer();
    },
  },
}).build();
```
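If the tokenizer may be built more than once on the same page, it can help to avoid downloading the same dictionary file twice. The following is a minimal sketch of an in-memory cache around a loader with the `loadArrayBuffer` shape shown above. The `DictionaryLoader` interface and the `withMemoryCache` helper are hypothetical names for illustration, not part of `@patdx/kuromoji` or `@maigolabs/needle`.

```typescript
// Loader shape taken from the browser example; the interface name is an assumption.
interface DictionaryLoader {
  loadArrayBuffer(url: string): Promise<ArrayBuffer>;
}

// Wraps a loader so that each dictionary file is requested at most once.
// Caching the promise (not the buffer) also deduplicates concurrent requests.
function withMemoryCache(inner: DictionaryLoader): DictionaryLoader {
  const cache = new Map<string, Promise<ArrayBuffer>>();
  return {
    loadArrayBuffer(url: string): Promise<ArrayBuffer> {
      let pending = cache.get(url);
      if (!pending) {
        pending = inner.loadArrayBuffer(url).catch((err) => {
          cache.delete(url); // do not cache failures, so a later retry can succeed
          throw err;
        });
        cache.set(url, pending);
      }
      return pending;
    },
  };
}
```

Under these assumptions, the object passed as `loader` in the browser example could be wrapped as `withMemoryCache({ loadArrayBuffer: ... })` before it is handed to `TokenizerBuilder`.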
After creating the Kuromoji instance, you can build the inverted index:
```ts
import { buildInvertedIndex } from '@maigolabs/needle/indexer';
const documents = ['你好世界', 'こんにちは'];
const compressedIndex = buildInvertedIndex(documents, { kuromoji });
// The built index can be serialized to JSON and stored for later use.
const json = JSON.stringify(compressedIndex);
```
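Since the built index survives `JSON.stringify`, one way to reuse it is to write it to disk at build time and read it back later. The sketch below assumes a Node.js environment; `saveIndex` and `readIndex` are hypothetical helper names and are not provided by `@maigolabs/needle`.

```typescript
import { readFileSync, writeFileSync } from 'node:fs';

// Serializes any JSON-compatible value (such as the compressed index) to a file.
function saveIndex(filePath: string, index: unknown): void {
  writeFileSync(filePath, JSON.stringify(index), 'utf8');
}

// Reads the file back and parses it. The caller chooses the expected type,
// because this sketch does not know the shape of the compressed index.
function readIndex<T = unknown>(filePath: string): T {
  return JSON.parse(readFileSync(filePath, 'utf8')) as T;
}
```

For example, a build script might call `saveIndex('index.json', compressedIndex)`, and the parsed value returned by `readIndex('index.json')` could then be passed to `loadInvertedIndex` as described in the next section.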
### Searching
If you only import the searcher in your frontend code, the indexer and dictionary-related dependencies will be tree-shaken.
```ts
import { loadInvertedIndex, searchInvertedIndex } from '@maigolabs/needle/searcher';
// `compressedIndex` is the output of buildInvertedIndex (for example, restored with JSON.parse).
const loadedIndex = loadInvertedIndex(compressedIndex);
const results = searchInvertedIndex(loadedIndex, 'sekai');
for (const result of results) console.log(`${result.documentText} (${(result.matchRatio * 100).toFixed(0)}%)`);
// → 你好世界 (50%)
```
To highlight matches in a search result, see also `highlightSearchResult`.
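
When only the best matches should be shown, the results can be post-processed by `matchRatio`. The following sketch relies only on the two fields used in the example above (`documentText` and `matchRatio`). The `ScoredResult` interface and the `topResults` helper are hypothetical, and the sketch makes no assumption about whether `searchInvertedIndex` already returns sorted results.

```typescript
// Only the two fields used in the search example; the real result type may carry more.
interface ScoredResult {
  documentText: string;
  matchRatio: number;
}

// Keeps results whose match ratio is at least `minRatio`, best matches first,
// and returns no more than `limit` of them.
function topResults<T extends ScoredResult>(results: Iterable<T>, minRatio = 0.5, limit = 10): T[] {
  return Array.from(results)
    .filter((r) => r.matchRatio >= minRatio)
    .sort((a, b) => b.matchRatio - a.matchRatio)
    .slice(0, limit);
}
```

For instance, `topResults(results, 0.5, 5)` would keep at most five results that match at least half of the query, under the assumption that `matchRatio` ranges from 0 to 1 as the percentage formatting above suggests.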