infinilabs / analysis-ik
🚌 The IK Analysis plugin integrates the Lucene IK analyzer into Elasticsearch and OpenSearch and supports customized dictionaries.
Repository Overview (README excerpt)
IK Analysis for Elasticsearch and OpenSearch
==================================

The IK Analysis plugin integrates the Lucene IK analyzer and supports customized dictionaries. It supports major versions of Elasticsearch and OpenSearch. Maintained and supported with ❤️ by INFINI Labs.

The plugin comprises analyzer: , , and tokenizer: ,

How to Install
-------

🚀 **Get the Packaged Plugins**

You can download the packaged plugins from here: **https://release.infinilabs.com/**

---

🛠️ **Install via CLI**

Alternatively, you can use the CLI to install the plugin. Here's how:

For Elasticsearch:

For OpenSearch:

---

⚠️ **Tip**

Make sure to replace the version number with the one that matches your Elasticsearch or OpenSearch version.

Getting Started
-------

1. Create an index
2. Create a mapping
3. Index some docs
4. Query with highlighting

Result

Dictionary Configuration
-------

Config file can be located at or

Hot-reload Dictionary
-------

The plugin supports hot reloading of the IK Analysis dictionary through the configuration mentioned earlier in the IK configuration file. Among which refers to a URL, such as . The request only needs to meet the following two conditions for the segmentation hot update to work.

• The HTTP response needs to return two headers, one is Last-Modified, and the other is ETag. Both are of string type, and if either changes, the plugin fetches the new word list and updates its dictionary.
• The content returned by the HTTP request is one word per line, and the newline character is represented by .

Meeting these two requirements enables hot word updates without restarting the ES instance.

You can place the hot words that need to be automatically updated in a UTF-8 encoded .txt file and serve it from nginx or another simple HTTP server. When the .txt file is modified, the HTTP server automatically returns the corresponding Last-Modified and ETag when the client requests the file.
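The change check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the plugin's own code: the helper names `read_validators`, `needs_reload`, and `parse_word_list` are hypothetical, and the sketch assumes the word list is plain UTF-8 without a byte-order mark.

```python
from typing import List, Mapping, Optional, Tuple

# Hypothetical helper names for illustration; this is not the plugin's own code.
Validators = Tuple[Optional[str], Optional[str]]


def read_validators(headers: Mapping[str, str]) -> Validators:
    """Return (Last-Modified, ETag) from response headers, matched case-insensitively."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return lowered.get("last-modified"), lowered.get("etag")


def needs_reload(previous: Optional[Validators], current: Validators) -> bool:
    """Reload on the first check, or when either header value has changed."""
    return previous is None or previous != current


def parse_word_list(body: bytes) -> List[str]:
    """Decode a UTF-8 body with one word per line, skipping blank lines."""
    text = body.decode("utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    seen = read_validators({"Last-Modified": "Mon, 01 Jan 2024 00:00:00 GMT", "ETag": '"v1"'})
    fresh = read_validators({"last-modified": "Mon, 01 Jan 2024 00:00:00 GMT", "etag": '"v2"'})
    if needs_reload(seen, fresh):
        # Only the ETag differs here, which is already enough to trigger a reload.
        print(parse_word_list("中华\n国歌\n".encode("utf-8")))
```

Both headers are compared as opaque strings, mirroring the statement that a change in either one triggers a refresh. A real poller would obtain the headers and body from the configured URL; that part is left out because the URL and polling interval are not given here.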
You can also create a separate tool to extract relevant vocabulary from the business system and update this .txt file.

FAQs
-------

• Why isn't the custom dictionary taking effect?

Please ensure that your custom dictionary file is UTF-8 encoded.

• What is the difference between ik_max_word and ik_smart?

ik_max_word: Performs the finest-grained segmentation of the text. For example, it segments "中华人民共和国国歌" into "中华人民共和国,中华人民,中华,华人,人民共和国,人民,人,民,共和国,共和,和,国国,国歌", exhaustively generating the possible combinations, which makes it suitable for Term Query.

ik_smart: Performs the coarsest-grained segmentation of the text. For example, it segments "中华人民共和国国歌" into "中华人民共和国,国歌", which makes it suitable for Phrase queries.

Note: ik_smart is not a subset of ik_max_word.

Community
-------

Feel free to join the Discord server to discuss anything around this project:

https://discord.gg/4tKTMkkvVX

License
-------

Copyright ©️ INFINI Labs.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.