Best Open-Source NLP Libraries

A curated list of the most popular GitHub repositories tagged with nlp. Select any project to visualize its architecture and dive into the codebase using RepoMind's AI engine.

#1 huggingface/transformers

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

156,780 Python

#2 hiyouga/LlamaFactory

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

67,420 Python

#3 microsoft/AI-For-Beginners

12 Weeks, 24 Lessons, AI for All!

45,408 Jupyter Notebook

#4 apachecn/ailearning

AiLearning: Data Analysis + Machine Learning in Action + Linear Algebra + PyTorch + NLTK + TF2

42,037 Python

#5 google-research/bert

TensorFlow code and pre-trained models for BERT

39,871 Python

#6 hankcs/HanLP

Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named Entity Recognition, Syntactic & Semantic Dependency Parsing, Document Classification

36,138 Python

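#7 666ghj/BettaFish

微舆 (Weiyu): a multi-agent public opinion analysis assistant that anyone can use. It breaks through information cocoons, restores the true picture of public opinion, predicts future trends, and supports decision-making. Implemented from scratch, with no dependency on any framework.

35,654 Python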

#8 google/langextract

A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization.

33,427 Python

#9 explosion/spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python

33,228 Python

#10 ashishpatel26/500-AI-Machine-learning-Deep-learning-Computer-vision-NLP-Projects-with-code

500 AI Machine learning Deep learning Computer vision NLP Projects with code

31,785

#11 stanford-oval/storm

An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.

27,914 Python

#12 deepset-ai/haystack

Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and agent workflows with explicit control over retrieval, routing, memory, and generation. Built for scalable agents, RAG, multimodal applications, semantic search, and conversational systems.

24,250 MDX

#13 lukasmasuch/best-of-ml-python

🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.

23,240

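#14 HqWu-HITCS/Awesome-Chinese-LLM

A collection of open-source Chinese large language models, focusing on smaller models that can be privately deployed and are cheaper to train. Includes base models, domain-specific fine-tuning and applications, datasets, tutorials, and more.

22,245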

#15 microsoft/unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

22,031 Python

#16 huggingface/datasets

🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools

21,200 Python

#17 RasaHQ/rasa

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

21,057 Python

#18 ymcui/Chinese-LLaMA-Alpaca

Chinese LLaMA & Alpaca large language models + local CPU/GPU training and deployment

18,962 Python

#19 AI4Finance-Foundation/FinGPT

FinGPT: Open-Source Financial Large Language Models! Revolutionize 🔥 We release the trained model on HuggingFace.

18,630 Jupyter Notebook

#20 keon/awesome-nlp

📖 A curated list of resources dedicated to Natural Language Processing (NLP)

18,207

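#21 NLP-LOVE/ML-NLP

This project covers the knowledge points and code implementations commonly tested in machine learning, deep learning, and NLP interviews, which are also the theoretical fundamentals every algorithm engineer must know.

17,487 Jupyter Notebook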

#22 graykode/nlp-tutorial

Natural Language Processing Tutorial for Deep Learning Researchers

14,855 Jupyter Notebook

#23 NVIDIA/DeepLearningExamples

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

14,732 Jupyter Notebook

#24 botpress/botpress

The open-source hub to build & deploy GPT/LLM Agents ⚡️

14,558 TypeScript

#25 nltk/nltk

NLTK Source

14,519 Python

#26 flairNLP/flair

A very simple framework for state-of-the-art Natural Language Processing (NLP)

14,359 Python

#27 virgili0/Virgilio

Your new Mentor for Data Science E-Learning.

14,314 Jupyter Notebook

#28 Unstructured-IO/unstructured

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking and embedding.

14,014 HTML

#29 memvid/memvid

Memory layer for AI Agents. Replace complex RAG pipelines with a serverless, single-file memory layer. Give your agents instant retrieval and long-term memory.

13,175 Rust

#30 PaddlePaddle/PaddleNLP

Easy-to-use and powerful LLM and SLM library with awesome model zoo.

12,913 Python

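#31 tangyudi/Ai-Learn

An AI learning roadmap that compiles nearly 200 hands-on cases and projects, with free companion course materials, taking learners from zero background to job-ready practice. Covers popular areas including Python, mathematics, machine learning, data analysis, deep learning, computer vision, natural language processing, PyTorch tensorflow machine-learning, deep-learning data-analysis data-mining mathematics data-science artificial-intelligence python tensorflow tensorflow2 caffe keras pytorch algorithm numpy pandas matplotlib seaborn nlp cv, and more.

12,642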

#32 dair-ai/ML-Papers-of-the-Week

🔥 Highlighting the top ML papers every week.

12,241

#33 neuml/txtai

💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows

12,192 Python

#34 bigscience-workshop/petals

🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading

9,953 Python

#35 brightmart/nlp_chinese_corpus

Large Scale Chinese Corpus for NLP

9,854

#36 openvinotoolkit/openvino

OpenVINO™ is an open source toolkit for optimizing and deploying AI inference

9,732 C++

#37 jadore801120/attention-is-all-you-need-pytorch

A PyTorch implementation of the Transformer model in "Attention is All You Need".

9,629 Python

#38 sloria/TextBlob

Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.

9,515 Python

#39 modelscope/modelscope

ModelScope: bring the notion of Model-as-a-Service to life.

8,720 Python

#40 deeppavlov/DeepPavlov

An open source library for deep learning end-to-end dialog systems and chatbots.

6,968 Python

#41 PaddlePaddle/models

Officially maintained, supported by PaddlePaddle, including CV, NLP, Speech, Rec, TS, big models and so on.

6,948 Python

#42 amitness/learning

A log of things I'm learning

6,812

#43 clovaai/donut

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022

6,788 Python

#44 MycroftAI/mycroft-core

Mycroft Core, the Mycroft Artificial Intelligence platform.

6,618 Python

#45 axa-group/nlp.js

An NLP library for building bots, with entity extraction, sentiment analysis, automatic language identification, and more

6,553 JavaScript

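#46 NLPchina/ansj_seg

ansj word segmentation: a true Java implementation of ict. Both segmentation quality and speed exceed the open-source version of ict. Chinese word segmentation, person name recognition, part-of-speech tagging, user-defined dictionaries

6,545 Java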

#47 codertimo/BERT-pytorch

Google AI 2018 BERT pytorch implementation

6,518 Python