Best Open Source NLP Libraries
A curated list of the most popular GitHub repositories tagged with nlp. Select any project to visualize its architecture and dive into the codebase using RepoMind's AI engine.
#1 huggingface/transformers
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
#2 hiyouga/LlamaFactory
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
#3 microsoft/AI-For-Beginners
12 Weeks, 24 Lessons, AI for All!
#4 apachecn/ailearning
AiLearning: Data Analysis + Machine Learning in Action + Linear Algebra + PyTorch + NLTK + TF2
#5 google-research/bert
TensorFlow code and pre-trained models for BERT
#6 hankcs/HanLP
Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named Entity Recognition, Syntactic & Semantic Dependency Parsing, Document Classification
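#7 666ghj/BettaFish
微舆 (Weiyu): a multi-agent public opinion analysis assistant that anyone can use. It breaks through information cocoons, restores the full picture of public opinion, predicts future trends, and supports decision-making. Implemented from scratch, with no dependency on any framework.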
#8 google/langextract
A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization.
#9 explosion/spaCy
💫 Industrial-strength Natural Language Processing (NLP) in Python
#10 ashishpatel26/500-AI-Machine-learning-Deep-learning-Computer-vision-NLP-Projects-with-code
500 AI, machine learning, deep learning, computer vision, and NLP projects with code
#11 stanford-oval/storm
An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
#12 deepset-ai/haystack
Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and agent workflows with explicit control over retrieval, routing, memory, and generation. Built for scalable agents, RAG, multimodal applications, semantic search, and conversational systems.
#13 lukasmasuch/best-of-ml-python
🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.
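#14 HqWu-HITCS/Awesome-Chinese-LLM
A collection of open-source Chinese large language models, focusing on smaller models that can be deployed privately and trained at low cost. Includes base models, domain-specific fine-tunes and applications, datasets, and tutorials.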
#15 microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
#16 huggingface/datasets
🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools
#17 RasaHQ/rasa
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
#18 ymcui/Chinese-LLaMA-Alpaca
Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment
#19 AI4Finance-Foundation/FinGPT
FinGPT: Open-Source Financial Large Language Models! 🔥 The trained model is released on Hugging Face.
#20 keon/awesome-nlp
📖 A curated list of resources dedicated to Natural Language Processing (NLP)
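#21 NLP-LOVE/ML-NLP
This project covers the knowledge points and code implementations commonly tested in machine learning, deep learning, and NLP interviews, which are also the theoretical fundamentals every algorithm engineer should know.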
#22 graykode/nlp-tutorial
Natural Language Processing Tutorial for Deep Learning Researchers
#23 NVIDIA/DeepLearningExamples
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
#24 botpress/botpress
The open-source hub to build & deploy GPT/LLM Agents ⚡️
#25 nltk/nltk
NLTK Source
#26 flairNLP/flair
A very simple framework for state-of-the-art Natural Language Processing (NLP)
#27 virgili0/Virgilio
Your new Mentor for Data Science E-Learning.
#28 Unstructured-IO/unstructured
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.
#29 memvid/memvid
Memory layer for AI Agents. Replace complex RAG pipelines with a serverless, single-file memory layer. Give your agents instant retrieval and long-term memory.
#30 PaddlePaddle/PaddleNLP
Easy-to-use and powerful LLM and SLM library with awesome model zoo.
#31 tangyudi/Ai-Learn
An AI learning roadmap that collects nearly 200 hands-on cases and projects, with free accompanying course materials, taking beginners from zero to job-ready practice. Covers popular areas including Python, mathematics, machine learning, data analysis, data mining, data science, deep learning, computer vision, natural language processing, algorithms, PyTorch, TensorFlow, TensorFlow 2, Caffe, Keras, NumPy, pandas, Matplotlib, and seaborn.
#32 dair-ai/ML-Papers-of-the-Week
🔥Highlighting the top ML papers every week.
#33 neuml/txtai
💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows
#34 bigscience-workshop/petals
🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
#35 brightmart/nlp_chinese_corpus
Large Scale Chinese Corpus for NLP
#36 openvinotoolkit/openvino
OpenVINO™ is an open source toolkit for optimizing and deploying AI inference
#37 jadore801120/attention-is-all-you-need-pytorch
A PyTorch implementation of the Transformer model in "Attention is All You Need".
#38 sloria/TextBlob
Simple, Pythonic text processing: sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
#39 modelscope/modelscope
ModelScope: bring the notion of Model-as-a-Service to life.
#40 deeppavlov/DeepPavlov
An open source library for deep learning end-to-end dialog systems and chatbots.
#41 PaddlePaddle/models
Models officially maintained and supported by PaddlePaddle, covering CV, NLP, speech, recommendation, time series, large models, and more.
#42 amitness/learning
A log of things I'm learning
#43 clovaai/donut
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
#44 MycroftAI/mycroft-core
Mycroft Core, the Mycroft Artificial Intelligence platform.
#45 axa-group/nlp.js
An NLP library for building bots, with entity extraction, sentiment analysis, automatic language identification, and more
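#46 NLPchina/ansj_seg
Ansj word segmentation: a true Java implementation of ICT. Its segmentation quality and speed both surpass the open-source version of ICT. Supports Chinese word segmentation, person name recognition, part-of-speech tagging, and user-defined dictionaries.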
#47 codertimo/BERT-pytorch
PyTorch implementation of Google AI's 2018 BERT