cheahjs / free-llm-api-resources

A list of free LLM inference resources accessible via API.

16,076 stars
1,600 forks
35 issues
Python

AI Architecture Analysis

This repository is indexed by RepoMind. By analyzing cheahjs/free-llm-api-resources in our AI interface, you can instantly generate complete architecture diagrams, visualize control flows, and perform automated security audits across the entire codebase.

Our Agentic Context Augmented Generation (Agentic CAG) engine loads full source files into context on-demand, avoiding the fragmentation of traditional RAG systems. Ask questions about the architecture, dependencies, or specific features to see it in action.

To optimize performance, source files are loaded only when you start an analysis.

Embed this Badge

Showcase RepoMind's analysis directly in your repository's README.

[![Analyzed by RepoMind](https://img.shields.io/badge/Analyzed%20by-RepoMind-4F46E5?style=for-the-badge)](https://repomind.in/repo/cheahjs/free-llm-api-resources)

Repository Overview (README excerpt)

Free LLM API resources

This lists various services that provide free access or credits towards API-based LLM usage.

> [!NOTE]
> Please don't abuse these services, else we might lose them.

> [!WARNING]
> This list explicitly excludes any services that are not legitimate (e.g., ones that reverse engineer an existing chatbot).

• Free Providers
  • OpenRouter
  • Google AI Studio
  • NVIDIA NIM
  • Mistral (La Plateforme)
  • Mistral (Codestral)
  • HuggingFace Inference Providers
  • Vercel AI Gateway
  • OpenCode Zen
  • Cerebras
  • Groq
  • Cohere
  • GitHub Models
  • Cloudflare Workers AI
• Providers with trial credits
  • Fireworks
  • Baseten
  • Nebius
  • Novita
  • AI21
  • Upstage
  • NLP Cloud
  • Alibaba Cloud (International) Model Studio
  • Modal
  • Inference.net
  • Hyperbolic
  • SambaNova Cloud
  • Scaleway Generative APIs

Free Providers

OpenRouter

**Limits:** 20 requests/minute, 50 requests/day, up to 1000 requests/day with $10 lifetime topup

Models share a common quota.

• Gemma 3 12B Instruct
• Gemma 3 27B Instruct
• Gemma 3 4B Instruct
• Hermes 3 Llama 3.1 405B
• Llama 3.2 3B Instruct
• Llama 3.3 70B Instruct
• Mistral Small 3.1 24B Instruct
• arcee-ai/trinity-large-preview:free
• arcee-ai/trinity-mini:free
• cognitivecomputations/dolphin-mistral-24b-venice-edition:free
• google/gemma-3n-e2b-it:free
• google/gemma-3n-e4b-it:free
• liquid/lfm-2.5-1.2b-instruct:free
• liquid/lfm-2.5-1.2b-thinking:free
• nvidia/nemotron-3-nano-30b-a3b:free
• nvidia/nemotron-nano-12b-v2-vl:free
• nvidia/nemotron-nano-9b-v2:free
• openai/gpt-oss-120b:free
• openai/gpt-oss-20b:free
• qwen/qwen3-4b:free
• qwen/qwen3-coder:free
• qwen/qwen3-next-80b-a3b-instruct:free
• stepfun/step-3.5-flash:free
• z-ai/glm-4.5-air:free

Google AI Studio

Data is used for training when used outside of the UK/CH/EEA/EU.
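Several entries in this list pair a per-minute cap with a per-day cap, such as the 20 requests/minute and 50 requests/day listed for OpenRouter. As a minimal sketch of how a client might stay under such combined limits, the following keeps one sliding window per cap. `MultiWindowLimiter` and its methods are hypothetical names chosen for illustration and are not part of any provider's SDK; providers may also count windows differently (for example, a calendar day rather than a rolling 24 hours), so server-side rate-limit responses remain authoritative.

```python
import time
from collections import deque


class MultiWindowLimiter:
    """Hypothetical client-side limiter enforcing several request caps at once.

    Each cap is a (max_requests, window_seconds) pair tracked as a sliding window.
    """

    def __init__(self, limits, clock=time.monotonic):
        self._limits = []
        for max_requests, window_seconds in limits:
            if max_requests < 1 or window_seconds <= 0:
                raise ValueError("each limit needs max_requests >= 1 and window_seconds > 0")
            # One deque of request timestamps per cap.
            self._limits.append((max_requests, window_seconds, deque()))
        self._clock = clock

    def _delay(self, now):
        """Return seconds to wait until every cap would allow one more request."""
        delay = 0.0
        for max_requests, window, stamps in self._limits:
            # Drop timestamps that have aged out of this window.
            while stamps and now - stamps[0] >= window:
                stamps.popleft()
            if len(stamps) >= max_requests:
                # The oldest remaining request must expire before another is allowed.
                delay = max(delay, stamps[0] + window - now)
        return delay

    def wait_time(self):
        """Seconds until the next request is allowed (0.0 if allowed now)."""
        return self._delay(self._clock())

    def try_acquire(self):
        """Record a request and return True if allowed; otherwise return False."""
        now = self._clock()
        if self._delay(now) > 0:
            return False
        for _, _, stamps in self._limits:
            stamps.append(now)
        return True


# Assumed mapping of the OpenRouter figures above: 20 per 60 s and 50 per 86,400 s.
limiter = MultiWindowLimiter([(20, 60), (50, 86_400)])
```

A caller would check `limiter.try_acquire()` before each request and, when it returns `False`, sleep for `limiter.wait_time()` seconds or defer the work. Passing a custom `clock` makes the limiter easy to exercise in tests without real delays.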
| Model Name | Model Limits |
| --- | --- |
| Gemini 3 Flash | 250,000 tokens/minute, 20 requests/day, 5 requests/minute |
| Gemini 3.1 Flash-Lite | 250,000 tokens/minute, 500 requests/day, 15 requests/minute |
| Gemini 2.5 Flash | 250,000 tokens/minute, 20 requests/day, 5 requests/minute |
| Gemini 2.5 Flash-Lite | 250,000 tokens/minute, 20 requests/day, 10 requests/minute |
| Gemma 3 27B Instruct | 15,000 tokens/minute, 14,400 requests/day, 30 requests/minute |
| Gemma 3 12B Instruct | 15,000 tokens/minute, 14,400 requests/day, 30 requests/minute |
| Gemma 3 4B Instruct | 15,000 tokens/minute, 14,400 requests/day, 30 requests/minute |
| Gemma 3 1B Instruct | 15,000 tokens/minute, 14,400 requests/day, 30 requests/minute |

NVIDIA NIM

Phone number verification required. Models tend to be context window limited.

**Limits:** 40 requests/minute

• Various open models

Mistral (La Plateforme)

• Free tier (Experiment plan) requires opting into data training
• Requires phone number verification.

**Limits (per-model):** 1 request/second, 500,000 tokens/minute, 1,000,000,000 tokens/month

• Open and Proprietary Mistral models

Mistral (Codestral)

• Currently free to use
• Monthly subscription based
• Requires phone number verification

**Limits:** 30 requests/minute, 2,000 requests/day

• Codestral

HuggingFace Inference Providers

HuggingFace Serverless Inference limited to models smaller than 10GB. Some popular models are supported even if they exceed 10GB.

**Limits:** $0.10/month in credits

• Various open models across supported providers

Vercel AI Gateway

Routes to various supported providers.

**Limits:** $5/month

OpenCode Zen

AI gateway with curated models. Free models may use data for improvement.
• Big Pickle Stealth
• MiniMax M2.5 Free
• Arcee Large Preview Free

Cerebras

| Model Name | Model Limits |
| --- | --- |
| gpt-oss-120b | 30 requests/minute, 60,000 tokens/minute, 900 requests/hour, 1,000,000 tokens/hour, 14,400 requests/day, 1,000,000 tokens/day |
| Llama 3.1 8B | 30 requests/minute, 60,000 tokens/minute, 900 requests/hour, 1,000,000 tokens/hour, 14,400 requests/day, 1,000,000 tokens/day |

Groq

| Model Name | Model Limits |
| --- | --- |
| Allam 2 7B | 7,000 requests/day, 6,000 tokens/minute |
| Llama 3.1 8B | 14,400 requests/day, 6,000 tokens/minute |
| Llama 3.3 70B | 1,000 requests/day, 12,000 tokens/minute |
| Llama 4 Maverick 17B 128E Instruct | 1,000 requests/day, 6,000 tokens/minute |
| Llama 4 Scout Instruct | 1,000 requests/day, 30,000 tokens/minute |
| Whisper Large v3 | 7,200 audio-seconds/minute, 2,000 requests/day |
| Whisper Large v3 Turbo | 7,200 audio-seconds/minute, 2,000 requests/day |
| canopylabs/orpheus-arabic-saudi | |
| canopylabs/orpheus-v1-english | |
| groq/compound | 250 requests/day, 70,000 tokens/minute |
| groq/compound-mini | 250 requests/day, 70,000 tokens/minute |
| meta-llama/llama-guard-4-12b | 14,400 requests/day, 15,000 tokens/minute |
| meta-llama/llama-prompt-guard-2-22m | |
| meta-llama/llama-prompt-guard-2-86m | |
| moonshotai/kimi-k2-instruct | 1,000 requests/day, 10,000 tokens/minute |
| moonshotai/kimi-k2-instruct-0905 | 1,000 requests/day, 10,000 tokens/minute |
| openai/gpt-oss-120b | 1,000 requests/day, 8,000 tokens/minute |
| openai/gpt-oss-20b | 1,000 requests/day, 8,000 tokens/minute |
| openai/gpt-oss-safeguard-20b | 1,000 requests/day, 8,000 tokens/minute |
| qwen/qwen3-32b | 1,000 requests/day, 6,000 tokens/minute |

Cohere

**Limits:** 20 requests/minute, 1,000 requests/month

Models share a common monthly quota.

• c4ai-aya-expanse-32b
• c4ai-aya-vision-32b
• command-a-03-2025
• command-a-reasoning-08-2025
• command-a-translate-08-2025
• command-a-vision-07-2025
• command-r-08-2024
• command-r-plus-08-2024
• command-r7b-12-2024
• command-r7b-arabic-02-2025
• tiny-aya-earth
• tiny-aya-fire
• tiny-aya-global
• tiny-aya-water

GitHub Models

Extremely restrictive input/output token limits.
**Limits:** Dependent on Copilot subscription tier (Free/Pro/Pro+/Business/Enterprise)

• AI21 Jamba 1.5 Large
• Codestral 25.01
• Cohere Command A
• Cohere Command R 08-2024
• Cohere Command R+ 08-2024
• DeepSeek-R1
• DeepSeek-R1-0528
• DeepSeek-V3-0324
• Grok 3
• Grok 3 Mini
• Llama 4 Maverick 17B 128E Instruct FP8
• Llama 4 Scout 17B 16E Instruct
• Llama-3.2-11B-Vision-Instruct
• Llama-3.2-90B-Vision-Instruct
• Llama-3.3-70B-Instruct
• MAI-DS-R1
• Meta-Llama-3.1-405B-Instruct
• Meta-Llama-3.1-8B-Instruct
• Ministral 3B
• Mistral Medium 3…