Groq
Status: active
High-speed LLM inference powered by LPU hardware. The free tier includes Llama, Kimi, Qwen, and more, with generous rate limits.
Base URL: https://api.groq.com/openai/v1
Rate Limits: 30 RPM / 1,000 RPD
Sign-up Required: Yes
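With a 30 RPM default, bursty clients will hit HTTP 429 responses. A minimal retry sketch with exponential backoff is shown below; the `RateLimitError` class and the fake endpoint are stand-ins for illustration, not part of any Groq SDK.

```python
import time

class RateLimitError(Exception):
    """Stand-in for an HTTP 429 (Too Many Requests) response."""

def with_backoff(call, max_retries=5, base_delay=2.0):
    """Retry `call`, doubling the wait each time the rate limit is hit."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            time.sleep(base_delay * (2 ** attempt))

# Demo: a fake endpoint that rejects the first two calls, then succeeds.
calls = {"n": 0}
def fake_request():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return "ok"

print(with_backoff(fake_request, base_delay=0.01))  # → ok
```

In a real client, the same wrapper would go around the HTTP call itself; if the API returns a Retry-After header, honoring it is preferable to a fixed backoff.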
Models (14)
allam-2-7b — CTX: 4K · 30 RPM
groq/compound — CTX: 128K · 30 RPM
groq/compound-mini — CTX: 128K · 30 RPM
llama-3.1-8b-instant — CTX: 128K · 30 RPM
llama-3.3-70b-versatile — CTX: 128K · 30 RPM
meta-llama/llama-4-scout-17b-16e-instruct — CTX: 10M · 30 RPM
meta-llama/llama-prompt-guard-2-22m — CTX: 512 · 30 RPM
meta-llama/llama-prompt-guard-2-86m — CTX: 512 · 30 RPM
moonshotai/kimi-k2-instruct — CTX: 128K · 60 RPM
moonshotai/kimi-k2-instruct-0905 — CTX: 128K · 60 RPM
openai/gpt-oss-120b — CTX: 128K · 30 RPM
openai/gpt-oss-20b — CTX: 128K · 30 RPM
openai/gpt-oss-safeguard-20b — CTX: 128K · 30 RPM
qwen/qwen3-32b — CTX: 128K · 60 RPM
Quick Start
export API_KEY="your_api_key_here"

curl https://api.groq.com/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [
      {"role": "user", "content": "Hello! How are you?"}
    ]
  }'
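Because the base URL exposes an OpenAI-compatible API, the same request works from any OpenAI-style client. A minimal stdlib-only Python sketch of the curl call above (it reads the key from the same API_KEY environment variable and only sends the request when a key is set):

```python
import json
import os
import urllib.request

# Same chat-completion payload as the curl example.
payload = {
    "model": "llama-3.3-70b-versatile",
    "messages": [{"role": "user", "content": "Hello! How are you?"}],
}

api_key = os.environ.get("API_KEY")  # set via `export API_KEY=...`
req = urllib.request.Request(
    "https://api.groq.com/openai/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    },
)

# Send only when a key is actually configured.
if api_key:
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    print(body["choices"][0]["message"]["content"])
```

The official `openai` or `groq` Python packages offer the same call with less boilerplate; this sketch just avoids extra dependencies.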