NVIDIA NIM
Status
active
NVIDIA's free DGX Cloud catalog: 100+ open models (DeepSeek V4, Llama, Nemotron, Kimi, GLM, gpt-oss, Qwen) accessible through an OpenAI-compatible endpoint.
https://integrate.api.nvidia.com/v1
Avg Latency
—
90-day Uptime
—%
Rate Limits
40 RPM (requests per minute) / RPD (requests per day) not listed
Sign-up Required
Yes
Models (15)
meta/llama-3.3-70b-instruct
CTX: 128K · 40 RPM
meta/llama-3.1-70b-instruct
CTX: 128K · 40 RPM
meta/llama-3.1-8b-instruct
CTX: 128K · 40 RPM
meta/llama-4-maverick-17b-128e-instruct
CTX: 128K · 40 RPM
deepseek-ai/deepseek-v4-pro
CTX: 164K · 40 RPM
deepseek-ai/deepseek-v4-flash
CTX: 164K · 40 RPM
nvidia/llama-3.1-nemotron-ultra-253b-v1
CTX: 128K · 40 RPM
nvidia/llama-3.3-nemotron-super-49b-v1.5
CTX: 128K · 40 RPM
nvidia/nemotron-3-super-120b-a12b
CTX: 128K · 40 RPM
moonshotai/kimi-k2.6
CTX: 256K · 40 RPM
openai/gpt-oss-120b
CTX: 128K · 40 RPM
openai/gpt-oss-20b
CTX: 128K · 40 RPM
qwen/qwen3-coder-480b-a35b-instruct
CTX: 262K · 40 RPM
z-ai/glm-5.1
CTX: 200K · 40 RPM
mistralai/mixtral-8x22b-instruct-v0.1
CTX: 64K · 40 RPM
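Every model above is listed at 40 RPM, which works out to at most one request every 60 / 40 = 1.5 seconds if requests are spaced evenly. A minimal sketch of client-side pacing follows; the helper names `min_interval_ms` and `throttle` are hypothetical, and whether the limit applies per model or per account is not stated in the listing.

```shell
# Hypothetical pacing helpers; not part of any NVIDIA tooling.
# 40 RPM is the limit shown in the listing above.
RPM_LIMIT=40

# Print the minimum gap between requests, in milliseconds, for a given RPM.
min_interval_ms() {
  local rpm="$1"
  echo $(( 60000 / rpm ))
}

# Pause for one interval. Fractional sleep is supported by GNU and BSD sleep,
# but is not guaranteed by POSIX.
throttle() {
  local ms
  ms="$(min_interval_ms "$RPM_LIMIT")"
  sleep "$(printf '%d.%03d' $(( ms / 1000 )) $(( ms % 1000 )))"
}
```

Calling `throttle` between consecutive requests in a loop keeps a single client under the listed rate; several clients sharing one key would need to split the budget between them.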
Quick Start
export API_KEY="your_api_key_here"
curl https://integrate.api.nvidia.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{
    "model": "meta/llama-3.3-70b-instruct",
    "messages": [
      {"role": "user", "content": "Hello! How are you?"}
    ]
  }'
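With a 40 RPM limit, a burst of requests may be rejected. The sketch below wraps the same request in a retry loop with exponential backoff. It assumes rejected requests come back as HTTP errors (commonly 429 on OpenAI-compatible APIs, though the listing does not confirm this), and the names `chat`, `retry_with_backoff`, and `BACKOFF_START` are hypothetical.

```shell
# Hypothetical helpers; names and retry policy are illustrative only.

# Send one chat completion request; $1 is the JSON request body.
# --fail makes curl exit non-zero on HTTP errors (4xx/5xx) so the caller can retry.
chat() {
  curl --fail --silent --show-error \
    https://integrate.api.nvidia.com/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $API_KEY" \
    -d "$1"
}

# Run a command up to $1 times, doubling the pause after each failure.
# BACKOFF_START (seconds, default 2) is an assumed starting delay.
retry_with_backoff() {
  local max_attempts="$1"
  shift
  local delay="${BACKOFF_START:-2}"
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    sleep "$delay"
    delay=$(( delay * 2 ))
    attempt=$(( attempt + 1 ))
  done
}

# Usage (not run here):
#   retry_with_backoff 4 chat '{"model": "meta/llama-3.3-70b-instruct", "messages": [{"role": "user", "content": "Hello!"}]}'
```

This simple version retries on any failure, including an invalid key, so a stricter variant would inspect the HTTP status code and retry only on rate-limit or server errors.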