DeepInfra publishes pay-as-you-go pricing for both model APIs and hardware. Current official examples include Qwen3.6-35B-A3B at $0.20 input and $1.00 output per 1M tokens, GLM-5.1 at $1.40 input and $4.40 output per 1M tokens, DGX B300 at $4.20 per instance-hour, and DeepCluster from $1.98 per GPU-hour.
- The official homepage explicitly describes DeepInfra's pricing as low and pay-as-you-go, with no long-term contracts and no hidden fees.
- Current public model examples on the site include Qwen3.6-35B-A3B at $0.20 per 1M input tokens and $1.00 per 1M output tokens.
- The same pricing surface lists GLM-5.1 at $1.40 per 1M input tokens and $4.40 per 1M output tokens, alongside speech examples like Inworld TTS at $25 or $50 per 1M characters depending on tier.
- For infrastructure, the site currently shows on-demand DGX B300 at $4.20 per instance-hour and DeepCluster starting at $1.98 per GPU-hour on a 5-year term.
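To make the per-token and per-hour figures concrete, here is a minimal Python sketch of the arithmetic. The prices are copied from the examples above and may be out of date. The function names `estimate_token_cost` and `estimate_hourly_cost` are hypothetical helpers for illustration and are not part of any DeepInfra API or SDK. The sketch also assumes billing is strictly linear in tokens and hours, which the source does not confirm.

```python
# Prices in USD per 1M tokens, copied from the examples listed above.
# They are illustrative only and may have changed on the official site.
PRICES_PER_MILLION = {
    "Qwen3.6-35B-A3B": {"input": 0.20, "output": 1.00},
    "GLM-5.1": {"input": 1.40, "output": 4.40},
}


def estimate_token_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a workload, assuming linear per-token billing."""
    price = PRICES_PER_MILLION[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


def estimate_hourly_cost(rate_per_hour: float, hours: float, units: int = 1) -> float:
    """Estimate the USD cost of hourly-billed hardware (instances or GPUs)."""
    return rate_per_hour * hours * units


if __name__ == "__main__":
    # 2M input tokens and 500k output tokens on Qwen3.6-35B-A3B.
    print(f"${estimate_token_cost('Qwen3.6-35B-A3B', 2_000_000, 500_000):.2f}")
    # One DGX B300 instance for 24 hours at the on-demand rate of $4.20/hour.
    print(f"${estimate_hourly_cost(4.20, 24):.2f}")
```

Under these assumptions, 2M input tokens plus 500k output tokens on Qwen3.6-35B-A3B comes to about $0.40 + $0.50 = $0.90, and one on-demand DGX B300 instance running for 24 hours comes to about $100.80. The DeepCluster figure of $1.98 per GPU-hour is a starting price tied to a 5-year term, so it should not be plugged in as an on-demand rate.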
Pricing and packaging may change; verify on the official site.