// marketplace_data.json · updated 2026-02-25

The State of Inference,
by the numbers.

MOST DEPLOYED ARCHITECTURE

Transformer (Decoder-Only)

Q1 2026 · 61.4% of deployments

2,412
Models Indexed
+184 added this month
Median Cost Decline
Inference cost drop over 12 months
Avg Params This Quarter
Most-deployed architecture size
Teams Reduced Cost
vs. training from scratch
INFERENCE_COST_INDEX · 12-month trend, Feb 2025 to Feb 2026 · ↓ 67% YoY

// model_comparison · live data

Compare Models. Understand the Numbers.

8 models

Model                     Type       Size      Accuracy  Cost/mo  Latency  Memory
Qwen2 72B Instruct        open       72B       92.1%     $380     48ms     42 GB
Llama-3 70B Instruct      open       70B       91.2%     $340     42ms     40 GB
DeepSeek-V2 Chat          open       236B MoE  90.3%     $95      22ms     24 GB
Gemma 2 9B IT             fine-tune  9B        85.1%     $65      12ms     10 GB
GPT-4o Mini               api        -         84.2%     $18      4ms      -
Mistral 7B Instruct v0.3  fine-tune  7B        83.4%     $48      8ms      8 GB
Claude 3 Haiku            api        -         82.9%     $28      6ms      -
Phi-3 Mini 3.8B           fine-tune  3.8B      79.8%     $22      5ms      4 GB

For ML Teams at Mid-Stage Startups

You need 90th-percentile accuracy at 10th-percentile cost. Models in the 3.8B to 13B range, such as Mistral 7B, Phi-3 Mini, and Gemma 2 9B, hit that sweet spot. Fine-tune on your domain data, serve on a single A10G, and keep your inference bill under $200/month. The 70B models are overkill unless your task genuinely needs them.

RECOMMENDATION

Recommended: Mistral 7B Instruct · Phi-3 Mini

< $100/mo
typical inference cost
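One way to apply this rule of thumb to the comparison table is to filter by hosting type, monthly budget, and GPU memory, then rank by accuracy. The sketch below hard-codes the eight table rows; the `Model` dataclass, the hypothetical `shortlist` helper, reading the memory column as gigabytes, and the 24 GB default (the memory of a single A10G) are all illustrative assumptions, not a marketplace API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Model:
    name: str
    kind: str                   # "open", "fine-tune", or "api", as labeled in the table
    accuracy: float             # benchmark accuracy, percent
    cost_per_month: float       # USD per month
    memory_gb: Optional[float]  # assumed GB; None for hosted API models


# Rows copied from the model comparison table.
MODELS: List[Model] = [
    Model("Qwen2 72B Instruct", "open", 92.1, 380, 42),
    Model("Llama-3 70B Instruct", "open", 91.2, 340, 40),
    Model("DeepSeek-V2 Chat", "open", 90.3, 95, 24),
    Model("Gemma 2 9B IT", "fine-tune", 85.1, 65, 10),
    Model("GPT-4o Mini", "api", 84.2, 18, None),
    Model("Mistral 7B Instruct v0.3", "fine-tune", 83.4, 48, 8),
    Model("Claude 3 Haiku", "api", 82.9, 28, None),
    Model("Phi-3 Mini 3.8B", "fine-tune", 79.8, 22, 4),
]


def shortlist(models: List[Model], budget: float = 100.0,
              gpu_memory_gb: float = 24.0, kind: str = "fine-tune") -> List[Model]:
    """Hypothetical helper: models of the given kind that cost less than
    `budget` per month and fit in one GPU's memory, best accuracy first."""
    fits = [
        m for m in models
        if m.kind == kind
        and m.cost_per_month < budget
        and m.memory_gb is not None
        and m.memory_gb <= gpu_memory_gb
    ]
    return sorted(fits, key=lambda m: m.accuracy, reverse=True)


for m in shortlist(MODELS):
    print(f"{m.name}: {m.accuracy}% at ${m.cost_per_month:.0f}/mo")
```

With these defaults the filter returns Gemma 2 9B, Mistral 7B, and Phi-3 Mini, in that order; tightening `budget` to 50 leaves only the two recommended models.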

Read the Full Benchmark Report

42-page analysis · Q1 2026 · Free with email

// case_studies · production deployments

Deployed in Production.
Numbers don't lie.

View all 47 case studies →

Meridian Analytics

FinTech · Series B

Cost Reduction

MODEL_DEPLOYED

Mistral 7B Instruct → Fine-tuned

Replaced the GPT-4 API call made on every transaction classification with a fine-tuned Mistral 7B. Domain accuracy held at 96.1%, only 0.3 percentage points below GPT-4, at 22× lower cost.

LATENCY

340ms → 38ms

COST

$4,200/mo → $190/mo
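The 22× figure follows directly from the before/after numbers. A quick check, using an illustrative `reduction_factor` helper that simply divides the old value by the new one:

```python
def reduction_factor(before: float, after: float) -> float:
    """How many times smaller `after` is than `before`."""
    return before / after


cost_x = reduction_factor(4200, 190)   # monthly cost, USD
latency_x = reduction_factor(340, 38)  # latency, ms
print(f"cost: {cost_x:.1f}x lower, latency: {latency_x:.1f}x lower")
# prints "cost: 22.1x lower, latency: 8.9x lower"
```

The same arithmetic puts the latency improvement at roughly 9×.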

Vantage Robotics

Robotics · Seed

Edge Deployment

MODEL_DEPLOYED

Phi-3 Mini 3.8B (edge-quantized)

Quantized Phi-3 Mini to INT4 and deployed it on-device, eliminating the cloud round trip entirely. Edge inference runs in 2 GB of RAM, with no connectivity required on the factory floor.

LATENCY

1,200ms → 9ms

COST

$890/mo → $0/mo
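Why INT4 matters for a 2 GB budget: a back-of-envelope estimate of weight storage for a 3.8B-parameter model at different precisions. This is a sketch that counts weights only (it assumes 1 GB = 1e9 bytes and ignores activations, KV cache, and quantization scale factors), so real memory use runs somewhat higher.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).
    Ignores activations, KV cache, and quantization metadata such as scales."""
    return num_params * bits_per_weight / 8 / 1e9


PHI3_MINI_PARAMS = 3.8e9  # 3.8B parameters

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {weight_memory_gb(PHI3_MINI_PARAMS, bits):.1f} GB")
# 16-bit: 7.6 GB
#  8-bit: 3.8 GB
#  4-bit: 1.9 GB
```

At 16 bits the weights alone would need about 7.6 GB; quantizing to 4 bits cuts that to roughly 1.9 GB.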

Helix Research Lab

BioML · Academic

Training Avoided

MODEL_DEPLOYED

DeepSeek-V2 Chat (MoE)

Abandoned a six-month training run after discovering that DeepSeek-V2 scored 2.1% higher on their protein annotation benchmark. Saved $62K in compute and shipped three months early.

LATENCY

2,800ms → 24ms

COST

Training: $62,000 → $95/mo
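To put the one-off training cost and the recurring inference bill on the same scale, divide one by the other. This simple sketch assumes the $95/mo serving cost stays flat over time.

```python
TRAINING_COST_USD = 62_000      # abandoned training run, from the case study
INFERENCE_COST_PER_MONTH = 95   # DeepSeek-V2 Chat serving cost, from the case study

# Months of inference the avoided training spend would have paid for.
months = TRAINING_COST_USD / INFERENCE_COST_PER_MONTH
print(f"{months:.0f} months (~{months / 12:.0f} years) of inference")
# prints "653 months (~54 years) of inference"
```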

Your stack. Your budget. Your benchmark.

2,412 models indexed · Updated daily · No account required to browse

2,412 models ready to deploy — no training compute required.