// marketplace_data.json · updated 2026-02-25
The State of Inference, by the numbers.
MOST DEPLOYED ARCHITECTURE
Transformer (Decoder-Only)
Q1 2026 · 61.4% of deployments
// model_comparison · live data
Compare Models. Understand the Numbers.
Qwen2 72B Instruct
Llama-3 70B Instruct
DeepSeek-V2 Chat
Gemma 2 9B IT
GPT-4o Mini
Mistral 7B Instruct v0.3
Claude 3 Haiku
Phi-3 Mini 3.8B
For ML Teams at Mid-Stage Startups
You need 90th-percentile accuracy at 10th-percentile cost. Models under 10B parameters (Mistral 7B, Phi-3 Mini 3.8B, Gemma 2 9B) hit that sweet spot. Fine-tune on your domain data, serve on a single A10G, and keep your inference bill under $200/month. The 70B models are overkill unless your task genuinely needs them.
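A rough way to sanity-check the single-A10G claim is to compare each model's weight memory against the card's 24 GB. The sketch below assumes 16-bit weights (2 bytes per parameter) and uses the nominal parameter counts from the model names. It ignores the KV cache, activations, and framework overhead, so it reserves a hypothetical 20% headroom instead. The names `weight_gb` and `fits_on_gpu` are illustrative, not part of any library.

```python
# Sketch: does a model's fp16 weight footprint fit on one 24 GB A10G?
# Assumptions (hypothetical): nominal parameter counts taken from model names,
# 2 bytes per parameter, and a 20% reserve for KV cache and runtime overhead.

A10G_MEMORY_GB = 24.0
BYTES_PER_PARAM_FP16 = 2
HEADROOM_FRACTION = 0.20  # hypothetical reserve, not a measured value


def weight_gb(params_billion: float, bytes_per_param: float = BYTES_PER_PARAM_FP16) -> float:
    """Weight memory in GB (1 GB = 1e9 bytes): 1e9 params * bytes / 1e9."""
    return params_billion * bytes_per_param


def fits_on_gpu(params_billion: float, gpu_gb: float = A10G_MEMORY_GB) -> bool:
    """True if the weights fit in the GPU memory left after reserving headroom."""
    usable_gb = gpu_gb * (1 - HEADROOM_FRACTION)
    return weight_gb(params_billion) <= usable_gb


if __name__ == "__main__":
    models = {"Mistral 7B": 7.0, "Gemma 2 9B": 9.0, "Phi-3 Mini": 3.8, "Llama-3 70B": 70.0}
    for name, size in models.items():
        print(f"{name}: {weight_gb(size):.1f} GB fp16 weights, fits={fits_on_gpu(size)}")
```

Under these assumptions all three small models fit in the roughly 19 GB left after headroom, while a 70B model needs about 140 GB for fp16 weights alone.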
RECOMMENDATION
Recommended: Mistral 7B Instruct · Phi-3 Mini
Read the Full Benchmark Report
42-page analysis · Q1 2026 · Free with email
// case_studies · production deployments
Deployed in Production.
Numbers don't lie.
View all 47 case studies →
Meridian Analytics
FinTech · Series B
MODEL_DEPLOYED
Mistral 7B Instruct → Fine-tuned
Replaced the GPT-4 API call made on every transaction classification with a fine-tuned Mistral 7B. Domain accuracy held at 96.1%, 0.3 percentage points below GPT-4, at 22× lower cost.
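The headline figures can be unpacked with simple arithmetic. Reading the 0.3% gap as percentage points (an assumption), GPT-4 scored about 96.4% on the same task, a relative gap of roughly 0.31%, and a 22× cost reduction means the fine-tuned model costs about 4.5% of the original per classification. Only the three input figures come from the case study; the function names are hypothetical.

```python
# Sketch: unpack the Meridian Analytics figures. Inputs come from the case study;
# reading the 0.3% gap as percentage points is an assumption.

FINE_TUNED_ACCURACY = 96.1  # % domain accuracy, fine-tuned Mistral 7B
GAP_POINTS = 0.3            # assumed to be percentage points below GPT-4
COST_REDUCTION = 22.0       # "22x lower cost"


def baseline_accuracy(new_acc: float, gap_points: float) -> float:
    """Accuracy of the replaced model implied by a percentage-point gap."""
    return new_acc + gap_points


def relative_gap(new_acc: float, base_acc: float) -> float:
    """Gap expressed as a fraction of the baseline accuracy."""
    return (base_acc - new_acc) / base_acc


def cost_fraction(reduction_factor: float) -> float:
    """New cost as a fraction of the old cost for an N-times reduction."""
    return 1.0 / reduction_factor


if __name__ == "__main__":
    base = baseline_accuracy(FINE_TUNED_ACCURACY, GAP_POINTS)
    print(f"implied GPT-4 accuracy: {base:.1f}%")
    print(f"relative accuracy gap: {relative_gap(FINE_TUNED_ACCURACY, base):.2%}")
    print(f"fine-tuned cost per call: {cost_fraction(COST_REDUCTION):.1%} of GPT-4")
```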
Vantage Robotics
Robotics · Seed
MODEL_DEPLOYED
Phi-3 Mini 3.8B (edge-quantized)
Quantized Phi-3 Mini to INT4 and deployed it on-device, eliminating the network round trip to the cloud. Edge inference runs in 2GB of RAM and needs no connectivity on the factory floor.
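To make "quantized to INT4" concrete: each weight is stored as a 4-bit integer plus a scale shared by its group, and two 4-bit values pack into one byte, so 3.8B parameters need about 1.9 GB for weights. The sketch below shows symmetric per-group quantization with nibble packing in plain Python. It illustrates the general technique only; the case study does not say which toolchain Vantage used, all function names are hypothetical, and real deployments also store the per-group scales and runtime buffers, so actual memory use is somewhat higher.

```python
# Sketch: symmetric INT4 quantization of one weight group, with two 4-bit
# values packed per byte. Illustrative only; all names are hypothetical.
from typing import List, Tuple


def quantize_group(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to signed ints in [-7, 7] using one shared scale."""
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid divide-by-zero for all-zero groups
    scale = max_abs / 7.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale


def pack_int4(q: List[int]) -> bytes:
    """Pack pairs of signed 4-bit ints into bytes (low nibble first)."""
    if len(q) % 2:
        q = q + [0]  # pad to an even count
    out = bytearray()
    for lo, hi in zip(q[0::2], q[1::2]):
        out.append((lo & 0x0F) | ((hi & 0x0F) << 4))
    return bytes(out)


def unpack_int4(data: bytes, count: int) -> List[int]:
    """Inverse of pack_int4: recover signed 4-bit ints (two's complement nibbles)."""
    vals = []
    for b in data:
        for nibble in (b & 0x0F, b >> 4):
            vals.append(nibble - 16 if nibble >= 8 else nibble)
    return vals[:count]


def dequantize(q: List[int], scale: float) -> List[float]:
    """Approximate the original floats from ints and the shared scale."""
    return [v * scale for v in q]


def int4_weight_gb(params_billion: float) -> float:
    """Weight-only footprint at 4 bits (0.5 bytes) per parameter, ignoring scales."""
    return params_billion * 0.5


if __name__ == "__main__":
    w = [0.12, -0.50, 0.33, 0.07, -0.21, 0.49, -0.02, 0.18]
    q, s = quantize_group(w)
    packed = pack_int4(q)
    restored = dequantize(unpack_int4(packed, len(q)), s)
    print(len(w), "weights ->", len(packed), "bytes")
    print("max abs error:", max(abs(a - b) for a, b in zip(w, restored)))
    print("Phi-3 Mini 3.8B INT4 weights, approx GB:", int4_weight_gb(3.8))
```

Rounding to the nearest level bounds the per-weight error at half the group's scale, which is why smaller groups (more scales) trade a little extra memory for accuracy.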
Helix Research Lab
BioML · Academic
MODEL_DEPLOYED
DeepSeek-V2 Chat (MoE)
Abandoned a six-month training run after discovering that DeepSeek-V2 scored 2.1% higher on their protein annotation benchmark. Saved $62K in compute and shipped three months early.
Your stack. Your budget. Your benchmark.
2,412 models indexed · Updated daily · No account required to browse