Tensor — Find, Compare & Deploy ML Models Instantly

// marketplace_data.json · updated 2026-02-25

The State of Inference,
by the numbers.

MOST DEPLOYED ARCHITECTURE

Transformer (Decoder-Only)

Q1 2026 · 61.4% of deployments

Models Indexed · +184 added this month

Median Cost Decline · inference cost drop over 12 months

Avg Params This Quarter · most-deployed architecture size

Teams Reduced Cost · vs. training from scratch

INFERENCE_COST_INDEX · 12-MONTH TREND (Feb 2025 to Feb 2026) · ↓ 67% YoY

// model_comparison · live data

Compare Models. Understand the Numbers.

8 models

MODEL · TYPE · SIZE · ACCURACY · COST · LATENCY · MEMORY
Qwen2 72B Instruct · open · 72B · 92.1% · $380 · 48ms · 42G
Llama-3 70B Instruct · open · 70B · 91.2% · $340 · 42ms · 40G
DeepSeek-V2 Chat · open · 236B MoE · 90.3% · $95 · 22ms · 24G
Gemma 2 9B IT · fine-tune · 9B · 85.1% · $65 · 12ms · 10G
GPT-4o Mini · api · API · 84.2% · $18 · 4ms · —
Mistral 7B Instruct v0.3 · fine-tune · 7B · 83.4% · $48 · 8ms · —
Claude 3 Haiku · api · API · 82.9% · $28 · 6ms · —
Phi-3 Mini 3.8B · fine-tune · 3.8B · 79.8% · $22 · 5ms · —
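One way to read the comparison is to keep only the models that no other model beats on both accuracy and cost (a Pareto frontier). The Python sketch below copies the accuracy and cost values from the comparison and filters them. It treats cost as a plain number because the comparison does not state its unit, and the `Model`, `dominates`, and `pareto_frontier` names are illustrative, not part of any Tensor API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    accuracy: float  # accuracy column, in percent
    cost: float      # cost column, in whatever unit the comparison uses


# Rows copied from the model comparison.
MODELS = [
    Model("Qwen2 72B Instruct", 92.1, 380),
    Model("Llama-3 70B Instruct", 91.2, 340),
    Model("DeepSeek-V2 Chat", 90.3, 95),
    Model("Gemma 2 9B IT", 85.1, 65),
    Model("GPT-4o Mini", 84.2, 18),
    Model("Mistral 7B Instruct v0.3", 83.4, 48),
    Model("Claude 3 Haiku", 82.9, 28),
    Model("Phi-3 Mini 3.8B", 79.8, 22),
]


def dominates(a: Model, b: Model) -> bool:
    """True if a is at least as accurate and as cheap as b, and strictly better on one axis."""
    return (a.accuracy >= b.accuracy and a.cost <= b.cost
            and (a.accuracy > b.accuracy or a.cost < b.cost))


def pareto_frontier(models: list[Model]) -> list[Model]:
    """Models that no other model dominates, sorted from cheapest to most expensive."""
    frontier = [m for m in models if not any(dominates(o, m) for o in models)]
    return sorted(frontier, key=lambda m: m.cost)


if __name__ == "__main__":
    for m in pareto_frontier(MODELS):
        print(f"{m.name:<24} {m.accuracy:5.1f}%  ${m.cost:g}")
```

On these eight rows the frontier contains GPT-4o Mini, Gemma 2 9B IT, DeepSeek-V2 Chat, Llama-3 70B Instruct, and Qwen2 72B Instruct. The filter sees only two columns; memory footprint, license type, and whether a model can be fine-tuned and self-hosted are outside it.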

Explore Models Free

For ML Teams at Mid-Stage Startups

You need 90th-percentile accuracy at 10th-percentile cost. Models in the 4B–13B range, such as Phi-3 Mini, Mistral 7B, and Gemma 2 9B, hit that sweet spot. Fine-tune on your domain data, serve on a single A10G, and keep your inference bill under $200/month. The 70B models are overkill unless your task genuinely needs them.
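Whether a model fits on a single A10G comes down mostly to weight memory: parameters times bytes per parameter. Here is a rough Python check under stated assumptions: 16-bit weights (2 bytes per parameter), the nominal parameter counts in the model names (real checkpoints are slightly larger), and an illustrative rule that weights should use at most 80% of the A10G's 24 GB so the rest is left for the KV cache and activations. `weight_memory_gb`, `fits`, and the 80% budget are hypothetical, not a Tensor tool.

```python
A10G_MEMORY_GB = 24.0  # NVIDIA A10G ships with 24 GB of GPU memory

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

# Assumption: keep weights under 80% of GPU memory; the rest goes to KV cache and activations.
WEIGHT_BUDGET_FRACTION = 0.8


def weight_memory_gb(params_billion: float, precision: str = "fp16") -> float:
    """Approximate size of the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[precision]


def fits(params_billion: float, precision: str = "fp16",
         gpu_gb: float = A10G_MEMORY_GB) -> bool:
    """True if the weights fit within the assumed budget on one GPU."""
    return weight_memory_gb(params_billion, precision) <= gpu_gb * WEIGHT_BUDGET_FRACTION


if __name__ == "__main__":
    # Nominal parameter counts taken from the model names.
    for name, params in [("Phi-3 Mini", 3.8), ("Mistral 7B", 7.0),
                         ("Gemma 2 9B", 9.0), ("Llama-3 70B", 70.0)]:
        print(f"{name:<12} {weight_memory_gb(params):6.1f} GB at fp16, "
              f"fits one A10G: {fits(params)}")
```

Under these assumptions the 3.8B, 7B, and 9B models fit at 16-bit precision (about 7.6, 14, and 18 GB of weights), while a 70B model needs about 140 GB, and still about 35 GB at INT4.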

RECOMMENDATION

Recommended: Mistral 7B Instruct · Phi-3 Mini

< $100/mo typical inference cost

Read the Full Benchmark Report

42-page analysis · Q1 2026 · Free with email

// case_studies · production deployments

Deployed in Production.
Numbers don't lie.

View all 47 case studies →

Meridian Analytics

FinTech · Series B

Cost Reduction

MODEL_DEPLOYED

Mistral 7B Instruct → Fine-tuned

Replaced the GPT-4 API call behind every transaction classification with a fine-tuned Mistral 7B. Domain accuracy held at 96.1%, just 0.3 points below GPT-4, at 22× lower cost.
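The Meridian figures can be checked directly from the before/after numbers on the card. This Python snippet recomputes the ratios. It reads the 0.3 accuracy gap as percentage points, which is an assumption because the card does not say whether the gap is absolute or relative. The helper names are illustrative.

```python
def reduction_factor(before: float, after: float) -> float:
    """How many times smaller the 'after' value is than the 'before' value."""
    return before / after


def percent_reduction(before: float, after: float) -> float:
    """Share of the original value that was removed, in percent."""
    return 100.0 * (before - after) / before


# Before/after values from the Meridian Analytics card.
COST_BEFORE, COST_AFTER = 4200, 190      # $/mo: GPT-4 API vs fine-tuned Mistral 7B
LATENCY_BEFORE, LATENCY_AFTER = 340, 38  # ms
MISTRAL_ACCURACY = 96.1                  # % on the domain task
GAP_POINTS = 0.3                         # assumed to be percentage points

if __name__ == "__main__":
    print(f"cost: {reduction_factor(COST_BEFORE, COST_AFTER):.1f}x lower "
          f"({percent_reduction(COST_BEFORE, COST_AFTER):.1f}% saved)")
    print(f"latency: {reduction_factor(LATENCY_BEFORE, LATENCY_AFTER):.1f}x lower")
    print(f"implied GPT-4 accuracy: {MISTRAL_ACCURACY + GAP_POINTS:.1f}%")
```

This gives a 22.1× cost reduction (about 95.5% saved), consistent with the card's 22× figure, an 8.9× latency reduction, and an implied GPT-4 accuracy of 96.4%.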

LATENCY

340ms → 38ms

COST

$4,200/mo → $190/mo

Vantage Robotics

Robotics · Seed

Edge Deployment

MODEL_DEPLOYED

Phi-3 Mini 3.8B (edge-quantized)

Quantized Phi-3 Mini to INT4 and deployed it on-device, eliminating the cloud round trip entirely. Edge inference runs in 2GB of RAM, with no connectivity required on the factory floor.
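The card does not say how Phi-3 Mini was quantized. As an illustration only, here is plain symmetric round-to-nearest INT4 quantization in Python with a single scale per tensor, plus the weight-memory arithmetic for 3.8B parameters at 4 bits each. Production INT4 toolchains typically use per-group scales and calibration data, so treat `quantize_int4`, `dequantize_int4`, and `int4_weight_gb` as hypothetical helpers, not Vantage's pipeline.

```python
def quantize_int4(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric round-to-nearest INT4: one scale for the whole tensor, codes in [-8, 7]."""
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude to +/-7; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int4(codes: list[int], scale: float) -> list[float]:
    """Recover approximate weights from the 4-bit codes."""
    return [q * scale for q in codes]


def int4_weight_gb(params_billion: float) -> float:
    """Packed INT4 weights (4 bits = 0.5 bytes each), ignoring scale storage."""
    return params_billion * 4 / 8


if __name__ == "__main__":
    w = [0.7, -0.35, 0.0, 0.1, -0.62]
    codes, scale = quantize_int4(w)
    print("codes:", codes, "scale:", round(scale, 3))
    print("restored:", [round(x, 3) for x in dequantize_int4(codes, scale)])
    print(f"Phi-3 Mini 3.8B at INT4: ~{int4_weight_gb(3.8):.1f} GB of weights")
```

Rounding keeps each restored weight within half a scale step of the original. At 4 bits per parameter, 3.8B weights come to about 1.9 GB, just under the 2GB figure on the card; the KV cache and runtime overhead come on top of that.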

LATENCY

1,200ms → 9ms

COST

$890/mo → $0/mo

Helix Research Lab

BioML · Academic

Training Avoided

MODEL_DEPLOYED

DeepSeek-V2 Chat (MoE)

Abandoned a six-month training run after discovering that DeepSeek-V2 scored 2.1% higher on their protein annotation benchmark. Saved $62K in compute and shipped three months early.

LATENCY

2,800ms → 24ms

COST

Training: $62,000 → $95/mo

Your stack. Your budget. Your benchmark.

2,412 models indexed · Updated daily · No account required to browse

Read Benchmark Report · Explore Models Free
