Spare Relay Protocol

Cheaper AI, within reach

200+ open source and proprietary models at up to 90% off. Powered by a distributed network of idle compute. One API for everything.

01 · The problem

AI compute is taxed three times.

Silicon → hyperscaler → API reseller → you. Meanwhile the real supply — your GPU, your unused subscription quota, your expiring credits — is sitting idle.

  • 3×+ markup between silicon and end-user API
  • 200M+ idle AI subscriptions + personal GPUs
  • 18h daily idle time per GPU on average
  • $0.02 electricity per hour on idle Apple Silicon
02 · Protocol Architecture

The third path for AI compute.
Not hyperscalers. A protocol.

One protocol routes requests to four idle supply tiers — local GPUs, unused API credits, startup credits, or spare ChatGPT / Claude / Gemini subscriptions. No reseller markup, no data-center tax.

203 Models · 4 Supply tiers · OSS Open protocol

Consumer: Any OpenAI SDK · curl · Claude Code · Cursor
POST /v1/chat/completions

Protocol: SpareAI Relay Hub (SRP · Spare Relay Protocol)

  • T1 · GPU · Local GPU · Ollama / vLLM / LM Studio
  • T2 · KEY · API Key · Anthropic / OpenAI / Google
  • T3 · $ · Startup Credits · YC / AWS / Microsoft
  • T4 · SUB · Subscription · Claude Max / ChatGPT Pro / Gemini Ultra

Four supply paths · one open protocol · one API
03 · OpenAI-compatible API

Change one line. Everything else works.

Native support across every major SDK. Streaming, function calling, tool use, multimodal — all preserved. Claude Code, Cursor, OpenAI / Anthropic SDKs: point them here and go.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.spareai.net/v1",
    api_key="sk-spareai-…",
)

response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)

for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")
Streaming — SSE, OpenAI format
Function calling & tool use
Native OpenAI / Anthropic SDK support
Claude / GPT / Gemini / 200+ open source
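
The feature list above mentions function calling and tool use. As a rough sketch, a tool-enabled request could be assembled as below, assuming the hub accepts the standard OpenAI `tools` schema unchanged. The `get_weather` tool and the `SPAREAI_API_KEY` environment variable are hypothetical names used only for illustration. The request is built but not sent.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.spareai.net/v1"  # same base URL as the snippet above


def build_tool_call_request(api_key: str, model: str, question: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request that offers one tool."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical tool, not part of SpareAI
                    "description": "Look up the current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# SPAREAI_API_KEY is an assumed variable name; use whatever holds your key.
req = build_tool_call_request(
    os.environ.get("SPAREAI_API_KEY", ""),
    "claude-sonnet-4-6",
    "What is the weather in Lagos?",
)
```

Passing `req` to `urllib.request.urlopen` would send it. With the OpenAI SDK, the same `tools` list goes straight into `client.chat.completions.create(...)`.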
04 · Buyers

Why SpareAI

Save 80%+

Claude, GPT, Gemini at 10-20% of official pricing. Open source models even cheaper.

One API, every model

OpenAI-compatible endpoint. One line to switch. Works with Claude Code, Cursor, any OpenAI SDK client.

Available everywhere

Credit card or crypto. No region locks. Developers in the Global South can afford the best AI too.

Reliable

Multi-provider automatic failover. Failed requests are never billed. Uptime on par with commercial APIs.

05 · Model catalog

200+ models, one endpoint

Open source and proprietary models, all through a single OpenAI-compatible API

Model                      Input/1M   Output/1M
Seed 1.6                   $0.25      $2.00
Command A                  $2.50      $10.00
DeepSeek V3                $0.32      $0.89
DeepSeek V3 0324           $0.20      $0.77
DeepSeek V3.1 Terminus     $0.21      $0.79
Gemma 4 26B A4B IT         $0.08      $0.35
Gemma 4 31B IT             $0.13      $0.38
Llama 4 Maverick           $0.15      $0.60
Llama 4 Scout              $0.08      $0.30
MiniMax 01                 $0.20      $1.10
Mistral Small 4            $0.15      $0.60
Mistral Large 3 2512       $0.50      $1.50
Kimi K2 0905               $0.40      $2.00
Hermes 3 Llama 3.1 70B     $0.30      $0.30
Nemotron 3 Nano 30B A3B    $0.05      $0.20
Qwen 3 Coder Flash         $0.20      $0.97
Qwen 3 Coder Plus          $0.65      $3.25
Qwen 3.5 Flash             $0.07      $0.26
MiMo V2 Pro                $1.00      $3.00
GLM 4.6                    $0.39      $1.90

203+ models available through the Spare Relay Protocol

06 · Cost comparison

Same model. Up to 90% off the official price.

Idle compute has near-zero marginal cost, so the savings pass through. No subscription, no minimum, pure per-token billing.

Model                                        SpareAI in   SpareAI out   Official         Savings
claude-sonnet-4-6 (Anthropic flagship)       $0.60        $3.0          $3.0 / $15.0     −80%
claude-opus-4-6 (Anthropic reasoning)        $1.0         $5.0          $5.0 / $25.0     −80%
gpt-5.4 (OpenAI frontier coding)             $0.50        $3.0          $2.5 / $15.0     −80%
gemini-3-pro (Google multimodal)             $0.40        $2.4          $2.0 / $12.0     −80%
qwen-3.5-122b-moe (Open · 10B active MoE)    $0.10        $0.30         $0.50 / $1.5     −80%
minimax-m2.5-239b (Open · SOTA coding)       $0.08        $0.24         $0.40 / $1.2     −80%

Prices per million tokens (USD). Buyers pay 20% of official; providers keep 90% of buyer payment; 10% covers protocol operations.
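
To make the split concrete, here is a small worked calculation based only on the percentages stated above (buyer pays 20% of official, provider keeps 90% of that, 10% goes to the protocol). The function name and return shape are illustrative assumptions, not part of the protocol.

```python
from decimal import Decimal

BUYER_SHARE = Decimal("0.20")     # buyer pays 20% of the official price
PROVIDER_SHARE = Decimal("0.90")  # provider keeps 90% of the buyer payment


def settle(official_in: str, official_out: str, tokens_in: int, tokens_out: int) -> dict:
    """Split one request's cost; prices are USD per million tokens."""
    million = Decimal(1_000_000)
    official = (Decimal(official_in) * tokens_in + Decimal(official_out) * tokens_out) / million
    buyer = official * BUYER_SHARE
    provider = buyer * PROVIDER_SHARE
    return {
        "official": official,
        "buyer_pays": buyer,
        "provider_earns": provider,
        "protocol_fee": buyer - provider,  # the remaining 10%
    }


# claude-sonnet-4-6 from the table: official $3.0 in / $15.0 out.
split = settle("3.0", "15.0", tokens_in=1_000_000, tokens_out=1_000_000)
for label, amount in split.items():
    print(f"{label}: ${amount:.2f}")
```

For one million input and one million output tokens, the official cost is $18.00, the buyer pays $3.60, the provider earns $3.24, and the protocol fee is $0.36.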

  • −80% buyer savings vs official API pricing
  • 200+ models covered: closed + open, one API
  • 0 subscriptions: pay-as-you-go, no minimum
07 · How it works

01

Providers connect capacity

GPU rigs, API keys, or idle subscriptions — one command to plug in any compute resource.

02

Protocol routes requests

Requests auto-match to the best provider — lowest price, lowest latency, automatic failover.

03

Pay per token, earn automatically

Buyers pay per token. Providers earn 90% of every request. Settled in USD or crypto.
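
The routing step can be sketched as follows. This is a minimal illustration that assumes providers are ranked by price first and latency second, with failover to the next candidate on error. The actual scoring used by the hub is not described here, and `Provider`, `rank`, and `route` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Provider:
    name: str
    price_per_mtok: float  # USD per million tokens
    latency_ms: float      # recent measured latency


def rank(providers: Iterable[Provider]) -> List[Provider]:
    """Order candidates by lowest price, breaking ties by lowest latency."""
    return sorted(providers, key=lambda p: (p.price_per_mtok, p.latency_ms))


def route(providers: Iterable[Provider], send: Callable[[Provider], str]) -> Tuple[str, str]:
    """Try providers in ranked order; fail over to the next one on any error."""
    errors = []
    for provider in rank(providers):
        try:
            return provider.name, send(provider)
        except Exception as exc:  # a failed attempt is skipped, not returned
            errors.append(f"{provider.name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def fake_send(provider: Provider) -> str:
    """Stand-in for a real upstream call; the cheapest provider is offline."""
    if provider.name == "cheap-but-down":
        raise ConnectionError("offline")
    return f"served by {provider.name}"


pool = [
    Provider("slow", 0.20, 400.0),
    Provider("cheap-but-down", 0.10, 300.0),
    Provider("fast", 0.20, 80.0),
]
winner, reply = route(pool, fake_send)
```

Here the cheapest provider fails, so the request falls through to `fast`, which ties with `slow` on price but has lower latency.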

09 · Supply tiers

Four ways to provide

Local GPU

Run Llama, DeepSeek, Qwen on your hardware. Set your price, earn per token.

Fully Legal

API Key

Forward requests through your own Anthropic/OpenAI key. Keep the margin.

Fully Legal

Enterprise Credits

Unused AI credits from accelerators or cloud programs? Turn them into cash.

Coming Soon

Subscription

Claude Max / ChatGPT Pro / Gemini Ultra quota sitting idle? Put it to work.

ToS Risk

Subscription sharing may violate the provider's terms of service. Use at your own risk. GPU and API key forwarding are fully compliant.

08 · Run a node

Turn your idle GPU or subscription into monthly income.

One command detects what you already have, registers it on-chain, and starts serving. No manual model lists, no port forwarding, no cloud account.

  • Zero deps · Node 18+
  • Background daemon auto-reconnects
  • stop / logs / status anytime
terminal · provider setup
$ npx spareai relay setup

› hub      hub.spareai.net
› wallet   0x7a…e31f registered
› probe    ok — 142 ms

# daemon up. earnings land in your wallet.
REQUIRES Node 18+ · any OS · no port-forwarding
Revenue split · enforced on-chain
90%
To the provider

Paid on-chain via PaySplitter to whoever's GPU, API key, credits, or subscription served the request.

10%
Protocol fee

Keeps the hub, scoring, settlement, and anti-fraud running. Flat. No tiers, no seats, no minimum.

10 · Protocol & source

This is a protocol, not a SaaS.

The Spare Relay Protocol (SRP) spec is open. Anyone can run their own Hub, their own provider, their own client. We just ship a reference implementation.

  • Open spec: WebSocket handshake, auth, routing, attestation, settlement — all public
  • Reference client spareai-cli is open; fork it to target your own hub
  • T1–T4 supply abstraction — protocol is hardware-agnostic, any OpenAI-compatible endpoint plugs in
  • No single-company lock-in — hubs can be swapped, clients can be re-implemented
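
One way to picture the T1–T4 abstraction from the list above: every supply source, whatever its tier, reduces to an OpenAI-compatible base URL plus a model list. The sketch below is an assumption about how a client or hub might model that. `Tier` and `SupplyEndpoint` are hypothetical names and are not taken from the SRP spec.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Tier(Enum):
    T1_GPU = "Local GPU"
    T2_KEY = "API Key"
    T3_CREDITS = "Startup Credits"
    T4_SUB = "Subscription"


@dataclass(frozen=True)
class SupplyEndpoint:
    tier: Tier
    base_url: str            # any OpenAI-compatible endpoint
    models: Tuple[str, ...]  # model ids this endpoint can serve

    def chat_url(self) -> str:
        """Every tier is reached through the same chat completions path."""
        return self.base_url.rstrip("/") + "/chat/completions"


# Example: a local Ollama server, which exposes an OpenAI-compatible API under /v1.
# The model id is a placeholder for whatever is pulled locally.
local_gpu = SupplyEndpoint(Tier.T1_GPU, "http://localhost:11434/v1/", ("llama3",))
```

Because the router only sees `chat_url()` and `models`, swapping a GPU rig for a forwarded API key would change the data, not the code path.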

Fully open source

SpareAI is the reference implementation of Spare Relay Protocol. Open protocol, open code, open to everyone.