4.4 Storage Architecture

Hybrid Storage Strategy

Blockchain storage costs prohibit storing rich metadata on-chain.

Ethereum mainnet charges approximately $20-50 per KB at typical gas prices (5-15 gwei). A detailed model card with training configuration, dataset descriptions, and evaluation results might be 10-50 KB, costing $200-$2,500 to store on-chain. Even at the low end of that range, storing rich metadata directly on mainnet is prohibitively expensive. Origyn solves this through hybrid storage: immutable commitments on-chain, full data on IPFS, and an optional permanent backup on Arweave.

On-Chain (Ethereum Layer 2): Store only the 32-byte provenance hash (~$0.01 on Arbitrum). This SHA-256 digest commits to the full model card content. Anyone can verify data integrity by computing the hash of off-chain content and comparing to the on-chain value.

The blockchain provides immutability and temporal ordering: once registered, hashes cannot change, and timestamps establish precedence for IP claims.
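The integrity check described above can be sketched in a few lines of Python. This is a minimal illustration, not Origyn's specified procedure: the function names are hypothetical, and the canonical JSON encoding (sorted keys, compact separators, UTF-8) is an assumption, since the text does not say how a model card is serialized before hashing.

```python
import hashlib
import json


def provenance_hash(model_card: dict) -> str:
    """Return the SHA-256 hex digest of a canonical JSON encoding of the card."""
    # Assumption: sorted keys and compact separators, so that logically equal
    # cards always serialize to the same bytes. The real protocol may differ.
    canonical = json.dumps(
        model_card, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_model_card(model_card: dict, on_chain_hash_hex: str) -> bool:
    """Recompute the digest of off-chain content and compare it to the chain value."""
    expected = on_chain_hash_hex.lower().removeprefix("0x")
    return provenance_hash(model_card) == expected


if __name__ == "__main__":
    card = {"name": "example-model", "parents": [], "license": "apache-2.0"}
    registered = "0x" + provenance_hash(card)  # stands in for the on-chain value
    print(verify_model_card(card, registered))
    print(verify_model_card(dict(card, license="mit"), registered))
```

Any change to the card, however small, produces a different digest, so a mismatch signals that the off-chain copy is not the one that was registered.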

IPFS (InterPlanetary File System): Store full model cards as JSON (~1-10 KB per model). IPFS provides content-addressed storage where data is retrieved by its content identifier (CID), which is derived from a hash of the content itself. The same content always produces the same CID (given the same hashing and chunking settings), ensuring consistency. Pinning services (Filebase, Pinata, Storj) maintain availability for approximately $0.01/month per model.

The protocol can operate pinning nodes to guarantee availability of critical models. Public models benefit from distributed pinning: popular models are pinned by multiple community members, improving resilience.
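To make the content-addressing property concrete, the sketch below derives a CIDv1 for a small payload using only the standard library. It assumes the simplest case: the `raw` codec with a SHA-256 multihash, applied to content that fits in a single block. A default `ipfs add` wraps files in UnixFS and may chunk them, so it can produce a different CID for the same bytes; the function name is hypothetical.

```python
import base64
import hashlib


def raw_cid_v1(content: bytes) -> str:
    """Build a CIDv1 (raw codec, sha2-256) for single-block content."""
    digest = hashlib.sha256(content).digest()
    # CIDv1 byte layout: version (0x01), codec (0x55 = raw),
    # then the multihash: 0x12 = sha2-256, 0x20 = 32-byte digest length.
    cid_bytes = bytes([0x01, 0x55, 0x12, 0x20]) + digest
    # Multibase prefix "b" marks lowercase base32 without padding.
    encoded = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + encoded


if __name__ == "__main__":
    card_bytes = b'{"name":"example-model"}'
    print(raw_cid_v1(card_bytes))
    print(raw_cid_v1(card_bytes) == raw_cid_v1(bytes(card_bytes)))
```

Because the identifier is a pure function of the bytes, any node holding the same content can serve it under the same CID, which is what makes distributed pinning possible.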

Arweave: Store critical provenance records permanently (~$0.0084 per KB, paid once). Arweave's blockweave provides permanent storage backed by a $1+ billion endowment and an economic model designed for long-term viability. High-value models (those generating significant royalty revenue) can justify Arweave storage for their complete lineage.

For a 5 KB model card, permanent storage costs $0.042 once, providing insurance against IPFS pinning failures.

This three-tier system balances cost, availability, and permanence. On-chain hashes provide immutability. IPFS provides affordable availability. Arweave provides permanent preservation. Users can choose their storage level based on model importance.

| Storage Tier | Data Stored | Size | Cost | Permanence | Speed |
| --- | --- | --- | --- | --- | --- |
| On-chain (Ethereum L2) | Model hash, parent IDs, timestamp, creator address | ~200 bytes | $0.10-$0.50 per registration | ✅ Permanent | ⚡ Fast (2-3 sec) |
| IPFS | Full model card, metadata, training config, benchmarks | 1-10 KB | $0.01/month (pinning) | ⚠️ Requires active pinning | 🟡 Medium (1-5 sec) |
| Arweave | Optional backup of critical provenance records | 1-10 KB | $0.0084 per KB, one-time (permanent) | ✅ Permanent (endowment model) | 🟠 Slow (5-30 sec) |

Cost Analysis

On-chain storage at Ethereum mainnet rates is prohibitive.

A single model registration with parent relationships and metadata would cost thousands of dollars. Layer 2 networks reduce this by 10-100x: Arbitrum charges approximately $4/KB effective cost (factoring in data compression in batch submissions), Optimism is similar, and newer networks like Base push toward $1/KB.

Storing only the 32-byte hash on Arbitrum costs approximately $0.01 per registration at current rates. This scales linearly: 100,000 registrations cost $1,000 in on-chain storage, affordable for a protocol treasury funded by registration fees.

IPFS monthly costs depend on pinning service pricing. Filebase charges $5/TB/month, equivalent to $0.000000005 per KB per month (1 TB = 10^9 KB). A 5 KB model card therefore costs about $0.000000025/month, or $0.0000003/year. Even pinning 1 million models (5 GB in total) costs only about $0.30/year at that rate.

Pinning services offer cost-effective redundancy: pinning with 3 providers at similar rates gives triple redundancy for roughly $0.90/year for 1 million models.

Arweave one-time costs currently run approximately $8.40 per MB based on network rates, equivalent to $0.0084 per KB. Storing a 5 KB model card permanently costs $0.042. For high-value models, this one-time payment ensures permanent availability regardless of pinning service reliability.
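The per-tier arithmetic in this section can be reproduced with a small calculator, which is a convenient way to check the unit conversions. The rates are the ones quoted in the text ($0.01 per on-chain registration, $5/TB/month for pinning, $0.0084 per KB on Arweave); the function names and the use of decimal units (1 TB = 10^9 KB) are assumptions made for this sketch.

```python
# Rates quoted in this section; treat them as illustrative, not live prices.
ONCHAIN_USD_PER_REGISTRATION = 0.01   # 32-byte hash on Arbitrum
PINNING_USD_PER_TB_MONTH = 5.0        # Filebase rate cited above
ARWEAVE_USD_PER_KB = 0.0084           # one-time fee

KB_PER_TB = 10**9  # assumption: decimal units


def onchain_cost(registrations: int) -> float:
    """Total on-chain cost; scales linearly with the number of registrations."""
    return registrations * ONCHAIN_USD_PER_REGISTRATION


def pinning_cost_per_year(models: int, card_kb: float, providers: int = 1) -> float:
    """Yearly IPFS pinning cost for `models` cards, replicated across `providers`."""
    total_tb = models * card_kb / KB_PER_TB
    return total_tb * PINNING_USD_PER_TB_MONTH * 12 * providers


def arweave_cost(card_kb: float) -> float:
    """One-time Arweave fee for a single card of `card_kb` kilobytes."""
    return card_kb * ARWEAVE_USD_PER_KB


if __name__ == "__main__":
    print(f"on-chain, 100,000 registrations: ${onchain_cost(100_000):,.2f}")
    print(f"pinning, 1M cards of 5 KB, 3 providers: ${pinning_cost_per_year(1_000_000, 5, 3):.2f}/year")
    print(f"Arweave, one 5 KB card: ${arweave_cost(5):.3f}")
```

Keeping the rates as named constants makes it easy to rerun the estimate when gas prices or provider pricing change.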

The Origyn treasury could subsidize Arweave storage for historically important models (original base models like GPT-3, Llama 2, Stable Diffusion) to guarantee their lineage remains accessible indefinitely.

Redundancy and Pinning Strategy

IPFS availability depends on active pinning.

If all pinners go offline, content becomes unavailable until someone re-pins it. Origyn mitigates this through multi-layered redundancy:

Foundation Pinning: The Origyn Foundation operates geographically distributed IPFS nodes pinning all registered models. This guarantees baseline availability controlled by a mission-aligned entity. Foundation nodes run across multiple cloud providers (AWS, GCP, Azure) and on on-premise servers for resilience against any single provider failure.

Community Incentivized Pinning: Community members earn rewards for pinning high-value models. When a model's royalty revenue exceeds a threshold (e.g., $1,000/year), the protocol allocates a portion to incentivize pinning. Pinners prove availability through challenge-response protocols: validators randomly challenge pinners to provide specific content within time limits.

Successful responses earn rewards. Failed responses result in reward loss. This creates a decentralized pinning network without relying solely on foundation infrastructure.
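A minimal sketch of one challenge round, under stated assumptions: the validator picks a registered hash at random, asks the pinner for the matching content, and accepts the response only if it arrives within the time limit and hashes to the expected value. The function names and the `fetch` callback are hypothetical, and a production protocol would likely add nonces or partial-content proofs so that a pinner cannot simply relay the request to another node.

```python
import hashlib
import secrets
import time
from typing import Callable, Optional, Sequence


def pick_challenge(registered_hashes: Sequence[str]) -> str:
    """Choose which registered model card to challenge, unpredictably."""
    return secrets.choice(registered_hashes)


def run_challenge(
    expected_sha256_hex: str,
    fetch: Callable[[str], Optional[bytes]],
    time_limit_s: float,
) -> bool:
    """Return True if the pinner serves the correct content within the limit."""
    start = time.monotonic()
    content = fetch(expected_sha256_hex)  # the pinner's response, or None
    elapsed = time.monotonic() - start
    if content is None or elapsed > time_limit_s:
        return False  # missing or late response: no reward
    # The content must hash to the value committed on-chain.
    return hashlib.sha256(content).hexdigest() == expected_sha256_hex


if __name__ == "__main__":
    card = b'{"name":"example-model"}'
    digest = hashlib.sha256(card).hexdigest()
    pinned = {digest: card}  # an honest pinner's local store
    target = pick_challenge(list(pinned))
    print(run_challenge(target, pinned.get, time_limit_s=2.0))
```

The reward logic would sit on top of the boolean result: a `True` outcome credits the pinner, a `False` outcome withholds the reward for that round.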

Arweave Backup: Models generating significant royalty revenue (top 1% by revenue) qualify for automatic Arweave backup funded by protocol treasury. Once uploaded to Arweave, content persists permanently without ongoing maintenance.

This protects the most economically valuable lineages against long-term IPFS availability concerns.

Retrieval Market Integration: Origyn can integrate with Filecoin's retrieval market, allowing users to pay for expedited content delivery. Model cards larger than typical metadata (e.g., including embedded visualizations or extensive evaluation data) benefit from Filecoin's incentivized storage and retrieval market.

Content is stored on Filecoin with guaranteed availability backed by collateral, and is retrieved quickly through a competitive market of retrieval providers.

This redundancy strategy ensures content availability through multiple independent mechanisms: foundation commitment, community incentives, permanent Arweave backup, and Filecoin market integration. No single point of failure compromises the registry's accessibility.
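The eligibility rules above can be expressed as a small policy function. This is a sketch only: the $1,000/year threshold is the example figure from the text, the top-1% cutoff follows the Arweave backup rule, and the type and function names are hypothetical. Filecoin integration is left out because the text describes it as optional.

```python
from dataclasses import dataclass

PINNING_REWARD_THRESHOLD_USD = 1_000  # example threshold from the text
ARWEAVE_TOP_PERCENT = 1               # top 1% of models by revenue


@dataclass(frozen=True)
class RedundancyPlan:
    foundation_pinning: bool
    community_incentives: bool
    arweave_backup: bool


def plan_redundancy(
    annual_royalty_usd: float, revenue_rank: int, total_models: int
) -> RedundancyPlan:
    """Decide which redundancy mechanisms apply to one model.

    `revenue_rank` is 1 for the highest-earning model in the registry.
    """
    # Integer arithmetic avoids floating-point surprises at the cutoff.
    cutoff = max(1, total_models * ARWEAVE_TOP_PERCENT // 100)
    return RedundancyPlan(
        foundation_pinning=True,  # every registered model is pinned
        community_incentives=annual_royalty_usd > PINNING_REWARD_THRESHOLD_USD,
        arweave_backup=revenue_rank <= cutoff,
    )


if __name__ == "__main__":
    print(plan_redundancy(annual_royalty_usd=5_000, revenue_rank=3, total_models=1_000))
    print(plan_redundancy(annual_royalty_usd=200, revenue_rank=640, total_models=1_000))
```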
