3. The Origyn Solution

Origyn Provenance Protocol establishes the universal model ancestry layer the AI economy requires.

The protocol operates on three foundational principles: decentralization through blockchain-based registry without central authority, verifiability through immutable lineage captured in cryptographic structures, and financial alignment through royalty distribution that rewards contributors. Think of Origyn as Git for AI models (providing version control and commit history), NPM for model weights (creating a package registry with dependency trees), or a family tree for machine intelligence (tracking ancestry across generations).

The protocol doesn't replace existing ML tools. Researchers continue using HuggingFace for model distribution, MLflow for experiment tracking, and Weights & Biases for artifact management. Origyn adds the missing layer these platforms cannot provide alone: a cross-platform registry with immutable provenance, verifiable lineage, and automated attribution.

Integration happens through plugins, APIs, and CLI tools that fit existing workflows rather than demanding new ones.

How the Protocol Works

High-Level System Architecture

Model Registration

Registration begins when a creator publishes a model to Origyn's registry. They provide the model ID (a cryptographic hash of the weights), references to parent models if the work derives from existing models, metadata including training data identifiers and configuration files, licensing terms, and royalty settings. The creator pays a registration fee of 100 $ORIGYN tokens, with 70% burned to create deflationary pressure and 30% funding protocol development.

A smart contract stores provenance information on-chain while detailed metadata lives on IPFS, balancing immutability with storage costs.

DAG Construction

Each model registration forms a node in a directed acyclic graph (DAG) representing the AI ecosystem's evolution. When a developer fine-tunes GPT-4 for medical diagnosis, they register their model with GPT-4 as the parent. When another team quantizes that medical model for edge deployment, they register with the medical model as parent. The graph grows organically as the community builds derivatives.

Edge types capture the nature of each derivation: fine-tuning, model merging, quantization, distillation, LoRA adaptation, or pruning. Multi-parent relationships work naturally. A model created by averaging the weights of three specialist models would register all three as parents with contribution weights summing to 1.0.

Automated Royalty Distribution

Royalty distribution automates the financial attribution missing from today's ecosystem.

When a derivative model generates revenue, smart contracts calculate what flows upstream to ancestors. The default formula applies a 5% base rate with 0.5 decay per generation and a 20% total cap. If Model C generates $10,000 in revenue and derives from Model B (which derives from Model A), Model B's creator receives $500 (5%) and Model A's creator receives $125 (2.5% with 0.5 decay). The system handles complex multi-parent scenarios where contribution weights determine each ancestor's share.

Creators can opt out by setting their base rate to zero, preserving the ethos of fully open-source work while enabling commercial creators to participate in derivative value.

Validator Integrity

Validator integrity ensures the registry remains trustworthy despite decentralization. Validators stake 10,000 $ORIGYN tokens to operate nodes that verify registration claims. Anyone can challenge a fraudulent lineage by staking 1,000 tokens. If the challenge succeeds (proving the claimed parent relationship is false), the validator who approved the fraudulent registration loses 50% of their stake while the challenger receives their stake back plus a reward.

This mechanism creates economic incentives for honest behavior and community policing without requiring a central authority to adjudicate disputes.

Core Metaphors: Why This Matters

Git for AI Models

Software development flourished when Git solved version control.

Before Git, developers emailed code patches and maintained parallel branches through manual file management. Git made forking, attribution, and merging natural operations with cryptographic verification. Every line of code traces to an author and timestamp. Origyn brings this capability to model weights, where derivative work is the norm but attribution remains manual.

NPM for Model Weights

Package management enables modern software through registries like NPM. Developers declare dependencies, and the package manager ensures compatibility and retrieves the correct versions. NPM's registry shows download statistics, dependent projects, and security advisories.

Origyn creates similar infrastructure for model weights, making it possible to query which models depend on a given base model, who created each derivative, and what training data influenced the lineage.

Family Trees for Intelligence

Family trees make ancestry legible. Looking at a genealogical chart reveals who descended from whom, how traits passed through generations, and where different branches diverged. Origyn visualizes model evolution the same way, showing how GPT variants proliferated, where Stable Diffusion forks specialized, and which models merged to create new capabilities. The metaphor extends to inheritance: just as genetic traits flow through bloodlines, training data characteristics and model behaviors flow through AI lineage.

Stakeholder Benefits in Practice

Ecosystem & Stakeholders Integration

For Model Creators

Model creators gain three concrete advantages.

First, cryptographic proof of ownership through immutable registration timestamps establishes IP rights without requiring platform accounts or third-party verification. Second, automated royalty collection means passive income from derivatives without negotiating licensing agreements or tracking usage manually. Third, visibility into how their work influences the ecosystem through queryable lineage graphs showing every derivative and its applications.

An independent researcher who releases an open-source vision model can choose their participation level. Setting base rate to zero maintains fully open access while still registering lineage for attribution and visibility. Setting base rate to 5% generates income when companies commercialize derivatives without restricting initial access. The researcher sees download metrics, derivative counts, and how their work propagates through the model economy regardless of monetization choice.

For Enterprises and Developers

Enterprises and developers gain risk mitigation and compliance infrastructure.

Before deploying an AI system in healthcare, query Origyn to verify the model's training data origins and confirm all parent models comply with HIPAA requirements. During mergers and acquisitions, audit a target company's AI assets by examining their registered models, verifying ownership claims, and checking for IP disputes. When regulators request documentation, generate provenance reports directly from the registry rather than reconstructing history through archaeological examination of git repositories and documentation.

A hospital evaluating diagnostic AI can trace lineage back to original training data. If the model derives from a base model trained on European medical records (GDPR-compliant), then fine-tuned on US data (HIPAA-compliant), then adapted for specific imaging equipment, Origyn's DAG captures each step. The hospital's compliance team verifies this chain before deployment rather than trusting vendor documentation that may be incomplete or inaccurate.

For Regulators and Auditors

Regulators and auditors gain transparent oversight without compromising trade secrets. The EU AI Act requires documentation of high-risk AI systems, including training data sources and modification history. Origyn provides this through public registry queries while allowing companies to store sensitive details in encrypted off-chain storage. Regulators verify that documentation exists and matches on-chain hashes without accessing proprietary information.

This separation enables compliance verification without exposing competitive advantages.

Market surveillance authorities can query the registry to identify models deployed in regulated sectors, trace their lineage to verify compliance with documentation requirements, and flag systems lacking adequate provenance. The transparency is asymmetric: regulators see structural information (this model has 6 ancestors, training data CIDs are registered) while competitive details remain protected. Zero-knowledge proofs enhance this further, allowing companies to prove "trained on GDPR-compliant data" without revealing the dataset itself.

For the AI Ecosystem

The broader AI ecosystem benefits from reduced friction in model reuse. Today, uncertainty about attribution and licensing discourages sharing. Researchers hesitate to release models fearing commercial exploitation without credit. Companies hesitate to use open-source models fearing hidden licensing restrictions or IP disputes.

Origyn makes reuse safe and profitable for all parties.

Creators receive attribution and optional compensation. Users gain legal clarity and compliance documentation. The ecosystem moves toward composability, where model development resembles software development: building on proven foundations with clear dependency chains.

This composability unlocks network effects similar to those in software. Popular base models become platforms. Derivative creators contribute value back upstream through royalties. Data providers see their datasets power multiple model generations and receive attribution. The visibility enables discovery: developers searching for a model trained on specific data types can query the registry by ancestor rather than hoping HuggingFace search returns relevant results.

Origyn transforms model weights from opaque artifacts into legible, tradeable, and composable infrastructure.

Last updated