5.1 Model Lineage Registry

Registration Flow

Registering a model on Origyn takes minutes.

The creator prepares metadata including model name, parent model IDs for derivatives, derivation type (fine-tune, merge, quantize, etc.), dataset identifier pointing to training data on IPFS, training configuration specifying hyperparameters, license terms, and royalty settings. This metadata goes through a six-step process that balances decentralization with usability, making blockchain complexity invisible while preserving verifiability.

Step 1: Choose Registration Method

The command-line tool provides direct control for technical users: origyn register ./model --parent bafyxxx --type fine-tune --fee 100. The Python SDK integrates into training scripts: client.register_model(model_path, parent_id="bafyxxx", derivation_type="fine-tune").

The web UI accommodates non-technical users through form-based input and drag-and-drop model card upload.

All methods produce identical on-chain registrations. Developers choose the interface matching their workflow rather than adapting workflows to the protocol.

Step 2: Handle Payment

The creator approves 100 $ORIGYN tokens through their connected wallet (MetaMask, WalletConnect, or hardware wallet). The smart contract splits this fee 70/30: 70 tokens are burned by sending them to address 0x000...000, permanently reducing circulating supply, and 30 tokens go to the protocol treasury, funding development, security audits, and infrastructure maintenance.

This burn mechanism creates deflationary pressure as registration volume increases.
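The 70/30 split described in this step can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the contract's code: the names `FeeSplit` and `split_fee` are hypothetical, amounts are whole tokens rather than the smallest on-chain unit, and sending any rounding remainder to the treasury is an assumption.

```python
from dataclasses import dataclass

REGISTRATION_FEE = 100  # $ORIGYN tokens, as stated in Step 2
BURN_PERCENT = 70       # share of the fee that is burned


@dataclass(frozen=True)
class FeeSplit:
    burned: int    # tokens removed from circulation
    treasury: int  # tokens sent to the protocol treasury


def split_fee(fee: int, burn_percent: int = BURN_PERCENT) -> FeeSplit:
    """Split a fee into burned and treasury portions using integer math."""
    if fee < 0:
        raise ValueError("fee must be non-negative")
    if not 0 <= burn_percent <= 100:
        raise ValueError("burn_percent must be between 0 and 100")
    burned = fee * burn_percent // 100
    # Assumption: any rounding remainder goes to the treasury,
    # so the two parts always add up to the full fee.
    return FeeSplit(burned=burned, treasury=fee - burned)


split = split_fee(REGISTRATION_FEE)
print(split.burned, split.treasury)
```

Integer arithmetic is used on purpose, since token amounts on-chain are integers and floating point could lose a unit to rounding.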

Step 3: Execute On-Chain Registration

The ModelRegistry contract mints an ERC-721 NFT representing model ownership. The token ID becomes the model's unique identifier across the ecosystem. The contract stores the provenance hash (32-byte SHA-256 digest linking to off-chain metadata), parent model IDs establishing lineage through cryptographic references, registration timestamp providing IP precedence in disputes, and creator address enabling automatic royalty distribution. The contract emits a ModelRegistered event that indexers capture for queryable databases.

This on-chain footprint remains minimal (32-200 bytes per model) to control gas costs while providing verifiable anchors for detailed metadata stored elsewhere.
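To make the on-chain footprint concrete, the following Python sketch derives a 32-byte SHA-256 provenance hash from a metadata document and groups it with the other stored fields. The `RegistryRecord` type, its field names, and the canonical-JSON serialization are assumptions for illustration; the text does not specify how the contract's client serializes metadata before hashing.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Tuple


def provenance_hash(metadata: dict) -> bytes:
    """Return the 32-byte SHA-256 digest of the metadata.

    Assumption: metadata is serialized as JSON with sorted keys and no
    extra whitespace, so the same content always hashes the same way.
    """
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).digest()


@dataclass(frozen=True)
class RegistryRecord:
    """Hypothetical mirror of the fields the registry stores per model."""
    token_id: int                # ERC-721 token ID, used as the model ID
    provenance_hash: bytes       # 32-byte link to the off-chain metadata
    parent_ids: Tuple[int, ...]  # empty for base models
    timestamp: int               # registration time (Unix seconds)
    creator: str                 # creator's address


metadata = {"name": "example-model", "derivation_type": "fine-tune"}
record = RegistryRecord(
    token_id=2,
    provenance_hash=provenance_hash(metadata),
    parent_ids=(1,),
    timestamp=1_700_000_000,   # illustrative value
    creator="0x" + "ab" * 20,  # placeholder address, not a real account
)
print(len(record.provenance_hash))
```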

Step 4: Upload Metadata to IPFS

The registration client packages the model card JSON, computes its Content Identifier (CID), uploads it to IPFS through a pinning service, and stores the CID on-chain as the NFT's tokenURI.

This separation keeps detailed information off-chain where storage is affordable while maintaining cryptographic links through content addressing.

Anyone can verify metadata integrity by computing the CID from retrieved content and comparing to the on-chain value. Content addressing makes tampering detectable: changing even a single character produces a completely different CID, breaking the cryptographic link and exposing the alteration.
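The verification check can be sketched with the Python standard library alone. The example assumes the model card is stored as a single raw block, in which case a CIDv1 is the base32 encoding of a version byte, the raw codec, and a SHA-256 multihash. Content added as a chunked UnixFS file receives a different CID (typically starting with bafy), so a real client would rely on an IPFS implementation rather than this helper. The function names are hypothetical.

```python
import base64
import hashlib


def cid_v1_raw(content: bytes) -> str:
    """Compute a CIDv1 (raw codec, sha2-256, base32) for a single block."""
    digest = hashlib.sha256(content).digest()
    # 0x01 = CID version 1, 0x55 = raw codec,
    # 0x12 = sha2-256 multihash code, 0x20 = digest length (32 bytes)
    cid_bytes = bytes([0x01, 0x55, 0x12, 0x20]) + digest
    encoded = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + encoded  # "b" is the multibase prefix for lowercase base32


def verify_metadata(content: bytes, onchain_cid: str) -> bool:
    """Return True when the retrieved content matches the on-chain CID."""
    return cid_v1_raw(content) == onchain_cid


card = b'{"name": "example-model"}'
stored_cid = cid_v1_raw(card)  # stands in for the value read from tokenURI

print(verify_metadata(card, stored_cid))
print(verify_metadata(card + b" ", stored_cid))  # one extra byte breaks the match
```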

Step 5: Optional Arweave Backup

Creators of high-value commercial models or historically important open-source models can pay a one-time fee (approximately $0.08 for a 10KB model card) for permanent storage on Arweave. The Arweave transaction ID is stored on-chain as a backup reference, creating redundancy across storage layers. If IPFS pinning fails years later (node operators stop hosting content, pinning services shut down), the content remains accessible through Arweave's permanent storage guarantee funded by endowment economics.

This optional layer provides insurance for models where long-term accessibility justifies the modest cost.

Step 6: Confirm Success

The creator receives their unique model ID (either the CIDv1 format starting with bafy... or an Origyn-specific format like origyn://model/12345). This ID enables lineage queries through the explorer interface, sharing with consumers for verification, listing in model marketplaces, and referencing as a parent in future derivative registrations.
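Tools that accept a model ID need to tell the two formats apart. The sketch below is one way to do that, assuming the Origyn-specific form is always origyn://model/ followed by a decimal number and that CID-style IDs are lowercase base32 strings beginning with bafy. Both patterns are inferred from the examples in this step, not from a published grammar, and `parse_model_id` is a hypothetical helper.

```python
import re
from typing import NamedTuple


class ModelId(NamedTuple):
    kind: str   # "origyn" or "cid"
    value: str  # numeric ID as text, or the CID string itself


# Assumed shapes, based only on the two examples given in the text.
_ORIGYN_URI = re.compile(r"^origyn://model/(\d+)$")
_CID_V1 = re.compile(r"^bafy[a-z2-7]+$")


def parse_model_id(raw: str) -> ModelId:
    """Classify a model ID string, raising ValueError if it fits neither form."""
    text = raw.strip()
    match = _ORIGYN_URI.match(text)
    if match:
        return ModelId(kind="origyn", value=match.group(1))
    if _CID_V1.match(text):
        return ModelId(kind="cid", value=text)
    raise ValueError(f"unrecognized model ID: {raw!r}")


print(parse_model_id("origyn://model/12345"))
```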

The entire process completes in minutes, similar to publishing a package on npm or pushing code to GitHub.

Metadata Standards and Model Cards

Origyn metadata extends HuggingFace model card conventions while adding provenance-specific fields.

Required fields capture essential provenance: model name for human readability, creator Ethereum address for ownership and royalty distribution, registration timestamp establishing IP precedence, parent model IDs array (empty for base models trained from scratch), and derivation type indicating relationship semantics.

Optional fields provide discovery and technical details including model description explaining purpose and capabilities, architecture type (transformer, CNN, diffusion, VAE), parameter count (7B, 13B, 70B), dataset CID pointing to training data documentation, training configuration with hyperparameters, license identifier (SPDX format or custom terms), custom royalty rate if deviating from 5% default, tags enabling keyword search, and benchmark results showing performance metrics.

The metadata format follows JSON structure with nested objects for complex fields:
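As a hedged illustration of that structure, the Python sketch below builds a model card with the required and optional fields listed above and serializes it to JSON. Every key name, the `validate_model_card` helper, and all values are assumptions for illustration; the identifiers are placeholders and the hyperparameters are not taken from any real model.

```python
import json

# Assumed key names for the required fields named in the text.
REQUIRED_FIELDS = ("name", "creator", "registered_at", "parents", "derivation_type")

model_card = {
    "name": "example-7b-finetune",
    "creator": "0x" + "ab" * 20,     # placeholder address
    "registered_at": 1_700_000_000,  # illustrative Unix timestamp
    "parents": ["bafyxxx"],          # placeholder parent ID; [] for base models
    "derivation_type": "fine-tune",
    # Optional fields
    "description": "Instruction-tuned variant of the parent model.",
    "architecture": "transformer",
    "parameter_count": "7B",
    "dataset_cid": "bafyxxx",        # placeholder dataset CID
    "training_config": {             # nested object for a complex field
        "learning_rate": 2e-5,
        "epochs": 3,
        "batch_size": 32,
    },
    "license": "Apache-2.0",         # SPDX identifier
    "tags": ["text-generation", "fine-tune"],
}


def validate_model_card(card: dict) -> list:
    """Return the names of required fields missing from the card."""
    return [name for name in REQUIRED_FIELDS if name not in card]


missing = validate_model_card(model_card)
if missing:
    raise ValueError(f"missing required fields: {missing}")

print(json.dumps(model_card, indent=2))
```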

HuggingFace Compatibility

Models on HuggingFace Hub already include YAML frontmatter with base_model fields.

Origyn's browser extension and CLI tool parse these fields automatically, suggesting parent model IDs during registration. When a model card lists base_model: meta-llama/Llama-2-7b-hf, the registration tool resolves this to Origyn's registered CID for Llama 2 and pre-fills the parent field, eliminating manual CID lookups.
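A minimal sketch of that lookup, using only the standard library: it reads a top-level base_model entry from the frontmatter and resolves it against a mapping of HuggingFace repository names to registered IDs. The mapping, the `suggest_parent` helper, and the bafyxxx placeholder are hypothetical, and the parser handles only the simple single-value form of base_model. A production tool would use a full YAML parser and query the registry.

```python
from typing import Dict, Optional


def extract_base_model(model_card_text: str) -> Optional[str]:
    """Return the top-level base_model value from YAML frontmatter, if any."""
    lines = model_card_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None  # no frontmatter block
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        key, sep, value = line.partition(":")
        if sep and key == "base_model":
            return value.strip().strip("'\"") or None
    return None


def suggest_parent(model_card_text: str, registry: Dict[str, str]) -> Optional[str]:
    """Look up the registered ID for the card's base model, if known."""
    base_model = extract_base_model(model_card_text)
    if base_model is None:
        return None
    return registry.get(base_model)


# Hypothetical lookup table; a real tool would query the registry instead.
registry = {"meta-llama/Llama-2-7b-hf": "bafyxxx"}

card = """---
license: llama2
base_model: meta-llama/Llama-2-7b-hf
---
# My fine-tuned model
"""

print(suggest_parent(card, registry))
```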

Users can augment HuggingFace model cards with Origyn metadata:
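One possible shape for such an augmentation is sketched below in Python: the helper inserts Origyn fields just before the closing frontmatter delimiter of an existing model card. The key names origyn_model_id and origyn_parent are assumptions, not a published schema, the values are placeholders, and only simple scalar values are handled.

```python
def augment_frontmatter(card_text: str, origyn_fields: dict) -> str:
    """Insert key/value lines before the closing '---' of the frontmatter."""
    lines = card_text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("model card has no YAML frontmatter")
    closing = None
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            closing = index
            break
    if closing is None:
        raise ValueError("frontmatter is not closed")
    # Assumption: values are plain scalars that need no YAML quoting.
    extra = [f"{key}: {value}" for key, value in origyn_fields.items()]
    return "\n".join(lines[:closing] + extra + lines[closing:]) + "\n"


card = """---
base_model: meta-llama/Llama-2-7b-hf
---
# My fine-tuned model
"""

augmented = augment_frontmatter(
    card,
    {
        "origyn_model_id": "origyn://model/12345",  # hypothetical key, example ID
        "origyn_parent": "bafyxxx",                 # hypothetical key, placeholder
    },
)
print(augmented)
```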

This enhancement makes model cards verifiable rather than self-reported.

Anyone can query Origyn to confirm the claimed parent relationship exists on-chain with proper timestamps and creator signatures. The IPFS CID provides tamper-evident storage where changing any metadata produces a different CID, making retrospective alterations immediately detectable.
