7. Use Cases and Applications

Origyn's provenance infrastructure enables practical applications across research, commercial AI, and regulatory compliance.

This section presents concrete scenarios demonstrating how different stakeholders use the protocol to solve real problems. Each use case shows the current friction, how Origyn addresses it, and the resulting benefits for all parties involved.

Model Licensing and Fair Attribution

An independent researcher spends six months training a specialized computer vision model on curated medical imaging datasets.

They release it open-source on HuggingFace under Apache 2.0 license, hoping to advance medical AI while building professional reputation. Within weeks, dozens of teams download and fine-tune the model for specific applications. Three months later, a startup raises $5 million in funding with their product built entirely on a fine-tuned version of the researcher's model, featuring the model's capabilities prominently in their pitch deck.

The researcher receives no compensation, no attribution beyond a GitHub star, and no awareness their work enabled this commercial success.

How Origyn Changes This Dynamic

The researcher registers their base model with a 5% royalty rate.

When the startup fine-tunes and deploys commercially, they register their derivative model, explicitly linking to the parent model through the protocol's required lineage declaration. Their first $100,000 in revenue triggers $5,000 in royalties to the original researcher automatically. The smart contract handles payment when the startup distributes royalty tokens quarterly, making attribution financial rather than merely reputational.

The researcher sees their model listed in the startup's lineage, providing both financial compensation and verifiable proof of impact.

This proof matters for future grant applications where funding committees ask "Who has used your models?" and job opportunities where employers evaluate real-world impact beyond citation counts. The on-chain lineage provides cryptographic proof that cannot be disputed or retroactively altered.

Benefits for the Startup

The startup benefits from clear licensing terms that eliminate legal ambiguity.

Instead of vague "acknowledgment" clauses in Apache licenses that lack enforcement mechanisms, Origyn's smart contracts make expectations explicit and automatic. The startup prices their product knowing royalty obligations upfront, budgeting for 5% attribution costs the same way they budget for cloud infrastructure ($5,000/month) or API fees.

When pitching investors, they demonstrate responsible IP practices by showing clean lineage documentation.

Potential acquirers conducting due diligence verify the startup holds legitimate rights to their core technology through on-chain provenance rather than relying on legal attestations that might miss hidden dependencies. The blockchain record provides instant verification that traditional legal due diligence requires weeks to establish through document review and legal opinions.

Multi-Generation Attribution

If multiple researchers contributed to the lineage, attribution flows automatically through multiple generations.

The startup's model derives from the researcher's model, which was itself fine-tuned from Llama 2. If Meta released Llama 2 with a zero royalty rate (as they might for open-source positioning), Meta receives attribution but no payment, building their reputation as an ecosystem enabler. The researcher receives their 5% from the startup's revenue. If instead Llama 2 had a 3% rate, the researcher would pay 3% × 0.5 decay = 1.5% of their $5,000 to Meta ($75), retaining $4,925.

The startup pays once ($5,000), and the protocol distributes appropriately without requiring the startup to identify all ancestors or calculate complex splits.

The smart contract traverses the DAG automatically, calculates each ancestor's share, and distributes tokens atomically. This automation reduces legal complexity from weeks of contract negotiation to zero ongoing effort after initial registration.
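
As a concrete sketch of that traversal, the toy Python below reproduces the numbers from this example. The registry layout, model names, and the 0.5 per-generation decay factor are assumptions drawn from the scenario above, not Origyn's actual contract logic:

```python
# Illustrative only: registry structure and decay factor are assumptions.
DECAY = 0.5  # hypothetical attenuation applied from the second generation on

# Toy registry: model id -> (parent id or None, royalty rate)
REGISTRY = {
    "llama-2":       (None, 0.03),           # the 3%-rate scenario from the text
    "med-cv-model":  ("llama-2", 0.05),      # the researcher's model
    "startup-model": ("med-cv-model", 0.0),  # the commercial derivative
}

def distribute(model_id: str, revenue: float) -> dict[str, float]:
    """Walk the ancestor chain once; each ancestor's cut comes out of
    its child's share, attenuated by DECAY per generation."""
    payouts: dict[str, float] = {}
    parent = REGISTRY[model_id][0]
    base, child, first = revenue, None, True
    while parent is not None:
        grandparent, rate = REGISTRY[parent]
        share = base * rate * (1.0 if first else DECAY)
        if child is not None:
            payouts[child] -= share  # upstream royalty is deducted downstream
        payouts[parent] = share
        base, child, first, parent = share, parent, False, grandparent
    return payouts

print(distribute("startup-model", 100_000))
# {'med-cv-model': 4925.0, 'llama-2': 75.0} -- the startup pays $5,000 once
```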

Regulatory Proof of Compliance

A healthcare technology company develops an AI system for analyzing X-ray images to detect early-stage lung conditions.

They plan deployment across hospitals in Germany and France, requiring compliance with both GDPR (data protection) and the EU AI Act (high-risk AI systems). Regulators demand proof that training data came from consented patients, no protected health information leaked into the model through parent models, and all models in the lineage meet medical device documentation standards.

Traditional approaches require extensive legal documentation costing hundreds of thousands in legal fees and months to compile.

The company assembles data acquisition agreements, patient consent forms, audit reports from training data providers, and GDPR Article 35 impact assessments. This documentation offers regulators little independent verifiability: they must trust the company's self-reporting, since retroactive fabrication is difficult to detect.

Origyn Enables Cryptographic Compliance Proofs

The company registers their model with dataset CID pointing to an IPFS-stored data card.

This data card documents patient consent procedures and institutional review board approvals with timestamps proving the consents existed before training began. They implement a zero-knowledge proof circuit that verifies all training data came from the approved dataset list without revealing which specific hospitals or patient demographics contributed data, preserving both trade secrets and patient privacy.

The circuit takes private input (actual dataset CIDs used in training) and public input (the Merkle root of approved dataset CIDs).

It outputs true/false for compliance without revealing the private inputs. The company generates a proof locally on their own infrastructure, never exposing sensitive data. They publish the proof to IPFS, register the proof CID on-chain, and link it to their model registration.
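
The statement such a circuit proves can be checked in the clear with a plain, non-zero-knowledge sketch like the one below. A real circuit would verify Merkle inclusion paths rather than naive set membership, and all names here are illustrative:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Root of a simple binary Merkle tree (last node duplicated on odd levels)."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# Public input: commitment to the regulator-approved dataset list.
approved = [b"cid-hospital-a", b"cid-hospital-b", b"cid-hospital-c", b"cid-hospital-d"]
public_root = merkle_root(approved)

# Private input: the CIDs actually used in training (never revealed on-chain).
used = [b"cid-hospital-a", b"cid-hospital-c"]

# The compliance statement, checked here without any zero-knowledge machinery:
assert merkle_root(approved) == public_root
print(all(cid in approved for cid in used))  # True -> training data was approved
```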

The Regulatory Audit Process

When German BfArM (Federal Institute for Drugs and Medical Devices) audits the deployment, the company provides their Origyn model ID.

Regulators query the on-chain registry (30 seconds), retrieve the ZK-proof CID from the model metadata (10 seconds), download the proof from IPFS (5 seconds), and verify it on-chain using the verification smart contract (transaction completes in 2-3 seconds on L2). Verification costs approximately $0.025 in L2 gas.

The regulator confirms in under one minute: proof is valid, training data came from approved sources, and timestamps prove the model registered before deployment.

They don't learn which specific datasets were used or any patient information, preserving trade secrets and privacy while satisfying compliance requirements. Traditional audit processes requiring document review and in-person interviews take weeks and cost tens of thousands in auditor time.
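
A hypothetical sketch of that verification flow from the auditor's side is shown below. Only the IPFS gateway fetch uses a real library; the registry read and verifier call are stand-ins for contract interactions Origyn would expose, not a published API:

```python
import requests  # real HTTP library; the two stand-ins below are hypothetical

def read_registry(model_id: str) -> dict:
    """Stand-in for the on-chain registry query (step 1, ~30 s)."""
    raise NotImplementedError("replace with an RPC read of the registry contract")

def verify_proof_on_chain(proof: bytes, public_inputs: dict) -> bool:
    """Stand-in for the L2 verifier call (step 4, ~2-3 s, ~$0.025 in gas)."""
    raise NotImplementedError("replace with a transaction to the verifier contract")

def audit(model_id: str) -> bool:
    meta = read_registry(model_id)                    # 1. query the registry
    proof_cid = meta["complianceProofCID"]            # 2. proof CID from metadata
    url = f"https://ipfs.io/ipfs/{proof_cid}"         # 3. download proof from IPFS
    proof = requests.get(url, timeout=30).content
    return verify_proof_on_chain(proof, meta["publicInputs"])  # 4. verify on-chain
```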

Lineage-Based Compliance

The company's parent models follow the same pattern, creating chain-of-custody documentation.

If they fine-tuned from a base model released by a research institution, that base model's registration shows its own compliance proofs for research ethics guidelines. Regulators can trace the complete lineage: base model compliant with research ethics (verified through ZK-proof), fine-tuned model compliant with medical data regulations (verified through separate ZK-proof), deployed model meeting all EU AI Act requirements (verified through final ZK-proof).

This chain satisfies Article 11 (technical documentation) and Article 12 (record-keeping) requirements efficiently without exposing sensitive training details to competitors or violating patient privacy through disclosure of training data sources.

Dataset-Linked Reward Pools

A biomedical research consortium curates a high-quality dataset of anonymized patient records with expert annotations for rare diseases.

Creating this dataset cost $2 million over three years through data collection from partner hospitals, expert labeling by board-certified specialists, quality assurance catching annotation errors, and privacy compliance ensuring HIPAA and GDPR adherence. They release it publicly for research purposes, requiring only citation in academic papers under a CC-BY license.

Pharmaceutical companies and AI labs use the dataset to train disease detection models, collectively generating hundreds of millions in commercial value.

The consortium receives academic citations that help with grant renewals but no financial return beyond the warm feeling of advancing science. This limits their ability to fund future dataset creation, with the next rare disease dataset project stalled due to lack of funding despite clear demand from the research community.

Dataset Registration and Royalties

The consortium registers their dataset on IPFS, receives a content identifier (CID), and publishes the CID through Origyn's metadata standard.

When researchers train models using this dataset, they include the dataset CID in their model registration under the datasetCID field. The model inherits a "data lineage" link in addition to "model lineage" links, creating parallel attribution for both model parents and data sources. The consortium sets a royalty rate, perhaps 2%: datasets involve different cost structures than models, and pure data curation typically merits a lower rate than model training.
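
A registration carrying both link types might look like the sketch below. Apart from the datasetCID field named above, every field name is a hypothetical illustration of the metadata standard:

```python
# Hypothetical payloads; only the datasetCID field name comes from the text.
dataset_registration = {
    "datasetCID": "bafy...rare-disease",  # placeholder IPFS CID
    "owner": "biomed-consortium",
    "license": "CC-BY-4.0",
    "royaltyRate": 0.02,                  # the 2% rate from the example
}

model_registration = {
    "modelId": "rare-disease-detector-v1",
    "parents": ["base-clinical-lm"],                   # model lineage links
    "datasetCID": dataset_registration["datasetCID"],  # data lineage link
}
```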

When a pharmaceutical company commercializes a diagnostic tool trained on this dataset, generating $10 million in revenue, their model registration shows the dataset CID as training data.

The smart contract calculates dataset royalty: dataset creator receives 2% × $10 million = $200,000. This payment funds the next dataset curation project (recruiting another 15 hospitals, hiring 5 expert annotators, processing 50,000 patient records over 18 months), creating a sustainable cycle where dataset creators can afford to continue producing high-quality training data.

Multi-Dataset Attribution

If a model uses multiple datasets, attribution splits accordingly, reflecting each dataset's contribution.

A model trained 70% on the consortium's rare disease dataset and 30% on a public radiology dataset (released at a zero royalty rate by a government research initiative) would direct 70% of the dataset royalty weight to the consortium and 30% to the public dataset, which receives attribution but no payment because its rate is zero. The protocol treats data lineage similarly to model lineage, applying the same attribution logic to a different content type, with CID references working identically for datasets and models.
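
A sketch of that split is below, assuming contribution weights are declared at registration; the weighting mechanism and names are assumptions, not protocol specifics:

```python
# Assumed: each dataset link carries a declared contribution weight.
datasets = [
    {"owner": "biomed-consortium", "weight": 0.70, "rate": 0.02},
    {"owner": "public-radiology",  "weight": 0.30, "rate": 0.00},  # attribution only
]

def dataset_royalties(revenue: float) -> dict[str, float]:
    return {d["owner"]: revenue * d["weight"] * d["rate"] for d in datasets}

print(dataset_royalties(10_000_000))
# {'biomed-consortium': 140000.0, 'public-radiology': 0.0}
```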

Market Incentives for Quality

This mechanism incentivizes dataset quality over quantity through market forces.

Dataset creators compete to have their datasets chosen for high-value commercial applications rather than just maximizing citation counts. Quality datasets that enable better model performance command usage even with 1-2% royalty obligations because the value they provide (10% better model accuracy, worth millions commercially) far exceeds the royalty cost. Poor datasets that require extensive cleaning or provide little signal get avoided despite being free, with developers preferring to pay a 2% royalty for clean data that saves weeks of preprocessing work.

The market prices dataset value based on downstream commercial outcomes rather than upfront sales, aligning dataset creator incentives with model creator success in a way that pure citation-based systems cannot achieve.

Open Model Discovery and Transparency

A development team needs a multilingual code generation model fine-tuned specifically for mobile app frameworks (Swift, Kotlin, React Native).

Searching HuggingFace yields hundreds of models claiming various capabilities, but documentation quality varies wildly. Some models claim to be fine-tuned from GPT-4 (impossible given API restrictions and closed weights), others provide no information about base models or training data sources, and assessing quality requires downloading and testing each model through a time-consuming process.

Each candidate costs gigabytes of bandwidth to download and hours to evaluate: loading into memory, running inference tests, and benchmarking performance.

Sophisticated Discovery Through Origyn

The team queries Origyn's registry: "Models descended from StarCoder, fine-tuned on mobile framework data, with performance metrics above specified thresholds."

The query returns models matching all criteria with verified lineage showing StarCoder as an ancestor (cryptographically verified through on-chain parent relationships), dataset CIDs pointing to mobile framework training data (Swift documentation, Kotlin samples, React Native codebases), and benchmark results for code completion tasks (pass@1 scores, compilation success rates, human evaluation metrics). Each result includes the complete lineage graph, showing exactly how the model evolved from StarCoder through intermediate derivatives.
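
Locally, such a query amounts to a filter over indexed registry records, as in the sketch below. In practice an explorer or indexer API would serve it, and all field names (and the 0.35 pass@1 threshold) are illustrative assumptions:

```python
# Toy indexed records; field names are illustrative.
records = [
    {"id": "mobile-coder-z", "ancestors": {"starcoder", "model-x", "model-y"},
     "datasetTags": {"swift", "kotlin", "react-native"},
     "benchmarks": {"pass@1": 0.41}},
    {"id": "generic-coder", "ancestors": {"starcoder"},
     "datasetTags": {"python"},
     "benchmarks": {"pass@1": 0.38}},
]

def find_models(ancestor: str, tag: str, min_pass_at_1: float) -> list[str]:
    return [r["id"] for r in records
            if ancestor in r["ancestors"]
            and tag in r["datasetTags"]
            and r["benchmarks"].get("pass@1", 0.0) >= min_pass_at_1]

print(find_models("starcoder", "react-native", 0.35))  # ['mobile-coder-z']
```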

Examining Lineage for Quality Signals

The team examines lineage for a promising candidate showing a four-generation chain.

The chain runs StarCoder → Model X (general code fine-tune) → Model Y (mobile-specific fine-tune) → Model Z (React Native specialization). They see Model X was created by a well-known AI lab with a strong reputation (10 other widely-used models in their portfolio), Model Y by a mobile development consultancy with verifiable commercial clients, and Model Z by an individual developer with contributions to open-source React Native projects.

They check each ancestor's training data CIDs, review model cards for detailed descriptions, and examine benchmark results across generations.

This lineage transparency helps assess quality: models descended from reputable ancestors with clean training data likely maintain quality standards through their derivative chain. Models with suspicious lineage (claimed ancestry that doesn't match capability patterns, impossible parent relationships, no training data documentation) warrant skepticism and further investigation.

License Compatibility Verification

License compatibility becomes verifiable rather than assumed through self-reporting.

The team needs commercial usage rights for their enterprise mobile application. They check each ancestor's license field in the metadata: StarCoder released as Apache 2.0 (permissive commercial use explicitly granted), Model X also Apache 2.0 (maintaining permissive terms), Model Y switched to CC-BY-NC (introducing a non-commercial restriction), and Model Z cannot legally be relicensed for commercial use given its NC parent.
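
A minimal sketch of that ancestry walk, using the licenses from this example (real license-compatibility rules are more nuanced than a single non-commercial flag):

```python
NON_COMMERCIAL = {"CC-BY-NC-4.0"}

# child -> (parent or None, license), per the four-generation example
LINEAGE = {
    "model-z":   ("model-y", "CC-BY-NC-4.0"),  # inherits Y's NC restriction
    "model-y":   ("model-x", "CC-BY-NC-4.0"),
    "model-x":   ("starcoder", "Apache-2.0"),
    "starcoder": (None, "Apache-2.0"),
}

def commercial_ok(model_id: str) -> bool:
    """Commercial deployment requires every license in the chain to permit it."""
    node = model_id
    while node is not None:
        parent, license_name = LINEAGE[node]
        if license_name in NON_COMMERCIAL:
            return False
        node = parent
    return True

print(commercial_ok("model-z"))  # False: Model Y introduced CC-BY-NC
```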

The team immediately eliminates this candidate despite good performance because the lineage reveals licensing issues that would expose them to IP litigation.

They find an alternative with full Apache 2.0 lineage across all four generations, confident they can deploy commercially without legal risk. The on-chain license tracking prevents situations where developers unknowingly violate licenses because parent model restrictions weren't clearly communicated.

Transparency Benefits for Creators

Model Y's creator (mobile consultancy) sees their model referenced in 47 derivatives through Origyn's explorer.

While they chose non-commercial licensing preventing direct monetization through royalties, the visibility attracts consulting clients who want commercial licenses. Companies that need commercial usage contact the consultancy for custom fine-tuning services (paying $50,000-$150,000 per engagement), seeing the provenance graph as a portfolio demonstrating mobile AI expertise more credibly than marketing claims could achieve.

The provenance becomes a discovery mechanism where quality work attracts commercial opportunities through transparent attribution rather than requiring active marketing.

Fork Governance and Community Models

An open-source AI community releases a base language model trained on diverse internet text with moderate performance across many tasks.

Over time, community members create hundreds of derivatives: some optimize for creative writing with temperature adjustments and training on fiction, others for factual accuracy with retrieval augmentation and training on Wikipedia, some for multilingual capabilities with translation data, others for code generation with GitHub training data. The community wants to coordinate improvements: when one member discovers a better training technique or high-quality dataset, how does that innovation propagate to all forks?

Traditional approaches rely on manual coordination, which creates fragmentation.

Community members announce improvements in Discord channels, GitHub issues, or forum posts. Individual fork maintainers decide whether to incorporate changes based on attention and perceived value. No systematic way tracks which forks adopted which improvements or measures aggregate impact across the fork ecosystem. Valuable innovations sometimes languish because discoverers lack visibility into the fork network and don't know who would benefit from their discoveries.

Verifiable Lineage Enables Fork Governance

The community establishes a governance model where derivative creators can vote on upstream improvements.

Voting power is proportional to each model's usage, measured by derivative count (or revenue, for commercial models). When a community member proposes a new training technique (an improved learning rate schedule that reduces training time 30%), they demonstrate results on a test fork with benchmark comparisons. The community votes through a governance portal: derivative creators with active models registered on Origyn vote yes/no, weighted by their models' influence.

If the proposal wins more than 50% of voting power, the innovation becomes "blessed" in community documentation.

A badge appears on the technique's documentation page: "Community Approved - May 2025". Forks that adopt blessed techniques receive higher visibility in the community model explorer. New derivative creators see which techniques have community endorsement, guiding their development choices toward validated approaches rather than experimenting with unproven methods.

Meritocratic Voting Power

Voting power weighted by lineage impact creates natural quality signals that reward valuable contributions.

A derivative that spawned 50 further derivatives has more voting power (proportional to 50 descendants) than one with zero derivatives (minimal voting power), reflecting proven value through revealed preference of other developers. This prevents voting capture by many low-quality derivatives where someone could spam registrations to gain influence. Instead, voting power concentrates on demonstrably useful models that other developers chose to build upon, creating governance that reflects actual ecosystem value rather than registration counts.
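
One way such weighting could be computed, assuming weight simply equals the size of each voter's descendant tree (a simplification of whatever metric governance actually adopts):

```python
from collections import defaultdict

# Toy lineage DAG: parent -> children edges
EDGES = [("base", "fork-a"), ("base", "fork-b"),
         ("fork-a", "fork-a1"), ("fork-a", "fork-a2"), ("fork-a1", "fork-a1x")]
CHILDREN = defaultdict(list)
for parent, child in EDGES:
    CHILDREN[parent].append(child)

def descendant_count(model_id: str) -> int:
    """Size of a model's transitive derivative tree."""
    seen, stack = set(), [model_id]
    while stack:
        for c in CHILDREN[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return len(seen)

def approved(votes: dict[str, bool]) -> bool:
    """Pass when >50% of descendant-weighted power votes yes."""
    power = {v: max(descendant_count(v), 1) for v in votes}  # floor of 1 per voter
    yes = sum(p for v, p in power.items() if votes[v])
    return yes / sum(power.values()) > 0.5

print(approved({"fork-a": True, "fork-b": False}))  # True: weight 3 vs 1
```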

Community Treasury Models

Royalty sharing for community models inverts the individual-creator model, creating sustainable open source.

The base model sets a 5% royalty rate but directs proceeds to a community treasury multisig wallet rather than individual creator addresses. Derivatives follow suit through social norms and governance encouragement, creating a community pool funded by all commercial applications descended from the community's collective work. With 20 commercial derivatives each generating $100,000 annually (total $2 million), the 5% royalty generates $100,000 annual community treasury income.

The community votes on treasury spending through quarterly proposals.

Proposals might fund compute for model training (renting 100 H100 GPUs for 2 weeks = $50,000), bounties for specific improvements (implementing multimodal capabilities = $20,000), grants for documentation (a comprehensive fine-tuning guide = $5,000), or scholarships for contributors from underrepresented groups (funding 5 students for 3 months = $25,000). This creates sustainable open-source development where community models compete with commercial alternatives not through altruism but by pooling royalty proceeds into continued development that matches or exceeds corporate R&D budgets.

Enterprise AI Compliance and Risk Management

A financial services firm deploys machine learning models for credit risk assessment, fraud detection, and algorithmic trading across their operations.

Regulatory requirements from the Federal Reserve, SEC, and international equivalents demand model governance: documentation of development methodologies before deployment, ongoing performance monitoring detecting model drift, change management processes for model updates, and audit trails proving compliance during examinations. The firm's model risk management team manually maintains spreadsheets tracking model versions, approval dates, training data sources, and responsible developers.

This process is error-prone (spreadsheets drift out of sync with actual deployments), provides no cryptographic proof of documentation accuracy (leaving room for retroactive fabrication), and requires costly annual audits consuming hundreds of staff hours plus external auditor fees.

Origyn Provides Audit-Ready Provenance Infrastructure

Each model deployed in production gets registered on Origyn before launch.

The development team provides metadata including approved training data with CIDs, documented testing procedures showing validation on holdout sets, risk assessment results evaluating bias and error rates, and a responsible executive signature from the Chief Risk Officer approving deployment. The registration timestamp proves when the model was documented relative to deployment through immutable blockchain records. Auditors can verify the firm didn't create documentation retroactively after discovering issues, with the timestamp providing cryptographic proof of temporal ordering.

The immutable blockchain record prevents the firm from altering documentation to cover up problems.

If a model was documented as "approved for consumer lending" but later faces regulatory action for discriminatory outcomes, the firm cannot change the record to claim they documented bias risks originally. The blockchain provides tamper-evident history that regulators trust more than internal databases that firms control.

Version Control and Change Management

When a model undergoes modifications (retraining on new data, hyperparameter adjustments, or architecture changes), the new version registers as a derivative of the production model.

The lineage shows complete modification history with cryptographic links: Model v1.0 deployed January (registered January 5 before deployment January 15) → Model v1.1 (retrained) deployed March (registered March 3 with parent reference to v1.0) → Model v1.2 (architecture change) deployed June (registered June 1 with parent reference to v1.1). Each version links to approval documents stored on IPFS and test results showing validation performance.
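
The version chain from this example could be represented and checked as below. The year and the deployment days for v1.1 and v1.2 are invented placeholders (the text only gives months), and the field names are illustrative:

```python
from datetime import date

# Version chain from the example; some dates are placeholders.
VERSIONS = [
    {"id": "risk-v1.0", "parent": None,
     "registered": date(2025, 1, 5), "deployed": date(2025, 1, 15)},
    {"id": "risk-v1.1", "parent": "risk-v1.0",
     "registered": date(2025, 3, 3), "deployed": date(2025, 3, 10)},
    {"id": "risk-v1.2", "parent": "risk-v1.1",
     "registered": date(2025, 6, 1), "deployed": date(2025, 6, 9)},
]

def properly_sequenced(versions: list[dict]) -> bool:
    """Change management holds when every version registers before it deploys."""
    return all(v["registered"] < v["deployed"] for v in versions)

print(properly_sequenced(VERSIONS))  # True
```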

Auditors trace evolution over time with a few blockchain queries, verifying each modification followed change management procedures before deployment.

They can see Model v1.2's registration includes approval from the Model Risk Committee (documented in linked meeting minutes), validation results exceeding performance thresholds (documented in linked test reports), and registration timestamp preceding deployment (proving proper sequencing). This verification takes minutes rather than days of document review, with cryptographic verification impossible through traditional document systems.

Standardized Multi-Regulator Compliance

The firm benefits from standardized compliance formats across different regulatory regimes.

Instead of maintaining different documentation formats for each regulator (Fed requires specific risk assessment format, SEC requires different disclosure format, FINRA has its own requirements, international regulators each have unique templates), they provide Origyn model IDs. Regulators query the registry using their own tools, retrieve relevant documentation fields from the standardized metadata schema, and verify cryptographic integrity through blockchain proof.

If documentation exists on-chain with correct timestamps and includes required fields, the firm demonstrates compliance regardless of which regulator is auditing.

The consistent standard reduces compliance costs from hundreds of hours per audit across multiple regulators (Fed: 200 hours, SEC: 150 hours, FINRA: 100 hours = 450 hours annually) to automated queries that reference the same underlying blockchain records. This standardization is possible because the protocol defines a common metadata schema that satisfies requirements across regulatory frameworks, similar to how XBRL standardized financial reporting.
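
A sketch of how one standardized record could satisfy several regulators' field requirements follows; the per-regulator field lists are invented for illustration, not actual regulatory schemas:

```python
# Invented per-regulator required fields; real requirements differ.
REQUIRED = {
    "Fed":   {"riskAssessmentCID", "approvalSignature", "registeredAt"},
    "SEC":   {"modelCardCID", "registeredAt"},
    "FINRA": {"testReportCID", "approvalSignature"},
}

record = {  # one standardized metadata record, shared by all audits
    "modelCardCID": "bafy...card",
    "riskAssessmentCID": "bafy...risk",
    "testReportCID": "bafy...tests",
    "approvalSignature": "0xabc...",
    "registeredAt": 1736035200,  # placeholder Unix timestamp
}

compliant = {reg: fields <= record.keys() for reg, fields in REQUIRED.items()}
print(compliant)  # {'Fed': True, 'SEC': True, 'FINRA': True}
```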

Risk Propagation Tracking

Risk propagation becomes trackable through lineage queries, preventing systemic issues.

If the firm discovers a bug in a base model (perhaps training data contained a subtle bias that affects risk assessment for certain demographics), they identify all derivatives through automated lineage queries. The model risk management team receives a list in seconds: Models v2.3, v3.1, and v3.2 all descended from the buggy base model version and require review for the same bias issue. Without lineage tracking, finding all affected models requires manual investigation across teams (emailing every model developer to ask "did you use the buggy base model?"), potentially missing production systems and risking regulatory sanctions for an incomplete response.

With Origyn, the impact analysis is automatic and provably complete because all production models must be registered.
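
The impact query amounts to a descendant traversal over the lineage DAG, sketched below with the version numbers from this example (the edges are toy data):

```python
from collections import defaultdict

# Toy lineage edges: parent -> children
EDGES = [("base-v4", "v2.3"), ("base-v4", "v3.1"),
         ("v3.1", "v3.2"), ("base-v5", "v4.0")]
CHILDREN = defaultdict(list)
for parent, child in EDGES:
    CHILDREN[parent].append(child)

def affected(buggy_id: str) -> list[str]:
    """Every registered model descended from the flawed base version."""
    out, stack = [], [buggy_id]
    while stack:
        for child in CHILDREN[stack.pop()]:
            out.append(child)
            stack.append(child)
    return out

print(affected("base-v4"))  # ['v2.3', 'v3.1', 'v3.2'] -- v4.0 is unaffected
```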

Research Attribution and Academic Credit

An academic researcher publishes a novel neural architecture that enables better sample efficiency in reinforcement learning, cutting training time by 40%.

They release the trained model and paper simultaneously on arXiv and HuggingFace. Within a year, hundreds of papers cite their work, achieving strong academic recognition. Many papers use the architecture without acknowledging the original model weights, claiming they "implemented" the architecture (implying training from scratch for reproducibility) when they actually fine-tuned the released weights, saving weeks of training time worth thousands in compute costs.

Academic credit goes primarily to papers with citations tracked through Google Scholar, but model reuse remains invisible in h-index calculations.

The researcher has 500 citations but doesn't know how many used the weights versus reimplemented from scratch. This incomplete picture understates their true impact where the model weights enable rapid experimentation that wouldn't have happened if every user had to train from scratch.

Visible and Verifiable Model Reuse

The researcher registers their model with metadata linking to the paper (DOI or arXiv ID creating bidirectional links).

When other researchers fine-tune the model, their registrations create verifiable citation links tracked on-chain. The researcher's Origyn profile shows: 47 direct derivatives (models that list theirs as immediate parent), 203 models in the complete descendant tree (including second and third generation derivatives), used in 18 published papers with model IDs linked to paper DOIs through metadata fields. This data supplements traditional citation metrics with model-level impact: not just who cited your paper, but who actually used your model to build their research work.
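
An indexer could roll that lineage data up into a profile like the sketch below; the field names and the arXiv identifier are illustrative placeholders, not a defined schema:

```python
# Illustrative profile rollup; all identifiers are placeholders.
profile = {
    "modelId": "sample-efficient-rl-arch",
    "paper": {"arxiv": "2501.00000"},  # placeholder ID from the metadata link
    "directDerivatives": 47,           # models listing this one as parent
    "descendantTree": 203,             # all transitive derivatives
    "citingPapers": 18,                # papers whose DOIs link back on-chain
}
```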

Academic Recognition and Career Benefits

Academic institutions begin recognizing this value in hiring and promotion decisions.

Grant applications include Origyn lineage graphs: "My model X has been adopted as the base for 50 derivatives across 12 institutions, demonstrating research impact beyond traditional citations where the model itself enables new research directions." Hiring committees review candidates' Origyn profiles alongside h-index and publication counts, recognizing that a researcher with 3 highly-reused models (spawning 100+ derivatives collectively) might demonstrate greater impact than one with 20 papers but no model adoption indicating pure theoretical work without practical impact.

The researcher sets zero royalty rate following academic norms emphasizing free knowledge sharing, but maintains attribution through immutable lineage records.

Commercial companies that use academic models see clear acknowledgment requirements. A company building a product on the researcher's model pays no royalties (respecting the zero-rate choice) but must maintain attribution in their product documentation and model registry because the on-chain parent relationship is immutable. If the company later monetizes derivatives generating significant revenue, they can offer a royalty-sharing agreement to the researcher directly through off-chain negotiation, enabled by the clear attribution trail Origyn provides that makes finding the original creator trivial.

Reproducibility Benefits

Research reproducibility benefits from immutable model cards linked to registrations.

When a paper claims specific training procedures produced certain results, the registered model's metadata provides verifiable training configuration through content-addressed storage. Other researchers attempting to reproduce results can compare their attempts to the original registered configuration (learning rate, batch size, optimizer, training steps all documented), identifying discrepancies that explain reproduction failures.

If reproduction fails despite matching all documented parameters, the community knows whether to question the original paper's claims or their reproduction attempt based on whether multiple independent teams failed with identical configurations.
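
Reproduction debugging then reduces to diffing configurations against the registered original, as in this small sketch (the hyperparameter values are invented):

```python
# Invented hyperparameter values; the point is the comparison, not the numbers.
original = {"lr": 3e-4, "batch_size": 256, "optimizer": "adamw", "steps": 100_000}
attempt  = {"lr": 3e-4, "batch_size": 128, "optimizer": "adamw", "steps": 100_000}

diff = {k: (original[k], attempt[k]) for k in original if original[k] != attempt[k]}
print(diff)  # {'batch_size': (256, 128)} -- a discrepancy explaining the gap
```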

Origyn transforms model weights from invisible research artifacts into first-class outputs with attribution infrastructure comparable to academic papers.

Just as papers cite other papers through DOIs and citation databases (Google Scholar, Semantic Scholar), models cite other models through Origyn IDs and lineage graphs visualized in the explorer. The parallel infrastructure makes model development a reputationally-rewarded activity rather than an unrecognized contribution hidden behind papers where traditional academic incentives reward publishing papers about models but not releasing the actual weights that enable further research.
