
Onchain vs Offchain: What to Store Where and Why It Matters


Learn when to use onchain vs offchain storage for Web3 projects. Clear trade-offs, practical patterns, and a decision framework for cost, trust, and permanence.


Every Web3 project faces the same foundational question: what goes onchain, and what stays offchain? Get it wrong, and you'll either overpay for storage, expose sensitive data publicly, or build a product that breaks when a server goes down.

The decision shapes everything. Cost, speed, privacy, trust, and long-term flexibility all hinge on where you store your data. Yet most builders approach this choice reactively, storing everything onchain until gas fees force a redesign, or keeping everything offchain until users demand verifiability.

This guide gives you a clear framework for making storage decisions from day one. You'll learn the real trade-offs between onchain and offchain architectures, understand when IPFS or Arweave is the right fit, and see practical patterns for hybrid designs that balance cost and trust. Whether you're launching a token, building a DAO, or creating an onchain website, you'll walk away knowing exactly what to store where and why.

What "Onchain" and "Offchain" Actually Mean

Onchain data lives directly on the blockchain. When you store something onchain, every full node in the network keeps a copy. This makes the data tamper-resistant, transparent, and accessible to anyone running a node or using a block explorer.

Examples of onchain data include:

  • Token balances and ownership records
  • Smart contract code and state variables
  • Transaction history and event logs
  • Governance votes and proposal outcomes
  • NFT metadata hashes or pointers

Offchain data lives outside the blockchain entirely. It might sit on centralized servers, decentralized storage networks like IPFS or Arweave, or private databases controlled by specific organizations.

Examples of offchain data include:

  • NFT images, videos, and media files
  • User profiles and social data
  • Application interfaces and front-end code
  • DAO discussion threads and supporting documents
  • Analytics, logs, and performance metrics

Most Web3 projects use both. They store critical, immutable data onchain and keep larger, more dynamic content offchain. The key is choosing which data belongs where based on clear criteria, not just defaulting to one approach.

The Core Trade-Offs That Drive Storage Decisions

Six dimensions govern the onchain vs offchain decision. Understanding these trade-offs helps you optimize for what actually matters to your project.

Cost

Storing data onchain is expensive. On Ethereum mainnet, writing 1KB of data can cost $10 to $50 in gas fees during normal network conditions. Layer 2 networks like Base reduce this dramatically to $0.01 to $0.10 per KB, but even L2 storage adds up fast for large datasets.
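The mainnet figure depends heavily on gas and ETH prices. This Python sketch shows the arithmetic behind it, assuming 1KB is written into 32 fresh storage slots at 20,000 gas per new slot plus 2,100 gas for the first (cold) access. The gas prices and the $3,000 ETH price are illustrative assumptions, not quotes.

```python
# Rough cost of writing fresh data into Ethereum L1 contract storage.
# Assumptions: every 32-byte slot is new (zero -> nonzero) and cold, so it costs
# 20,000 gas (SSTORE) + 2,100 gas (cold access, EIP-2929). The 21,000-gas base
# transaction cost and the calldata carrying the input are ignored.
SLOT_BYTES = 32
NEW_SLOT_GAS = 20_000 + 2_100


def storage_gas(num_bytes: int) -> int:
    """Gas to store num_bytes in fresh storage slots."""
    slots = -(-num_bytes // SLOT_BYTES)  # ceiling division
    return slots * NEW_SLOT_GAS


def gas_to_usd(gas: int, gas_price_gwei: float, eth_usd: float) -> float:
    """Convert gas to dollars: 1 gwei = 1e-9 ETH."""
    return gas * gas_price_gwei * 1e-9 * eth_usd


gas = storage_gas(1024)  # 32 slots * 22,100 = 707,200 gas
for gwei in (1, 5, 20):  # illustrative gas prices
    print(f"{gwei:>2} gwei: ${gas_to_usd(gas, gwei, eth_usd=3000):,.2f}")
```

At those assumed prices the loop prints roughly $2, $11, and $42, which is why quoted mainnet storage costs swing so widely with network conditions.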

Offchain storage costs pennies by comparison. IPFS is essentially free if you run your own node, though pinning services charge $5 to $20 per month for reliable hosting. Arweave charges a one-time fee of roughly $0.01 to $0.10 per MB for permanent storage. Centralized databases cost even less for dynamic data.

When to optimize for cost: If you're storing media files, frequently updated data, or large datasets that don't require blockchain guarantees.

When cost matters less: For token balances, ownership records, or governance votes where immutability and transparency justify the expense.

Trust

Onchain data does not require trusting any single party. Anyone can verify it by running a node, or more conveniently, with some trust in the operator, by checking a block explorer. The network's consensus mechanism guarantees data integrity without relying on any central party.

Offchain data requires trust in whoever controls the storage. A centralized server can go offline, change data, or censor access. Even decentralized networks like IPFS rely on someone continuing to pin the content.

When to optimize for trustlessness: For financial data, ownership records, or any information where tampering would cause real harm.

When trust is acceptable: For content that benefits from flexibility, privacy controls, or the ability to update without blockchain friction.

Permanence

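Onchain data is effectively permanent. Once a transaction is written to the blockchain, it cannot be altered or deleted; even if a contract later overwrites a storage value, the original write remains in the chain's history. This creates strong guarantees for historical records but also creates risks if you store the wrong information.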

Offchain data can be changed, deleted, or lost. Centralized servers go down. IPFS content disappears if no one pins it. Among the common options, Arweave comes closest to blockchain durability, charging upfront for storage that is designed to be permanent.

When to optimize for permanence: For legal agreements, proof of ownership, audit trails, or historical records that must survive indefinitely.

When flexibility matters more: For UI elements, feature updates, or content that naturally evolves over time.

Privacy

Onchain data is public by default. Every transaction, every state change, every piece of data is visible to anyone who knows where to look. This transparency is a feature for many use cases but a serious limitation for others.

Offchain data can be private. You control access through authentication, encryption, or simply not publishing it. This enables user privacy, proprietary business logic, and compliance with data protection regulations.

When to optimize for privacy: For user profiles, sensitive business data, or any information that shouldn't be permanently public.

When transparency matters more: For governance decisions, token distributions, or financial operations where public verification builds legitimacy.

Speed

Offchain systems respond almost instantly. Databases return queries in milliseconds. API calls complete in under a second. Users get immediate feedback.

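Onchain operations take time. Ethereum produces a block every 12 seconds, and a transaction is considered final only after roughly 13 minutes (two epochs). Base produces a block every 2 seconds, so confirmations arrive faster, though finality still depends on Ethereum. Multi-step interactions that need several transactions take proportionally longer.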

When to optimize for speed: For user interfaces, real-time updates, or features where waiting for block confirmation hurts the experience.

When finality matters more: For token transfers, ownership changes, or state transitions where users need cryptographic proof of completion.

Composability

Onchain data is natively composable. Any smart contract can read another contract's state, call its functions, or build on its logic. This enables the "money legos" that make DeFi and Web3 powerful.

Offchain data needs a bridge before onchain logic can use it. Smart contracts cannot call APIs themselves, so an oracle network or relayer has to fetch the data and submit it in a transaction. Composability becomes possible but requires additional infrastructure and trust assumptions.

When to optimize for composability: For protocol-layer building blocks, financial primitives, or features that other developers should be able to build on.

When isolation is acceptable: For application-specific features, user experience layers, or data that doesn't need to interact with other smart contracts.

Where Onchain Data Actually Lives

Understanding the technical differences between storage options helps you optimize costs and choose the right mechanism for each data type.

Contract Storage

Contract storage holds the persistent state variables in your smart contract. Every state variable you declare in Solidity (other than constant and immutable ones) is kept in storage, in 32-byte slots that persist across transactions.

contract TokenRegistry {
    mapping(address => uint256) public balances; // Lives in storage
    address public owner; // Lives in storage
}

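Storage is the most expensive data location. Since the Berlin upgrade (EIP-2929), reading a slot costs 2,100 gas the first time it is touched in a transaction and 100 gas afterward. Writing costs 20,000 gas to set a slot from zero to a nonzero value and 2,900 gas to change an existing nonzero value, plus 2,100 gas if the slot has not yet been accessed in the transaction, so a typical cold update costs about 5,000 gas.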

Use storage for: Token balances, ownership records, protocol parameters, governance state.

Avoid storage for: Large datasets, frequently updated values, temporary computations.

Event Logs

Event logs record information about contract activity without storing it in contract state. Events are cheaper than storage but cannot be accessed by other smart contracts.

event Transfer(address indexed from, address indexed to, uint256 value);

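Emitting an event costs 375 gas, plus 375 gas per topic (the event signature is one topic and each indexed parameter adds another) and 8 gas per byte of non-indexed data, plus any memory expansion. That typically makes events an order of magnitude or more cheaper than storage for the same data.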
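To see the size of the gap, here is a back-of-envelope Python comparison using LOG opcode pricing (375 gas base, 375 per topic, 8 per data byte) and fresh-slot storage pricing (20,000 gas per new slot plus 2,100 for cold access). Memory expansion and the transaction's base cost are ignored, so treat the numbers as approximate.

```python
# Approximate gas to record data in an event vs. in fresh storage slots.
# LOG pricing: 375 base + 375 per topic + 8 per byte of non-indexed data
# (memory expansion ignored). Storage: 20,000 + 2,100 (cold) per new 32-byte slot.
LOG_BASE_GAS = 375
LOG_TOPIC_GAS = 375
LOG_DATA_BYTE_GAS = 8
NEW_SLOT_GAS = 20_000 + 2_100


def event_gas(data_bytes: int, topics: int) -> int:
    return LOG_BASE_GAS + LOG_TOPIC_GAS * topics + LOG_DATA_BYTE_GAS * data_bytes


def storage_gas(data_bytes: int) -> int:
    return -(-data_bytes // 32) * NEW_SLOT_GAS  # ceiling division into slots


# Transfer(address indexed from, address indexed to, uint256 value):
# topic 0 is the event signature hash, topics 1-2 are `from` and `to`,
# and the 32-byte `value` goes in the data section.
print(event_gas(32, topics=3))  # 1756
print(storage_gas(3 * 32))      # 66300 (from, to, value each in a new slot)
```

For the Transfer event, that is about 1,756 gas versus 66,300 gas to write the same three words to fresh storage, roughly a 38x difference.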

Use events for: Transaction history, audit trails, off-chain indexing triggers, user activity records.

Avoid events for: Data that contracts need to read, current state that determines behavior.

Calldata

Calldata contains the input parameters passed to a function. It's read-only and available to the contract only while that call executes. Calldata is cheap but even more limited than events.

function verify(bytes calldata proof) external {
    // proof exists only during this call
}

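Calldata costs 4 gas per zero byte and 16 gas per non-zero byte. This makes it the cheapest way to pass data into a contract. The bytes are recorded in the transaction itself, so offchain tools can still read them later, but nothing is written to contract state and no later call can access them.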
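Because zero and non-zero bytes are priced differently, the cost depends on the actual bytes sent, not just their length. A minimal Python sketch of that rule, ignoring the 21,000-gas base transaction cost:

```python
# Standard calldata pricing: 4 gas per zero byte, 16 gas per non-zero byte.
# The 21,000-gas base transaction cost is not included.
ZERO_BYTE_GAS = 4
NONZERO_BYTE_GAS = 16


def calldata_gas(data: bytes) -> int:
    zeros = data.count(0)  # bytes.count accepts an int byte value
    return zeros * ZERO_BYTE_GAS + (len(data) - zeros) * NONZERO_BYTE_GAS


small_number = (1).to_bytes(32, "big")  # an ABI-encoded uint256 holding 1
print(calldata_gas(small_number))      # 140 (31 zero bytes * 4 + 1 non-zero byte * 16)
print(calldata_gas(b"\xff" * 32))      # 512 (32 non-zero bytes * 16)
```

ABI encoding pads small values with zero bytes, which is why a word holding a small number is far cheaper to send than 32 bytes of dense data such as a hash or signature.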

Use calldata for: Large inputs needed for verification, temporary data for single-transaction operations.

Avoid calldata for: Any data you need to access later, state that must persist.

L1 vs L2: The Cost-Security Spectrum

Layer 2 networks like Base, Optimism, and Arbitrum reduce storage costs by processing transactions off the main Ethereum chain while inheriting Ethereum's security through periodic settlement.

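Base transactions typically cost 10 to 100x less than on Ethereum mainnet. You pay for cheap L2 execution plus a share of the cost of posting compressed transaction data to Ethereum, and that data-availability cost dropped sharply once EIP-4844 introduced blob space. A typical token transfer on Base costs $0.01 to $0.05 versus $1 to $5 on mainnet.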

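The trade-off is a longer path to finality. Base's sequencer confirms transactions within seconds, but those confirmations become final only after the transaction data has been posted to Ethereum and that L1 block has itself finalized. For most applications, this 30-minute to 2-hour delay is acceptable. Withdrawals from Base back to Ethereum take longer still, because they wait out a challenge window of about 7 days.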

Design pattern: Build on Base for cost efficiency, and anchor commitments to your most critical state (such as Merkle roots) directly on Ethereum when you need maximum security.

Offchain Storage Options Explained

Centralized Databases and Servers

Traditional databases offer the fastest, cheapest, most flexible storage option. You control access, can update data freely, and get instant query responses.

The trade-off is complete trust dependence. Users must trust you to keep servers running, not tamper with data, and not censor access. For many applications, this is perfectly acceptable if you're transparent about the trust model.

Use centralized storage for: User profiles, analytics, frequently updated app state, private data, compliance-sensitive information.

When to avoid: For data where users need verifiable integrity or data that must survive company failure.

IPFS and Filecoin: Content-Addressed Storage

IPFS (InterPlanetary File System) identifies files by their content hash rather than their location. Upload a file, get a Content Identifier (CID), and anyone can retrieve it from any node that has pinned it.

The CID is derived from a cryptographic hash of the content, which makes IPFS content tamper-evident: if someone changes the file, the CID changes. This makes IPFS ideal for storing verifiable content with an onchain hash pointer.

The catch is persistence. Files only remain available as long as someone continues to pin them. If all nodes stop pinning your content, it disappears. Filecoin addresses this by creating a market where storage providers are paid to maintain data, but it adds cost and complexity.

Use IPFS for: NFT metadata and media, website front-ends, documents that need verifiable integrity, content with moderate durability requirements.

Combine with pinning services: Pinata, NFT.Storage, or Filebase ensure your content stays available without running infrastructure.

Arweave: Permanent Storage

Arweave takes a different approach. Instead of ongoing storage markets, it charges a one-time upfront fee calculated to cover perpetual storage based on declining hardware costs.

Upload data to Arweave, pay once, and the network economically incentivizes miners to maintain it forever. This makes Arweave particularly attractive for historical records, legal documents, or any content that must survive decades.

The downside is higher initial cost and slower retrieval than traditional CDNs. Arweave also doesn't support content updates; every version is a new permanent upload.

Use Arweave for: Long-term archives, legal documents, historical records, immutable website versions, content that must outlive organizations.

IPFS vs Arweave: Choosing Between Them

| Dimension | IPFS | Arweave |
|---|---|---|
| Cost model | Free (self-host) or $5-20/month (pinning) | $0.01-0.10 per MB, one-time |
| Permanence | Requires active pinning | Permanent by design |
| Retrieval speed | Fast with good pinning | Slower, improving with gateways |
| Updates | New CID for each version | New transaction for each version |
| Best for | Dynamic projects, moderate durability | Archives, historical records, legal docs |

Most projects start with IPFS because it's flexible and cost-effective during early iterations. As content stabilizes and permanence becomes important, Arweave becomes more attractive.

Hybrid Architectures: The Practical Default

Most production Web3 projects use hybrid designs that store critical data onchain and everything else offchain, linked by onchain hashes or proofs.

Pattern 1: Store Content Offchain, Hash Onchain

Upload your content to IPFS or Arweave, generate its hash, store only that hash onchain. Users can download the offchain content and verify it matches the onchain hash.

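contract VerifiedContent {
    address public immutable publisher = msg.sender; // only the deployer may publish
    mapping(uint256 => bytes32) public contentHashes;
    event ContentPublished(uint256 indexed id, bytes32 hash);

    function publishContent(uint256 id, bytes32 hash) external {
        require(msg.sender == publisher, "Not publisher");
        require(contentHashes[id] == bytes32(0), "Already published"); // write-once per id
        contentHashes[id] = hash;
        emit ContentPublished(id, hash); // makes each published hash visible to explorers
    }
}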

This pattern works for NFT metadata, legal documents, website front-ends, and any content where integrity matters more than onchain access.

Pattern 2: Wallet Signatures for Authorization

Skip traditional passwords entirely. Let users prove identity by signing messages with their wallet. The signature proves they control a specific address without revealing private keys.

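// Assumes ECDSA and MessageHashUtils are imported from OpenZeppelin Contracts v5,
// and that hasAccess is a mapping(address => bool) defined in this contract.
function authenticate(bytes memory signature, string memory message) external view returns (address signer) {
    // Recover the signer of an EIP-191 personal_sign message
    signer = ECDSA.recover(MessageHashUtils.toEthSignedMessageHash(bytes(message)), signature);
    require(hasAccess[signer], "Unauthorized");
}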

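This eliminates the password database and password-reset flows. It does not eliminate phishing: users can still be tricked into signing malicious messages, so login messages should include the site's domain, a nonce, and an expiry, as standardized by Sign-In with Ethereum (EIP-4361). Account recovery also shifts to the user's wallet, since a lost key cannot be reset like a password.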

Pattern 3: Commit-Reveal for Fairness

When users shouldn't see each other's submissions before a deadline, use commit-reveal. Users first submit a hash of their data (commit phase), then reveal the actual data after the deadline (reveal phase).

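uint256 public commitDeadline; // end of the commit phase, set when the round opens
mapping(address => bytes32) public commitments;
mapping(address => string) public revealed;

function commit(bytes32 hash) external {
    require(block.timestamp < commitDeadline, "Commit phase closed");
    // hash = keccak256(abi.encodePacked(msg.sender, data, salt)), computed offchain
    commitments[msg.sender] = hash;
}

function reveal(string calldata data, bytes32 salt) external {
    require(block.timestamp >= commitDeadline, "Reveal not open yet");
    require(keccak256(abi.encodePacked(msg.sender, data, salt)) == commitments[msg.sender], "Does not match commitment");
    revealed[msg.sender] = data;
}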

This prevents frontrunning in auctions, keeps early governance votes from swaying later ones, and stops participants from gaming prediction markets by copying others' submissions.
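A commitment should include a secret random salt: an unsalted commitment to a small set of choices is easy to reverse, because any observer can hash every possible choice and compare. This Python sketch simulates both cases offchain. It uses SHA-256 as a stand-in for keccak256, since Python's standard library has no keccak256 (hashlib.sha3_256 is the standardized SHA3 variant and gives different digests), and the voter address is a hypothetical placeholder.

```python
# Commit-reveal with and without a salt, simulated offchain.
# SHA-256 stands in for keccak256; the encoding (address || choice || salt)
# loosely mirrors abi.encodePacked(msg.sender, data, salt).
import hashlib
import secrets
from typing import Optional


def commitment(voter: str, choice: str, salt: bytes) -> bytes:
    return hashlib.sha256(voter.encode() + choice.encode() + salt).digest()


def guess_choice(voter: str, commit: bytes, options: list, salt: bytes = b"") -> Optional[str]:
    """What any observer can do: hash every possible choice and compare."""
    for option in options:
        if commitment(voter, option, salt) == commit:
            return option
    return None


options = ["yes", "no", "abstain"]
voter = "0xVoterAddress"  # hypothetical placeholder

unsalted = commitment(voter, "yes", b"")
print(guess_choice(voter, unsalted, options))  # yes  (the vote leaks before the deadline)

salt = secrets.token_bytes(32)  # kept private until the reveal phase
salted = commitment(voter, "yes", salt)
print(guess_choice(voter, salted, options))  # None (observer cannot test guesses without the salt)
print(commitment(voter, "yes", salt) == salted)  # True (revealing choice + salt verifies)
```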

Pattern 4: L2 Execution, L1 Anchoring

Build your application on Base or another L2 for low-cost transactions. The rollup itself periodically commits its transaction data and state to Ethereum mainnet, which is where its security comes from.

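Most users interact entirely on L2 and benefit from low fees. Because the rollup publishes its data and state commitments to Ethereum, anyone can check L2 state against L1, and on an optimistic rollup like Base, an invalid state commitment can be challenged during the dispute window.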

Getting Offchain Data Onchain: Oracles and Proofs

Sometimes you need offchain information to influence onchain logic. Price feeds, sports scores, weather data, or real-world events must cross the onchain-offchain boundary.

Oracle Basics

Oracles are services that relay offchain data to smart contracts. They query APIs, aggregate multiple sources, and publish results onchain where contracts can access them.

The challenge is trust. A compromised oracle can feed false data to your contract. Different oracles address this with different trust models:

Centralized oracles (Coinbase Oracle) offer speed and simplicity but require trusting a single entity.

Decentralized oracles (Chainlink) aggregate data from multiple independent node operators, making manipulation harder but adding cost and complexity.

Optimistic oracles (UMA) assume data is correct unless challenged, using economic incentives and dispute resolution.

Timestamping Offchain Data

Even if you can't bring full datasets onchain, you can timestamp their state. Upload a dataset to IPFS, compute its Merkle root, store that root onchain with a timestamp.

Later, users can prove any piece of the dataset existed at that time by providing a Merkle proof. This works for supply chain data, financial records, or any verifiable history.
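A minimal Python sketch of the root-and-proof idea. SHA-256 stands in for keccak256, and the conventions (an odd node is paired with itself, pairs are hashed as left followed by right) are assumptions chosen for clarity. Production verifiers such as OpenZeppelin's MerkleProof use keccak256 and their own pairing rules, so a real tree must be built to match whatever verifier your contract uses.

```python
# Merkle root over a list of records, plus inclusion proofs.
# SHA-256 stands in for keccak256; pairing conventions are illustrative.
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(records: list) -> list:
    """All tree levels, leaves first. An odd node at any level is paired with itself."""
    level = [h(r) for r in records]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_root(records: list) -> bytes:
    return build_levels(records)[-1][0]


def merkle_proof(records: list, index: int) -> list:
    """Sibling hashes from leaf to root, each with a flag saying whether it sits on the right."""
    proof = []
    for level in build_levels(records)[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        proof.append((level[sibling], sibling > index))
        index //= 2
    return proof


def verify(record: bytes, proof: list, root: bytes) -> bool:
    node = h(record)
    for sibling, sibling_is_right in proof:
        node = h(node + sibling) if sibling_is_right else h(sibling + node)
    return node == root


records = [b"shipment-1:delivered", b"shipment-2:delayed", b"shipment-3:delivered"]
root = merkle_root(records)  # the 32 bytes you would store onchain with a timestamp
proof = merkle_proof(records, 1)
print(verify(records[1], proof, root))  # True
print(verify(b"shipment-2:delivered", proof, root))  # False: altered record
```

Only the root goes onchain; each proof grows with the logarithm of the dataset size, so a user can prove one record out of millions with a few dozen hashes.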

When Oracle Dependence Is Acceptable

Using oracles is fine when:

  • The data source is objectively verifiable (price feeds, timestamps)
  • Multiple independent oracles agree on the value
  • The economic value at stake justifies oracle costs
  • Your application can tolerate oracle downtime

Avoid oracles when:

  • Data is subjective or easily manipulated
  • The trust model collapses to a single point of failure
  • Users need onchain guarantees without external dependencies

What Not to Put Onchain

Some data should never live on a public blockchain, regardless of cost or technical feasibility.

Personally Identifiable Information

Names, addresses, phone numbers, email addresses, social security numbers, medical records. Publishing PII onchain creates permanent privacy violations that no amount of encryption or hashing can fully mitigate.

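Even if you hash PII before storing it, attackers can often recover it. Values like phone numbers and birth dates have so few possibilities that an attacker can simply hash every candidate, and email addresses can be matched against leaked lists or precomputed tables. The legal implications are severe, especially under GDPR and similar regulations.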
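To make the brute-force point concrete, this Python sketch recovers a hashed birth date by hashing every date in a 91-year range, about 33,000 candidates, which finishes almost instantly. The "published" hash is simulated, and SHA-256 stands in for whatever hash a contract might use; keccak256 would be just as easy to reverse this way.

```python
# Recovering a hashed birth date by trying every plausible date.
import hashlib
from datetime import date, timedelta
from typing import Optional


def hash_value(value: str) -> str:
    return hashlib.sha256(value.encode()).hexdigest()


def brute_force_birthdate(target_hash: str, start: date, end: date) -> Optional[str]:
    day = start
    while day <= end:
        candidate = day.isoformat()  # e.g. "1987-04-12"
        if hash_value(candidate) == target_hash:
            return candidate
        day += timedelta(days=1)
    return None


published = hash_value("1987-04-12")  # simulates what a naive contract might store
print(brute_force_birthdate(published, date(1920, 1, 1), date(2010, 12, 31)))  # 1987-04-12
```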

Safe alternative: Store PII in encrypted databases with proper access controls. Store only anonymous user IDs or commitments onchain if needed for authentication.

Private Keys and Secrets

Never store private keys, API credentials, encryption keys, or any secret material onchain. The blockchain is public. Everyone can read everything, forever.

Safe alternative: Use secure key management systems, hardware security modules, or threshold signature schemes for production secrets.

Copyrighted or Legally Sensitive Content

Storing copyrighted content without permission creates immutable evidence of infringement. Storing content subject to takedown notices or legal disputes creates compliance problems you cannot solve.

Safe alternative: Store content on platforms that support takedown processes. Use onchain hashes only to prove authenticity of authorized content.

Large Files and Media

Even ignoring cost, blockchains aren't designed for large files. They optimize for small, frequently-read state, not gigabytes of video or images.

Safe alternative: Use IPFS or Arweave for media. Store CIDs or hashes onchain to prove integrity.

Use-Case Playbook: What to Store Where

NFTs and Media

Onchain: Token ID, owner address, metadata hash or CID
Offchain: Images, videos, detailed metadata JSON

Deploy the ERC-721 or ERC-1155 contract onchain. Its tokenURI function (ERC-721) or uri function (ERC-1155) returns an IPFS or Arweave URL pointing to a JSON file containing the full metadata and media references.

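// CID of each token's metadata JSON, recorded at mint time
mapping(uint256 => string) private tokenCID;
// When inheriting from an ERC-721 base contract, mark this function `override`
function tokenURI(uint256 tokenId) public view returns (string memory) {
    return string.concat("ipfs://", tokenCID[tokenId]);
}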
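On the offchain side, the file behind that URL is a small JSON document. This Python sketch builds one using the name, description, and image fields from the ERC-721 metadata JSON schema, and shows how a client might rewrite an ipfs:// URI to an HTTP gateway URL. The CID values are placeholders, and ipfs.io is just one example gateway.

```python
# Offchain half of the tokenURI pattern: build metadata JSON and resolve ipfs:// URIs.
import json


def build_metadata(name: str, description: str, image_cid: str) -> str:
    """Metadata using the name/description/image fields of the ERC-721 metadata schema."""
    return json.dumps({
        "name": name,
        "description": description,
        "image": f"ipfs://{image_cid}",
    }, indent=2)


def to_gateway_url(uri: str, gateway: str = "https://ipfs.io/ipfs/") -> str:
    """Rewrite ipfs://<CID>/path for clients that only speak HTTP; other URIs pass through."""
    prefix = "ipfs://"
    if not uri.startswith(prefix):
        return uri
    return gateway + uri[len(prefix):]


metadata = build_metadata("Token #1", "An example token", "<image-CID>")  # placeholder CID
print(metadata)
print(to_gateway_url("ipfs://<metadata-CID>"))  # https://ipfs.io/ipfs/<metadata-CID>
```

Keeping ipfs:// URIs in the metadata, rather than a specific gateway URL, lets clients pick any gateway later without the stored data changing.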

Community Tokens

Onchain: Token contract, balances, transfer logic, governance rules
Offchain: Analytics dashboards, holder lists, engagement metrics

Token economics live onchain where they're transparent and immutable. Analytics and user-facing dashboards pull data from onchain events but compute and display it offchain for speed.

DAO Governance

Onchain: Proposal text hashes, vote tallies, execution logic
Offchain: Discussion threads, supporting documents, voter rationales

Critical governance decisions go onchain. Discussions happen offchain on forums or Discord where they can be rich, threaded, and updated. Snapshot uses offchain voting with cryptographic signatures for gas-free governance.

Onchain Websites and Apps

Onchain: Domain pointer (ENS), content hash reference
Offchain (IPFS/Arweave): HTML, CSS, JavaScript, images

Host your front-end on IPFS or Arweave. Store the CID in an ENS contenthash record or an onchain registry. Users can verify they're loading the authentic version by comparing the CID with the onchain pointer.

Wallet-based authentication removes the need for a login backend, and smart accounts can manage user sessions without passwords.

Loyalty and Rewards Programs

Onchain: Periodic checkpoints, redemption transactions
Offchain: Real-time point balances, activity tracking

Track user activity offchain for speed and flexibility. Periodically commit Merkle roots onchain so users can prove their balances. When users redeem rewards, verify their proof and execute the onchain transaction.

Verifiability in Practice

Hybrid architectures only work if users can actually verify the offchain content matches onchain references.

How to Verify Content Hashes

  1. Download the offchain content from IPFS or Arweave
  2. Compute its hash over exactly the same bytes, using the same algorithm the publisher used (usually keccak256 or SHA-256)
  3. Compare the computed hash with the onchain reference
  4. If they match, the content is authentic and unmodified
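The steps above can be sketched in Python for content whose SHA-256 digest was published onchain as a 0x-prefixed hex string. The download step is simulated with a local byte string. If the publisher used keccak256, you would need a keccak implementation from outside Python's standard library (hashlib.sha3_256 is not the same function).

```python
# Steps 2-4: hash the downloaded bytes and compare with the onchain value.
# Assumes the publisher stored sha256(content) as a 0x-prefixed hex string.
import hashlib


def matches_onchain_hash(content: bytes, onchain_hex: str) -> bool:
    computed = hashlib.sha256(content).hexdigest()
    expected = onchain_hex.lower().removeprefix("0x")  # Python 3.9+
    return computed == expected


# Step 1 is simulated: in practice `downloaded` comes from an IPFS or Arweave gateway.
downloaded = b'{"name": "Token #1"}'
onchain_value = "0x" + hashlib.sha256(downloaded).hexdigest()  # stands in for the contract's value
print(matches_onchain_hash(downloaded, onchain_value))          # True
print(matches_onchain_hash(downloaded + b" ", onchain_value))   # False: any change breaks the match
```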

Some tools handle part of this for you. MetaMask can resolve ENS domains that point to IPFS content, and Etherscan shows event logs where you can see published hashes.

Reading CIDs and Understanding IPFS

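IPFS CIDs come in two versions. A CIDv0 such as QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco is a base58-encoded SHA-256 multihash, which is why every CIDv0 starts with Qm. A CIDv1 adds explicit version, content-type, and encoding prefixes; in the common base32 form it starts with bafy... (or bafk... for raw data), and it is gradually replacing CIDv0.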
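To see what the Qm prefix encodes, this Python sketch base58-decodes the example CIDv0 and checks the multihash header: byte 0x12 identifies SHA-256 and byte 0x20 the 32-byte digest length. The digest covers the IPFS-encoded block (UnixFS data in a dag-pb node), not the raw file bytes, so it usually will not match a plain SHA-256 of the file.

```python
# Decode a CIDv0 (base58btc multihash) and inspect its header bytes.
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_decode(s: str) -> bytes:
    n = 0
    for ch in s:
        n = n * 58 + BASE58_ALPHABET.index(ch)  # raises ValueError on invalid characters
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    leading_zeros = len(s) - len(s.lstrip("1"))  # each leading '1' encodes a zero byte
    return b"\x00" * leading_zeros + body


def parse_cidv0(cid: str) -> dict:
    raw = base58_decode(cid)
    if len(raw) != 34 or raw[0] != 0x12 or raw[1] != 0x20:
        raise ValueError("not a CIDv0 (expected a 34-byte sha2-256 multihash)")
    return {"hash_function": "sha2-256", "digest_length": raw[1], "digest": raw[2:].hex()}


info = parse_cidv0("QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco")
print(info["hash_function"], info["digest_length"])  # sha2-256 32
```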

To retrieve content, use any IPFS gateway:

  • https://ipfs.io/ipfs/[CID]
  • https://gateway.pinata.cloud/ipfs/[CID]
  • https://[CID].ipfs.dweb.link

IPFS content disappears if no one pins it. To ensure persistence:

  • Use paid pinning services like Pinata or NFT.Storage
  • Run your own IPFS node for critical content
  • Use Filecoin to create economic incentives for storage
  • Or choose Arweave for permanent storage with higher upfront cost

Explorer Tips for Verification

On Etherscan or Basescan:

  • Check the Events tab to see emitted logs with content hashes
  • Verify the Contract tab shows verified source code
  • Read State changes to confirm storage updates
  • Trace Internal Transactions to understand contract interactions

For NFTs, check the tokenURI to see where metadata lives. For DAOs, read governance event logs to verify proposal and voting history.

Cost, Performance, and UX: Designing for Real Users

Reducing Gas Costs

Compress data before storing: Use bit-packing, optimize structs, remove unnecessary state.

Batch operations: Group multiple updates into single transactions.

Store hashes, not payloads: Use onchain pointers to offchain content.

Use events instead of storage: If other contracts don't need to read it, emit an event.

Choose L2 over L1: Base offers 10-100x cost savings with minimal security trade-offs.
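Bit-packing is what Solidity does automatically when adjacent state variables or struct members are narrower than 32 bytes. This Python sketch performs the same packing by hand for three hypothetical fields (a 64-bit ID, a 128-bit amount, and a 64-bit timestamp) so you can see them share a single 256-bit slot instead of occupying three.

```python
# Manual slot packing: three narrow fields share one 256-bit word.
# Field widths are hypothetical: 64-bit ID, 128-bit amount, 64-bit timestamp.
ID_BITS, AMOUNT_BITS, TS_BITS = 64, 128, 64
assert ID_BITS + AMOUNT_BITS + TS_BITS == 256  # must fit in one storage slot


def pack(item_id: int, amount: int, timestamp: int) -> int:
    for value, bits in ((item_id, ID_BITS), (amount, AMOUNT_BITS), (timestamp, TS_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
    return item_id | (amount << ID_BITS) | (timestamp << (ID_BITS + AMOUNT_BITS))


def unpack(word: int) -> tuple:
    item_id = word & ((1 << ID_BITS) - 1)
    amount = (word >> ID_BITS) & ((1 << AMOUNT_BITS) - 1)
    timestamp = word >> (ID_BITS + AMOUNT_BITS)
    return item_id, amount, timestamp


word = pack(42, 10**18, 1_700_000_000)
print(word < 2**256)  # True: one slot instead of three
print(unpack(word))   # (42, 1000000000000000000, 1700000000)
```

Writing one fresh slot instead of three saves two 20,000-gas slot writes, at the cost of choosing field widths that your values will never outgrow.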

Balancing Speed and Finality

Base transactions confirm in 2 seconds but may not be final for 30 minutes to 2 hours. Design your UI to communicate this clearly.

Show pending transactions immediately with visual indicators. Update to confirmed when the L2 block includes it. Mark as final only after L1 settlement if that matters for your use case.

For most applications, L2 confirmation is sufficient. Users care about seeing their action complete, not about theoretical rollback scenarios.

Handling Upgrades and Versioning

Smart contracts are immutable by default. To update logic without changing the address users interact with:

Proxy patterns separate storage from logic. The proxy holds state and delegates calls to an implementation contract you can upgrade.

Versioned offchain content keeps iterations separate. Store each version's hash onchain so users can access previous states.

Transparent upgrade mechanisms give users notice before changes. Time locks, governance votes, or multisig approvals prevent surprise updates.

Common Pitfalls and a Practical Checklist

Over-Onchain Designs

Storing everything onchain drives up gas costs, slows user experience, and creates technical debt. The most common mistake is treating the blockchain like a database and writing all application state to storage.

Fix: Store only state that must be trustless and permanent. Move everything else offchain.

Oracles Without Clear Trust Models

Using an oracle without understanding its trust assumptions creates hidden centralization risks. If the oracle fails or feeds bad data, your entire application breaks.

Fix: Document oracle dependencies clearly. Use multiple independent sources when possible. Implement sanity checks on oracle data.
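A minimal sketch of such sanity checks, in Python for readability. The reading structure, the one-hour staleness limit, and the 20% maximum jump are hypothetical values chosen for illustration; a contract would apply equivalent checks to whatever an oracle returns, such as a price and its last-update timestamp.

```python
# Hypothetical sanity checks before trusting an oracle price.
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class OracleReading:
    price: float
    updated_at: int  # unix timestamp (seconds) of the oracle's last update


def accept_reading(reading: OracleReading, now: int, last_good_price: Optional[float],
                   max_age_s: int = 3600, max_jump: float = 0.20) -> Tuple[bool, str]:
    if reading.price <= 0:
        return False, "non-positive price"
    if reading.updated_at > now:
        return False, "timestamp in the future"
    if now - reading.updated_at > max_age_s:
        return False, "stale reading"
    if last_good_price is not None:
        change = abs(reading.price - last_good_price) / last_good_price
        if change > max_jump:
            return False, "price moved too far since last accepted value"
    return True, "ok"


now = 1_700_000_000
print(accept_reading(OracleReading(2000.0, now - 60), now, last_good_price=1990.0))    # (True, 'ok')
print(accept_reading(OracleReading(2000.0, now - 7200), now, last_good_price=1990.0))  # (False, 'stale reading')
```

Rejecting a reading only helps if the application has a safe fallback, such as pausing or switching to a secondary source, so decide that behavior alongside the thresholds.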

Missing Content Persistence Plans

Deploying an NFT project that points to IPFS without pinning the content is a recipe for broken metadata. Unpinned IPFS content disappears when nodes stop hosting it.

Fix: Use paid pinning services, run your own nodes, or choose Arweave for permanent storage. Test retrieval from multiple gateways before launch.

Publishing Private Data

Once data goes onchain, it's public forever. Accidentally publishing user data, business secrets, or legally sensitive information creates permanent problems.

Fix: Audit what goes onchain. Use encryption for sensitive offchain data. Train your team on the permanence and transparency of blockchain storage.

Storage Decision Checklist

Before choosing where to store data, answer these questions:

Must this data be verifiable by anyone?

  • Yes → Onchain or offchain with onchain hash
  • No → Offchain with access controls

Does this data need to be permanent and immutable?

  • Yes → Onchain (critical) or Arweave (large)
  • No → IPFS or centralized storage

How large is the dataset?

  • < 1KB → Consider onchain storage or calldata
  • 1KB–1MB → IPFS or Arweave
  • > 1MB → Definitely offchain

How often does it change?

  • Never → Onchain or Arweave
  • Occasionally → IPFS with new CIDs
  • Frequently → Centralized database

Who needs to access it?

  • Other smart contracts → Must be onchain
  • Public users → IPFS or Arweave
  • Private users → Encrypted offchain storage

What's your budget?

  • Limited → L2 + IPFS or centralized storage
  • Moderate → L1 for critical data, L2 for everything else
  • High → Full onchain with L1 settlement

What's your trust model?

  • Zero trust → Fully onchain
  • Some trust acceptable → Hybrid with oracles
  • Trust isn't a concern → Centralized with onchain proofs
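Parts of the checklist can be encoded as a first-pass helper. This Python sketch is hypothetical: the order in which it checks the questions (privacy first, then contract access, change frequency, and finally size and permanence) is an assumption made for the example, and real decisions often split one dataset across several of these options.

```python
# Hypothetical first-pass encoding of the storage checklist.
def recommend_storage(*, private: bool, read_by_contracts: bool,
                      changes: str, size_bytes: int, permanent: bool) -> str:
    """changes is one of "never", "occasionally", "frequently"."""
    if private:
        return "encrypted offchain storage"
    if read_by_contracts:
        return "onchain"
    if changes == "frequently":
        return "centralized database (with onchain hashes if integrity matters)"
    if size_bytes < 1024 and permanent:
        return "onchain"
    if permanent:
        return "Arweave, with the hash onchain"
    return "IPFS, with the CID onchain"


print(recommend_storage(private=False, read_by_contracts=False,
                        changes="never", size_bytes=5_000_000, permanent=True))
# Arweave, with the hash onchain
```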

This framework gives you a starting point for every storage decision. As you build, revisit these trade-offs whenever requirements change or costs become a constraint.


The onchain vs offchain decision isn't binary. Most projects succeed by storing critical data onchain and keeping everything else offchain with cryptographic proofs connecting the two. Start with this hybrid approach, optimize for your specific constraints, and remember that storage architecture can evolve as your project grows.