At $1M ARR, founders can answer most of their questions by logging into their product analytics, their CRM, and their accounting system. The data is messy but manageable. By $5M ARR, the questions are harder, the systems have proliferated, and the answers require combining data from 4-6 sources that don't talk to each other.
Building a centralized data model early is the investment that prevents this. Early-stage companies consistently underinvest in it, and the companies that skip it consistently regret it.
The minimum viable data stack before $5M ARR:
A data warehouse. Snowflake, BigQuery, or Redshift depending on your cloud environment. The warehouse is the single place where data from all your systems lands in a consistent format. This is the foundation everything else builds on.
ETL pipelines for your critical systems. At minimum: your product database, your CRM (Salesforce, HubSpot), your billing system (Stripe), and your support system (Zendesk, Intercom). Fivetran and Airbyte are standard tools for managed pipelines.
A consistent customer identity layer. A mapping that ties the same customer together across all systems, so that the Stripe customer_id, the Salesforce account_id, and the product user_id all point to the same entity. Without this, cross-system queries are unreliable.
A BI tool for self-service analysis. Looker, Metabase, or Mode depending on your technical depth. The goal is enabling any team member to answer a data question without requiring an engineer.
The minimum metrics library. Standardize the calculation for your 15-20 core business metrics in the BI tool. NRR, ARR, CAC, churn, activation rate — calculated once, consistently, from the warehouse.
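To make the identity layer and the metrics library concrete, here is a minimal Python sketch of how the two fit together. It is an illustration under stated assumptions, not a reference implementation: the CustomerIdentity record, the customer_key field, and the helper function names are all hypothetical, and in practice this logic would more likely live as SQL models in the warehouse. The sketch rolls Stripe revenue up to a canonical customer and then computes NRR from a single shared definition (end-of-period MRR from the starting cohort divided by that cohort's starting MRR).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class CustomerIdentity:
    """One row of a hypothetical identity map: one customer, many system IDs."""
    customer_key: str                  # canonical key (assumed name)
    stripe_customer_id: str
    salesforce_account_id: str
    product_user_ids: Tuple[str, ...]  # one account can have many product users


def build_stripe_lookup(identities: Iterable[CustomerIdentity]) -> Dict[str, str]:
    """Index the identity map by Stripe customer_id."""
    return {row.stripe_customer_id: row.customer_key for row in identities}


def mrr_by_customer(
    stripe_rows: Iterable[Tuple[str, float]],
    stripe_lookup: Dict[str, str],
) -> Tuple[Dict[str, float], List[str]]:
    """Roll (stripe_customer_id, mrr) rows up to canonical customers.

    Stripe IDs missing from the identity map are returned separately so
    they can be fixed rather than silently dropped.
    """
    totals: Dict[str, float] = {}
    unmapped: List[str] = []
    for stripe_id, mrr in stripe_rows:
        key = stripe_lookup.get(stripe_id)
        if key is None:
            unmapped.append(stripe_id)
            continue
        totals[key] = totals.get(key, 0.0) + mrr
    return totals, unmapped


def net_revenue_retention(start_mrr: Dict[str, float], end_mrr: Dict[str, float]) -> float:
    """NRR for the starting cohort: churned customers count as 0 at the end,
    and customers acquired during the period are excluded."""
    cohort_start = sum(start_mrr.values())
    if cohort_start == 0:
        raise ValueError("starting cohort has no revenue")
    cohort_end = sum(end_mrr.get(key, 0.0) for key in start_mrr)
    return cohort_end / cohort_start


# Made-up example data, for illustration only.
identities = [
    CustomerIdentity("cust_1", "cus_A", "001A", ("u1", "u2")),
    CustomerIdentity("cust_2", "cus_B", "001B", ("u3",)),
]
lookup = build_stripe_lookup(identities)
start, _ = mrr_by_customer([("cus_A", 1000.0), ("cus_B", 500.0)], lookup)
end, unmapped = mrr_by_customer([("cus_A", 1200.0), ("cus_Z", 300.0)], lookup)
nrr = net_revenue_retention(start, end)
```

In this made-up data, cust_2 churns and cus_Z has no identity mapping, so it is surfaced in unmapped instead of distorting the metric. Defining NRR in one function (or one warehouse model) is what keeps every dashboard reporting the same number.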
The cost of this stack: $5-15K/month in tooling, 1-2 engineers to build it, 4-6 months to complete. At $2-3M ARR, this is the investment that pays back 10x by the time you need to answer board questions at $10M ARR.
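The build-phase cost implied by those ranges can be worked out with a few lines of arithmetic. The sketch below is only a rough estimate: the tooling, headcount, and duration ranges come from the figures above, while the per-engineer monthly cost is a hypothetical placeholder you should replace with your own number, and the calculation assumes tooling is paid for throughout the build.

```python
def build_cost_range(monthly_engineer_cost):
    """Return (low, high) total build-phase cost in dollars.

    Tooling, headcount, and duration ranges are taken from the text;
    monthly_engineer_cost is a caller-supplied assumption.
    """
    tooling_per_month = (5_000, 15_000)
    engineers = (1, 2)
    months = (4, 6)
    low = months[0] * (tooling_per_month[0] + engineers[0] * monthly_engineer_cost)
    high = months[1] * (tooling_per_month[1] + engineers[1] * monthly_engineer_cost)
    return low, high


# Hypothetical fully loaded cost per engineer per month; substitute your own figure.
ASSUMED_ENGINEER_COST = 15_000
low, high = build_cost_range(ASSUMED_ENGINEER_COST)
print(f"Build phase: ${low:,} to ${high:,}")
```

With the placeholder engineer cost, the range works out to $80,000 to $270,000 for the build phase. The spread is driven mostly by headcount and duration, not by tooling.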