DataHub, the open-source metadata platform that originated at LinkedIn, has raised $35 million in Series B funding to expand its role as the connective layer for modern enterprise data stacks. The round was led by Bessemer Venture Partners, with participation from existing investors and a growing community of enterprise users.
Founded by Swaroop Jagadish and Shirshanka Das, DataHub is poised to solve one of the most fundamental challenges in enterprise data operations: making metadata not just available, but operational, programmable, and intelligence-ready.
Why Metadata Is the Missing Link in Modern Data Systems
Modern enterprises run hundreds of data pipelines across dozens of tools - from ingestion to storage to BI. Yet despite massive investment in the stack, the biggest operational blind spot remains metadata - the “data about data” that describes context, lineage, ownership, and policy.
DataHub steps in as a real-time, developer-first metadata layer that lets organizations answer questions like:
- Where did this data come from?
- Who owns it?
- Can we trust it?
- Is it compliant?
Its architecture is built on streaming metadata ingestion, graph-based lineage models, and APIs designed for automation. In short, it turns metadata from documentation into infrastructure.
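To make the idea of graph-based lineage concrete, here is a minimal sketch in plain Python of how the first two questions above (“Where did this data come from?” and “Who owns it?”) could be answered from a lineage graph. It is an illustration only, not DataHub’s API or data model: the `LineageGraph` class, its methods, and the dataset and team names are all hypothetical.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Toy in-memory lineage graph (hypothetical, not DataHub's data model)."""

    def __init__(self):
        # Maps each dataset to the set of datasets it is directly derived from.
        self._upstreams = defaultdict(set)
        # Maps each dataset to its owning team.
        self._owners = {}

    def add_edge(self, upstream, downstream):
        """Record that `downstream` is produced from `upstream`."""
        self._upstreams[downstream].add(upstream)

    def set_owner(self, dataset, owner):
        self._owners[dataset] = owner

    def owner_of(self, dataset):
        """Return the owner, or None if no owner has been recorded."""
        return self._owners.get(dataset)

    def all_upstreams(self, dataset):
        """Breadth-first walk over every direct and transitive upstream."""
        seen = set()
        queue = deque([dataset])
        while queue:
            current = queue.popleft()
            for parent in self._upstreams.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    queue.append(parent)
        return seen


# Hypothetical datasets, for illustration only.
graph = LineageGraph()
graph.add_edge("raw.orders", "staging.orders")
graph.add_edge("staging.orders", "analytics.revenue")
graph.add_edge("raw.customers", "analytics.revenue")
graph.set_owner("analytics.revenue", "finance-data-team")

print(sorted(graph.all_upstreams("analytics.revenue")))
print(graph.owner_of("analytics.revenue"))
```

The traversal returns both direct and transitive sources, which is what distinguishes a lineage graph from a flat list of table dependencies. A production system would persist the graph and keep it updated from ingestion events rather than holding it in memory.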
The Funding Round
The $35M Series B was led by Bessemer Venture Partners, which has backed infrastructure giants like HashiCorp, Twilio, and PagerDuty. Its decision to lead the round signals that metadata management is no longer a niche feature - it’s a strategic layer.
This funding will allow DataHub to:
- Expand integrations with cloud platforms, orchestration tools, and ML workflows
- Enhance real-time observability and anomaly detection
- Grow its enterprise offering with SSO, RBAC, SLAs, and compliance support
- Deepen AI-powered capabilities for metadata enrichment and context inference
What Sets DataHub Apart
Unlike traditional data catalogs, DataHub is built for engineering teams and embeds into the workflows of those managing data quality, governance, and pipeline health. It’s open-source, API-driven, and pluggable - qualities that appeal to developers and large-scale teams alike.
Its core design principles reflect modern engineering values:
- Push-based metadata (not slow crawls)
- Versioned metadata for governance audits
- Event-driven architecture to support lineage and change tracking
- Contextual UIs tailored for both analysts and platform teams
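The first three principles can be sketched together in a few lines of Python. This is a simplified illustration under stated assumptions, not DataHub’s implementation: `MetadataStore`, `MetadataChange`, and the entity and field names are hypothetical. Producers push changes as they happen, every push creates a new version instead of overwriting the old one, and subscribers receive each change as an event.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class MetadataChange:
    """One pushed change to a piece of metadata (hypothetical shape)."""
    entity: str
    aspect: str
    value: dict
    version: int


class MetadataStore:
    """Toy push-based, versioned, event-emitting metadata store."""

    def __init__(self):
        # Full change history per (entity, aspect); nothing is overwritten.
        self._history: Dict[Tuple[str, str], List[MetadataChange]] = {}
        self._subscribers: List[Callable[[MetadataChange], None]] = []

    def subscribe(self, callback):
        """Register a callback that is invoked for every pushed change."""
        self._subscribers.append(callback)

    def push(self, entity, aspect, value):
        """Append a new version and notify subscribers immediately."""
        versions = self._history.setdefault((entity, aspect), [])
        change = MetadataChange(entity, aspect, dict(value), len(versions) + 1)
        versions.append(change)
        for callback in self._subscribers:
            callback(change)
        return change

    def latest(self, entity, aspect):
        return self._history[(entity, aspect)][-1]

    def history(self, entity, aspect):
        """Return every version in order, e.g. for a governance audit."""
        return list(self._history.get((entity, aspect), []))


store = MetadataStore()
events = []
store.subscribe(events.append)  # e.g. a lineage or alerting consumer

store.push("analytics.revenue", "ownership", {"owner": "finance-data-team"})
store.push("analytics.revenue", "ownership", {"owner": "platform-team"})

print(store.latest("analytics.revenue", "ownership").version)
print([c.value["owner"] for c in store.history("analytics.revenue", "ownership")])
```

Because the producer calls `push` at the moment something changes, consumers do not have to wait for a periodic crawl, and the retained history is what makes an audit of past ownership possible.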
Companies like Klarna, Stripe, and Pinterest already rely on DataHub to unify metadata across fragmented platforms.
Founders: This Is How You Embed into the Stack
What makes DataHub’s trajectory so instructive is that it didn’t try to build another layer of UI abstraction or a central dashboard. Instead, it became the metadata backbone inside other systems.
Here’s the strategic move: rather than building for the C-suite first, DataHub started at the engineer level, where data architecture actually lives. By embedding directly into jobs, pipelines, and workflows, it became indispensable.
Founders should take note: adoption doesn’t start with the prettiest UI or the loudest pitch - it starts with code-level fit. Ask yourself: if your product disappeared tomorrow, would anything break? If the answer is no, you’re still optional. DataHub became essential by making metadata actionable - at the code, query, and pipeline level.
This is also where many early-stage infrastructure startups fail: they optimize too early for abstraction instead of integration. A product that handles five concrete integration cases brilliantly is more likely to win trust than one that offers a single broad abstraction poorly. The best path to scale isn’t breadth - it’s embedded depth.
The Metadata Management Market Outlook
The metadata management market is growing rapidly - expected to rise from $6.5 billion in 2023 to $15.1 billion by 2028, at an 18.4% CAGR (MarketsandMarkets). This growth is driven by:
- Increasing complexity of cloud-native and hybrid data environments
- Regulatory demands around data access, lineage, and privacy (GDPR, CCPA, HIPAA)
- The need for unified governance as companies adopt data mesh and decentralized architectures
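The market figures quoted above can be sanity-checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The short Python sketch below uses only the numbers from the text; the function names are ours.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate implied by a start and end value."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value, rate, years):
    """Value reached after compounding `rate` annually for `years` years."""
    return start_value * (1 + rate) ** years


years = 2028 - 2023  # five compounding periods

# Growth rate implied by $6.5B -> $15.1B over five years.
implied_rate = cagr(6.5, 15.1, years)

# 2028 value implied by compounding $6.5B at 18.4% for five years.
projected_2028 = project(6.5, 0.184, years)

print(f"Implied CAGR: {implied_rate:.1%}")
print(f"Projected 2028 market size: ${projected_2028:.1f}B")
```

Both directions agree with the quoted figures to one decimal place, so the start value, end value, and growth rate are internally consistent.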
Gartner predicts that by 2026, 75% of data governance efforts will rely on active metadata management. Legacy catalogs can’t keep up with this demand - what’s needed is a metadata platform that’s real-time, programmable, and infrastructure-ready. That’s exactly where DataHub is heading.
According to IDC, over 60% of data and analytics leaders report that a lack of metadata visibility is one of the top three blockers to scaling AI initiatives. In parallel, 40% of data engineering teams now spend more than a quarter of their time troubleshooting pipeline trust and data lineage issues.
At the same time, the shift toward data mesh architectures has made metadata the central nervous system of enterprise data - enabling decentralized domain ownership while maintaining global observability. Without a tool like DataHub, the promise of scalable, governed self-serve data remains out of reach.
What’s Next for DataHub
With $35 million in fresh capital, DataHub is doubling down on its role as a foundational metadata operating system. Priorities for the next phase include:
- AI-assisted discovery and governance, where the system auto-suggests tags, owners, and data domains
- Enterprise-ready features like fine-grained access control, audit trails, and SLAs
- Developer expansion, with SDKs, CLI tools, and plug-ins for Airflow, dbt, Looker, and beyond
- Community acceleration, through OSS contributions, meetups, and training content
The ultimate goal is to create an adaptive metadata graph that not only observes but also recommends and acts. Metadata won’t just describe data. It will guide how it’s used, governed, and monetized.