DataHub Raises $35M to Lead the Metadata Infrastructure Movement

DataHub, the open-source metadata platform that originated at LinkedIn, has raised $35 million in Series B funding to expand its role as the connective layer for modern enterprise data stacks. The round was led by Bessemer Venture Partners, with participation from existing investors and a growing community of enterprise users.

Founded by Swaroop Jagadish and Shirshanka Das, DataHub is poised to solve one of the most fundamental challenges in enterprise data operations: making metadata not just available, but operational, programmable, and intelligence-ready.


Why Metadata is the Missing Link in Modern Data Systems

Modern enterprises run hundreds of data pipelines across dozens of tools, from ingestion to storage to BI. Yet despite massive investment in the stack, the biggest operational blind spot remains metadata - the “data about data” that governs context, lineage, ownership, and policy.

DataHub steps in as a real-time, developer-first metadata layer that lets organizations answer questions about the context, lineage, ownership, and policy of their data.

Its architecture is built on streaming metadata ingestion, graph-based lineage models, and APIs designed for automation. In short, it turns metadata from documentation into infrastructure.
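To make the idea of a graph-based lineage model concrete, here is a minimal sketch in Python. It is not DataHub's API or data model: the LineageGraph class, its method names, and the dataset names are all hypothetical, chosen only to show how storing lineage as a graph turns "what breaks downstream if this table changes?" into a simple traversal.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Toy in-memory lineage graph (hypothetical, not DataHub's model).

    Nodes are dataset names; each edge points from an upstream
    dataset to a dataset derived from it.
    """

    def __init__(self):
        self._downstream = defaultdict(set)

    def add_edge(self, upstream, downstream):
        """Record that `downstream` is derived from `upstream`."""
        self._downstream[upstream].add(downstream)

    def impacted_by(self, dataset):
        """Return every dataset reachable downstream of `dataset`, sorted by name."""
        seen = set()
        queue = deque([dataset])
        while queue:  # breadth-first walk over downstream edges
            current = queue.popleft()
            for child in self._downstream.get(current, ()):
                if child not in seen:
                    seen.add(child)
                    queue.append(child)
        return sorted(seen)


# Illustrative dataset names only.
graph = LineageGraph()
graph.add_edge("raw.orders", "staging.orders")
graph.add_edge("staging.orders", "marts.revenue")
graph.add_edge("marts.revenue", "dashboard.weekly_revenue")
graph.add_edge("raw.customers", "marts.revenue")

print(graph.impacted_by("raw.orders"))
# ['dashboard.weekly_revenue', 'marts.revenue', 'staging.orders']
```

A production system would keep the graph in a persistent store and update it from a stream of metadata events rather than from hand-written calls, but the impact-analysis query has the same shape.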


The Funding Round

The $35M Series B was led by Bessemer Venture Partners, which has backed infrastructure giants like HashiCorp, Twilio, and PagerDuty. Its participation signals that metadata management is no longer a niche feature - it’s a strategic layer.

This funding will allow DataHub to:


What Sets DataHub Apart

Unlike traditional data catalogs, DataHub is built for engineering teams and embeds into the workflows of those managing data quality, governance, and pipeline health. It’s open-source, API-driven, and pluggable - qualities that appeal to developers and large-scale teams alike.

Its core design principles reflect modern engineering values:

Companies like Klarna, Stripe, and Pinterest already rely on DataHub to unify metadata across fragmented platforms.


Founders: This Is How You Embed into the Stack

What makes DataHub’s trajectory so instructive is that it didn’t try to build another layer of UI abstraction or a central dashboard. Instead, it became the metadata backbone inside other systems.

Here’s the strategic move: rather than building for the C-suite first, DataHub started at the engineer level, where data architecture actually lives. By embedding directly into jobs, pipelines, and workflows, it became indispensable.

Founders should take note: adoption doesn’t start with the prettiest UI or the loudest pitch - it starts with code-level fit. Ask yourself: if your product disappeared tomorrow, would anything break? If the answer is no, you're still optional. DataHub became essential by making metadata actionable - at the code, query, and pipeline level.

This is also where many early-stage infrastructure startups fail: they optimize too early for abstraction instead of integration. A product that handles five concrete edge cases brilliantly is more likely to win trust than one that offers a single broad abstraction executed poorly. The best path to scale isn't breadth - it's embedded depth.


The Metadata Management Market Outlook

The metadata management market is growing rapidly - expected to rise from $6.5 billion in 2023 to $15.1 billion by 2028, at an 18.4% CAGR (MarketsandMarkets). This growth is driven by:

Gartner predicts that by 2026, 75% of data governance efforts will rely on active metadata management. Legacy catalogs can't keep up with this demand - what’s needed is a metadata platform that’s real-time, programmable, and infrastructure-ready. That’s exactly where DataHub is heading.
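The MarketsandMarkets projection cited at the top of this section can be sanity-checked with the standard compound annual growth rate formula. The sketch below uses only the figures quoted in this article ($6.5 billion in 2023, $15.1 billion in 2028); the function name cagr is just an illustrative helper.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over `years` years."""
    return (end_value / start_value) ** (1 / years) - 1


# Market size in billions of USD, as quoted in the article.
rate = cagr(6.5, 15.1, 2028 - 2023)
print(f"{rate:.1%}")  # 18.4%
```

The result matches the 18.4% figure, so the start value, end value, and growth rate quoted are consistent with one another.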

According to IDC, over 60% of data and analytics leaders report that a lack of metadata visibility is one of the top three blockers to scaling AI initiatives. In parallel, 40% of data engineering teams now spend more than a quarter of their time troubleshooting pipeline trust and data lineage issues.

At the same time, the shift toward data mesh architectures has made metadata the central nervous system of enterprise data - enabling decentralized domain ownership while maintaining global observability. Without a tool like DataHub, the promise of scalable, governed self-serve data remains out of reach.


What’s Next for DataHub

With $35 million in fresh capital, DataHub is doubling down on its role as a foundational metadata operating system. Priorities for the next phase include:

The ultimate goal is to create an adaptive metadata graph that not only observes but also recommends and acts. Metadata won’t just describe data. It will guide how it’s used, governed, and monetized.

