
Osmos is joining Microsoft: Reflections on our journey and what comes next

Today, Osmos joins Microsoft to become part of Microsoft Fabric. This moment is both a milestone and a reflection point for us.

 First and foremost, we want to thank our investors who bet on us, the customers who trusted us with some of their hardest data problems, and the partners who leaned in and built alongside us. Thank you – we would not be here without your support, feedback, and conviction.  

Where we started

We started Osmos with a broad and ambitious goal: to solve the problem of unstructured and semi-structured data. Very quickly, it became clear that the scope of the problem was enormous. To make meaningful progress, we needed focus. Three key observations shaped our early direction:

  • Companies are forming an increasing number of “data relationships” where they need to share data with customers, suppliers, and partners
  • The volume, variety, and velocity of data are increasing 
  • The number of data sources and destinations is increasing

Together, these forces pointed to a specific and painful bottleneck: external data ingestion. Data coming into the organization was inconsistent, fragmented, and constantly changing. And it required far too much manual work to make it usable.

What we built
To address this, we built a suite of products (Pipelines, Uploaders, Datasets) designed to help customers ingest more varied data, faster, with less human effort. Under the hood, we pushed hard on automation. We built and deployed at-scale Machine Learning directly in the data pipeline (Inductive Program Synthesis using Version Space Algebra and Directed Acyclic Graph search), enabling in-pipeline, real-time ML that set the high-water mark of what was technically possible at the time.
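
To make the synthesis idea concrete, here is a minimal sketch of the version-space approach: keep the set of candidate programs consistent with every example seen so far, and shrink it as each new example arrives. The toy DSL and all names below are illustrative assumptions, not Osmos's actual system; a production synthesizer searches a vastly larger program space (hence the directed-acyclic-graph search).

```python
# Minimal version-space-style synthesis sketch (illustrative only).
from dataclasses import dataclass
from typing import Callable

# A candidate "program" is a named string transformation in a toy DSL.
@dataclass(frozen=True)
class Program:
    name: str
    fn: Callable[[str], str]

# Hypothesis space: every transformation we are willing to consider.
HYPOTHESES = [
    Program("identity", lambda s: s),
    Program("upper", str.upper),
    Program("lower", str.lower),
    Program("strip", str.strip),
    Program("strip_upper", lambda s: s.strip().upper()),
]

def synthesize(examples: list[tuple[str, str]]) -> list[Program]:
    """Intersect the version space with each (input, output) example,
    keeping only the programs consistent with everything seen so far."""
    version_space = list(HYPOTHESES)
    for inp, out in examples:
        version_space = [p for p in version_space if p.fn(inp) == out]
    return version_space

# A couple of examples quickly collapse the space to one program.
survivors = synthesize([("  acme corp ", "ACME CORP"), ("Osmos", "OSMOS")])
print([p.name for p in survivors])  # ['strip_upper']
```

The appeal of this formulation for data pipelines is that the learned program, not a black-box model, runs on every subsequent record, so its behavior is inspectable and deterministic.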

How the world changed
While all three observations have held true, the broader data landscape evolved faster and more dramatically than anyone expected. The distinction between external and internal data began to blur. Organizations increasingly wanted to combine every available data source to build the most accurate and up-to-date view of their business.

At the same time, the data lake ecosystem matured rapidly. Organizations began consolidating their data estate into fewer, more scalable and more capable platforms: systems designed to pull together structured, semi-structured, and unstructured data into one place and combine it to provide insights and visibility never before possible.

The center of gravity for data was shifting just as the era of AI began.

AI, language models, and a new inflection point
In parallel, we were closely following research advances in language modeling and their applicability to data engineering. From early bidirectional, encoder-only transformer models like BERT through the emergence of encoder-decoder and decoder-only autoregressive large language models in the T5/GPT era, it became clear that these techniques could fundamentally change how data systems are built and operated.

As soon as LLM technology became available outside the big research labs (GPT-3, GPT-3.5, the Llama family), we began experimenting. We started with small-scale, error-prone use cases, and eventually deployed our own fine-tuned and optimized LLM directly in the data pipeline path. Doing this required solving training and inference challenges in a rapidly evolving LLM ecosystem.

As we made rapid progress on our AI-assisted products, three things became clear:

  • The enterprise data estate was consolidating around the data lake.
  • The future was agentic, that is, AI-driven with humans as final approvers, rather than purely AI-assisted workflows.
  • The tools transforming Software Engineering (IDEs, coding agents, MCP) were necessary but insufficient to automate Data Engineering.

Data engineering required automation that was not just grounded in the data, but wrapped in significantly more complex guardrails: guardrails that protect the data, give human approvers the tools to undo agent missteps, and still enable hands-off, long-horizon automation. One such pattern is sketched below.
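
As a hedged illustration of one such guardrail, consider recording every agent action together with its inverse, so a human approver can roll back missteps after the fact without blocking the agent mid-run. This sketch is our assumption of the general pattern, not Osmos's or Fabric's actual implementation; all names are hypothetical.

```python
# Illustrative undo-log guardrail for agentic data engineering (sketch only).
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Action:
    description: str
    apply: Callable[[], None]
    undo: Callable[[], None]   # inverse operation, captured up front

@dataclass
class GuardedRun:
    log: list[Action] = field(default_factory=list)

    def execute(self, action: Action) -> None:
        action.apply()
        self.log.append(action)          # keep the undo handle

    def rollback(self, last_n: int) -> None:
        """Human approver rejects the last N agent steps."""
        for action in reversed(self.log[-last_n:]):
            action.undo()
        del self.log[-last_n:]

# Toy example: the "data" is a dict of columns; the agent renames one.
table = {"cust_id": [1, 2], "amt": [10.0, 12.5]}

def rename(old: str, new: str) -> Action:
    return Action(
        description=f"rename {old} -> {new}",
        apply=lambda: table.__setitem__(new, table.pop(old)),
        undo=lambda: table.__setitem__(old, table.pop(new)),
    )

run = GuardedRun()
run.execute(rename("amt", "amount"))     # agent acts autonomously
run.rollback(1)                          # approver undoes the rename
assert "amt" in table and "amount" not in table
```

The key design choice is that reversibility is captured at the moment an action is taken, which is what makes hands-off, long-horizon automation safe to approve after the fact rather than gate step by step.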

The bet we made
We made a deliberate bet: automate data engineering, as close to the data lake as possible.

This belief aligned perfectly with the launch of Microsoft Fabric’s Workload Hub and extensibility platform. Fabric offered the opportunity to embed our technology directly into the data platform itself.

So, we built our first wave of products natively within Fabric, focused on autonomous ingestion, transformation, and schema evolution.

Why Microsoft
Today, we’re inspired by the possibilities ahead that our technology, integrated into Fabric, can unlock for the data ecosystem.

Microsoft shares our belief that the future of data engineering is AI-native, deeply integrated, and built for enterprise scale. Fabric provides the foundation—OneLake, unified governance, and a rapidly growing ecosystem—on top of which autonomous data engineering can truly thrive.

By bringing Osmos’s technology and team into Microsoft, we have the opportunity to accelerate what we’ve been building and deliver it to a far broader audience—directly where customers already operate their data platforms.

What Comes Next
For our customers and partners: thank you for being part of this journey with us. Your problems shaped our thinking, and your feedback pushed us to build better systems.

As part of Microsoft, we’re excited to continue this work in helping define what autonomous, AI-driven data engineering looks like when it’s built directly into the modern data platform.

We believe the most impactful chapter is still ahead.
