Announcing Osmos 2.0: the platform for external data ingestion

Written by 
Kirat Pandya
April 5, 2022

Data coming from external parties continues to grow 

Across every industry, businesses operate on customer and third-party data. But working with external data files and systems is tedious, slow, and expensive for several reasons.

  • The mechanism of data transfer varies for each organization involved - FTP, S3, email attachments, web uploads - and there is very little leverage to get the other party to change anything.
  • Incoming data arrives in different shapes and formats, but it all needs to land into a few curated "golden datasets" in your system.
  • The complexity of data cleanup spans schema mapping (700 columns!) and dirty field contents ("Addresses missing zip codes").
  • Finally, companies struggle with building custom solutions involving a lot of project management, custom glue code, elbow grease, and a ton of human time.

The headaches of working with messy external data

Without a universal standard for sharing data, companies importing external data from customers, partners, and vendors are easily overwhelmed by the amount of work needed to turn it into usable, structured data on a recurring basis.


Here are a few common scenarios we hear from customers.

1) Customer data onboarding

All companies have some sort of onboarding process to welcome new customers and teach them how to use a product, but not all have an efficient process for onboarding customer data. Customer data onboarding is the process of ingesting online and offline customer data into your operational system(s).

Customers come to us saying,

  • Customer data: "We need to bring in datasets with 200M rows. The types of datasets are increasing faster than we can scale our eng team. A lot of it doesn't even fit on their laptops anymore."
  • Project Planning: "Our customers have legacy construction management systems. They need to bring in updated project planning information into our app every day."
  • Bulk orders: "Our largest customers send us orders via their supply chain platform, and we have a team of people who spend all day downloading the CSV, cleaning it, and putting it into Netsuite."

2) Partner data onboarding

Companies need to ingest clean partner and vendor data into their system as quickly as possible, but manually cleaning and importing data slows them down, which hinders growth.

We consistently hear,

  • Product catalogs: "I have 1000 partners who constantly update their product catalogs. Some of them have over 700 columns!"
  • Channel information: "I have 150 channel partners and they all send me channel data in unique Excel files as email attachments. I NEED to automate this - we are flying blind!"
  • Operational relationships: “We need to receive data from our customers on one side then send repackaged information to our large financial partners on the other end - everyone involved has different systems.”

3) Data migration

Even data migration, the process of transferring existing data into a new system, is a huge data validation and cleanup pain for companies.

Our customers explain,

  • ERP migration: "We need to make it easier for our customers to migrate from their legacy ERP to our platform. It's decades of dirty data."
  • CRM migration: "We want to move to the new platform, but need both to operate in parallel before the final move."
  • Backend migration: “I need to move product data from my legacy system to Cosmos DB and restructure the data in the process, while keeping both systems functional in parallel.”

Messy external data cleanup is a constant bottleneck

The bottleneck in all these scenarios is the manual process of data cleanup. Nearly every company we meet has deployed some mix of Excel, Python, and elbow grease to make their external data cleanup process work.

The journey usually starts with some CSVs and a customer success person trying to manually onboard them. That becomes too much, so you buy an oversimplified CSV onboarding tool to make this person more efficient. Maybe you try to automate this by assigning some developers to the problem. But the data is too dirty, so back it goes to the human with Excel as their primary tool.

Cleaning external data requires a balancing act

There's this constant balancing act between how companies and their customers and partners exchange data. Every company needs control over the shape, size, velocity, cleanliness - the whole process - to normalize the data and bring it into their operational system of record. They also need to give their customers, partners, and suppliers the freedom to share data in whatever shape and mode works for them.

Too much freedom puts the onus on your internal teams, leading to a slow, tedious, and manual process. Too much control on your side puts the burden on your customers and partners, leading to a poor experience.


Despite the proliferation of tools designed to handle analytical and event data, there hasn't been an end-to-end solution to simplify how companies work with messy external data. Until now.

The new frontier of external data cleanup is codeless

Codeless data cleanup empowers everyone and every business to handle messy external data no matter their technical ability. 

Imagine the possibility of simply defining the shape and format of how you want to receive data, and all incoming customer and partner data automatically gets reshaped and formatted, and sent where it needs to go. 

No-code data cleanup allows you to:

  • Ingest clean data faster with fewer resources
  • Eradicate the need to build and maintain custom solutions
  • Optimize your engineering resources
  • Provide a better customer experience
  • Scale your data onboarding process to handle multiple use cases

Osmos: Simplify how you work with external data

Osmos is paving the path to automatic, codeless data cleanup, no matter the size, shape, format, frequency, or source of the data. We envision a world where computer systems can communicate across organizational boundaries seamlessly, with human supervision, not human-in-the-middle-with-a-shovel-called-Excel.

If I can communicate in French without knowing French, why can't your FTP/S3/Salesforce just talk to my operational database? Osmos removes the burden of "lots of single manual data cleanups" and properly automates it. Don't use the machine to fish - teach it how to fish!

Our External Data Platform gives you control while freeing your customers, partners, and vendors to share data however is easiest for them.


When we announced our Series A last year, I said, "We're starting with data onboarding, but our ambitions are much larger. This is the beginning of a journey to becoming the 'railroads' for inter-company data sharing."

Today, we are thrilled to announce the launch of Osmos 2.0 and our External Data Platform with new and updated features in the pursuit of this vision. Our no-code data transformation brings together:

1) AutoClean v2 - Our machine learning (program synthesis) engine lets you clean up incredibly messy data just by providing examples, in real time. Our ML is now able to recognize and clean up phone numbers, addresses, and lots of other types of data while learning more complex if-else conditions.


2) AutoMap v2 - Our updated automatic schema inference and matching engine can automatically map headers. It is powered by advances in NLP via transformer models like the GPT-3 family. Columns are now mapped automatically, making it easier and faster to upload data.


3) Quickfixes - One-click data cleanup for the most common scenarios. They look up data across systems - this could be your own internal systems, or our ML services like address/city/state/zip extraction.


4) More formulas, easier formulas - Math, string trimming, data cleanup, all made easier for your advanced users. Let them deploy their existing Excel skills to automate data onboarding, not just to manually clean up one file after another.


5) Customize AutoMap - You can now customize AutoMap for your data. These fine-tuned models work across gnarly industry- and scenario-specific datasets.

6) Remember cleanup (coming soon) - Save your mapping and data cleanup logic to speed up future uploads.

7) Multi-dataset relationship (coming soon) - Bring in datasets from various sources into Osmos. You can use it to do lookups, joins, aggregations, and splits at massive scale. No code, API clients, or infrastructure required.
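
To make the idea of automatic header mapping (item 2 above) concrete, here is a minimal sketch in Python. It is not how AutoMap works: AutoMap is described above as using transformer-based NLP models, while this sketch swaps in plain string similarity from Python's standard library. The function names, the sample headers, the target schema, and the 0.6 threshold are all assumptions made up for illustration.

```python
from difflib import SequenceMatcher


def normalize(header: str) -> str:
    """Lowercase a header and keep only letters and digits."""
    return "".join(ch for ch in header.lower() if ch.isalnum())


def map_headers(incoming, target, threshold=0.6):
    """Map each incoming header to its most similar target column.

    Headers whose best similarity score is below `threshold` map to None,
    so a person can review them instead of trusting a weak guess.
    """
    mapping = {}
    for header in incoming:
        best_col, best_score = None, 0.0
        for col in target:
            score = SequenceMatcher(None, normalize(header), normalize(col)).ratio()
            if score > best_score:
                best_col, best_score = col, score
        mapping[header] = best_col if best_score >= threshold else None
    return mapping


# Hypothetical incoming file headers and a hypothetical "golden dataset" schema.
incoming_headers = ["Cust Name", "E-mail", "Zip Code", "Notes"]
target_schema = ["customer_name", "email", "postal_code", "phone"]

mapping = map_headers(incoming_headers, target_schema)
for source, destination in mapping.items():
    print(f"{source!r} -> {destination!r}")
```

In this sketch "Cust Name" and "E-mail" map to customer_name and email, but "Zip Code" is left unmapped: it shares too few characters with postal_code to clear the threshold, even though the two mean the same thing. Synonyms like these are the kind of case that character-level similarity cannot resolve on its own.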

More flexible connectivity to systems:

  • HTTP API connector - Configure Osmos to call your existing APIs with just a few clicks. You can start receiving clean data into your systems without having to write a line of code. 
  • Email connector - Receive messy external data as email attachments? Teach Osmos how to clean it once, and permanently automate your data onboarding process.
  • Custom connectors - There is no reason data should get stranded in legacy systems. We have made advances in our schema transformation infrastructure and can now help enterprise customers deal with any system they need to exchange data with. We can build custom connectors for you in ~2 weeks.
  • 60+ connectors in private beta

7x performance increase

  • Onboard 1 million rows a minute via Osmos Pipelines or Osmos Uploaders.
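
As a rough back-of-the-envelope illustration of that figure, the Python sketch below estimates how long a 200M-row dataset (the size quoted by a customer earlier in this post) would take at 1 million rows a minute. It assumes the throughput stays constant at that scale, which is a simplifying assumption rather than a measured result, and the function name is made up for this example.

```python
ROWS_PER_MINUTE = 1_000_000  # throughput figure quoted above


def estimated_minutes(row_count: int, rows_per_minute: int = ROWS_PER_MINUTE) -> float:
    """Estimate onboarding time in minutes, assuming constant throughput."""
    return row_count / rows_per_minute


minutes = estimated_minutes(200_000_000)
print(f"{minutes:.0f} minutes (~{minutes / 60:.1f} hours)")  # prints "200 minutes (~3.3 hours)"
```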

Wrap Up

The Osmos low-code External Data Platform puts you in control of how you work with external data. We spend an intense amount of energy partnering with our customers, engaging deeply with their teams to really understand their headaches, and figuring out how our technology can help eliminate their external data pains. Our platform is fully customizable and configurable to handle multiple data scenarios, millions of records in minutes, large file sizes, and the ability to talk to a wide variety of systems.

Ready to reduce costs, move faster, and grow your business? Book a demo to see how Osmos 2.0 simplifies how you work with external data.


Kirat Pandya

CEO & Co-founder