Data Transformation

How to Quickly Clean CSVs With No-Code Data Transformations

Written by 
Katrina Kirsch
November 16, 2021

CSV files are a common way to share data, because they can be easily tracked, compressed, and read by computers and people. But too many data teams struggle to quickly transform CSVs into a clean, usable format. They have to deal with missing header columns, empty rows, escape characters, uncommon cell delimiters, and more.

Cleaning CSVs is notoriously time-consuming because there are so many formats and file variations. Companies structure CSVs to work for their organization, which typically differs from how another company structures their files. So before another business can use the data, it has to be cleaned by a data engineer. This means manually writing code to try and automate the repetitious and tedious work.

The problem with custom glue code is that it won't work for every customer (not scalable), needs to be maintained, and tends to break.

Is your external data process still “get file from customer then use some app to manually clean it each time”? There is absolutely no reason that your data teams should be dealing with customer/vendor data one file at a time in a browser.

Automation for messy external data is here, and the best companies in every industry are automating the whole process with Osmos. To make CSVs usable across any system, companies can take advantage of no-code data transformation to quickly clean and validate data. Tools like Osmos simplify data transformation—the process of converting the format, structure, or values of data to the required format of a destination system.

Let's explore how no-code data transformation works and a few ways to clean up your future CSV files with “human in the loop” automation not “human clicking and editing one cell at a time.”

No-Code Data Transformations to Clean CSVs

The ability to extract, validate, and restructure data without having to write code increases the efficiency of customer data onboarding and reduces your team's workload. It also helps improve the quality of data for both your company and customers. Here are a few no-code options to transform data.

1. Column Mapping

Column mapping is a way to map a source column to a destination column. This helps quickly clean CSVs by ensuring the source data is mapped to the destination using the correct format. It's useful when the source data doesn't need to be restructured.

If the source and destination columns have the same name, the mapping can be applied automatically. For columns with different names, you can use no-code data transformation to quickly configure a mapping between the columns. This ensures the source data is loaded into the destination column despite the name differences. The example below shows how Osmos uses column mapping for columns with the same and different names.

To understand how this can be applied, imagine that one of your partner companies collects its customers' first and last names in separate columns. But your operating system needs the format to be Full Name in one column. No-code data transformation tools make it simple to map the source data into the correct format for your destination. No copy/paste necessary.

2. Providing Examples

When cleaning CSVs with no-code data transformation, providing examples is a simple way to teach the system how to generate data. The system learns the desired output from the examples and auto-populates the remaining cells with the transformed data.

Let's walk through how this works. Your customer has a CSV file containing thousands of customer addresses in one column (ie. 1000 Acme Street, New York, NY, 10001). But your operational system needs each part of the addresses in individual columns. When the customer uses Osmos Uploader to share data with you, they can use examples to transform the data before it's imported into your system.

Without writing any code, the system learns from the examples and transforms the source data based on the destination parameters. The resulting data is clean, organized, and ready to use.

3. Advanced Formulas

Advanced formulas are another way to clean up CSVs without coding. These spreadsheet-style formulas transform and map source data to the columns in the output table. As a no-code data transformation, predefined formulas can be used together to complete complex transformations and clean CSV files in minutes. At Osmos, the formula categories include date and time, math, logical, and text.

Formulas speed up data transformation and make it simpler to automate repetitive data imports. Commonly used formulas include CONCAT and DATE.

  1. The CONCAT formula outputs concatenated input values. This is useful for combining data from multiple source columns, or adding a constant value to every source record for a specific column.
  2. The DATE formula converts the input into a date. This is useful for extracting dates from a string, and/or converting the format of the date.

Bonus: Schema Validations

Errors are bound to pop up when a CSV file has missing, corrupt, or incorrect data.  At Osmos, this happens through schema validations, which are defined by you and added directly in the code snippet

Here's an example of a validation function that checks if the value of the first user-visible column does not equal "Bob."

Custom validation functions are async functions that accept a 2D array of strings as their only argument. This corresponds to the array of all inputs for all rows and columns provided by the user. Validation functions are called every time a user enters any input in the transform table, even if the field that's being validated didn't change. This is done in order to facilitate contextual validation, where the validity of a field depends on the values of other fields.

In other words, the validators detect potential values that are incorrect to ensure data is correctly formatted, mapped, and ready to use every time.

Clean Your CSVs With Osmos No-Code Data Transformations

It's possible to quickly clean the data in CSV files with no-code data transformation tools like Osmos. Using "human in the loop" functions, such as column mapping, examples, and formulas your customers can send you clean data without writing any code. No more, "human clicking and editing one cell at a time.” Ultimately, no-code data transformation improves data quality and accuracy no matter the file size, format, or how complex it is.

Bring in newer, cleaner, fresher
external data

Try Osmos for free

Katrina Kirsch