Data Transformation

Clean CSV Files with No-Code Data Transformation

Written by 
Katrina Kirsch
November 16, 2021

CSV files are a common way to share data, because they can be easily tracked, compressed, and read by computers and people. But too many data teams struggle to quickly transform CSVs into a clean, usable format. They have to deal with missing header columns, empty rows, escape characters, uncommon cell delimiters, and more.

Cleaning CSVs is notoriously time-consuming because there are so many formats and file variations. Companies structure CSVs to work for their organization, which typically differs from how another company structures their files. So before another business can use the data, it has to be cleaned by a data engineer. This means manually writing code to try and automate the repetitious and tedious work.

Clean CSVs with No-Code Data Transformations Tools

Is your external data process still “get file from customer then use some app to manually clean it each time”? There is absolutely no reason that your teams should be dealing with customer/vendor data one file at a time in a browser.

Automated solutions to handle messy external data is here, and the best companies in every industry are automating the whole process with Osmos. To make CSVs usable across any system, companies can take advantage of no-code data transformation to quickly validate and clean CSVs. Tools like Osmos simplify cleaning CSV files from customers and partners—the process of converting the format, structure, or values of data to the required format of a destination system.

The ability to extract, validate, and restructure data without having to write code increases the efficiency of customer data onboarding and reduces your team's workload. It also helps improve the quality of data for both your company and customers. Here are a few no-code options to transform data.

quickly clean CSVs without writing code

Let's explore how no-code data transformation works and a few ways to clean up your future CSV files with “human in the loop” automation not “human clicking and editing one cell at a time.”

1. Column Mapping

Column mapping is a way to map a source column to a destination column. This helps quickly clean CSVs by ensuring the source data is mapped to the destination using the correct format. It's useful when the source data doesn't need to be restructured.

If the source and destination columns have the same name, the mapping can be applied automatically. For columns with different names, you can use no-code data transformation to quickly configure a mapping between the columns. This ensures the source data is loaded into the destination column despite the name differences. The example below shows how Osmos uses column mapping for columns with the same and different names.

Osmos column mapping

To understand how this can be applied, imagine that one of your partner companies collects its customers' first and last names in separate columns. But your operating system needs the format to be Full Name in one column. No-code data transformation tools make it simple to map the source data into the correct format for your destination. No copy/paste necessary.

2. Bulk Edit Column Data by Providing Examples

When cleaning CSVs providing examples is a simple way to teach the our system how to generate data. The system learns the desired output from the examples then makes bulk edits to auto-populate the remaining cells with the clean data.

Let's walk through how this works. Your customer has a CSV file containing thousands of customer addresses in one column (ie. 1000 Acme Street, New York, NY, 10001). But your operational system needs each part of the addresses in individual columns. When the customer uses Osmos Uploader to share data with you, they can use examples to transform the data before it's imported into your system.

osmos autoclean for bulk edits

Without writing any code, Osmos AutoClean learns from the examples and transforms the source data based on the destination parameters. The resulting data is clean, organized, and ready to use.

3. QuickFixes

QuickFixes are one-click, data-cleanup buttons that allow you to easily clean up your data for the most common scenarios for that data type (i.e. Date, Text, Numeric, etc.). You can combine multiple QuickFixes to both cleanup your data and resolve errors easily.

One-click, QuickFixes

4. Advanced Formulas

Advanced formulas are another way to clean up CSVs without coding. These spreadsheet-style formulas transform and map source data to the columns in the output table. As a no-code data transformation, predefined formulas can be used together to complete complex transformations and clean CSV files in minutes. At Osmos, the formula categories include date and time, math, logical, and text.

osmos advanced formulas for data clean up

Formulas speed up data transformation and make it simpler to automate repetitive data imports. Commonly used formulas include CONCAT and DATE.

  1. The CONCAT formula outputs concatenated input values. This is useful for combining data from multiple source columns, or adding a constant value to every source record for a specific column.
  2. The DATE formula converts the input into a date. This is useful for extracting dates from a string, and/or converting the format of the date.

Bonus: Data Validations

Errors are bound to pop up when a CSV file has missing, corrupt, or incorrect data.  At Osmos, this happens through schema validations, which are defined by you and added directly in the code snippet

Here's an example of a validation function that checks if the value of the first user-visible column does not equal "Bob."

osmos data validations

Custom validation functions are async functions that accept a 2D array of strings as their only argument. This corresponds to the array of all inputs for all rows and columns provided by the user. Validation functions are called every time a user enters any input in the transform table, even if the field that's being validated didn't change. This is done in order to facilitate contextual validation, where the validity of a field depends on the values of other fields.

In other words, the data validators detect potential values that are incorrect to ensure data is correctly formatted, mapped, and ready to use every time.

Automate Your CSV Cleanup Without Code

It's possible to quickly clean the data in CSV files with no-code data transformation tools like Osmos. Using "human in the loop" functions, such as column mapping, examples, and formulas your customers can send you clean data without writing any code. No more, "human clicking and editing one cell at a time.” Ultimately, no-code data transformation improves data quality and accuracy no matter the file size, format, or how complex it is.

The Definitive Guide to Data Onboarding

Discover the benefits and challenges of customer data onboarding, how to create an efficient process, and see how the right approach can accelerate business growth.

view the GUIDE

Katrina Kirsch

Marketing