
Automate Product Data Ingestion with Low-Code ETL Pipelines

Written by Naresh Venkat, April 20, 2022

The retail landscape changes by the minute. How you handle product catalog data ingestion can be the most important variable in your supply chain management strategy. 

As competitors race to keep their product information updated and accurate, how can you ensure your organization is positioned for success?

In highly competitive markets like eCommerce, you’ll need to provide a streamlined flow of product data across multiple offline and online sales channels. From your website to social media platforms, online marketplaces, comparison shopping engines, and finally to your brick-and-mortar stores, your product data must move effortlessly and without error.

What is Product Catalog Data, and Why is it Crucial to Your Business?

Product catalog data is a structured data file that includes every attribute necessary to purchase and sell the products you carry.  

It should contain a list of your products and their corresponding attributes, such as descriptions, specifications, availability, pricing, shipping info, etc.

For example, each row in your catalog may represent an individual product (a unique size/color variant). Each column would hold one product attribute, such as title, description, image, or price.

With so much data being presented to buyers, you have a responsibility to ensure your customers can easily find what they are looking for so they can procure the best quality products when and how they desire.

The Challenge: Product Catalog Data Ingestion

The more products you sell, the harder it is to manage an accurate product catalog, especially when you’re ingesting and aggregating product data from a variety of suppliers, manufacturers, and distributors. It’s a common problem across the entire supply chain and across industries.

Here are four of the most common product data ingestion problems you might encounter:

1. Gathering Diverse Product Data from Multiple Sources

Let’s say you’re a technology distributor with hundreds of partners. You need to aggregate hundreds of data sources, but each partner provides their data in a different shape and format.

For example, Vendor #1 prefers to send you dimensions in three separate fields (Length, Width, and Height), while Vendor #2 sends it as a single field (L x W x H).

All this information is crucial for your operations, as it fuels your sales channels, such as your eCommerce platform, marketplaces, and channel price sheets. However, to achieve this, you must standardize all this data into your product tables to align with your internal product schema.

How do you go about importing hundreds of vendor and partner product catalogs and aligning them with your destination schema?

The process involves formatting each data source to match your schema, ensuring compatibility, and then forwarding it to your product tables or applications for utilization by your customers, sales teams, and resellers.
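As a rough illustration of that formatting step, here is a minimal Python sketch that maps the two vendor layouts from the dimensions example onto one destination schema. The field names (`SKU`, `Name`, `item_id`, `product_title`, `Dimensions`) and the destination fields are hypothetical, chosen only for this example, and the sketch is not how Osmos or any particular tool implements the mapping.

```python
import re

# Hypothetical destination schema: every product row ends up with these fields.
DESTINATION_FIELDS = ("sku", "title", "length", "width", "height")


def parse_dimensions(text):
    """Split a combined 'L x W x H' string into three floats."""
    parts = re.split(r"\s*[xX]\s*", text.strip())
    if len(parts) != 3:
        raise ValueError(f"expected three dimensions, got {text!r}")
    return tuple(float(part) for part in parts)


def normalize_vendor_1(row):
    """Vendor #1 sends Length, Width, and Height as separate fields."""
    return {
        "sku": row["SKU"],
        "title": row["Name"],
        "length": float(row["Length"]),
        "width": float(row["Width"]),
        "height": float(row["Height"]),
    }


def normalize_vendor_2(row):
    """Vendor #2 sends a single 'L x W x H' field that must be split."""
    length, width, height = parse_dimensions(row["Dimensions"])
    return {
        "sku": row["item_id"],
        "title": row["product_title"],
        "length": length,
        "width": width,
        "height": height,
    }
```

Every new partner needs its own small mapping function like these, which is manageable for a handful of vendors and much harder for hundreds.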

Graphic representing importing partner product data into your destination schema

The challenges you’ll likely run into include: 

  1. Dealing with a multitude of product data variations from diverse sources
  2. Ensuring data validation to guarantee accuracy
  3. Performing data cleanup to align with your schema
  4. Loading the data into your destination system
  5. Handling the inevitable changes to the vendor’s product schema
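The validation challenge in the list above can be sketched in a few lines of Python. This is only an illustrative example: the required fields (`sku`, `title`, `price`) and the rule that prices must be positive are assumptions, not rules taken from any specific product schema.

```python
# Hypothetical required fields for a destination product schema.
REQUIRED_FIELDS = ("sku", "title", "price")


def validate_product(row):
    """Return a list of problems; an empty list means the row passed."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = row.get(field)
        if value is None or not str(value).strip():
            problems.append(f"missing {field}")
    price = row.get("price")
    if price is not None and str(price).strip():
        try:
            if float(price) <= 0:
                problems.append("price must be positive")
        except (TypeError, ValueError):
            problems.append(f"price is not a number: {price!r}")
    return problems


def split_valid_rows(rows):
    """Separate rows that can be loaded from rows that need follow-up."""
    valid, rejected = [], []
    for row in rows:
        problems = validate_product(row)
        if problems:
            rejected.append((row, problems))
        else:
            valid.append(row)
    return valid, rejected
```

Keeping the rejected rows together with their problem list gives you something concrete to send back to the vendor instead of silently dropping data.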

2. Accuracy of the Product Catalog Data

A product data catalog is an integral part of sourcing and procurement throughout the supply chain. It provides sourcing teams with the right information needed to make purchasing decisions.

Another challenge when compiling your product catalog from multiple sources is maintaining accuracy. Some product data originates in-house, while other information is sourced from third parties such as manufacturers and suppliers. Regardless of the source, it’s crucial to validate and fine-tune all this data to ensure that the product catalog consistently maintains high-quality and accurate information.

For ordering and fulfillment, you need access to accurate product data from every vendor you work with. When handling customer orders, timely access to product data, pricing information, and more is vital for efficient fulfillment. Any inaccuracies or staleness in your data catalog can have a significant impact on your business operations.

Graphic showing aggregation of product data
 Osmos is data ingestion simplified

3. Sluggish, Manual Processes

Relying on manual processes can become unwieldy, leading to an increased risk of human errors. This hampers your capacity to expand into new data sources and sales channels.

In the absence of an efficient workflow, you’re forced to manually inspect supplier product data for errors and omissions. Subsequently, you need to communicate with suppliers to rectify these issues, which significantly extends time-to-market and escalates operational costs.

4. Distributing Your Product Data Catalog

On the output side, each sales channel has its own set of attributes and feed specifications, making it difficult to optimize and streamline the flow of product data across various sales channels.
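To make the output-side problem concrete, here is a small Python sketch that reshapes one internal product record into two channel feeds. The channel names and field names in `CHANNEL_SPECS` are invented for illustration; real marketplaces and shopping engines publish their own feed specifications.

```python
# Hypothetical feed specs: channel field name -> internal catalog field name.
CHANNEL_SPECS = {
    "marketplace": {
        "item_sku": "sku",
        "item_name": "title",
        "list_price": "price",
    },
    "shopping_engine": {
        "id": "sku",
        "title": "title",
        "price": "price",
        "availability": "availability",
    },
}


def build_feed(products, channel):
    """Rename internal catalog fields to the names a given channel expects."""
    spec = CHANNEL_SPECS[channel]
    feed = []
    for product in products:
        missing = [source for source in spec.values() if source not in product]
        if missing:
            raise KeyError(f"{product.get('sku')!r} is missing {missing} for {channel}")
        feed.append({target: product[source] for target, source in spec.items()})
    return feed
```

Each new sales channel adds another entry to the spec table, and every attribute a channel requires becomes another field your catalog has to keep populated.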

Distributing your product data catalog
Osmos de-risks data ingestion

If you have a large online inventory, manually updating product data, prices, and in-stock items is unthinkable. You need a way to ingest your product catalog data without the risk of human errors.

How to Solve the Product Data Ingestion Problem

Solution #1 = ELT + Manual Data Cleanup

One way companies address this challenge is to ingest each partner’s messy data into dedicated tables. For example, Partner 1’s data goes to Table 1, and Partner N’s data goes to Table N using an extract, load, and transform (ELT) process. Then data engineering resources are needed to write SQL queries and Python scripts to normalize the data, handle errors, and align it with your destination schema.
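A stripped-down version of this ELT pattern might look like the following Python sketch, which uses an in-memory SQLite database. The table names, column names, and sample rows are hypothetical; the point is only to show raw partner data landing in its own table and a hand-written SQL query normalizing it afterward.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One raw staging table per partner (hypothetical columns).
    CREATE TABLE partner_1_raw (part_no TEXT, descr TEXT, price_usd TEXT);
    -- The destination product table everyone else reads from.
    CREATE TABLE products (sku TEXT PRIMARY KEY, title TEXT, price REAL);
""")

# Extract and load: the partner's rows land untouched, mess included.
conn.executemany(
    "INSERT INTO partner_1_raw VALUES (?, ?, ?)",
    [(" ab-100 ", "usb-c dock", "$129.00"), ("ab-200", "HDMI cable", "$9.50")],
)

# Transform: a per-partner query cleans the data up after loading.
conn.execute("""
    INSERT INTO products (sku, title, price)
    SELECT UPPER(TRIM(part_no)),
           TRIM(descr),
           CAST(REPLACE(price_usd, '$', '') AS REAL)
    FROM partner_1_raw
""")

rows = conn.execute("SELECT sku, title, price FROM products ORDER BY sku").fetchall()
```

Partner N needs its own staging table and its own transform query, so the amount of code to maintain grows with every partner you add.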

Graphic representing data clean up
ELT vs. Osmos: Your product ingest table reigns supreme with Osmos

Quartzy, an Osmos customer and eCommerce distribution platform, used this approach before finding Osmos. As their business rapidly expanded, managing and ingesting third-party data became increasingly challenging.

Chloe Larson, Associate Product Manager, and her eCommerce Ops team at Quartzy even taught themselves Python to create custom scripts for each vendor, which quickly became cumbersome and unscalable.

Although this method may work for some top-tier partners, it becomes increasingly painful for your long-tail partners.

This process doesn’t scale because:

  1. It’s very time-consuming.
  2. It relies heavily on data engineers and operations teams.
  3. It is very expensive to maintain.
  4. Long lead times result in stale data.

Chloe noted that “using custom scripts to ingest product data from dropship vendors means that if a script dies or has an error, you could end up importing out of date pricing data to your product catalog.” In today’s fast-paced and competitive market, that means lost sales.
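One common safeguard against this failure mode is a freshness check that refuses to load a pricing feed older than some threshold. The Python sketch below is an assumption-laden illustration, not Quartzy’s or Osmos’s implementation: the 24-hour limit, the function names, and the idea that each feed carries a generation timestamp are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold: reject pricing feeds older than one day.
MAX_FEED_AGE = timedelta(hours=24)


def is_fresh(generated_at, now=None, max_age=MAX_FEED_AGE):
    """Return True when the feed was generated within the allowed window."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - generated_at <= max_age


def load_prices(feed, generated_at, catalog, now=None):
    """Update catalog prices only when the incoming feed is fresh."""
    if not is_fresh(generated_at, now=now):
        raise RuntimeError("pricing feed is stale; keeping existing catalog prices")
    catalog.update(feed)
    return catalog
```

Failing loudly here means a broken vendor script surfaces as an alert rather than as outdated prices in front of customers.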

Solution #2 = Simplify Workflows with Automated Product Data Ingestion

The best solution automates your product catalog data ingestion and eliminates the problems described above. This solution should empower your operations team to independently ingest data and reduce reliance on technical resources. Any solution enabling this should ensure that your “long-tail” partners, vendors, distributors, and suppliers can share data in their preferred format while you maintain control over the ingestion process.

With streamlined solutions like Osmos, you can bring in partner data through a variety of sources, including FTP, API, or even email. Or, you can provide your partners with a self-serve data importer to simplify the data-sharing process.

The most important feature of a data ingestion tool is its data transformation capabilities. AI-powered data transformation eliminates the need for manual data cleanup. Osmos, in particular, offers data transformation features that seamlessly validate, clean up, and map each partner’s product catalog data to match your destination schema.

Animation representing automated data ingestion
Discover seamless data transformations with Osmos

Another key feature to consider when evaluating an automated solution is the ability to perform lookups and joins to identify things like category IDs against certain product types or consolidate a broad range of attributes into a predefined list of attributes.
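In code terms, a lookup like this amounts to matching each incoming row against a reference table. Here is a minimal Python sketch, assuming a hypothetical `CATEGORY_IDS` table keyed by product type and a hypothetical `COLOR_ALIASES` table that folds vendor color names into a predefined list; none of these names or values come from a real catalog.

```python
# Hypothetical lookup table: product type -> category ID.
CATEGORY_IDS = {"laptop": 101, "monitor": 102, "cable": 205}

# Hypothetical consolidation table: vendor color names -> predefined colors.
COLOR_ALIASES = {"navy": "Blue", "cobalt": "Blue", "charcoal": "Gray", "grey": "Gray"}


def enrich(row):
    """Attach a category ID and collapse the color into the predefined list."""
    product_type = row["product_type"].strip().lower()
    color = row["color"].strip()
    return {
        **row,
        # None marks a product type that has no category mapping yet.
        "category_id": CATEGORY_IDS.get(product_type),
        # Unknown colors pass through in title case for later review.
        "color": COLOR_ALIASES.get(color.lower(), color.title()),
    }
```

For example, a row with product type " Laptop " and color "Navy" comes back with category ID 101 and color "Blue".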

To date, organizing datasets this way has been a task for those proficient in SQL, because Excel constrains you with limited file sizes.

As your business grows, it’s imperative your operations team has a platform where they can load SKUs, quantities, order numbers, and sale prices. They need to be able to perform MSRP lookups from the product catalog and consolidate discount data in a single location. Once the data is clean and validated, it needs to be sent to your API, database, or product information management (PIM) system.
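As an illustration of the MSRP lookup described above, this Python sketch joins order lines to the product catalog by SKU and derives a discount percentage. The field names (`sku`, `msrp`, `sale_price`, `discount_pct`) and the one-decimal rounding are assumptions made for the example.

```python
from decimal import Decimal


def add_discounts(order_lines, catalog):
    """Look up each line's MSRP by SKU and compute the discount percentage."""
    msrp_by_sku = {product["sku"]: Decimal(product["msrp"]) for product in catalog}
    enriched = []
    for line in order_lines:
        msrp = msrp_by_sku.get(line["sku"])
        sale_price = Decimal(line["sale_price"])
        discount_pct = None
        if msrp:  # skips SKUs missing from the catalog and zero MSRPs
            discount_pct = ((msrp - sale_price) / msrp * 100).quantize(Decimal("0.1"))
        enriched.append({**line, "msrp": msrp, "discount_pct": discount_pct})
    return enriched
```

Lines whose SKU is not in the catalog keep a `discount_pct` of `None`, so they can be flagged for review rather than loaded with a made-up value.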

You can automate the distribution of your product catalog data to your sales channels with Osmos. Leveraging pre-built connectors, your operations team can streamline the flow of product data across those channels.

Adding partners should take minutes with minimal engineering effort. Osmos keeps your teams updated with monitoring, notifications, and alerts about any errors, making data management a breeze.

Make Data Ingestion Easy, Fast, and Error-Free with Osmos

We set out to create a world where anyone can work with data regardless of technical proficiency. Osmos’s leading-edge tools empower teams to independently accelerate data ingestion processes while improving data quality, giving businesses the freedom to rapidly activate customers and partners by automating their messiest data cleanup tasks. 

Backed by world-class investors like Lightspeed, Kin Ventures, CRV, Pear, and SV Angel, Osmos helps democratize data ingestion for global enterprises, including Netflix, Rakuten, and Harman, among others.

Should You Build or Buy a Data Importer?

Before you jump headfirst into building your own solution, make sure you consider these eleven often overlooked and underestimated variables.


Naresh Venkat

Co-founder and COO