Automate Product Data Catalog Ingestion with No-Code ETL Pipelines

Written by JD Prater
April 20, 2022

Keeping product catalog listings accurate, engaging, and up to date simultaneously is one of the biggest challenges across the modern supply chain.

Today’s retailers aren’t just racing to keep up with customer expectations, pricing trends, and competitors. They’re also striving to provide product catalog information across multiple offline and online sales channels, such as ecommerce websites, social media platforms, online marketplaces, comparison shopping engines, and brick-and-mortar stores.

And don’t forget all the work it takes to ingest and populate your product information from different suppliers and distributors.

To be successful in 2022, you need an optimized and streamlined flow of product data at each point across your supply chain in order to populate and distribute your product data catalog.

What is a product data catalog?

A product data catalog works much like a library catalog. You use the library's catalog to discover whether a book is there, which edition it is, where it’s located, and what it's about: everything you need to decide whether you want it and, if you do, how to find it.

If you haven’t been to the library in a while, think about all the information Amazon lists about a book. It includes attributes like book title, author name, publisher's name, number of pages, a brief snapshot of the book, price, dimensions, offers/discounts, reviews, and more.

All those details likely came from a product data catalog.

Simply put, a product data catalog is a detailed list of your inventory. This structured data file contains a list of your products and their corresponding attributes, such as imagery, descriptions, specifications, availability, pricing, and shipping info.

For example, each row in your catalog may represent an individual product (a unique size/color variant), and each column holds one product attribute, such as title, description, image, or price.
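
As a rough illustration of that row-and-column layout, the sketch below parses a tiny catalog in Python. The column names (`sku`, `title`, `color`, `size`, `price`) and the sample products are hypothetical, chosen only for this example; they are not a required schema.

```python
import csv
import io

# Hypothetical catalog: one row per product variant, one column per attribute.
CATALOG_CSV = """sku,title,color,size,price
TSHIRT-RED-M,Classic Tee,red,M,19.99
TSHIRT-RED-L,Classic Tee,red,L,19.99
TSHIRT-BLU-M,Classic Tee,blue,M,21.50
"""

def load_catalog(text):
    """Parse CSV text into a list of dicts, one per product variant."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["price"] = float(row["price"])  # CSV values arrive as strings
    return rows

catalog = load_catalog(CATALOG_CSV)
for product in catalog:
    print(product["sku"], product["title"], product["price"])
```

Each dict is one row of the catalog, and its keys are the columns.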

By using a product data catalog, you can make sure that your customers are always able to see the most up-to-date information about the products they’re interested in. And it’s even better if you can easily access the same up-to-date information at any point in your own eCommerce software.

The challenges of ingesting and distributing product data catalogs

With so much data being presented to buyers, you have a responsibility to ensure your customers can easily find what they are looking for so they can procure the best quality products when and how they desire.

The more products you sell, the harder it is to manage an accurate product catalog, especially when you’re ingesting and aggregating product data from a variety of suppliers, manufacturers, and distributors. It’s a common problem across the entire supply chain.

Here are four of the most common problems.

1. Ingesting clean data from multiple sources

Let's say you’re a technology distributor with hundreds of partners. You need to aggregate hundreds of data sources, but each partner provides their data in a different shape and format.

  • Partner #1 sends you data with 100 fields (call this shape #1). It's uniquely structured so that height, width, and length are in separate columns.
  • Partner #2 sends their data in its own specific way (shape #2).
  • And Partner N sends their data in shape N.

This is all product data you care about, because it feeds your sales channels like your eCommerce site, marketplace, price sheets, etc. However, you need all these sources ingested into your product tables and formatted in your golden shape to match the destination schema.

How do you ingest 100s of partner product catalogs into your golden shape?

You need to figure out a way to format each data source, make sure it fits your schema, and then send it to your product tables or your applications for your customers, sales teams, and resellers to use. They're all dependent on this one golden dataset.
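
To make the idea of a golden shape concrete, here is a minimal Python sketch that maps two partner shapes onto one target record. Everything in it is an assumption made for illustration: the golden fields, the partner column names (`item_id`, `price_cents`, and so on), and the helper `to_golden` are all hypothetical.

```python
# Hypothetical golden shape: every partner record must end up with these fields.
GOLDEN_FIELDS = ("sku", "title", "price", "dimensions")

def from_partner_1(row):
    # Shape #1 (invented): height, width, and length arrive in separate columns.
    return {
        "sku": row["item_id"],
        "title": row["name"].strip(),
        "price": float(row["unit_price"]),
        "dimensions": f'{row["height"]} x {row["width"]} x {row["length"]}',
    }

def from_partner_2(row):
    # Shape #2 (invented): one pre-joined dimensions string, price in cents.
    return {
        "sku": row["SKU"],
        "title": row["Product Title"].strip(),
        "price": int(row["price_cents"]) / 100,
        "dimensions": row["dims"],
    }

# One mapping function per partner shape.
MAPPERS = {"partner_1": from_partner_1, "partner_2": from_partner_2}

def to_golden(partner, row):
    """Convert one partner row into the golden shape, or raise if fields are missing."""
    record = MAPPERS[partner](row)
    missing = [field for field in GOLDEN_FIELDS if field not in record]
    if missing:
        raise ValueError(f"{partner} record is missing {missing}")
    return record

r1 = to_golden("partner_1", {
    "item_id": "A1", "name": " Router ", "unit_price": "49.50",
    "height": "2", "width": "10", "length": "15",
})
r2 = to_golden("partner_2", {
    "SKU": "B7", "Product Title": "Switch", "price_cents": "2500", "dims": "4 x 20 x 12",
})
print(r1)
print(r2)
```

Adding a partner in this sketch means writing one more mapping function, which is exactly the per-partner effort that grows as the partner list grows.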

The problems you’re likely to run into: 

  1. Hundreds of product data variations arrive from different sources.
  2. You need to validate the data to ensure all the details are correct.
  3. You need to clean up the data to match your schema.
  4. You need to load it into your destination system.
  5. Change is the only constant. Every time you add a new partner, how do you get their data into your golden shape? And what happens when companies change fields?
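
The validation step in that list can be sketched in a few lines of Python. The rules below (a non-empty `sku` and `title`, a positive numeric `price`) are hypothetical examples of checks, not a complete rule set, and the function names are made up for this illustration.

```python
def validate_record(record):
    """Return a list of problems found in one golden-shape record."""
    problems = []
    for field in ("sku", "title"):
        if not str(record.get(field) or "").strip():
            problems.append(f"missing {field}")
    price = record.get("price")
    # bool is a subclass of int, so exclude it explicitly.
    if not isinstance(price, (int, float)) or isinstance(price, bool) or price <= 0:
        problems.append("price must be a positive number")
    return problems

def split_valid(records):
    """Separate records that pass validation from those that need attention."""
    valid, rejected = [], []
    for record in records:
        problems = validate_record(record)
        if problems:
            rejected.append((record, problems))
        else:
            valid.append(record)
    return valid, rejected

valid, rejected = split_valid([
    {"sku": "A1", "title": "Router", "price": 49.5},
    {"sku": "", "title": "Switch", "price": -1},
])
print(len(valid), "valid;", len(rejected), "rejected")
for record, problems in rejected:
    print(record, problems)
```

Only the valid records would move on to the load step; the rejected ones carry a list of reasons that can be sent back to the partner.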

2. Accurate product catalog data

A product data catalog is an integral part of sourcing and procurement throughout the supply chain. It provides sourcing teams with the right information needed to make purchasing decisions.

Another challenge when drawing your product catalog from multiple sources is accuracy. Some product data is created in-house, while other information is acquired from third parties like manufacturers and suppliers. Either way, all these data sources need to be validated and adjusted to make sure the product catalog data is always accurate and of high quality.

For ordering and fulfillment, you need access to accurate product data catalogs from every distributor you work with. When a large customer order comes in, you need product data, pricing data, and more in order to fulfill it in a timely manner. If your data catalog is inaccurate or stale, that’s going to impact the business.
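
One simple way to catch stale data is to flag records that have not been refreshed recently. The sketch below assumes each record carries an `updated_at` timestamp and uses a seven-day threshold; both the field name and the threshold are hypothetical choices for this example.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=7)  # hypothetical freshness threshold

def stale_records(records, now=None, max_age=MAX_AGE):
    """Return the records whose last update is older than max_age."""
    now = now or datetime.now(timezone.utc)
    return [r for r in records if now - r["updated_at"] > max_age]

now = datetime(2022, 4, 20, tzinfo=timezone.utc)
records = [
    {"sku": "A1", "updated_at": datetime(2022, 4, 19, tzinfo=timezone.utc)},
    {"sku": "B7", "updated_at": datetime(2022, 4, 1, tzinfo=timezone.utc)},
]
stale = stale_records(records, now=now)
print([r["sku"] for r in stale])
```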

3. Manual processes slowing you down

A manual process quickly becomes unwieldy and prone to human error. This impedes your ability to add new data sources and sales channels at scale.

Without a streamlined process, you have to manually check supplier product data for errors, omissions, or low-quality images, and then contact the supplier to address the issues. That significantly slows down time-to-market and increases the cost of doing business.

4. Distributing your product data catalog

On the output side, each sales channel has its own set of attributes and feed specifications, which makes it difficult to optimize and streamline the flow of product data across different sales channels.

If you have a large online inventory, manually updating product data, prices, and in-stock items is unthinkable. You need a way to ingest your product catalog data without the risk of human errors.
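
As a sketch of what per-channel feed specifications imply, the Python below renames golden-shape fields into the attribute names each channel expects. The channel names and attribute names here are invented for illustration and do not describe any real marketplace's feed format.

```python
# Hypothetical feed specs: channel attribute name -> golden-shape field.
CHANNEL_SPECS = {
    "marketplace": {
        "item_sku": "sku",
        "item_name": "title",
        "list_price": "price",
    },
    "shopping_engine": {
        "id": "sku",
        "title": "title",
        "price": "price",
        "availability": "availability",
    },
}

def build_feed(channel, records):
    """Project golden-shape records onto one channel's attribute names."""
    spec = CHANNEL_SPECS[channel]
    return [
        {attribute: record[field] for attribute, field in spec.items()}
        for record in records
    ]

golden_records = [
    {"sku": "A1", "title": "Router", "price": 49.5, "availability": "in stock"},
]
marketplace_feed = build_feed("marketplace", golden_records)
shopping_feed = build_feed("shopping_engine", golden_records)
print(marketplace_feed)
print(shopping_feed)
```

Because every feed is derived from the same golden records, a price change only has to be made once.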

How to solve this ingestion and distribution problem

Solution #1 = ELT + manual data cleanup

One way we’ve seen companies solve this problem is to use an ELT tool to ingest each partner’s messy data into its own table: partner #1 into table #1, and so on through partner N into table N. Then you’ll need data engineering resources to write SQL queries and/or Python scripts to normalize the data, handle errors, and get it into your golden shape.
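
A minimal sketch of this approach, using Python's built-in `sqlite3` module as a stand-in for a real warehouse: each partner lands in its own staging table, and a hand-written query per partner normalizes it into the golden table. The table names, column names, and sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE partner_1_raw (item_id TEXT, name TEXT, unit_price TEXT);
CREATE TABLE partner_2_raw (sku TEXT, product_title TEXT, price_cents TEXT);
CREATE TABLE products (sku TEXT PRIMARY KEY, title TEXT, price REAL, source TEXT);
""")

# Extract + load: land each partner's data as-is in its own staging table.
conn.executemany("INSERT INTO partner_1_raw VALUES (?, ?, ?)",
                 [("A1", " Router ", "49.50")])
conn.executemany("INSERT INTO partner_2_raw VALUES (?, ?, ?)",
                 [("B7", "Switch", "2500")])

# Transform: one hand-written normalization query per partner.
conn.execute("""
INSERT INTO products
SELECT item_id, TRIM(name), CAST(unit_price AS REAL), 'partner_1'
FROM partner_1_raw
""")
conn.execute("""
INSERT INTO products
SELECT sku, TRIM(product_title), CAST(price_cents AS REAL) / 100, 'partner_2'
FROM partner_2_raw
""")
conn.commit()

golden = conn.execute(
    "SELECT sku, title, price, source FROM products ORDER BY sku"
).fetchall()
print(golden)
```

Note that every new partner adds another staging table and another query to write and maintain.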

You can get by with this method for your top partners, but for the long tail the pain becomes more pronounced. This process doesn’t scale, because 1) it’s very time consuming, 2) it relies heavily on data engineers and ops teams, 3) it’s very expensive to maintain, and 4) long lead times result in stale data. In today’s rapid and competitive market, that means lost sales.

Solution #2 = Automate the importing, normalizing, and unifying of product data feeds with Osmos

Osmos solves your product catalog data ingestion and cleanup problems with automated data pipelines and self-serve data uploaders. Now you can empower your partners, distributors, and suppliers to share data how they want, while you control how it gets ingested.

Simplify product catalog data ingestion with Osmos

With Osmos Pipelines you can automatically bring in partner data with our FTP, API, and email connectors. Or, you can provide your partners with a self-serve data upload experience with Osmos Uploader.

Then our no-code data transformations validate, clean up, and map each partner’s product catalog data to match your destination’s schema.

Once the data is clean, we send it to your API, database, PIM, app, or other destination in one clean, golden shape. All ingested data is validated and ready to be consumed.

You can also automate the distribution of your product data catalog to your sales channels with Osmos Pipelines. By using prebuilt connectors, APIs, or custom integrations, you can optimize the flow of your product data across different sales channels.

Our low-code External Data Platform is built for agility and simplicity, which makes implementation straightforward. With very little engineering time, you can add new partners in minutes with Pipelines and handle the long tail with a self-service Uploader. Plus, monitoring, notifications, and alerts keep your teams informed of any errors, making management a breeze.
