Creating a Dataset of 5 Million SKUs for an E-Commerce Platform Using CosmosDB

Our client operates a small San Diego area family business that specializes in building and selling manufactured products for the personal, residential, commercial, and government sectors. We were introduced to this business in 2018 and it was then that we learned about “The Dream.”  

The CEO had a unique business vision. He wanted an elegant e-commerce website that would let shoppers build and order a customized solution online within a minute. The company sells manufactured products with thousands of available options. The new site would simplify the purchase process by allowing customers to choose their own body size, material, color, and much more.

This business vision followed the growing trend of letting consumers build their own products online. Consider how many young people love Tesla, where they can go online and customize a virtual car in their preferred model and color. Dreams are powerful motivators!


A comparable customized build site had never been done before for this client and there were a number of immediate challenges. Our client’s data consisted of over 5 million stock-keeping units (SKUs) of uncomputerized data.  

The company didn’t have a large technical team and there was no clear roadmap. Nevertheless, the CEO had dreamed of this project for a long time and believed it was possible. He had already tried once to move this project forward, but unfortunately the first software vendor had failed to deliver.  

This is a story of how we stepped in, built a very difficult dataset from scratch, and delivered a stunning modern e-commerce website that successfully brought the CEO’s business vision to life.    


Why This Project?

Every project we accept represents a unique story about the customer, their background, and their aspirations as a company. Sometimes we have to turn down projects because it’s simply not a good match. However, when we met the CEO, his passion and business vision were both palpable and compelling.  

He had a single-minded focus: how to make his industry more customer-friendly, seamless, and efficient. The solution was to offer shoppers a customized e-commerce portal for fully customizing their own products.

This was not going to be an easy project; that much was immediately obvious! In fact, the best projects never are. But several things made this opportunity compelling and convinced us to take it on.

The Challenge 

The biggest and most immediate obstacle we faced was how to create the dataset and place all the product attributes into an online format. The information available to us was a phonebook-sized product catalog with over 5 million SKUs – not to mention the product information in the heads of the sales team.

We had to be honest with ourselves – this project posed some unique challenges. There were high-level business requirements, a large dataset, and a lot of unknowns. But with our Agile experience, we believed we could pull it off. 

The Vision 

The CEO was really the driving force for our decision to work with this company. He was a consummate professional and visionary whose passion to serve his industry inspired us to want to step in and help him achieve this dream.  

The Opportunity 

Our customer was extremely cautious—and with good reason. This was a fairly large custom e-commerce website and a substantial investment for a small company. Their previous experience with a software vendor was less than ideal and delayed bringing the CEO’s dream to life.  

We knew we could nail this project, but we also needed to show we were different. We saw it as a unique opportunity to prove ourselves in a new industry where we had to solve some unique business and technical challenges. There were plenty of unknowns of course, but one thing was certain: we knew our capabilities and our dedication to our client’s success would be unparalleled. 

How We Did It

The biggest challenge in building this customized e-commerce site was the dataset. All the important assets were in print form. The first task was how to build the online database containing all the various products and hundreds of attributes, such as body material, color options, engraving, and much more.  

We had not built this kind of dataset before. Most of the time, we receive datasets for projects that are fairly uniform and tabular with everything assigned to a certain set of attributes. Doing this manually was out of the question; that would require entering 5 million SKUs as line items in an Excel file – not a good use of time! Not to mention, in the long run, this was potentially not a sustainable model. We knew there had to be a better way and we were up to the challenge of finding out how. 

Our technical lead understood how critical this project was to the client and fully invested himself and our team to get the job done right. He learned the business inside and out by studying their product catalog and thoroughly interviewing each of the sales staff. In the process, he gathered a mountain of information to make sure he had all the pieces to orchestrate the project successfully. 

Here are the steps we took and questions we asked:

Show me the data: how do you differentiate between products?


The best resource we had was a printed catalog with 5 million product stock-keeping units (SKUs) – and the subject matter expertise of the sales team who knew that catalog inside out. During his first visit to kick off the project, our technical project lead (TPL) spent time getting to know the business, building relationships with the sales team, and earning their trust.  

Organizing the product families


The next step was to identify the product families. This was accomplished by interviewing various sales team members and using a rigorous system of classification – along with some cleverness.  


Identifying the product attributes

To understand and design the dataset, our technical lead dedicated the majority of his first three weeks to onsite interviews with the client sales team. His goal was to identify common attributes for the product families. One product family, the “Monkey Series,” for example, has the common attribute of the material: steel. From that attribute, the remaining products in that document are all identified as permutations based on other factors such as size, type, and color.
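The idea of separating a family's shared attributes from the ones that vary can be sketched in a few lines of Python. This is only an illustration: the rows, attribute names, and the `split_attributes` helper are hypothetical and are not taken from the client's catalog or from the actual project code.

```python
from collections import defaultdict

# Hypothetical catalog rows for one product family (illustrative values only).
# Assumes every row lists every attribute.
products = [
    {"material": "steel", "size": "small", "color": "black"},
    {"material": "steel", "size": "large", "color": "black"},
    {"material": "steel", "size": "large", "color": "white"},
]


def split_attributes(rows):
    """Separate attributes shared by every row from those that vary."""
    seen = defaultdict(set)
    for row in rows:
        for name, value in row.items():
            seen[name].add(value)
    # One distinct value means the attribute is common to the whole family.
    common = {name: next(iter(vals)) for name, vals in seen.items() if len(vals) == 1}
    # More than one distinct value means the attribute drives the permutations.
    varying = {name: sorted(vals) for name, vals in seen.items() if len(vals) > 1}
    return common, varying


common, varying = split_attributes(products)
print(common)   # {'material': 'steel'}
print(varying)  # {'size': ['large', 'small'], 'color': ['black', 'white']}
```

In this toy family, the material plays the role of the common attribute, and size and color are the factors that generate the permutations.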

Categorizing the product components


This step involved drilling down and understanding the vast number of product features and their differences. Besides the bulky catalog of 5 million SKUs, the best resource was the sales team. If the technical lead described information he needed, a salesperson would go to the catalog with some attributes in his mind and find a match. In other words, much of this exercise was understanding how the product components were modeled in the brains of the sales team. 

Family of products that only differ in 3-4 attributes


As we went along in this project, certain patterns began to emerge. A particular series related to material (steel, for example) might have four products available, where one of those four has a certain feature on and another feature off, such as engraving. By the end of the three weeks, we knew a lot more about these products than we ever thought possible!

Three Weeks and Thirty-Three Documents Later

Three weeks of discussion with the sales team identifying product families, attributes, and categorizing those attributes yielded a set of 33 documents with the following characteristics:  


One document had 2.5 million SKUs. If one attribute is changed, then the permutations of this product also change. In other words, if a product had 10 attributes, all the permutations of the options for those 10 attributes together would add up to the 2.5 million SKUs.

The remaining 32 documents didn't have enough common attributes to be placed into just one document. 
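To make the arithmetic behind a single large document concrete, here is a small Python sketch. The option counts are hypothetical, chosen only so that 10 attributes multiply out to 2.5 million; they are not the client's real numbers.

```python
import math

# Hypothetical option counts for 10 attributes (not the client's real data).
option_counts = [5, 5, 5, 5, 5, 5, 5, 4, 4, 2]

# The number of SKUs is the product of the option counts.
total_skus = math.prod(option_counts)
print(total_skus)  # 2500000

# Adding one option to a single attribute changes the whole permutation count.
option_counts[-1] += 1
print(math.prod(option_counts))  # 3750000
```

This is why a handful of attributes with a few options each can stand in for millions of catalog line items.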



Overall, we identified 53 families of products. Each family has a certain set of attributes and each attribute has a set of options that this attribute can be set to.  
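One way to picture the family, attribute, and option relationship is the following minimal Python sketch. The class names, the "Example Series" family, and its options are all assumptions made for illustration; the article does not describe the actual schema at this level of detail.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Attribute:
    name: str
    options: list[str]


@dataclass
class ProductFamily:
    name: str
    attributes: list[Attribute] = field(default_factory=list)

    def sku_count(self) -> int:
        """Number of distinct configurations the family can produce."""
        count = 1
        for attribute in self.attributes:
            count *= len(attribute.options)
        return count

    def configurations(self):
        """Lazily yield every configuration as an attribute-to-option dict."""
        names = [a.name for a in self.attributes]
        for combo in product(*(a.options for a in self.attributes)):
            yield dict(zip(names, combo))


# Hypothetical family used only for illustration.
family = ProductFamily(
    name="Example Series",
    attributes=[
        Attribute("material", ["steel"]),
        Attribute("size", ["small", "medium", "large"]),
        Attribute("color", ["black", "white"]),
        Attribute("engraving", ["on", "off"]),
    ],
)

print(family.sku_count())  # 12
print(next(family.configurations()))
# {'material': 'steel', 'size': 'small', 'color': 'black', 'engraving': 'on'}
```

Because configurations are generated on demand, the individual SKUs never need to be stored as separate rows.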

Due to the lack of uniformity of the dataset, our team felt that a NoSQL database was the best option for storing all of this data. In order to condense 5 million SKUs down into just 33 NoSQL documents, our team used Azure CosmosDB, a schema-agnostic document database. The team could have used any NoSQL database but chose Cosmos because they were already accustomed to the Azure environment.
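As a rough sketch of what one such document might look like, the Python snippet below builds a single JSON item for a product family. Cosmos DB items carry a string `id`; every other field name and value here is a hypothetical choice for illustration, and no Cosmos DB SDK calls are shown.

```python
import json

# Hypothetical document for one product family. Only "id" is something
# Cosmos DB itself expects; the remaining fields are assumed for illustration.
family_document = {
    "id": "example-series",
    "family": "Example Series",
    "attributes": {
        "material": ["steel"],
        "size": ["small", "medium", "large"],
        "color": ["black", "white"],
        "engraving": ["on", "off"],
    },
}

# The document is plain JSON, so it survives a serialize/parse round trip.
serialized = json.dumps(family_document, indent=2)
restored = json.loads(serialized)

# One compact document stands in for every SKU it can generate.
sku_total = 1
for options in restored["attributes"].values():
    sku_total *= len(options)
print(sku_total)  # 12
```

Since the database is schema-agnostic, each family's document can list a different set of attributes without any table changes.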

In order to computerize the data, our customer technical lead still had to manually type the information into the database. After every modification, the data also had to be validated. Needless to say, there was a lot of trial and error involved throughout this process.

Once we finally had our data schema in place, it was time to find out if it held up against the ultimate test: the sales team. They knew the products better than anyone else. Our team relentlessly racked their brains—and the catalog—for the most obscure and unique products during the validation process. In the end, our data schema prevailed, but not without trial and error. After three weeks and countless iterations, we were happy to say we out-customized the sales team!   
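The kind of check involved in validating a product against the schema can be sketched as follows. This is a simplified illustration in Python, not the team's actual validation process; the `validate_selection` helper and the sample attributes are hypothetical.

```python
def validate_selection(family_attributes, selection):
    """Return a list of problems; an empty list means the selection is valid."""
    problems = []
    for name, options in family_attributes.items():
        if name not in selection:
            problems.append(f"missing attribute: {name}")
        elif selection[name] not in options:
            problems.append(f"invalid option for {name}: {selection[name]}")
    for name in selection:
        if name not in family_attributes:
            problems.append(f"unknown attribute: {name}")
    return problems


# Hypothetical family attributes used only for illustration.
attributes = {
    "material": ["steel"],
    "size": ["small", "medium", "large"],
    "engraving": ["on", "off"],
}

ok = validate_selection(
    attributes, {"material": "steel", "size": "large", "engraving": "on"}
)
bad = validate_selection(attributes, {"material": "brass", "size": "large"})

print(ok)   # []
print(bad)  # ['invalid option for material: brass', 'missing attribute: engraving']
```

An obscure product pulled from the catalog either maps cleanly onto a family's attributes or exposes a gap in the schema that needs another iteration.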

What We Delivered

The biggest lift of this project was the dataset. Once we condensed the 5 million SKUs into 33 documents, the rest of the project was relatively straightforward. We built our UI using Angular 7 and then used .NET Core and CosmosDB on the backend. Check out our case study to learn more.   

The result of all this effort was an elegant, seamless e-commerce website, which made the process of ordering the client’s products easier than ever. In fact, this dataset allowed a consumer to select, and our customer to manufacture, a customized product that had never been bought or sold before!


The Benefits

There were a number of obvious benefits that grew out of our engagement with our client.

We accomplished a formidable task of distilling 5 million SKUs into 33 documents. It stretched us as a team but ultimately made us stronger.

We created a robust but flexible dataset; attributes can easily be turned off and on and it’s easy to enter additional products to the dataset.

The intense customization of this dataset and website helped our client’s sales team to discover products and sell products they didn’t even know they had in their own catalogs.

This was truly a one-team approach. Despite needing to build and document requirements, change requirements, and architect a whole data schema, we were able to stay within the client’s budget and meet our deadlines.
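The flexibility of a dataset in which attributes can be turned off and on might look something like the Python sketch below. The `enabled` flag, the attribute names, and the `active_sku_count` helper are assumptions made for illustration and are not taken from the delivered schema.

```python
import math

# Hypothetical family: each attribute has an "enabled" flag and its options.
family = {
    "material": {"enabled": True, "options": ["steel"]},
    "size": {"enabled": True, "options": ["small", "medium", "large"]},
    "engraving": {"enabled": True, "options": ["on", "off"]},
}


def active_sku_count(attributes):
    """Count configurations using only the attributes that are switched on."""
    return math.prod(
        len(a["options"]) for a in attributes.values() if a["enabled"]
    )


print(active_sku_count(family))  # 6

# Turning an attribute off removes it from the available permutations.
family["engraving"]["enabled"] = False
print(active_sku_count(family))  # 3

# Adding a new attribute extends the family without touching existing entries.
family["color"] = {"enabled": True, "options": ["black", "white"]}
print(active_sku_count(family))  # 6
```

A single flag or a new entry changes what shoppers can build, with no need to add or delete individual SKU rows.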

The Takeaways  

What seemed like an impossible task turned into one of our most rewarding projects ever. We were asked to design a seamless e-commerce website to enable shoppers to build a customized product within a minute. How could we step in and add value? The client was extremely cautious, and with good reason. It was a sizeable investment and their previous experience with a software vendor was less than ideal. Not to mention, all we had to work with was a telephone-book-sized product catalog containing 5 million SKUs.

In the end, we made the right choice to work with this client. Despite the challenges, it was entirely worthwhile to see their dream come to life before their eyes. We could not have asked for a more committed, passionate team to work with. We were so inspired to witness the dedication and commitment they brought to this project.

This is what we are all about! Making our clients happy by building elegant, seamless web and mobile experiences to strengthen their businesses is what gets us out of bed every morning. Knowing that together we’re helping to make the world a better place drives our team to relentlessly pursue excellence each and every day.   


© 2021 Integrant, Inc. All Rights Reserved