Building a Recommendation Engine in Java
Why We Chose Java, Other Technologies that Made the Cut, and Our Starting Methodology
While shopping online, have you ever wondered how Amazon knows your favorite books, makeup, or shoes? Amazon uses a recommendation engine (typically with machine learning or artificial intelligence) to suggest products that it thinks you would be interested in buying.
In other words, “Product recommendation is basically a filtering system that seeks to predict and show the items that a user would like to purchase. It may not be entirely accurate, but if it shows you what you like then it is doing its job right.”
Recommendation engines have become increasingly popular in recent years and are often used to recommend movies, music, news, books, and other products. They’re also utilized to suggest research articles, search queries, and social tags.
Non-E-Commerce Uses for a Recommendation Engine
While many of our customers have a B2C component, there are still many uses for recommendations outside of e-commerce platforms. For example, if your company has a large product catalog, you can use a recommendation engine to help your sales team find additional services or products to suggest.
With the versatility and potential impact of using a recommendation engine internally or externally, what better way to showcase our team’s Java skills than to build our very own version? In the remainder of this article, we’ll describe some of the supporting technologies we used to build our Java-based recommendation engine and the key decisions we’ve made so far.
Recommendation engines come in all shapes, sizes, and languages. Oftentimes, you may see one engine that works great for e-commerce-based solutions but that doesn’t work well when making movie recommendations. When deciding on the ideal engine for our client, we evaluated a variety of well-known technologies and selected the best ones for our needs.
Java vs. Python
Both Java and Python offer many advantages, but they differ in several ways. Python is one of the world’s most popular programming languages, and it’s easy to learn. Plus, Python doesn’t demand much of the ceremony often associated with more verbose object-oriented languages. It’s also the language of choice for building AI and machine learning applications, primarily because of the ease with which it handles mathematical work.
However, Java has many strengths as well. Java is a general-purpose programming language that supports a multitude of integrations and applications. Java is very stable and has lots of resources, libraries, and frameworks, and it has an extremely robust and long-established user community.
In addition, Java is hard to beat for end-to-end value in building complex enterprise applications. Java is a true workhorse throughout the industry, and it has maintained an excellent reputation for many years.
Another important factor influencing our decision, and perhaps the most important for our purposes, is that the client’s project was based in Java. Obviously, we were biased in that direction.
Furthermore, we wanted to demonstrate that Java is a great language for building recommendation engines. As we’re discovering, Java works every bit as well as Python, if not better, as a recommendation engine code base.
Frontend: Angular 7 & Bootstrap
After evaluating several front-end solutions, we decided to build the look and feel of the recommendation engine with Angular 7. Our decision was fairly straightforward. We already had experience with Angular, and we also found that it was relatively easy for Java developers to learn Angular. Unsurprisingly, we selected Bootstrap as our CSS framework.
Data storage: PostgreSQL
After debate amongst our team members, we chose PostgreSQL for our data storage solution. Since complex relationships exist between the client’s product categories and the specifications for each category, we determined that a relational database was well-suited for this project.
A document-based database would require additional processing to search documents to find the ones that included the right product information, and it would then have to update that information accordingly. Since PostgreSQL has a great reputation for reliability, diversity of features, and performance, it was an easy choice. Plus, the database is free.
We predicted that our client’s recommendation engine would need to gather a lot of information about user activities and interests. For that reason, we also needed to select a document-based database that would scale and organize big data quickly and efficiently. Thus, we chose MongoDB, one of the world’s most popular NoSQL databases.
To make the recommendation engine easy to deploy using microservices, we took a modular approach so that we wouldn’t be limited by numerous dependencies or by impacts from other services. Since we were already building our client’s application within a microservices environment, we wanted to build the engine with the same containerized capabilities using Docker.
For the actual “nuts and bolts” of the recommendation engine, we leveraged Kaggle. Owned by Google, Kaggle is the world's largest data science community with thousands of ready-made datasets and algorithms. Kaggle saved us a significant amount of time since we didn’t have to create our own engine from scratch.
TensorFlow is an open-source machine learning library that could enable us to match certain lists with each other more accurately and provide better recommendations. It requires a math background to understand the inputs and processing of the AI solution, and we’re exploring ways to integrate it into our recommendation engine.
Collaborative Filtering vs. Segmenting Users
While building our engine, we had to choose between two approaches to filtering data: collaborative filtering and user segmentation. Collaborative filtering compares a large group of people to find users whose tastes are similar to a given user’s. It then combines the items those similar users like into a ranked list of suggestions. In production systems, this approach is often paired with AI and machine-learning techniques, including “Deep Learning.”
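To make the collaborative filtering idea concrete, here is a minimal sketch of one common variant: user-based filtering with cosine similarity. It is not the code from our engine. The class name CollaborativeFilterSketch, the method names, and the in-memory rating maps are all assumptions made for illustration, and a real system would read ratings from a data store and handle far larger data sets.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of user-based collaborative filtering (illustration only). */
final class CollaborativeFilterSketch {

    private CollaborativeFilterSketch() {}

    /** Cosine similarity between two sparse rating vectors keyed by item id. */
    static double cosineSimilarity(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> entry : a.entrySet()) {
            Double other = b.get(entry.getKey());
            if (other != null) {
                dot += entry.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        // Users with no ratings are treated as similar to nobody.
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    private static double norm(Map<String, Double> vector) {
        double sum = 0.0;
        for (double value : vector.values()) {
            sum += value * value;
        }
        return Math.sqrt(sum);
    }

    /**
     * Ranks items the target user has not rated yet. Each candidate item is
     * scored by the ratings of other users, weighted by how similar those
     * users are to the target user.
     */
    static List<String> recommend(String target,
                                  Map<String, Map<String, Double>> ratings,
                                  int limit) {
        Map<String, Double> mine = ratings.getOrDefault(target, Map.of());
        Map<String, Double> scores = new HashMap<>();

        for (Map.Entry<String, Map<String, Double>> other : ratings.entrySet()) {
            if (other.getKey().equals(target)) {
                continue;
            }
            double similarity = cosineSimilarity(mine, other.getValue());
            if (similarity <= 0.0) {
                continue; // no shared taste, so this user contributes nothing
            }
            for (Map.Entry<String, Double> rating : other.getValue().entrySet()) {
                if (!mine.containsKey(rating.getKey())) {
                    scores.merge(rating.getKey(), similarity * rating.getValue(), Double::sum);
                }
            }
        }

        // Highest score first; ties are broken alphabetically for stable output.
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort(Comparator.comparingDouble((String item) -> scores.get(item))
                .reversed()
                .thenComparing(Comparator.naturalOrder()));
        return List.copyOf(ranked.subList(0, Math.min(limit, ranked.size())));
    }
}
```

Calling CollaborativeFilterSketch.recommend("alice", ratings, 3) with a map of user ids to item ratings returns up to three item ids that similar users rated but the target user has not. Note that this simple variant needs no deep learning at all. Machine-learning models usually come in when the rating data is too large or too sparse for direct comparison.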
We eventually decided to use user segmentation as our filtering approach. User segmentation identifies user interests based on unique characteristics, needs, preferences, and tastes. Each user interest is assigned a particular “weight,” which is then used to calculate a recommendation.
For instance, in the context of movie-watching, a user’s watch action carries more weight than simply adding a movie to their watchlist. Even if someone adds a movie to their watchlist 100 times, watching it just once is enough to remove it from future recommendations.
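The weighting idea can be sketched in a few lines of Java. This is a simplified illustration, not our production code. The SegmentationSketch class, the Action weights (5.0 for a watch, 1.0 for a watchlist add), and the use of genres as the "interest" are all assumptions chosen for the example, and it uses records, so it needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of weight-based user segmentation (illustration only). */
final class SegmentationSketch {

    private SegmentationSketch() {}

    /** Assumed weights: watching a movie counts for more than adding it to a watchlist. */
    enum Action {
        WATCH(5.0),
        WATCHLIST_ADD(1.0);

        final double weight;

        Action(double weight) {
            this.weight = weight;
        }
    }

    record Movie(String title, Set<String> genres) {}

    record Event(Movie movie, Action action) {}

    /** Sums the action weights per genre to build the user's interest profile. */
    static Map<String, Double> interestProfile(List<Event> events) {
        Map<String, Double> profile = new HashMap<>();
        for (Event event : events) {
            for (String genre : event.movie().genres()) {
                profile.merge(genre, event.action().weight, Double::sum);
            }
        }
        return profile;
    }

    /** Scores unwatched catalog movies against the profile and returns the best matches. */
    static List<Movie> recommend(List<Event> events, List<Movie> catalog, int limit) {
        Map<String, Double> profile = interestProfile(events);

        // A single watch removes the movie from future recommendations.
        Set<String> watched = new HashSet<>();
        for (Event event : events) {
            if (event.action() == Action.WATCH) {
                watched.add(event.movie().title());
            }
        }

        Map<Movie, Double> scores = new HashMap<>();
        for (Movie movie : catalog) {
            if (watched.contains(movie.title())) {
                continue;
            }
            double score = 0.0;
            for (String genre : movie.genres()) {
                score += profile.getOrDefault(genre, 0.0);
            }
            if (score > 0.0) {
                scores.put(movie, score);
            }
        }

        // Highest score first; ties are broken by title for stable output.
        List<Movie> ranked = new ArrayList<>(scores.keySet());
        ranked.sort(Comparator.comparingDouble((Movie m) -> scores.get(m))
                .reversed()
                .thenComparing(Movie::title));
        return List.copyOf(ranked.subList(0, Math.min(limit, ranked.size())));
    }
}
```

With these assumed weights, one watched sci-fi movie contributes more to the profile than three watchlist additions in another genre, and the watched movie itself never appears in the results. Tuning the weights per action is the main design lever in this approach.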
For our client’s product catalog, we will segment based on user interest, so we can show the right product recommendations at the right time. While this approach doesn’t require AI integration, we’d like to eventually adopt a more AI-centric approach to train our applications and improve the engine with the latest machine-learning technologies.
Generic or Customized?
Another question that we had to answer was whether to make our engine generic or customized. Our original vision was to build a solution that would be focused specifically on our client’s product catalog.
As things progressed, it became apparent we should be more generic in our approach. By doing so, we’d learn more in a short amount of time by not being locked into a specific industry-focused solution. Therefore, we decided to adopt an already available movie database as our testing ground. This strategy would allow us to extend our learnings across multiple industries as opposed to focusing on just one specific industry.