Please find the project source code: project git repository
This project presents a recommendation system for Amazon apparel products that uses both text-based and image-based similarity measures to suggest visually and contextually similar clothing items. Built in Python with NLP and deep learning libraries, the system supports product discovery through multi-modal data analysis, with the aim of improving user engagement and product visibility on e-commerce platforms.
The development pipeline begins with parsing and filtering a large JSON dataset of over 180,000 fashion items, focusing on seven key attributes including product title, brand, color, and image URL. Data is cleaned by removing incomplete entries, and a deduplication process is applied in two phases to eliminate products with redundant titles differing only in size or color. Text preprocessing follows, which includes tokenization, stop-word removal, and vectorization using Bag-of-Words (BoW), TF-IDF, and IDF models. Word2Vec (average and IDF-weighted) embeddings are also used to capture deeper semantic similarity between product titles. For visual similarity, high-dimensional CNN-based features are extracted from product images using a pre-trained VGG-16 model.
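The text side of this pipeline can be illustrated with a minimal, standard-library sketch. It is not the project's actual code: the stop-word list, the `max_diff=2` duplicate threshold, the sample titles, and all function names are assumptions chosen for illustration. The TF-IDF weighting uses the plain `tf * log(N / df)` form.

```python
import math
import re
from collections import Counter

# Hypothetical mini stop-word list; a real run would use a fuller one.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "with"}

def tokenize(title):
    """Lowercase, keep alphanumeric tokens, drop stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", title.lower()) if w not in STOP_WORDS]

def differing_words(a, b):
    """Count positions where two token lists differ, padding the shorter one."""
    longer = max(len(a), len(b))
    a = a + [None] * (longer - len(a))
    b = b + [None] * (longer - len(b))
    return sum(x != y for x, y in zip(a, b))

def dedupe(titles, max_diff=2):
    """Keep a title only if it differs from every kept title by more than max_diff words.

    The threshold is an assumption; it targets titles that differ only in
    size or color words.
    """
    kept = []
    for title in titles:
        tokens = tokenize(title)
        if all(differing_words(tokens, tokenize(k)) > max_diff for k in kept):
            kept.append(title)
    return kept

def tfidf_vectors(titles):
    """Return one {word: tf * idf} dict per title, with idf = log(N / df)."""
    docs = [Counter(tokenize(t)) for t in titles]
    n = len(docs)
    df = Counter(word for doc in docs for word in doc)
    return [
        {w: (count / sum(doc.values())) * math.log(n / df[w]) for w, count in doc.items()}
        for doc in docs
    ]

# Made-up sample titles, not taken from the dataset.
titles = [
    "Womens Striped Cotton Tee Blue Small",
    "Womens Striped Cotton Tee Red Large",
    "Mens Slim Fit Denim Jacket",
]
unique_titles = dedupe(titles)          # the second title is dropped as a near-duplicate
vectors = tfidf_vectors(unique_titles)  # sparse TF-IDF vectors for the remaining titles
```

The pairwise comparison in `dedupe` is quadratic in the number of kept titles, so a dataset of this size would likely need sorting or blocking first. The sketch only shows the matching rule.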
The VGG-16 image features are compared using cosine similarity to recommend visually similar products. The system also integrates brand and color metadata into a hybrid similarity score to improve recommendation quality.
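As a rough sketch of how such a hybrid score might be assembled, the example below combines cosine similarity over feature vectors with exact-match indicators for brand and color. The weights (0.6, 0.2, 0.2), the record layout, and the function names are illustrative assumptions, not values from the project. The toy three-dimensional vectors stand in for real CNN features.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def hybrid_score(query, candidate, w_image=0.6, w_brand=0.2, w_color=0.2):
    """Weighted sum of image similarity and brand/color match indicators.

    The weights are assumed for illustration only.
    """
    image_sim = cosine_similarity(query["features"], candidate["features"])
    brand_match = 1.0 if query["brand"] == candidate["brand"] else 0.0
    color_match = 1.0 if query["color"] == candidate["color"] else 0.0
    return w_image * image_sim + w_brand * brand_match + w_color * color_match

def recommend(query, catalog, top_k=2):
    """Return the ids of the top_k highest-scoring items, excluding the query itself."""
    scored = [
        (hybrid_score(query, item), item["id"])
        for item in catalog
        if item["id"] != query["id"]
    ]
    scored.sort(reverse=True)
    return [item_id for _, item_id in scored[:top_k]]

# Made-up catalog records; "features" would hold CNN activations in practice.
catalog = [
    {"id": "A", "brand": "Acme", "color": "blue", "features": [1.0, 0.0, 0.0]},
    {"id": "B", "brand": "Acme", "color": "blue", "features": [1.0, 0.0, 0.0]},
    {"id": "C", "brand": "Other", "color": "red", "features": [0.0, 1.0, 0.0]},
    {"id": "D", "brand": "Other", "color": "blue", "features": [1.0, 1.0, 0.0]},
]
recommendations = recommend(catalog[0], catalog)
```

Here item B matches the query on image features, brand, and color, so it ranks first. Item D ranks second on partial image similarity plus a color match. Tuning the weights shifts the balance between visual likeness and metadata agreement.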
The final recommendation engine lets users retrieve similar products using any of several algorithms (BoW, TF-IDF, Word2Vec, or CNN-based visual embeddings), with results visualized via heatmaps and image grids. The solution provides a foundation for real-time product recommendation systems and an intuitive interface for exploring related fashion items. By combining visual and textual relevance, it is intended to improve the shopping experience and is applicable to e-commerce personalization, inventory management, and marketing optimization.