Product Classification API Part 2: Data Preparation

This post is part 2 of the series on building a product classification API. The API is available for demo here: datagene.io/categorize_web. Part 1 is available here; Part 3 is available here.

In part 1, we focused on data acquisition and formatting the categories. Here, we’ll focus on preparing the product titles (and, optionally, the short descriptions) before training our model.
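As a rough illustration of what preparing a title can involve, here is a minimal sketch. It assumes preparation means lowercasing, replacing non-alphanumeric characters with spaces, and splitting on whitespace; the function name `prepare_title` and these specific steps are assumptions for illustration, not necessarily the steps used in this series.

```python
import re
from typing import List


def prepare_title(title: str) -> List[str]:
    """Turn a raw product title into a list of lowercase tokens.

    Assumed steps (hypothetical, for illustration only):
    1. lowercase the title
    2. replace every run of non-alphanumeric characters with one space
    3. split on whitespace
    """
    lowered = title.lower()
    cleaned = re.sub(r"[^a-z0-9]+", " ", lowered)
    return cleaned.split()


tokens = prepare_title("Apple iPhone 6S (64GB) - Rose Gold!")
print(tokens)  # ['apple', 'iphone', '6s', '64gb', 'rose', 'gold']
```

Note that this character class keeps only ASCII letters and digits, so titles in other scripts would need a different pattern.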

Sharing about my work in Lazada at Strata + Hadoop 2016

Recently, I had the opportunity to present part of my work at Lazada: ranking products in catalog and search results to improve customer experience and conversion. Conference session details are available here.

Here’s the deck presented. Any feedback on how we can improve our ranking framework, or how I can improve my presentation, is welcome.

Product Classification API Part 1: Data Acquisition and Formatting

To gain practice with building data products end-to-end, I recently developed a product classification API. The API classifies a product based on its title: instead of figuring out which category your product belongs to (out of thousands), you provide the title and the API returns the top 3 most likely categories. The API is available for demo here: datagene.io/categorize_web.
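To make the "top 3 most likely categories" idea concrete, here is a minimal sketch of picking the highest-scoring categories once a model has assigned a score to each one. The function name `top_categories`, the example category names, and the scores are all hypothetical; the actual API's internals are not described here.

```python
import heapq
from typing import Dict, List, Tuple


def top_categories(scores: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k (category, score) pairs with the highest scores, best first."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


# Hypothetical scores for a single product title.
example_scores = {
    "Electronics -> Mobile Phones": 0.71,
    "Electronics -> Phone Accessories": 0.18,
    "Electronics -> Tablets": 0.06,
    "Home -> Kitchen": 0.03,
    "Toys -> Games": 0.02,
}

for category, score in top_categories(example_scores):
    print(f"{category}: {score:.2f}")
```

`heapq.nlargest` avoids sorting the full list, which helps when there are thousands of categories and only a handful are returned.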

In this series of posts, I’ll describe the process of building such an API, including:

  • Data acquisition and formatting (this post)
  • Data cleaning and preparation (part 2)
  • API development (part 3)
