Product Categorization API Part 3: Creating an API

This post is part 3, the final part, of the series on building a product classification API. The API is available for demo here. Parts 1 and 2 are available here and here.

In part 1, we focused on acquiring the data and on cleaning and formatting the categories. Then in part 2, we cleaned and prepared the product titles (and short descriptions) before training our model on the data. In this post, we'll focus on writing a custom class for the API and building an app around it.

The desired end result is a webpage where users can enter a product title and get the top three most appropriate categories for it, like so.

[Screenshot: the categorization demo page, 2016-10-23]
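The post itself does not show the custom class, so here is a minimal sketch of what an API-facing wrapper could look like: it takes a title, scores every category, and returns only the three most likely ones. `TitleCategorizer` and its keyword weights are hypothetical stand-ins for the trained model from part 2, and the softmax is an assumption chosen so the scores read as probabilities.

```python
import math
from collections import Counter


class TitleCategorizer:
    """Hypothetical wrapper that ranks categories for a product title.

    Per-category keyword weights stand in for a trained model; the real
    class behind the API is not shown in this post.
    """

    def __init__(self, category_weights, k=3):
        # category_weights maps category -> {token: weight}
        self.category_weights = category_weights
        self.k = k

    @staticmethod
    def tokenize(title):
        # Simple whitespace tokenizer on the lowercased title
        return title.lower().split()

    def predict(self, title):
        counts = Counter(self.tokenize(title))
        # Linear score per category: sum of weight * token count
        scores = {
            category: sum(weights.get(tok, 0.0) * n for tok, n in counts.items())
            for category, weights in self.category_weights.items()
        }
        # Softmax turns raw scores into probabilities (shifted for numerical stability)
        top = max(scores.values())
        exps = {c: math.exp(s - top) for c, s in scores.items()}
        total = sum(exps.values())
        ranked = sorted(
            ((c, e / total) for c, e in exps.items()),
            key=lambda pair: pair[1],
            reverse=True,
        )
        # Return only the k most likely (category, probability) pairs
        return ranked[: self.k]


# Made-up weights for illustration only
weights = {
    "Electronics > Headphones": {"headphones": 2.0, "wireless": 1.0, "bluetooth": 1.0},
    "Toys > Plush": {"plush": 2.0, "teddy": 2.0, "bear": 1.0},
    "Kitchen > Cookware": {"pan": 2.0, "nonstick": 1.5},
    "Electronics > Speakers": {"speaker": 2.0, "bluetooth": 1.0, "wireless": 0.5},
}
categorizer = TitleCategorizer(weights)
for category, prob in categorizer.predict("Wireless Bluetooth Headphones"):
    print(f"{category}: {prob:.3f}")
```

Keeping the model behind a single `predict(title)` method means a web app only has to call one function per request and render the returned list.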


Image search is now live!

After finishing the image classification API, I wondered if I could go further. How about building a reverse image search engine? You can try it out here: Image Search API

What is reverse image search?

In simple terms, given an image, reverse image search finds other, similar images. This is helpful when searching for similar-looking products.
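One common way to implement "find similar images" is nearest-neighbor search over image feature vectors. The sketch below assumes every image has already been turned into a fixed-length vector (for example, by a CNN) and ranks the indexed images by cosine similarity to the query. The post does not say how the Image Search API does this, so the vectors, file names, and `most_similar` helper are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    # Cosine of the angle between two vectors; 0.0 if either is all zeros
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_vec, index, k=3):
    """Return the k (image_id, similarity) pairs closest to query_vec.

    index maps image_id -> precomputed feature vector (hypothetical).
    """
    scored = [(image_id, cosine_similarity(query_vec, vec)) for image_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "features"; real image embeddings are much longer
index = {
    "plush_bear.jpg": [0.9, 0.1, 0.0],
    "plush_bunny.jpg": [0.8, 0.3, 0.1],
    "sneaker.jpg": [0.0, 0.2, 0.9],
    "handbag.jpg": [0.1, 0.9, 0.3],
}
query = [0.85, 0.2, 0.05]
for image_id, score in most_similar(query, index, k=2):
    print(f"{image_id}: {score:.3f}")
```

This brute-force scan is fine for a small catalog; larger indexes usually need an approximate nearest-neighbor structure.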

How do I use it?

“My son has this plushie he really likes, but I don’t know what the name is… How can I find similar plushies?”

[Image: the plushie in question]


Product Classification API Part 2: Data Preparation

This post is part 2 of the series on building a product classification API. The API is available for demo here: datagene.io/categorize_web. Part 1 is available here; part 3 is available here.

In part 1, we focused on data acquisition and formatting the categories. Here, we'll focus on preparing the product titles (and, optionally, the short descriptions) before training our model.
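To illustrate what "preparing the product titles" can involve, here is a minimal cleaning sketch: it optionally appends the short description, strips HTML tags, lowercases, keeps only letters and digits, and drops a few stopwords. The specific steps and the `STOPWORDS` set are assumptions for illustration; this excerpt does not list the exact steps used in part 2.

```python
import re

# Hypothetical stopword list for illustration only
STOPWORDS = {"a", "an", "and", "for", "of", "the", "with"}


def clean_title(title, description=None):
    # Optionally append the short description to the title
    text = title if description is None else f"{title} {description}"
    text = re.sub(r"<[^>]+>", " ", text)    # drop HTML tags such as <b>
    text = text.lower()
    text = re.sub(r"[^a-z0-9]+", " ", text)  # replace punctuation runs with a space
    # Keep digits like "7" since model numbers matter in product titles
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(tokens)


print(clean_title("Apple iPhone 7 (32GB) - Black, with <b>Free</b> Case"))
print(clean_title("Red Mug", "A ceramic mug for coffee"))
```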


Image classification API is now live!

After I toiled on this for a few months, product image classification is now live on Datagene.io! While the product classification API works with product titles, the image classification API works with product images, though only for fashion.

Some facts about the image classification API:

  • Works best with e-commerce-style fashion images (as that's what it was trained on)
  • Top-1 validation accuracy: 0.76; top-5 validation accuracy: 0.974
  • Returns results in under 300 milliseconds (batch mode on a GPU will be faster)
  • Built on Keras and Theano, and runs on a tiny AWS server without a GPU
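The gap between top-1 (0.76) and top-5 (0.974) accuracy comes from how the metrics are defined: a top-k prediction counts as correct if the true label appears anywhere among the model's k highest-ranked classes. The sketch below computes this from ranked label lists. The labels and predictions are made-up toy data, not the model's validation set, and it uses k=3 only because the toy lists are short.

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of examples whose true label is among the first k predictions.

    ranked_predictions: one list of labels per example, most likely first.
    """
    if not true_labels:
        raise ValueError("need at least one example")
    hits = sum(
        1 for preds, label in zip(ranked_predictions, true_labels) if label in preds[:k]
    )
    return hits / len(true_labels)


# Toy data for illustration only
preds = [
    ["dress", "skirt", "top"],
    ["shoe", "sandal", "boot"],
    ["top", "dress", "skirt"],
    ["bag", "wallet", "belt"],
]
labels = ["dress", "boot", "skirt", "hat"]
print(top_k_accuracy(preds, labels, k=1))
print(top_k_accuracy(preds, labels, k=3))
```

Top-k accuracy can only grow as k increases, which is why the top-5 figure is always at least the top-1 figure.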


Product Classification API Part 1: Data Acquisition and Formatting

To gain practice with building data products end-to-end, I recently developed a product classification API. The API classifies products based on their titles: instead of figuring out which category your product belongs to (out of thousands), you provide the title and the API returns the three most likely categories. The API is available for demo here: datagene.io/categorize_web.

[Screenshot: the categorization demo page, 2016-10-23]

In this series of posts, I'll share the process of building such an API, including:

  • Data acquisition and formatting (this post)
  • Data cleaning and preparation (part 2)
  • API development (part 3)
