Data Science Challenges & Impact @ Lazada

I was recently invited to speak about data science at Lazada at the Big Data & Analytics Innovation Summit. There were plenty of sessions on potential use cases and case studies from other companies, but none on the challenges of building and scaling a data science function. So I decided to share some of the challenges we faced during Lazada-Data’s three-year journey, as we grew from 4 to 5 pioneers to a team of about 40.

In a nutshell, the three key challenges faced were:

  • How much business input/overriding to allow?
  • How fast is “too fast”?
  • How to set priorities with the business?

How much business input/overriding to allow?

How do we balance the trade-off between people in the business providing manual input and machine learning systems that make decisions automatically? Business input is usually in the form of rules or manual processes, while machine learning, once in production, is usually a black box algorithm.

Before the data science team came to be, processes were done via rules or manual labour. For example, rules (usually regex) were used to (i) categorize products, (ii) identify fraudulent transactions, or (iii) redirect users to specific pages based on their search terms. However, this approach was not scalable in the long run.
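To make the rule-based approach concrete, here is a minimal sketch of regex rules for product categorization. The patterns, category names, and the `categorize` helper are all hypothetical, made up for illustration; they are not Lazada's actual rules.

```python
import re

# Hypothetical rules as (pattern, category) pairs.
# Rules are checked in order and the first match wins.
CATEGORY_RULES = [
    (re.compile(r"\b(iphone|galaxy|smartphone)\b", re.IGNORECASE), "Mobiles"),
    (re.compile(r"\b(case|cover|screen protector)\b", re.IGNORECASE), "Mobile Accessories"),
    (re.compile(r"\b(t-?shirt|jeans|dress)\b", re.IGNORECASE), "Fashion"),
]


def categorize(title, default="Uncategorized"):
    """Return the category of the first rule whose pattern matches the title."""
    for pattern, category in CATEGORY_RULES:
        if pattern.search(title):
            return category
    return default


print(categorize("Apple iPhone 7 32GB"))
print(categorize("Leather case for iPhone 7"))
print(categorize("Garden hose"))
```

In this sketch the phone case is filed under "Mobiles", because the first rule matches "iPhone" before the accessories rule gets a chance. Fixing that means reordering or adding rules, and every such fix has to be checked against all the other rules.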

With the data science team helping with their “black box” algorithms and machine learning systems, the business had to get used to having those tasks automated. While several stakeholders embraced the automation and the freeing up of manpower, some resisted. Those who resisted wanted to retain control over business processes, usually through manual input and rules, as they believed the automated systems were inferior in some respect. There was also the fear of being made redundant.

Our experience has been that manual input to override algorithms and systems is necessary to some extent, but harmful if overdone (example coming up next). In addition, rules are difficult to maintain! When you have more than 1,000 rules in each domain, who will maintain and QA them daily to check if they still make sense, are applied correctly, and lead to the desired outcomes?
