Data Science and Agile (Frameworks for effectiveness)

This is the second post in a two-part series on Data Science and Agile. In the last post, we discussed the aspects of Agile that work, and don’t work, in the data science process. You can find the previous post here.

A quick recap of what works well

Periodic planning and prioritization: This ensures that sprints and tasks are aligned with organisational needs, allows stakeholders to contribute their perspectives and expertise, and enables quick iterations and feedback.

Clearly defined tasks with timelines: This keeps the data science team productive and on track, and able to deliver on the given timelines; the market moves fast and doesn’t wait.

Retrospectives and demos: Retrospectives help the team to improve with each sprint, and provide feedback and insight into pain points that should be improved on. Demos help the team to learn and get feedback from one another. If stakeholders are involved, demos also provide a view into what the data science team is working on.

What about aspects that don’t work well? And how can we get around them?

Difficulty with estimations: Data science problems tend to be more ill-defined, with a larger search space for solutions. Thus, estimations tend to be trickier, with larger variance in error. One way around this is to set budgets for story points or person-days, and to time-box the experiments.
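As a rough sketch of what time-boxing can look like in practice (the experiment names and budget below are hypothetical, not from any real project), the idea is to run experiments in priority order against a wall-clock budget and report whatever has finished when the budget runs out:

```python
import time

def run_timeboxed_experiments(experiments, budget_seconds):
    """Run experiments in priority order until the time budget is spent.

    `experiments` is a list of (name, fn) pairs; each fn returns a score.
    Returns the results gathered within the budget, best score first.
    """
    deadline = time.monotonic() + budget_seconds
    results = []
    for name, fn in experiments:
        if time.monotonic() >= deadline:
            break  # budget exhausted: stop and report what we have
        results.append((name, fn()))
    return sorted(results, key=lambda r: r[1], reverse=True)

# Hypothetical experiments, ordered cheapest / most promising first.
experiments = [
    ("baseline_logreg", lambda: 0.71),
    ("gradient_boosting", lambda: 0.78),
    ("neural_net", lambda: 0.80),
]
print(run_timeboxed_experiments(experiments, budget_seconds=3600))
```

The key design choice is that the budget, not the experiment backlog, decides when the sprint task ends; incomplete exploration still yields a ranked set of findings to demo.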

Rapidly changing scope and requirements: The rapidly evolving business environment may bring with it constantly changing organizational priorities. To mitigate this, have periodic prioritisations with stakeholders to ensure alignment. This also helps stakeholders better understand the overhead cost of frequent context switching.

Expectations for engineering-like deliverables after each sprint: Project managers and senior executives with an engineering background might expect working software at the end of each sprint. This may require some engagement and education to bring about a mindset change. While a sprint may not always produce working code, its outcomes are still valuable (e.g., experimental results, research findings, learnings, next steps).

Being too disciplined with timelines: A happy problem is being too efficient and aligned with business priorities. Nonetheless, a data science team should be working on innovation. To take a leaf out of Google’s book, a team can build in 20% innovation time. Innovation is essential for 10x improvements.

How to adapt Agile for Data Science

In light of the points discussed above, how can we more effectively apply agile/scrum to data science?

Here, I’ll share some frameworks/processes/ideas that worked well for my teams and me; hopefully, they’ll be useful for you too. Namely, they are:

  • Time-boxed iterations
  • Starting with Planning and Prioritisation, Ending with Demo and Retrospective
  • Writing up projects before starting
  • Updated mindset to include innovation



Data Science and Agile (What works, and what doesn’t)

Since I last posted on moderating a panel on Data Science and Agile, several people have reached out for my views on the topic. It is also widely discussed in the data science community, with questions on how agile can be incorporated into a data science team and how to realize its productivity gains.

Can agile work well with data science? (Hint: If it can’t, this post, and the next, won’t exist.)

In this post, we’ll discuss the strengths and weaknesses of Agile in the context of Data Science. At the risk of irritating agile practitioners, I may refer to Agile and Scrum interchangeably. Nonetheless, do note that Scrum is an agile process framework, and there are others such as Kanban. In the next post, I’ll share some agile adjustments and practices that have proven useful, at least in the teams I’ve led. Stay tuned!

Data science is part software engineering, part research and innovation, and fully about using data to create impact and value. Aspects of data science that are more engineering in nature tend to work well with agile, while those more closely related to research tend not to fit as well.


Data Science and Agile—can or not?

Recently, I was invited to moderate a panel on the topic “Data Science and Agile–can or not?” It’s a Singlish way of asking if Agile can be applied in the domain of data science. The panel was held in conjunction with GovTech’s inaugural STACK conference for developers, programmers, and technologists from the private sector.

[Photo of the panel]

Who was in the panel?

The panel involved the following guests, from right to left in the photo above:

  • Ivan Zimine: Physicist and neuroscientist who works on complex systems while applying open source and open practices.
  • Adam Drake: Formerly Chief Data Officer at Skyscanner and Redmart, with an exemplary record in the design, development, and delivery of cost-effective, high-performance tech teams and systems.
  • Steven Koh: Director of Government Digital Services at GovTech leading the Agile Consulting and Engineering team and evangelising agile development in the government.
  • Eugene Yan (that’s me as moderator): Formerly VP of Data Science at Lazada (acquired by Alibaba), currently Senior Data Scientist at uCare.ai.


Data Science Challenges & Impact @ Lazada

I was recently invited to share at the Big Data & Analytics Innovation Summit on Data Science at Lazada. There were plenty of sessions on potential use cases and case studies from other companies, but none on the challenges of building and scaling a data science function. Thus, I decided to share some of the challenges faced during Lazada-Data’s three-year journey, as we grew from 4–5 pioneers to a team of about 40.

In a nutshell, the three key challenges faced were:

  • How much business input/overriding to allow?
  • How fast is “too fast”?
  • How to set priorities with the business?

How much business input/overriding to allow?

How do we balance the trade-off between the business providing manual input and machine learning systems that make decisions automatically? Business input usually takes the form of rules or manual processes, while machine learning, when in production, is usually a black-box algorithm.

Before the data science team came to be, these processes were handled via rules or manual labour. E.g., rules (usually regex) to (i) categorize products, (ii) flag fraudulent transactions, or (iii) redirect users to specific pages based on their search terms. However, this approach was not scalable in the long run.
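To make the scalability problem concrete, here’s a minimal, purely illustrative sketch of a regex-based product categorizer (the categories and patterns are made up, not Lazada’s actual rules). Every new edge case means another pattern, and the rules soon start colliding:

```python
import re

# Hypothetical rule set: each category is a regex over the product title.
CATEGORY_RULES = [
    ("Mobile Phones", re.compile(r"\b(phone|smartphone|iphone)\b", re.I)),
    ("Laptops", re.compile(r"\b(laptop|notebook|macbook)\b", re.I)),
]

def categorize(title):
    """Return the first category whose pattern matches the title."""
    for category, pattern in CATEGORY_RULES:
        if pattern.search(title):
            return category
    return "Uncategorized"

print(categorize("Apple iPhone 7 32GB"))    # -> "Mobile Phones"
print(categorize("Phone case for iPhone"))  # -> "Mobile Phones": an accessory misfires
```

The misfire in the last line is the crux: fixing it means yet another rule (or an exception to a rule), and with thousands of products the rule set grows without bound.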

With the data science team stepping in with their “black box” algorithms and machine learning systems, the business had to get used to having those tasks automated. While several stakeholders embraced the automation and the freeing up of manpower, some resisted. Those who resisted wanted to retain control over business processes, usually through manual input and rules, because they believed the automated systems were inferior in some aspect. There was also the fear of being made redundant.

Our experience has been that manual input to override algorithms and systems is necessary to some extent, but harmful if overdone (example coming up next). In addition, rules are difficult to maintain! When you have more than 1,000 rules in each domain, who will maintain and QA them daily to check if they still make sense, are applied correctly, and lead to the desired outcomes?


Building a Strong Data Science Team Culture

I know, I know. I’m guilty of not posting over the past four months. Things have been super hectic at Lazada with Project Voyager (i.e., migrating to Alibaba’s tech stack) since last September, and then preparing for our birthday campaign at the end of April. In fact, I’m writing this while on vacation =)

One of my first objectives after becoming Data Science Lead at Lazada—a year ago—was to build a strong team culture. Looking back, based on feedback from the team and leadership, this endeavor was largely a success and contributed to increased team productivity and engagement.

Why culture?

When I first joined the Lazada data team, we had 4–5 data engineers and data scientists combined. A year later, we had grown to 16. After another year, we were around 40. During 1-on-1s, some of the earlier team members raised concerns that our culture was being diluted as we scaled, and that it “didn’t feel the same anymore”. Back then, different team members had different views of what our culture was.

In addition, during interviews, many candidates would ask about our culture—this was key in determining if Lazada Data Science was a good fit for them. Having a culture document available for sharing before interviews allowed candidates to learn more about us beforehand, and was more scalable (than answering questions at interviews).
