What does a Data Scientist really do?

As a data scientist, I sometimes get asked questions about data science. This could be at work, at the meetups I organise and attend, or via my blog or LinkedIn. Through these interactions, I realised there is significant misunderstanding about data science, both around the skills needed to practice it and around what data scientists actually do.

Perception of what is needed and done

Many people believe that deep technical and programming abilities, olympiad-level math skills, and a PhD are the minimum requirements, and that having such skills and educational qualifications will guarantee success in the field. This is slightly unrealistic and misleading, and does not help to mitigate the scarcity of data science talent, such as that described here and here.

Similarly, based on my interactions with people, as well as comments online, many perceive that a data scientist’s main job is machine learning, or researching the latest neural network architectures (essentially, Kaggle as a full-time job). However, machine learning is just a slice of what data scientists actually do; personally, I find it constitutes less than 20% of my day-to-day work.

How do these perceptions come about?

One hypothesis is the statistical fallacy of availability. Most people probably know about data scientists from what they’ve seen or heard in news and articles, or perhaps from a course or two on Coursera.

What’s likely to be the background of these data scientists? If it’s from this article on the recent Turing Award for contributions in AI, you’ll find three very distinguished gentlemen who have amazing publishing records and introduced the world to neural networks, backpropagation, CNNs, and RNNs. Or perhaps you read the recent article about how neural networks and reinforcement learning achieved human expert-level performance, and found that the team was composed largely of PhDs. If it’s from a course, the person is likely to have a PhD and to go through deep mathematical proofs of machine learning techniques. Thus, based on what comes to mind most easily, many people tend to have a skewed perception of what background a data scientist should have.

The same goes for what data scientists actually do. Most of the sexy headlines on data science involve using machine learning to solve (currently) unsolvable problems, ranging from research-based (computer games) to very much applied (self-driving cars). In addition, given that the majority of data science courses are on machine learning, it’s no wonder that the statistical fallacy of availability skews people towards thinking that machine learning is the be-all and end-all.


Thoughts on CS6460: Education Technology

I recently completed the OMSCS course on Education Technology and found it to be one of the most innovative courses I’ve taken. There is no pre-defined curriculum and syllabus, though there are many videos and materials available. Learners have the autonomy and freedom to view the course videos and materials in any order and at their own pace. The course is focused around a big project, and learners pick up the necessary knowledge and skills as they progress on the project.

Here are my thoughts on the course for those who are looking to enrol as well.

Why take the course?

One question I’ve asked myself (and close friends): “What do you think humanity needs most?” For Bill Gates, it was personal computing. For Elon Musk, it’s becoming a multi-planet species and clean vehicles and energy. Personally, my goals are not as lofty: I believe that humanity needs healthcare and education most. This belief, and the availability of electives in these areas, was one of the key reasons I enrolled in OMSCS. Thus, I was elated to get a spot in the immensely popular EdTech course.

There are many rave reviews on how David Joyner is an excellent professor. His courses (i.e., human computer interaction, knowledge-based AI, and education technology) have great reviews and are notable for their rigour and educational value. He is also a strong proponent of scaling education (which I believe is one of the key approaches to improving education). Here’s his recent paper on scaling Computer Science education.

Being keenly interested in how I could use technology (and perhaps data science) to improve education and learning outcomes, I enrolled in the course in Summer 2018.

What’s the course like?

If you’re looking for a traditional postgraduate-level course, you won’t find it here. There is a surprising lack of obvious structure and step-by-step instructions. Some learners found this disorienting (initially), with some getting lost along the way. Others found the course structure (wait, didn’t you say there’s no structure?) refreshing, as it allowed them to direct their focus and effort more effectively and learn more.

There’s no structure? What do you mean?

For a start, there are no weekly lectures. There is also no weekly reading list. Right from week 1, you’re immersed in the deep end. Your first assignment requires you to pick a few projects of interest, out of hundreds, and discuss them in an essay. There is a rich repository of curated videos, articles, and papers available from the first week, and you can view all of them in week 1, or none by the end of the course. This can feel like too much freedom for some learners, and slightly overwhelming.

Building a Strong Data Science Team Culture

I know, I know. I’m guilty of not posting over the past four months. Things have been super hectic at Lazada with Project Voyager (i.e., migrating to Alibaba’s tech stack) since last September, and then with preparing for our birthday campaign at the end of April. In fact, I’m writing this while on vacation =)

One of my first objectives after becoming Data Science Lead at Lazada—a year ago—was to build a strong team culture. Looking back, based on feedback from the team and leadership, this endeavor was largely a success and contributed to increased team productivity and engagement.

Why culture?

When I first joined the Lazada data team, we had 4-5 data engineers and data scientists combined. A year later, we grew to 16. After another year, we were 40-ish. During 1-on-1s with the team, some of the earlier team members raised concerns that our culture was being diluted as we scaled, and it “didn’t feel the same anymore”. Back then, different team members had different views of what our culture was.

In addition, during interviews, many candidates would ask about our culture—this was key in determining if Lazada Data Science was a good fit for them. Having a culture document available for sharing before interviews allowed candidates to learn more about us beforehand, and was more scalable (than answering questions at interviews).

My first 100 days as Data Science Lead

I recently passed my 100-day mark as Lazada’s Data Science Lead. Through this period, it wasn’t always clear what to do, or how to do it, in my new leadership role. I had numerous questions about how to transition from an individual contributor to leader, how to lead former peers, etc.

Several mentors, books, and articles provided guidance on how to transition successfully. Looking back on these first 100 days, here are some things I did that were helpful.

Shift in mindset

As an individual contributor, I had the opportunity, and was expected, to know my project inside out. I was deeply involved in the technicalities, writing code and measuring impact, and gained immense technical satisfaction from this depth. In contrast, as a leader, I was expected to know all of the team’s projects in significant, though not necessarily complete, detail, and to get involved when necessary. I had to learn how to switch contexts quickly, and be comfortable with not knowing all the nitty-gritty details.

In addition, my new role meant my peers now reported to me. I was aware of the burdens of leadership, such as no longer being able to share information that I previously could as a peer. Mentors also warned that former peers might be less chummy with me, due to the new reporting relationship. Thus, I had to change my thinking on my relationships with the team: we might not remain as close as before, and this is natural given the new leadership role. One mentor suggested that I connect more with peers at my new level to get advice and build more relationships.

Sharing at Singapore Management University on Data Analytics

The Singapore Management University Business Intelligence and Analytics Club approached me with a request to share about data analytics with undergraduates. These undergraduates, who were mostly from non-technical backgrounds, had the following questions in mind:

  • What is data analytics?
  • Why data analytics?
  • How to pick up data analytics? (covered in a previous blog post)
  • How did I enter the data analytics field?

Here’s what I shared with them. Any feedback and suggestions for improvement are welcome =)

My Sharing with Tech in Asia

Recently, Christopher, Managing Partner at Tri5 Ventures, reached out for an interview about “The Life of a Data Scientist”. The intent is to share knowledge and insight with people aspiring to enter the field, or those currently practicing data science.

The article was published a week ago on Tech in Asia and can be found here: “4 Singapore-based data scientists share how data has been impacting lives”. It covers data science professionals across multiple backgrounds, including researchers, entrepreneurs, and startups.

A few people have asked if I could build on what was shared in the article, so I’m reproducing my complete responses to Christopher here.

How to get started in Data Science

I have been asked more than a handful of times how to get into the field of data science. This includes at SMU’s Master of IT in Business classes, at regular DataScience SG meetups, and in requests via email and LinkedIn. Though the conversations that follow differ depending on the person’s background, a significant portion is applicable to most people.

I’m no data science rockstar. Neither am I an instructor who teaches how to get into data science. Nonetheless, here’s some previously shared advice on “How to get started in Data Science”, documented here so it can be shared in a more scalable manner.

What this post will (not) cover

This post will focus on the tools and skills (I find) essential in data science, and how to practice them. Every organization has different needs, and what’s listed is largely based on Lazada’s data science stack and process. Nonetheless, they should be applicable to most data science positions. These should be viewed as minimum thresholds, and they do not necessarily predict success in data science. They are:

  • Tools: SQL, Python and/or R, Spark
  • Skills: Probability and Statistics, Machine Learning, Communication
  • Practice: Projects, Volunteering, Speaking and Writing

This post will not cover character traits, personalities, habits, etc. While there are some traits I find strongly correlated with success in data science (e.g., curiosity, humility, grit), we will not discuss them here. In some sense, these traits lead to success in all roles/life—not just data science.
