Agile for Data Science
This post collects ideas and work from others on Agile Data Science, combined with my thoughts on how to apply the essence of agile methodology to data science work. Agile, as applied to software and process, is an extension of the scientific method that emphasizes a structured approach based on hypothesis, observation and learning.
There are three concepts that capture the essence of agile: feature definition, user feedback and iteration. Let’s apply these concepts to data science work:
- Hypothesis formulation is the equivalent of feature-driven development in software
- Well-designed experimentation, with iteration based on new information or feature selection, is similar to test-driven development and implementation
- Retrospectives and peer review are highly valuable to data science work
Why does it matter? In his post on this topic, John Akred nicely captures the value of agile to data science:
By using agile data science methods, we help data teams do fast and directed work, and manage the inherent uncertainty of data science and application development.
In a post titled Agile Data Science, Wacław Kuśnierczyk describes Agile Data Science as a focus on efficiency, building MVPs based on research, and preferring simple models over elaborate ones.
Manifestos are a good way to spark discussion and encourage good practices. For instance, here’s a straightforward four-point software development…