How not to draw an owl
I've been thinking a lot lately about how to learn new skills and technologies effectively.
I was recently studying data testing with Great Expectations. It has solid documentation, a human-readable CLI, automatically generated and narrated notebooks, and so much more. Data teams could hardly expect a better foundation upon which to learn how to test their data.
This experience did, however, remind me that as tools become more composable, onboarding can get harder, because tools don't exist in a vacuum. For example, if learning tool B requires first setting up tools A and C, we quickly end up in a "how to draw an owl" situation (step 1: draw some circles; step 2: draw the rest of the owl), hence the memetic title of this blog post.
Deepnote puts the pieces together
Deepnote is designed to bring tools, teams, and workflows together, and it has become clear to me that, when it comes to learning, we can be much more than a showcase for individual scientific tools. We can help people learn by letting them observe tools in their natural habitat, plugged neatly into the technologies they normally work alongside. In other words, Deepnote embodies context-based learning.
By way of example, let's take my recent foray into learning Great Expectations. For those who don't know, Great Expectations is the leading tool for validating, documenting, and profiling your data. Great Expectations brings the software development discipline of automated testing to data science teams.
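To make "automated testing for data" a bit more concrete, here is a minimal sketch in plain Python. It is not the Great Expectations API. Great Expectations expresses checks as named expectations (for example, `expect_column_values_to_be_between`), whereas the `check_not_null` and `check_between` helpers and the `ValidationResult` type below are hypothetical stand-ins. They only imitate the core idea: declarative checks that each return a pass/fail result with details.

```python
# Library-free sketch of the idea behind "expectations": declarative checks
# on data that each produce a pass/fail result.
# NOTE: NOT the Great Expectations API; all names here are hypothetical.
from dataclasses import dataclass
from typing import Any, Dict, List

Row = Dict[str, Any]


@dataclass
class ValidationResult:
    expectation: str
    success: bool
    details: str


def check_not_null(rows: List[Row], column: str) -> ValidationResult:
    # Count rows where the column is missing or explicitly None.
    nulls = sum(1 for row in rows if row.get(column) is None)
    return ValidationResult(f"{column} has no null values", nulls == 0,
                            f"{nulls} null value(s) found")


def check_between(rows: List[Row], column: str,
                  min_value: float, max_value: float) -> ValidationResult:
    # Nulls are skipped here; check_not_null covers them separately.
    bad = [row[column] for row in rows
           if row.get(column) is not None
           and not (min_value <= row[column] <= max_value)]
    return ValidationResult(f"{column} is between {min_value} and {max_value}",
                            not bad, f"unexpected values: {bad}")


if __name__ == "__main__":
    data = [{"id": 1, "age": 34}, {"id": 2, "age": None}, {"id": 3, "age": 142}]
    for result in (check_not_null(data, "age"), check_between(data, "age", 0, 120)):
        status = "PASS" if result.success else "FAIL"
        print(status, result.expectation, "-", result.details)
```

With this sample data both checks report FAIL: one row has a missing age and another has an age outside the allowed range.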
(Image source is here)
Its docs clearly state what it is and what it isn't. However, to truly grok its value, data scientists will have to deal with "what it isn't" sooner or later, and that is the rub. Great Expectations naturally shines when observed within a larger software ecosystem, as do many other tools, such as dbt, Airflow, and Git. Let's take a look at how Deepnote puts the pieces together for you when learning Great Expectations.
Clean house, clean mind
The getting started tutorial for Great Expectations is very well done: human-readable CLI commands and automatically created and narrated notebooks. There is a fair bit of context switching, though: from notebook to notebook, to the CLI, to the docs, and back. Context switching is cognitively costly (much like multitasking), so my thinking was that newcomers would learn more effectively if it were kept to a minimum. That frees up more brain power for internalizing new concepts and tool-specific parlance.
(Image adapted from here)
An all-in-one-place demonstration of the basics makes learning Great Expectations "cheaper" for the mind, and Deepnote provides exactly this: it spins up a complete, runnable workflow that does not require the terminal, environment setup or installation, or multiple notebooks. Everything related to the learning experience is presented in the same place, so no context switching is needed.
The first across the pipeline
The very first item on the list of what Great Expectations doesn't do is pipeline execution: it is not a pipeline execution framework. That makes perfect sense. The only problem is that it brings us back to the "how to draw an owl" problem, because data testing of any real value will have to end up in a pipeline at some point. While Deepnote is not going to set up Airflow for you, it does provide a GUI for scheduling your notebooks. Scheduling puts your data tests into a production-level pipeline without any additional learning or peripheral setup.
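For a scheduled notebook to act as a real data test, a bad run should fail loudly rather than quietly pass bad data downstream. One way to do that is to end the notebook with a "gate" cell like the sketch below. It assumes the validation results have already been collected into a simple name-to-passed mapping (a hypothetical format for illustration, not Great Expectations' result objects), and it assumes the scheduler reports a run as failed when the notebook raises an uncaught exception.

```python
# Hypothetical "gate" cell for a scheduled notebook run: raise if any data
# check failed, so the run is marked as failed instead of passing silently.
# The name -> passed mapping is an assumed format, not a library API.
from typing import Mapping


class DataValidationError(RuntimeError):
    """Raised when one or more data checks fail."""


def enforce(results: Mapping[str, bool]) -> None:
    # Sort names so the error message is stable across runs.
    failed = sorted(name for name, passed in results.items() if not passed)
    if failed:
        raise DataValidationError(
            f"{len(failed)} check(s) failed: {', '.join(failed)}")


# Example: all checks passed, so this returns None and the run continues.
enforce({"age has no nulls": True, "age is between 0 and 120": True})
```

Raising a dedicated exception type keeps the failure easy to spot in run logs, and the message lists exactly which checks broke.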
Be a good host
One of the best features of Great Expectations is its data docs. Every time you validate your data against your expectations, Great Expectations builds a human-readable documentation site. These HTML docs describe your validation results and more, so they double as a continuously updated data quality report. In the image below, you can see a page from the data docs showing a failed set validation. The data docs are amazing!
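The core idea behind data docs, turning validation results into a page a human can read, fits in a few lines. The toy sketch below uses a hypothetical `CheckResult` type and `render_report` helper; it does not reproduce the real data docs, which Great Expectations generates itself and which are far richer.

```python
# Toy sketch: render validation results as a small HTML report.
# Illustrates the idea behind data docs; names are hypothetical and the
# output does not match the real Great Expectations data docs.
import html
from typing import List, NamedTuple


class CheckResult(NamedTuple):
    expectation: str
    success: bool
    details: str


def render_report(title: str, results: List[CheckResult]) -> str:
    rows = []
    for r in results:
        status = "PASS" if r.success else "FAIL"
        # Escape text so column names or data values can't inject markup.
        rows.append(f"<tr><td>{status}</td><td>{html.escape(r.expectation)}</td>"
                    f"<td>{html.escape(r.details)}</td></tr>")
    passed = sum(r.success for r in results)
    safe_title = html.escape(title)
    return (f"<html><head><title>{safe_title}</title></head><body>"
            f"<h1>{safe_title}</h1>"
            f"<p>{passed} of {len(results)} expectations met</p>"
            "<table><tr><th>Status</th><th>Expectation</th><th>Details</th></tr>"
            + "".join(rows) + "</table></body></html>")


if __name__ == "__main__":
    print(render_report("Nightly validation", [
        CheckResult("age has no null values", True, "0 null value(s) found"),
        CheckResult("age is between 0 and 120", False, "unexpected values: [142]"),
    ]))
```

Regenerating a page like this after every validation run is what turns test results into the kind of continuously updated quality report described above.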
(Image source is here)
Unfortunately, this brings us back to the owl-drawing issue: now we have to host these docs on the web so that our team can access them, and there are likely plenty of data experts who don't want to deal with hosting sites at all. Deepnote is not designed to host your personal website, but we do allow incoming connections from the web to your cloud machine. This means that Great Expectations learners can serve the data docs, and even share them publicly, without having to draw the whole owl, so to speak.
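Serving a folder of static HTML such as the data docs needs nothing beyond Python's standard library. The sketch below is one way to do it under stated assumptions: the docs directory reflects what I understand to be the default local location in a file-based Great Expectations project, and port 8080 is only a placeholder for whichever port your environment exposes to incoming connections. Check both against your own setup.

```python
# Sketch: serve a directory of static HTML (e.g., generated data docs) over HTTP.
import functools
import http.server
from pathlib import Path

# ASSUMPTION: typical local data docs location in a file-based project;
# adjust to wherever your project actually writes its data docs.
DOCS_DIR = Path("great_expectations/uncommitted/data_docs/local_site")
# ASSUMPTION: placeholder port; use the one your environment exposes.
PORT = 8080


def make_server(directory: Path, port: int,
                host: str = "0.0.0.0") -> http.server.ThreadingHTTPServer:
    """Build (but do not start) an HTTP server for a directory of static files."""
    # SimpleHTTPRequestHandler accepts a `directory` keyword (Python 3.7+),
    # so bind it with functools.partial instead of changing the working directory.
    handler = functools.partial(http.server.SimpleHTTPRequestHandler,
                                directory=str(directory))
    # Binding to 0.0.0.0 listens on all interfaces, not just localhost.
    return http.server.ThreadingHTTPServer((host, port), handler)


def serve_docs(directory: Path = DOCS_DIR, port: int = PORT) -> None:
    """Serve the docs until interrupted. This call blocks."""
    with make_server(directory, port) as httpd:
        print(f"Serving {directory} on port {port}; open /index.html")
        httpd.serve_forever()
```

In a notebook you would call `serve_docs()` in its own cell; it blocks that cell until interrupted. If you want to keep working while the docs are up, running `httpd.serve_forever` in a background thread is a reasonable alternative.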
Tools that can be integrated with other tools are proliferating. This is helpful, but it also comes at a cost: learners are often smacked with a list of prerequisites so long and complex that observing tools in the wild, let alone adopting them, becomes too heavy a burden to bear.
When it came to learning Great Expectations, I couldn't help but reflect on the snowballing effect of learning new technologies in general. The role Deepnote is playing with regard to learning is significant — it provides an instantly spun up, "terraformed" world where related tools can be seen truly living together.
See for yourself — click the link to the notebook to observe Great Expectations in the wild and enjoy learning.