Great Expectations (GE) is a tool for data testing and documentation. Onboarding to GE, however, usually comes with a few challenges for newcomers such as switching between multiple notebooks, using the terminal, and hosting documentation. This guide will fast-forward you through any pain points and enable you to bring the software development discipline of automated testing to your data science team.
All default Python environments in Deepnote come with preinstalled Pandas (learn more about all preinstalled packages here). GE can be installed through a simple
!pip install great_expectations . All that is needed to get started with GE, then, is to initialize GE via
!great_expectations --yes --v3-api init. Note that both of these statements could also be run in a terminal within your Deepnote project (without the
!, of course).
Once initialized, you can start using Great Expectations within your notebooks. In the example below, three Expectations (tests) are defined on the fictitious
df_pass Pandas DataFrame. In simple terms,
skillcannot contain null values
runnercolumn must contain unique values
total_timecolumn must have values between 70 and 100
# import pandas and great_expectations import pandas as pd import great_expectations as ge # initialize a Pandas DataFrame df_pass = ge.from_pandas(df_pass) # define Expectations df_pass.expect_column_values_to_not_be_null('skill') df_pass.expect_column_values_to_be_unique('runner') df_pass.expect_column_values_to_be_between('total_time', 70, 100)
Jump right into Deepnote & take a look at this thorough walkthrough of Great Expectations in Deepnote. You can also save yourself some setup work by hitting the
View source button first before clicking on
Duplicate in the top-right corner to start exploring on your own!