This week I conducted several interviews with data scientists about challenges in their workflow, and one issue stood out universally: the time spent deciphering a colleague's Jupyter Notebook. To help you avoid causing this frustration, here are four best practices to ensure your team isn't pulling their hair out when reviewing your work.
1. Organize with Clear Sections and Headings
Use markdown cells to divide your notebook into logical sections like Introduction, Data Loading, Data Cleaning, Modeling, Results, etc. This helps create a flow that is easy to follow. Within these cells, use descriptive headings (#, ##, ###) for each section and add context about what the code is doing.
Example:

```markdown
# Project Title: Analyzing Sales Data

In this project, we will analyze sales data to identify trends and patterns. The notebook is divided into several sections, each with a specific focus.

## 1. Data Loading

We will begin by importing the necessary libraries and loading the dataset. The data consists of monthly sales figures for different regions.

### Step 1.1: Importing Libraries

We will import common libraries such as pandas and matplotlib for data manipulation and visualization.
```
2. Add Explanatory Comments
Notebooks are great for quickly exploring ideas, but this scratchpad approach can lead to a bad habit of skipping code comments. Provide concise, meaningful comments throughout your code to explain what each section or key line does; they will also help you when you return to the notebook later.
Example:

```python
# Remove rows with missing values to avoid errors in the model
df = df.dropna()
```
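As a fuller sketch, here is how a commented cleaning cell might read on a small DataFrame (the column names and values are made up for illustration):

```python
import pandas as pd

# Hypothetical sales data with one missing value (for illustration only)
df = pd.DataFrame({
    "region": ["North", "South", "East"],
    "sales": [100.0, None, 250.0],
})

# Remove rows with missing values to avoid errors in the model
df = df.dropna()

# Reset the index so downstream code sees contiguous row labels
df = df.reset_index(drop=True)
```

A one-line comment per step is usually enough; the goal is to explain intent, not to narrate syntax.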
3. Limit the Length of Code Cells
One of the biggest offenders is a code cell that does too much; it's easy to forget to break up code while in the flow of writing. Avoid long, cluttered code blocks. Instead, break your code into smaller, focused cells, each handling a single task. This enhances readability and makes debugging easier.
Example:

Instead of one long cell:

```python
# Load data, clean, and preprocess all in one cell
```

Split into smaller cells:

```python
# Load data
df = pd.read_csv('data.csv')
```

```python
# Clean data
df = df.dropna()
```

```python
# Preprocess data
df['column'] = df['column'].apply(lambda x: x.lower())
```
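Put together, the smaller cells might run like this; an in-memory CSV string stands in for `data.csv` (and the column names are invented) so the sketch is self-contained:

```python
import io

import pandas as pd

# Stand-in for 'data.csv' so the example runs on its own
csv_data = io.StringIO("column,value\nHello,1\nWorld,2\n,3")

# Load data
df = pd.read_csv(csv_data)

# Clean data: drop the row whose 'column' entry is missing
df = df.dropna()

# Preprocess data: normalize text to lowercase
df["column"] = df["column"].apply(lambda x: x.lower())
```

Because each step lives in its own cell, a failure in preprocessing can be re-run without repeating the (possibly slow) load step.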
4. Use Descriptive Variable and Function Names
Lazy variable names are another victim of the scratchpad mindset, and they come back to bite when the notebook needs to be revisited or shared. Choose variable and function names that describe their purpose or the data they hold. This practice also reduces the need for excessive comments and helps make the code self-explanatory.
Example:

Instead of:

```python
x = df.groupby('region').mean()
```

Use:

```python
average_sales_by_region = df.groupby('region').mean()
```
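With a descriptive name and a small made-up dataset, the intent is obvious at a glance. One caveat: recent pandas versions raise an error if `.mean()` hits non-numeric columns, so this sketch selects the numeric column explicitly:

```python
import pandas as pd

# Made-up monthly sales figures for two regions
df = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "sales": [100, 200, 50, 150],
})

# The name says exactly what the result holds
average_sales_by_region = df.groupby("region")["sales"].mean()
```

A reviewer who sees `average_sales_by_region` three cells later needs no comment to know what it contains; `x` would force them to scroll back.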
Conclusion
All of these tips are basic, but they are easy to toss aside in the moment. Follow them and your teammates will surely thank you! Once you have optimized your notebook's readability and are checking it into Git for review, GitNotebooks makes reviewing the diffs as easy as reading your notebook.