How to Review Jupyter Notebooks in GitHub

The growing popularity of notebooks in data science has produced numerous tools designed to make version control for notebooks faster and more intuitive.

We're currently in an in-between space, with no perfect solution for data scientists to collaborate on notebook changes. Here’s an overview of the options for reviewing notebooks in GitHub.

Option 1: Review Notebooks as JSON

By default, GitHub will show notebook diffs using the underlying JSON file format.

JSON review of a Jupyter notebook

For those familiar with the underlying format of Jupyter notebooks, reviewing changes as raw JSON is entirely feasible. Cell contents are stored under the source key of each cell, as either a string or an array of strings. While the reviewer won’t see syntax highlighting for these cells, inline comments are supported just as for any other file format in GitHub.
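To make the source layout concrete, here is a minimal sketch of how cell contents can be read out of the notebook JSON. The notebook text below is a hypothetical, hand-written example in the nbformat 4 layout, and the helper names cell_source and notebook_sources are made up for illustration; they are not part of any library.

```python
import json

# Hypothetical, minimal notebook in the nbformat 4 JSON layout.
# One cell stores its source as a string, the other as a list of strings.
NOTEBOOK_JSON = """
{
  "nbformat": 4,
  "nbformat_minor": 5,
  "metadata": {},
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": "# Title"},
    {"cell_type": "code", "metadata": {}, "execution_count": 1, "outputs": [],
     "source": ["import math\\n", "print(math.pi)"]}
  ]
}
"""


def cell_source(cell):
    """Return a cell's source as one string, whether stored as a string or a list."""
    source = cell.get("source", "")
    return source if isinstance(source, str) else "".join(source)


def notebook_sources(notebook_text):
    """Parse notebook JSON and return (cell_type, source) pairs in cell order."""
    notebook = json.loads(notebook_text)
    return [(cell["cell_type"], cell_source(cell)) for cell in notebook.get("cells", [])]


for cell_type, source in notebook_sources(NOTEBOOK_JSON):
    print(f"[{cell_type}]")
    print(source)
```

In a JSON diff, a one-line change to a code cell typically appears as a change to a single string inside that cell's source array, which is why this view stays readable for code-only edits.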

The outputs are where things get complicated, especially images and charts. Most images in Jupyter notebooks are stored as base64-encoded strings, so a change to an image shows up as a change to an opaque blob of text that is practically impossible to interpret. Some charts, like those generated by Plotly, are stored in a custom format that Jupyter knows how to render. If a notebook contains outputs of this type, it is much easier to grok the changes when they are rendered.
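A reviewer stuck with the JSON view can at least tell whether an embedded image changed, even if not how. The sketch below assumes the nbformat 4 layout, where an output's data dictionary maps MIME types to payloads, and fingerprints each base64 image by decoded size and hash. The helper name image_fingerprints and the sample cells are hypothetical, and the payloads are placeholder bytes rather than real PNG data.

```python
import base64
import hashlib


def image_fingerprints(cell):
    """Return (mime_type, byte_count, short_sha256) for each base64 image output."""
    results = []
    for output in cell.get("outputs", []):
        # Stream and error outputs have no "data" key, so default to empty.
        for mime_type, payload in output.get("data", {}).items():
            # SVG is stored as plain text rather than base64, so skip it here.
            if not mime_type.startswith("image/") or mime_type == "image/svg+xml":
                continue
            if isinstance(payload, list):
                payload = "".join(payload)
            raw = base64.b64decode(payload)
            digest = hashlib.sha256(raw).hexdigest()[:12]
            results.append((mime_type, len(raw), digest))
    return results


def make_cell(image_bytes):
    """Build a hypothetical code cell with a single PNG display output."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "cell_type": "code",
        "source": "plot()",
        "outputs": [
            {"output_type": "display_data", "data": {"image/png": encoded}, "metadata": {}}
        ],
    }


# Placeholder bytes standing in for two versions of a rendered chart.
old_cell = make_cell(b"placeholder image bytes, version 1")
new_cell = make_cell(b"placeholder image bytes, version 2!")

changed = image_fingerprints(old_cell) != image_fingerprints(new_cell)
print("image output changed:", changed)
```

This only answers whether the image differs between two versions; seeing what actually changed still requires rendering both.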

Option 2: Turn on Rich Notebook Diff Feature Preview

For rendering notebook diffs, GitHub has recently released a feature preview:

GitHub notebook review feature preview

To turn on this feature: Log in to GitHub → Click the profile avatar in the upper right corner of GitHub → Feature preview → Rich Jupyter Notebook Diffs → Enable

While the feature preview does not yet support comments, it fixes a litany of issues with the JSON file format review:

It also introduces nice-to-have features like collapsible outputs and metadata changes that are hidden by default.

Option 3: Use GitNotebooks

If an inability to add comments is a showstopper, consider using a GitHub App like GitNotebooks:

GitNotebooks screenshot

It’s minimally intrusive: rather than storing your code, it retrieves and diffs the notebook versions on the fly. It supports features like: