The increasing popularity of notebooks in data science has led to the introduction of numerous tools designed to make the version control experience of notebooks faster and more intuitive.
We’re currently in an in-between space, with no perfect solution for data scientists to collaborate on notebook changes. Here’s an overview of the options for reviewing notebooks on GitHub.
Option 1: Review Notebooks as JSON
By default, GitHub shows notebook diffs in the underlying JSON file format.
For reviewers familiar with the underlying format of Jupyter notebooks, reviewing changes in this form is entirely feasible. Cell contents are stored under each cell’s source key, as either a string or an array of strings. While the reviewer won’t see syntax highlighting for these cells, inline comments are supported just as they are for any other file type on GitHub.
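To make that structure concrete, here is a minimal Python sketch. It assumes a hand-written, trimmed-down notebook in the nbformat 4 JSON layout (real notebooks carry more metadata) and joins each cell’s source into one string, since the field may be either a string or an array of strings. The helper names `cell_source` and `list_cell_sources` are illustrative, not part of any library.

```python
import json

# A trimmed-down, illustrative notebook in nbformat 4 JSON layout.
NOTEBOOK_JSON = r"""
{
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": "# Title"},
    {"cell_type": "code", "execution_count": 1, "metadata": {},
     "outputs": [], "source": ["import pandas as pd\n", "df = pd.DataFrame()"]}
  ],
  "metadata": {},
  "nbformat": 4,
  "nbformat_minor": 4
}
"""

def cell_source(cell: dict) -> str:
    """Return a cell's source as one string, whether stored as a str or a list of str."""
    source = cell.get("source", "")
    return source if isinstance(source, str) else "".join(source)

def list_cell_sources(notebook_text: str) -> list[tuple[str, str]]:
    """Parse notebook JSON and return (cell_type, source) for each cell, in order."""
    notebook = json.loads(notebook_text)
    return [(cell["cell_type"], cell_source(cell)) for cell in notebook["cells"]]

if __name__ == "__main__":
    for cell_type, source in list_cell_sources(NOTEBOOK_JSON):
        print(f"[{cell_type}]\n{source}\n")
```

Reading the array form this way is also why a one-line edit in the JSON diff often shows up as a change to a single quoted string inside a list.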
The outputs are where things get complicated, particularly for images and charts. Images in Jupyter notebooks are usually stored as base64-encoded strings, so changes to them are impossible to interpret from the raw diff. Some charts, like those generated by Plotly, are stored in a custom format that Jupyter knows how to render. When a notebook contains outputs like these, the changes are much easier to grok once they are rendered.
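In the raw diff, an image output is just a wall of base64 text. As a rough sketch, assuming image outputs are keyed by the `image/png` MIME type under each output’s `data` field, the hypothetical helper below decodes each image and reports its size and a short SHA-256 digest. That tells a reviewer whether an image changed, but still not how it changed. The byte strings passed to the hypothetical `fake_notebook` are placeholders, not real PNG data.

```python
import base64
import hashlib

def image_fingerprints(notebook: dict, mime: str = "image/png") -> list[tuple[int, int, str]]:
    """Return (cell_index, decoded_size_in_bytes, short_sha256) for each image output of one MIME type."""
    fingerprints = []
    for index, cell in enumerate(notebook.get("cells", [])):
        # Markdown cells have no "outputs" key, so default to an empty list.
        for output in cell.get("outputs", []):
            encoded = output.get("data", {}).get(mime)
            if encoded is None:
                continue
            # Payloads may be stored as a single string or a list of string chunks.
            if isinstance(encoded, list):
                encoded = "".join(encoded)
            # With the default validate=False, non-alphabet characters such as newlines are discarded.
            raw = base64.b64decode(encoded)
            fingerprints.append((index, len(raw), hashlib.sha256(raw).hexdigest()[:12]))
    return fingerprints

def fake_notebook(image_bytes: bytes) -> dict:
    """Build a one-cell notebook dict whose output holds base64-encoded placeholder bytes."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"cells": [{
        "cell_type": "code",
        "source": "plot()",
        "outputs": [{"output_type": "display_data", "data": {"image/png": encoded}, "metadata": {}}],
    }]}

if __name__ == "__main__":
    old = fake_notebook(b"placeholder image bytes, version 1")
    new = fake_notebook(b"placeholder image bytes, version 2")
    print("old:", image_fingerprints(old))
    print("new:", image_fingerprints(new))
```

Differing digests only flag that something changed; actually seeing the change still requires rendering both images side by side.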
Option 2: Turn on Rich Notebook Diff Feature Preview
GitHub recently released a feature preview for rendering notebook diffs.
To turn on this feature: Log in to GitHub → Click the profile avatar in the upper right corner of GitHub → Feature preview → Rich Jupyter Notebook Diffs → Enable
While the feature preview does not yet support comments, it fixes a litany of issues with reviewing the raw JSON:
- Syntax highlighting
- Image rendering
- Plotly chart rendering
It also adds nice-to-have features such as collapsible outputs and hiding metadata changes by default.
Option 3: Use GitNotebooks
If the inability to add comments is a showstopper, consider a GitHub App such as GitNotebooks.
It’s minimally intrusive: rather than storing code, it retrieves and diffs the notebook versions on the fly. It supports features such as:
- Syntax highlighting
- Chart rendering
- Image rendering
- Comments on both the left and right sides of the diff
- A flag to indicate whether a comment is outdated
- Diffs and comments that stay in sync with the pull request on GitHub
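GitNotebooks’ internals aren’t described here, so the following is only a hypothetical illustration of what diffing notebook versions on the fly can look like. It compares the cell sources of two notebook versions with Python’s standard `difflib` module, skipping the JSON structure, outputs, and metadata entirely. The helper names and the two tiny notebooks are made up for the example.

```python
import difflib
import json

def cell_sources(notebook_text: str) -> list[str]:
    """Return each cell's source as one string (source may be a str or a list of str)."""
    cells = json.loads(notebook_text)["cells"]
    return [c["source"] if isinstance(c["source"], str) else "".join(c["source"]) for c in cells]

def diff_notebooks(old_text: str, new_text: str) -> list[str]:
    """Unified-diff lines comparing cell sources by position; outputs and metadata are ignored."""
    old_cells, new_cells = cell_sources(old_text), cell_sources(new_text)
    lines: list[str] = []
    for i in range(max(len(old_cells), len(new_cells))):
        # A missing cell on either side is treated as empty, so added or removed cells still show up.
        old_src = old_cells[i] if i < len(old_cells) else ""
        new_src = new_cells[i] if i < len(new_cells) else ""
        lines.extend(difflib.unified_diff(
            old_src.splitlines(), new_src.splitlines(),
            fromfile=f"old/cell{i}", tofile=f"new/cell{i}", lineterm="",
        ))
    return lines

# Two tiny illustrative notebook versions (not real project files).
OLD = json.dumps({"cells": [{"cell_type": "code", "source": ["x = 1\n", "print(x)"]}]})
NEW = json.dumps({"cells": [{"cell_type": "code", "source": ["x = 2\n", "print(x)"]}]})

if __name__ == "__main__":
    print("\n".join(diff_notebooks(OLD, NEW)))
```

Cells are paired by position, so inserting or deleting a cell shifts every later comparison; a real review tool would need smarter cell matching than this sketch.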