Let Deep Learning VMs and Jupyter notebooks burn the midnight oil for you: robust and automated training with Papermill


In the past several years, Jupyter notebooks have become a convenient way of experimenting with machine learning datasets and models, as well as sharing training processes with colleagues and collaborators. Oftentimes your notebook will take a long time to complete its execution, and an extended training session can leave the instance running after the work is done, causing you to incur Compute Engine charges for resources you are no longer using.

This post will explain how to execute a Jupyter Notebook in a simple and cost-efficient way.

We’ll explain how to deploy a TensorFlow Deep Learning VM image and use it to execute a Jupyter notebook with the Nteract Papermill open source project. Once the notebook has finished executing, the Compute Engine instance that hosts your Deep Learning VM image will automatically terminate.

The components of our system:

First, Jupyter Notebooks

The Jupyter Notebook is an open-source, web-based interactive environment for creating and sharing IPython notebook (.ipynb) documents that contain live code, equations, visualizations, and narrative text. The platform supports data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.

Next, Deep Learning Virtual Machine (VM) images

The Deep Learning Virtual Machine images are a set of Debian 9-based Compute Engine virtual machine disk images that are optimized for data science and machine learning tasks. All images include common ML frameworks and tools installed from first boot, and can be used out of the box on instances with GPUs to accelerate your data processing tasks. You can launch Compute Engine instances pre-installed with popular ML frameworks like TensorFlow, PyTorch, or scikit-learn, and even add Cloud TPU and GPU support with a single click.
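For example, a single gcloud command along these lines creates a TensorFlow Deep Learning VM with a GPU attached. This is a sketch: the instance name, zone, image family, and accelerator type are illustrative, and the currently available image families are listed in the Deep Learning VM documentation.

    # Create a TensorFlow Deep Learning VM with one GPU; the image family
    # shown here may differ from the latest one available in your project.
    gcloud compute instances create my-dlvm \
      --zone=us-central1-b \
      --image-family=tf-latest-cu100 \
      --image-project=deeplearning-platform-release \
      --accelerator="type=nvidia-tesla-v100,count=1" \
      --maintenance-policy=TERMINATE \
      --metadata="install-nvidia-driver=True"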

And now, Papermill

Papermill is a library for parametrizing, executing, and analyzing Jupyter Notebooks. It lets you spawn multiple notebooks with different parameter sets and execute them concurrently. Papermill can also help collect and summarize metrics from a collection of notebooks.
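As a quick illustration, the papermill command-line interface takes an input notebook, an output notebook, and any number of -p flags, each of which overrides a variable defined in the notebook cell tagged "parameters". The notebook names and parameter values below are hypothetical.

    # Execute the same notebook twice with different learning rates; each
    # run writes its own fully executed output notebook.
    papermill train.ipynb output-lr-001.ipynb -p learning_rate 0.01
    papermill train.ipynb output-lr-010.ipynb -p learning_rate 0.10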

Papermill also permits you to read or write data from many different locations. Thus, you can store your output notebook on a different storage system that provides higher durability and easy access in order to establish a reliable pipeline. Papermill recently added support for Google Cloud Storage buckets, and in this post we will show you how to put this new functionality to use.
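In practice, that means the input and output paths handed to papermill can simply be gs:// URLs; the bucket names below are placeholders.

    # Read the input notebook from Cloud Storage and write the executed
    # copy, with all cell outputs, to another bucket.
    papermill gs://my-input-bucket/train.ipynb gs://my-output-bucket/train-output.ipynb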

Installation
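Papermill is distributed on PyPI. A typical installation, including the optional extra that pulls in the Google Cloud Storage dependencies, looks like this:

    # Install Papermill together with its Cloud Storage support.
    pip3 install "papermill[gcs]"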

Submit a Jupyter notebook for execution

The following command starts execution of a Jupyter notebook stored in a Cloud Storage bucket:
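What follows is a minimal sketch of that pattern rather than a verbatim recipe: the instance name, zone, image family, and bucket paths are all placeholders. The startup script installs Papermill, executes the notebook from the input bucket, writes the result to the output bucket, and then deletes the instance so that billing stops as soon as the run finishes. The cloud-platform scope is included so the instance can write to Cloud Storage and delete itself.

    gcloud compute instances create notebook-executor \
      --zone=us-central1-b \
      --image-family=tf-latest-cu100 \
      --image-project=deeplearning-platform-release \
      --maintenance-policy=TERMINATE \
      --scopes=cloud-platform \
      --metadata=startup-script='#!/bin/bash
    # Papermill may not be preinstalled on the image, so install it here.
    pip3 install "papermill[gcs]"
    # Execute the notebook and upload the fully executed copy.
    papermill gs://my-input-bucket/train.ipynb gs://my-output-bucket/train-output.ipynb
    # Delete this instance once execution finishes to stop incurring charges.
    gcloud --quiet compute instances delete $(hostname) --zone=us-central1-b'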
