The interactive notebook has recently become a major fixture in the data science toolbox, especially for sharing analyses. We have integrated notebooks with our platform, Polly, to allow for an easy setup process along with data management and computational resource management capabilities. This article will walk you through the notebook features available in Polly and how to set up the scripting environment quickly!
Interactive notebooks are environments that couple your code with the resulting output in a single document, including text and visualizations. In addition, they allow you to do the following:
Polly Notebook provides a Jupyter-like interface on the cloud. Here’s why we recommend Polly notebooks over other local hosting options:
The above features help reduce setup costs and barriers to entry in biological data analyses significantly. Combine these features with data storage on the cloud and the ability to share projects and you can heavily boost collaboration within your group and ensure reproducibility of every analysis you perform!
Although Polly remains behind a paywall to cover the storage and processing costs, the team is providing free Polly Notebook trials to selected bioinformaticians for a limited time period. You can register yourself for a free trial by filling out a form here.
In this section, we will learn how to create and save a Polly Notebook, explore the scripting interface, and look at some other useful features!
Once you have logged in to Polly, navigate to the default project, and create a new notebook.
On the next screen, you will be asked to choose between different environments and resources. Let’s discuss some basic terminology before we make our selection:
In the following example, we will use the Python3 kernel with a Polly Small machine.
Once the notebook is ready, you will see a cell with the message “Welcome to Polly <kernelname> Notebook”. Now let’s see how to run your code.
All the code in a notebook is organized into cells for easier comprehension. To see how these cells behave, let us add some code to the first cell. Type `print("Hello Jupyter!")` in the first cell and click the run button (>|) in the toolbar to execute this statement. You can also press `Shift+Enter` for the same effect.
The output of the first cell is shown right under it, and the label to its left changes from `In [ ]:` to `In [1]:`, indicating that this was the first cell to be executed in the notebook. This counter helps you keep track of the order in which cells ran, which matters for the state of your variables when cells are executed in a non-linear fashion. You can add more cells using the Insert tab and continue coding like you would in a local editor!
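To make the point about execution order concrete, here is a minimal sketch in plain Python. The functions `cell_a` and `cell_b` and the `state` dictionary are hypothetical stand-ins for two notebook cells and the kernel's shared memory; they are not part of Polly or Jupyter.

```python
# Hypothetical stand-in for the kernel's shared memory between cells.
state = {}


def cell_a():
    # Stands in for a cell that initializes a variable.
    state["count"] = 10


def cell_b():
    # Stands in for a cell that modifies the variable each time it runs.
    state["count"] += 1


cell_a()  # would be labelled In [1]
cell_b()  # would be labelled In [2]
cell_b()  # would be labelled In [3]: re-running the same cell changes the value again

print(state["count"])
```

Running the second cell twice leaves a different value than running it once, which is why the execution labels are worth checking before trusting a result.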
Polly Notebook provides simple methods to fetch files from your project and save new files from your notebook. These methods are applicable to both R and Python notebooks.
Here’s an example snippet to see how this would work. Suppose we have uploaded a CSV file named ‘my data file.csv’ to our Polly project. We then create a new notebook as shown above and write the following snippet.
As you can see, we first fetched the name of the file using the list method, then downloaded it into our current workspace. We used a pandas method to read this file and then display the contents right in the document.
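The pandas part of that workflow can be sketched as follows. This is only a sketch: it assumes the file is already in a local directory, it does not reproduce the Polly list and download commands, and the sample columns (`id`, `gene`, `expression`) are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for the file fetched from the project.
sample = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "gene": ["BRCA1", "TP53", "EGFR"],
        "expression": [2.4, 5.1, 3.3],
    }
)

# Write the sample to a temporary directory so the example is self-contained.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "my data file.csv")
sample.to_csv(path, index=False)

# Read the CSV back; spaces in the file name need no special handling.
df = pd.read_csv(path)

# In a notebook, leaving `df.head()` as the last line of a cell renders a table;
# print() works in any environment.
print(df.head())
```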
Let’s manipulate this data and save it as a different file in Polly. We’ll be deleting the ‘id’ column from this CSV using the following snippet.
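As a rough sketch of the pandas side of this step, the example below drops the `id` column and writes the result to a new CSV. The data, the output file name, and the temporary directory are assumptions for illustration; saving the file back to the Polly project relies on Polly's own save method, which is not shown here.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data standing in for the contents of 'my data file.csv'.
df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "gene": ["BRCA1", "TP53", "EGFR"],
        "expression": [2.4, 5.1, 3.3],
    }
)

# drop() returns a new DataFrame and leaves the original untouched.
trimmed = df.drop(columns=["id"])

# Hypothetical output name; index=False avoids writing the row index as an extra column.
out_path = os.path.join(tempfile.mkdtemp(), "my data file without id.csv")
trimmed.to_csv(out_path, index=False)

print(list(trimmed.columns))
```

Keeping the original DataFrame unchanged makes it easy to re-run the cell without reloading the file.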
Let’s use the list method again to see if the file appears in the project.
You can also navigate to your project and check if this file has been added there.
Now that your work is done, you can save your notebook using the save button from the toolbar. This will ensure your current state has been stored in Polly for future use.
While our available environments are equipped with the most commonly used bioinformatics modules for R and Python, you might frequently run into “Module not found” errors while importing your package or library. In case you require a package that’s not available in our environment, you can install it using the terminal, just like you would on your local system.
Let’s take a look at the steps involved for both Python and R notebooks.
Go to the Polly Offerings tab and open the Terminal. The terminal opens in a new tab automatically and looks something like this:
# For installing packages DON’T forget to use sudo. It will not ask for a password.
> sudo pip install <package-name>
# System binaries
> sudo apt install <package-name>
# If the above command outputs ‘package not found’, you can run this command to update the system package indices
> sudo apt-get update
To install a new package like biopython, simply type `sudo pip install biopython`.
Once the package installation is successful, you can import the package in your notebook using `import Bio`.
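If you would like to check from the notebook whether a module is available before or after installing it, one option is the standard library's `importlib.util.find_spec`. The helper name `is_installed` below is hypothetical, and the sketch only handles top-level module names such as `Bio` (the import name of biopython).

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


# "json" ships with Python; "Bio" is present only if biopython has been installed.
for name in ["json", "Bio"]:
    status = "available" if is_installed(name) else "missing (install it from the terminal)"
    print(f"{name}: {status}")
```

This avoids triggering a “Module not found” error in the middle of a longer cell.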
Go to the Polly Offerings tab and open the Terminal. The terminal opens in a new tab automatically.
# You can install R packages by opening an R session in the terminal
> sudo R
# Install packages using the following command
> install.packages(c("<pkg-name>"), dependencies=TRUE, repos="<CRAN mirror link of your choice>")
# For the CRAN mirror link, you can use one of your choice or this one: "https://cran.cnr.berkeley.edu/"
# To load the library from the terminal, use the following command (Note: you can also load libraries from the notebook as usual)
> library(<pkg-name>)
Once your notebook is ready, save it using the save button and go back to the Polly project page.
Click on the ‘Share’ button and enter your collaborator’s email address. Note that you can only share your projects with other Polly users. To share your project with collaborators who are not yet on Polly, you can download your notebook in various formats like HTML, .ipynb, Markdown, LaTeX, etc. You will find the download option inside your notebook under the “File” menu.
We presented a brief look into the Polly Notebooks offering. There’s much more that can be done with notebooks, from making beautiful custom dashboards and mini-apps to automating computationally heavy workflows on the cloud. This is just a start. We will be pushing out more tutorials and documentation regularly to help you get the best of Polly!
You are among our early bird users, and we would love to get your feedback on the product. Love it? Hate it? Think it’s just another fish in the pond? We’re all ears at product@elucidata.io