Keegan Hines

Data Science from my iPad

For the last couple of weeks, I've been interested in methods for deploying analytics environments so that they can be accessed solely from a web browser. The appeal is that we can hand all the high performance computing and number crunching to a remote server and then access that server from any device with a browser. We might never need to purchase a high performance computing machine again; we could do everything from a tablet (or even a phone). Hence the pithy title. Anyway, here are some of my favorite things.

RStudio Server and Docker

RStudio Server is an awesome way to set up an RStudio environment in a web browser. But you may not have the savvy to set up the server-side backend, and you may not have easy access to a persistent server. I've recently described how you can take advantage of Docker to quite painlessly deploy RStudio using DigitalOcean for as little as a cent per hour. For the R crowd, this is an exciting possibility, and you can get things up and running in minutes.
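To give a flavor of what the Docker route can look like, here is a minimal sketch that only builds (and prints) a `docker run` command for RStudio Server. The image name `rocker/rstudio`, the port 8787, and the `PASSWORD` environment variable are my assumptions about a typical setup, not details taken from the walkthrough mentioned above, and the helper `rstudio_run_command` is purely illustrative.

```python
import shlex


def rstudio_run_command(password, image="rocker/rstudio", host_port=8787):
    """Build (but do not execute) a `docker run` command for RStudio Server.

    Assumptions: the image name, the container port 8787, and the PASSWORD
    environment variable describe a typical setup and may differ from yours.
    """
    return [
        "docker", "run",
        "--detach",                        # keep the container running in the background
        "--publish", f"{host_port}:8787",  # map a host port to the container's port
        "--env", f"PASSWORD={password}",   # login password for the web interface
        image,
    ]


command = rstudio_run_command(password="change-me")
print(shlex.join(command))  # shell-ready string to paste on the server
```

Once a command like this has been run on the remote server, the RStudio login page should be reachable in a browser at the server's address on the chosen port.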

iPython Notebook with tmpnb

If you're a Python enthusiast, I'm sure you're familiar with the superbly cool iPython notebook. These are interactive development environments that can be run by anyone who has iPython, and they can be shared statically at nbviewer. However, we might like to have a hosted version of iPython notebooks so that we (and our collaborators) can execute and interact with the code from any system.

tmpnb provides exactly this, and is 100% free (as in beer). Again using Docker, these iPython notebooks are served by Rackspace and require absolutely no effort on our part to use. We get a 'temporary' iPython notebook assigned to a randomized user and session. We can then edit this notebook and share the permalink with others so they can execute and modify our code, all from the browser, with no Python required on our end.

I'm a big fan of how immediate and effortless it is to use tmpnb. This work was recently featured by Nature to highlight the importance of these kinds of tools for reproducible science.

iPython Notebook with Wakari

A potential problem with tmpnb is, well, the temporariness. It's not obvious to me how long a particular iPython notebook session will persist, and I'm not able to gather together multiple notebooks under a username. A very similar idea to tmpnb comes from the folks at Continuum with their product Wakari. Again, the idea is to host iPython notebooks remotely so that they can be shared and used from anywhere. With Wakari, we can create an account in order to store all of the notebooks we're interested in, as well as their dependencies. The base-level Wakari is also free (as in beer), and if we need more memory we can pay a nominal fee for the use of enhanced server resources. This provides another advantage over tmpnb, giving us the option to access more computation if we're willing to pay for it.

Summary

All in all, these tools provide exciting possibilities for enhancing a data science workflow and facilitating collaboration. And even for the services which aren't free, the cost incurred over a year or two will pale in comparison to the cost of purchasing a high performance machine. Now I just need to get an iPad...
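As a rough check on that cost claim, here is a back-of-the-envelope sketch using the one-cent-per-hour figure quoted earlier for a DigitalOcean server. The two-year horizon and the eight-hours-a-day usage pattern are assumptions chosen for illustration, and real pricing will vary with the size of the server.

```python
def hosting_cost(hourly_rate_usd, years, hours_per_day=24.0):
    """Total cost in dollars of renting a server at a flat hourly rate."""
    return hourly_rate_usd * hours_per_day * 365 * years


# One cent per hour is the figure quoted earlier in the post.
always_on = hosting_cost(0.01, years=2)                    # never shut down
workday = hosting_cost(0.01, years=2, hours_per_day=8)     # only while working

print(f"Always on for two years: ${always_on:.2f}")
print(f"Eight hours a day for two years: ${workday:.2f}")
```

At that rate, two years of an always-on server comes to about $175, and shutting it down outside working hours brings that to under $60.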