These days, the industry would have you believe that data and analytics is all being done in the service of AI. And, given that, there's a lot of orientation toward data scientists' seemingly favorite tool for working with data: the notebook. The biggies in this space include Apache Zeppelin, and more prominently, Jupyter (formerly iPython). If you're on the Databricks platform, you'll end up using their own brand of notebook, but even those are Jupyter-compatible.
One way to think of a notebook is a place to put executable code and annotate it. It might be more accurate, though, to say that notebooks are a place to put a bunch of annotations and decorate them with code. But regardless of how snarky you may want to be, notebooks let you intersperse rich text in Markdown format with code, in any number of languages, run it in place, and view its output, in textual, tabular or graphical format. If you finesse things enough, you can even cobble together a poor man's dashboard with these assets.
The problem with notebooks is that they're much better for experimental data science work than they are for production data engineering work. That's my own opinion, of course. But I stand by it. Notebooks are more about presentation than they are about development, and they lack many of the niceties of a real integrated development environment (IDE) like Eclipse, PyCharm, Visual Studio or, for that matter, RStudio.
But it doesn't have to be that way. Jupyter notebooks started out as a tool for Python code exclusively and Anaconda, which produces the leading Python distribution, has been hard at work enhancing the Jupyter notebook facility into something much more IDE-like, with JupyterLab. Earlier this week, the long-touted tool was released in a form its developers saw as "ready for users." Given that modest but significant milestone, I decided to download JupyterLab, fire it up and check it out. In general, I am impressed.
JupyterLab is a superset of Jupyter itself. As such, anything you can do in a Jupyter notebook, you can do in JupyterLab. But you can do a bunch more.
To begin with, syntax completion using the Tab key, and object tool tip inspection, using Shift-Tab, are present, just as they are in the conventional notebook experience. But JupyterLab enhances these facilities compared to their standalone Jupyter notebook implementations, with additional information about the types of matched items in tab completion and additional information about objects in the tool tips. Having that sort of contextual assistance and help reduces the amount of context switching developers have to do to figure things out.
Developers can also work in a more imperative programming mode, by running their code interactively inside of "consoles" and not just notebooks. Consoles are live sessions with the Jupyter "kernels" (the language interpreters that actually execute code, behind the notebooks ) so they can run their code in a more programmer-oriented environment before inserting it into the primarily text- and visualization-based document manifested in a notebook.
I spy more than iPy(nb)
JupyterLab goes well beyond notebooks though, allowing developers to open up files in a variety of formats, containing data or other assets they may use or produce from their code. This includes source code files in supported languages, as well as files in plain text, CSV and other delimited text formats, JSON, various image, and even in PDF formats. The viewers and editors that help out here include a full-featured text editor, an image viewer, a tabular data viewer, a tree-view JSON viewer, and viewers for Vega, Vega-Lite and VDOM files.
Sometimes, multiple viewers are applicable to certain files. For example, JSON and CSV files can be opened up as plain text. But a a tree viewer can also be used in the case of the former, and the table viewer can be used in the case of the latter. Jupyter allows such files to be opened up in multiple editors simultaneously, and keeps the multiple views synchronized, so that edits made in one are rendered in the other(s).
This multi-view paradigm works for notebooks as well. for example, by right-clicking on a rendered visual inside a notebook and selecting "Create New View for Output" from the resulting context menu, the visuals can be displayed in their own view, side-by-side with other such visuals and the conventional view of the notebook that produced them.
This is done with just a little drag and drop finesse, allowing multiple notebook, output and file viewers to be displayed in horizontally- or vertically-proximate regions, separated by splitter bars, with each region accommodating potentially multiple tabbed documents, and each being individually scrollable. Once you get good at creating such layouts, you can even set things up in a dashboard-like layout, as shown in the screen capture above.
Single document; multiple langauges
Recognizing that all this eye candy can sometimes be distracting, JupyterLab allows users to toggle between such tiled layouts and a single document view, wherein the active document takes over the entire editing region of the JupyterLab browser tab or window.
JupyterLab is supported in the Chrome, Firefox and Safari browsers. In my admittedly limited testing, it also worked quite well in Windows 10's Edge browser, with the exception of the single document view feature, which caused serious rendering anomalies -- and which I confirmed did not exist in Chrome.
We don't need no stinkin' installers...or do we?
Installation of JupyterLab can be complex, owing mostly to its cobbling together of multiple technologies. Installing Anaconda Python will get the core of Jupyter on your system, and a subsequent install step will install support for the R kernel as well. From there you can download JupyterLab itself, and then separately install other languages and the Jupyter kernels as well. That's what I did to get the Node.js kernel installed, and it took a couple of steps to get it right. Bringing down sample notebooks took additional searching and effort as did the installation of modules necessary to support the code within them.
This can be difficult but it's not rocket science (or even data science): I am a former developer and still pretty hands-on with tools, so I have some instinct for these things. But my dev skills are lapsed, and I'm no tools ninja. If I can do it, then likely so can anyone with the prerequisite skills and interest to use JupyterLab in the first place. Still, patience and some spare time are required, and a master installer would remove a lot of friction.
Cloud adoption dividend
But for many Jupyter users, installation has been a non-issue, as the facility is embedded in numerous cloud services, including Microsoft HDInsight, Amazon SageMaker, and Google Cloud Datalab. Will the various services and products that embed Jupyter now also embed JupyterLab? I hope so, as this will raise the bar in the data engineering experience on all these platforms.
Web browser-based developer tools are innovative, but they can also be limiting. Jupyter has been a good exemplar of this conundrum. But JupyterLab helps transcend the limitations, while retaining the innovation and convenience. Let's hope its adoption ion the ecosystem is brisk.