On the Rookout for live data: Instant observability to fix software bugs and open AI black boxes

Getting data to debug your code while it is running in production, without stopping or redeploying it. Whatever you may be running, wherever you may be running it. And now, with support for open source machine learning frameworks Apache Spark and TensorFlow added to the mix. This is what startup Rookout promises.

Software bugs are a pain: buggy software can drop anything from your sales to aircraft in mid-flight. Debugging software is hard, tedious, and costs a fortune. A multitude of frameworks and processes have been created to facilitate software testing and ensure fewer bugs make it to production, and invariably they all fail from time to time.

When this happens, the pain for software developers and the nail-biting for businesses start. Developers have to find the source code that caused the bug, and execute it in a test environment that resembles the production one as closely as possible. The situation that caused the bug has to be recreated, too.

The way this usually works is by adding logging statements and breakpoints in the code, and retracing execution in the code and its dependencies until the bug is located and can be fixed. Then the new code has to be rebuilt and redeployed in production.

Frankly, it's a pain just thinking about it, let alone having to go through this. Or Weis and Liran Haimovitch are two software engineers who have gone through this time after time, and felt the pain, so they decided to do something about it.

Getting the data, skipping the redeployment

Weis and Haimovitch founded Rookout in 2017, based on a seemingly simple premise: Adding non-breaking breakpoints into live code, and getting data in real time from its real-life environment, without stopping applications. It sounds a bit like software development black magic, so we had to wonder how it works.

As CEO Weis told ZDNet, Rookout's framework works in two steps. First, the framework is integrated into a running application via an SDK. Once that is done, instructions are received on demand from Rookout's management system, and bytecode or opcode manipulation is applied on the fly to augment the running service(s) as if they had been deployed with the additional logging/data-collection code in the first place.
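To make the idea of a "non-breaking breakpoint" concrete, here is a minimal Python sketch. It is not Rookout's implementation: Rookout describes bytecode manipulation, while this toy uses the standard `sys.settrace` hook as a stand-in, which is far slower and only suitable for illustration. The `LiveBreakpoints` class, its method names, and the `price_with_tax` function are all hypothetical.

```python
import sys


class LiveBreakpoints:
    """Toy non-breaking breakpoint manager built on sys.settrace (illustrative only)."""

    def __init__(self):
        self.rules = {}     # (filename, lineno) -> names of local variables to capture
        self.captured = []  # snapshots collected so far

    def add_rule(self, filename, lineno, names):
        # A rule can be added at any time, without touching the target's source.
        self.rules[(filename, lineno)] = list(names)

    def _trace(self, frame, event, arg):
        if event == "line":
            names = self.rules.get((frame.f_code.co_filename, frame.f_lineno))
            if names is not None:
                # Read-only: copy values out, never write into the frame.
                self.captured.append({n: frame.f_locals.get(n) for n in names})
        return self._trace  # keep tracing this frame and nested calls

    def install(self):
        sys.settrace(self._trace)

    def uninstall(self):
        sys.settrace(None)


def price_with_tax(price, rate):
    tax = price * rate
    total = price + tax
    return total


bp = LiveBreakpoints()
code = price_with_tax.__code__
# Target the `return total` line, three lines below the `def` line.
bp.add_rule(code.co_filename, code.co_firstlineno + 3, ["tax", "total"])

bp.install()
result = price_with_tax(100, 0.5)  # runs to completion, never pauses
bp.uninstall()

print(result)       # 150.0
print(bp.captured)  # [{'tax': 50.0, 'total': 150.0}]
```

The function never stops and is never redeployed. The rule is keyed only by file name and line number, and the data is copied out as execution passes that line.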

Software bugs are a pain to fix, they cost a fortune, and they can have severe side effects for businesses. So shortening the cycle needed to deal with them is always welcome.

So, not exactly magic: an SDK integration is needed. But this is something that has to be done only once, and Weis said it's "a simple dependency, 2-minute setup". What can you do once this is done? Collect data as you please, and send it where you please, according to Weis.

Rookout's breakpoints can collect any data that could be collected by adding an extra log line in the code: local variables, global variables, stack traces, thread-local storage, metrics, and anything else your code has access to. The only exception is return values of method invocations, the collection of which is not enabled for security reasons.
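As a rough illustration of what "anything an extra log line could collect" means, the following Python sketch gathers the same categories of data from a running frame using only the standard library. The `snapshot` helper, the `request_context` thread-local, and the `handle_order` function are hypothetical names invented for this example, not part of any Rookout API.

```python
import inspect
import threading
import traceback

request_context = threading.local()  # hypothetical thread-local storage
MAX_ITEMS = 3                        # hypothetical global setting


def snapshot(global_names=()):
    """Collect read-only data about the caller's frame, as a log line could."""
    frame = inspect.currentframe().f_back  # the frame that called snapshot()
    try:
        return {
            "function": frame.f_code.co_name,
            "line": frame.f_lineno,
            "locals": dict(frame.f_locals),
            "globals": {n: frame.f_globals.get(n) for n in global_names},
            # Function names on the call stack, outermost first.
            "stack": [f.name for f in traceback.extract_stack(frame)],
            "thread_local": dict(vars(request_context)),
        }
    finally:
        del frame  # avoid keeping a reference cycle alive


def handle_order(order_id, items):
    request_context.user = "alice"
    oversized = len(items) > MAX_ITEMS
    return snapshot(global_names=["MAX_ITEMS"])


data = handle_order(42, ["a", "b"])
print(data["locals"])        # {'order_id': 42, 'items': ['a', 'b'], 'oversized': False}
print(data["thread_local"])  # {'user': 'alice'}
```

In this sketch the call to `snapshot` is written into the code by hand. The point of a tool like Rookout is that the equivalent collection is attached at run time instead.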

As for performance, Weis said that adding a log line with Rookout produces the same basic end result in memory as adding it in code and redeploying, hence there is no additional overhead. He went on to add that Rookout makes sure to only apply read-only effects, and always remains within the resource limits the admin has set.

Rookout supports JVM languages such as Java, Scala, and Groovy, as well as Node.js and Python, in all clouds, containers, Kubernetes, and serverless environments. Rookout also works on external and third-party libraries, as long as a file name and a line number can be provided. The collected data points can then be sent wherever they need to go: an application performance monitoring solution, alerting and logging tools, business intelligence tools and more.

There is a catch, though: at this point, it's not certain that your favorite IDE is supported by Rookout. You may have to switch to Rookout's own web IDE to use it. Weis said Rookout is an API-driven infrastructure platform, meaning you can connect to it through any interface of your choice.

Rookout is working on multiple interfaces on top of its API, as well as on integrations into other products, for example Sentry.io + Rookout and Circle-CI. Its current focus, however, is its web IDE. Weis noted that:

Building our own custom web IDE allows us to build a tailored experience suited to our unique data-collection experience, which would be harder to do with the limited power of modern IDEs. That said, allowing developers to never leave their IDE is a big YES and we intend to provide this very soon for main IDEs like Jetbrains and VScode.

Instant observability for machine learning frameworks

Today, Rookout announced support for two of the most popular open source machine learning frameworks out there: Apache Spark and TensorFlow. Spark is supported both as a code instrumentation platform and as a data target for Rookout-collected data. Weis said other related environments such as Jupyter notebooks are also supported.

With the major shift towards developing machine learning-powered applications, that should come as no surprise. But what can Rookout offer to machine learning framework users, and how does it work?

Rookout calls the capabilities it adds "instant observability" and notes that by bringing observability to AI, Rookout will empower data scientists to understand, improve, monitor, debug and iterate on their machine learning (ML) models faster, develop new data dimensions, add features and improve accuracy. Interestingly, Rookout draws the connection to AI explainability there. 

Rookout's solution for collecting data from live code deployments works in a number of environments.

"AI's a black box. You put data in and you get data out, we don't usually understand why it's making decisions," said CTO Haimovitch. "Data scientists developing AIs rely on engineering and IT teams to get them the data they need to improve their models. That takes time, costs money and can be extremely frustrating. The data they need is right there, but they couldn't access it -- until now."

Unexpected behavior could be an error in the structure of the model or some bias in the data, or it could be a classic bug in the enveloping code. Each of these will have its own very different solution. If a model needs more training it could take weeks of computing time. If the model itself needs expanding, data scientists may have to do complicated design work. On the other hand, a logical error, once found, could be fixed in seconds.

Rookout says it allows data scientists to observe and understand their models while they run, giving them access to model answers, inputs, and peripheral data on-demand, without having to ask the engineering team for code changes or waiting for the next release.

On the Rookout

Seeing this, we wondered about the underpinnings of Rookout's framework, whether it is possible to keep expanding it to more environments, and whether the company works with vendors to achieve this. Weis noted that the exact implementation depends on the framework runtime, but at its core, Rookout supports most frameworks out of the box with zero customization:

Rookout uses some existing reflection / instrumentation capabilities of each runtime it attaches to, and also has a lot of proprietary techniques per runtime to achieve the right controlled manipulation effect. But of course, some frameworks require special attention (for example AWS Lambda or GitHub Electron).

Some of the frameworks require us to develop special cases in our code, some require the user to call Rookout in a specific way - for example, wrapping Lambda functions. So far we haven't required any assistance from framework creators to add support. That said, we'd be happy to collaborate if it's ever necessary, including by contributing to open-source. 
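What "wrapping Lambda functions" might look like can be sketched in Python with an ordinary decorator. This is an assumption about the general pattern, not Rookout's actual API: the `with_live_data` decorator and the `collected` list (standing in for a data target) are hypothetical. A wrapper of this kind gives an SDK a hook around every invocation of a short-lived serverless function.

```python
import functools

collected = []  # stand-in for a real data target such as a logging service


def with_live_data(handler):
    """Hypothetical wrapper around an AWS Lambda-style handler (not Rookout's API)."""

    @functools.wraps(handler)
    def wrapper(event, context):
        record = {"handler": handler.__name__, "event_keys": sorted(event)}
        try:
            return handler(event, context)
        finally:
            # Flush before the function instance may be frozen or discarded.
            collected.append(record)

    return wrapper


@with_live_data
def lambda_handler(event, context):
    return {"statusCode": 200, "body": event.get("name", "world")}


response = lambda_handler({"name": "rook"}, None)
print(response)   # {'statusCode': 200, 'body': 'rook'}
print(collected)  # [{'handler': 'lambda_handler', 'event_keys': ['name']}]
```

The handler's own logic and return value are untouched. The wrapper only observes the invocation and records data alongside it.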

Rookout live code debugging using the web IDE.

Rookout, which has registered patents around its data collection and pipelining technology, came out of stealth in 2018. It has raised $4.2M in seed funding led by TLV Partners and Emerge, has a growing core team of low-level engineers and cybersecurity experts, and is focusing on the North American market, with customers such as Backblaze, Two-Sigma, Maverik and Guesty.

And if you're wondering what's in a name, Weis said that rooks inspired them because they are smart birds that can use tools and even shape hooks to find things. Rookout founders envisioned a bird that sits by developers and serves them any time they need to get data on-the-fly from their live code.

If you're on the lookout for a way to ease the pain of debugging, Rookout may be your thing.