The state of microservices according to Temporal Technologies

Engineering around node failures is critical for cloud-native apps to continue working. Temporal steals a page from Java history with middleware for state management designed to handle failure.
Written by Tony Baer (dbInsight), Contributor

For the end-user, cloud-native services are supposed to simplify life and provide more agility. Still, for the developer, they can make life far more complex because of their distributed nature. Among the challenges is managing state, something that is second nature to database practitioners, but not necessarily app developers. That's the challenge that Temporal Technologies has taken on, providing the state management behind the orchestration of microservices, picking up where service meshes like Istio leave off.

Understandably, you've probably never heard of this two-year-old company before, as its sparse website makes the company almost look like it's still in stealth. It's not clear if Temporal has much of a paid client base; it lists a number of logos like Datadog, Netflix, Instacart, Qualtrics, Box and others, but they are users of the open source technology, not paying customers. If you dig deeply enough, you can actually find some real documentation. But just in case we forget to mention it, Temporal just secured a $103 million Series B round.

Specifically, Temporal pinpoints a narrow task: managing the state of microservices. Given that microservices typically fire up in highly distributed cloud environments, managing state is akin to choreographing transactions in a masterless or multimaster database. That's a challenge that, for instance, Cassandra developers know quite well. In databases, it's all about balancing transactional consistency with write availability. In the application or microservices tier, it's about availability, where the chain (in this case, compute nodes hosting specific microservices) will only be as strong as its weakest link.

Managing state, which commits transactions, is key to ensuring that results are valid and current and to keeping the system -- whether it is a database or application -- from crashing. For instance, when you withdraw cash from a bank ATM, state management is essential for ensuring that the transaction is only completed when the account has been debited.
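
To make the ATM example concrete, here is a minimal sketch in Python of a withdrawal that refuses to release cash until the debit has been recorded. The class, state, and method names are hypothetical and chosen only for illustration; they are not taken from Temporal or any banking system.

```python
from enum import Enum


class TxState(Enum):
    PENDING = "pending"
    DEBITED = "debited"
    COMPLETED = "completed"
    FAILED = "failed"


class Withdrawal:
    """Tracks one ATM withdrawal. All names here are illustrative."""

    def __init__(self, balance: int, amount: int):
        self.balance = balance
        self.amount = amount
        self.state = TxState.PENDING

    def debit(self) -> None:
        # A debit is only legal as the first step of the transaction.
        if self.state is not TxState.PENDING:
            raise RuntimeError(f"cannot debit in state {self.state.name}")
        if self.amount > self.balance:
            # Insufficient funds: fail without touching the balance.
            self.state = TxState.FAILED
            return
        self.balance -= self.amount
        self.state = TxState.DEBITED

    def dispense(self) -> int:
        # Cash is released only if the debit has already been recorded.
        if self.state is not TxState.DEBITED:
            raise RuntimeError(f"cannot dispense in state {self.state.name}")
        self.state = TxState.COMPLETED
        return self.amount
```

Calling dispense() before debit() raises an error, which is the whole point: the recorded state, not the caller, decides which step may run next.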

The need to manage state in distributed environments is critical because, with multiple moving parts, there's a decent likelihood that one of them will misfire. And so anything running on the Internet or in the cloud requires engineering for failure, involving failover and workarounds, so the outage of a single node won't crash the whole application or service.

In the database world, state engines were typically built-in; if you launch a database, you don't have to write your own state engine. In the AppDev world, that's not the case; developers typically had to write their own.

For microservices, organizations would typically have to write their own state machines in addition to application code. For Temporal user Checkr, a service that provides online employee background checks, a typical workflow often involves a series of 50 to 60 automated and manual steps (each of them microservices) retrieving data from a wide variety of external sources. There were lots of Kafka queues to juggle, writing data to multiple target databases, then writing logic to merge the results. With a Temporal server, they could focus on the app rather than the state engine.
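
As a rough illustration of the kind of state engine teams end up writing themselves, the Python sketch below runs a list of steps in order and saves a checkpoint after each one, so a restart resumes at the step that failed instead of starting over. It is not Temporal's implementation or API; the function name, the JSON checkpoint file, and the step signature are all assumptions made for the example.

```python
import json
import os
from typing import Callable, Dict, List, Tuple

# A step is a (name, function) pair; the function takes and returns a dict.
Step = Tuple[str, Callable[[Dict], Dict]]


def run_workflow(steps: List[Step], checkpoint_path: str) -> Dict:
    """Run steps in order, persisting progress after each completed step."""
    state = {"next_step": 0, "data": {}}
    if os.path.exists(checkpoint_path):
        # Resume from the last recorded position after a crash or restart.
        with open(checkpoint_path) as f:
            state = json.load(f)

    for index in range(state["next_step"], len(steps)):
        _name, func = steps[index]
        state["data"] = func(state["data"])
        state["next_step"] = index + 1
        # Write to a temp file, then rename, so a crash in the middle of
        # the write cannot leave a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)

    return state["data"]
```

Even this toy version has to worry about partial writes and resume positions, and it ignores retries, timeouts, and concurrent workers. Multiply that by 50 or 60 steps and the appeal of handing the problem to a dedicated server becomes clear.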

Temporal characterizes its solution as "the open source platform for orchestrating highly reliable, mission-critical applications at scale." For microservices, at first glance, that sounds a lot like what service meshes do. But service meshes operate at the infrastructure level, making connections and ensuring failover if nodes go down. By contrast, Temporal focuses on the application level, and more specifically, checking whether the code or logic in the microservice is executed and, if not, managing workarounds dealing with cascading dependencies.
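
The difference between the infrastructure and application levels can be sketched in a few lines of Python. The hypothetical function below checks whether each task actually ran to completion, retries it if not, and skips any task whose upstream dependency never succeeded. The names and the retry policy are assumptions for illustration only and do not reflect how Temporal itself is built.

```python
from typing import Callable, Dict, List


def run_with_dependencies(
    tasks: Dict[str, Callable[[], None]],
    depends_on: Dict[str, List[str]],
    order: List[str],
    max_attempts: int = 3,
) -> Dict[str, str]:
    """Run tasks in the given order and return a status for each one."""
    status: Dict[str, str] = {}
    for name in order:
        # Skip a task if anything it depends on did not succeed.
        if any(status.get(dep) != "ok" for dep in depends_on.get(name, [])):
            status[name] = "skipped"
            continue
        status[name] = "failed"
        for _ in range(max_attempts):
            try:
                tasks[name]()
                status[name] = "ok"
                break
            except Exception:
                # The task's logic did not complete; try again.
                continue
    return status
```

A service mesh can reroute a request when a node disappears, but it has no way of knowing that a task failed for business reasons or that a downstream task should no longer run. That bookkeeping lives at the application level.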

The problem that Temporal solves with microservices is nothing new. As noted above, in the AppDev world, state engines have to be written as external code or bundled as part of some framework. That's exactly the problem that Internet applications also had to resolve because the web was stateless, and that's what led to dedicated middleware, or app servers, to handle the process with web applications, where popular languages like Java carried their own mechanisms for managing state.

With Temporal, history is repeating itself in the microservices tier. Its state management server technology comes from a five-year-old open source project that was the outgrowth of work developed at Uber. It's built around Temporal Server, a microservice orchestration platform that sits between compute servers and executable source code.

That prompts the obvious question: if microservices are distributed in nature, executing in distributed computing environments, won't a central orchestration server defeat the purpose by introducing a single point of failure? The answer is a new "experimental" multi-cluster asynchronous replication feature that should provide the necessary failover capabilities. When it comes to transactional guarantees for microservices, the future is still a work in progress.
