What Are Microservices? - Part 2: An Interview With Confluent CEO Jay Kreps

This article is more than 6 years old.

What are microservices? In the first part of this series, I interviewed Matt Miller of Sequoia Capital. In this second part, I interview Jay Kreps, the co-founder and CEO of Confluent, which offers a streaming platform based on the open-source Apache Kafka. Kreps co-developed Kafka at LinkedIn, where he was a principal staff engineer. Three years ago, he left that social media giant to start his own company, which builds technology for real-time data streams in microservices architectures.

In this interview, he describes his entrepreneurial journey, highlights the opportunity that microservices offer to large and small companies alike, and offers advice on how best to harness the power of this trend.

(To listen to an unabridged audio version of this interview, please click this link. This is the second interview in the "What Is?" series. To read future articles in the series, covering topics such as artificial intelligence, virtual reality, robotics, and blockchain, please follow me on Twitter @PeterAHigh.)

Peter High: Jay, you were at LinkedIn for seven years; the last three as the principal staff engineer. You came to microservices through that experience, and then developed some key technology in the evolution of the topic. Can you talk about the opportunity as you saw it from your position at a fast-growing dynamic technology company?


Jay Kreps: When I joined LinkedIn in 2007, they were starting to figure out how to effectively break up a monolithic application that everybody put their code into. There had been an earlier wave of technology called service-oriented architecture, which is pretty much the same thing as microservices, that had died off. LinkedIn was born in the space between service-oriented architecture and the advent of microservices. Generally, to scale a software engineering effort, you add software engineers. When you add engineers you do get more done, but each additional engineer adds less capacity than the one before. This is a fundamental problem of big projects with lots of people and big applications: as you add people, you get slower. When people talk about microservices, they discuss scaling for more web traffic, reliability, and all kinds of other things. Microservices do not help those things.
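Kreps's point about diminishing returns matches a familiar back-of-the-envelope argument often associated with Fred Brooks's The Mythical Man-Month: if everyone working on one application must coordinate with everyone else, the number of pairwise communication paths grows as n(n-1)/2. The sketch below is only an illustration of that argument, not a model Kreps describes. The helper names are hypothetical, and `split_channels` adds a further simplifying assumption: after splitting into independent services, people coordinate fully only within their own team, plus one interface per pair of teams.

```python
def communication_channels(n: int) -> int:
    """Pairwise communication paths among n people who all coordinate directly."""
    return n * (n - 1) // 2


def marginal_channels(n: int) -> int:
    """Extra paths created when the n-th person joins (always n - 1)."""
    return communication_channels(n) - communication_channels(n - 1)


def split_channels(n_teams: int, team_size: int) -> int:
    """Assumed model: full coordination inside each team, plus one
    interface (e.g. a service contract) between each pair of teams."""
    within_teams = n_teams * communication_channels(team_size)
    between_teams = communication_channels(n_teams)
    return within_teams + between_teams


for size in (5, 10, 20, 50):
    # team size, total paths, paths added by the last person to join
    print(size, communication_channels(size), marginal_channels(size))

print(split_channels(10, 5))  # 145, versus 1225 for 50 people in one group
```

Under these assumptions, each new engineer on a single shared codebase adds one more coordination path than the previous hire did, while the same 50 people split into ten five-person service teams carry far fewer paths.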

Microservices only help one thing: scaling software engineering efforts. They let you add more money and turn it into more software at a more constant rate.

LinkedIn made a lot of mistakes when they were trying to scale because not a lot was known about best practices. As a fast-growth company, one of the most important things was creating a product that could evolve quickly. Having agility and speed was critical because the social networking space was competitive. It ended up working out in the end because everyone could deploy their part of the application and move independently. However, there were a bunch of hills and valleys in between where we made changes that were supposed to make us more effective, but had the opposite effect.

A significant evolution in technology was needed to get the promised outcome. This is where Apache Kafka came in. People starting out today benefit from knowing what works and what does not, and many of the tools for deployment and for communication between services already exist. There is now a whole family of technologies that solve problems around deployment, how you monitor your applications to make sure they are all running well, how they communicate with each other, and how you can ensure reliability and security globally when you have many moving pieces.

High: Apache Kafka is the open source data-streaming platform you co-developed during your time at LinkedIn. It is also part of the backbone of Confluent. What was the process of codifying that into open source technology?

Kreps: At LinkedIn, we realized that not all parts of the application are the same. There is a part of a website or a mobile app that does fast interactive lookups as part of a user interface, and then there is back-end machinery that reacts to that and gets work done. This is true in every type of business. A retailer has interactive digital experiences like an online site, but also has a whole machine that does reordering, inventory management, analysis on pricing, and other logistics behind the scenes. We were struggling with the back-end side and the flow of data between services. That is what Kafka was aimed at. We had gone through an evolution where we tried a category of technology called enterprise messaging systems. We adopted a bunch of those and struggled to make them work at scale for a large number of engineers and large amounts of data. The flow of data was more difficult now that we had a bunch of microservices, each of which had its own dedicated database. We were trying to figure out how to bring all the data together and build processing that could be triggered by it. Our solution was a streaming platform that any service could publish a message into, that could capture any change in a database, and that could reliably propagate those changes to any other system. This handled the asynchronous side of the business: work that often happens a little bit later, but has to get done. We see this often in financial services, retail, and insurance, where a lot of processing involves things that have to eventually happen, rather than quick lookups for a user interface. This is where Kafka shines.
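The pattern Kreps describes, where services publish records into a shared stream and any number of other systems read them independently, can be illustrated with a toy append-only log. This is a minimal in-memory sketch, not Kafka's API: `Topic`, `Consumer`, `publish`, and `poll` are hypothetical names, and the change events are made-up dictionaries standing in for changes captured from a database. What it shows is that each consumer tracks its own read position (its offset), so one service reading the stream does not remove data for another.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Topic:
    """Toy append-only log: records are only ever added at the end."""
    records: list = field(default_factory=list)

    def publish(self, record: Any) -> int:
        """Append a record and return its offset (position in the log)."""
        self.records.append(record)
        return len(self.records) - 1


@dataclass
class Consumer:
    """Reads a topic from its own offset; reading never deletes records."""
    topic: Topic
    offset: int = 0  # offset of the next record this consumer will read

    def poll(self) -> list:
        """Return every record published since this consumer's last poll."""
        batch = self.topic.records[self.offset:]
        self.offset += len(batch)
        return batch


# Hypothetical change events captured from an orders database.
orders = Topic()
inventory = Consumer(orders)  # e.g. a reordering service
analytics = Consumer(orders)  # e.g. a pricing-analysis job

orders.publish({"op": "insert", "order_id": 1, "sku": "A", "qty": 2})
orders.publish({"op": "update", "order_id": 1, "qty": 3})
print(len(inventory.poll()))  # 2
orders.publish({"op": "insert", "order_id": 2, "sku": "B", "qty": 1})
print(len(inventory.poll()))  # 1 (only the new record)
print(len(analytics.poll()))  # 3 (unaffected by inventory's reads)
```

Because the log is retained rather than consumed, a service added later can still start reading from offset 0, which is one reason a log suits the "publish once, propagate to any system" role better than a queue that deletes messages on delivery.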

Kafka evolved out of a number of things. There is a prior generation of messaging system technology. It is maybe 10- or 20-year-old architecture and includes products from Oracle and IBM that many companies run at large scale, but it is hard to manage across a large number of applications. Kafka is also an evolution of data movement tools like extract, transform, load (ETL). We basically took those two areas, blended them together, and put them on top of a modern distributed systems foundation. This allows you to work across an entire organization, have all your teams work independently around streams of data, and have everything be real time. This ended up being the backbone for LinkedIn's data flow. Because it is open source, it is now in thousands of companies, including tech giants like Uber, PayPal, Pinterest, and eBay; global banks and insurance companies; large retail companies; telecom companies; and a whole burgeoning set of use cases around the IoT. We had big hopes for the technology when we created it, but it has gone more places than we had imagined.

High: You mentioned that service-oriented architecture was a related trend that seemed to have legs, but then petered out. What is the difference between that time and now? Why are microservices experiencing the success that service-oriented architecture did not?

Kreps: A lot of people have tried to answer that question. Engineers always want there to be a crisp technical difference between the two; I do not think there is. However, there are enough differences in the technology, the people, and the approach that microservices deserves a different name. The right way to talk about it is that microservices is the product of a new generation of technology that is built on reimplementing older ideas - and a rebranding. This was necessary because many of the service-oriented architecture attempts were not successful. In part, and I say this as a vendor, it was too vendor driven. It had a good architectural idea about splitting up functionality in a company, but its implementation had not been tried at scale. There were a lot of vendors who built something in an enterprise software company, shipped it to a customer, and said, "We are going to come in and solve this big social problem of how all your engineers can work on software independently." It did not work. Conversely, microservices came up in the trenches. The practices for building microservices evolved in real companies that were building technologies like Kafka, and then applying those to the problem of scaling their software engineering effort. While there is probably not a real underlying technical difference, finding a solution for these big social problems required doing it in a real life setting.

High: Roughly three years ago, you co-founded Confluent, a company you are now the CEO of. What was the inspiration for Confluent? Why was the timing right to leave LinkedIn and do something else?

Kreps: We always had big plans for Kafka. We intended for it to be used at LinkedIn and to be an open source project. We felt there was a gap in the center of a modern digital company. We wanted to bridge the space between the processing that happens in batch at the end of the day and the databases that support quick lookups. Our thesis was: The world is moving toward companywide digital platforms and there is nothing that targets the asynchronous space of streams of data, even though that is where most of the problems for a modern business lie. The only thing that existed was old, junky technology in the messaging system space. That was how we got started. At first, only Silicon Valley companies got what we were doing; we had some big adoptions among those companies. Then it started to spread. Companies that were going through serious digital transformations implemented Kafka. These were typically traditional businesses in retail, banking, and so on. They recognized that transforming how they operated was essential to their survival. A part of that is a big platform shift, which includes things like microservices and cloud adoption, and how they all come together to form a modern digital company. Motivation has increased because companies now understand they can bring modern software to bear on their core business problems. We see it at work in many companies we partner with. Car companies, for instance, are going through an incredible transformation. Buying a car used to be a one-time purchase; you bought it and you were done. Now, companies sell you an ongoing set of services that, in turn, produce data. Car companies want applications that do things like tell you when you need to take your car to the shop, and they want to be able to collect live traffic data and send it to their customers. Suddenly, there is a whole digital side to a car. We see this across industries. Companies are changing how they view technology. This is where microservices come in. Companies realize they are not as fast or agile as they should be.

Through our discussions with companies that were doing revolutionary projects with Kafka, we recognized it is hard to take a low-level infrastructure engine and just go with it. If this was going to take over the world the way we hoped, a company needed to turn it into a product that made it easier to adopt and put into practice. That was the genesis of Confluent. We left LinkedIn to start Confluent and have been growing ever since. We have big projects in the IoT space and big projects involving mainframe offloading where we plug into much older systems.

High: In your earlier responses, you talked about the breadth of companies that have used your platform. You have worked with digital native organizations and with more established organizations that have significant legacy applications through which critical data flows. How difficult was it for you to make the case for involving Confluent to organizations that might see this as a major undertaking?

Kreps: I did not find the adjustment terribly difficult. Silicon Valley technology companies spend a lot of time recruiting, since talent is competitive. As a result, they tend to talk up how wonderful their technology environment is. The reality is those companies are full of legacy too, it is just not as old. However, it turns out that legacy from five years ago may be more painful than legacy from 15 years ago. If anything, I have found working with established enterprises more rewarding. Most of those companies have significant businesses that are out doing something impactful in the world. The ability to apply technology to that and advance their product is exciting.

High: Who tends to be the curator of that on your client side?

Kreps: We are seeing a transformation in that developers have more power and are making more technology choices. Companies are looking for ways to enable their software engineers to bring technology to bear on important business problems. Nonetheless, they also want to have some control in securing, operating, and standardizing solutions across the organization. Typically, our adoption and involvement is driven by a set of engineers who have a problem to solve and by the people they report to, who are betting on this platform working for the application.

High: What advice do you have for companies that have yet to start or are in the early stages of implementing microservices? Can you offer insight into differentiators that determine whether an organization will be successful?

Kreps: Kafka and Confluent are most successful when there is a business driver like a strong ROI case or a new area of the business. We can usually figure out within 60 seconds of talking to somebody whether it is a science project or the real thing.

Microservices are most successful when there is real buy-in within the organization and a path to get from the lab to the real life of the organization. That has to be done gradually in a dedicated fashion. You cannot use a bottom-up approach with microservices where everybody does something different. There have to be standards for how things talk to each other, how they are managed, and how they are deployed. That stuff has to be right, and then you need a way of gracefully onboarding a few things at a time so that the capabilities can be built up within the organization. Most companies that pursue our services are early in that process. At most, they may have a few things working that way. Implementation to completion takes time. The companies that are most likely to be successful are the ones that have the will to get there. It is difficult to carry through these types of core infrastructure transformations because there is no quarter-to-quarter pressure to get the microservices initiative done in order to make your numbers. Yet you know that if you are not ahead of digital transformation, you are in trouble. That is the dilemma for organizations trying to go after this: how can they maintain focus and get there gradually, while still getting everything done quarter to quarter?

I am suspicious of any technology that requires a big bang rollout. You need something that is viable and has value for one application. After that, it can grow to a second application, and then spread from there as it proves itself. Many of the big “rewrite the world” initiatives do not come to fruition.

Two of the things I like most about our model of working with companies are that we are centered around something that is open source, and that our product adds operational capabilities on top of it, which allow companies to put it into production. They can start small with a few developers who download the open source and put together a prototype of something useful. Then, as it goes to production, we can work with them as it scales to become a central platform that continues to add value. But they do not have to get there in one step.

High: As you have pointed out, the technology Confluent is based upon is open source. It is critical for technology companies, your own in particular, to operate within an ecosystem. How do you balance marking out your own space with creating something that works well with other technologies?

Kreps: It is a little different for us than for many other companies. We focus on how streams of data flow, which is not a problem many other people are focused on. My infrastructure technology peers are mostly stores of data. We are streams of data. Streams work well with stores because you want to hook the stores up to the streams so they can fill up your stores. As a result, we have a few hundred partners. These include technology integrations, applications, and systems. We serve a useful purpose when used in conjunction with our partners’ products. We answer the questions, “How do I get data out of that thing? How do I get data into that thing? How do I hook all these things together?” It is an area we have put our internal resources into, but it is also an area where open source is most effective. Confluent is doing a fair amount of the development of the core Kafka engine because it is a big replicated distributed system that is hard to build. However, most of the integration into the hundreds of things that are out there is contributed by the open source community. It is a way to be able to plug into all the things a company runs. That is one of the most essential things for any software product. It is nice that you have made your little engine or little widget, but how do you effectively and efficiently plug it into the companies that want to use it? And into all of the systems they have? We have put effort into curating the ecosystem, but it is absolutely an ecosystem, it is not built by one individual company.

High: You recently hit an exciting milestone referred to as Exactly-Once Delivery. Can you describe that?

Kreps: When companies process data, there is a set of batch systems that often runs at the end of the day. If you are a modern digital global company, that is kind of weird. What does the end of the day even mean? What do your computers have to do with the sun coming up and going down? Also, why is everything so slow? The back end makes up a significant chunk of the more sophisticated data processing. One of the things that holds companies back is that it is easy to build complex data processing in a batch way, which takes you back almost to the mainframe. A company will kick off a big batch job, it will churn through all the data, and it will spit out results. The problem is that it is hard to take things that happen once a day, make them continuous, and still get the answer you intend if there is a temporary interruption, like a network glitch or a server getting restarted. One of the things we have been doing at Confluent is bringing strong guarantees about the correctness of data processing to a real-time environment. Even if some of the servers you are running on fail while processing continues, the results still have to be right. That is technically hard. It is something a large community of people has been working on getting into place over a number of years. We added the capability to Kafka and our Confluent platform in our most recent releases. It makes it easy to build things that are not only scalable, but are also real time, and get the right answer.
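The core idea behind exactly-once processing can be sketched in a few lines: a job's output and its position in the input stream are saved together in one atomic step, so after a crash it resumes from a point that matches its saved results and nothing is counted twice. The toy below illustrates that idea only; it is not how Kafka implements the feature (Kafka's exactly-once semantics rely on idempotent producers and on transactions that commit output records together with consumer offsets). `process`, `Crash`, and the dictionary `store` are hypothetical, with `store` standing in for durable, transactional storage.

```python
import copy


class Crash(Exception):
    """Simulated server failure partway through processing."""


def process(log, store, batch_size=2, crash_at=None):
    """Sum amounts per key from `log`, resuming at the offset saved in `store`.

    Totals and the input offset are saved together in one step per batch,
    so a crash loses only uncommitted work, which is redone on restart.
    """
    state = copy.deepcopy(store)  # working copy; lost if we crash
    while state["offset"] < len(log):
        start = state["offset"]
        for i, (key, amount) in enumerate(log[start:start + batch_size]):
            if crash_at == start + i:
                raise Crash(f"failure at offset {start + i}")
            state["totals"][key] = state["totals"].get(key, 0) + amount
            state["offset"] = start + i + 1
        # Stand-in for a transactional write of results plus input position.
        store.clear()
        store.update(copy.deepcopy(state))


log = [("alice", 10), ("bob", 5), ("alice", 7), ("bob", 1), ("alice", 3)]
store = {"offset": 0, "totals": {}}
try:
    process(log, store, crash_at=3)
except Crash:
    pass
print(store)  # {'offset': 2, 'totals': {'alice': 10, 'bob': 5}}
process(log, store)  # restart: resumes from the committed offset
print(store["totals"])  # {'alice': 20, 'bob': 6}
```

The crash happens after alice's 7 was added to the working copy but before the batch was committed, so that partial work is discarded and redone once. If the job had instead written each updated total to durable storage immediately and saved its offset only at the end of a batch, the restart would replay offset 2 and count alice's 7 twice, which is the kind of error a batch job rerun from scratch never has to worry about.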

Peter High is President of Metis Strategy, a business and IT advisory firm. His latest book is Implementing World Class IT Strategy. He is also the author of World Class IT: Why Businesses Succeed When IT Triumphs. Peter moderates the Forum on World Class IT podcast series. He speaks at conferences around the world. Follow him on Twitter @PeterAHigh.