The topic of this episode of the New Relic Modern Software Podcast is microservices, addressing questions like: What is a microservices architecture? What are the best use cases for microservices? How do you monitor them? What’s the trade-off between flexibility and complexity? How can microservices help future-proof you against changes in your business needs? We even address the cultural implications of microservices!
To discuss these issues, and much more, we’re thrilled to welcome an amazing pair of guests with deep knowledge and expertise—and some strong and perhaps contrarian opinions—on microservices.
Serial entrepreneur Richard Rodger is CEO of VoxGig, a global conference and event SaaS company based in Waterford, Ireland. He’s also the author of The Tao of Microservices, published by Manning Publications.
Sean Carpenter, meanwhile, is a Principal Technical Evangelist for New Relic in Portland, Oregon, who’s been working with microservices concepts for more than a decade.
You can listen to the full episode right here, or get all the episodes by subscribing to the New Relic Modern Software Podcast on iTunes, or wherever you get your podcasts. Read on below for a full transcript of our conversation, edited for clarity:
New Relic was the host of the attached forum presented in the embedded podcast. However, the content and views expressed are those of the participants and do not necessarily reflect the views of New Relic. By hosting the podcast, New Relic does not necessarily adopt, guarantee, approve, or endorse the information, views, or products referenced therein.
Fredric Paul: Maybe we can start by learning a little bit more about our podcast participants today. So Sean, what’s your background and what’s your role at New Relic?
Sean Carpenter: I’m on the product team at New Relic, currently working in product marketing, mostly focused on APM but pretty much the whole platform. Before that, I was in product management, working on our APM product for about three and a half years, and before that, I was a leader of software engineering teams at a couple different companies in Chicago.
Fredric: My hometown, as well. So Sean, what’s your background in microservices specifically?
Sean: Yeah, interesting. So I worked at a smaller company, back in like 2005, 2006—called homefinder.com, doing real-estate search. And that was really my first exposure to microservices. We were actually building microservices in PHP if you believe that, very lightweight composable services. We didn’t really call them microservices at the time. We were just trying to build something that we could build on iteratively and quickly.
Fredric: And so Richard, what’s your background? How did you come to the world of microservices?
Richard Rodger: So I’m currently on startup number four, two exits and one crash-and-burn. That sounds a little bit crazy; nobody in a proper company would ever give me a job. I did math in school so I’m kind of a self-taught programmer. After 20 years of doing it wrong, I kind of discovered microservices, so never underestimate the faith and zeal of the converted.
The Tao of Microservices
Fredric: And that conversion has led you to actually write a book, The Tao of Microservices. Can you tell us a little bit more about the book and how you came to write it?
Richard: And the book that I wrote, The Tao of Microservices, in some ways takes you on that journey from the days when we were building monoliths and how we converted them into microservices, and then how we started helping bigger companies get to that place. It’s like all those learnings over that five-year period. Hopefully, it’s kind of a practical approach.
Fredric: So let’s talk about what we really mean when we talk about microservices. Sean, maybe you can start us off and talk for a moment about the traditional definition of what microservices is and how we think about it here at New Relic.
Sean: That’s an interesting word, “traditional.” To sort of build on what Richard said…
Richard: Isn’t it?
Fredric: Traditional for something brand new. Okay, yeah I get the problem, sorry about that.
Richard: We’ve got three years here, you know, this industry moves fast. It’s good.
Moving beyond a standard definition of “microservices”
Sean: A lot of time gets spent trying to create a precise definition for microservices, and it doesn’t seem like a really good place to spend time and energy.
I think that the key is looking for the markers of microservices, chiefly a strong separation of concerns, with business logic split into composable pieces. I think that that’s the best place to start: Instead of thinking about the technology itself, think about the purpose. And then after that, I think that it’s about identifying the sharable components within services as they evolve, so you can easily extract those out and abstract them separately so that other components of your business can take advantage of those things.
And then there’s obviously a big discussion that happens with engineers about: When does a microservice have too many endpoints to be called a microservice anymore? And there’s a new term being used, “nanoservice,” when we’re talking about functions as a service: little, very lightweight, single-purpose functions that operate within a system. But I think the larger point is to pull back and think about it in the context of the larger system, as opposed to wondering if any single component part is, by definition, a microservice.
Fredric: Richard, does that fit into your perspective that you take in The Tao of Microservices?
Richard: For me that’s the first step. I share with Sean this idea that looking for a trite, concise definition of the term is kind of pointless. For me, the most important aspect is to understand that microservices give you a software component model. And if you look at the history of software design and architecture, this is a kind of Holy Grail that we’ve been looking for. Components are super powerful because it’s like Legos, right; it lets you put things together. You have this fundamental operation, which is composition, and so you build complex things out of small things. And if you look at all the different approaches that people have taken—even things like, you know, classical object-oriented programming or CORBA, or the way Erlang fits together its processes—all of those things are groping in the dark for a component model that is sufficiently powerful but not too complex to allow large-scale software development. That’s kind of the perspective that I come in on as opposed to implementation details.
Fredric: Sean, does that make sense to you?
Sean: Yeah, absolutely. And I love this idea of thinking about it in terms of the compositional components. There’s a lot of technical objectives that you’re trying to achieve when you’re building with microservices. You want strong separation of concerns, so that it’s easier to understand one individual component, and it’s easier to test and deploy it without a lot of sort of unknown unknowns that can crop up like they will in a monolithic architecture.
Like I said, there’s a laundry list of these technical constraints that you could concern yourself with. I think it’s always a challenge for us as engineers to pull back and think about the business. One of the best benefits that I can think of is that by having these compositional components, it sort of future-proofs you against the changes in your business needs. So that when you’re building a service right now, if you build it as a discrete component, in the future when the business comes and asks you to do a new kind of feature for your customers, you don’t have to go back and do a lot of technical re-architecture to take advantage of that component, at least hopefully you won’t.
The benefits of microservices
Fredric: So is that what you see as the benefits of microservices as well, Richard?
Richard: Yeah. So Sean, I mean that’s a fantastic example because that’s actually literally a problem that has just hit us in the last couple of weeks. We’re a startup, we’ve taken an MVP trial approach.
We actually started with microservices—that’s another topic where I’m kind of a little bit contrarian because I say “do microservices from day one.”
In our system, we had a user microservice that when you log in, it would generate a description of the user and the username, and whatever permissions they had. But we didn’t have any groups or organizations in our system and the system that we’re building—it’s event management software, so you need to have teams, and you need to have groups, and you need to have permissions.
When we got to the point where we had to build that stuff we were like, “Well, hold on, the user microservice doesn’t understand anything about organizations. And it shouldn’t, because we have an organization microservice that does.” So the piece of functionality that builds that description of the user’s permissions, we moved that functionality to the organization service. From the perspective of the other services in the system, they didn’t know that it was a different service that was providing them with that description, and they didn’t care. And you can use microservices that way, where you introduce new functionality to deal with changing business requirements. The user microservice is still there, and we’re going to have to update it and delete the functionality that it no longer needs. But we didn’t have to make it more complex; we just built a new microservice for the new functionality.
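To make the idea concrete, here is a minimal Python sketch of how a message can change owners without its callers noticing. It is an illustration only, not the system Richard describes: the `MessageBus` class, the `describe-user` message name, and the permission values are all hypothetical.

```python
from typing import Callable, Dict

Handler = Callable[[dict], dict]


class MessageBus:
    """Routes messages by name, so callers never reference a service directly."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, message_name: str, handler: Handler) -> None:
        # Registering the same name again replaces the previous owner.
        self._handlers[message_name] = handler

    def send(self, message_name: str, payload: dict) -> dict:
        return self._handlers[message_name](payload)


def user_service_describe(payload: dict) -> dict:
    # Hypothetical original owner: knows only about the user.
    return {"username": payload["username"], "permissions": ["read"]}


def org_service_describe(payload: dict) -> dict:
    # Hypothetical new owner: same response shape, now aware of organizations.
    return {"username": payload["username"], "permissions": ["read", "manage-team"]}


bus = MessageBus()

# Day one: the user service answers the message.
bus.register("describe-user", user_service_describe)
before = bus.send("describe-user", {"username": "alice"})

# Later: the organization service takes over; the calling code is unchanged.
bus.register("describe-user", org_service_describe)
after = bus.send("describe-user", {"username": "alice"})
```

The caller sends the same message both times and gets a response of the same shape; only the registration changed.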
Flexibility, complexity, and adaptability
Fredric: What it sounds like we’re all getting at here is that the real benefit of microservices is flexibility without complexity, or am I getting that wrong? Maybe you guys can state it better than I can.
Sean: The first part was right but I think that complexity is unavoidable. I’ve never seen a situation where we’re able to maintain that simplicity and clarity. I love the title of that book, The Tao of Microservices; I’m picturing this Zen garden. But our reality is much more messy than that, more messy than I like to keep my desk. You just sort of have to accept it and plan thoughtfully from both a business and a technology perspective. I think that that’s where the real art is: not so much in how cleanly and dogmatically you are able to follow these patterns, but in how quickly you can change when you need to.
Richard: It’s certainly true that there’s inherent complexity in a system. All you can do is move the complexity from one place to another. In general, 80% of your use cases or your traffic is going to follow the more simplistic business rules. It’s the final 20% that has all the really hard stuff in it. And usually what ends up happening is your data schemas and your logic, your if/then/else trees, your business rules, your lookup tables, they all end up getting polluted by this extra 20% of edge cases. If you’ve nowhere to put it where it’s safe, it pollutes the more simplistic cases.
So what I think microservices lets you do, as Sean pointed out, is take that inherent complexity and isolate it into separate services that can handle the really ugly, messy stuff and try to keep a core of simple services that handle common cases. That strategy seems to be pretty effective.
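One way to picture this isolation is a router in which the most specific matching pattern wins, so an edge case gets its own handler and the common-case handler stays simple. The Python sketch below is a hedged illustration under that assumption; `PatternRouter`, the `calculate-tax` command, and the service names are hypothetical and do not come from the conversation.

```python
from typing import Callable, Dict, List, Tuple

Handler = Callable[[dict], str]


class PatternRouter:
    """Dispatches a message to the handler whose pattern matches most fields."""

    def __init__(self) -> None:
        self._routes: List[Tuple[Dict[str, str], Handler]] = []

    def add(self, pattern: Dict[str, str], handler: Handler) -> None:
        self._routes.append((pattern, handler))

    def route(self, message: dict) -> str:
        # A pattern matches when every key/value pair in it appears in the message.
        matches = [
            (pattern, handler)
            for pattern, handler in self._routes
            if all(message.get(key) == value for key, value in pattern.items())
        ]
        if not matches:
            raise LookupError(f"no handler for {message!r}")
        # The pattern with the most fields is the most specific one.
        _, handler = max(matches, key=lambda match: len(match[0]))
        return handler(message)


router = PatternRouter()

# Common case: one simple rule for the bulk of the traffic.
router.add({"cmd": "calculate-tax"}, lambda msg: "core-tax-service")

# Edge case: isolated in its own service instead of an extra if/else branch.
router.add(
    {"cmd": "calculate-tax", "kind": "cross-border"},
    lambda msg: "cross-border-tax-service",
)

common = router.route({"cmd": "calculate-tax", "amount": 100})
edge = router.route({"cmd": "calculate-tax", "kind": "cross-border", "amount": 100})
```

Adding another messy case means adding another pattern and handler; the common-case handler is never edited.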
Fredric: So what I’m hearing is that even if we can’t get rid of the complexity, we can isolate the complexity in the edge cases.
Richard: Yeah, absolutely and it pretty much is as simple as that.
Why microservices bring the most value at the beginning
Fredric: So, given how we’ve been talking about microservices so far, where do we think that microservices has the most value to bring? What kind of use cases is it most appropriate for? And where maybe should we be a little wary of applying it?
Richard: There is an undeniable overhead in running a microservices system. There’s got to be a trade-off that makes sense.
Microservices primarily give you this ability to compose components together, and that gives you the ability to have a high-velocity development cycle, but you pay for it with latency and increased DevOps complexity. So it’s most appropriate in the first six to nine months of a project when you are still figuring out a whole bunch of requirements stuff. But once your database schemas solidify and the business rules are starting to work (or in a startup, if you’re starting to hit product-market fit), you start to identify pathways through the system where the latency is too high, because there are too many jumps across too many microservices. At that point, you start merging them, and you actually move in some cases towards a macroservices configuration. I would characterize it, again, in somewhat of a contrarian way, by saying they’re most appropriate when you have the least amount of certainty.
Sean: I would agree with that. I think that this is a key point at the beginning, where you want to get really clear about your business and your domain object design. What are your main domain objects in the real world? And what are the main ways you expect those things to interact together in the real world?
Those end up painting a really, really good picture, not only of your data design, but of the flows of how you would expect that data and those entities to interact in the real world. And then, you can use that sort of thinking, that sort of business-architecture design, to then inform this process that Richard is talking about—quickly building microservices to prove out end-to-end whatever that use case you’re trying to prove out.
I think it goes to helping you future-proof, so that when you get to those later stages and realize you have spreading complexity, you’re able to pull things together and shift services around from person to person or team to team in ways that you can’t predict at the beginning.
Fredric: That makes a lot of sense. With all that, any tips or best practices on microservices you guys want to share before we move to the next issue: monitoring microservices?
Richard: Sure. So, this one is kind of a biggie, because this comes from a whole bunch of mistakes that we made in the early days.
So we’re building all these Node.js applications; they’re monoliths; it’s kind of crazy, and then we realize, “Hey, let’s break it all into separate services. And oh, hey, microservices are a thing.” So, we go into all these projects for clients and we say, “You know what? We’re going to sit down, we’re going to figure out what services we’re going to build first.” And we come up with a list of 10 or 20 services. And yeah, it kind of works, but there’s a whole bunch of friction, and it turns out it’s really hard. It’s really hard to analyze a system and decide what services to build.
And then we changed our focus to the messages. The messages are the most important part.
If I go back to that example I was using earlier on, there’s a message that says, “Hey, give me all the data about this user.” And it doesn’t really matter if it’s the user service that knows that, or if it’s the organization service that knows that, plus the groups, and puts the group data in there. What matters is that the client service just gets a response from somebody.
So we ended up with this model. It was kind of homegrown. The business analysis side was mostly figuring out the activities that happen in the system, and then you kind of turn those activities into messages. You allocate the messages to services, and that changes over time. So you list your requirements, break them down into messages (you might end up with 40 or 60 or whatever), and then group those into whatever services make sense. That approach worked pretty well. I have to say, I recommend it because it gives you a natural pathway from the business to what you’re going to actually deploy.
Sean: I completely agree. And the way that I’ve thought about it in the past with teams that I’ve worked with is, like I was saying before, get really clear about the entities that exist in the real world and how they interact to provide value, and then that will highlight some of these pathways that Richard is talking about.
It’s the requests and responses. Like that’s what really sort of matters to me because that solidifies your contract about how you’re going to pass these messages around the system.
And once you have those things fairly well defined, it’s pretty clear to engineers how to go and build that system to meet the use cases that they need to meet. If they’re going to have a lot of asynchronous activity, it tells them that they need to look at different kinds of queuing technologies. It really informs them about the types of technologies to use to then implement those use cases.
Monitoring microservices and the importance of institutional knowledge
Fredric: How does monitoring fit into the world of microservices? Or maybe the better question is, how does the world of microservices affect how you need to go about monitoring?
Sean: I think that as with building any modern system, you need to think ahead of time about how you’re going to have observability into that system as you build it. Just like you would with your test-driven development: You wouldn’t wait until the end to implement your tests. Similarly, we believe pretty strongly that you should start with your monitoring, so you can build iteratively and know whether that last deploy made things worse or better from a performance standpoint. It’s also just more efficient to do it that way.
I think that when it comes to this question of microservices, you can anticipate some of the problems that you’re going to have. You’re not going to have as many code execution paths within a microservice as you will in a monolith, so you can think a little bit more clearly about the spots where you want to get metrics and the kinds of metrics you collect, and tracing is obviously required. But I think with microservices, you can anticipate you’re going to have a lot of complexity eventually, and you’re going to want to automatically discover the graph of call paths that you’re going to have within this system. I think that’s really critical to this issue of understanding complexity in your microservices because you’ve moved to this world where you have teams or individuals that know parts of the system really, really well, but no one person will understand the entire system completely.
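As a rough illustration of discovering the graph of call paths, the Python sketch below derives service-to-service edges from trace spans. It assumes a simplified span record with only `span_id`, `parent_id`, and `service` fields; those field names and the sample services are hypothetical and do not reflect any particular monitoring product’s data model.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, TypedDict


class Span(TypedDict):
    span_id: str
    parent_id: Optional[str]
    service: str


def build_call_graph(spans: List[Span]) -> Dict[str, Set[str]]:
    """Return a mapping of caller service to the set of services it calls."""
    service_by_span = {span["span_id"]: span["service"] for span in spans}
    graph: Dict[str, Set[str]] = defaultdict(set)
    for span in spans:
        parent_id = span["parent_id"]
        # Skip root spans and spans whose parent was not collected.
        if parent_id is None or parent_id not in service_by_span:
            continue
        caller = service_by_span[parent_id]
        callee = span["service"]
        # Spans within the same service are internal work, not a call edge.
        if caller != callee:
            graph[caller].add(callee)
    return dict(graph)


# Hypothetical spans from a single request.
spans: List[Span] = [
    {"span_id": "a", "parent_id": None, "service": "api"},
    {"span_id": "b", "parent_id": "a", "service": "user"},
    {"span_id": "c", "parent_id": "a", "service": "organization"},
    {"span_id": "d", "parent_id": "c", "service": "organization"},
    {"span_id": "e", "parent_id": "c", "service": "groups"},
]

call_graph = build_call_graph(spans)
```

Run over many traces, the same join would accumulate edges that no single team had written down.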
Richard: Oh, boy! I wish I’d taken Sean’s advice. Here I am preaching about microservices from day one and observability from day one. That’s kind of basic, as well.
You know what? I’ve never done that, and I’m still paying the price.
It’s just one of those things that unless it’s very explicit, you just don’t think about it, and you jump in and start building. And everything’s pretty small at the start, so you can still kind of understand it. The microservice architecture lets you get away sometimes with a lack of rigor because you can define this contract, like Sean was saying, and then you can isolate people, and they just do their one microservice, and everything still works.
But you always end up in a place where you have a live, running system in production that nobody understands. That is always where you end up. And that’s, again, kind of a trade-off that you make because it gives you high development velocity.
Sean: I think that that’s a really great point, Richard. I spend a lot of time talking to our customers, and there’s this other interesting thing that you find about the life cycle of teams. So when we talk about new systems, whether it’s a startup or an innovation center within a large company, what we’ll often find is that the engineers who are doing that kind of work have a really strong sense, even in a complex, difficult domain, about how to monitor and understand it. But the price of success, as you grow and succeed and get better, is that you end up bringing in more people and handing things off to people who maybe have a different skill set, or may be earlier in their careers, or are sort of specialists that can do one thing, and those are the moments when you’re really glad that you thought about monitoring and observability from the very beginning.
Those innovators of those services, they move on, they go on to other new challenges, other fuzzy frontends of problems. And someone needs to operationalize it behind them, and that’s probably the most difficult moment: when we see customers really having that regret that they didn’t have a monitoring strategy in place that they implemented just like they implemented their CI/CD system.
Richard: If you are a product manager or anybody who is running an engineering team, the thing that should give you nightmares is losing your institutional knowledge—that’s how technical debt really starts racking up. You have people coming in that might be better engineers than that initial team, but they just never get that depth of intuitive understanding.
Fredric: Because they weren’t there at the beginning?
Richard: That’s the scary thing.
Distributed tracing in a microservices environment
Fredric: One more question on this topic. What is the role of distributed tracing in the complex interaction between monitoring and microservices?
Sean: Distributed tracing is really a central pillar of the ways that we think about monitoring in these modern systems, and I think that it’s driven by a couple things. One is this decomposition, so you have many more services. But you also have many more kinds of things that make up the system. It’s not just small code bases; it’s cloud queuing systems, or storage systems like Amazon S3; it’s API gateways spread across many different AWS accounts. It’s multiple different clouds and lots of external services being able to use different kinds of functions. And so it’s not just about having code visibility down into all of these services that you’re executing, but also into all the other dependencies that they’re interacting with, which aren’t exactly code running in software runtimes.
Richard: The reality is the best you can hope for is to take that distributed monitoring and use it to understand or get a sense of what’s going on in the system. But it allows you to do new things that you couldn’t do before as well.
One of the things we really like to do is compare flow rates. If I have a single message, like a shopping cart, and I’m checking out, and that generates two additional messages—one to the warehouse, and one to an invoicing system—that means that every time you have a checkout message, you’ve got two additional messages in your system. The flow rates of those messages should have a two to one ratio. And if you deploy a new version of the checkout, and you’re pushing 10% of your traffic through it to check that it’s okay, well that might have passed all of your unit tests, and it might pass all of your integration tests or whatever. But there’s some issue in production that, because that always happens, you just couldn’t predict. You will see the ratio of that flow rate change because it’s failing. 10% of your traffic is failing. So it’s not just about understanding who’s talking to who, it’s also understanding if the business intent of the system is actually being met.
That’s a super powerful technique because you can do that in an automated way, right? As part of your CI/CD pipeline, you can say, “Oh, wow, we just deployed this thing and the ratios are off. Revert.” And then have an automated revert, and then check it out afterwards. Doing stuff like that is super powerful.
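The flow-rate comparison can be sketched in a few lines of Python. This is a minimal illustration of the idea, not a description of any real pipeline: the function names, the 5% tolerance, and the message counts are assumptions, and the `revert` decision stands in for whatever rollback step a CI/CD system would actually run.

```python
def flow_ratio_ok(
    trigger_count: int,
    downstream_count: int,
    expected_ratio: float = 2.0,  # one checkout should produce two messages
    tolerance: float = 0.05,      # assumed allowance for normal jitter
) -> bool:
    """Check that downstream messages per trigger stay near the expected ratio."""
    if trigger_count == 0:
        return True  # no traffic in the window, nothing to compare
    observed = downstream_count / trigger_count
    return abs(observed - expected_ratio) <= expected_ratio * tolerance


def deploy_decision(trigger_count: int, downstream_count: int) -> str:
    """Return 'keep' or 'revert' for the deploy that produced these counts."""
    return "keep" if flow_ratio_ok(trigger_count, downstream_count) else "revert"


# Healthy window: 1,000 checkouts produce 2,000 warehouse and invoicing messages.
healthy = deploy_decision(1000, 2000)

# Canary window: 10% of checkouts fail silently, so only 900 emit both messages.
canary = deploy_decision(1000, 900 * 2)
```

In the canary window the observed ratio falls to 1.8, outside the assumed tolerance, so the sketch returns `revert` even though no individual request raised an alert.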
Sean: I agree. And I think on top of that then, when you discover that in the moment— and you can discover it quickly and revert back—once you’ve reverted back to the state that you were in, you’re going to want to go in and follow these complex graphs of paths, and zero in on them really specifically to figure out, “Now what was going wrong that we didn’t anticipate? What system had more load on it than we expected and why? What is the exact number of additional requests that are spawned off from this request that I was executing, that I didn’t pick up on in my local dev environment?”
I guess what I’m saying is that, in a microservices world, you can pretty much guarantee you’re going to be in a moment where there’s going to be a tweet, or there’s going to be an alert, or there’s going to be a call from your support team, saying that some customer’s unhappy, and you’re going to mitigate the situation, and then you’re going to want to go back and understand exactly what happened, from beginning to end, for all the requests that were happening in my system. And you need enough realistic, exact, end-to-end examples that you can really understand the flow of work through your system.
The cultural impact of microservices
Fredric: What is the cultural impact of microservices? I know, Richard, you have some thoughts about how it affects the concept of remote working.
Richard: Yeah, it’s a huge enabler. The world is moving to a place where remote working is much more accepted now. Also, if you look at the diversity problem in our industry, remote or flexible working goes a long way to addressing that sort of stuff. But remote working isn’t just about using Slack, or that sort of stuff, right? Because they’re just tools that enable you to express the culture. I deeply believe that your technical architecture has to support that cultural goal, and microservices are a fantastic way to do that because the fundamental physical architecture of the system enables people to work in parallel at high efficiency, where they only have a limited amount of touch points, and maybe they’re just communicating through GitHub issues, or they’re communicating through a message schema. But it still works.
Sean: I think it’s really powerful too, not just how it enables the logistics of being able to have many people contributing to different projects. I love that point you made about diversity. Often, companies that care a lot about diversity are constrained by whatever their geographic location is. I thought that that was a really strong point.
One thing that we think about a lot too is how it unlocks decision making. Microservices, by isolating business logic and building it into more manageable chunks, reduce the need for decision makers higher up the chain who can see across all of those services, and they allow you to empower individual teams that understand their domain well to make decisions and not have to wait for permission to do the right thing for their system or for customers. It really unlocks velocity.
Richard: There’s one little extra thing you need to do culturally to make that really work, which I’ve seen happen in a couple of clients. You’ve got to have an acceptable error rate.
So in a lot of organizations, there’s sort of this grand delusion, where it’s like, errors are totally unacceptable, right? We want zero bugs. But in reality, every single organization, every single business process, every single system, has some level of errors: cosmic rays or whatever, or somebody just pressed the wrong button. If you say to your teams, “You know what? You can fail on 2% of transactions,” you get incredible velocity, and maybe the business cost is worth it if you can reach your market sooner. That’s a business decision you’ve got to take to a fairly high level, but I’ve seen it work.
Fredric: The acceptable error rate for the New Relic Modern Software Podcast is some undetermined percentage, but we’ll keep it as low as we can.
Note: The intro music for the Modern Software Podcast is courtesy of Audionautix.