Building next-generation applications with event-driven architecture

Eric Johnson: Good morning. My name is Eric Johnson. I'm a principal developer advocate for AWS. I've been a developer for 30 years and a solutions architect for over 10 years. I'm a big fan of infrastructure as code and automation. I'm also a father of five children.

Before we get started, let me share a few rules:

Rule 1: When I hold up a number with my hands, the number is contextual.

Rule 2: These are quotes, not apostrophes.

Rule 3: These are thumbs, not something else.

I was born this way and I'm very comfortable with it. However, if it makes you uncomfortable, I'm also comfortable with that.

Today we're going to talk about:

  • Coupling and event-driven architecture
  • Integration patterns
  • What is an event
  • Event-driven architectures
  • Reducing event duplication with idempotency

Let me define event-driven architecture (EDA) simply: something happens and we respond. In EDA, we broadcast events instead of making API calls.

Coupling is a measure of dependency between connected systems. In EDA, we want loose coupling, not tight coupling. Decoupling isn't free, though: it has costs in design and at runtime.

Some examples of coupling:

  • Relying on one technology
  • Being locked into location, IP addresses, DNS
  • Tied to specific data formats, types, semantics

The appropriate level of coupling depends on how much control you have over the endpoint.

Let's look at some common integration patterns and possible improvements:

Synchronous request-response:

Advantages:

  • Low latency
  • Simple
  • Fails fast

Disadvantages:

  • Fragile if receiver goes down
  • Can get throttled at scale

An improvement is asynchronous queuing:

  • Decreases temporal coupling
  • Resilient to receiver failure
  • Receiver controls consumption rate


Now, think about that for a moment. You might be sitting here thinking, those are great words. But think about how many of you work on an old system where the boss says you need to keep it up and running. It can only handle three concurrent connections, but we're getting connections at the rate of 2 billion, and it's your job to keep it up. I see some of you nodding: yeah, that's me. So you have a choice. You could let the first three through and the other 299... I can't even do that math. I can't get past four; we talked about that, right? So the rest of them fail. Instead, if I put a queue in there, the requests build up in the queue and I process them as I have bandwidth. This pattern is very powerful.
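
To make the buffering idea concrete, here is a minimal Python sketch, assuming a hypothetical legacy receiver that can handle at most three requests per processing tick. A `deque` stands in for the queue; this is an illustration of the pattern, not SQS.

```python
from collections import deque

# Hypothetical limit of the legacy receiver: three requests per tick.
MAX_CONCURRENT = 3

def drain(queue: deque, ticks: int) -> list[list[str]]:
    """Take up to MAX_CONCURRENT requests per tick; everything else waits in the queue."""
    batches = []
    for _ in range(ticks):
        batch = [queue.popleft() for _ in range(min(MAX_CONCURRENT, len(queue)))]
        if not batch:
            break  # queue is empty
        batches.append(batch)
    return batches

# A burst of 10 requests arrives at once; none are rejected, they just wait.
requests = deque(f"req-{i}" for i in range(10))
batches = drain(requests, ticks=10)
print([len(b) for b in batches])  # [3, 3, 3, 1]
```

The receiver sets the pace; the queue absorbs the burst instead of failing the excess requests.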

The other nice thing is this: let's say you get a bad actor in your data. The queue can retry the message, and if it keeps failing, I can kick it out to a DLQ, which stands for dead-letter queue. That lets me set it aside and keep working, but I didn't lose the data. The data is still there, and I can evaluate it and see what's going on.
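
A minimal sketch of retry-then-dead-letter behavior, assuming a hypothetical `handler` that raises `ValueError` on bad data and a limit of three attempts. In SQS itself, this limit is the `maxReceiveCount` in the queue's redrive policy; the loop below only imitates that behavior in memory.

```python
from collections import deque

MAX_RECEIVE_COUNT = 3  # assumption: move a message to the DLQ after 3 failed attempts

def process_with_dlq(messages, handler):
    """Retry each message up to MAX_RECEIVE_COUNT times, then park it in a dead-letter queue."""
    queue = deque((m, 0) for m in messages)  # (message, times received so far)
    processed, dlq = [], []
    while queue:
        message, receives = queue.popleft()
        receives += 1
        try:
            handler(message)
            processed.append(message)
        except ValueError:
            if receives >= MAX_RECEIVE_COUNT:
                dlq.append(message)  # set aside for inspection, not lost
            else:
                queue.append((message, receives))  # make it available again later

    return processed, dlq

def handler(message):
    # Hypothetical validation: a message without an amount is "bad data".
    if message.get("amount") is None:
        raise ValueError("bad data")

processed, dlq = process_with_dlq(
    [{"id": 1, "amount": 5}, {"id": 2}, {"id": 3, "amount": 7}], handler
)
print([m["id"] for m in processed], [m["id"] for m in dlq])  # [1, 3] [2]
```

The good messages keep flowing while the bad one is retried and finally parked, with its data intact.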

The other cool thing here is that only one receiver can consume each message, but you can have more than one receiver. In a pattern we'll show, we have a queue and a Lambda function consuming the queue; the functions scale up, handle the queue, and then go away. The queue manages this: once a receiver picks up a message, the queue hides it from everybody else so it can't be processed twice. I'll get more into this later. Now, it's not perfect. What we talk about is at-least-once delivery; sometimes it's more-than-once delivery. So we have to write applications that are idempotent, and that's why we're going to talk about idempotency later today.

The disadvantages are response correlation and backlog recovery time. "Disadvantage" may not be the right word, though. Say all your receivers go down, the world blows up, and you get the call, but you haven't lost your data. It's all in the queue. There's some correlation work to do, and we have to get things back up and running. The same goes for backlog recovery time. Then you also have fairness in multi-tenant systems. This is easy to handle, because you can build, say, a high-priority queue for the items you want processed first. Some of that logic can be built in, but it needs to be part of your plan. At Amazon, we have a service for this called Amazon SQS. I love this; look at point number two. This is my favorite point, because I don't have to memorize the number. How much can this handle? What does it scale to? Basically infinity. This handles a ton of data. The same goes for Amazon MQ, which is a great way to run ActiveMQ or RabbitMQ: if you already use those, you can run them as a managed service. Both of these can trigger Lambda functions and other targets.
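
One way to sketch the high-priority-queue idea, assuming a hypothetical consumer that always drains the high-priority queue before the normal one. Two `deque`s stand in for two separate queues; the tenant names are made up.

```python
from collections import deque

def next_message(high: deque, normal: deque):
    """Hypothetical consumer policy: serve the high-priority queue first, then the normal one."""
    if high:
        return high.popleft()
    if normal:
        return normal.popleft()
    return None

high = deque(["tenant-a:urgent"])
normal = deque(["tenant-b:report-1", "tenant-b:report-2"])
order = []
while (msg := next_message(high, normal)) is not None:
    order.append(msg)
    if msg == "tenant-b:report-1":
        # An urgent message that arrives mid-stream still jumps ahead of the backlog.
        high.append("tenant-a:urgent-2")
print(order)
```

Strict priority like this can starve the normal queue under sustained load, so a weighted split between the queues is a common alternative; which one fits depends on your tenants.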

The next one is the broadcast model. It's the same idea, but instead of a queue, we're going to use SNS to broadcast to different subscribers. In this pattern, there's one topic that I publish to. I publish a message, I get an acknowledgment ("yes, I got it"), and then it's sent to every subscriber. The fun thing is that we have subscribers one through five, and these can be different types of subscribers. This decreases temporal coupling again, basically the same advantage as before, and each subscriber is notified. Look at the types of subscribers you can have: email, SMS, HTTP endpoints, or a Lambda function. So if you have one topic and you need to notify people, there are all kinds of scenarios where SNS is a great fit. It's a fully managed pub/sub service for A2A (application-to-application) and A2P (application-to-person) messaging. Each topic supports up to 12.5 million subscriptions. You're probably not going to build this on your own, and why would you? You don't want to have to handle that scale, and this is where AWS shines: scalability.
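
A minimal in-memory sketch of the fan-out idea. The `Topic` class is a hypothetical stand-in, not the SNS API, and unlike SNS it delivers synchronously before returning the acknowledgment.

```python
from typing import Callable

class Topic:
    """Toy pub/sub topic: one publish, every subscriber gets a copy."""

    def __init__(self) -> None:
        self._subscribers: list[Callable[[dict], None]] = []

    def subscribe(self, handler: Callable[[dict], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, message: dict) -> str:
        # The publisher never learns who the subscribers are.
        for handler in self._subscribers:
            handler(dict(message))  # each subscriber gets its own copy
        return "ack"

# Two hypothetical subscribers, e.g. an email sender and a Lambda-style handler.
received = {"email": [], "lambda": []}
topic = Topic()
topic.subscribe(lambda m: received["email"].append(m))
topic.subscribe(lambda m: received["lambda"].append(m))
ack = topic.publish({"channel": "new"})
```

Adding a sixth subscriber is one more `subscribe` call; the publishing code does not change.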

The next one is the asynchronous point-to-point router model. We see this a lot. Say we have a sender that has to send different bits of data to different places. The sender has to hold the logic: "I have stuff that goes to the pink channel, so I'll tag it pink and call this API; I have stuff that goes to the purple channel, so I'll call that API." I have to watch my colors. I change the colors sometimes when I do this, and then I'm like, why did I change the colors?

The disadvantage is that this increases location coupling. Now you're saying I always have to call API one for the purple channel and the other one for the pink channel. I need to know where those are, and I'm always dependent on them: if they go down, we're in trouble. The other thing is that the sender maintains the routing logic. If you have a bunch of clients, you have to tell all of them: when you go pink, you go here; when you go purple, you go there; oh, we've added another color, you go here now. There's a lot of communication that has to happen, and metadata has to exist so the sender can build that logic. What we want is to get that out of the sender's hands, because as things get more complex, you end up with a bunch of if-thens and switches, your code gets long, and you're writing routing code. That's a no-no. In our Lambda functions, we shouldn't be writing code to route data; we should be writing code to transform data.

The answer is to use a bus. You've probably heard us talking about EventBridge. The idea is that it gives you one endpoint to send everything to: just keep pumping your events in. If we have multiple senders, they all put their events on the bus and we handle it. The magic is in the rules, and I'll expand on this in a minute. EventBridge looks at each event, and the rules say: if you match this particular description, invoke this receiver. If you're pink, invoke this receiver; if you're purple, invoke that one. This reduces location coupling, and it's efficient for both sender and receiver: the routing logic is no longer in the sender, it's in the rules on the bus. Even the receivers don't need to know about it, because we just point the rule at them. So if you're writing routing logic in your sender, rethink that.
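
A toy event bus makes the shift visible: the sender only calls `put_event`, and rules decide which targets run. `EventBus`, `add_rule`, and `put_event` are hypothetical names for this sketch, not the EventBridge SDK.

```python
from typing import Callable

# A rule pairs a predicate ("does this event match?") with a target to invoke.
Rule = tuple[Callable[[dict], bool], Callable[[dict], None]]

class EventBus:
    """Toy bus: senders put events; rules route them to targets."""

    def __init__(self) -> None:
        self.rules: list[Rule] = []

    def add_rule(self, matches: Callable[[dict], bool], target: Callable[[dict], None]) -> None:
        self.rules.append((matches, target))

    def put_event(self, event: dict) -> int:
        """Invoke every target whose rule matches; return how many were invoked."""
        invoked = 0
        for matches, target in self.rules:
            if matches(event):
                target(event)
                invoked += 1
        return invoked

pink, purple = [], []
bus = EventBus()
bus.add_rule(lambda e: e.get("color") == "pink", pink.append)
bus.add_rule(lambda e: e.get("color") == "purple", purple.append)

# The sender holds no routing logic; it just puts the event on the bus.
bus.put_event({"color": "pink", "data": "hello"})
```

Adding a new color means adding a rule on the bus, not shipping new logic to every sender.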

Amazon EventBridge: anybody using EventBridge now? All right, good. It's a super powerful product, and we use it constantly. If you've been to Serverless Espresso (go over there and get some coffee), it uses EventBridge. Serverless Video uses it too. Everything we do uses it; this is how we communicate between our domains.

So let me talk about how this works. You have all kinds of sources: AWS services and custom applications. If you have an AWS account right now, you're already using EventBridge, because you have a default bus. You have SaaS applications, like the brand-new Stripe integration: Stripe can send events through EventBridge, and so can Adobe. Those are two new ones we added just for re:Invent. And you have microservices, your own custom services or applications.

These events go into EventBridge. If you want, we can read them and build a schema registry, so your developers can grab those objects and know what your schema should look like. You can have different buses, such as partner event buses, plus archives, replays, and things like that. Then we apply the rules, and this is where the magic happens. Thousands upon thousands of events can flow through the bus, but you may only want one or two types of them to invoke a target. That target can be a Lambda function, an ECS container on Fargate, or a lot of other things. You can also have multiple rules triggered by the same event: as an event flows through, it can trigger four or five different rules. So you get this pattern of picking what you need and ignoring what you don't. Then there are the targets: AWS Lambda, Amazon ECS, API destinations, all kinds of things. There's a ton of targets.

Let me show you what an event looks like. If you're a developer, this is no big surprise: it's JSON. An event coming through EventBridge has a source, which we can customize, a detail-type, and so on, and then your data.

Now, a rule looks like this: if your source is com.flix and your region is AU or NZ, then it's a match. Look here: com.flix and AU, so it's a match, and I invoke the Lambda function. What you're seeing is hundreds of lines of code being removed from your code base, because the routing happens where it should. This is event-driven architecture.
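
The rule above can be sketched as a simplified matcher, assuming the region value lives under `detail` and that every pattern value is a list of allowed exact values. The event's `detail-type` and `videoId` are hypothetical. Real EventBridge patterns also support prefix, numeric, and anything-but matching, among others, so this only illustrates the idea; it is not the service's matching engine.

```python
def matches(pattern: dict, event: dict) -> bool:
    """Simplified EventBridge-style match: nested dicts recurse, lists hold allowed values."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True

# Hypothetical event shaped like an EventBridge event.
event = {
    "source": "com.flix",
    "detail-type": "VideoPublished",
    "detail": {"region": "AU", "videoId": "v-123"},
}

# Rule: source is com.flix AND region is AU or NZ.
pattern = {"source": ["com.flix"], "detail": {"region": ["AU", "NZ"]}}

print(matches(pattern, event))  # True
```

Fields the pattern doesn't mention (`detail-type`, `videoId`) are ignored, which is what lets many rules pick different slices of the same event stream.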

So you have over 30 services you can target, including API destinations, which let you send events to a lot of different things. Am I talking too fast? OK, I'll slow down a little bit. I want to be respectful of that. I know, I'm flying.

I'm probably talking too loud, too. My wife would say, yeah, you're talking too loud. Anyway, that's how we do these patterns, and that's what they look like.

So what is an event? I just showed you what an event looks like, but what does it mean? An event is a signal that a system's state has changed. Something happened, and we react. An event is immutable: you shouldn't change it, because it's a record of what happened. Events occur in the past and cannot be changed, and they decrease coupling by restricting the information they carry to key data.
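
Immutability can be enforced in code, not just by convention. A minimal sketch using a frozen dataclass; the `ChannelCreated` event and its fields are hypothetical.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import datetime, timezone

@dataclass(frozen=True)
class ChannelCreated:
    """Hypothetical event: a record of something that already happened."""
    channel_id: str
    occurred_at: datetime

def is_immutable(evt: ChannelCreated) -> bool:
    """Try to rewrite history; a frozen dataclass refuses."""
    try:
        evt.channel_id = "changed"
    except FrozenInstanceError:
        return True
    return False

event = ChannelCreated("ch-1", datetime(2023, 11, 27, tzinfo=timezone.utc))
print(is_immutable(event))  # True
```

If the state changes again, the system emits a new event rather than editing the old one.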

There are a couple of things to think about when you plan to send these events from system to system. Plan ahead; don't just load everything in and off we go. Let's talk about what sparse events versus full-state events look like.

A sparse event is on the left, and a full-state event, with a lot of extra data, is on the right. Say we send a sparse event through to our consumers. The example we're using is the Serverless Video system we built; these are all real services you'll see here.

I send that through, and my publisher service says: hang on, what's the name of the channel? My video manager service asks: what are the channel tags? I need that information. My streaming service asks: what region is it in? That's important to know. So we go, OK, let's load all of that in and send an event that has all that information. That's OK, but here are some considerations.
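
The two shapes can be sketched side by side. The field names (`channelId`, `channelName`, `tags`, `region`) are hypothetical stand-ins for what each downstream service asked for; they are not the actual Serverless Video schema.

```python
def sparse_event(channel_id: str) -> dict:
    # Only the key data; consumers must look up anything else they need.
    return {"detail-type": "ChannelCreated", "detail": {"channelId": channel_id}}

def full_state_event(channel: dict) -> dict:
    # Carries what downstream services asked for, so they don't have to call back.
    return {
        "detail-type": "ChannelCreated",
        "detail": {
            "channelId": channel["id"],
            "channelName": channel["name"],  # publisher service
            "tags": channel["tags"],         # video manager service
            "region": channel["region"],     # streaming service
        },
    }

channel = {"id": "ch-1", "name": "Cooking", "tags": ["food"], "region": "us-east-1"}
print(sparse_event("ch-1"))
print(full_state_event(channel))
```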

First of all, event schemas should be backward compatible. Let's talk that scenario through for a moment. I see some heads shaking; you might have been burned by this before. Here's a perfect example that happened to us. We're known for sometimes using camel case, sometimes snake case, sometimes whatever. We had an event with a snake-case field, order_id, and that was being picked up by services. Then we added another service that expected camel case: orderId, with an uppercase I. OK, let's just change that: we stop using snake case and go with orderId. Well, now our first service is broken. So instead, we had to add both. Every time you populate, change, or modify the event, you have to set both fields. That's not a huge deal until you're doing it for hundreds of fields. So you want your schemas to be backward compatible.

The other consideration is that the cost to calculate values can increase over time. Maybe we're talking dollars, but you're also talking time and developer frustration, and that's a cost too. These are things you definitely want to think about when you're building a full-state event. Another option is a sparse event with a pointer to the fuller data, for example in DynamoDB or S3. There are all kinds of options. But then again, do you want to spend the time fetching that data each time? Like all computer questions, any time you ask a developer how to do something, the first thing they say is "it depends." "It depends" is our best answer, and it depends on how you're approaching it in this case as well.
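
A minimal sketch of both ideas, assuming a hypothetical `OrderPlaced` event. The producer writes both spellings of the order ID so older snake-case consumers keep working, a tolerant consumer accepts either, and the sparse variant carries a pointer to the full record. The bucket name and S3 key layout are hypothetical.

```python
def order_event(order_id: str) -> dict:
    detail = {
        "order_id": order_id,  # original snake_case field, still read by older consumers
        "orderId": order_id,   # camelCase field added for newer consumers
    }
    return {"detail-type": "OrderPlaced", "detail": detail}

def read_order_id(detail: dict) -> str:
    # A tolerant consumer prefers the new name but falls back to the old one.
    return detail.get("orderId", detail.get("order_id"))

def sparse_order_event(order_id: str, bucket: str) -> dict:
    # Sparse event with a pointer to the full record; consumers fetch it only if needed.
    return {
        "detail-type": "OrderPlaced",
        "detail": {"orderId": order_id, "fullRecord": f"s3://{bucket}/orders/{order_id}.json"},
    }

print(order_event("o-1"))
print(sparse_order_event("o-4", "my-bucket"))
```

Dual fields cost a little duplication per event; the pointer costs a fetch per consumer. Which one is cheaper depends on how many consumers need the extra data.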

All right, let's jump into event-driven architecture. We've talked a lot about events and how they work; now let's see what an event-driven architecture looks like by applying these use cases. To do that, like I said, we're going to talk about Serverless Video. If you saw me earlier, I was running around up front recording videos. You all were great, doing thumbs up and one finger up; it was awesome. It's posted there, so you can go see that video, and it will be on the resource page as well.

When we build event-driven architectures, we choreograph events between domains. When you break down an application, it shouldn't be one lump, because often many teams are working on it. So you have domains: one takes care of reporting, one of orders, one of fulfillment, one of video processing, and so on. But somehow these domains need to communicate, and choreography between them looks like this. Here's a non-event-driven version, which is often how we build applications. We have a channel service, a video streaming service, a video manager service, and a publisher service, and a new channel comes in.

What we used to do is make the channel service responsible for notifying each service: "Hey, I got a new channel." "Hey, I got a new channel." "Hey, I got a new channel." As we add more and more services, this doesn't make sense. We don't want the channel service to have to notify everyone, because that puts the cognitive load on that service and its developers to know about every consumer. And as your team gets bigger, unless you're doing it better than we are (which you may be), you probably aren't communicating that well. You're expecting everybody just to know what you're building.

So instead, we can use an asynchronous broadcast model. We set up a topic and tell everybody: if you want to know about new channels, subscribe to that topic; if you don't, ignore it. When a new channel comes in, everybody is notified through their subscription, and the channel service doesn't even know who's receiving these. It doesn't care; it just knows it did its job and tossed an event. People laugh at me, but that's the architecture of EDA in a nutshell: something happens, we react. Do your job and toss an event. That's how we choreograph.

Let's look at the same thing with the router model. That was the topic model, and you may want to do it that way, but you can do it with the bus as well. A new channel comes in, and you put an event onto the bus. This tends to be my preference; I really like EventBridge. I like all the services. (This is being recorded: I like all the services equally. They're like my kids. I have four kids and another kid.)

You put the event on the bus, and as it flows through, it triggers rules, and those rules trigger these services. So you get one, and you get one, and you get one. Again, the idea is that we orchestrate the business process within each domain, and each domain produces one or more events.

Let's say we introduce two more services: the video post-processing service and the plugin manager. Let me go back for a second. We have these domains communicating with each other on the EventBridge bus.

As they're communicating, we say: wait a minute, we've got two more services. How do they communicate? Look at our video post-processing service. This is the actual post-processing workflow; I'm showing you the real behind-the-scenes here. Let me take a sidebar for a moment: in this workflow, we choose which compute to use. You see my shirt says "Better Together." What we're telling people is that sometimes Lambda works and sometimes EC2 is a better choice, depending on the process. Here, we evaluate a video when it comes in and go, hmm, that's pretty long; let's run it on EC2, sorry, on an ECS task on Fargate.

Now I have a container running on Fargate, and it only runs for as long as it takes to process that video. If it's a shorter video, it makes more sense to run it on a Lambda function. So, honestly, we're building these hybrid orchestrations. The other thing about this orchestration in Step Functions is that we use direct integrations where possible. We talk directly to DynamoDB and get items, and then (dramatic pause for effect, and a drink) we put out an event.
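
A sketch of what such a hybrid workflow could look like in Amazon States Language, built as a Python dict. Everything here is an assumption for illustration: the input is assumed to carry `videoId` and `durationSeconds`, the 600-second cut-off and all resource names are hypothetical, and this is not the actual Serverless Video state machine. The service-integration ARNs (`ecs:runTask.sync`, `lambda:invoke`, `dynamodb:putItem`, `events:putEvents`) are the standard Step Functions ones.

```python
import json

LONG_VIDEO_SECONDS = 600  # hypothetical cut-off; the talk gives no number

definition = {
    "Comment": "Sketch: pick compute by video length, save the result, emit an event",
    "StartAt": "ChooseCompute",
    "States": {
        "ChooseCompute": {
            "Type": "Choice",
            "Choices": [{
                "Variable": "$.durationSeconds",
                "NumericGreaterThan": LONG_VIDEO_SECONDS,
                "Next": "ProcessOnFargate",
            }],
            "Default": "ProcessOnLambda",
        },
        "ProcessOnFargate": {
            "Type": "Task",
            # .sync waits for the container to finish. A real Fargate task also
            # needs NetworkConfiguration (subnets, security groups), omitted here.
            "Resource": "arn:aws:states:::ecs:runTask.sync",
            "Parameters": {
                "LaunchType": "FARGATE",
                "Cluster": "video-cluster",          # hypothetical
                "TaskDefinition": "post-processing",  # hypothetical
            },
            "ResultPath": "$.processing",  # keep videoId in the state data
            "Next": "SaveResult",
        },
        "ProcessOnLambda": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {"FunctionName": "post-processing-fn", "Payload.$": "$"},  # hypothetical name
            "ResultPath": "$.processing",
            "Next": "SaveResult",
        },
        "SaveResult": {
            "Type": "Task",
            # Direct DynamoDB integration: no Lambda function in between.
            "Resource": "arn:aws:states:::dynamodb:putItem",
            "Parameters": {
                "TableName": "Videos",  # hypothetical
                "Item": {"PK": {"S.$": "$.videoId"}, "status": {"S": "PROCESSED"}},
            },
            "ResultPath": None,  # serialized as null: pass the input through unchanged
            "Next": "EmitEvent",
        },
        "EmitEvent": {
            "Type": "Task",
            # Direct EventBridge integration: announce that the work is done.
            "Resource": "arn:aws:states:::events:putEvents",
            "Parameters": {"Entries": [{
                "EventBusName": "default",
                "Source": "video.postprocessing",  # hypothetical
                "DetailType": "VideoProcessed",    # hypothetical
                "Detail": {"videoId.$": "$.videoId"},
            }]},
            "End": True,
        },
    },
}

asl_json = json.dumps(definition, indent=2)  # the ASL document you would deploy
```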

Again, think about this process: we do something, and maybe nobody reads this event right now, but it says, "Hey, this is done. It's out on the bus in case you want it." We do a thing, we produce an event. The same goes for our plugin manager, which is a really cool system. Ben Smith wrote it with Step Functions; it runs different plugins depending on where they need to run. I'd love to spend a lot of time on this, but I can't. At each step, he makes sure the plugins run in order, and he's using the Distributed Map state in Step Functions to do that. He's doing a lot of things, but when he's all done, he produces an event.

So this is how it works: we orchestrate inside the domains, and we choreograph between the domains with EventBridge. The orchestration is all Step Functions. Anybody using Step Functions now? OK. Super powerful product. I'm going to be honest: that's my go-to product, and that's where I start, because almost always I have to do more than one thing. If I'm doing one thing, I tend to use a direct integration, like API Gateway straight to DynamoDB. But if I have to do multiple things, and some of it is business logic, I'll have my state machine invoke my Lambda function and get the data back. That way I don't have to write orchestration code. You can see here direct integrations with over 200 services and more than 10,000 APIs, including Bedrock. We just announced that Step Functions integrates directly with Bedrock, so if you're doing AI or generative AI, that'll work.

Again, I know I've said this a lot, but I want to say it again: choreography and orchestration together help you build these distributed apps. At AWS, we know distributed computing, and we know distributed apps are hard. Building in redundancy and resiliency is hard. So we've said we're going to make it as easy as we can for you: you choreograph between the services and orchestrate within them, and this works.

Let's apply this again to Serverless Video. I'm going to show you some of the workings; I just want to show you the architecture. You can see the different services and our domains: the channel service, the video streaming service, the video manager service, the publisher service, and there's a plugin service as well. They're all talking with events, and EventBridge is the core of the whole system. We do something, we produce an event, and other services react to it. The really cool thing is what happens when someone comes along with a new idea, which happened to us. Last year, when we did Serverless Espresso, two days before re:Invent someone came along and said, you know what you should do?

You should have a history of how everything went through the service. Well, it wasn't hard to build, because all the events were already there. We just added another service that collected those events and ordered them, and we were done. It's a modular way of building applications.
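
A minimal sketch of such a history service, assuming it receives every event (for example through a catch-all rule) and that each event carries an ISO 8601 `time` field and a hypothetical `videoId` in `detail`. The sample events are made up.

```python
from datetime import datetime

def build_history(events: list[dict], video_id: str) -> list[str]:
    """Hypothetical history service: keep one video's events and order them by time."""
    mine = [e for e in events if e["detail"].get("videoId") == video_id]
    # Replace the trailing "Z" so fromisoformat also works on Python versions before 3.11.
    mine.sort(key=lambda e: datetime.fromisoformat(e["time"].replace("Z", "+00:00")))
    return [e["detail-type"] for e in mine]

# Events arrive out of order and mixed across videos.
events = [
    {"time": "2023-11-27T10:05:00Z", "detail-type": "VideoPublished", "detail": {"videoId": "v1"}},
    {"time": "2023-11-27T10:00:00Z", "detail-type": "VideoUploaded", "detail": {"videoId": "v1"}},
    {"time": "2023-11-27T10:02:00Z", "detail-type": "VideoProcessed", "detail": {"videoId": "v1"}},
    {"time": "2023-11-27T10:01:00Z", "detail-type": "VideoUploaded", "detail": {"videoId": "v2"}},
]
print(build_history(events, "v1"))  # ['VideoUploaded', 'VideoProcessed', 'VideoPublished']
```

None of the existing services changed; the new one only reads events that were already on the bus.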

The last thing we're going to talk about takes the longest, and we've got some time: idempotency. I'm going to tell you a story. Just before the pandemic, I took a trip to speak in Turkey. I was super excited; I'd never been to Turkey. I get off the plane at a brand-new airport, and it's gorgeous. I'm standing in line at customs with about 20 people in front of me, and there are something like 50 customs booths. I'm waiting and waiting, I get to the front of the line, and I hand the agent my passport.

Now, I don't know how many of you travel, but this is a red flag. She took the passport, scanned it, and went "hmm." I stepped around the glass and said, "Is there a problem?" She said, "Get behind the glass." So I stood there for 20 minutes. She wouldn't tell me what was going on; she was on the phone. The next thing I know, two armed guards walk out. I'm not talking a gun on the side; I'm talking full body armor and AK-99s or whatever. I don't know guns. They come and stand on either side of me, and it's literally like, "How are you doing?" I said, "Can I just get back on the plane?" "No, sir, you cannot." They closed the line behind me, and everybody else went off.

They escorted me to a room with no windows and sat me down. Nobody would tell me what was wrong; they just said there was a problem. At this point they had every ID I had on me. I kid you not, they kept asking me, "Do you have ID with your parents' names on it?" We don't do that here in the US, but they do over there. I said I didn't. So they had my mom's Facebook page up, trying to find identification, trying to find links. Finally, after about an hour of sitting there, I said, "Tell me what's going on." He said, "All right, here's the deal. Someone with your exact same first, middle, and last name is flagged and not allowed to cross the border." And I'm like, "I'm not him." They said, "OK, you can go."

No, that's not what happened. Here's what I said: "Let me ask you a question. Do you have prints on this guy?" You see where I'm going? He said, very haughtily, "Of course we have prints on him." I said, "If you've got more than two, I'm not your guy." And then it was, "Yeah, you can go, Mr. Johnson." You see, they had duplicates in the system. They had me and somebody else duplicated, and I almost went to jail for it. They were not dealing with idempotency; I'm sorry, they were not building idempotent applications.

So I'm going to give you some tips on how to guard against this. We deal with duplicates in systems all the time. That's just the way things go, especially with distributed applications, because a lot of services do at-least-once delivery. So we have to build our applications to ask: is this the same message or not? I'm not going to tell you this means you'll never have to dedupe your database; there are things you can do there as well, but that's for another time. We're going to talk about how to build idempotent applications that keep duplicates from being written to the database in the first place.

First, let's talk about what idempotent means. When I first heard about this, I thought idempotency was a bad thing: oh, there's that potency in my application, get it out! Idempotency is a good thing. Again, it's a hard word; I don't even know why we call it this. But here's the basic idea: an operation that can be applied multiple times without changing the result beyond the first application. That's the mathematical definition: same operation, same result. Going back to Gregor Hohpe and Bobby Woolf, they describe it as a message that has the same effect whether it is received once or multiple times. So you really want to guard against duplicates. But let's get to the real definition, which comes from my mom: "Was my credit card charged twice?" I don't know how many of you have gotten that call: something broke, and yes, I've been charged twice. You don't want to be the company that does that.
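
The difference is easy to show with two hypothetical operations on an order: setting a status is idempotent, adding a charge is not.

```python
def set_status(order: dict) -> dict:
    # Idempotent: applying it twice leaves the same state as applying it once.
    return {**order, "status": "PAID"}

def add_charge(order: dict) -> dict:
    # Not idempotent: every application changes the result.
    return {**order, "charged": order["charged"] + order["amount"]}

order = {"amount": 20, "charged": 0, "status": "NEW"}
print(set_status(set_status(order)) == set_status(order))  # True
print(add_charge(add_charge(order))["charged"])  # 40: the "charged twice" call from Mom
```

A duplicate delivery of `set_status` is harmless; a duplicate delivery of `add_charge` is the bug, which is why non-idempotent operations need extra protection.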

First, where can duplicates occur? When you build your application, there are a lot of places where this can happen. The first is transmission issues. Let's go back to the synchronous side: synchronous requests are not innocent of this.

They have the same problem. Think about how a synchronous message works: you send the message, the receiver acknowledges it, and then that acknowledgment times out; it never gets back to the sender. So the sender has a timeout and, hopefully, code that says, "I never got a response, so I'll try again." We send the message again, and this time the acknowledgment works. The problem is that we may now have a duplicate. As the client, we don't know whether the receiver even got the first request, because we never got an acknowledgment. It could have missed it, in which case there's no duplicate, or it could have gotten it and failed to respond, in which case there is. You don't know.
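
A small simulation of that failure, assuming a hypothetical receiver that performs its side effect and then loses the first acknowledgment. The sender's retry is reasonable on its own, yet it produces a duplicate charge.

```python
class FlakyReceiver:
    """Hypothetical receiver: processes every request, but the first acknowledgment is lost."""

    def __init__(self) -> None:
        self.charges: list[str] = []
        self._drop_next_ack = True

    def handle(self, order_id: str) -> str:
        self.charges.append(order_id)  # the side effect happens before the ack is sent
        if self._drop_next_ack:
            self._drop_next_ack = False
            raise TimeoutError("acknowledgment never reached the sender")
        return "ok"

def send_with_retry(receiver: FlakyReceiver, order_id: str, attempts: int = 3) -> str:
    for _ in range(attempts):
        try:
            return receiver.handle(order_id)
        except TimeoutError:
            continue  # the sender can't tell whether the first request was processed
    raise RuntimeError("gave up after retries")

receiver = FlakyReceiver()
result = send_with_retry(receiver, "order-42")
print(result, receiver.charges)  # ok ['order-42', 'order-42']
```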

The second place this can happen is receiver issues. The example I'm going to use is an SQS queue, so let me tell you how it works. When a sender sends a message to an SQS queue, we automatically store multiple copies of it for redundancy; we handle that for you. Then a receiver picks it up, whether we invoke a Lambda function through the Lambda poller or however you're doing it. When a receiver grabs a message off the queue, SQS hides that message, all the copies, so nobody else can grab it. It doesn't remove it; it just hides it. That's the visibility timeout value you've probably seen. It's on the developer exam, which I passed, but I didn't get that question right. Now I know, so I could do that exam now.

Now, if that receiver fails, you've got a problem. After a while, the visibility timeout expires and SQS says, let's show that message again, because nobody has cleared it out. Clearing it is the receiver's job: it gets the message, processes it, and then tells SQS everything's good by deleting the message from the queue. So the queue makes the message available again, and another receiver grabs it. Now we're back to the same question: do we have a duplicate? How far did receiver one get? Did it save the data? Did it process it? Did it ever get it? We don't know. That's why this is important, and that's what happens with receiver issues. There's one more place this can occur.
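
The visibility-timeout behavior can be imitated in memory. `ToyQueue` is a hypothetical class that only mirrors the semantics described above (hide on receive, reappear after the timeout, gone only when deleted); it is not the SQS API, and time is an integer passed in by the caller.

```python
import itertools

class ToyQueue:
    """In-memory imitation of visibility-timeout semantics."""

    def __init__(self, visibility_timeout: int) -> None:
        self.visibility_timeout = visibility_timeout
        self._messages: dict[int, dict] = {}
        self._ids = itertools.count()

    def send(self, body: str) -> None:
        self._messages[next(self._ids)] = {"body": body, "visible_at": 0}

    def receive(self, now: int):
        for msg_id, msg in self._messages.items():
            if msg["visible_at"] <= now:
                msg["visible_at"] = now + self.visibility_timeout  # hide it, don't remove it
                return msg_id, msg["body"]
        return None  # nothing visible right now

    def delete(self, msg_id: int) -> None:
        # A successful receiver deletes the message; that is its acknowledgment.
        self._messages.pop(msg_id, None)

q = ToyQueue(visibility_timeout=30)
q.send("charge order-42")
first = q.receive(now=0)    # receiver 1 gets it, then crashes without deleting
hidden = q.receive(now=10)  # None: still hidden from everyone else
second = q.receive(now=31)  # timeout expired: receiver 2 gets the same message
print(first, hidden, second)
```

Receiver 2 cannot tell from the message alone whether receiver 1 already charged the order, which is exactly the duplicate risk.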

Yes, it's true: it can happen at the service. Werner Vogels says, "Everything fails all the time." That's just reality. If you don't know who Werner Vogels is, he's Amazon's CTO. He's a very sharp guy with some great sayings and some great T-shirts. At AWS, we build as if we'll never fail, but we plan knowing we will fail; something will go down. It's computers, it's technology; sometimes it stops working. This is going to shock you: Eric Johnson's code is not the best. My code goes down. My code is the biggest liability you could ever have. It says developer advocate up there because that came from a template; it should say hack developer advocate. I'm an architect, right? So, services go down.

Let's take that example. Our sender sends data to the queue, and again, we keep multiple copies, so we have redundancy. Our first receiver grabs it, but while the receiver is processing it, the service goes down. Like I said, we make copies so that if the service goes down, you still have your data. That's the difference between availability and durability: the service may not be available, but your data is durable because it's still there. So again, we're back to: did receiver one get something? Did it process it? We don't know. Then receiver two picks up the message, possibly through a different endpoint, processes it, and now we possibly have a duplicate. Sorry, I'm just really dry up here, folks. So we've got this possibility of a duplicate.

So how do we handle this? Let's talk about some practices to avoid it. The first and biggest one is idempotency tokens. Anybody doing that now? Good, a couple of you. Anybody familiar with Powertools? I just had breakfast with the Powertools team this morning, and they are phenomenal. If you're not familiar with Powertools, how many of you plan to look at it when I'm done talking? Keep going, keep going. Yeah, it's going to help you, not me; I don't get paid for this. But I really encourage you to check out Powertools, and here's why: the pattern I'm about to show you, Powertools does automatically for you in a Lambda function.

So we have this pattern using idempotency tokens. A few things to understand about idempotency tokens: they uniquely identify messages so that receivers can avoid duplicates, which is exactly the problem we just walked through. Number one, they should be generated by the client, because the client is the one sending the request and needs to know the token. Number two, they are resent by the client on retry. If a client times out and needs to resend, it shouldn't generate a new token; it reuses the same one, because the token identifies the transaction. Number three, they are unique per message: a new transaction or a new message gets a new UUID. And finally, use a dedicated field. I'll explain that in just a second.
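
The four rules map fairly directly onto client code. A minimal Python sketch, assuming a hypothetical `transport` callable that raises `TimeoutError` when no response arrives: the token is created once with the request (rule 1), the same request object is resent on retry (rule 2), each new request gets its own UUID (rule 3), and the token lives in its own field (rule 4).

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Request:
    payload: dict
    # Rules 1 and 3: generated on the client, a fresh UUID for every new request.
    # Rule 4: a dedicated field, separate from any business or transport IDs.
    idempotency_token: str = field(default_factory=lambda: str(uuid.uuid4()))


def send(request: Request, transport: Callable[[Request], str], max_attempts: int = 3) -> str:
    """Rule 2: retry the *same* Request, so every attempt carries the same token."""
    for attempt in range(max_attempts):
        try:
            return transport(request)
        except TimeoutError:
            if attempt == max_attempts - 1:
                raise
    raise ValueError("max_attempts must be at least 1")


# Usage: a hypothetical transport that times out twice, then succeeds.
seen_tokens = []


def flaky_transport(req: Request) -> str:
    seen_tokens.append(req.idempotency_token)
    if len(seen_tokens) < 3:
        raise TimeoutError("no response")
    return "ok"


a = Request({"order": 42})
b = Request({"order": 43})
result = send(a, flaky_transport)
```

All three attempts carry `a`'s token, while `b`, a different message, has its own.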

Here's the pattern we use for an idempotent request with Powertools. First, when you set this up in Powertools, we create a persistence layer. The persistence layer we use is DynamoDB. It could be a lot of different things, but we use DynamoDB. Here's how it works. First, the client invokes the Lambda function with the event, and it sends a token. Next, the Lambda function checks the database: does this token exist? Have we already tried this action? If not, it writes the token to the database and locks the record, marking this transaction as in progress.

Then it runs your handler, whatever code you have; we invoke that handler like normal. Next, it updates that record in DynamoDB with the output of your handler. It's not just a little bit of data; it stores the entire output, the response it's sending back to the client. Finally, it sends that response back to the client. The client gets it, and they're happy.

Now let's look at what happens when it doesn't work. Let's say that response doesn't get to the client. As a good client, we have a timeout, so we retry, and we send the same idempotency token, because this is still the same event. The Lambda function looks up the record in DynamoDB and says, "Oh, we already did that; we have this identifier in the database. Take the response from last time and return it to the client. Don't invoke the handler." Follow me on that one?
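
Here is that flow as a self-contained Python decorator, with a dict standing in for the DynamoDB table. It is a simplified sketch of the pattern, not Powertools' implementation or API: `idempotent_sketch`, `InMemoryStore`, and `AlreadyInProgress` are hypothetical names, and it skips things a real persistence layer needs, such as record expiry and atomic conditional writes. Powertools for AWS Lambda (Python) provides its own `idempotent` decorator configured with a `DynamoDBPersistenceLayer`; its documentation describes how it derives the key from the event.

```python
import functools
import json


class AlreadyInProgress(Exception):
    """Raised when another invocation holds the lock for this token."""


class InMemoryStore:
    """Stand-in for the DynamoDB persistence layer (no expiry, single process only)."""

    def __init__(self) -> None:
        self.records: dict = {}

    def put_if_absent(self, key: str, record: dict) -> bool:
        # DynamoDB would do this atomically with a conditional write.
        if key in self.records:
            return False
        self.records[key] = record
        return True


def idempotent_sketch(store: InMemoryStore, token_field: str = "idempotency_token"):
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(event: dict, context=None):
            key = event[token_field]
            # Check for the token; if it is new, write it and lock the record.
            if not store.put_if_absent(key, {"status": "INPROGRESS"}):
                record = store.records[key]
                if record["status"] == "COMPLETED":
                    # Retry path: return the saved response without invoking the handler.
                    return json.loads(record["response"])
                raise AlreadyInProgress(key)
            try:
                result = handler(event, context)
            except Exception:
                del store.records[key]  # release the lock so a later retry can run
                raise
            # Save the entire handler output against the token, then return it.
            store.records[key] = {"status": "COMPLETED", "response": json.dumps(result)}
            return result

        return wrapper

    return decorator


store = InMemoryStore()
calls = []


@idempotent_sketch(store)
def create_order(event: dict, context=None) -> dict:
    calls.append(event["order"])  # stands in for a real side effect, e.g. a DB write
    return {"order_id": f"ord-{event['order']}", "status": "created"}


first = create_order({"idempotency_token": "tok-1", "order": 42})
retry = create_order({"idempotency_token": "tok-1", "order": 42})  # response was "lost"
print(first == retry, calls)  # True [42]
```

The retry gets the stored response and `create_order` runs only once. A concurrent duplicate that arrives while the first call is still running finds the `INPROGRESS` record and is rejected instead of being processed twice.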

So, just like we defined it, you get the same response no matter how many times you retry. This keeps us from writing new data to the database, because we've already done that. That's how idempotency tokens work in your code, and it's a good way to approach the problem.

I also want to show you how we handle idempotency in our services, so you can see how they behave. The first one is SQS, specifically a FIFO queue. FIFO stands for first in, first out, so this queue can keep order for you. Here, using the AWS CLI, I generate a token with a UUID or something similar, and then I send a message to the queue. I pass my queue URL, I pass that token as the message deduplication ID, and then I pass my message body and some other information. I get a response back, and you'll notice the message ID in the first response starts with 979ce and so on. I can send the same message again with the same token, and I get the exact same message ID back. SQS knows this was a repeat, and that's OK. This is an idempotent service.
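
A rough Python equivalent of that CLI call builds the keyword arguments for boto3's `send_message`. The parameter names (`QueueUrl`, `MessageBody`, `MessageGroupId`, `MessageDeduplicationId`) are the real SQS ones; the queue URL, group ID, and the helper name `fifo_send_params` are made up for illustration. FIFO queues also require a message group ID, and deduplication only covers a five-minute deduplication interval.

```python
import uuid


def fifo_send_params(queue_url: str, body: str, group_id: str, token: str) -> dict:
    """Keyword arguments for boto3's SQS client send_message() on a FIFO queue."""
    if not queue_url.endswith(".fifo"):
        raise ValueError("FIFO queue names end in .fifo")
    return {
        "QueueUrl": queue_url,
        "MessageBody": body,
        "MessageGroupId": group_id,       # required for FIFO; order is kept per group
        "MessageDeduplicationId": token,  # the idempotency token
    }


token = str(uuid.uuid4())  # generate once; reuse the same value on every retry
params = fifo_send_params(
    "https://sqs.us-east-1.amazonaws.com/123456789012/orders.fifo",  # hypothetical queue
    '{"order": 42}',
    group_id="customer-7",  # hypothetical group ID
    token=token,
)

# With AWS credentials configured, the actual call would be:
#   import boto3
#   response = boto3.client("sqs").send_message(**params)
# Repeating it with the same params inside the deduplication interval is what
# returned the same MessageId in the demo.
```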

The next service is Step Functions. We do the same thing: we generate a token. In Step Functions, the field that carries it is the execution name. Different services use different names for this field, and sometimes you're responsible for naming it. So here, again, we generate the token and pass it as the name. When we get the response back, you can see it's in the ARN this time: at the very end is the matching ID, so it's part of the resource name. That makes this an idempotent service; you're not going to create duplicates if you use the name this way. Step Functions always uses a name; if you don't provide one, it creates one for you. But if you're automating this, it's better to think about what you pass as the name.

Now let's look at EventBridge. Again, we pass some data in, but there's no token here. We push an event and get an event ID back. We push the same event again and get a different ID back. It's not idempotent. Now you might be thinking, "They built it all and then went, 'Oh no, we forgot to make it idempotent. Our bad.'" No, here's the issue: when you use EventBridge, you may legitimately send the same event through more than once, and the events may look exactly alike. So each one is always treated as a new event. So how do you make EventBridge idempotent? Let's talk about that. I almost lost the remote right here. I got it.

What we're going to do is use an idempotency token, but we're not going to use the event ID, because that can confuse things, and it may not pass all the way through. Instead, we create our own key. In this example we'll call it an idempotency key, but we can name it anything we want; we can name it Bob. We put it in the metadata, and it gets passed through to the receiving side. There are a couple of patterns here, but one thing you might do is write the key to DynamoDB, used simply as a key-value store, so your receiver can look it up and check whether it has already been processed. There's a lot you can do, but basically the key passes through to whatever target you're working with. When you build it out that way, you're able to make EventBridge an idempotent service.
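
A sketch of that approach in Python, with several assumptions labeled: the `metadata.idempotencyKey` layout inside `Detail` is a hypothetical convention (name it Bob if you like), the bus and source names are made up, and the consumer's in-memory set stands in for the DynamoDB table. `put_events` entries do use `EventBusName`, `Source`, `DetailType`, and a JSON-string `Detail`, and targets receive the parsed detail under a lowercase `detail` key.

```python
import json
import uuid


def make_entry(bus: str, source: str, detail_type: str, data: dict, key: str) -> dict:
    """One entry for boto3's EventBridge client put_events(Entries=[...])."""
    detail = {"metadata": {"idempotencyKey": key}, "data": data}  # hypothetical layout
    return {
        "EventBusName": bus,
        "Source": source,
        "DetailType": detail_type,
        "Detail": json.dumps(detail),  # Detail is sent as a JSON string
    }


def as_delivered(entry: dict) -> dict:
    """Roughly what a target receives; most envelope fields are omitted."""
    return {
        "id": str(uuid.uuid4()),  # EventBridge assigns a new ID to every put
        "source": entry["Source"],
        "detail-type": entry["DetailType"],
        "detail": json.loads(entry["Detail"]),
    }


class Consumer:
    """Target-side handler; the set stands in for a DynamoDB key-value table."""

    def __init__(self) -> None:
        self.seen = set()
        self.processed = []

    def handle(self, event: dict) -> bool:
        key = event["detail"]["metadata"]["idempotencyKey"]
        if key in self.seen:
            return False  # already handled this logical event: skip the work
        self.seen.add(key)
        self.processed.append(event["detail"]["data"])
        return True


key = str(uuid.uuid4())
entry = make_entry("orders-bus", "com.example.orders", "OrderPlaced", {"order": 42}, key)
# boto3.client("events").put_events(Entries=[entry]) twice -> two different event IDs

consumer = Consumer()
results = [consumer.handle(as_delivered(entry)) for _ in range(2)]
print(results, consumer.processed)  # [True, False] [{'order': 42}]
```

Each delivery has a different event `id`, but the key in the detail is the same, so the second copy is skipped. With many concurrent consumers, the check-and-record step should be a single conditional write (for example, a DynamoDB put with `attribute_not_exists`) rather than the two-step set check used here.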

So again, when it comes to idempotency tokens, watch how you pass them through systems. I really encourage you not to just use standard, generic names; we see this all the time. Let's say we set up an application where our sender sends a message to the receiver, using a field called message ID with a value of 123. The receiver processes it and sends back a response with a correlation ID of 123, and the sender says, "OK, that's the one. 123 equals 123. We're good." But distributed applications often have some type of intermediary service. So our first service sends message ID 123. Our second service also uses the field name message ID and changes it to 456. Our receiver gets that, does the work, and sends it back with a correlation ID of 456, and it doesn't match. So be very specific about how you name your parameters, and make sure they're unique. That takes some communication, but you'd be shocked how many times we see this: nothing is working, yet all the data is there; it's just not correlating. So when you're doing this, be really careful about how you line things up.
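
The 123/456 story fits in a few lines of Python. Every name here is hypothetical, including `checkout_request_id` as an example of a dedicated, specific field that the intermediary has no reason to touch.

```python
def intermediary(msg: dict) -> dict:
    """A middle service that reuses the generic 'message_id' field for its own ID."""
    out = dict(msg)
    out["message_id"] = "456"
    return out


def receiver(msg: dict, correlate_on: str) -> dict:
    """Echo back the value of whichever field the team agreed to correlate on."""
    return {"correlation_id": msg[correlate_on]}


def round_trip_matches(correlate_on: str) -> bool:
    sent = {"message_id": "123", "checkout_request_id": "123"}
    reply = receiver(intermediary(sent), correlate_on)
    return reply["correlation_id"] == sent[correlate_on]


print(round_trip_matches("message_id"))           # False: 456 != 123
print(round_trip_matches("checkout_request_id"))  # True
```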

And that's how we approach idempotency at AWS, and how we do it in our services. Again, we have Powertools to help you add idempotency when you're building your applications, and I really challenge you to build it in. As promised, here are a couple of resources before we say goodbye. This is the resource page for this session; it's on Serverless Land. I'll give you a moment to grab that. The next resource is serverlessland.com itself. This is really cool. If you're looking to get started with serverless, or you're asking, "How do I start?", we have this thing called patterns. Anybody seen Serverless Land patterns? Every one of you is going to love me when I'm done here. You go to the patterns page and say, "I need to connect DynamoDB to this," or "I need to connect API Gateway to Lambda," and it shows you the pattern in CDK, AWS SAM, Terraform, those kinds of things. So check that out.

The last one is Serverless Video, where we have the recording; I showed this earlier. And if you want to keep learning serverless, there's all kinds of serverless learning at s12d.com. With that, I'll say thank you, and I'll answer any questions. I really encourage you to fill out the form and let us know how we're doing. Thank you very much.

Tags: aws, Amazon Web Services, technology, artificial intelligence, re:Invent, 2023, generative AI, cloud services