Thank you for joining us today. You're going to learn about resilient architectures at scale, and the way you're going to learn about them is through real-life examples I'm going to share with you today from amazon.com.
My name is Seth. I'm really happy to be here, really glad to see you in this lovely theater, and I'm thrilled to be joined by my co-presenters today, Avinash and Tulip. They'll introduce themselves a little later. I'm a developer architect — I was a developer, then a solutions architect, so I just put the titles together. When I was a solutions architect, I was the reliability lead for AWS Well-Architected, so I've worked with a lot of folks and a lot of customers on resilience challenges. I've had about 12 years total at Amazon, the last four at AWS and the previous eight at amazon.com — and that's where the examples you'll hear today come from: resilient, scalable architectures from the .com side. To get us kicked off — the title says resilient architectures, so what is resilience? Resilience is when your application can withstand, mitigate, or recover from the kinds of faults and load spikes you're going to see in production.
If any of you are running applications in a data center, the cloud, or any kind of production environment, you know it's chaos out there, right? There's always something happening — unusual user patterns, network issues — so you have to build in resilience so that your application remains available. Toward that end, about a month or two ago we released the lifecycle framework for resilience, because resilience is a continuous process. It's not a one-and-done kind of thing; you don't just do it once and you're finished. We made the process map to a software development lifecycle, so in your own software development lifecycle you can make sure resilience is part of it. You can learn more about it later today — there's going to be a breakout session about it, and if you miss that, there's a link there you can read. For our purposes today, we're going to present, as I said, multiple real-life examples of resilient architectures from Amazon, and they fall into three categories on this framework.
The first one is design and implementation — designing for the best practices for resilience. We're going to show you examples of things like fault isolation using cells, auto scaling, and decoupled architectures. The next one on the lifecycle is testing and evaluation; we're going to show you examples of teams that have done chaos engineering and load testing. And finally — people sometimes forget this — you can design whatever you want into an application to make it resilient, but you also have to operate it resiliently. We're going to show you examples of how teams are using metrics and observability across accounts and across services to ensure the resilience of their workloads, of their applications.
And the second part of the title is "at scale," so I might as well define scale. I think we all know, but it's basically this: when you get extra load or extra scope, your system, your application, accommodates it and remains available despite the amount of load or scope you're getting. Going back to amazon.com, which is where our examples come from: in 1995, Amazon was running on two servers — one running the executable, the other running the database — so it started out small. And they had a motto: get big fast. You can see the t-shirt there from the 1997 company picnic — "Get big fast, eat another hot dog." And they did get big fast. If you look at Prime Day from this year, the number of items sold and the dollars of sales are quite impressive.
Now, we're a tech audience here, so this is the slide I really like to show. It shows some — just a few — of the AWS services and resources that Amazon teams are using to be resilient and to scale. You can see DynamoDB with millions of requests per second, Aurora with billions of transactions and terabytes of data. I'm not going to read all the stats to you, but the point is to show you that if you want to be resilient and scalable — resilient at scale — the cloud and AWS services are a way to help you achieve that goal.
And a point I forgot to make earlier about Amazon starting small and scaling up: no matter what scale your applications or your enterprise are at, almost everything we're going to show you today applies to you. You want to put in those resilience best practices, and you want to put them in so that you can scale when you need to. So fast-forward: I said two servers before; now this is the architecture. Each dot here is a service — a microservice or part of a service-oriented architecture — at Amazon, and the lines between them show the dependencies between the services. There are tens of thousands of services running at Amazon today. Amazon does like to use SOA- and microservices-type architectures.
I'm going to show you an example of that right now. This is our first example — actually, this is just an Amazon web page. It's called a detail page; it's the page where you buy stuff. In this case, you're going to buy a Kindle Fire tablet. When you look at that page, it has everything you need: the reviews, the picture, the title, the price, et cetera. But this is actually a framework — an internal framework supported and owned by a team inside Amazon. The framework makes hundreds of calls to back-end services called widgets, and those widgets are essentially microservices. Each widget owns a little piece of business logic and a little piece of what's displayed on the page, and the framework makes those calls in parallel and renders them very quickly.
If I take this page and run it through an internal tool at Amazon, it looks like this. You can see there's a microservice serving the image, a microservice serving the title, a microservice serving the average customer reviews, and all of these are being called in parallel and rendered. This leads to both resilience and scalability, because if one of these services were to have a fault or failure and not operate properly — as long as it's not the title or the image or the price — the customer still has a usable experience. They can still get most of what they need and make a purchase.
So this is what we call graceful degradation: rather than go down, we remain available to the customer, maybe without some functionality. That's the resilience part. The scalability part is that each of these back-end microservices can be deployed independently, which gives the teams the ability to deploy when they need to, innovate when they need to, and put in features when they need to.
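The parallel widget calls with graceful degradation can be sketched roughly like this — a minimal illustration, not Amazon's internal framework; the widget names and the one-second timeout are assumptions:

```python
# Sketch: call widgets in parallel; drop failed optional widgets, but fail
# the page if a critical widget (title, image, price) cannot render.
from concurrent.futures import ThreadPoolExecutor

CRITICAL = {"title", "image", "price"}  # page is unusable without these

def render_page(widgets):
    """widgets: name -> zero-arg callable returning that widget's content."""
    with ThreadPoolExecutor(max_workers=len(widgets)) as pool:
        futures = {name: pool.submit(fn) for name, fn in widgets.items()}
        page = {}
        for name, future in futures.items():
            try:
                page[name] = future.result(timeout=1.0)
            except Exception:
                if name in CRITICAL:
                    raise  # can't degrade gracefully without these
                # optional widget failed: omit it, keep the page usable
        return page
```

If the reviews widget throws, the customer still gets a page with the title, image, and price — degraded, but available.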
Finally, the third point: as I said, it's a framework owned by a centralized team that maintains the framework, while the business logic is owned by the teams that own the widgets — and a different team owns each widget. That means the widget teams can focus on their business logic and not on what the framework is doing, which takes that burden off of them so they can innovate faster.
Ok. So with that, we're gonna go to our next example with Tulip.
Thank you, Seth. So let's get a quick raise of hands to see how many of you know about cell-based architecture — I see a few hands out there. In this part of the session, I'll talk about the basics of cell-based architecture and some use cases from Prime Video and Amazon Music, and how they were able to improve availability and fault isolation using cell-based architecture. To start off with:
I'm Tulip Gupta. I'm a senior solutions architect with AWS. I've been with AWS for the past 2.5 years, helping Amazon customers like Prime Video, Amazon Game Studios, Amazon Music, Twitch, and Audible.
So you might be familiar with traditional scaling. In traditional scaling you usually have your worker nodes — in this case, eight worker nodes serving the needs of all your customers, and we have eight customers out there. But let's say one of the customers, intentionally or unintentionally, sends in a bad request. Now one of your worker nodes gets impaired, so the customer retries, and slowly all your worker nodes are impaired, and thus all your customers are impacted. So the blast radius is all your customers. With cell-based scaling, let's see how we can avoid that poison-pill situation you saw in the previous slide.
So the same customer sends in a bad request, but in this case what we have done is break the system up into cells, and each cell consists of two worker nodes. Now when the customer sends in the bad request, only two worker nodes are impacted — only one cell — and any customer being served by that cell is also impacted. As you can see, only two customers are impacted out of the eight in this scenario. The blast radius has been reduced considerably: a 4x improvement over entire-system impact.
So this is what a cell-based architecture looks like. Cells are a design pattern where a service is split into multiple deployment stacks called cells. They are independent instances of their own, and each can independently serve the full workload of its customers. One important thing to note is that cells share nothing. So if we have three cells — cell zero, cell one, and cell two — it's very important that cell zero and cell one do not share any data. The reason is that if there's data cell one needs from cell zero, and cell zero is impaired, then cell one would be impaired too.
The other key thing is the cell router. The cell router routes requests based on some configured logic. A request comes in, and the cell router routes it — maybe based on a partition key like customer ID, maybe round robin from cell zero to cell one to cell two — to the different cells. One important thing about the cell router is that it has to be as thin as possible, because if the router itself is impaired, it cannot route requests to any of the cells, and all your customers would be impacted.
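A thin cell router of the kind described can be sketched as follows — an illustrative minimum, with hypothetical cell names; the two routing modes match the partition-key and round-robin options just mentioned:

```python
# Sketch: two thin routing strategies for a cell-based architecture.
import hashlib
import itertools

CELLS = ["cell-0", "cell-1", "cell-2"]

def route_by_key(customer_id: str) -> str:
    """Partition-key routing: the same customer always lands on the same cell."""
    digest = hashlib.sha256(customer_id.encode()).hexdigest()
    return CELLS[int(digest, 16) % len(CELLS)]

_rr = itertools.cycle(CELLS)

def route_round_robin() -> str:
    """Round-robin routing: spread requests evenly across cells."""
    return next(_rr)
```

The router holds no state beyond the cell list, which is what keeps it thin: there is almost nothing in it that can fail.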
So we're going to deep dive into some of the use cases from Prime Video and Amazon Music; I helped them adopt cell-based architecture this year.
The team that adopted cell-based architecture at Prime Video is the Prime Video analytics team. It allows internal clients to deep dive into the experiences of external customers as they're watching Prime Video, and thus provide improved video delivery quality. One of the key reasons they wanted to adopt cell-based architecture was simplifying global setups: they wanted to be able to move their workload quickly from an underperforming region to a healthy region, and if a region doesn't have enough capacity, to quickly move to a different region.
So let's say they had their workload in us-east-1 and there was not enough capacity for certain instance types; they wanted to be able to quickly move to us-east-2. For Amazon Music, it was the metric transition service team. That service collects metrics from different clients and helps improve music delivery quality. The key reason they wanted to adopt cell-based architecture was fault isolation. They had different kinds of events coming in, from the most critical to the least critical, and events coming in from different device types. They wanted fault isolation so that if there was a lot of noisy traffic, like operational events, their most critical customer-impact events wouldn't be affected.
Let me go through the key decisions Prime Video took. One of the key decisions was how they wanted to design their cells. Previously they had one workload serving the needs of all their customers, spread across all the Availability Zones in one region. They split it up into different cells, with three cells per region. The reason each cell spanned the AZs in one region is that they were using regional services like Lambda — that's why it was a regional cell.
One of the key decisions whenever you adopt cell-based architecture is to look at what services you're using. If you're using EC2 or services like that, which are AZ-based, your cells can be AZ-based. The second decision they took was around the cellular traffic policy. When a request came in from the devices to Route 53, they had traffic policies built in on Route 53 that would route the traffic round robin.
So a request would go to cell one, cell two, cell three, and so on. Let's say the request comes into cell two. They also had Route 53 DNS policies that would do geoproximity routing, which means routing the request to the region closest to where the request came from. So if the request came from New York, it's routed to the closest region.
In this case, us-east-1. When the request gets to a region, it hits the Application Load Balancer and then the corresponding cell behind it. The third decision they took was cell health checks. One thing I want you to note is that you don't want to route requests to a cell that's underperforming or unhealthy. The way they checked whether a cell is healthy is they set up Route 53 health checks that ping the bootstrap API of the individual cells. If they got a 400 or 500 error, they would know the cell is unhealthy and would not route requests to it.
The second thing they did was CloudWatch alarms. They looked at the ELB 500 errors, and if there were more than 100 errors in a minute, they would know that the load balancer in that particular region was unhealthy and would not route requests there either. As a result, they saw an outcome of 99.999% availability over a span of four weeks — that's the percentage of events that were processed successfully.
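The two health signals just described — the Route 53 health check on the bootstrap API and the CloudWatch-style alarm on ELB 500s — can be sketched like this. The thresholds follow the talk (4xx/5xx responses, more than 100 errors per minute); the function names and data shapes are illustrative:

```python
# Sketch: keep a cell in rotation only if both health signals pass.
def healthy_by_status(status_code: int) -> bool:
    """Route 53 health check: a 400 or 500 response marks the cell unhealthy."""
    return status_code < 400

def healthy_by_elb_errors(errors_last_minute: int, threshold: int = 100) -> bool:
    """CloudWatch-style alarm: too many ELB 500s marks the region unhealthy."""
    return errors_last_minute <= threshold

def routable_cells(cells: dict) -> list:
    """cells: name -> (last bootstrap-API status, ELB 500s in the last minute)."""
    return [name for name, (status, errs) in cells.items()
            if healthy_by_status(status) and healthy_by_elb_errors(errs)]
```

Requests then only go to the cells this filter returns; an unhealthy cell simply drops out of DNS rotation.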
And the way they calculated availability was:
Availability = (Total Requests − Errors) / Total Requests × 100
Any failure was labeled as a service-side failure, like an ELB 500 error. And all of this came with the improved availability and ability to fail over that comes with cellularization.
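The availability formula above, as a quick sketch — the numbers in the comment are illustrative, not from the talk:

```python
# Sketch: availability as the percentage of requests served without a
# service-side failure.
def availability(total_requests: int, errors: int) -> float:
    return (total_requests - errors) / total_requests * 100

# e.g. one failed request out of 100,000 is "five nines":
# availability(100_000, 1) ≈ 99.999
```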
This brings us to Amazon Music. In one of my previous slides I talked about the cell router, and that's exactly what they built. For Prime Video, the cells contained an ALB, Lambda, and SQS; for Amazon Music, an ALB, Lambda, and Kinesis. And by stateless system, I mean they did not store any information in their cells.
The routing policy for Prime Video was round robin and geoproximity, while for Amazon Music it was device-type and event based. As a result, they both saw similar outcomes: increased availability and resiliency.
So with that, I'll hand it over to Seth.
Alright, thank you Tulip.
So we learned about the website — she made me get a drink of water — we learned about Amazon Music, we learned about Prime Video. Now we're going to learn about Ring. Ring built a massively scalable event-driven architecture that achieves six 9s of availability while serving about 129,000 requests per second.
So before I dive into what it looks like, I gotta make sure everybody knows what Ring is. I'm a Ring customer, I'm a Ring fan. So Ring is a set of doorbells and cameras and alarm equipment that you can put on your house. And then you know, when something happens in your driveway, you get a motion alert and you look on your phone and see, oh there's my driveway. Oh there's my minivan and oh there's my bunny, oh that's not really my bunny, but it was a bunny crossing my driveway and it was still fun to see.
So that's what Ring is about. And before I get into the 129,000-requests-per-second case, I want to present a different service: their video encoder service. That previous slide I showed you was a snapshot from a video from this service. There's a camera in my driveway taking raw video footage and putting it into an S3 bucket — object storage. But that's not what Ring wants to show me on my phone; they need to do some kind of post-processing, transcoding.
So when they put the video in the bucket, it sets off an event that puts a request on an SQS queue, where a fleet — those three little boxes — of EC2 instances running a transcoder service is polling that queue. When they get a message — oh, there's work to do — they pick up that video, transcode it, put it in that other bucket, and that's where I can look at it on my phone.
So that's how the transcoder works. But like many services at Amazon, Ring has to be able to scale up and scale down. With most services at Amazon, if you're looking at the website or video or music, they're going to have big events around Prime Day. But Ring is different, right? So Ring is doing this video transcoding. What do you think the big event for Ring is where they're doing video transcoding?
Yeah, you got it. It's Halloween. So there's kids going door to door setting off the motion detection. I personally love it because I take the kids out trick or treating, my wife stays home with the candy bowl and I can get little alerts showing the kids coming up to our door and see that we didn't waste our money buying all that candy. So it's great.
But Ring needs to be able to scale up. That's quite a massive scale that's happening there to be able to transcode all that video. So how do they do it?
Well, here's that architecture again. They monitor the queue using CloudWatch, and they watch a metric called EmptyReceives. EmptyReceives is interesting: if there are too many EmptyReceives — meaning the poller is asking for work and there's nothing there — it means we're probably over-scaled and can scale down. But if they're asking for work and there's never an EmptyReceive — there's always work there — it means the queue is probably backing up and we need to scale up.
So they feed that data into a Step Function which is a state machine where they could take that data plus some other proprietary metrics and decide whether to scale up or scale down to be able to serve that video as quickly as possible.
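The scale decision driven by EmptyReceives can be sketched like this. The ratio thresholds here are illustrative assumptions; in Ring's system, as described, the decision runs in a Step Functions state machine and also factors in other proprietary metrics:

```python
# Sketch: decide a scaling action from SQS polling statistics for one
# evaluation period.
def scale_decision(empty_receives: int, total_receives: int) -> str:
    """Return 'scale_down', 'scale_up', or 'hold'."""
    if total_receives == 0:
        return "hold"              # no polling data this period
    empty_ratio = empty_receives / total_receives
    if empty_ratio > 0.5:
        return "scale_down"        # pollers mostly find nothing: over-scaled
    if empty_ratio < 0.05:
        return "scale_up"          # pollers almost always find work: backlog
    return "hold"
```

On Halloween night the empty ratio drops toward zero and the fleet scales up; the morning after, it climbs and the fleet scales back down.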
So that previous example was about reducing the latency to see video. The next example is also latency-focused: reducing the latency between everything. Ring is basically built on event-driven architecture — between the devices and their back-end services, and service to service, everything is event-driven.
So they wanted to build a system that would reduce the latency for those things to talk to each other as much as possible, and make it as resilient and scalable as possible. What do I mean by event-driven? For example, a camera might record an event like StreamStart — that's the name of the event internally; to us it means the device detected motion. That event then needs to get routed to a notification service, because the notification service is going to send the push notification to me and tell me there's someone at my doorbell.
In this case, me, I'm at my own doorbell, but still as a user, you want to know that as quick as possible. So that's an example.
So they built the Streaming Event Bus or SEB. So I love making architecture diagrams. This one looks a little complex, but I'm going to walk you through it and we're going to break it down piece by piece.
So first thing: it's a multi-tier architecture, and everything in gray is outside the scope of SEB. There are event producers, like the cameras and various other services, and event consumers, like the notification service I showed you earlier. That's in gray; everything in white is SEB.
So in that first tier is the API layer and at the API layer, it's doing some authentication, it's doing some logic, but it's also doing routing just like Tulip showed you. It's deciding which cell to send a given event to based on the event topic.
At the processing layer, you can see there's multiple cells here running Kafka. Apache Kafka is a high throughput, highly scalable event stream processing system.
And then at this layer is the consumer proxy. They did something clever: they wanted to be able to onboard many consumers, and they didn't want all those consumers to have to be polling Kafka. So they built a consumer proxy that polls Kafka for them and then serves them the events, either by direct API call or by putting them on an SQS queue.
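The consumer-proxy fan-out can be sketched like this. This is a minimal in-memory stand-in — a list of events stands in for what was polled from Kafka, a callback stands in for a direct API call, and a `deque` stands in for an SQS queue:

```python
# Sketch: one proxy polls the event stream and fans events out to many
# consumers, so the consumers never have to poll Kafka themselves.
from collections import deque

def fan_out(events, subscriptions):
    """events: list of (topic, payload) polled from the stream.
    subscriptions: topic -> list of ('api', callback) or ('queue', deque)."""
    for topic, payload in events:
        for kind, sink in subscriptions.get(topic, []):
            if kind == "api":
                sink(payload)        # direct API call to the consumer
            else:
                sink.append(payload) # enqueue (stands in for SQS)
```

Each consumer chooses the delivery mode that suits it; the proxy absorbs the cost of polling once, on everyone's behalf.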
So that's SEB. As I said, it's multi-cell and you might notice that each of these pink boxes is a separate AWS account. They divide all these tiers and cells into different AWS accounts both for blast radius and manageability.
As I said, Kafka is an event streaming system — highly scalable, high throughput. So what is Managed Kafka, or Managed Streaming for Kafka (MSK)? Managed Kafka is a way you can run Kafka on AWS where AWS takes care of setting up the cluster for you. You don't have to worry about setting up the servers: you tell it what servers you want, it takes care of that; you tell it you want encryption, it takes care of that; you tell it you want shared storage, it takes care of that. It's managed — that's what managed means.
And so they built SEB, the Streaming Event Bus, as a cellular architecture, right? As Tulip described: those little pink boxes on top are different events coming in — thousands of them — and based on the event topic, they're going to go to either Cell 1 or Cell 2.
And the thing about a cellular architecture is blast radius. So we've already shared with you if a cell goes down, then only half the topics are affected, the other half are still going to work. So that's pretty good. But the team found something really interesting after they implemented this, they found that if a cell goes down, they could actually scale up the other cell and accommodate all the topics there.
So it's cellular, but when it needs to, it actually scales up to accommodate all the topics. So they get all the benefits, all the scalability of a cellular architecture, and now the blast radius is nothing, because all topics are being served by the remaining healthy cell.
And this is something really clever that they did that I really like. In this case you can see Cell 1 and Cell 2 are not healthy — they're down. Why might that be? Tulip said that a cell should be a fault isolation boundary; that's the whole point — a fault shouldn't cross cells. But it can happen. Call that a correlated failure: some failure that has correlated impact across multiple cells.
Let's say Kafka is having an issue, or they deploy a bug to their Kafka implementation. In this case, a lot of services might choose to go multi-region, right? Something's wrong with Kafka in US East, we'll fail over to US West. They didn't go that route. They created what's called a fallback cell — I call it Cell 3, but I put "cell" in quotes because a cell is really supposed to be the same stack everywhere, and Cell 3 is not the same stack. Cell 3 is not running Kafka or MSK; it's using Simple Notification Service (SNS) and Simple Queue Service (SQS) to do the stream processing — not necessarily as efficiently as Kafka does it in Cells 1 and 2, but it maintains availability. And this avoids the correlated failure: if the correlated failure has something to do with MSK or Kafka, it's not likely to be affecting SNS and SQS, so they're able to maintain availability.
The other pattern they implemented is the circuit breaker. I think many people have heard of circuit breakers, but we'll cover it anyway so we're all on the same page, and show you how they did it. The circuit starts closed, and closed is good. Think of a light switch: when your light switch is on, the circuit is closed, which means electricity is flowing, and that's a good thing. Here, circuit closed means the cell is accepting requests. But if you get a number of errors above a threshold that tells you the cell is unhealthy, you open the circuit. You can see with Cell 1 we've opened the circuit, so requests are not going to go to an ailing cell where they won't be served — they'll go to the healthy cell. And remember, if you get a few sporadic failures in Cell 2, it falls back to that quote-unquote Cell 3 using the different technology and serves from there. That gives you an extra layer of resilience.
Now, once a circuit's been opened, it goes into a half-open state. In the half-open state it sends occasional requests, and if it gets enough healthy responses, it assesses that the cell is healthy, closes the circuit again, and the cell receives requests again. I showed you an example of what these events and notifications look like before, but I want to bring it home — I learn from these examples by understanding what the service actually does.
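The closed / open / half-open state machine just described can be sketched like this — the failure and success thresholds are illustrative assumptions, not Ring's actual values:

```python
# Sketch: a per-cell circuit breaker with closed, open, and half-open states.
class CircuitBreaker:
    def __init__(self, failure_threshold=5, success_threshold=3):
        self.state = "closed"
        self.failures = 0
        self.successes = 0
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold

    def record_failure(self):
        self.failures += 1
        self.successes = 0
        if self.failures >= self.failure_threshold:
            self.state = "open"        # stop sending traffic to this cell

    def record_success(self):
        if self.state == "half_open":
            self.successes += 1
            if self.successes >= self.success_threshold:
                self.state = "closed"  # cell looks healthy again
                self.failures = 0
        else:
            self.failures = 0          # sporadic failures reset on success

    def try_probe(self):
        """After a cool-down, an open circuit lets occasional probes through."""
        if self.state == "open":
            self.state = "half_open"
```

While a cell's circuit is open, its traffic goes to the healthy cell (or the fallback "Cell 3"); the half-open probes decide when to bring it back.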
So in this case, here's another example. Remember the StreamStart example from before, where a camera detects motion and sends me a push notification? Here's another: it sends that same event to the event manager, and the event manager puts together a nice timeline for me, so I can see when a person was detected or a package was detected in front of my door — in this case, an Amazon delivery driver. And here's a fun one: the ring-ding event, which means someone rang your doorbell. If you've configured it, it sends that event to the bot service, and the bot service does an auto-response. The auto-response could be something as mundane as "please leave a message," or for Halloween it could be "trick or treat" or something like that. You can read what they are there — they're kind of fun. And I promised massive scale.
You see on the right, that's showing all those little pink boxes — thousands of messages coming in. So how many messages per hour is SEB receiving? The graph on the left is actually showing the messages per hour for eight different regions — Ring deploys SEB in eight regions. In us-east-1, the biggest region where it's deployed, it goes up to that promised 129,000 per second. It actually goes higher than that; that's just the highest it was for the screen grab I got, but it's a good representation of how high it gets. It's multiple regions all running their separate SEB stacks and serving all these events, and because it's eight regions, the total comes to 299,000 requests per second across all eight — served at the promised six nines of availability, averaged across those regions. And if you break down each region: even us-east-1, for the three-day period I was looking at, achieved 100% availability on SEB. That's because they implemented the cellular architecture, the failover and the failback, and the circuit breaker — they applied these best practices to get that six nines, and even 100%, availability. And with that, I'm going to hand it over to Avinash. Thank you.
Alright, let's do a quick hand raise: how many of you here use the Alexa mobile app? I see a few hands. Alright. Today I'll be discussing how Alexa has improved their resiliency and their developer velocity, in particular with an example from Alexa mobile personalization.
I'm Avinash Kuri, a senior solutions architect with AWS. I'm supporting Amazon as a customer of AWS, primarily working with Alexa and devices.
Alright. So Alexa mobile personalization is basically a landing-zone app for all the smart devices that are integrated with Alexa. With it, you can arrange the actions you want quick access to on your favorite devices, or take shortcuts to particular actions like controlling your thermostat temperature or switching on the living room lights. At the same time, it helps you focus on your daily routines, such as weather or traffic updates. As Seth pointed out earlier, in the vast ecosystem of microservices that we support, Alexa mobile personalization is one among them, and it serves as a kind of triggering point for many other downstream services across Alexa.
Here are some of the resiliency goals we wanted to focus on — a snapshot of common goals across different organizations. We come from a customer-obsession background, and improving customer experience is one of our key priorities. At the same time, when we see certain peak events — as discussed earlier, it could be Prime Day or any such event — we see a lot of new devices being added, and when a new device is added we also have to scale the corresponding downstream services transparently. So it is necessary for us to scale for these peak events and for the downstream services. The next is fault tolerance: we want to pre-identify faults and issues before customers catch them, and take contingency measures. Doing all of this requires a lot of developer effort, and we want our developers always focused primarily on innovation and development activities, not on operational or resiliency activities.
Well, we had many challenges in this overall journey of resiliency, but here are some of the ones we wanted to bring up. We initially tried working with many different tools and technologies and built our own homemade scripts, and all of those scripts and tools require a lot of operational capability, because you have to maintain them, patch them, and deal with certain operational burdens. While doing that, what we observed is that because we come from a diversified technology stack, it is equally important for us to attain compatibility — and when you have many tools, agents, and libraries, attaining compatibility across the whole technology stack is another challenge. At the same time, we wanted to make sure our security is tight: we are not leaving room for any sort of intruders or leaks when using different tools or agents within our production systems. And the last is mimicking real-world events and scenarios. To do that, we would have to pull different teams and workforces together and make sure all of them align to a certain standard in order to simulate a set of events. Unfortunately, to do all of this — we are not the Avengers, just developers.
All right. So with that, we started leaning on AWS Fault Injection Service. This is a managed chaos experiment service that helps you run fault injection experiments, or actions, directly on your AWS resources. While it supports a lot of actions on various AWS resources, today I will be showcasing these two: one is about Amazon EC2 instances and how you can stop, terminate, or reboot them; the other is about how you can run Systems Manager run commands and take explicit control actions on your resources. One good thing is that Fault Injection Service goes hand in hand with CloudWatch, so you can set up your own monitoring and alarms and make sure you have a stop or exit condition as a guardrail, so that you know when to exit while conducting these sorts of experiments.
Here is a quick overview of our steady state. We want to make sure that our CPU utilization is always less than 50%, with memory utilization less than 20%. At the same time, we want to support at least 3 million users with a P99 latency of less than 100 milliseconds.
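The steady-state criteria just described can be expressed as a simple predicate, the kind of check you might use as a guardrail in test automation. This is only a sketch; the thresholds are the ones stated in the talk, and the function name is our own.

```python
def within_steady_state(cpu_pct, mem_pct, p99_latency_ms):
    """Return True if the metrics satisfy the steady-state criteria
    from the talk: CPU < 50%, memory < 20%, P99 latency < 100 ms."""
    return cpu_pct < 50 and mem_pct < 20 and p99_latency_ms < 100

# A healthy sample passes:
print(within_steady_state(35, 12, 80))   # True
# A latency breach fails:
print(within_steady_state(35, 12, 130))  # False
```

In a chaos experiment, a check like this (typically wired up as a CloudWatch alarm rather than inline code) is what defines "the system is still fine" while faults are being injected.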
Here is the first example I will be discussing: the CPU and memory stress experiment. On the left side, you're looking at the hypothesis. What are we doing here? Within Alexa mobile personalization, we are injecting 40% CPU and memory load, and at the same time we are scaling the traffic from our load generator by an additional 30%. When we do that, the expected outcome is that there are no incidents reported, and our P99 latency stays within 100 milliseconds, with the exception of an occasional spike of around 130 milliseconds. The mechanism we are using here is experiment templates, which are basically JSON or YAML templates you can use directly with Fault Injection Service. A template starts with a name, a description, and a role ARN. The role ARN gives you explicit control over the targets, the AWS resources against which you want to execute these actions. And as I stated earlier, we have stop conditions here, making sure we keep the experiment under control.
The stop condition is a CloudWatch alarm, and the targets are EC2 instances. We are using resource tags to classify all our instances and make sure we run this experiment against those targets.
And these are the actions. They include a Systems Manager run command that introduces CPU and memory stress, and these actions are executed against those EC2 instance targets.
And this is the whole template. When we executed it, here is a quick snapshot of the many events we captured across our entire infrastructure stack; we use CloudWatch to capture all of these events.
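To make the template structure concrete, here is a minimal sketch of what a CPU-stress experiment template like the one described might look like, built as a Python dict of the kind you would pass to the FIS `create_experiment_template` API. The role ARN, alarm ARN, tag values, and region are placeholders of our own; `aws:ssm:send-command` and the `AWSFIS-Run-CPU-Stress` SSM document are the standard FIS identifiers for this kind of stress action, but verify parameter names against the current FIS documentation.

```python
import json

template = {
    "description": "Inject CPU stress on tagged EC2 instances",
    # Placeholder IAM role that grants FIS permission to act on the targets:
    "roleArn": "arn:aws:iam::123456789012:role/fis-experiment-role",
    "stopConditions": [{
        # Guardrail: abort the experiment if the latency alarm fires
        "source": "aws:cloudwatch:alarm",
        "value": "arn:aws:cloudwatch:us-east-1:123456789012:alarm:p99-latency",
    }],
    "targets": {
        "stress-targets": {
            "resourceType": "aws:ec2:instance",
            # Placeholder tag used to classify the instances in scope:
            "resourceTags": {"service": "alexa-mobile-personalization"},
            "selectionMode": "ALL",
        }
    },
    "actions": {
        "cpu-stress": {
            # Run an SSM document on the target instances:
            "actionId": "aws:ssm:send-command",
            "parameters": {
                "documentArn": "arn:aws:ssm:us-east-1::document/AWSFIS-Run-CPU-Stress",
                "documentParameters": json.dumps(
                    {"DurationSeconds": "600", "LoadPercent": "40"}
                ),
                "duration": "PT10M",
            },
            "targets": {"Instances": "stress-targets"},
        },
    },
}

# With boto3, this dict would be passed to
# boto3.client("fis").create_experiment_template(clientToken=..., **template).
print(sorted(template.keys()))
```

The important pieces line up with what the slide shows: a description and role ARN at the top, a CloudWatch alarm as the stop condition, tag-selected EC2 instances as targets, and an SSM run-command action that injects the stress.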
And this is a particular instance of those events. We observed a rise in CPU utilization, because we introduced an additional 40% CPU load, and at the same time memory usage grew. What we also observed was a spike in network traffic out from these EC2 instances, because we are also generating additional TPS load.
And when we do that, what we noticed is that our P99 latency still stays within our expected outcome; it spiked to around 130 milliseconds in just one instance, and our P90 latency still remains under 100 milliseconds. That gives us the confidence that our infrastructure stack is ready to take extra load and extra traffic, even though we have constrained its CPU and memory.
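The gap between P90 and P99 here is worth a quick illustration: a handful of slow outliers can push the P99 above 100 ms while the P90 stays comfortably below it. This is a sketch with synthetic latency numbers, using a simple nearest-rank percentile, not Amazon's actual measurement code.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest value such that at least
    p percent of the samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Synthetic latencies (ms): 90 fast requests, 8 slower ones, 2 outliers.
latencies_ms = [80] * 90 + [95] * 8 + [130] * 2

print(percentile(latencies_ms, 90))  # 80  -> P90 well under 100 ms
print(percentile(latencies_ms, 99))  # 130 -> P99 spikes on the outliers
```

Two slow requests out of a hundred are enough to move the P99 to 130 ms, which is why the hypothesis above allowed for occasional spikes while still treating sub-100 ms P90 as the steady state.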
Here is the second experiment. In this one, we are trying to simulate an availability zone outage, in order to have a kind of real-world failure: what happens when an availability zone goes down, and how are you going to deal with it?
Again, the hypothesis here is that we are injecting an availability zone impairment, and the expected mitigation is that auto scaling kicks in in the other availability zones. The outcome is that, even though we are cutting off an availability zone, the traffic is handled gracefully.
And at the same time, our P90 latency should again stay under 100 milliseconds. As I stated earlier, the experiment starts with a description and a role ARN, the stop condition is again a CloudWatch alarm, and the targets are EC2 instances.
The action here is that we stop the set of EC2 instances belonging to a specific availability zone. The beauty of using AWS FIS experiments is that you get to run the experiment actions either in series or in parallel.
At the end here, you can see that this chaos experiment, the availability zone outage, is configured to start after the CPU and memory stress actions.
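The series-versus-parallel behavior just mentioned comes from the `startAfter` field on an action. As a sketch, here is what the actions and targets sections of such an AZ-outage experiment might look like; `aws:ec2:stop-instances` and the `Placement.AvailabilityZone` target filter are standard FIS features, while the tag values, AZ name, and the names of the preceding stress actions are placeholders of our own.

```python
# Sketch of an AZ-outage experiment fragment: stop every tagged EC2
# instance in one Availability Zone, but only after the stress actions
# have finished (series execution via "startAfter").
az_outage_fragment = {
    "actions": {
        "stop-az-instances": {
            "actionId": "aws:ec2:stop-instances",
            "targets": {"Instances": "instances-in-one-az"},
            # Sequencing: without startAfter, actions run in parallel.
            # These action names are placeholders for the stress actions.
            "startAfter": ["cpu-stress", "memory-stress"],
        }
    },
    "targets": {
        "instances-in-one-az": {
            "resourceType": "aws:ec2:instance",
            "resourceTags": {"service": "alexa-mobile-personalization"},  # placeholder
            "filters": [{
                # Narrow the tagged fleet to a single AZ (placeholder AZ):
                "path": "Placement.AvailabilityZone",
                "values": ["us-east-1a"],
            }],
            "selectionMode": "ALL",
        }
    },
}

print(az_outage_fragment["actions"]["stop-az-instances"]["startAfter"])
```

Stopping only the instances whose placement matches one AZ is what makes the experiment a realistic zonal impairment: auto scaling in the surviving zones has to absorb the traffic, which is exactly the hypothesis being tested.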
And here is another observation from our CloudWatch dashboards. What we saw while executing this experiment is a constant increase in TPS, because we had taken down one of the availability zones, and then an immediate CPU spike, because the other two availability zones have to start taking that traffic.
At the same time, our average latency is still under 100 milliseconds. This tells us that our infrastructure is always in a ready state and that we can serve the traffic even when one availability zone is down.
So this is how we started. We began with manual load testing, and then we moved to game days; those are always evergreen. Then we introduced all the functional and traditional testing into the pipeline, baked into the pipeline, making sure it runs with every deployment cycle.
At the same time, we also started designing our own set of tools and technologies, along with scripts, to do this sort of resiliency testing. But we came to understand that all of this involved a lot of operational overhead for us, and cost, and so we moved to fully automated chaos testing using Fault Injection Service.
An additional advantage of using AWS is that these experiment templates can be shared across different developer communities, so other teams need not start from scratch; they can use the templates directly as an abstraction layer on top of their own services.
Some of the key takeaways we observed from this entire exercise: we were able to scale from 3 to 4 million users without changing anything on our infrastructure side, which greatly improved our operational resilience.
And now that our developers get more time to focus on innovation and new development activities, we found that, by our calculations, almost 640 developer hours have been saved per quarter, which improves developer productivity a lot too.
We also took down some of the infrastructure we had provisioned, based on our experiments with 40% CPU and memory stress. That helped us reduce our overall infrastructure cost by 60% and commit to carbon savings of 30%.
With that, I'll hand over to Tulip for the next use case.
Thank you, Avinash. I'll get a quick sip of water; it's very dry up here. So you've heard the stories around chaos engineering, cell-based architecture, and rings architecture as well. When you have these massive architectures, these massive workloads, running on AWS, observability becomes really important: you want to be able to monitor your infrastructure.
In this part of the session, we're going to learn about how Audible scales observability using CloudWatch's unified observability capabilities. You might know that Audible is one of the largest producers of audiobooks in the world, so they have a lot of services, and each of those services generates its own logs and metrics.
Previously, they lacked a holistic view that would let them pinpoint root causes. They weren't able to get to the bottom of why a high-severity issue occurred very quickly; it took them a long time. So when CloudWatch cross-account observability was released last year, they were one of the early adopters and were able to quickly realize the benefits.
This is what CloudWatch cross-account observability looks like. Let's say you have three AWS accounts and you're running ECS, EC2, and Lambda workloads out there, and you might have set up AWS X-Ray. What AWS X-Ray does is trace a request from one service to the other and create a trace map, collecting all of these traces. And CloudWatch is set up in all of these accounts as well, collecting the logs and metrics.
Now, with CloudWatch cross-account observability, you can send these traces, logs, and metrics into one single AWS monitoring account, and that becomes your centralized observability account. All you need to do is log into that monitoring account, and you can correlate your logs, traces, and metrics across all your source accounts.
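Under the hood, this wiring is done with CloudWatch's Observability Access Manager (OAM): a sink in the monitoring account and a link from each source account. The sketch below builds the request payloads as plain dicts; the account IDs, sink name, and sink ARN are placeholders of our own, while the `AWS::CloudWatch::Metric`, `AWS::Logs::LogGroup`, and `AWS::XRay::Trace` resource-type strings are the real OAM values for metrics, logs, and traces.

```python
import json

# 1) In the monitoring account, create the sink:
#    boto3.client("oam").create_sink(**sink_request)
sink_request = {"Name": "central-observability-sink"}  # placeholder name

# 2) Attach a sink policy allowing the source accounts to link in
#    (put_sink_policy takes this as a JSON string):
sink_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": ["111111111111", "222222222222"]},  # placeholder accounts
        "Action": ["oam:CreateLink", "oam:UpdateLink"],
        "Resource": "*",
    }],
}

# 3) In each source account, create a link back to the sink:
#    boto3.client("oam").create_link(**link_request)
link_request = {
    "LabelTemplate": "$AccountName",  # how the account appears in dashboards
    "ResourceTypes": [
        "AWS::CloudWatch::Metric",
        "AWS::Logs::LogGroup",
        "AWS::XRay::Trace",
    ],
    "SinkIdentifier": "arn:aws:oam:us-east-1:999999999999:sink/EXAMPLE",  # placeholder ARN
}

print(json.dumps(link_request["ResourceTypes"]))
```

Once each source account has a link for metrics, logs, and traces, logging into the monitoring account is enough to see all of them side by side, which is the single-account experience described above.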
Now I'm going to do a demo of tracing a severity issue. We'll basically step into the shoes of an on-call engineer from Audible and see how they would do it.
So here's the trace map and what it looks like. For folks who don't know what a trace map is, it basically shows how your request flows from one service to the other. Let's put ourselves in the shoes of an on-call engineer, and let's say you're seeing a lot of error codes coming up because one of the clients is not able to complete its requests. How would you handle it? Traditionally, for on-call engineers at Audible, you had to log into separate accounts to look at those logs and metrics. But now they can just log into the one monitoring account here and trace that request.
In a trace map, the arrows show how the request flows from one service node to the other, while the circles are the service nodes themselves. At the top you can see a little red dot, and what that red dot indicates is that an error or a fault happened in that particular service node. So it becomes really easy to just go into that one account, look at the service map, and see where all the errors occurred.
You can also filter down. This view collects the services from all four AWS accounts here, and you can select one of the accounts and see all the services attached to it, or select a different account and see the services attached to that one, so you can look at separate AWS accounts individually and also get a holistic view.
So let's go back. You can see the service node that has the error, so we click on that service node and it brings us to this view. What this helps you do is correlate your metrics to your traces: you can see the trace map, and at the bottom you can see metrics like latency, and also metrics like faults, which indicate there are errors out there.
If you want to dive deeper, all you need to do is click on "view traces", which picks up the trace segments from around the time the faults occurred. Clicking on that brings us to something like this, where you can see the trace map at the top and see that there are some faults associated with when this trace was collected.
And then it also shows you what happened, what exactly the cause was. The cause was that the customer ID didn't get propagated, didn't get sent, from one service node to the other. So we're able to quickly dig in and find out why the error occurred.
The other thing is that, from the page where you clicked on "view traces", you can also click on "view container logs", and that brings you to this Logs Insights screen here. It automatically selects the time frame when the error occurred, and you can see all the logs associated with that time frame.
It also picks up the trace ID from when the error occurred, so you can get further information about why those errors happened. As a result, after Audible implemented cross-account observability, they were able to correlate their logs, traces, and metrics easily.
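The "logs for this trace" step boils down to a CloudWatch Logs Insights query scoped to the error's time window and filtered on the trace ID. Here is a sketch of the request payload such a query might use; the log group name, time window, and trace ID are placeholders of our own, and filtering on `@message` is just one simple way to match a trace ID that appears in log lines.

```python
trace_id = "1-581cf771-a006649127e371903a2de979"  # placeholder X-Ray trace ID

# With boto3 this dict would be passed to
# boto3.client("logs").start_query(**query_request).
query_request = {
    "logGroupName": "/ecs/audible-service",  # placeholder log group
    "startTime": 1700000000,  # epoch seconds: window around the fault
    "endTime": 1700000300,
    "queryString": (
        "fields @timestamp, @message "
        f"| filter @message like '{trace_id}' "
        "| sort @timestamp asc"
    ),
}

print(trace_id in query_request["queryString"])  # True
```

The console does this selection for you, pre-filling the time frame and trace ID, which is why jumping from a fault on the trace map to the relevant log lines takes only a couple of clicks.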
They only had to use one monitoring account to look across all their source accounts, and as a result they saw over a 60% reduction in debugging time. Previously they would spend, on average, around two hours debugging any severity issue; now they're spending almost 20 to 30 minutes.
And this is a quote from one of the developers, who says that previously he had to log into multiple windows, and now he only has to log into one. He can query all the services in one single pane of glass, and that saves him a lot of time.
And with this, I'm going to hand it over to Seth. Thank you.
I love seeing quotes from developers, and that picture we had there is just, you know, licensed photography; it's the same one I've used with other developer quotes in the past. So somebody's going to think that's the happiest developer in the world, with all the happiest quotes from Amazon.
The conclusion today is pretty straightforward. We wanted to show you how you can build resilient, scalable architectures, and we wanted to do that by example, so we showed you real-life examples from all of these teams to help inspire you and show you it can be done. As I said, no matter what size you are now, small to large, a lot of these principles apply, and a lot of these best practices apply. I want you to go out there and do them. Now, to learn more:
There are other sessions you can check out. This slide is from an earlier version of this talk, so some of these sessions may have happened already; if they're breakouts, they're going to be recorded, so don't worry about it.
That lifecycle framework I started out talking about is linked there, and there are a couple of other things you can learn about: cell-based architecture, chaos engineering, X-Ray, and so on, from all of these links. And again, this will be posted to YouTube eventually, so you can get the links there.
And while Avinash talked about FIS, there are several purpose-built services for resilience at AWS that you should be aware of. Resilience Hub is another great one, along with AWS Backup, Elastic Disaster Recovery, and Route 53 Application Recovery Controller. These are all good services you might want to use on your resilience journey.
And with that, we do have some time for questions. I want to make sure you fill out the survey, but we thank you very much. Thank you.