Stripe: Architecting for observability at massive scale

Welcome everyone. Thanks for joining today's session, "Architecting for Observability at Massive Scale," presented together with Stripe. I'm sure most of you are working with systems and applications that are critical to your business, so you need these systems to be observable. As the business grows, the scale of observability grows with it, presenting some new challenges.

In today's session, you will learn how Stripe navigated similar challenges and the solutions they implemented to address them.

My name is Hassan Tariq. I'm a Principal Solutions Architect with AWS. I work closely with Stripe on multiple projects, including their observability architecture. Joining me today is Cody from Stripe; I'll let him introduce himself when he takes the stage.

Here is our overview of today's agenda:

I'll start by giving some context on the importance of observability, followed by the observability landscape at AWS. I'll cover a few services in more detail that are most relevant to today's session, followed by Cody from Stripe, who will walk you through the architecture that Stripe has implemented, the lessons learned, and how Stripe builds a culture of observability.

Then we'll wrap the session up by giving you some takeaways. So let's dive into it.

I'll start by addressing the question "Why do we need observability?" Amazon CTO Werner Vogels said, "Everything fails all the time." What this means is that despite your best efforts, the systems and applications you're working on may eventually experience failures, whether due to a configuration change, a faulty deployment, or something in the underlying infrastructure.

Now, you may not be able to control all of these aspects. But what you can do is give your users visibility into their environments so they can respond quickly when issues arise. A comprehensive observability strategy not only alerts you when there is a problem, but also helps you understand and deal with it.

So how do we create such a comprehensive strategy? You need to look beyond just operations and cover all aspects of the business. But it starts with collecting system level telemetry data, such as CPU utilization and memory metrics from your VMs and control plane. You can then use these signals to manage and track business level metrics.

For example, two of the important metrics that Stripe tracks are charge latency and charge success rate. Using these metrics, you can drive business insights that are very valuable and can help you make informed decisions and also improve the business performance.

Now that we have established the importance of observability, let's take a look at some of the challenges as we dive deep into the world of observability at scale.

One of the challenges that customers face is infrastructure complexity. Our customers tell us that with the adoption of microservices and the expansion of compute platforms, their infrastructure has become more complex over time. One of the primary factors contributing to this increased complexity is the need to deploy distributed, dynamic systems that scale automatically based on customer demand.

Some of the underlying or related factors in this complexity are edge computing, hybrid systems, containerization and orchestration.

So to understand these distributed and dynamic systems, businesses need data, and lots of it. This data can come in the form of metrics, logs and traces. As you deploy more of these dynamic systems, more of them fail, scale in and scale out, and that dynamic behavior produces even larger volumes of data. As a direct consequence, the overall cost of managing this data grows higher and higher, and it can become significantly higher.

So at its core, observability at scale presents a fundamental challenge of balancing the benefit of insights with increased complexity and cost.

Now, to solve for these challenges, AWS offers a wide range of observability services that you can use as building blocks to architect a solution that works for you.

We understand that observability comes in many shapes and sizes. In general, there are two high level categories of these services - the CloudWatch services which are cloud native to the AWS ecosystem and the open source managed services.

I'll just briefly explain some of these services. Starting with CloudWatch, at the collectors level you have the CloudWatch agent, which is used by millions of customers to collect metrics and log data from EC2 instances, applications and managed services. This data is then sent to the CloudWatch service for processing.

You also have the X-Ray agent, which collects application trace data. As we go higher up in this stack, there are higher level insight services you can use: Container Insights to understand how your containerized workloads are doing, and similarly Lambda Insights and Application Insights, which can give you a much broader view of your application performance.

Switching over to the managed open source side, at the collectors level you have the AWS Distro for OpenTelemetry, which is used to collect metrics and trace data. This data is sent to managed services like Amazon OpenSearch Service or Amazon Managed Service for Prometheus for processing.

To query and visualize this data, you can use Amazon Managed Grafana. And in between, you can either develop your own applications to further process this data or use a third party ISV application.

So the idea here is to give you choices so you can select the services that best fit your need. For today's session, I'll just cover these two services a little more in detail because these are relevant and will be used in the architecture that Cody will be presenting soon.

Amazon Managed Service for Prometheus is compatible with open source Prometheus, which is a monitoring and alerting tool. Amazon Managed Grafana is compatible with open source Grafana, which is a popular visualization tool.

Now, with a show of hands, how many of you have used Prometheus or Grafana in any shape or form? Wow, looks like a lot of you are familiar with it.

So before I go further and explain these two services in more detail, I'd like to address one question that some of you might be thinking: why not just use the open source software?

Well, when it comes to open source, enterprises have some concerns. The number one concern is: is this software secure and compliant alongside the other applications and tools in their environment? Upgrading and patching is another concern. Whenever there is a new version, you have to test and validate it to make sure it works well. And what happens when a vulnerability is discovered?

So in short, when you're using open source, you are responsible for patching, securing and scaling the underlying infrastructure. The AWS managed services do all of that for you; they abstract away the operational overhead so you can focus on the functionality.

Let's go through these one by one.

Amazon Managed Service for Prometheus is serverless, meaning there are no servers for you to manage. You create a workspace and just start ingesting metrics. If need be, you can create multiple workspaces in different regions, and each workspace allows up to 500 million active time series, that is, series that have received samples within the last two-hour window.

This service is fully managed and scales automatically based on your query and ingest needs. You can use the Prometheus query language, PromQL, which I'm sure a lot of you are familiar with, to query the data.

What this means is if you are using open source Prometheus and you migrate over to Amazon Managed Service for Prometheus, all your queries stay the same. There is no change required.

And finally, the pricing is very flexible - it's a pay as you go model, you only pay for the data that you ingest and query.
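
To make that concrete, here is a minimal sketch, assuming boto3 with credentials that can administer Amazon Managed Service for Prometheus; the workspace alias and the metric and label names in the PromQL string are made up for illustration.

```python
# A minimal sketch: create a workspace and reuse an existing PromQL query.
import boto3

amp = boto3.client("amp", region_name="us-east-1")

# Create a workspace and look up its Prometheus-compatible endpoint
# (the endpoint becomes available once the workspace is ACTIVE).
workspace_id = amp.create_workspace(alias="payments-observability")["workspaceId"]
endpoint = amp.describe_workspace(workspaceId=workspace_id)["workspace"]["prometheusEndpoint"]
print("Endpoint:", endpoint)  # append api/v1/remote_write to ingest, api/v1/query to query

# Because the service is PromQL-compatible, a query written for open source
# Prometheus works unchanged, for example a hypothetical charge success rate:
charge_success_rate = (
    'sum(rate(charge_attempts_total{outcome="succeeded"}[5m]))'
    ' / sum(rate(charge_attempts_total[5m]))'
)
```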

Amazon Managed Grafana is also a managed, highly scalable service. You start by creating a workspace and then you connect that workspace to many different data sources. There are a lot of AWS native data sources that are supported. For example, you can connect it to Amazon Managed Service for Prometheus and use PromQL to query this data. You can also connect it to CloudWatch and use CloudWatch Insights or Metric Insights to query this data.

In addition to that, there are dozens of non-AWS data sources that are also supported. So with Amazon Managed Grafana, you can essentially create a dashboard that pulls data from various sources, giving you a single pane of glass to observe your systems.

Now, let's take a look how these services can be used together to create a scalable solution.

You always start with collecting the data from your applications and services. So that's why we have the collectors layer. And you have multiple choices here.

You can use the AWS Distro for OpenTelemetry that I explained earlier to collect the data. You can also use a Prometheus server to scrape the metrics and then use the remote write API to write them to an Amazon Managed Service for Prometheus workspace. If need be, you can even write your own Prometheus-compatible agent to collect this data.

When you create a workspace in Amazon Managed Service for Prometheus, you get an ingest endpoint that is used to ingest this data coming from these collectors. This data is then saved in an internal storage which is highly scalable and has 11 9s of durability.

I'm sure most of you can guess which AWS service is used internally that is very scalable and has 11 nines of durability. If you guessed S3, you are correct.

The query component allows you to query this data. You can summarize and aggregate it over different durations, and define rules or conditions that get evaluated regularly.

There are two types of rules that are supported:

  • Recording rules pre-compute frequently accessed and computationally expensive expressions, and the results of those computations are written back as new time series. If you use those new time series in your queries, the queries return results much faster than they would using the original expression.

  • Alerting rules are conditions with thresholds. When a rule is triggered, a notification is sent to the Alertmanager, which can route it to, for example, Amazon SNS to fan it out, generate further notifications, send it downstream to other tools, or send emails; a minimal sketch of defining both kinds of rules follows below.
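
As referenced above, here is a minimal sketch of defining both kinds of rules against a workspace, assuming boto3 and the `amp` API; the metric names, recording rule name, threshold and workspace ID are all illustrative, not Stripe's actual rules.

```python
# A minimal sketch: upload a rule group containing one recording rule and
# one alerting rule to an Amazon Managed Service for Prometheus workspace.
import boto3

amp = boto3.client("amp", region_name="us-east-1")

rules_yaml = b"""
groups:
  - name: charge-rules
    rules:
      # Recording rule: pre-compute an expensive expression as a new series.
      - record: service:charge_success_rate:ratio_rate5m
        expr: >
          sum(rate(charge_attempts_total{outcome="succeeded"}[5m]))
          / sum(rate(charge_attempts_total[5m]))
      # Alerting rule: a condition with a threshold, routed via Alertmanager.
      - alert: ChargeSuccessRateLow
        expr: service:charge_success_rate:ratio_rate5m < 0.95
        for: 5m
        labels:
          severity: page
"""

amp.create_rule_groups_namespace(
    workspaceId="ws-EXAMPLE",  # hypothetical workspace ID
    name="charge-rules",
    data=rules_yaml,
)
```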

Now, to query and visualize this data, you can use Amazon Managed Grafana to create the dashboards and visualizations you need. All these services can be connected together with or without VPC endpoints. I hope this gives you a good overview of how these services can be used together to create a scalable observability solution that works for you.

Now, I'd like to invite Cody to walk you through Stripe's architecture and cover the rest of the session.

If you continue to scale your business indefinitely, you will quickly reach a point where the standard observability stack no longer meets the needs of your business. That might be because it can't scale to the cardinality you need to feed into it, or because of reliability issues related to the total volume of data you're throwing at it, but most likely it has to do with the cost of the overall observability solution.

My name is Cody Re, and I spent the last year at Stripe helping them navigate this inflection point. Before that, I spent about eight years at Netflix working on similar problems. Most of that time I spent working on a distributed system named Mantis that is designed specifically to implement the architectures we'll discuss today.

So what I'd like to do with you over the next 40 or so minutes is go through a set of lessons I've learned in the last 10 years of my career. We'll cover a few facts that will help inform our decision making. Then we'll cover five architectural changes that you can bolt onto the standard observability stack you're probably using today to solve some of the problems you're having. And finally, we'll discuss the cultural elements that go into creating a company with good observability practices, because as your business continues to scale, your problems will still be technological, but you will also have people problems and cultural problems on top of them.

But before we dive into that, I'd like to discuss the scale of what we're dealing with at Stripe. We have approximately 3,000 engineers across 360 teams. Those engineers produce about half a billion metrics every 10 seconds, and on those metrics they have about 40,000 alerts and 150,000 dashboard queries.

Now, a few things might jump out at you about this. Half a billion metrics sounds like a lot, but the reality is that there are distributed systems solutions these days that can handle that volume without much trouble. 40,000 alerts and 150,000 dashboard queries also sound like a lot, but you'll see a little further into this talk that those two sets are actually much smaller than they appear on the surface.

I would argue that the most difficult part about managing and effecting change in this environment is actually the 3,000 engineers. Changing the mental model of 3,000 engineers is much harder than deploying distributed systems these days.

So much of this talk is going to focus on how you can convince people to change the way they're doing observability in order to leverage some of the benefits we'll talk about here.

So we have 3,000 engineers at Stripe who are generally working on things that are not observability. What are the two dozen or so Stripes who are working on observability doing? I like to say that the mandate of the observability team at Stripe is to support Stripe's availability, reliability and performance, cost effectively and at scale. That's quite the mouthful, but I'd really like to direct your attention to the highlighted words: cost effectively. I would argue that cost effectiveness is one of the most important factors in observability, and Hassan alluded to this in his section of the talk as well. Cost is very important. If something is too expensive, you simply won't do it, and if it's cheap, you can do a lot of it.

To think about this a little, let's take it to two extremes. On the first day you spin up a new service, you probably don't have any observability for it. Your cost is zero, and that's great, but your effectiveness is also zero: you have no observability. On the other end of the spectrum, you can imagine a world where you record absolutely everything, and you could replay all that state and get your system into any state possible. But that is far too expensive, not only in terms of money, but in terms of the developer time spent figuring out what's going on, and the latency and complexity added to your service as it ensures the observability system stays consistent with what's happening in production. That's also not something we would do.

Somewhere between those two extremes exists a point in the middle where you have reached your cost effectiveness balance, if you will. I like to think of this as the efficient frontier of observability, the efficient frontier of insight to cost. It is the job of the observability team to bend that curve: to push your cost closer to the limit of observing nothing, while pushing your insight closer to being able to view everything.

And in order to get there, we're going to take some axioms, some lessons that I've learned over the last decade, and apply them to our architecture.

So it wouldn't be a Monday morning if somebody wasn't throwing an axiom sheet at you. I'd like to discuss three facts I've learned about observability, and then we'll take those facts and apply them to the observability stack we put up on screen.

The first one came as a surprise to me. Some of the work we did in 2023 was writing a parser to analyze all the queries that our users were using in their alerts, and one of the things we discovered was that there are actually only a few dozen unique queries used in most alerts. Of the 40,000 alerts I put up on screen, it turns out there are only a few dozen unique ones. We learned this because my colleague Michael, who is somewhere in the crowd here, took those parse trees, ignored some nodes that don't matter, and performed some clustering on them. What he discovered was that just three of the modular queries we provide to our users represent a quarter of our alerts, and just eight of them, the set that the observability team provides for users, represent 60% of our alerts. Extend that out to a few dozen and you have almost everything, except for the long tail where each unique alert has its own unique parse tree.
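
To illustrate the idea (this is not Stripe's actual parser or clustering code, just a sketch of the approach under simplified assumptions): normalize each alert query by discarding the parts that don't change its shape, then count how many alerts share each canonical form.

```python
# An illustrative sketch of clustering alert queries by their "shape".
import re
from collections import Counter

def canonical_form(query: str) -> str:
    q = re.sub(r'"[^"]*"', '"<VAL>"', query)    # ignore specific label values
    q = re.sub(r'\b\d+(\.\d+)?\b', '<NUM>', q)  # ignore literal thresholds/windows
    return re.sub(r'\s+', ' ', q).strip()

alert_queries = [  # hypothetical alert expressions
    'sum(rate(http_errors_total{service="api"}[5m])) > 10',
    'sum(rate(http_errors_total{service="billing"}[1m])) > 250',
    'histogram_quantile(0.99, rate(latency_bucket{service="api"}[5m])) > 0.5',
]

clusters = Counter(canonical_form(q) for q in alert_queries)
for shape, count in clusters.most_common():
    print(count, shape)
# The first two alerts collapse into one shape. In practice, comparing
# parse trees is more robust than regexes, but the clustering idea is the same.
```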

There are some pretty profound implications for user experience here, primarily that our users don't necessarily care about the query language or how the alert is expressed. What they really want is just to declare what they want out of the alert. That gives us a major advantage, because we can move the storage layer or the query engine, or we can even move that query out of the time series database and into a stream processor. We'll talk about all of this a little later, but let's put this lesson in our back pocket. We have a ton of alerts, and we have 150,000 dashboard queries where a very similar dynamic exists: most of them actually belong to a very small set.

Now, our second axiom, and this one might be my favorite, mostly because I've known it for a while. There's actually an error on this slide. Can anyone point it out? Feel free to shout it out. Oh, really quiet bunch today. OK. I wrote metrics, but that's not quite right: it's metrics, logs and traces. This applies to almost all of your observability data. You will only use between 2 and 20% of the data you're shipping to your observability store, whether that's logs, metrics or traces. And when I say this, I mean that 80 to 98% of the data you write in is not referenced by an alert, will never be looked at in a dashboard, and will never be looked up ad hoc during or outside of an incident. It will be written and never read.

Now, if you're an engineer in this crowd, hopefully you're thinking: hey, maybe there's some way I can bifurcate this set, save my company a bunch of money, and hopefully get a promotion. If you take nothing else away from this talk and you go back to work next week and do that, that's great. And if you're a leader in this crowd, you're thinking: hey, that big observability bill I have, this guy on stage is telling me we're wasting 80 to 98% of that money. Yes and no. You're not going to read it, but there is some value in the optionality of having it there, just in case you want to look at it in the future. So this isn't necessarily about getting rid of it as much as it is about reducing the cost of that optionality, creating cost effectiveness in our optionality.
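
A back-of-the-envelope sketch of that bifurcation idea, with entirely hypothetical metric names: compare the set of metrics written against the set referenced by alerts and dashboards, and treat the difference as cold tier candidates rather than deleting it.

```python
# Illustrative only: the inputs would come from your ingestion pipeline and
# from parsing your own alert and dashboard queries.
written_metrics = {"charge_latency", "charge_success_rate", "gc_pause_seconds",
                   "thread_pool_queue_depth", "cache_eviction_total"}

referenced_by_alerts = {"charge_latency", "charge_success_rate"}
referenced_by_dashboards = {"charge_latency"}

read_set = referenced_by_alerts | referenced_by_dashboards
unread = written_metrics - read_set
print(f"{len(unread) / len(written_metrics):.0%} of metric names are written but never read")
# These are candidates for a cheaper tier, preserving the optionality of
# looking at them later instead of throwing them away.
```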

So let's take that one, put it in our back pocket along with the small set of total alerts and look at the next one.

Axiom three: the tradeoffs are fundamentally different for observability data. What I mean is that we generally use the same distributed systems architectures for observability that we use when building any other kind of system, but the tradeoffs we want to make in those distributed systems are very different. If we're not cognizant of this while building our observability systems, we are doomed to build systems that are too expensive and too slow to achieve the objectives we want.

We'll dive into that a little when we get into the architectures, but I wanted to give an illustrative example. At Stripe we process payment transaction data, and if I had to choose between a payment being correctly written to a ledger or the observability telemetry about that payment being recorded correctly, I would always choose the payment ledger being written correctly. That tells me that no matter how important I think my observability data is, I have a preference for my business's control plane functioning correctly over the information about it being correct.

We can leverage this to build systems that cost significantly less, and it has the added benefit that the tradeoffs we make tend to make our systems faster as well, which is equally important, because the value of observability data decays exponentially over time. The data coming in right now is very important because it's being used in our alerts, and we need it to be accurate so that we get alerted when there's a problem and don't get alerted when there isn't one. An hour goes by and that data is significantly less valuable to us. A day, a week, a month goes by, and we almost don't care about it anymore.

So it's critical that our data not only is inexpensive to process, but that it comes in quickly because if it doesn't, it's not very useful to us. So with those three axioms in our back pocket, let's start to look at the architecture a little bit.

I have this very unlabeled diagram up here, and I've done that on purpose. When I look at that cloud, I see my cloud, Stripe's cloud, and that cloud is piping data over to an observability vendor. Again, I've left that unlabeled so you can picture your own vendor there. Our users engage with that vendor's user interface to get their dashboards, their alerts, and all the insights we get from our telemetry data.

Now, quick show of hands: whose observability stack looks just like this today? I've seen a lot of hands up. If it doesn't look like this today, has it looked like this in the past? OK, a decent number of hands. So I think we all understand each other; we're on the same page with this architecture. And generally speaking, this is a really good architecture. There's some stuff to love about it, number one being that it's simple. It's so simple: you install the client, you configure it or it configures itself, you configure your network, you feed data to your vendor, and everything works. Your users engage with that vendor and get the insights they need.

The nice part is that this simplicity comes with limited failure modes. When you deploy changes to your environment, whether those are code changes, configuration changes, or environmental changes like increased traffic, they are very unlikely to break the link between your service and your vendor. Because of that, your vendor is likely to be working when you're not working, and that's your critical moment, the moment of truth for your vendor. If they're up when you need them, then observability is functioning for you.

And of course, this gives us a feature-rich walled garden, and we like that. All the new features work together really well, everything is integrated, great. But the one thing we know about walled gardens is that once you need to go outside them, it can become really painful. If you're trying to grow your business, which I hope almost everybody in here is doing, one of the things you'll realize is that business growth eventually results in super linear metrics growth, simply because the complexity of your environment keeps expanding. This is something Hassan mentioned earlier. The inflection point for most companies is when they break up their monolith into microservices: you used to have one thing reporting metrics, and now you have N things reporting metrics, so you're reporting a super linear number of metrics compared to last week. Your vendor probably charges you linearly based on metrics volume, and of course a linear function times a super linear function is still super linear.

So now the cost of observability is growing faster than your business, and that is a major problem. If that growth continues, you'll run into scale limitations. Eventually you'll start recording metrics with higher cardinality than your time series database can handle, and that's going to be a problem because you'll have blind spots in your observability, probably in the places that matter most. These high cardinality situations get worse as you continue to scale, because while they may represent a small percentage of the overall business, they represent larger and larger numbers of your customers.

And finally, if this scale problem continues long enough, you'll actually start running into reliability issues, and very likely you'll be firefighting all three of these before you do something about it. If I had to sum this problem up in one sentence, I would say that this baseline architecture is very database centric. You have to put everything into the time series database before you can get insights out of it, and because of that you are subject not only to the technical limitations of that time series database, but also to the economic limitations of the pricing model. If it costs a certain amount to deal with the metrics you want to deal with, you are subject to that cost, and it's very difficult for you to bend that curve because you're inside this walled garden.

So what can we do to become less database centric? I'd like to talk about five architectural changes that you can bolt onto this existing architecture, so you don't need to do some sort of big swap. These five changes will help you deal with scale, reliability, and most importantly the cost effectiveness of the entire system.

The first one you should probably reach for is fairly simple: sharding. You can see I've got two copies of the same database here, with the rest of the diagram blurred into the background so we can focus on them. Essentially, sharding across some partition keys is one easy way to increase the scalability and reliability of your time series database. I'd recommend you reach for this first, even though that might seem unintuitive, because when you start out you're at a scale where one database works really well, and you can probably double or even 10x in size and that single database will still work well.

The problem is that when you shard, you create a user experience problem for your customers. If they're used to everything being in one database and now it's spread across multiple databases or data sources, it's very painful for your users to figure out where all their metrics went and to change their mindset when observing their systems. So shard early is what I'm getting at. When we ran the analysis on the alert queries for axiom one, we confirmed something I already knew: we have a QA, a preprod and a prod environment at Stripe, and our users query specifically for their QA, preprod or prod environment in their alerts.

So our users were already sharding implicitly in their mental models, even though we hadn't sharded the time series database underneath them, and that works pretty well; we get some extra mileage out of that sharding. But what would be really great is if we could shard along one of our company's fault lines. A lot of companies use, say, regional failover as their reliability story: if you deploy regionally and you break a region, you simply fail your traffic into another region while you debug and triage the broken one. If you shard your time series database across that same fault line, you'll already be up and running in the new region without taking any action whatsoever, and your users will already be thinking about the traffic in that region without you having to do anything.
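
A minimal sketch of what sharding along those fault lines could look like at the routing layer; the shard map and the (region, environment) partition key are assumptions for illustration, not Stripe's actual layout.

```python
# Route each sample to a time series database shard keyed on region and
# environment, so a regional failover never depends on the failed region's shard.
SHARDS = {
    ("us-east-1", "prod"):    "ws-prod-use1",
    ("us-west-2", "prod"):    "ws-prod-usw2",
    ("us-east-1", "preprod"): "ws-preprod-use1",
    ("us-east-1", "qa"):      "ws-qa-use1",
}

def shard_for(sample_labels: dict) -> str:
    """Pick the shard (here, a hypothetical workspace ID) for a sample."""
    key = (sample_labels.get("region", "us-east-1"),
           sample_labels.get("env", "prod"))
    return SHARDS[key]

print(shard_for({"region": "us-west-2", "env": "prod", "service": "api"}))
```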

So I would highly encourage you to shard your database earlier than you think you need to, and to consider your reliability fault lines when you do it.

The second solution, the cloud I've stuck in between our cloud and the time series database, is aggregation. I think this is the first thing teams tend to reach for because it's easy. Actually, maybe it's the second thing; the first thing they tend to reach for is austerity measures, as in: you're producing too many metrics, so you go back to your client teams and ask them, hey, do you need this tag? Can you stop producing this thing?

But one thing you learn very quickly is that 3,000 engineers can produce metrics a lot faster than 24 engineers can ask them to clean them up. Once you realize this is a losing battle, and every team that tries it does, you end up deciding to stick an aggregator between the users generating metrics and your time series database, and you start to remove, say, tag values that you don't think are useful. This is an implicit endorsement of that 80/20, or 98/2, rule we talked about: you're trying to get rid of the unused data while keeping the useful 20% or 2%.

But what ends up happening is that you lose the optionality of that unused data as you discard it. So there's a tradeoff with aggregation. You might get it right, and if you nailed it perfectly you'd save a ton of money, but you need to be careful: throw away too much data and you lose information that people need.
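
Here is an illustrative sketch of that aggregation step, with hypothetical label names: drop the tag values you believe are unused and pre-aggregate before anything reaches the time series database.

```python
# Strip high-cardinality labels and sum the remaining series in the aggregator.
from collections import defaultdict

DROP_LABELS = {"pod", "request_id"}  # hypothetical high-cardinality labels

def aggregate(batch: list[dict]) -> list[dict]:
    totals: dict[tuple, float] = defaultdict(float)
    for sample in batch:
        kept = tuple(sorted((k, v) for k, v in sample["labels"].items()
                            if k not in DROP_LABELS))
        totals[(sample["name"], kept)] += sample["value"]
    return [{"name": name, "labels": dict(labels), "value": value}
            for (name, labels), value in totals.items()]

batch = [
    {"name": "http_requests_total", "labels": {"service": "api", "pod": "api-1"}, "value": 3},
    {"name": "http_requests_total", "labels": {"service": "api", "pod": "api-2"}, "value": 5},
]
print(aggregate(batch))  # one series per service instead of one per pod
```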

Let's dive a little into what that architecture looks like. You might be looking at this and thinking, OK, wow, this is a streaming map reduce, not super interesting. And that's true. Generally speaking, when people think about a streaming map reduce, two technologies tend to come to mind. Anyone want to shout them out? Kafka and Flink. Most people immediately think: OK, if I need to do a streaming map reduce, I'm going to throw everything on Kafka, spin up a Flink architecture, process all these metrics, reduce them down to the dimensions I need, and report the result into my time series database.

The problem is that when you change the Flink deployment, whether you're rolling out a new version or a node has failed, what happens? It stops, waits for that change to deploy fully, resets to the last checkpoint, and starts processing again. Meanwhile, none of your metrics are being delivered and none of your alerts are functioning correctly. You've essentially given up X minutes of observability while you reprocess everything. That is entirely unacceptable for your observability architecture.

So what led us to that decision? The problem was that we applied the same thought process that we apply to all of our other distributed systems. We didn't consider axiom three: what are our tradeoffs? Our tradeoff here is that we want our data to be really fresh, really fast, and really cheap.

So what we need to do is apply other technologies, and there are a lot of great ones in this space right now. I mentioned Mantis earlier; there's Vector, and there's the OpenTelemetry Collector, both deployed in aggregation mode. There are a lot of new stream processors coming up in this space, and I highly encourage you to check them out. I think even Flink can be tuned to behave like this. But you probably want to keep Kafka out of your observability stack.

The third architectural change we can apply...

I like this one a lot: tiered storage. Also nothing completely wild, but unlike aggregation, if we can bifurcate that 80/20 perfectly, putting the 80 into our cold tier and keeping the 20 in the hot tier, we can keep all that optionality without giving up any observability power, because we still have all the data.

Typically with cold storage, the data becomes less expensive at the cost of speed: looking it up in the cold tier is much, much slower. But the thing we know about the data we're putting in the cold tier is that we're probably never going to look it up anyway.

And unlike the rest of these, and Hassan alluded to this earlier as well, there's really only one technology you should consider for the 80% case, and that's Amazon S3. It's incredibly durable, and it's a somewhat open secret in the industry that S3, with the correct file format and a good index, can behave almost as quickly as a database, with effectively infinite scalability and at a very favorable price point.
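
A minimal sketch of a cold tier writer along those lines, assuming pyarrow and boto3 are available; the bucket name, key layout and schema are illustrative. Partitioning keys by metric name and day acts as the "good index", so the rare lookup only scans a narrow prefix.

```python
# Flush blocks of metric samples to S3 as Parquet, partitioned by metric and day.
import io
from datetime import datetime, timezone

import boto3
import pyarrow as pa
import pyarrow.parquet as pq

s3 = boto3.client("s3")
BUCKET = "example-observability-cold-tier"  # hypothetical bucket

def flush_block(samples: list[dict]) -> None:
    """Write a block of samples to the cold tier; columns: name, labels, timestamp, value."""
    table = pa.Table.from_pylist(samples)
    buf = io.BytesIO()
    pq.write_table(table, buf, compression="zstd")

    now = datetime.now(timezone.utc)
    key = f"metrics/{samples[0]['name']}/{now:%Y/%m/%d}/{int(now.timestamp())}.parquet"
    s3.put_object(Bucket=BUCKET, Key=key, Body=buf.getvalue())

flush_block([
    {"name": "http_requests_total", "labels": '{"service":"api","pod":"api-7f9c"}',
     "timestamp": 1700000000, "value": 42.0},
])
```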

Now, the fourth change, and this one is probably my favorite because I've spent a lot of my career working on it. I wrote streaming alerts, but really this should be written as something like arbitrary computation on your metric stream. Again, once you've deployed this stream processor between the cloud that's generating metrics and your observability solution, you can effectively decouple the concept of alerting from the time series database.

And that means you can decouple the concept of alerting from the technical limitations of your storage layer. This might sound basic, but it becomes huge. You may have teams dealing with really high cardinality metrics that can't be put into the time series database. Maybe your service mesh wants to do an N by M comparison of all the connections at your company, or maybe you're comparing devices and backend versions across countries or networks; multiply a few of those dimensions together and you get a very high cardinality computation.

What you can do with this stream processor in between is perform alert evaluation in memory, where it's very inexpensive and cardinality is effectively limitless, or at least whatever you can fit onto the box. And this is a major advantage, because now we can optionally choose to store the data in the time series database, store it in cold storage, or not store it at all. We could just write a summary value in and toss the data away, unless of course an alert is firing; then maybe we write it into the time series database, pump it into cold storage, or just attach a snapshot to the alert and discard the rest.
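
A minimal sketch of that idea: evaluate the alert condition over an in-memory window on the stream, and only spend storage when the alert actually fires. The window size, threshold and what happens in fire() are assumptions for illustration.

```python
# Evaluate a threshold alert on the stream, persisting nothing until it fires.
from collections import deque
from statistics import mean

class StreamingAlert:
    """Sliding-window alert held entirely in memory, so cardinality is
    bounded only by what fits on the box."""

    def __init__(self, threshold: float, window: int = 30):
        self.threshold = threshold
        self.window = deque(maxlen=window)  # recent samples, never persisted
        self.firing = False

    def observe(self, value: float) -> None:
        self.window.append(value)
        breached = (len(self.window) == self.window.maxlen
                    and mean(self.window) > self.threshold)
        if breached and not self.firing:
            self.firing = True
            self.fire()
        elif not breached:
            self.firing = False

    def fire(self) -> None:
        # Only now do we spend money: attach a snapshot to the notification
        # (or optionally write to the TSDB / cold tier) instead of persisting
        # every raw sample all the time.
        print(f"ALERT: window mean exceeded {self.threshold}, "
              f"snapshot={list(self.window)[-5:]}")

alert = StreamingAlert(threshold=250.0)
for latency_ms in [120.0] * 40 + [400.0] * 40:
    alert.observe(latency_ms)
```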

The point is that this streaming system can be reactive to changes in your environment, including your changing alerting needs as you go from a normal state to an anomalous state. And of course I've been talking about metrics this whole time, but the exact same concept applies to logs and traces as well. You can take this a step further. We probably all have those users who've written a really gnarly, multi-hundred-line alert that is basically an application monitoring their service, as opposed to a simple declared alert. Those users probably aren't that happy with that alert either.

I know that you on the observability team aren't. This gives them an additional option. One of the things I've seen over the last 10 years is that if you enable users to write an application on top of this stream processor to monitor their service, they will implement all of the domain knowledge they have on top of your stream processor and build an application that monitors their service in the context of your business.

I've seen cloud gateway teams build applications that monitor all the middle tier services for connectivity problems, latency and errors. I've seen globally distributed database teams use this to monitor an entire view of their distributed architecture. You buy yourself a lot of leeway, not only with your power users, but also with the users who are hungry for more speed and more cardinality.

So streaming alerts, or more specifically arbitrary computation on that stream of data before it goes into your time series database, not only lets you serve the power users and the cardinality hungry users, it also makes it very easy to route to different data stores depending on changing conditions within your environment.

The fifth and final architectural consideration is isolation, and this is a really big one. You're going to be working at a company that probably has a lot of infrastructure teams building great platform code. Everyone's going to be excited about it, and you're going to want to use it. The problem is that every time you take a dependency on a piece of technology developed at your company, you risk being in a failure state at the same time that technology is in a failure state. And if they're depending on you to get them out of that failure state, that's a major problem.

We'll talk about this a little more in the cultural section, but you need to cultivate a culture of self reliance on the observability team, you need to evaluate your tradeoffs realistically, and ultimately you need to minimize the probability of observability experiencing an incident while your company is experiencing an incident. But you still need these technological levers; you can't just exist outside of all your company's technology. And I've been there: I've worked on observability teams that rewrote the platform code at their company because they didn't want to depend on the platform team's code.

I've been on teams that rewrote their deployment system so they didn't have to use the company's deployment system. These teams existed essentially completely outside of their company's normal platform offerings. I think an easier place to get a lever is through your partner. When you start out with the baseline architecture we talked about, your partner is your observability vendor; they are the start and finish of your technical offerings. But as you start to bolt more of these solutions on, more of the responsibility falls on you to maintain that separation from your environment.

So up here I've shown you Kinesis streams for the stream processor, but you could just as easily deploy EKS and run Mantis, Vector or the OpenTelemetry Collector on it. I've shown Amazon Managed Service for Prometheus, but again you could deploy something similar to EKS or ECS. I don't really consider S3 optional; it's an essential component of this architecture. And of course the same goes for Amazon Managed Grafana.

So the point here is that you really need to lean on your partners. That was true when you were just using a vendor based solution, and it continues to be true as you begin to take more responsibility for your architecture. If you don't heed this, you're going to have outages at the same time your company does, and that's a major problem for you and for your business.

But as I mentioned, this isn't just a technical problem; it's also a cultural problem, and the larger your company is, the more cultural it becomes. I'd like to take a few minutes to talk about how we evolved our culture at Stripe to meet the changing needs and demands of observability.

The first one, of course, is creating a culture of self reliance on the observability team. If I had to distill what I mean down to one definition, it is minimizing the probability of having an observability incident conditional on your company having an incident. Let's say you take a dependency on the container platform your company offers. Maybe it's really reliable, maybe it has five nines of reliability, and that means your overall probability of having an incident because of that decision seems very low.

The problem is that if they're having a problem, you're having a problem, which means your conditional probability of having an incident in that case is very high. This is what we want to minimize, and it influences your technology choices. I've driven home the point that your engineers need to make their technology decisions with this mindset, but that's not where it stops.

The leaders on your team need to get behind this mindset and change the way they interact with the company. They have peers, probably in the infrastructure org, who are evaluated on the uptake of the technologies they're building, and those peers are going to look at the observability team and say: hey, you run a lot of services and a lot of compute resources at this company; why aren't you using our platform? That would look really good for us. Your leaders need to be able to negotiate that relationship without caving to the pressure and the temptation to adopt the cool technologies being built across your company.

Furthermore, they need to manage this relationship upward as well, because executives at the company are looking at things from a very high level, and they're going to start seeing redundancy. Why isn't the observability team using these high leverage solutions that other teams are building? I'm told this is great, I'm told that is great, so why is the observability team doing its own thing and wasting all these resources? Simply put, the leaders on your team need to be able to navigate that conversation with this mindset, because if they don't, you're going to end up taking dependencies that are very unfavorable for the reliability dynamics you need in observability.

Finally, you need to cultivate a culture of observability across the company, and this really boils down to one thing: you need to make it easy for your users to do the right thing. It's an old user experience adage, but there are so many points, moments of truth, where you can do this for your users.

I mentioned earlier that our analysis found that 60% of our alerts came from just eight modules we offer to users. One of the other things we discovered in that analysis is that another few tens of percent of alerts could have fit into those modules but just didn't happen to be written that way, and part of that is on us, because I don't think we made it easy enough to use those modules.

Taking this thought process a step further: if we managed to modularize all of those alerts, we could do things like move them between storage layers, move them between query engines, or even move them out of the query engine and into the stream processor, all without our users knowing. Ultimately, we need to engineer our user experience so that users are guided toward the things we want them to do: creating lower cardinality metrics, and executing those higher cardinality queries on the stream processor or against the cold storage rather than in the query engine where we don't want them.

A corollary is that we want to make it hard to do the wrong thing. So don't use an ultra flexible query language that lets users build entire programs in the query engine; if you want them to do that, make them write a stream processing job on your stream processor. Use simple declarative alerts instead of complex programmable alerts. This thought process can be taken to the extreme, and I'll leave that as an exercise for the listener.

So I've thrown a lot at you here. So I'd like to recap. There are five architectural decisions that you should make in order to reduce your scale problems, your reliability problems and ultimately improve your cost effectiveness.

The first one should be sharding, though the first one you'll probably go for is aggregation. Tiered storage is critical to getting that 80/20 split right, and streaming alerts are really the future of what we're doing in observability as you begin to take control of your architecture. And you should be considering isolation at every point, because if you don't stay isolated from your company's technology, you're going to be subject to failures of your company's technology.

Culturally, you need to cultivate that culture of self reliance and, finally, make it easy for your users to do the right thing. But if I had to distill this down to one line: I want you to become way less database centric and focus on your data plane in observability.

So if you stuck with us this far, thank you. And I hope you have a great re:Invent.
