weixin_30552635

elasticsearch容量规划

https://docs.bonsai.io/article/123-capacity-planning

Capacity Planning

Capacity planning is the process of estimating the resources you’ll need over short and medium term timeframes. The result is used to size a cluster and avoid the pitfalls of inadequate resources (which cause performance, stability and reliability problems), and overprovisioning, which is a waste of money. The goal is to have just enough overhead to support the cluster’s growth at least 3-6 months into the future.

This document discusses how to estimate the resources you will need to get up and running on Elasticsearch. The process is roughly as follows:

Estimating Shard Requirements
Estimating Disk Requirements
Planning for Memory
Planning for Traffic

We have included a section with some sample calculations to help tie all of the information together.

Estimating Shard Requirements

Determining the number of shards that your cluster will need is one of the first steps towards estimating how many resources you will need. Every index in Elasticsearch is comprised of one or more shards. There are two types of shards: primary and replica, and how many total shards you need will depend not only on how many indices you have, but how those indices are sharded.

If you’re unfamiliar with the concept of shards, then you should read the brief, illustrated Shard Primer section first.

Why Is This Important?

Each shard in Elasticsearch is a Lucene instance, which often creates a large number of file descriptors. The number of open file descriptors on a node grows exponentially with shard counts. Left unchecked, this can lead to a number of serious problems.

Beware of High Shard Counts

There are a fixed number of file descriptors that can be opened per process; essentially, a large number of shards can crash Elasticsearch if this limit is reached, leading to data loss and service interruptions. This limit can be increased on a server, but there are implications for memory usage. There are a few strategies for managing these types of clusters, but that discussion is out of scope for preliminary capacity planning.

Some clusters can be designed to specifically accommodate a large number of shards, but that’s something of a specialized case. If you have an application that generates lots of shards, you will want your nodes to have plenty of buffer cache. For most users, simply being conscientious of how and when shards are created will suffice.

Full Text Search

The main question for this part of the planning process is how you plan to organize data. Commonly, users are indexing a database to get access to Elasticsearch’s full text search capabilities. They may have a Postgres or MySQL database with products, user records, blog posts, etc, and this data is meant to be indexed by Elasticsearch. In this use case, there is a fixed number of indices and the cluster will look something like this:

GET /_cat/indices
green  open   products      1 1 0 0 100b 100b green open users 1 2 0 0 100b 100b green open posts 3 2 0 0 100b 100b

In this case, the total number of shards is calculated by adding up all the shards used for all indices. The number of shards an index uses is the number of primary shards, p, times one the number of replica shards, r. Expressed mathematically, the total number of shards for an index is calculated as:

shards = p * (1 + r)

In the sample cluster above, the products index will need 1x(1+1)=2 shards, the users index will require 1x(1+2)=3 shards, and the posts index will require 3x(1+2)=9 shards. The total shards in the cluster is then 2+3+9=14 shards. If you need help deciding how many shards you will need for an index, check out the blog article The Ideal Elasticsearch Index, specifically the section called “Benchmarking is key” for some general guidelines.

Time Series Applications

Another common use case is to index time-series data. An example might be log entries of some kind. The application will create an index per hour/day/week/etc to hold these records. These use cases require a steadily-growing number of indices, and might look something like this:

GET /_cat/indices
green  open   events-20180101 1 1 0 0 100b 100b green open events-20180102 1 1 0 0 100b 100b green open events-20180103 1 1 0 0 100b 100b

With this use case, each index will likely have the same number of shards, but new indices will be added at regular intervals. With most time series applications, data is not stored indefinitely. A retention policy is used to remove data after a fixed period of time.

In this case, the number of shards needed is equal to the number of shards per index times the number of indices per time unit, times the retention policy. In other words, if an application is creating 1 index, with 1 primary and 1 replica shard, per day, and has a 30 day retention policy, then the cluster would need to support 60 shards:

1x(1+1) shards/index * 1 index/day * 30 days = 60 shards

Estimating Disk Requirements

The next characteristic to estimate is disk space. This is the amount of space on disk that your cluster will need to hold the data. If you’re a customer of Bonsai, this is the only type of disk usage you will be concerned with. If you’re doing capacity planning for another host (or self-hosting), you’ll want to take into account the fact that the operating system will need additional disk space for software, logs, configuration files, etc. You’ll need to factor in a much larger margin of safety if planning to run your own nodes.

Benchmarking for Baselines

The best way to establish a baseline estimate for the amount of disk space you’ll need is to perform some benchmarking. This does not need to be a complicated process. It simply involves indexing some sample data into an Elasticsearch cluster. This can even be done locally. The idea is to collect some information on:

How many indices are needed?
How many shards are needed?
What is the average document size per index?

Database Size is a Bad Heuristic

Sometimes users will estimate their disk needs by looking at the size of a database (Postgres, MySQL, etc). This is not an accurate way to estimate Elasticsearch’s data footprint. ACID-compliant data stores are meant to be durable, and come with far more overhead than data in Lucene (what Elasticsearch is based on). There is a ton of overhead that will never make it into your Elasticsearch cluster. A 5GB Postgres database may only require a few MB in Elasticsearch. Benchmarking some sample data is a far greater tool for estimation.

Attachments Are Also a Bad Heuristic

Some applications are indexing rich text files, like DOC/DOCX, PPT, and PDFs. Users may look at the average file size (maybe a few MB) and multiply this by the total number of files to be indexed to estimate disk needs. This is also not accurate. Rich text files are packed with metadata, images, formatting rules and so on, bits that will never be indexed into Elasticsearch. A 10MB PDF may only take up a few KB of space once indexed. Again, benchmarking a random sample of your data will be far more accurate in estimating total disk needs.

Suppose you have a development instance of the application running locally, a local instance of Elasticsearch, and a local copy of your production data (or a reasonable corpus of test data). After indexing 10% of the production data into Elasticsearch, a call to /_cat/indices shows the following:

curl localhost:9200/_cat/indices health status index pri rep docs.count docs.deleted store.size pri.store.size green open users-production 1 1 500 0 2.4mb 1.2mb green open posts-production 1 1 1500 0 62.4mb 31.2mb green open notes-production 1 1 300 0 11.6mb 5.8mb

In this example, there are 3 indices. Each index has one primary shard and one replica shard, for a total of 2 shards per index.

We can also see that users-production has 1.2MB of primary data occupied by 500 documents. This means one of these documents is 2.4KB on average. Similarly, posts-production documents average 20.8KB and notes-production documents average 19.3KB.

We can also estimate the disk footprint for each index populated with 100% of its data. users-production will require ~12MB, posts-production will require around 312MB and notes-production will require ~58MB. Thus, the baseline estimate is ~382MB for 100% of the production data.

The last piece of information to determine is the impact of replica shards. A replica shard is a copy of the primary data, hosted on another node to ensure high availabilty. The total footprint of the cluster data is equal to the primary data footprint times (1 + number_of_replicas).

So if you have a replication factor of 1, as in the example above, the baseline disk footprint would be 382MB x (1 + 1) = 764MB. If you wanted an additional replica, to keep a copy of the primary data on all 3 nodes, the footprint requirement would be 382MB x (1 + 2) = 1.1GB. (Note: if this is confusing, check out the Shard Primer page).

Last, it is a good idea to add a margin of safety to these estimates to account for larger documents, tweaks to mappings, and to provide some “cushion” for the operating system. Roughly 20% is a good amount; in the example above, this would give a baseline estimate of about 920MB disk space.

Medium-term Projections

The next step is to determine how quickly you’re adding data to each index. If your database creates timestamps when records are created, this information can be used to estimate the monthly growth in the number of records in each table. Suppose this analysis was performed on the sample application, and the following monthly growth rates were found:

users-production: +5%/mo posts-production: +12%/mo notes-production: +7%/mo

In a 6 month period, we would expect users-production to grow ~34% from its baseline, posts-production to grow ~97% from its baseline, and notes-production to grow ~50% from its baseline. Based on this, we can guess that in 6 months, the data will look like this:

GET /_cat/indices
health status index             pri rep docs.count docs.deleted store.size pri.store.size green open users-production 1 1 6700 0 32.0mb 16.0mb green open posts-production 1 1 29607 0 1.23gb 615mb green open notes-production 1 1 4502 0 174mb 86.9mb

Based on this, the cluster should need at least 1.44GB. Add the 20% margin of safety for an estimate of ~1.75GB.

These calculations for the sample cluster show that we should plan on having at least 1.75GB of disk space available just for the cluster data. This amount will suffice for the initial indexing of the data, and should comfortably support the cluster as it grows over the next 6 months. At the end of that interval, resource usage can be re-evalutated, and resources added (or removed) if necessary.

Time Series Data

Some use cases involve time-series data, in which new indices are created on a regular basis. For example, log data may be partitioned into daily or hourly indices. In this case, the process of estimating disk needs is roughly the same, but instead of looking at document sizes, it’s better to look at the average index footprint.

Consider this sample cluster:

GET /_cat/indices
health status index             pri rep docs.count docs.deleted store.size pri.store.size green open logs_20180101 1 1 27954 0 194mb 97.0mb green open logs_20180102 1 1 29607 0 207mb 103mb green open logs_20180103 1 1 28764 0 201mb 100.7mb

One could estimate that the average daily index requires 200MB of disk space. In six months, that would lead to around 36.7GB disk usage. With the margin of safety, a cluster with 45GB of disk allocated to the cluster is needed.

There are two caveats to add: first, time series data usually does not have a six month retention policy. A more accurate estimate would be to multiply the average daily index size by the number of days in the retention policy. If this application had a 30 day retention policy, the disk need would be closer to 7.2GB.

The second caveat is too many shards can be a problem (see Estimating Shard Usage for some discussion of why). Creating two shards every day for 6 months would lead to around 365 shards in the cluster, each with a lot of overhead in terms of open file descriptors. This could lead to crashes, data loss and serious service interruptions if the OS limits are too low, and memory problems if those limits are too high.

In any case, if the retention policy creates a demand for large numbers of open shards, the cluster needs to be sized not just to support the data, but the file descriptors as well. On Bonsai, this is not something users need to worry about, as these details are handled for you automatically.

Planning for Memory

Memory is an important component of a high-performing cluster. Efficient use of this resource helps to reduce the CPU cycles needed for a given search in several ways. First, matches that have been computed for a query can be cached in memory so that subsequent queries do not need to be computed again. And servers that have been sized with enough RAM can avoid the CPU overhead of working in virtual and swap memory. Saving CPU cycles with memory optimizations reduces request latency and increases the throughput of a cluster.

However, memory is a complicated subject. Optimizing system memory, cache invalidation and garbage collection are frequent subjects of Ph.D. theses in computer science. Fortunately, Bonsai handles server sizing and memory management for you. Our plans are sized to accommodate a vast majority of Elasticsearch users.

“I want enough RAM to hold all of my data!”

This is a common request, and it makes sense in principal. RAM and disk (usually SSD on modern servers) are both physical storage media, but RAM is several orders of magnitude faster at reading and writing than high-end SSD units. If all of the Elasticsearch data could fit into RAM, then we would expect an order of magnitude improvement in latency, right?

This tends to be reasonable for smaller clusters, but becomes less practical as a cluster scales. System RAM availability offers diminishing returns on performance improvements. Beyond a certain point, only a very specific set of use cases will benefit and the costs will necessarily be much higher.

Furthermore, Elasticsearch creates a significant number of in-memory data structures to improve search speeds, some of which can be fairly large (see the documentation on fielddata for an example). So if your plan is to base the memory size on disk footprint, you will need to not only need to measure that footprint, but also add enough for the OS, JVM, and in-memory data structures.

For all the breadth and depth of the subject, 95% of users can get away with using a simple heuristic: estimate needing 10-30% of the total data size for memory. 50% is enough for >99% of users. Note that because Bonsai manages the deployment and configuration of servers, this heuristic does not include memory available to the OS and JVM. Bonsai customers do not need to worry about these latter details.

So where does that heuristic break down? When do you really need to worry about memory footprint? If your application makes heavy use of any of the following, then memory usage will likely be a factor:

Highlighting
Aggregations (a.k.a faceting)
Dynamic scripting
Geospatial search
Deep pagination. This is when you have hundreds or thousands of pages of results; accessing results 9900 to 10000 requires substantially more overhead than results 1-100.
Sorting large numbers of documents. This is similar to, but distinct from, deep pagination. See Efficient sorting of geo distances in Elasticsearch for a real world example.
Frequent cache evictions (ex: caching timestamps by second or minute, instead of hourly or daily)

If your application is using one or more of these features, plan on needing more memory. If you would like to see the exact types of memory that Bonsai meters against, check out the Metering on Bonsai article.

Planning for Traffic

Capacity planning for traffic requirements can be tricky. Most applications do not have consistent traffic demands over the course of a day or week. Traffic patterns are often “spiky,” which complicates the estimation process. Generally, the greater the variance in throughput (as measured in requests per second, rps), the more capacity is needed to safely handle load.

Estimating Traffic Capacity

Users frequently base their estimate on some average number of requests: “Well, my application needs to serve 1M requests per day, which averages to 11-12 requests per second, so that’s what I’ll need.” This is a reasonable basis if your traffic is consistent (with a very low variance). But it is considerably inaccurate if your variance is more than ~10% of the average.

Consider the following simplified examples of weekly traffic patterns for two applications. The plots show the instantaneous throughput over the course of 7 days:

In each of these examples, the average throughput is the same, but the variance is markedly different. If they both plan on needing capacity for 5 requests per second, Application 1 will probably be fine because of Bonsai’s connection queueing, while Application 2 will be dramatically underprovisioned. Application 2 will be consistently demanding 1.5-2x more traffic than what it was designed to handle.

You’ll need to estimate your traffic based on the upper bounds of demand rather than the average. Some analysis will be necessary to determine the “spikyness” of your application’s search demands.

Traffic Volume and Request Latencies

There is a complex economic relationship between IO (as measured by CPU load, memory usage and network bandwidth) and maximum throughput. A given cluster of nodes has only a finite supply of resources to respond to the application’s demands for search. If requests come in faster than the nodes can respond to them, the requests can pile up and overwhelm the nodes. Everything slows down and eventually the nodes crash.

Simply: complex requests that perform a lot of calculations, require a lot of memory to complete, and consume a lot of bandwidth will lead to a much lower maximum throughput than simpler requests. If the nodes are not sized to handle peak loads, they can crash, restart and perform worse overall.

With multitenant class clusters, resources are shared among users and ensuring a fair allocation of resources is paramount. Bonsai addresses this complexity with the metric of concurrent connections. There is an entire section devoted to this in Metering on Bonsai. But essentially, all clusters have some allowance for the maximum number of simultaneous activeconnections.

Under this scheme, applications with low-IO requests can service a much higher throughput than applications with high-IO requests, thereby ensuring fair, stable performance for all users.

Estimating Your Concurrency Needs

A reasonable way to estimate your needs is using statistics gleaned from small scale benchmarking. If you are able to determine a p95 or p99 time for production requests during peak expected load, you can calculate the maximum throughput per connection.

For example, if your benchmarking shows that under load, 99% of all requests have a round-trip time of 50ms or less, then your application could reasonably service 20 requests per second, per connection. If you have also determined that you need to be able to service a peak of 120 rps, then you could estimate the number of concurrent connections needed by dividing: 120 rps / 20rps/connection = 6 connections.

In other words, a Bonsai plan with a search concurrency allowance of at least 6 will be enough to handle traffic at peak load. A few connections over this baseline should be able to account for random fluctuations and offer some headroom for growth.

Beware the Local Benchmarking

Users will occasionally set up a local instance of Elasticseach and perform a load test on their application to determine how many requests per second it can sustain. This is a problem because it neglects network effects.

Requests to a local Elasticsearch cluster do not have the overhead of negotiating an SSL connection over the Internet and transmitting payloads over this connection. These steps introduce a non-trivial latency to all requests, increasing round trip times and reducing throughput ceilings.

It’s better to set up a remote cluster, perform a small scale load test, and use the statistics to estimate the upper latency bounds at scale.

Another possibility with Bonsai is to select a plan with far more resources than needed, launch in production, measure performance, and then scale down as necessary. Billing is prorated, so the cost of this approach is sometimes justified by the time savings of not setting up performing and validating small-scale benchmarking.

Sample Calculations

Suppose the developer of online store decides to switch the application’s search function from ActiveRecord to Elasticsearch. She spends an afternoon collecting information:

She wants to index three tables: users, orders and products
There are 12,123 user records, which are growing by 500 a month
There are 8,040 order records, which are growing by 1,100 a month
There are 101,500 product records, which are growing by 2% month over month
According to the application logs, users are averaging 10K searches per day, with a peak of 20 rps

Estimating Shard Needs

She reads The Ideal Elasticsearch Index and decides that she will be fine with a 1x1 sharding scheme for the users and ordersindices, but will want a 3x2 scheme for the products index, based on its size, growth, and importance to the application’s revenue.

This configuration means she will need 1x(1+1)=2 shards for the users index, 1x(1+1)=2 shards for the orders index, and 3x(1+2)=9 shards for the products index. This is a total of 13 shards, although she may eventually want to increase replication on the users and orders indices. She plans for 13-15 shards for her application.

Estimating Disk Needs

She sets up a local Elasticsearch cluster and indexes 5000 random records from each table. Her cluster looks like this:

GET /_cat/indices
green  open   users         1 1 5000 0 28m 14m green open orders 1 1 5000 0 24m 12m green open products 3 2 5000 0 540m 60m

Based on this, she determines:

The users index occupies 14MB / 5000 docs = 2.8KB per document.
The orders index occupies 12MB / 5000 docs = 2.4KB per document.
The products index occupies 60MB / 5000 docs = 12KB per document.

She uses this information to calculate a baseline:

She will need 2.8KB/doc x 12,123 docs x 2 copies = 68MB of disk for the users data
She will need 2.4KB/doc x 8,040 docs x 2 copies = 39MB of disk for the orders data
She will need 12KB/doc x 101,500 docs x 3 copies = 3654MB of disk for the products data
The total disk needed for the existing data is 68MB + 39MB + 3654MB = ~3.8GB.

She then uses the growth measurements to estimate how much space will be needed within 6 months:

The users index will have ~15,123 records. At 2.8KB/doc and a 1x1 shard scheme, this is 85MB.
The orders index will have ~14,640 records. At 2.4KB/doc and a 1x1 shard scheme, this is 70MB.
The products index will have ~114,300 records. At 12KB/doc and a 3x2 shard scheme, this is 4115MB.
The total disk needed in 6 months will be around 4.27GB.

Adding some overhead to account for unexpected changes in growth and mappings, she estimates that 5GB of disk should suffice for current needs and foreseeable future.

She also uses this to estimate her memory needs, and decides to estimate a memory footprint of up to 20% of the primary data, give or take. She estimates that 1.0GB should be sufficient for memory.

Estimating Traffic Needs

She knows from the application logs that her users hit the site with a peak of 20 requests per second. She creates a free Bonsai.io cluster, indexes some sample production data to it, and performs a small scale load test to determine what kinds of request latencies she can expect her application to experience while handling user searches with a cloud service.

She finds that 99.9% of all search traffic completes the round trip in less than 80ms. This gives her a conservative estimate of 12-13 requests per second per connection (1000ms per second / 80ms per request = 12.5 rps). With a search concurrency allowance of 2, she would be able to safely service around 25 connections, which is a little more than her current need for 20 rps.

Conclusion

Based on her tests and analysis, she decides that she will need a cluster with:

Capacity for at least 13-15 shards
A search concurrency of at least 2
1 GB allocated for memory
5 GB of disk to support the growth in data over the next 3-6 months

She goes to https://app.bonsai.io/pricing and finds a plan. She decides that at this stage, a multitenant class cluster offers the best deal, and finds that the $50/plan meets all of these criteria (and then some), so that’s what she picks.

转载于:https://www.cnblogs.com/bigben0123/p/11271479.html

你可能感兴趣的:(elasticsearch容量规划)

情绪觉察日记第37天露露_e800
今天是家庭关系规划师的第二阶最后一天，慧萍老师帮我做了个案，帮我处理了埋在心底好多年的一份恐惧，并给了我深深的力量！这几天出来学习，爸妈过来婆家帮我带小孩，妈妈出于爱帮我收拾东西，并跟我先生和婆婆产生矛盾，妈妈觉得他们没有照顾好我…。今晚回家见到妈妈，我很欣赏她并赞扬她，妈妈说今晚要跟我睡我说好，当我们俩躺在床上准备睡觉的时候，我握着妈妈的手对她说:妈妈这几天辛苦你了，你看你多利害把我们的家收拾得
《投行人生》读书笔记小蘑菇的树洞
《投行人生》----作者詹姆斯-A-朗德摩根斯坦利副主席40年的职业洞见-很短小精悍的篇幅，比较适合初入职场的新人。第一部分成功的职业生涯需要规划1.情商归为适应能力分享与协作同理心适应能力，更多的是自我意识，你有能力识别自己的情并分辨这些情绪如何影响你的思想和行为。2.对于初入职场的人的建议，细节，截止日期和数据很重要截止日期，一种有效的方法是请老板为你所有的任务进行优先级排序。和老板喝咖啡的好
回溯 Leetcode 332 重新安排行程 mmaerd Leetcode刷题学习记录 leetcode 算法职场和发展
重新安排行程Leetcode332学习记录自代码随想录给你一份航线列表tickets，其中tickets[i]=[fromi,toi]表示飞机出发和降落的机场地点。请你对该行程进行重新规划排序。所有这些机票都属于一个从JFK（肯尼迪国际机场）出发的先生，所以该行程必须从JFK开始。如果存在多种有效的行程，请你按字典排序返回最小的行程组合。例如，行程[“JFK”,“LGA”]与[“JFK”,“LGB
人生的每一步路都算数 sheli
如果你想打工，一直靠打工赚钱，那你就会不断的希望自己变得更专业，不断的希望能够获得更好的工作机会，升职加薪。如果你的目标志不在此，而是拥有自己的企业，那你的选择就会出现差别。在认真打工的人眼里，会“不务正业”，会总是选择不同岗位，甚至放弃高薪机会。但是这背后都是有更加长远的规划。成功富人所必需的管理技能包括：1．对现金流的管理。2．对系统的管理。3．对人员的管理。所以，在没有获得这些能力之前，只要
每日算法&面试题，大厂特训二十八天——第二十天（树）肥学 ⚡算法题⚡面试题每日精进 java 算法数据结构
目录标题导读算法特训二十八天面试题点击直接资料领取导读肥友们为了更好的去帮助新同学适应算法和面试题，最近我们开始进行专项突击一步一步来。上一期我们完成了动态规划二十一天现在我们进行下一项对各类算法进行二十八天的一个小总结。还在等什么快来一起肥学进行二十八天挑战吧！！特别介绍小白练手专栏，适合刚入手的新人欢迎订阅编程小白进阶python有趣练手项目里面包括了像《机器人尬聊》《恶搞程序》这样的有趣文章
基于CODESYS的多轴运动控制程序框架：逻辑与运动控制分离，快速开发灵活操作 GPJnCrbBdl python 开发语言
基于codesys开发的多轴运动控制程序框架，将逻辑与运动控制分离，将单轴控制封装成功能块，对该功能块的操作包含了所有的单轴控制（归零、点动、相对定位、绝对定位、设置当前位置、伺服模式切换等等）。程序框架由主程序按照状态调用分归零模式、手动模式、自动模式、故障模式，程序状态的跳转都已完成，只需要根据不同的工艺要求完成所需的动作即可。变量的声明、地址的规划都严格按照C++的标准定义，能帮助开发者快速
ES聚合分析原理与代码实例讲解光剑书架上的书大厂Offer收割机面试题简历程序员读书硅基计算碳基计算认知计算生物计算深度学习神经网络大数据 AIGC AGI LLM Java Python 架构设计 Agent 程序员实现财富自由
ES聚合分析原理与代码实例讲解1.背景介绍1.1问题的由来在大规模数据分析场景中，特别是在使用Elasticsearch（ES）进行数据存储和检索时，聚合分析成为了一个至关重要的功能。聚合分析允许用户对数据集进行细分和分组，以便深入探索数据的结构和模式。这在诸如实时监控、日志分析、业务洞察等领域具有广泛的应用。1.2研究现状目前，ES聚合分析已经成为现代大数据平台的核心组件之一。它支持多种类型的聚
骑昆明到北海—119 砚山县 61清风i
从十年前第一次长途骑行青海湖开始每年一次长途骑行看风景，尝各地美食，探访异域文化，记录途中美食美景美事，已逐渐形成习惯。每年春季详细规划好线路，夏季出行，2020年因为疫情迟迟不能确定线路和行程。总算到了暑期疫情逐渐消失，规划了50多天的云南昆明—广西北海计划。本次行程从云南昆明出发到广西北海市结束，五十一天骑行二千多公里线路昆明-官渡古镇-环滇池--澄江市一抚仙湖—路居镇--江川区--通海县—龙
ArrayList 源码解析程序猿进阶 Java基础 ArrayList List java 面试性能优化架构设计 idea
ArrayList是Java集合框架中的一个动态数组实现，提供了可变大小的数组功能。它继承自AbstractList并实现了List接口，是顺序容器，即元素存放的数据与放进去的顺序相同，允许放入null元素，底层通过数组实现。除该类未实现同步外，其余跟Vector大致相同。每个ArrayList都有一个容量capacity，表示底层数组的实际大小，容器内存储元素的个数不能多于当前容量。当向容器中添
4.C_数据结构_队列荣世蓥数据结构数据结构
概述什么是队列：队列是限定在两端进行插入操作和删除操作的线性表。具有先入先出(FIFO)的特点相关名词：队尾：写入数据的一段队头：读取数据的一段空队：队列中没有数据，队头指针=队尾指针满队：队列中存满了数据，队尾指针+1=队头指针循环队列1、基本内容循环队列是以数组形式构成的队列数据结构。循环队列的结构体如下：typedefintdata_t;//队列数据类型#defineN64//队列容量typ
情绪低迷单点登录
1、当初说的，行，只做朋友，那以后不会啦。真到这一步，是这么的难受。2、当在一个环境呆腻，又对新环境感到抗拒之后，是这么的疲惫3、之前规划了一件事儿，到进行中的时候意外颇多，犹豫不决也是这么心酸当上述情况聚集到一起的时候，整个人都放空了，想要放纵自己，却始终不得法，压抑
ERP企业资源规划系统点滴~ 教育电商
ERP企业资源规划系统ERP（EnterpriseResourcePlanning）企业资源规划系统是一种综合性的管理信息系统，旨在通过信息技术手段实现对企业内部资源的全面规划、管理和控制。以下是对ERP企业资源规划系统的详细解析：一、定义与核心思想ERP系统建立在信息技术基础上，以系统化的管理思想，为企业决策层及员工提供决策运行手段的管理平台。它不仅仅是一个软件，更重要的是一个管理思想，实现了企
微信小程序开发注意事项 jun778895 微信小程序小程序
微信小程序开发是一个融合了前端开发、用户体验设计、后端服务（可选）以及微信小程序平台特性的综合性项目。这里，我将详细介绍一个典型的小程序开发项目的全过程，包括项目规划、设计、开发、测试及部署上线等各个环节，并尽量使内容达到或超过2000字的要求。一、项目规划1.1项目背景与目标假设我们要开发一个名为“智慧校园助手”的微信小程序，旨在为学生提供一站式校园生活服务，包括课程表查询、图书馆座位预约、食堂
代码随想录Day 41|动态规划之买卖股票问题，leetcode题目121. 买卖股票的最佳时机、122. 买卖股票的最佳时机Ⅱ、123. 买卖股票的最佳时机Ⅲ LluckyYH 动态规划 leetcode 算法数据结构
提示：DDU，供自己复习使用。欢迎大家前来讨论~文章目录买卖股票的最佳时机相关题目题目一：121.买卖股票的最佳时机解题思路：题目二：122.买卖股票的最佳时机II解题思路：题目三：123.买卖股票的最佳时机III解题思路总结买卖股票的最佳时机相关题目题目一：121.买卖股票的最佳时机[[121.买卖股票的最佳时机](https://leetcode.cn/problems/combination
承担即成长吉林付巍巍
《苏霍姆林斯基教育学》课程，几天前召开了义工培训会，我听了回放后主动联系郑老师要求加入义工团队。虽然这样每周要付出至少一天的时间进行打卡阅读和点评，但这样可以强迫规划好每日的作息时间，完成专业阅读方面的学习，这种重要的事情是必须要融入日常的生活中的，这一工作的申请也督促我合理安排自己的时间，把碎片化的时间整合好，无形中提高了每日利用时间的效率。上学期跟随着教师阅读地图课程组进行点评，发现了许多优秀
新的一年，春节假期期间，你有没有去深度思考过自己的未来？十八点心理
新的一年，是不是应该思考些什么？是继续和亲朋好友聊聊天，还是想一条属于自己的路？我们很多人会在过年的氛围中去享受当下的一切，打打麻将、打打牌、聊聊天、侃侃大山，整个人的精神状态特别好。觉得完全有一种自我满足的状态体验。但是从另外一个层面看，看到那些厉害的人，那些对于自己人生取得巨大成就的人来说，根本没有春节休息一说，在春节时分，还在见缝插针去写点文章、录个视频、思考新一年的规划。当看到那种忙碌的身
生成式地图制图 Bwywb_3 深度学习机器学习深度学习生成对抗网络
生成式地图制图（GenerativeCartography）是一种利用生成式算法和人工智能技术自动创建地图的技术。它结合了传统的地理信息系统（GIS）技术与现代生成模型（如深度学习、GANs等），能够根据输入的数据自动生成符合需求的地图。这种方法在城市规划、虚拟环境设计、游戏开发等多个领域具有应用前景。主要特点：自动化生成：通过算法和模型，系统能够根据输入的地理或空间数据自动生成地图，而无需人工逐
L1 L2 L3 缓存京天不下雨 windows 缓存 windows
L1L2L3缓存L1Cache(一级bai缓存)是CPU第一层高速缓存，分为数据缓存和指令缓存。du内置的zhiL1高速缓存的容量和结构对daoCPU的性能影响较大，不过高速缓冲存储器均由静态RAM组成，结构较复杂，在CPU管芯面积不能太大的情况下，L1级高速缓存的容量不可能做得太大。一般服务器CPU的L1缓存的容量通常在32—4096KB。L2由于L1级高速缓存容量的限制，为了再次提高CPU的运
打造专业投票评选平台：创建大型活动的完整指南口碑信息传播者
在数字化时代，打造专业的投票评选平台成为举办大型活动的不可或缺的一环。本指南将深入探讨如何创建一个高效、安全、用户友好的投票平台，旨在帮助您成功举办大型投票评选活动。从平台的设计和功能规划到活动的推广和安全性保障，每个步骤都将得到详细解析。第一部分：构建投票平台的基础在创建投票平台之前，首先需要明确平台的基础构建要素：1.**投票平台的定义和关键功能：**确定您的平台将提供的服务和功能，包括投票方
遗落的光阴古诗风光
第七篇，小明的学生时代。小明所做的城乡专线，经过二十分钟的笛鸣不断的飞驰，到了小镇中心红绿灯位置。小明家的小镇是依靠着国道建立起来的，沿着国道两侧不断的建设楼房门店，并且这些房子大多是在政府的规划下盖的，只有很少一部分是镇府盖的其他的都是住户自己自由发挥盖的，所以除了门口的门面房看起来还算一直，后面基本上都是哪个有钱哪个盖的多。所以卖东西的也都集中在路两侧，刚好还有一条横向的县道，连接着其他两个镇
听音云少nn
晚上睡的早醒的也早，就来的也早公司没有开门，就在路边溜达着想着东西，走着走着发现路上的车越来越多，过去的发出的声音越来越频繁，是啊到了上班的高峰期了，发出的声音都是有序的，这是交通秩序规划导致的，声音突然小了一般是到了十字路口或是前面有车多引起的，声音大了那就是绿灯了通行顺畅了导致的。人生就有顺的慢的这才叫人生。
《Mesh 组网和 AC+AP 组网的优缺点》 jiyiwangluokeji 网络工程网络
Mesh组网和AC+AP组网的优缺点。Mesh组网的优点：1.部署灵活：节点之间可以通过无线方式连接，新增节点比较方便，无需事先规划布线。2.自我修复和优化：如果某个节点出现故障，网络可以自动重新路由数据，保证网络的稳定性。3.覆盖范围广：可以通过添加节点轻松扩展覆盖区域。4.设备选型多样：市面上有多种不同品牌和型号的Mesh路由器可供选择。Mesh组网的缺点：1.无线回程可能存在性能瓶颈：如果节
后端开发刷题 | 把数字翻译成字符串（动态规划） jingling555 笔试题目动态规划 java 算法数据结构后端
描述有一种将字母编码成数字的方式：'a'->1,'b->2',...,'z->26'。现在给一串数字，返回有多少种可能的译码结果数据范围：字符串长度满足0=10&&num<=26){if(i==1){dp[i]+=1;}else{dp[i]+=dp[i-2];}}}returndp[nums.length()-1];}}
ElasticSearch查询超过10000条（1000页）时出现Result window is too large的问题王月亮17
问题当ES数据量较大，使用分页查询超过10000条（1000页）时，出现如下错误：Cannotexecutejestaction,responsecode:500,error:{"root_cause":[{"type":"query_phase_execution_exception","reason":"Resultwindowistoolarge,from+sizemustbelesstha
建立系统写写停停
Echo说要建立系统，把零碎化的东西成系统。这个真的很赞。自己最近涉猎的东西很多，可是好像当时收获很大，可是事后却总也记不清楚。2019年，沉下心来，去沉淀。现在认准猎头这条路，那就走下去，管TM的豁出去了。这一年任务很艰巨，2019年1月也过去了大半。这一年最主要的任务是1、猎头系统掌握；2、职业规划学习；3、专升本。一、猎头系统学习。8点哄睡时间可以听一下微分享9：00-9:30看小密圈，Ec
滑动窗口+动态规划 wniuniu_ 算法动态规划算法
前言：分析这个题目的时候，就知道要这两个线段要分开，但是要保证得到最优解，那么我们在选取第二根线段的时候，要保证我们第一根线段是左边最优解并且我们选的两根线段的右端点一定是我们的数组的点（贪心思想）classSolution{public:intmaximizeWin(vector&prizePositions,intk){intn=prizePositions.size();vectormx(n
9月9日，王绎龙日精进京心达王绎龙
今日体验：今天过的很快，也比较忙，之前给我预约的客户今天过来喷漆了，洗车家的伙给咱介绍的，登完记车主就回去了，然后给漆房联系，规定好时间，到时候还要给客户车呢。核心：提前规划好。转身：该做的事情提前做，别老拖拉。
开发游戏的学习规划杰克逊的日记游戏学习
第一阶段：●C#语言快速系统地学习一遍（基础的语法、面向对象、基础的数据结构、基础的设计模式）●Unity的2D和3D部分及UI、动画、物理系统●阶段性测验：需要去用前面所学的这些基础知识来完成一个简单的2d或者3d的案例，将通过一个自制的《Flappybird》游戏案例讲解游戏开发的思想及方法，并将《Flappybird》这个游戏进一步改造成一个横版射击类游戏《Crazybird》以巩固并且升华
详解mybatis的一二级缓存以及缓存失效原因仰望天花板缓存数据库 mybatis java mysql
数据库的大部分场景下是从磁盘读取，如果数据从内存进行读取，速度较比磁盘要快得多。但因为内存的容量有限，所以一般只会把使用和查询较多的数据缓存起来，以便快速反应，其他使用率不太多的继续存放在磁盘。mybatis分为一级缓存和二级缓存1.一级缓存一级缓存存放在SqlSqeeion上，默认开启1.1pojo@DatapublicclassRole{privateLongid;privateStringr
精力是碎片化时代的核心竞争力——精力管理介绍爱写作的harry
《掌控：开启不疲惫、不焦虑的人生》读书笔记精力是碎片化时代的核心竞争力精力包括身、心两个层面，包括体力、专注力和意志力等多个维度。在信息爆炸、全球化竞争的时代，谁的体力充沛，专注力和意志力更强，谁获胜的机会就更大。而要做到这些，不做精力管理，一切都是空谈。另外，人的精力是有限的，表现会有高低起伏，所以需要管理，需要规划使用。怎样才算做到了精力管理精力管理是指主动掌握自己的体力、专注力和意志力，让自
web报表工具FineReport常见的数据集报错错误代码和解释老A不折腾 web报表 finereport 代码可视化工具
在使用finereport制作报表，若预览发生错误，很多朋友便手忙脚乱不知所措了，其实没什么，只要看懂报错代码和含义，可以很快的排除错误，这里我就分享一下finereport的数据集报错错误代码和解释，如果有说的不准确的地方，也请各位小伙伴纠正一下。 NS-war-remote=错误代码\:1117 压缩部署不支持远程设计 NS_LayerReport_MultiDs=错误代码
Java的WeakReference与WeakHashMap bylijinnan java 弱引用
首先看看 WeakReference wiki 上 Weak reference 的一个例子： public class ReferenceTest { public static void main(String[] args) throws InterruptedException { WeakReference r = new Wea
Linux——（hostname）主机名与ip的映射 eksliang linux hostname
一、什么是主机名无论在局域网还是INTERNET上，每台主机都有一个IP地址，是为了区分此台主机和彼台主机，也就是说IP地址就是主机的门牌号。但IP地址不方便记忆，所以又有了域名。域名只是在公网（INtERNET)中存在，每个域名都对应一个IP地址，但一个IP地址可有对应多个域名。域名类型 linuxsir.org 这样的；主机名是用于什么的呢？答：在一个局域网中，每台机器都有一个主
oracle 常用技巧 18289753290
oracle常用技巧 ①复制表结构和数据 create table temp_clientloginUser as select distinct userid from tbusrtloginlog ②仅复制数据如果表结构一样 insert into mytable select * &nb
使用c3p0数据库连接池时出现com.mchange.v2.resourcepool.TimeoutException 酷的飞上天空 exception
有一个线上环境使用的是c3p0数据库，为外部提供接口服务。最近访问压力增大后台tomcat的日志里面频繁出现 com.mchange.v2.resourcepool.TimeoutException: A client timed out while waiting to acquire a resource from com.mchange.v2.resourcepool.BasicResou
IT系统分析师如何学习大数据蓝儿唯美大数据
我是一名从事大数据项目的IT系统分析师。在深入这个项目前需要了解些什么呢？学习大数据的最佳方法就是先从了解信息系统是如何工作着手，尤其是数据库和基础设施。同样在开始前还需要了解大数据工具，如Cloudera、Hadoop、Spark、Hive、Pig、Flume、Sqoop与Mesos。系统分析师需要明白如何组织、管理和保护数据。在市面上有几十款数据管理产品可以用于管理数据。你的大数据数据库可能
spring学习——简介 a-john spring
Spring是一个开源框架，是为了解决企业应用开发的复杂性而创建的。Spring使用基本的JavaBean来完成以前只能由EJB完成的事情。然而Spring的用途不仅限于服务器端的开发，从简单性，可测试性和松耦合的角度而言，任何Java应用都可以从Spring中受益。其主要特征是依赖注入、AOP、持久化、事务、SpringMVC以及Acegi Security 为了降低Java开发的复杂性，
自定义颜色的xml文件 aijuans xml
<?xml version="1.0" encoding="utf-8"?> <resources> <color name="white">#FFFFFF</color> <color name="black">#000000</color> &
运营到底是做什么的？ aoyouzi 运营到底是做什么的？
文章来源：夏叔叔（微信号：woshixiashushu），欢迎大家关注！很久没有动笔写点东西，近些日子，由于爱狗团产品上线，不断面试，经常会被问道一个问题。问：爱狗团的运营主要做什么？答：带着用户一起嗨。为什么是带着用户玩起来呢？究竟什么是运营？运营到底是做什么的？那么，我们先来回答一个更简单的问题——互联网公司对运营考核什么？以爱狗团为例，绝大部分的移动互联网公司，对运营部门的考核分为三块——用
js面向对象类和对象百合不是茶 js 面向对象函数创建类和对象
接触js已经有几个月了,但是对js的面向对象的一些概念根本就是模糊的,js是一种面向对象的语言但又不像java一样有class,js不是严格的面向对象语言 ,js在java web开发的地位和java不相上下 ,其中web的数据的反馈现在主流的使用json,json的语法和js的类和属性的创建相似下面介绍一些js的类和对象的创建的技术一:类和对
web.xml之资源管理对象配置 resource-env-ref bijian1013 java web.xml servlet
resource-env-ref元素来指定对管理对象的servlet引用的声明，该对象与servlet环境中的资源相关联 <resource-env-ref> <resource-env-ref-name>资源名</resource-env-ref-name> <resource-env-ref-type>查找资源时返回的资源类
Create a composite component with a custom namespace sunjing
https://weblogs.java.net/blog/mriem/archive/2013/11/22/jsf-tip-45-create-composite-component-custom-namespace When you developed a composite component the namespace you would be seeing would
【MongoDB学习笔记十二】Mongo副本集服务器角色之Arbiter bit1129 mongodb
一、复本集为什么要加入Arbiter这个角色回答这个问题，要从复本集的存活条件和Aribter服务器的特性两方面来说。什么是Artiber？ An arbiter does not have a copy of data set and cannot become a primary. Replica sets may have arbiters to add a
Javascript开发笔记白糖_ JavaScript
获取iframe内的元素通常我们使用window.frames["frameId"].document.getElementById("divId").innerHTML这样的形式来获取iframe内的元素，这种写法在IE、safari、chrome下都是通过的，唯独在fireforx下不通过。其实jquery的contents方法提供了对if
Web浏览器Chrome打开一段时间后，运行alert无效 bozch Web chorme alert 无效
今天在开发的时候，突然间发现alert在chrome浏览器就没法弹出了，很是怪异。试了试其他浏览器，发现都是没有问题的。开始想以为是chorme浏览器有啥机制导致的，就开始尝试各种代码让alert出来。尝试结果是仍然没有显示出来。这样开发的结果，如果客户在使用的时候没有提示，那会带来致命的体验。哎，没啥办法了就关闭浏览器重启。结果就好了，这也太怪异了。难道是cho
编程之美-高效地安排会议图着色问题贪心算法 bylijinnan 编程之美
import java.util.ArrayList; import java.util.Collections; import java.util.List; import java.util.Random; public class GraphColoringProblem { /**编程之美高效地安排会议图着色问题贪心算法 * 假设要用很多个教室对一组
机器学习相关概念和开发工具 chenbowen00 算法 matlab 机器学习
基本概念：机器学习(Machine Learning, ML)是一门多领域交叉学科，涉及概率论、统计学、逼近论、凸分析、算法复杂度理论等多门学科。专门研究计算机怎样模拟或实现人类的学习行为，以获取新的知识或技能，重新组织已有的知识结构使之不断改善自身的性能。它是人工智能的核心，是使计算机具有智能的根本途径，其应用遍及人工智能的各个领域，它主要使用归纳、综合而不是演绎。开发工具 M
[宇宙经济学]关于在太空建立永久定居点的可能性 comsci 经济
大家都知道,地球上的房地产都比较昂贵,而且土地证经常会因为新的政府的意志而变幻文本格式........ 所以,在地球议会尚不具有在太空行使法律和权力的力量之前,我们外太阳系统的友好联盟可以考虑在地月系的某些引力平衡点上面,修建规模较大的定居点
oracle 11g database control 证书错误 daizj oracle 证书错误 oracle 11G 安装
oracle 11g database control 证书错误 win7 安装完oracle11后打开 Database control 后，会打开em管理页面，提示证书错误，点“继续浏览此网站”，还是会继续停留在证书错误页面解决办法：是 KB2661254 这个更新补丁引起的，它限制了 RSA 密钥位长度少于 1024 位的证书的使用。具体可以看微软官方公告：
Java I/O之用FilenameFilter实现根据文件扩展名删除文件游其是你 FilenameFilter
在Java中，你可以通过实现FilenameFilter类并重写accept(File dir, String name) 方法实现文件过滤功能。在这个例子中，我们向你展示在“c:\\folder”路径下列出所有“.txt”格式的文件并删除。 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
C语言数组的简单以及一维数组的简单排序算法示例，二维数组简单示例 dcj3sjt126com c array
# include <stdio.h> int main(void) { int a[5] = {1, 2, 3, 4, 5}; //a 是数组的名字 5是表示数组元素的个数，并且这五个元素分别用a[0], a[1]...a[4] int i; for (i=0; i<5; ++i) printf("%d\n",
PRIMARY, INDEX, UNIQUE 这3种是一类 PRIMARY 主键。就是唯一且不能为空。 INDEX 索引，普通的 UNIQUE 唯一索引 dcj3sjt126com primary
PRIMARY, INDEX, UNIQUE 这3种是一类PRIMARY 主键。就是唯一且不能为空。INDEX 索引，普通的UNIQUE 唯一索引。不允许有重复。FULLTEXT 是全文索引，用于在一篇文章中，检索文本信息的。举个例子来说，比如你在为某商场做一个会员卡的系统。这个系统有一个会员表有下列字段：会员编号 INT会员姓名
java集合辅助类 Collections、Arrays shuizhaosi888 Collections Arrays HashCode
Arrays、Collections 1 ）数组集合之间转换 public static <T> List<T> asList(T... a) { return new ArrayList<>(a); } a）Arrays.asL
Spring Security（10）——退出登录logout 234390216 logout Spring Security 退出登录 logout-url LogoutFilter
要实现退出登录的功能我们需要在http元素下定义logout元素，这样Spring Security将自动为我们添加用于处理退出登录的过滤器LogoutFilter到FilterChain。当我们指定了http元素的auto-config属性为true时logout定义是会自动配置的，此时我们默认退出登录的URL为“/j_spring_secu
透过源码学前端之 Backbone 三 Model 逐行分析JS源代码 backbone 源码分析 js学习
Backbone 分析第三部分 Model 概述： Model 提供了数据存储，将数据以JSON的形式保存在 Model的 attributes里，但重点功能在于其提供了一套功能强大，使用简单的存、取、删、改数据方法，并在不同的操作里加了相应的监听事件，如每次修改添加里都会触发 change，这在据模型变动来修改视图时很常用，并且与collection建立了关联。
SpringMVC源码总结（七）mvc:annotation-driven中的HttpMessageConverter 乒乓狂魔 springMVC
这一篇文章主要介绍下HttpMessageConverter整个注册过程包含自定义的HttpMessageConverter，然后对一些HttpMessageConverter进行具体介绍。 HttpMessageConverter接口介绍： public interface HttpMessageConverter<T> { /** * Indicate
分布式基础知识和算法理论 bluky999 算法 zookeeper 分布式一致性哈希 paxos
分布式基础知识和算法理论 BY [email protected] 本文永久链接：http://nodex.iteye.com/blog/2103218 在大数据的背景下，不管是做存储，做搜索，做数据分析，或者做产品或服务本身，面向互联网和移动互联网用户，已经不可避免地要面对分布式环境。笔者在此收录一些分布式相关的基础知识和算法理论介绍，在完善自我知识体系的同
Android Studio的.gitignore以及gitignore无效的解决 bell0901 android gitignore
　　github上.gitignore模板合集，里面有各种.gitignore ： https://github.com/github/gitignore 　　自己用的Android Studio下项目的.gitignore文件，对github上的android.gitignore添加了　　　　　　# OSX files　　　　　　//mac os下　　　　　　.DS_Store
成为高级程序员的10个步骤 tomcat_oracle 编程
What 软件工程师的职业生涯要历经以下几个阶段：初级、中级，最后才是高级。这篇文章主要是讲如何通过 10 个步骤助你成为一名高级软件工程师。 Why 得到更多的报酬！因为你的薪水会随着你水平的提高而增加提升你的职业生涯。成为了高级软件工程师之后，就可以朝着架构师、团队负责人、CTO 等职位前进历经更大的挑战。随着你的成长，各种影响力也会提高。
mongdb在linux下的安装 xtuhcy mongodb linux
一、查询linux版本号： lsb_release -a LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noa