Query Optimization Techniques in SQL Server: The Basics

Description

Fixing and preventing performance problems is critical to the success of any application. We will use a variety of tools and best practices to build a set of techniques that can be used to analyze and resolve any performance problem!

This is one of my personal favorite areas of research and discussion as it is inherently satisfying. Taking a performance nightmare and tuning it into something fast and sleek feels great and will undoubtedly make others happy.

I often view optimization as a detective mystery. Something terrible has happened and you need to follow clues to locate and apprehend the culprit! This series of articles is all about these clues, how to identify them, and how to use them in order to find the root cause of a performance problem.

For more information about query optimization, see the SQL Query Optimization — How to Determine When and If It’s Needed article.

Defining Optimization

What is “optimal”? The answer to this will also determine when we are done with a problem and can move on to the next one. Often, a query can be sped up through many different means, each of which has an associated time and resource cost.

We usually cannot spend the resources needed to make a script run as fast as possible, nor should we want to. For the sake of simplicity, we will define “optimal” as the point at which a query performs acceptably and will continue to do so for a reasonable amount of time in the future. This is as much a business definition as it is a technical definition. With infinite money, time, and computing resources, anything is possible, but we do not have the luxury of unlimited resources, and therefore must define what “done” is whenever we chase any performance problem.

This provides us with several useful checkpoints that will force us to re-evaluate our progress as we optimize:

- The query now performs adequately.
- The resources needed to optimize further are very expensive.
- We have reached a point of diminishing returns for any further optimization.
- A completely different solution is discovered that renders this unneeded.

Over-optimization sounds good, but in the context of resource management it is generally wasteful. A giant (but unnecessary) covering index will cost us computing resources whenever we write to a table for the rest of eternity (a long time). A project to rewrite code that was already acceptable might cost days or weeks of development and QA time. Trying to further tweak an already good query may net a gain of 3%, but take a week of sweating to get there.

Our goal is to solve a problem and not over-solve it.

What Does the Query Do?

Question #1 that we must always answer is: What is the purpose of a query?

- What is its purpose?
- What should the result set look like?
- What sort of code, report, or UI is generating the query?

It is our first instinct to want to dive in with a sword in hand and slay the dragon as quickly as humanly possible. We have a trace running, execution plans in hand, and a pile of IO and timing statistics collected before realizing that we have no idea what we are doing.

Step #1 is to step back and understand the query. Some helpful questions that can aid in optimization:

- How large is the result set? Should we brace ourselves for a million rows returned, or just a few?
- Are there any parameters that have limited values? Will a given parameter always have the same value, or are there other limitations on values that can simplify our work by eliminating avenues of research?
- How often is the query executed? Something that occurs once a day will be treated very differently than one that is run every second.
- Are there any invalid or unusual input values that are indicative of an application problem? Is one input set to NULL, but never should be NULL? Are any other inputs set to values that make no sense, are contradictory, or otherwise go against the use-case of the query?
- Are there any obvious logical, syntactical, or optimization problems staring us in the face? Do we see any immediate performance bombs that will always perform poorly, regardless of parameter values or other variables? More on these later when we discuss optimization techniques.
- What is acceptable query performance? How fast must the query be for its consumers to be happy? If server performance is poor, how much do we need to decrease resource consumption for it to be acceptable? Lastly, what is the current performance of the query? This will provide us with a baseline so we know how much improvement is needed.

By stopping and asking these questions prior to optimizing a query, we avoid the uncomfortable situation in which we spend hours collecting data about a query only to not fully understand how to use it. In many ways, query optimization and database design force us to ask many of the same questions.

The results of this additional foresight will often lead us to more innovative solutions. Maybe a new index isn’t needed and we can break a big query into a few smaller ones. Maybe one parameter value is incorrect and there is a problem in code or the UI that needs to be resolved. Maybe a report is run once a week, so we can pre-cache the data set and send the results to an email, dashboard, or file, rather than force a user to wait 10 minutes for it interactively.

Tools

To keep things simple, we’ll use only a handful of tools in this article:

Execution Plans

An execution plan provides a graphical representation of how the query optimizer chose to execute a query:

The execution plan shows us which tables were accessed, how they were accessed, how they were joined together, and any other operations that occurred along the way. Included are query costs, which are estimates of the overall expense of any query component. A treasure trove of data is also included, such as row size, CPU cost, I/O cost, and details on which indexes were utilized.

In general, what we are looking for are scenarios in which large numbers of rows are being processed by any given operation within the execution plan. Once we have found a high cost component, we can zoom in on what the cause is and how to resolve it.

STATISTICS IO

This allows us to see how many logical and physical reads are made when a query is executed and may be turned on interactively in SQL Server Management Studio by running the following TSQL:

SET STATISTICS IO ON;

Once on, we will see additional data included in the Messages pane:

Logical reads tell us how many reads were made from the buffer cache. This is the number that we will refer to whenever we talk about how many reads a query is responsible for, or how much IO it is causing.

Physical reads tell us how much data was read from a storage device as it was not yet present in memory. This can be a useful indication of buffer cache/memory capacity problems if data is very frequently being read from storage devices, rather than memory.

In general, IO will be the primary cause of latency and bottlenecks when analyzing slow queries. Each read reported by STATISTICS IO is a single 8 KB page, or 8,192 bytes.
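
As a rough illustration of how these numbers can be read, the sketch below turns STATISTICS IO on, runs a query against the AdventureWorks Person.Person table used later in this article, and converts a read count into bytes. The figure of 117 reads is only the value quoted later for one version of that query; it is used here for the arithmetic and is not a fresh measurement.

```sql
-- Report page reads in the Messages pane for every statement that follows.
SET STATISTICS IO ON;

SELECT
	Person.BusinessEntityID,
	Person.LastName
FROM Person.Person
WHERE Person.LastName LIKE 'For%';

-- Each read is one 8 KB page, so a read count converts directly to data volume.
-- 117 is an illustrative read count, not a measured result.
SELECT
	117 * 8192 AS BytesRead,          -- 958464
	117 * 8 / 1024.0 AS MegabytesRead;

-- Turn the setting back off when finished.
SET STATISTICS IO OFF;
```

The setting applies to the current session only, so it can be left on while iterating on a query and switched off afterwards.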

Query Duration

Typically, the #1 reason we will research a slow query is that someone has complained and told us that it is too slow. The time it takes a query to execute will often be the smoking gun that leads us to a performance problem in need of a solution.

For our work here, we will measure duration manually using the timer found in the lower-right hand corner of SSMS:

There are other ways to accurately measure query duration, such as turning on STATISTICS TIME, but we’ll focus on queries that are slow enough that such a level of accuracy will not be necessary. We can easily observe when a 30 second query is improved to run in sub-second time. This also reinforces the role of the user as a constant source of feedback as we try to improve the speed of an application.
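
For cases where the SSMS timer is too coarse, STATISTICS TIME can be switched on in the same way as STATISTICS IO. The sketch below is only an illustration against the AdventureWorks Sales.SalesOrderHeader table; the query itself is a placeholder for whatever statement is being measured.

```sql
-- Print parse/compile time and execution CPU/elapsed time to the Messages pane.
SET STATISTICS TIME ON;

-- Placeholder query: substitute the statement being tuned.
SELECT
	COUNT(*) AS OrderCount
FROM Sales.SalesOrderHeader;

SET STATISTICS TIME OFF;
```

The output separates compile time from execution time, which can help distinguish a query that is slow to optimize from one that is slow to run.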

Our Eyes

Many performance problems are the result of common query patterns that we will become familiar with below. This pattern recognition allows us to short-circuit a great deal of research when we see something that is clearly poorly written.

As we optimize more and more queries, quickly identifying these indicators becomes second nature and we’ll get the pleasure of being able to fix a problem quickly, without the need for very time-consuming research.

In addition to common query mistakes, we will also look out for any business logic hints that may tell us if there is an application problem, parameter issue, or some other flaw in how the query was generated that may require involvement from others aside from us.

What Does the Query Optimizer Do?

Every query follows the same basic process from TSQL to completing execution on a SQL Server:

Parsing is the process by which query syntax is checked: are the keywords valid, and are the rules of the TSQL language being followed correctly? If you made a spelling error, named a column using a reserved word, or forgot a semicolon before a common table expression, this is where you’ll get error messages informing you of those problems.

Binding checks all objects referenced in your TSQL against the system catalogs and any temporary objects defined within your code to determine if they are both valid and referenced correctly. Information about these objects is retrieved, such as data types, constraints, and whether a column allows NULL. The result of this step is a query tree that is composed of a basic list of the processes needed to execute the query. This provides basic instructions, but does not yet include specifics, such as which indexes or joins to use.
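
The first two stages can be poked at separately. The sketch below is an illustration only, assuming an AdventureWorks database and a client such as SSMS or sqlcmd that understands the GO batch separator: SET PARSEONLY asks SQL Server to check syntax without compiling or executing, and sys.dm_exec_describe_first_result_set returns the kind of metadata that binding retrieves (data type and nullability) for the columns a statement would return.

```sql
-- Parsing: with PARSEONLY on, only syntax is checked; nothing is executed.
SET PARSEONLY ON;
GO
SELECT BusinessEntityID FROM Person.Person;
GO
SET PARSEONLY OFF;
GO

-- Binding: resolve the objects in a statement and describe the columns
-- it would return, including data type and whether NULL is allowed.
SELECT
	name,
	system_type_name,
	is_nullable,
	error_message   -- populated when the statement cannot be resolved
FROM sys.dm_exec_describe_first_result_set(
	N'SELECT BusinessEntityID, LastName, MiddleName FROM Person.Person;',
	NULL,
	0);
```

Neither statement produces an execution plan; that only happens in the optimization step described next.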

Optimization is the process that we will reference most often here. The optimizer operates similarly to a chess (or any gaming) computer. It needs to consider an immense number of possible moves as quickly as possible, remove the poor choices, and finish with the best possible move. At any point in time, there may be millions of combinations of moves available for the computer to consider, of which only a handful will be the best possible moves. Anyone that has played chess against a computer knows that the less time the computer has, the more likely it is to make an error.

In the world of SQL Server, we will talk about execution plans instead of chess moves. The execution plan is the set of specific steps that the execution engine will follow to process a query. Every query has many choices to make to arrive at that execution plan and must do so in a very short span of time.

These choices include questions such as:

- In what order should tables be joined?
- Which join types should be applied to tables?
- Which indexes should be used?
- Should a seek or scan be used against a given table?
- Is there a benefit in caching data in a worktable or spooling data for future use?

Any execution plan that is considered by the optimizer must return the same results, but the performance of each plan may differ due to those questions above (and many more!).

Query optimization is a CPU-intensive operation. Sifting through plans requires significant computing resources, and finding the best plan may require more time than is available. A balance must therefore be maintained between the resources needed to optimize the query, the resources required to execute the query, and the time we must wait for the entire process to complete. As a result, the optimizer is not built to guarantee the best execution plan, but instead to find the best plan it can within a limited amount of time. It may not be the perfect execution plan, but we accept that as a limitation of how a process with so many possibilities must operate.

The metric used to judge execution plans and decide which to consider or not is query cost. The cost has no unit and is a relative measure of the resources required to execute each step of an execution plan. The overall query cost is the sum of the costs of each step within a query. You can view these costs in any execution plan:

Subtree costs for each component of a query are calculated and used to either:

- Remove a high-cost execution plan and any similar ones from the pool of available plans.
- Rank the remaining plans based on how low their cost is.

While query cost is a useful metric to understand how SQL Server has optimized a particular query, it is important to remember that its primary purpose is to aid the query optimizer in choosing good execution plans. It is not a direct measure of IO, CPU, memory, duration, or any other metric that matters to an application user waiting for query execution to complete. A low query cost may not indicate a fast query or the best plan. Conversely, a high query cost may sometimes be acceptable. As a result, it’s best to not rely heavily on query cost as a metric of performance.
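
One way to look at these estimates outside the graphical plan is SET SHOWPLAN_ALL, which returns the estimated plan as rows without running the query. The sketch below is a minimal illustration against the AdventureWorks Person.Person table and assumes a client that understands the GO batch separator, since SET SHOWPLAN_ALL must be the only statement in its batch.

```sql
-- Return the estimated plan as a result set instead of executing the query.
SET SHOWPLAN_ALL ON;
GO

-- Each plan operator comes back as a row. EstimateIO and EstimateCPU hold the
-- operator's own estimates; TotalSubtreeCost is the cumulative estimated cost
-- of the operator and everything beneath it.
SELECT
	Person.BusinessEntityID,
	Person.LastName
FROM Person.Person
WHERE Person.LastName LIKE 'For%';
GO

SET SHOWPLAN_ALL OFF;
GO
```

These are unitless estimates, so they are best compared against the reads and duration actually observed rather than treated as a performance measurement.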

As the query optimizer churns through candidate execution plans, it will rank them from lowest cost to highest cost. Eventually, the optimizer will reach one of the following conclusions:

- Every execution plan has been evaluated and the best one chosen.
- There isn’t enough time to evaluate every plan, and the best one thus far is chosen.

Once an execution plan is chosen, the query optimizer’s job is complete and we can move to the final step of query processing.

Execution is the final step. SQL Server takes the execution plan that was identified in the optimization step and follows those instructions in order to execute the query.

A note on plan reuse: Because optimizing is an inherently expensive process, SQL Server maintains an execution plan cache that stores details about each query executed on a server and the plan that was chosen for it. Typically, databases experience the same queries executed over and over again, such as a web search, order placement, or social media post. Reuse allows us to avoid the expensive optimization process and rely on the work we have previously done to optimize a query.

When a query is executed that already has a valid plan in cache, that plan will be chosen, rather than going through the process of building a new one. This saves computing resources and speeds up query execution immensely. We’ll discuss plan reuse more in a future article when we tackle parameter sniffing.
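
To get a feel for how much reuse is happening, the plan cache can be inspected through dynamic management views. The sketch below is one possible query, not a tuned diagnostic; it requires permission to view server state, and the TOP (20) cutoff is an arbitrary choice for illustration.

```sql
-- List the most frequently reused plans currently in the plan cache.
SELECT TOP (20)
	cached_plans.usecounts,        -- times this plan has been looked up and reused
	cached_plans.objtype,          -- e.g. Adhoc, Prepared, Proc
	cached_plans.size_in_bytes,
	sql_text.text AS query_text
FROM sys.dm_exec_cached_plans AS cached_plans
CROSS APPLY sys.dm_exec_sql_text(cached_plans.plan_handle) AS sql_text
ORDER BY cached_plans.usecounts DESC;
```

A cache dominated by plans with a use count of 1 can suggest that queries are not being reused as much as expected.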

Common Themes in Query Optimization

With the introduction out of the way, let’s dive into optimization! The following is a list of the most common metrics that will assist in optimization. Once the basics are out of the way, we can use these basic processes to identify tricks, tips, and patterns in query structure that can be indicative of poor performance.

Index Scans

Data may be accessed from an index via either a scan or a seek. A seek is a targeted selection of rows from the table based on a (typically) narrow filter. A scan is when an entire index is searched to return the requested data. If a table contains a million rows, then a scan will need to traverse all million rows to service the query. A seek of the same table can traverse the index’s B-tree quickly to return only the data needed, without the need to inspect the entire table.

If there is a legitimate need to return a great deal of data from a table, then an index scan may be the correct operation. If we needed to return 950,000 rows from a million row table, then an index scan makes sense. If we only need to return 10 rows, then a seek would be far more efficient.

Index scans are easy to spot in execution plans:

SELECT
	*
FROM Sales.OrderTracking
INNER JOIN Sales.SalesOrderHeader
ON SalesOrderHeader.SalesOrderID = OrderTracking.SalesOrderID
INNER JOIN Sales.SalesOrderDetail
ON SalesOrderDetail.SalesOrderID = SalesOrderHeader.SalesOrderID
WHERE OrderTracking.EventDateTime = '2014-05-29 00:00:00';

We can quickly spot the index scan in the top-right corner of the execution plan. It consumes 90% of the query’s resources and is labeled as a clustered index scan, which quickly tells us what is going on here. STATISTICS IO also shows us a large number of reads against the OrderTracking table.

Many solutions are available when we have identified an undesired index scan. Here is a quick list of some thoughts to consider when resolving an index scan problem:

- Is there an index on EventDateTime?
- Is this query executed often enough to warrant adding an index? Indexes improve read speeds on queries, but will reduce write speeds, so we should add them with caution.
- Should we discuss this with those responsible for the app to determine a better way to search for this data?
- If the filter column is already indexed (EventDateTime in this example), then there may be some other shenanigans here that require our attention!
- If EventDateTime happens to equal “5-29-2014” in every row in Sales.OrderTracking, then a scan is expected. Similarly, if we were performing a fuzzy string search, an index scan would be difficult to avoid without implementing a Full-Text Index, or some similar feature.

As we walk through more examples, we’ll find a wide variety of other ways to identify and resolve undesired index scans.
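
The first two questions in the list above can be sketched in TSQL. The catalog query lists whichever indexes already exist on Sales.OrderTracking, and the CREATE INDEX statement shows what adding one on the filter column might look like. The index name IX_OrderTracking_EventDateTime is hypothetical, and whether the index is worth its write cost depends on the workload.

```sql
-- List existing indexes on the table and the columns each one covers.
SELECT
	idx.name AS index_name,
	idx.type_desc,
	col.name AS column_name,
	idx_col.key_ordinal
FROM sys.indexes AS idx
INNER JOIN sys.index_columns AS idx_col
ON idx_col.object_id = idx.object_id
AND idx_col.index_id = idx.index_id
INNER JOIN sys.columns AS col
ON col.object_id = idx_col.object_id
AND col.column_id = idx_col.column_id
WHERE idx.object_id = OBJECT_ID('Sales.OrderTracking')
ORDER BY idx.index_id, idx_col.key_ordinal;

-- If nothing covers EventDateTime, one candidate fix is a nonclustered index.
-- The index name is hypothetical; test on a non-production copy first.
CREATE NONCLUSTERED INDEX IX_OrderTracking_EventDateTime
ON Sales.OrderTracking (EventDateTime);
```

Because the example query selects every column, a seek on this index would still need lookups into the clustered index for the remaining columns, which is usually acceptable when the filter returns few rows.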

Functions Wrapped Around Joins and WHERE Clauses

A theme in optimization is a constant focus on joins and the WHERE clause. Since IO is generally our biggest cost, and these are the query components that can limit IO the most, we’ll often find our worst offenders here. The faster we can slice down our data set to only the rows we need, the more efficient query execution will be!

When evaluating a WHERE clause, any expressions involved need to be resolved prior to returning our data. If a column has functions wrapped around it, such as DATEPART, SUBSTRING, or CONVERT, then these functions will also need to be resolved. If the function must be evaluated against the column before SQL Server can determine the result set, then the entirety of the data set will need to be scanned to complete that evaluation.

Consider the following query:

SELECT
	Person.BusinessEntityID,
	Person.FirstName,
	Person.LastName,
	Person.MiddleName
FROM Person.Person
WHERE LEFT(Person.LastName, 3) = 'For';

This will return any rows from Person.Person that have a last name beginning with “For”. Here is how the query performs:

Despite only returning 4 rows, the entire index was scanned to return our data. The reason for this behavior is the use of LEFT on Person.LastName. While our query is logically correct and will return the data we want, SQL Server will need to evaluate LEFT against every row in the table before being able to determine which rows fit the filter. This forces an index scan, but luckily one that can be avoided!

When faced with functions in the WHERE clause or in a join, consider ways to move the function onto the scalar variable instead. Also think of ways to rewrite the query so that the table columns can be left clean (that is, with no functions attached to them!).

The query above can be rewritten to do just this:

SELECT
	Person.BusinessEntityID,
	Person.FirstName,
	Person.LastName,
	Person.MiddleName
FROM Person.Person
WHERE Person.LastName LIKE 'For%';

By using LIKE and shifting the wildcard logic into the string literal, we have cleaned up the LastName column, which allows SQL Server to use an index seek against it. Here is the performance we see on the rewritten version:

The relatively minor query tweak we made allowed the query optimizer to utilize an index seek and pull the data we wanted with only 2 logical reads, instead of 117.

The theme of this optimization technique is to ensure that columns are left clean! When writing queries, feel free to put complex string/date/numeric logic onto scalar variables or parameters, but not on columns. If you are troubleshooting a poorly performing query and notice functions (system or user-defined) wrapped around column names, then begin thinking of ways to push those functions off into other scalar parts of the query. This will allow SQL Server to seek indexes, rather than scan, and therefore make the most efficient decisions possible when executing the query!
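
The same idea applies to date logic. The sketch below is an additional illustration, not taken from the measurements in this article: it filters the AdventureWorks Sales.SalesOrderHeader table by year, first with a function on the column and then with an equivalent range. It assumes an index exists on OrderDate, which the sample database may not include by default; without one, both forms will scan.

```sql
-- Function wrapped around the column: YEAR() has to be computed for every row
-- before the filter can be applied, so an index on OrderDate cannot be seeked.
SELECT
	SalesOrderHeader.SalesOrderID,
	SalesOrderHeader.OrderDate
FROM Sales.SalesOrderHeader
WHERE YEAR(SalesOrderHeader.OrderDate) = 2014;

-- Same rows, expressed as a range on the bare column. The date logic now lives
-- in the literals, leaving the column clean and eligible for an index seek.
SELECT
	SalesOrderHeader.SalesOrderID,
	SalesOrderHeader.OrderDate
FROM Sales.SalesOrderHeader
WHERE SalesOrderHeader.OrderDate >= '20140101'
AND SalesOrderHeader.OrderDate < '20150101';
```

The half-open range (greater than or equal to the start, strictly less than the end) avoids having to reason about the last representable time on December 31.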

Implicit Conversions

Earlier, we demonstrated how wrapping functions around columns can result in unintended table scans, reducing query performance and increasing latency. Implicit conversions behave the exact same way but are far more hidden from plain sight.

When SQL Server compares any values, it needs to reconcile data types. All data types are assigned a precedence in SQL Server and whichever is of the lower precedence will be automatically converted to the data type of higher precedence. For more info on data type precedence, see the link at the end of this article containing the complete list.

Some conversions can occur seamlessly, without any performance impact. For example, a VARCHAR(50) and a VARCHAR(MAX) can be compared with no problem. The same goes for a TINYINT and a BIGINT, a DATE and a DATETIME, or a TIME and a VARCHAR representation of a TIME value. Not all data types can be compared this seamlessly, though.

Consider the following SELECT query, which is filtered against an indexed column:

SELECT
	EMP.BusinessEntityID,
	EMP.LoginID,
	EMP.JobTitle
FROM HumanResources.Employee EMP
WHERE EMP.NationalIDNumber = 658797903;

A quick glance and we assume that this query will result in an index seek and return data to us quite efficiently. Here is the resulting performance:

Despite only looking for a single row against an indexed column, we got a table scan for our efforts. What happened? We get a hint from the execution plan in the yellow exclamation mark over the SELECT operation:

Hovering over the operator reveals a CONVERT_IMPLICIT warning. Whenever we see this, it is an indication that we are comparing two different data types and that SQL Server had to convert one side in order to make the comparison. When the column is the side with the lower-precedence type, SQL Server converts every single value in the column prior to applying the filter.

When we hover over the NationalIDNumber column in SSMS, we can confirm that it is in fact an NVARCHAR(15). The value we are comparing it to is an integer, which has a higher precedence than NVARCHAR, so the column is the side that gets converted. The solution to this problem is very similar to when we had a function on a column: move the conversion over to the scalar value, instead of the column. In this case, we would change the scalar value 658797903 to the string representation, '658797903':

SELECT
	EMP.BusinessEntityID,
	EMP.LoginID,
	EMP.JobTitle
FROM HumanResources.Employee EMP
WHERE EMP.NationalIDNumber = '658797903';

This simple change will completely alter how the query optimizer handles the query:

The result is an index seek instead of a scan, less IO, and the implicit conversion warning is gone from our execution plan.

Implicit conversions are easy to spot as you’ll get a prominent warning from SQL Server in the execution plan whenever it happens. Once you’ve been tipped off to this problem, you can check the data types of the columns indicated in the warning and resolve the issue.
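
Checking the column’s declared type does not require hovering in SSMS. The sketch below looks it up from the catalog views and then passes the filter value as a variable of the matching type. The variable name @NationalIDNumber is an illustrative choice, and the approach assumes the calling code is free to pick the parameter type.

```sql
-- Look up the declared type of the column named in the warning.
SELECT
	col.name AS column_name,
	typ.name AS data_type,
	col.max_length AS max_length_bytes   -- NVARCHAR stores 2 bytes per character
FROM sys.columns AS col
INNER JOIN sys.types AS typ
ON typ.user_type_id = col.user_type_id
WHERE col.object_id = OBJECT_ID('HumanResources.Employee')
AND col.name = 'NationalIDNumber';

-- Declare the value with the same type as the column so that no conversion
-- has to be applied to the column side of the comparison.
DECLARE @NationalIDNumber NVARCHAR(15) = N'658797903';

SELECT
	EMP.BusinessEntityID,
	EMP.LoginID,
	EMP.JobTitle
FROM HumanResources.Employee EMP
WHERE EMP.NationalIDNumber = @NationalIDNumber;
```

The same check is worth making for parameters supplied by application code, where a mismatched type is harder to see than a literal in a query window.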

Conclusion

Query optimization is a huge topic that can easily become overwhelming without a good dose of focus. The best way to approach a performance problem is to find specific areas of focus that are most likely the cause of latency. A stored procedure could be 10,000 lines long, but only a single line needs to be addressed to resolve the problem. In these scenarios, finding the suspicious, high-cost, high resource-consuming parts of a script can quickly narrow down the search and allow us to solve a problem rather than hunt for it.

The information in this article should provide a good starting point for tackling latency and performance problems. Query optimization sometimes requires additional resources, such as adding a new index, but often can end up as a freebie. When we can improve performance solely by rewriting a query, we reduce resource consumption at no cost (aside from our time). As a result, query optimization can be a direct source of cost-savings! In addition to saving money, resources, and the sanity of those waiting for queries to complete, there is a great deal of satisfaction to be gained by improving a process at no further cost to anyone else.

Thanks for reading, and let’s keep on making things go faster!

Table of contents

Query optimization techniques in SQL Server: the basics

Query optimization techniques in SQL Server: tips and tricks

Query optimization techniques in SQL Server: Database Design and Architecture

Query Optimization Techniques in SQL Server: Parameter Sniffing

Translated from: https://www.sqlshack.com/query-optimization-techniques-in-sql-server-the-basics/

python os.environ_python os.environ 读取和设置环境变量 weixin_39605414 python os.environ
>>>importos>>>os.environ.keys()['LC_NUMERIC','GOPATH','GOROOT','GOBIN','LESSOPEN','SSH_CLIENT','LOGNAME','USER','HOME','LC_PAPER','PATH','DISPLAY','LANG','TERM','SHELL','J2REDIR','LC_MONETARY','QT_QPA
关于提高复杂业务逻辑代码可读性的思考编程经验分享开发经验 java 数据库开发语言
目录前言需求场景常规写法拆分方法领域对象总结前言实际工作中大部分时间都是在写业务逻辑，一般都是三层架构，表示层（Controller）接收客户端请求，并对入参做检验，业务逻辑层（Service）负责处理业务逻辑，一般开发都是在这一层中写具体的业务逻辑。数据访问层（Dao）是直接和数据库交互的，用于查数据给业务逻辑层，或者是将业务逻辑层处理后的数据写入数据库。简单的增删改查接口不用多说，基本上写好一
SQL Server_查询某一数据库中的所有表的内容 qq_42772833 SQL Server 数据库 sqlserver
1.查看所有表的表名要列出CrabFarmDB数据库中的所有表（名），可以使用以下SQL语句：USECrabFarmDB;--切换到目标数据库GOSELECTTABLE_NAMEFROMINFORMATION_SCHEMA.TABLESWHERETABLE_TYPE='BASETABLE';对这段SQL脚本的解释：SELECTTABLE_NAME：这个语句的作用是从查询结果中选择TABLE_NAM
使用Faiss进行高效相似度搜索 llzwxh888 faiss python
在现代AI应用中，快速和高效的相似度搜索是至关重要的。Faiss（FacebookAISimilaritySearch）是一个专门用于快速相似度搜索和聚类的库，特别适用于高维向量。本文将介绍如何使用Faiss来进行相似度搜索，并结合Python代码演示其基本用法。什么是Faiss？Faiss是一个由FacebookAIResearch团队开发的开源库，主要用于高维向量的相似性搜索和聚类。Faiss
python是什么意思中文-在python中%是什么意思编程大乐趣
Python中%有两种：1、数值运算：%代表取模，返回除法的余数。如：>>>7%212、%操作符（字符串格式化，stringformatting），说明如下：%[(name)][flags][width].[precision]typecode(name)为命名flags可以有+，-，''或0。+表示右对齐。-表示左对齐。''为一个空格，表示在正数的左侧填充一个空格，从而与负数对齐。0表示使用0填
深入理解 MultiQueryRetriever：提升向量数据库检索效果的强大工具 nseejrukjhad 数据库 python
深入理解MultiQueryRetriever：提升向量数据库检索效果的强大工具引言在人工智能和自然语言处理领域，高效准确的信息检索一直是一个关键挑战。传统的基于距离的向量数据库检索方法虽然广泛应用，但仍存在一些局限性。本文将介绍一种创新的解决方案：MultiQueryRetriever，它通过自动生成多个查询视角来增强检索效果，提高结果的相关性和多样性。MultiQueryRetriever的工
Day1笔记-Python简介&标识符和关键字&输入输出 ~在杰难逃~ Python python 开发语言大数据数据分析数据挖掘
大家好，从今天开始呢，杰哥开展一个新的专栏，当然，数据分析部分也会不定时更新的，这个新的专栏主要是讲解一些Python的基础语法和知识，帮助0基础的小伙伴入门和学习Python，感兴趣的小伙伴可以开始认真学习啦！一、Python简介【了解】1.计算机工作原理编程语言就是用来定义计算机程序的形式语言。我们通过编程语言来编写程序代码，再通过语言处理程序执行向计算机发送指令，让计算机完成对应的工作，编程
python八股文面试题分享及解析(1) Shawn________ python
#1.'''a=1b=2不用中间变量交换a和b'''#1.a=1b=2a,b=b,aprint(a)print(b)结果：21#2.ll=[]foriinrange(3):ll.append({'num':i})print(11)结果:#[{'num':0},{'num':1},{'num':2}]#3.kk=[]a={'num':0}foriinrange(3):#0,12#可变类型，不仅仅改变
每日算法&面试题，大厂特训二十八天——第二十天（树）肥学 ⚡算法题⚡面试题每日精进 java 算法数据结构
目录标题导读算法特训二十八天面试题点击直接资料领取导读肥友们为了更好的去帮助新同学适应算法和面试题，最近我们开始进行专项突击一步一步来。上一期我们完成了动态规划二十一天现在我们进行下一项对各类算法进行二十八天的一个小总结。还在等什么快来一起肥学进行二十八天挑战吧！！特别介绍小白练手专栏，适合刚入手的新人欢迎订阅编程小白进阶python有趣练手项目里面包括了像《机器人尬聊》《恶搞程序》这样的有趣文章
Python快速入门 —— 第三节：类与对象孤华暗香 Python快速入门 python 开发语言
第三节：类与对象目标：了解面向对象编程的基础概念，并学会如何定义类和创建对象。内容：类与对象：定义类：class关键字。类的构造函数：__init__()。类的属性和方法。对象的创建与使用。示例：classStudent:def__init__(self,name,age,major):self.name&#
MongoDB Oplog 窗口喝醉酒的小白 MongoDB 运维
在MongoDB中，oplog（操作日志）是一个特殊的日志系统，用于记录对数据库的所有写操作。oplog允许副本集成员（通常是从节点）应用主节点上已经执行的操作，从而保持数据的一致性。它是MongoDB副本集实现数据复制的基础。MongoDBOplog窗口oplog窗口是指在MongoDB副本集中，从节点可以用来同步数据的时间范围。这个窗口通常由以下因素决定：Oplog大小：oplog的大小是有限
pyecharts——绘制柱形图折线图 2224070247 信息可视化 python java 数据可视化
一、pyecharts概述自2013年6月百度EFE(ExcellentFrontEnd）数据可视化团队研发的ECharts1.0发布到GitHub网站以来，ECharts一直备受业界权威的关注并获得广泛好评，成为目前成熟且流行的数据可视化图表工具，被应用到诸多数据可视化的开发领域。Python作为数据分析领域最受欢迎的语言，也加入ECharts的使用行列，并研发出方便Python开发者使用的数据
Python 实现图片裁剪（附代码） | Python工具剑客阿良_ALiang
前言本文提供将图片按照自定义尺寸进行裁剪的工具方法，一如既往的实用主义。环境依赖ffmpeg环境安装，可以参考我的另一篇文章：windowsffmpeg安装部署_阿良的博客-CSDN博客本文主要使用到的不是ffmpeg，而是ffprobe也在上面这篇文章中的zip包中。ffmpy安装：pipinstallffmpy-ihttps://pypi.douban.com/simple代码不废话了，上代码
【华为OD技术面试真题 - 技术面】- python八股文真题题库（4) 算法大师华为od 面试 python
华为OD面试真题精选专栏：华为OD面试真题精选目录:2024华为OD面试手撕代码真题目录以及八股文真题目录文章目录华为OD面试真题精选**1.Python中的`with`**用途和功能自动资源管理示例：文件操作上下文管理协议示例代码工作流程解析优点2.\_\_new\_\_和**\_\_init\_\_**区别__new____init__区别总结3.**切片（Slicing）操作**基本切片语法
python os 环境变量 CV矿工 python 开发语言 numpy
环境变量：环境变量是程序和操作系统之间的通信方式。有些字符不宜明文写进代码里，比如数据库密码，个人账户密码，如果写进自己本机的环境变量里，程序用的时候通过os.environ.get（）取出来就行了。os.environ是一个环境变量的字典。环境变量的相关操作importos"""设置/修改环境变量：os.environ[‘环境变量名称’]=‘环境变量值’#其中key和value均为string类
Python爬虫解析工具之xpath使用详解 eqa11 python 爬虫开发语言
文章目录Python爬虫解析工具之xpath使用详解一、引言二、环境准备1、插件安装2、依赖库安装三、xpath语法详解1、路径表达式2、通配符3、谓语4、常用函数四、xpath在Python代码中的使用1、文档树的创建2、使用xpath表达式3、获取元素内容和属性五、总结Python爬虫解析工具之xpath使用详解一、引言在Python爬虫开发中，数据提取是一个至关重要的环节。xpath作为一门
【PG】常见数据库、表属性设置江无羡数据库
PG的常见属性配置方法数据库复制、备份相关表的复制标识单表操作批量表操作链接数据库复制、备份相关表的复制标识单表操作通过ALTER语句单独更改一张表的复制标识。ALTERTABLE[tablename]REPLICAIDENTITYFULL;批量表操作通过代码块的方式，对某个schema中的所有表一起更新其复制标识。SELECTtablename,CASErelreplidentWHEN'd'TH
【华为OD技术面试真题 - 技术面】- python八股文真题题库（1）算法大师华为od 面试 python
华为OD面试真题精选专栏：华为OD面试真题精选目录:2024华为OD面试手撕代码真题目录以及八股文真题目录文章目录华为OD面试真题精选1.数据预处理流程数据预处理的主要步骤工具和库2.介绍线性回归、逻辑回归模型线性回归（LinearRegression）模型形式：关键点：逻辑回归（LogisticRegression）模型形式：关键点：参数估计与评估：3.python浅拷贝及深拷贝浅拷贝（Shal
数字里的世界17期：2021年全球10大顶级数据中心，中国移动榜首张三叨
你知道吗？2016年，全球的数据中心共计用电4160亿千瓦时，比整个英国的发电量还多40％！前言每天，我们都会创造超过250万TB的数据。并且随着物联网（IOT）的不断普及，这一数据将持续增长。如此庞大的数据被存储在被称为“数据中心”的专用设施中。虽然最早的数据中心建于20世纪40年代，但直到1997-2000年的互联网泡沫期间才逐渐成为主流。当前人类的技术，比如人工智能和机器学习，已经将我们推向
nosql数据库技术与应用知识点皆过客，揽星河 NoSQL nosql 数据库大数据数据分析数据结构非关系型数据库
Nosql知识回顾大数据处理流程数据采集(flume、爬虫、传感器)数据存储(本门课程NoSQL所处的阶段)Hdfs、MongoDB、HBase等数据清洗(入仓)Hive等数据处理、分析(Spark、Flink等)数据可视化数据挖掘、机器学习应用(Python、SparkMLlib等)大数据时代存储的挑战(三高)高并发(同一时间很多人访问)高扩展(要求随时根据需求扩展存储)高效率(要求读写速度快)
《Python数据分析实战终极指南》 xjt921122 python 数据分析开发语言
对于分析师来说，大家在学习Python数据分析的路上，多多少少都遇到过很多大坑**，有关于技能和思维的**：Excel已经没办法处理现有的数据量了，应该学Python吗？找了一大堆Python和Pandas的资料来学习，为什么自己动手就懵了？跟着比赛类公开数据分析案例练了很久，为什么当自己面对数据需求还是只会数据处理而没有分析思路？学了对比、细分、聚类分析，也会用PEST、波特五力这类分析法，为啥
java线程Thread和Runnable区别和联系 zx_code java jvm thread 多线程 Runnable
我们都晓得java实现线程2种方式，一个是继承Thread，另一个是实现Runnable。模拟窗口买票，第一例子继承thread，代码如下 package thread; public class ThreadTest { public static void main(String[] args) { Thread1 t1 = new Thread1(
【转】JSON与XML的区别比较丁_新 json xml
1.定义介绍 (1).XML定义扩展标记语言 (Extensible Markup Language, XML) ，用于标记电子文件使其具有结构性的标记语言，可以用来标记数据、定义数据类型，是一种允许用户对自己的标记语言进行定义的源语言。 XML使用DTD(document type definition)文档类型定义来组织数据;格式统一，跨平台和语言，早已成为业界公认的标准。 XML是标
c++ 实现五种基础的排序算法 CrazyMizzz C++c 算法
#include<iostream> using namespace std; //辅助函数，交换两数之值 template<class T> void mySwap(T &x, T &y){ T temp = x; x = y; y = temp; } const int size = 10; //一、用直接插入排
我的软件麦田的设计者我的软件音乐类娱乐放松
这是我写的一款app软件，耗时三个月，是一个根据央视节目开门大吉改变的，提供音调，猜歌曲名。1、手机拥有者在android手机市场下载本APP，同意权限，安装到手机上。2、游客初次进入时会有引导页面提醒用户注册。（同时软件自动播放背景音乐）。3、用户登录到主页后，会有五个模块。a、点击不胫而走，用户得到开门大吉首页部分新闻，点击进入有新闻详情。b、
linux awk命令详解被触发 linux awk
awk是行处理器: 相比较屏幕处理的优点，在处理庞大文件时不会出现内存溢出或是处理缓慢的问题，通常用来格式化文本信息 awk处理过程: 依次对每一行进行处理，然后输出 awk命令形式: awk [-F|-f|-v] ‘BEGIN{} //{command1; command2} END{}’ file [-F|-f|-v]大参数，-F指定分隔符，-f调用脚本，-v定义变量 var=val
各种语言比较 _wy_ 编程语言
Java Ruby PHP 擅长领域
oracle 中数据类型为clob的编辑知了ing oracle clob
public void updateKpiStatus(String kpiStatus,String taskId){ Connection dbc=null; Statement stmt=null; PreparedStatement ps=null; try { dbc = new DBConn().getNewConnection(); //stmt = db
分布式服务框架 Zookeeper -- 管理分布式环境中的数据矮蛋蛋 zookeeper
原文地址： http://www.ibm.com/developerworks/cn/opensource/os-cn-zookeeper/ 安装和配置详解本文介绍的 Zookeeper 是以 3.2.2 这个稳定版本为基础，最新的版本可以通过官网 http://hadoop.apache.org/zookeeper/来获取，Zookeeper 的安装非常简单，下面将从单机模式和集群模式两
tomcat数据源 alafqq tomcat
数据库 JNDI(Java Naming and Directory Interface，Java命名和目录接口)是一组在Java应用中访问命名和目录服务的API。没有使用JNDI时我用要这样连接数据库： 03. Class.forName("com.mysql.jdbc.Driver"); 04. conn
遍历的方法百合不是茶遍历
遍历在java的泛
linux查看硬件信息的命令 bijian1013 linux
linux查看硬件信息的命令一.查看CPU： cat /proc/cpuinfo 二.查看内存： free 三.查看硬盘： df linux下查看硬件信息 1、lspci 列出所有PCI 设备； lspci - list all PCI devices:列出机器中的PCI设备（声卡、显卡、Modem、网卡、USB、主板集成设备也能
java常见的ClassNotFoundException bijian1013 java
1.java.lang.ClassNotFoundException: org.apache.commons.logging.LogFactory 添加包common-logging.jar2.java.lang.ClassNotFoundException: javax.transaction.Synchronization
【Gson五】日期对象的序列化和反序列化 bit1129 反序列化
对日期类型的数据进行序列化和反序列化时，需要考虑如下问题： 1. 序列化时，Date对象序列化的字符串日期格式如何 2. 反序列化时，把日期字符串序列化为Date对象，也需要考虑日期格式问题 3. Date A -> str -> Date B,A和B对象是否equals 默认序列化和反序列化 import com
【Spark八十六】Spark Streaming之DStream vs. InputDStream bit1129 Stream
1. DStream的类说明文档： /** * A Discretized Stream (DStream), the basic abstraction in Spark Streaming, is a continuous * sequence of RDDs (of the same type) representing a continuous st
通过nginx获取header信息 ronin47 nginx header
1. 提取整个的Cookies内容到一个变量，然后可以在需要时引用，比如记录到日志里面， if ( $http_cookie ~* "(.*)$") { set $all_cookie $1; } 变量$all_cookie就获得了cookie的值，可以用于运算了
java-65.输入数字n，按顺序输出从1最大的n位10进制数。比如输入3，则输出1、2、3一直到最大的3位数即999 bylijinnan java
参考了网上的http://blog.csdn.net/peasking_dd/article/details/6342984 写了个java版的： public class Print_1_To_NDigit { /** * Q65.输入数字n，按顺序输出从1最大的n位10进制数。比如输入3，则输出1、2、3一直到最大的3位数即999 * 1.使用字符串
Netty源码学习-ReplayingDecoder bylijinnan java netty
ReplayingDecoder是FrameDecoder的子类，不熟悉FrameDecoder的，可以先看看 http://bylijinnan.iteye.com/blog/1982618 API说，ReplayingDecoder简化了操作，比如： FrameDecoder在decode时，需要判断数据是否接收完全： public class IntegerH
js特殊字符过滤 cngolon js特殊字符 js特殊字符过滤
1.js中用正则表达式过滤特殊字符, 校验所有输入域是否含有特殊符号function stripscript(s) { var pattern = new RegExp("[`~!@#$^&*()=|{}':;',\\[\\].<>/?~！@#￥……&*（）——|{}【】‘；：”“'。，、？]"
hibernate使用sql查询 ctrain Hibernate
import java.util.Iterator; import java.util.List; import java.util.Map; import org.hibernate.Hibernate; import org.hibernate.SQLQuery; import org.hibernate.Session; import org.hibernate.Transa
linux shell脚本中切换用户执行命令方法 daizj linux shell 命令切换用户
经常在写shell脚本时，会碰到要以另外一个用户来执行相关命令，其方法简单记下： 1、执行单个命令：su - user -c "command" 如：下面命令是以test用户在/data目录下创建test123目录 [root@slave19 /data]# su - test -c "mkdir /data/test123"
好的代码里只要一个 return 语句 dcj3sjt126com return
别再这样写了：public boolean foo() { if (true) { return true; } else { return false;
Android动画效果学习 dcj3sjt126com android
1、透明动画效果方法一：代码实现 public View onCreateView(LayoutInflater inflater, ViewGroup container, Bundle savedInstanceState) { View rootView = inflater.inflate(R.layout.fragment_main, container, fals
linux复习笔记之bash shell (4)管道命令 eksliang linux管道命令汇总 linux管道命令 linux常用管道命令
转载请出自出处： http://eksliang.iteye.com/blog/2105461 bash命令执行的完毕以后，通常这个命令都会有返回结果，怎么对这个返回的结果做一些操作呢？那就得用管道命令‘|’。上面那段话，简单说了下管道命令的作用，那什么事管道命令呢？答：非常的经典的一句话，记住了，何为管
Android系统中自定义按键的短按、双击、长按事件 gqdy365 android
在项目中碰到这样的问题：由于系统中的按键在底层做了重新定义或者新增了按键，此时需要在APP层对按键事件（keyevent）做分解处理，模拟Android系统做法，把keyevent分解成： 1、单击事件：就是普通key的单击； 2、双击事件：500ms内同一按键单击两次； 3、长按事件：同一按键长按超过1000ms（系统中长按事件为500ms）； 4、组合按键：两个以上按键同时按住；
asp.net获取站点根目录下子目录的名称 hvt .net C#asp.net hovertree Web Forms
使用Visual Studio建立一个.aspx文件(Web Forms)，例如hovertree.aspx,在页面上加入一个ListBox代码如下： <asp:ListBox runat="server" ID="lbKeleyiFolder" /> 那么在页面上显示根目录子文件夹的代码如下： string[] m_sub
Eclipse程序员要掌握的常用快捷键 justjavac java eclipse 快捷键 ide
判断一个人的编程水平，就看他用键盘多，还是鼠标多。用键盘一是为了输入代码（当然了，也包括注释），再有就是熟练使用快捷键。曾有人在豆瓣评《卓有成效的程序员》：“人有多大懒，才有多大闲”。之前我整理了一个程序员图书列表，目的也就是通过读书，让程序员变懒。写道程序员作为特殊的群体，有的人可以这么懒，懒到事情都交给机器去做，而有的人又可
c++编程随记 lx.asymmetric C++笔记
为了字体更好看，改变了格式…… &&运算符： #include<iostream> using namespace std; int main(){ int a=-1,b=4,k; k=(++a<0)&&!(b--
linux标准IO缓冲机制研究音频数据 linux
一、什么是缓存I/O(Buffered I/O)缓存I/O又被称作标准I/O,大多数文件系统默认I/O操作都是缓存I/O。在Linux的缓存I/O机制中，操作系统会将I/O的数据缓存在文件系统的页缓存(page cache)中，也就是说，数据会先被拷贝到操作系统内核的缓冲区中，然后才会从操作系统内核的缓冲区拷贝到应用程序的地址空间。1.缓存I/O有以下优点:A.缓存I/O使用了操作系统内核缓冲区，
随想生活暗黑小菠萝生活
其实账户之前就申请了，但是决定要自己更新一些东西看也是最近。从毕业到现在已经一年了。没有进步是假的，但是有多大的进步可能只有我自己知道。毕业的时候班里12个女生，真正最后做到软件开发的只要两个包括我，PS：我不是说测试不好。当时因为考研完全放弃找工作，考研失败，我想这只是我的借口。那个时候才想到为什么大学的时候不能好好的学习技术，增强自己的实战能力，以至于后来找工作比较费劲。我
我认为POJO是一个错误的概念 windshome java POJO 编程 J2EE 设计
这篇内容其实没有经过太多的深思熟虑，只是个人一时的感觉。从个人风格上来讲，我倾向简单质朴的设计开发理念；从方法论上，我更加倾向自顶向下的设计；从做事情的目标上来看，我追求质量优先，更愿意使用较为保守和稳妥的理念和方法。 &