Top 10 steps to optimize data access in SQL Server: Part I (Use indexing)

http://www.codeproject.com/Articles/34372/Top-10-steps-to-optimize-data-access-in-SQL-Server

Top 10 steps to optimize data access in SQL Server: Part I (use indexing)

By Al-Farooque Shubho, 28 Apr 2009

Introduction

"It's been months since you and your team have developed and deployed a site successfully in the internet. You have a pretty satisfied client so far as the site was able to attract thousands of users to register and use the site within a small amount of time. Your client, management, team and you - everybody is happy.

Life is not a bed of roses. As the number of users in the site started growing at a rapid rate day by day, problems started occurring. E-mails started to arrive from the client complaining that the site is performing too slowly (some of them ware angry mails). The client claimed that they started losing users.

You start investigating the application. Soon you discover that the production database was performing extremely slowly when the application was trying to access/update data. Looking into the database, you find that the database tables have grown large in size and some of them were containing hundreds of thousands of rows. The testing team performed a test on the production site, and they found that the order submission process was taking 5 long minutes to complete, whereas it used to take only 2/3 seconds to complete in the test site before production launch."

This is the same old story for thousands of application projects developed worldwide. Almost every developer, including me, has taken part in the story sometime in his/her development life. I know why such situations take place, and I can tell you what to do to overcome this.

Let's face it. If you are part of this story, you must have not written the data access routines in your application in the best possible way, and it's time to optimize those now. I want to help you do this by sharing my data access optimization experiences and findings with you in this series of articles. I just hope this might enable you to optimize your data access routines in existing systems, or to develop data access routines in an optimized way in your future projects.

Scope

Please note that the primary focus of this series of articles is "data access performance optimization in transactional (OLTP) SQL Server databases". But, most of the optimization techniques are roughly the same for other database platforms.

Also, the optimization techniques I am going to discuss are applicable for software application developers only. That is, as a developer, I'll focus on the issues that you need to follow to make sure that you have done everything that you could do to optimize the data access code you have written or you are going to write in future. Database Administrators (DBAs) also have a great role to play in optimizing and tuning database performance. But, optimization scopes that fall into a DBA's area are out of scope for these articles.

We have a database to optimize, let's start!

When a database based application performs slowly, there is a 90% probability that the data access routines of that application are not optimized, or not written in the best possible way. So, you need to review and optimize your data access/manipulation routines for improving the overall application performance.

Let us start our optimization mission in a step-by-step process:

Step 1: Apply proper indexing in the table columns in the database

Well, some could argue whether implementing proper indexing should be the first step in the performance optimization process for a database. But I would prefer applying indexing properly in the database in the first place, because of the following two reasons:

This will allow you to achieve the best possible performance in the quickest amount of time in a production system.
Applying/creating indexes in the database will not require you to do any application modification, and thus will not require any build and deployment.

Of course, this quick performance improvement can be achieved if you find that indexing is not properly done in the current database. However, if indexing is already done, I would still recommend you to go through this step.

What is indexing?

I believe you know what indexing is. But, I've seen many people being unclear on this. So, let us try to understand indexing once again. Let us read a small story.

Long ago, there was a big library in an ancient city. It had thousands of books, but the books ware not arranged in any order in the book shelves. So, each time a person asked for a book to the librarian, the librarian had no way but to check every book to find the required book that the person wanted. Finding the desired book used to take hours for the librarian, and most of the time, the persons who asked for the book had to wait for a long time.

[Hmm... sounds like a table that has no primary key. When data is searched for in a table, the database engine has to scan through the entire table to find the corresponding row, which performs very slow.]

Life was getting miserable for the librarian as the number of books and persons asking for books increased day by day. Then one day, a wise guy came to the library, and seeing the librarian's measurable life, he advised him to number each book and arrange the book shelves according to their numbers. "What benefit would I get?", asked the librarian. The wise guy answered, "Well, now if somebody gives you a book number and asks for that book, you will be able to find the shelves quickly that contains the book's number, and within that shelf, you can find that book very quickly as books are arranged according to their number".

[Numbering the books sounds like creating a primary key in a database table. When you create a primary key in a table, a clustered index tree is created, and all data pages containing the table rows are physically sorted in the file system according to their primary key values. Each data page contains rows, which are also sorted within the data page according to their primary key values. So, each time you ask for any row from the table, the database server finds the corresponding data page first using the clustered index tree (like finding the book shelf first) and then finds the desired row within the data page that contains the primary key value (like finding the book within the shelf).]

"This is exactly what I need!" The excited librarian instantly started numbering the books and arranging them across different book shelves. He spent a whole day to do this arrangement, but at the end of the day, he tested and found that a book now could be found using the number within no time at all! The librarian was extremely happy.

[That's exactly what happens when you create a primary key in a table. Internally, a clustered index tree is created, and the data pages are physically sorted within the data file according to the primary key values. As you can easily understand, only one clustered index can be created for a table as the data can be physically arranged only using one column value as the criteria (primary key). It's like the books can only be arranged using one criterion (book number here).]

Wait! The problem was not completely solved yet. The very next day, a person asked a book by the book's name (he didn't have the book's number, all he had was the book's name). The poor librarian had no way but to scan all the numbered books from 1 to N to find the one the person asked for. He found the book in the 67^th shelf. It took 20 minutes for the librarian to find the book. Earlier, he used to take 2-3 hours to find a book when they were not arranged in the shelves, so that was an improvement. But, comparing to the time to search a book using its number (30 seconds), this 20 minute seemed to be a very high amount of time to the librarian. So, he asked the wise man how to improve on this.

[This happens when you have a Product table where you have a primary key ProductID, but you have no other index in the table. So, when a product is to be searched using the Product Name, the database engine has no way but to scan all physically sorted data pages in the file to find the desired item.]

The wise man told the librarian: "Well, as you have already arranged your books using their serial numbers, you cannot re-arrange them. Better create a catalog or index where you will have all the book's names and their corresponding serial numbers. In this catalog, arrange the book names in their alphabetic number and group the book names using each alphabet so that if any one wants to find a book named "Database Management System", you just follow these steps to find the book:

Jump into the section "D" of your book name "catalog" and find the book name there.
Read the corresponding serial number of the book and find the book using the serial number (you already know how to do this).

"You are a genius!", exclaimed the librarian. Spending some hours, he immediately created the "Book name" catalog, and with a quick test, he found that he only required a minute (30 seconds to find the book's serial number in the "Book name" catalog and 30 seconds to find the book using the serial number) to find a book using its name.

The librarian thought that people might ask for books using several other criteria like book name and/or author's name etc., so he created a similar catalog for author names, and after creating these catalogs, the librarian could find any book using a common book finding criteria (serial number, book name, author's name) within a minute. The miseries of the librarian ended soon, and lots of people started gathering at the library for books as they could get books really fast, and the library became very popular.

The librarian started passing his life happily ever after. The story ends.

By this time, I am sure you have understood what indexes really are, why they are important, and what their inner workings are. For example, if we have a "Products" table, along with creating a clustered index (that is automatically created when creating the primary key in the table), we should create a non-clustered index on the ProductName column. If we do this, the database engine creates an index tree for the non-clustered index (like the "book name" catalog in the story) where the product names will be sorted within the index pages. Each index page will contain a range of product names along with their corresponding primary key values. So, when a product is searched using the product name in the search criteria, the database engine will first seek the non-clustered index tree for product name to find the primary key value of the book. Once found, the database engine then searches the clustered index tree with the primary key to find the row for the actual item that is being searched.

Following is how an index tree looks like:

Index tree structure

This is called a B+ Tree (Balanced tree). The intermediate nodes contain a range of values and directs the SQL engine where to go while searching for a specific index value in the tree, starting from the root node. The leaf nodes are the nodes which contain the actual index values. If this is a clustered index tree, the leaf nodes are the physical data pages. If this is a non-clustered index tree, the leaf nodes contain index values along with the clustered index keys (which the database engine uses to find the corresponding row in the clustered index tree).

Usually, finding a desired value in the index tree and jumping to the actual row from there takes an extremely small amount of time for the database engine. So, indexing generally improves the data retrieval operations.

Time to apply indexing in your database to retrieve results fast!

Follow these steps to ensure proper indexing in your database:

Make sure that every table in your database has a primary key.

This will ensure that every table has a clustered index created (and hence, the corresponding pages of the table are physically sorted in the disk according to the primary key field). So, any data retrieval operation from the table using the primary key, or any sorting operation on the primary key field or any range of primary key values specified in the where clause will retrieve data from the table very fast.

Create non-clustered indexes on columns which are:

Frequently used in the search criteria
Used to join other tables
Used as foreign key fields
Of having high selectivity (column which returns a low percentage (0-5%) of rows from a total number of rows on a particular value)
Used in the ORDER BY clause
Of type XML (primary and secondary indexes need to be created; more on this in the coming articles)

Following is an example of an index creation command on a table:

Collapse | Copy Code

CREATE INDEX

NCLIX_OrderDetails_ProductID ON

dbo.OrderDetails(ProductID)

Alternatively, you can use SQL Server Management Studio to create an index on the desired table

Creating an index using SQL Server Management Studio

Step 2: Create the appropriate covering indexes

So you have created all the appropriate indexes in your database, right? Suppose, in this process, you have created an index on a foreign key column (ProductID) in the Sales(SelesID,SalesDate,SalesPersonID,ProductID,Qty) table. Now, assuming that the ProductID column is a "highly selective" column (selects less than 5% of the total number of rows rows using any ProductID value in the search criteria), any SELECT query that reads data from this table using the indexed column (ProductID) in the where clause should run fast, right?

Yes, it does, compared to the situation where no index is created on the foreign key column (ProductID), in which case, a full table scan is done (scanning all related pages in the table to retrieve desired data). But still, there is further scope to improve this query.

Let's assume that the Sales table contains 10,000 rows, and the following SQL selects 400 rows (4% of the total rows):

Collapse | Copy Code

SELECT SalesDate, SalesPersonID FROM Sales WHERE ProductID = 112

Let's try to understand how this SQL gets executed in the SQL execution engine:

The Sales table has a non-clustered index on the ProductID column. So it "seeks" the non-clustered index tree for finding the entry that contains ProductID=112.
The index page that contains the entry ProductID = 112 also contains all the clustered index keys (all primary key values, that is SalesIDs, that have ProductID = 112 assuming that the primary key is already created in the Sales table).
For each primary key (400 here), the SQL Server engine "seeks" into the clustered index tree to find the actual row locations in the corresponding page.
For each primary key, when found, the SQL Server engine selects the SalesDate and SalesPersonID column values from the corresponding rows.

Please note that in the above steps, for each of the primary key entries (400 here) for ProductID = 112, the SQL Server engine has to search the clustered index tree (400 times here) to retrieve the additional columns (SalesDate, SalesPersonID) in the query.

It seems that, along with containing clustered index keys (primary key values), if the non-clustered index page could also contain two other column values specified in the query (SalesDate, SalesPersonID), the SQL Server engine would not have to perform steps 3 and 4 above and, thus, would be able to select the desired results even faster just by "seeking" into the non-clustered index tree for the ProductID column and reading all the three mentioned column values directly from that index page.

Fortunately, there is a way to implement this feature. This is what is called "covered index". You create "covered indexes" in table columns to specify what additional column values the index page should store along with the clustered index key values (primary keys). Following is an example of creating a covered index on the ProductID column in the Sales table:

Collapse | Copy Code

CREATE INDEX NCLIX_Sales_ProductID--Index name

ON dbo.Sales(ProductID)--Column on which index is to be created

INCLUDE(SalesDate, SalesPersonID)--Additional column values to include

Please note that covered index should be created including a few columns that are frequently used in the Select queries. Including too many columns in the covered indexes would not give you too much benefit. Rather, doing this would require too much memory to store all the covered index column values, resulting in over consumption of memory and slow performance.

Use the Database Tuning Advisor's help while creating covered index

We all know, when a SQL is issued, the optimizer in the SQL Server engine dynamically generates different query plans based on:

Volume of data
Statistics
Index variation
Parameter value in TSQL
Load on server

That means, for a particular SQL, the execution plan generated in the production server may not be the same execution plan that is generated in the test server, even though the table and index structure are the same. This also indicates that an index created in the test server might boost some of your TSQL performance in the test application, but creating the same index in the production database might not give you any performance benefit in the production application! Why? Well, because the SQL execution plans in the test environment utilizes the newly created indexes and thus gives you better performance. But the execution plans that are being generated in the production server might not use the newly created index at all for some reasons (for example, a non-clustered index column is not "highly" selective in the production server database, which is not the case in the test server database).

So, while creating indexes, we need to make sure that the index would be utilized by the execution engine to produce faster results. But, how can we do this?

The answer is, we have to simulate the production server's load in the test server, and then need to create the appropriate indexes and test those. Only then, if the newly created indexes improve performance in the test environment, these will most likely improve performance in the production environment.

Doing this should be hard, but fortunately, we have some friendly tools to do this. Follow these instructions:

Use SQL Profiler to capture traces in the production server. Use the Tuning template (I know, it is advised not to use SQL Profiler in a production database, but sometimes you have to use it while diagnosing performance problems in production). If you are not familiar with this tool, or if you need to learn more about profiling and tracing using SQL Profiler, read http://msdn.microsoft.com/en-us/library/ms181091.aspx.
Use the trace file generated in the previous step to create a similar load in the test database server using the Database Tuning Advisor. Ask the Tuning Advisor to give some advice (index creation advice in most cases). You are most likely to get good realistic (index creation) advice from the Tuning Advisor (because the Tuning Advisor loads the test database with the trace generated from the production database and then tried to generate the best possible indexing suggestion). Using the Tuning Advisor tool, you can also create the indexes that it suggests. If you are not familiar with the Tuning Advisor tool, or if you need to learn more about using the Tuning Advisor, read http://msdn.microsoft.com/en-us/library/ms166575.aspx.

Step 3: Defragment indexes if fragmentation occurs

OK, you created all the appropriate indexes in your tables. Or, may be, indexes are already there in your database tables. But you might not still get the desired good performance according to your expectations.

There is a strong chance that index fragmentation has occurred.

What is index fragmentation?

Index fragmentation is a situation where index pages split due to heavy insert, update, and delete operations on the tables in the database. If indexes have high fragmentation, either scanning/seeking the indexes takes much time, or the indexes are not used at all (resulting in table scan) while executing queries. Thus, data retrieval operations perform slow.

Two types of fragmentation can occur:

Internal Fragmentation: Occurs due to data deletion/update operations in the index pages which end up in the distribution of data as a sparse matrix in the index/data pages (creates lots of empty rows in the pages). Also results in an increase of index/data pages that increases query execution time.
External Fragmentation: Occurs due to data insert/update operations in the index/data pages which end up in page splitting and allocation of new index/data pages that are not contiguous in the file system. That reduces performance in determining the query result where ranges are specified in the "where" clauses. Also, the database server cannot take advantage of the read-ahead operations as the next related data pages are not guaranteed to be contiguous, rather these next pages could be anywhere in the data file.

How to know whether index fragmentation has occurred or not?

Execute the following SQL in your database (the following SQL will work in SQL Server 2005 or later databases). Replace the database name 'AdventureWorks' with the target database name in the following query:

Collapse | Copy Code

SELECT object_name(dt.object_id) Tablename,si.name

IndexName,dt.avg_fragmentation_in_percent AS

ExternalFragmentation,dt.avg_page_space_used_in_percent AS

InternalFragmentation

FROM

(

    SELECT object_id,index_id,avg_fragmentation_in_percent,avg_page_space_used_in_percent

    FROM sys.dm_db_index_physical_stats (db_id('AdventureWorks'),null,null,null,'DETAILED'

)

WHERE index_id <> 0) AS dt INNER JOIN sys.indexes si ON si.object_id=dt.object_id

AND si.index_id=dt.index_id AND dt.avg_fragmentation_in_percent>10

AND dt.avg_page_space_used_in_percent<75 ORDER BY avg_fragmentation_in_percent DESC

The above query shows index fragmentation information for the 'AdventureWorks' database as follows:

Index fragmentation information

Analyzing the result, you can determine where the index fragmentation has occurred, using the following rules:

ExternalFragmentation value > 10 indicates external fragmentation occurred for the corresponding index
InternalFragmentation value < 75 indicates internal fragmentation occurred for the corresponding index

How to defragment indexes?

You can do this in two ways:

Reorganize the fragmented indexes: execute the following command to do this:

Collapse | Copy Code

ALTER INDEX ALL ON TableName REORGANIZE

Rebuild indexes: execute the following command to do this:

Collapse | Copy Code

ALTER INDEX ALL ON TableName REBUILD WITH (FILLFACTOR=90,ONLINE=ON)

You can also rebuild or reorganize individual indexes in the tables by using the index name instead of the 'ALL' keyword in the above queries. Alternatively, you can also use SQL Server Management Studio to do index defragmentation.

Rebuilding index using SQL Server Management Studio

When to reorganize and when to rebuild indexes?

You should "reorganize" indexes when the External Fragmentation value for the corresponding index is between 10-15 and the Internal Fragmentation value is between 60-75. Otherwise, you should rebuild indexes.

One important thing with index rebuilding is, while rebuilding indexes for a particular table, the entire table will be locked (which does not occur in the case of index reorganization). So, for a large table in the production database, this locking may not be desired, because rebuilding indexes for that table might take hours to complete. Fortunately, in SQL Server 2005, there is a solution. You can use the ONLINE option as ON while rebuilding indexes for a table (see the index rebuild command given above). This will rebuild the indexes for the table, along with making the table available for transactions.

Last words

It's really tempting to create an index on all eligible columns in your database tables. But, if you are working with a transactional database (an OLTP system where update operations take place most of the time), creating indexes on all eligible columns might not be desirable every time. In fact, creating heavy indexing on OLTP systems might reduce the overall database performance (as most operations are update operations, updating data means updating indexes as well).

A rule of thumb can be suggested as follows: if you work on a transactional database, you should not create more than 5 indexes on the tables on an average. On the other hand, if you work on a data warehouse application, you should be able to create up to 10 indexes on the tables on an average.

What's next?

Applying indexing properly in your database would enable you to increase performance a lot in a small amount of time. But there are lots of other things you should do to optimize your database, including some advanced indexing features in SQL Server databases. These will be covered in the other optimization steps provided in the next articles.

Take a look at the next optimization steps in the article "Top 10 steps to optimize data access in SQL Server: Part II (re-factor TSQLs and apply best practices)". Have fun.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

你可能感兴趣的:(SQL Server)

2025年渗透测试面试题总结-阿里巴巴-阿里云安全（二面）（题目+回答）独行soc 2025年渗透测试面试指南科技安全 web安全面试职场和发展红蓝攻防阿里云
网络安全领域各种资源，学习文档，以及工具分享、前沿信息分享、POC、EXP分享。不定期分享各种好玩的项目及好用的工具，欢迎关注。目录阿里巴巴-阿里云安全二面一、职业方向与技术偏好1.安全研究vs安全研发的定位二、云安全与身份认证2.云上PKI与身份认证的关注方向三、项目实践与成果3.字节跳动训练营项目四、攻防技术深度解析4.SQL注入攻防方案5.WAF防护原理五、团队协作与效能优化6.分工协作与个
MySQL锁开发小呆瓜数据库数据库 mysql
一、MySQL锁的分类1.按锁的粒度说明适用引擎表锁锁定整个表，并发性低，但开销小。MyISAM、InnoDB（部分场景）行锁仅锁定需要操作的行，并发性高，但开销较大。InnoDB页锁锁定数据页（介于表锁和行锁之间），较少使用。BDB（已废弃）2.按锁的模式说明共享锁（S锁）允许其他事务读取被锁定的行，但禁止修改（读锁）。排他锁（X锁）禁止其他事务读取或修改被锁定的行（写锁）。二、锁的应用场景1.
Flask--orm wakawakaohoh Flask
fromflaskimportFlask#1.导入模块fromflask_sqlalchemyimportSQLAlchemyapp=Flask(__name__)importos#获取绝对路径BASE_DIR=os.path.abspath(os.path.dirname(__file__))print(BASE_DIR)##2.配置数据库连接classConfig(object):#连接数据库
FLASK，ORM(mysql)，12条查询练习语句毛毛是一只狗《Python专栏》flaskORM
ORM查询语句练习，创表数据在flask课件里，在我资源里可以下载查询所有用户数据查询有多少个用户查询第1个用户查询id为4的用户[3种方式]查询名字结尾字符为g的所有数据[开始/包含]查询名字不等于wang的所有数据[2种方式]查询名字和邮箱都以li开头的所有数据[2种方式]查询password是123456或者email以itheima.com结尾的所有数据查询id为[1,3,5,7,9]的用
python/R 连接 clickhouse weixin_41283198 python clickhouse r语言 python 大数据 r语言
1、python-clickhouseimportnumpyasnpfromclickhouse_driverimportClientimportpandasaspdsql=open('/opt/check_detect_local.sql','r',encoding='utf8')sqltxt=sql.readlines()print(len(sqltxt))sqls=[]foriinnp.ar
python flask sqlalchemy JSON 数据查询 Purple_Grape207 python flask python
classUser(db.Model):id=db.Column(db.Integer,primary_key=True)username=db.Column(db.String(80),unique=True,nullable=False)email=db.Column(db.String(120),unique=True,nullable=False)userInfos=db.Column(d
Python通过TCP端口和HTTP端口连接clickhouse的几种方法与报错解决有好的生发方法记得推荐给我 clickhouse
一、使用request库使用HTTP协议端口，默认为8123这种方法只能获取指定格式的数据importrequestsSSL_VERIFY=Falsehost='http://127.0.0.1:8123'//ip地址及HTTP协议端口query='select*fromdatabase.table_nameslimit5'//SQL语句user=('username','password')//
前端小食堂 | Day16 - 前端监控の天眼通喵爪排序前端
️今日天眼：错误追踪与性能透视1.错误监控の捕虫网//全局错误捕获window.addEventListener('error',(e)=>{sendToServer({type:'JS_ERROR',message:e.message,stack:e.error?.stack,filename:e.filename,lineno:e.lineno});});//️Promise未捕获异常wind
Nginx配置 ngx_http_proxy_connect_module 模块及安装 huazhixuthink nginx 运维
1、配置完互联网yum源后，安装相关依赖软件包[root@serversoft]#yuminstall-ypatchpcrepcre-develmakegccgcc-c++opensslopenssh[root@serversoft]#yuminstallopenssl*2、解压缩软件，加载模块[root@serversoft]#lsnginx-1.20.2nginx-1.20.2.tar.gzn
使用 Golang 操作 MySQL yinhezhanshen golang mysql 开发语言
在Go语言中，操作SQL数据库，通常会用到一些第三方库来简化数据库的连接、查询和操作过程。其中原生的database/sql+go-sql-driver/mysql库更符合sql语句使用习惯。‌安装gogetgithub.com/go-sql-driver/mysql直接上代码来演示基本的创建，插入，更新，删除操作。packagemainimport("database/sql""fmt"_"gi
SpringBoot调用deepseek 想买CT5的小曹 spring boot 后端 java
1、效果截图：2、代码部分：application.propertiesserver.port=8080deepseek.api.token=sk-d34e929e887b4881813395241df2f745deepseek.api.url=https://api.deepseek.com/chat/completionscontroller部分请求参数可以缩短，写成实体类形式packagec
Visual Studio 2022和C++实现带多组标签的Snowflake SQL查询批量数据导出程序 weixin_30777913 c++云计算开发语言 sql 数据仓库
设计一个基于多个带标签SnowflakeSQL语句作为json配置文件的VisualStudio2022的C++代码程序，实现根据不同的输入参数自动批量地将Snowflake数据库的数据导出为CSV文件到本地目录上，标签加扩展名.csv为导出数据文件名，文件已经存在则覆盖原始文件。需要考虑SQL结果集是大数据量分批数据导出的情况，通过多线程和异步操作来提高程序性能，程序需要异常处理和输出，输出出错
Python Pandas带多组参数和标签的Snowflake数据库批量数据导出程序 weixin_30777913 pandas python 云计算数据仓库
设计一个基于多个带标签的SnowflakeSQL模板作为配置文件和多组参数的PythonPandas代码程序，实现根据不同的输入参数自动批量地将Snowflake数据库中的数据导出为CSV文件到指定目录上，然后逐个文件压缩为zip文件，标签和多个参数（以“_”分割）为组成导出数据文件名，文件已经存在则覆盖原始文件。需要考虑SQL结果集是大数据量分批数据导出的情况，通过多线程和异步操作来提高程序性能
C#带多组标签的Snowflake SQL查询批量数据导出程序 weixin_30777913 c#数据仓库云计算 sql
设计一个基于多个带标签SnowflakeSQL语句作为json配置文件的C#代码程序，实现根据不同的输入参数自动批量地将Snowflake数据库的数据导出为CSV文件到本地目录上，标签加扩展名.csv为导出数据文件名，文件已经存在则覆盖原始文件。需要考虑SQL结果集是大数据量分批数据导出的情况，通过多线程和异步操作来提高程序性能，程序需要异常处理和输出，输出出错时的错误信息，每次每个查询导出数据的
MySQL 技术浅析（聚簇索引、UndoLog、RedoLog、MVCC）代码没写完哪有脸睡觉 mysql 数据库
MySQL核心技术深度解析一、聚簇索引与非聚簇索引1.聚簇索引结构存储方式InnoDB中，聚簇索引的叶子节点直接存储完整数据行，数据按主键值物理排序存储。主键索引即数据文件，非叶子节点存储主键范围和子节点指针数据行与主键索引绑定，主键顺序决定磁盘存储顺序示例存储结构B+树结构：根节点→[id20;--索引设计为(name,age)2.事务控制建议控制事务粒度：单个事务执行时间<1秒批量操作分批次提
RabbitMQ实战（二）-消息持久化策略、事务以及Confirm消息确认方式 Java思享汇 RabbitMQ学习 RabbitMQ 消息持久化事务 confirm ack
「扫码关注我，面试、各种技术（mysql、zookeeper、微服务、redis、jvm）持续更新中～」RabbitMQ学习列表：RabbitMQ实战（一）-消息通信基本概念·在上一篇学习完RabbitMQ通信的基本概念后，我们来继续学习消息的持久化以及代码实现RabbitMQ通信。在正常生产环境运维过程中无法避免RabbitMQ服务器重启，那么，如果RabbitMQ重启之后，那些队列和交换器就会
【从零开始学习计算机科学】数据库系统（十一）云数据库、NoSQL 与 NewSQL 贫苦游商数据库学习 nosql newsql 云数据库 CAP sql
【从零开始学习计算机科学】数据库系统（十一）云数据库、NoSQL与NewSQL云数据库云服务器的服务云数据库和传统的分布式数据库的异同NoSQLNoSQL数据库的特点CAP定理NoSQL的特性NoSQL数据库的分类NoSQL的适用场景Nosql数据库实例-RedisRedis的优势MongoDBMongoDB的特点NewSQLNewSQL出现的背景NewSQL（新型分布式数据库）的概念NewSQL
MySql的MVCC实现原理 zyrr mysql mysql mvcc java
MySql的MVCC实现原理前言MVCC解决什么问题MVCC的实现3个隐式字段UndoLogReadView读视图大致流程读已提交和可重复隔离级别下的快照读前言什么是MVCC？MVCC(Multi-VersionConcurrencyControl)即多版本并发控制，是乐观锁的一种实现方式，在MySql数据库中主要是为了提高数据库的并发性能，做到读写冲突不加锁，这里的读指的是快照读。快照读与当前读
快速上手：ASP.NET Core MVC 与 EF Core 操作 MySQL 数据库完整实例殷连靖Harlan
快速上手：ASP.NETCoreMVC与EFCore操作MySQL数据库完整实例【下载地址】ASP.NETCoreMVC使用EF操作MySQL数据库完整实例ASP.NETCoreMVC使用EF操作MySQL数据库完整实例本资源提供了一套完整的示例项目，展示了如何在ASP.NETCoreMVC应用程序中使用EntityFramework(EF)来操作MySQL数据库项目地址:https://gitc
超详细Python教程——SQL详解之DDL 月流霜 python sql 数据库
SQL详解之DDL我们通常可以将SQL分为四类，分别是DDL（数据定义语言）、DML（数据操作语言）、DCL（数据控制语言）和TCL（事务控制语言）。DDL主要用于创建、删除、修改数据库中的对象，比如创建、删除和修改二维表，核心的关键字包括create、drop和alter；DML主要负责数据的插入、删除、更新和查询，关键词包括insert、delete、update和select；DCL用于授予
MySQL索引最左原则：从原理到实战的深度解析
MySQL索引最左原则：从原理到实战的深度解析一、什么是索引最左原则？索引最左原则是MySQL复合索引使用的核心规则，简单来说："当使用复合索引（多列索引）时，查询条件必须从索引的最左列开始，且不能跳过中间的列，否则索引将无法完全生效"为什么会有这个原则？这与B+树索引的存储结构密切相关：复合索引按照定义时的列顺序构建数据先按第一列排序第一列相同的情况下按第二列排序依此类推形成层级结构二、3种典型
【MySQL】MVCC详解与MVCC实现原理（MySQL专栏启动） 2401_89317296 mysql android 数据库
如果此文还不错的话，还请关注、点赞、收藏三连支持一下博主~本文目录本文导读一、什么是MVCC二、MVCC的实现原理1、MVCC多版本实现2、MVCC实现原理3、什么是ReadView3.1、ReadView解析3.2、ReadView含义3.3、ReadView如何判断版本链可用三、当前读，快照读与MVCC1、什么是当前读和快照读
SeaTunnel社区「Demo方舟计划」第2期活动上线—— MySQL同步至MySQL数据合并场景实战数据库
引言凌晨3点，某金融公司的数据工程师老王盯着屏幕上的报错信息陷入绝望——从32个分库同步的用户交易数据，在合并至中心MySQL表时，因主键冲突导致每天近10万条数据丢失。业务方投诉不断，而他能找到的解决方案，要么是REPLACEINTO性能低下，要么是INSERTIGNORE无法追溯冲突数据。这并非孤例，可能大部分新接触数据同步的工程师都会遇到以下情况：72%在分库分表合并场景中遭遇过主键冲突导致
在uni-app中使用SQLite today喝咖啡了吗 uni-app sqlite 数据库
目录1、引入sqlite模块1.1、android权限申请1.2、权限配置1.3、打包，制作自定义基座运行2、sqlite文件结构3、初始化文件index.js4、打开数据库5、查询数据6、可视化测试SQLite是一个进程内的库，实现了自给自足的、无服务器的、零配置的、事务性的SQL数据库引擎。它是一个零配置的数据库，这意味着与其他数据库不一样，您不需要在系统中配置。就像其他数据库，SQLite引
Android数据存储:SQLite、Room -风になる- Android基础 android
在Android平台上，集成了一个嵌入式关系型数据库—SQLite，SQLite3支持NULL、INTEGER、REAL（浮点数字）、TEXT(字符串文本)和BLOB(二进制对象)数据类型，虽然它支持的类型只有五种，但实际上sqlite3也接受varchar(n)、char(n)、decimal(p,s)等数据类型，只不过在运算或保存时会转成对应的五种数据类型。SQLite最大的特点是你可以把各种
Spring Cloud Alibaba RocketMQ 消息队列 AI天才研究院 Python实战自然语言处理人工智能语言模型编程实践开发语言架构设计
作者：禅与计算机程序设计艺术1.简介RocketMQ是一款开源、高性能、分布式消息中间件，它具备以下主要特征：支持海量消息堆积能力，支持发送10万+TPS，且不受单机容量限制；提供灵活的消息过滤机制，支持按照标签，SQL92标准的过滤语法进行消息过滤；丰富的消息订阅模型，包括广播消费，集群消费，事务消费等多种模式；内置丰富的管理控制台，通过WebUI来方便地对集群进行管理、监控及报警；高吞吐量，单
SQLite Truncate Table lsx202406 开发语言
SQLiteTruncateTableSQLite是一种轻量级的数据库管理系统，常用于嵌入式系统和小型应用程序。在处理数据库时，有时需要删除表中的所有数据，但保留表结构。这时，TRUNCATETABLE语句就派上用场了。本文将详细介绍SQLite中的TRUNCATETABLE语句，包括其用法、性能影响以及与DELETE语句的区别。1.简介TRUNCATETABLE语句用于删除表中的所有数据，但保留
Android 里SQLite和ROOM框架简单介绍大林不要掉头发 android 数据库
简单的AndroidSQLite使用最简单的SQLite在Android开发中，SQLite是一个轻量级的关系型数据库管理系统，经常用于存储和管理应用程序的数据。如果你刚刚学习Android数据库的使用，你一定要学习SQLite的使用。以下是一个简单的示例，展示了如何在Android应用中创建SQLite数据库、创建表、插入数据以及查询数据。创建SQLite数据库、创建表publicclassDB
Linux---sqlite3数据库磨十三数据库 linux sqlite
一、数据库分类1.按数据关系分类类型特点代表产品关系型数据库-使用SQL（结构化查询语言）-数据以行列形式存储，支持事务和复杂查询MySQL、Oracle、SQLite非关系型数据库-无固定表结构（如键值对、文档、图）-高扩展性，适合非结构化数据MongoDB、Redis2.按功能规模分类类型特点代表产品大型数据库高并发、高可用性，支持企业级应用Oracle、DB2中型数据库适用于中小型企业，跨平
【Android】使用Room数据库解决本地持久化吃汉堡吃到饱 android 数据库 jvm
【Android】使用Room数据库解决本地持久化Room概述Room是一个持久性库，属于AndroidJetpack的一部分。Room是SQLite数据库之上的一个抽象层。Room并不直接使用SQLite，而是负责简化数据库设置和配置以及与数据库交互方面的琐碎工作。此外，Room还提供SQLite语句的编译时检查。Room主要组件Room包含三个主要组件：数据实体表示应用的数据库中的表。数据实体
如何用ruby来写hadoop的mapreduce并生成jar包 wudixiaotie mapreduce
ruby来写hadoop的mapreduce，我用的方法是rubydoop。怎么配置环境呢： 1.安装rvm：不说了网上有 2.安装ruby：由于我以前是做ruby的，所以习惯性的先安装了ruby，起码调试起来比jruby快多了。 3.安装jruby： rvm install jruby然后等待安
java编程思想 -- 访问控制权限百合不是茶 java 访问控制权限单例模式
访问权限是java中一个比较中要的知识点,它规定者什么方法可以访问,什么不可以访问一:包访问权限; 自定义包: package com.wj.control; //包 public class Demo { //定义一个无参的方法 public void DemoPackage(){ System.out.println("调用
[生物与医学]请审慎食用小龙虾 comsci 生物
现在的餐馆里面出售的小龙虾,有一些是在野外捕捉的,这些小龙虾身体里面可能带有某些病毒和细菌,人食用以后可能会导致一些疾病,严重的甚至会死亡..... 所以,参加聚餐的时候,最好不要点小龙虾...就吃养殖的猪肉,牛肉,羊肉和鱼,等动物蛋白质
org.apache.jasper.JasperException: Unable to compile class for JSP: 商人shang maven 2.2 jdk1.8
环境： jdk1.8 maven tomcat7-maven-plugin 2.0 原因： tomcat7-maven-plugin 2.0 不知吃 jdk 1.8，换成 tomcat7-maven-plugin 2.2就行，即 <plugin>
你的垃圾你处理掉了吗?GC oloz GC
前序:本人菜鸟，此文研究学习来自网络，各位牛牛多指教　 1.垃圾收集算法的核心思想　　Java语言建立了垃圾收集机制，用以跟踪正在使用的对象和发现并回收不再使用(引用)的对象。该机制可以有效防范动态内存分配中可能发生的两个危险：因内存垃圾过多而引发的内存耗尽，以及不恰当的内存释放所造成的内存非法引用。　　垃圾收集算法的核心思想是：对虚拟机可用内存空间，即堆空间中的对象进行识别
shiro 和 SESSSION 杨白白 shiro
shiro 在web项目里默认使用的是web容器提供的session，也就是说shiro使用的session是web容器产生的，并不是自己产生的，在用于非web环境时可用其他来源代替。在web工程启动的时候它就和容器绑定在了一起，这是通过web.xml里面的shiroFilter实现的。通过session.getSession()方法会在浏览器cokkice产生JESSIONID，当关闭浏览器，此
移动互联网终端淘宝客如何实现盈利小桔子移動客戶端淘客淘寶App
2012年淘宝联盟平台为站长和淘宝客带来的分成收入突破30亿元，同比增长100%。而来自移动端的分成达1亿元，其中美丽说、蘑菇街、果库、口袋购物等App运营商分成近5000万元。可以看出，虽然目前阶段PC端对于淘客而言仍旧是盈利的大头，但移动端已经呈现出爆发之势。而且这个势头将随着智能终端(手机，平板)的加速普及而更加迅猛
wordpress小工具制作 aichenglong wordpress 小工具
wordpress 使用侧边栏的小工具，很方便调整页面结构小工具的制作过程 1 在自己的主题文件中新建一个文件夹(如widget)，在文件夹中创建一个php(AWP_posts-category.php) 小工具是一个类,想侧边栏一样，还得使用代码注册，他才可以再后台使用，基本的代码一层不变 <?php class AWP_Post_Category extends WP_Wi
JS微信分享 AILIKES js
// 所有功能必须包含在 WeixinApi.ready 中进行 WeixinApi.ready(function(Api) { // 微信分享的数据 var wxData = { &nb
封装探讨百合不是茶 JAVA面向对象封装
//封装属性方法将某些东西包装在一起，通过创建对象或使用静态的方法来调用，称为封装；封装其实就是有选择性地公开或隐藏某些信息，它解决了数据的安全性问题，增加代码的可读性和可维护性在 Aname类中申明三个属性，将其封装在一个类中：通过对象来调用例如 1： //属性将其设为私有姓名 name 可以公开
jquery radio/checkbox change事件不能触发的问题 bijian1013 JavaScript jquery
我想让radio来控制当前我选择的是机动车还是特种车，如下所示： <html> <head> <script src="http://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js" type="text/javascript"><
AngularJS中安全性措施 bijian1013 JavaScript AngularJS 安全性 XSRF JSON漏洞
在使用web应用中，安全性是应该首要考虑的一个问题。AngularJS提供了一些辅助机制，用来防护来自两个常见攻击方向的网络攻击。一.JSON漏洞当使用一个GET请求获取JSON数组信息的时候（尤其是当这一信息非常敏感，
[Maven学习笔记九]Maven发布web项目 bit1129 maven
基于Maven的web项目的标准项目结构 user-project user-core user-service user-web src
【Hive七】Hive用户自定义聚合函数(UDAF) bit1129 hive
用户自定义聚合函数，用户提供的多个入参通过聚合计算(求和、求最大值、求最小值)得到一个聚合计算结果的函数。问题：UDF也可以提供输入多个参数然后输出一个结果的运算，比如加法运算add(3，5)，add这个UDF需要实现UDF的evaluate方法,那么UDF和UDAF的实质分别究竟是什么？ Double evaluate(Double a, Double b)
通过 nginx-lua 给 Nginx 增加 OAuth 支持 ronin47
前言：我们使用Nginx的Lua中间件建立了OAuth2认证和授权层。如果你也有此打算，阅读下面的文档，实现自动化并获得收益。SeatGeek 在过去几年中取得了发展，我们已经积累了不少针对各种任务的不同管理接口。我们通常为新的展示需求创建新模块，比如我们自己的博客、图表等。我们还定期开发内部工具来处理诸如部署、可视化操作及事件处理等事务。在处理这些事务中，我们使用了几个不同的接口来认证： &n
利用tomcat-redis-session-manager做session同步时自定义类对象属性保存不上的解决方法 bsr1983 session
在利用tomcat-redis-session-manager做session同步时，遇到了在session保存一个自定义对象时，修改该对象中的某个属性，session未进行序列化，属性没有被存储到redis中。在 tomcat-redis-session-manager的github上有如下说明： Session Change Tracking As noted in the &qu
《代码大全》表驱动法-Table Driven Approach-1 bylijinnan java 算法
关于Table Driven Approach的一篇非常好的文章： http://www.codeproject.com/Articles/42732/Table-driven-Approach package com.ljn.base; import java.util.Random; public class TableDriven { public
Sybase封锁原理 chicony Sybase
昨天在操作Sybase IQ12.7时意外操作造成了数据库表锁定，不能删除被锁定表数据也不能往其中写入数据。由于着急往该表抽入数据，因此立马着手解决该表的解锁问题。无奈此前没有接触过Sybase IQ12.7这套数据库产品，加之当时已属于下班时间无法求助于支持人员支持，因此只有借助搜索引擎强大的
java异常处理机制 CrazyMizzz java
java异常关键字有以下几个，分别为 try catch final throw throws 他们的定义分别为 try： Opening exception-handling statement. catch： Captures the exception. finally： Runs its code before terminating
hive 数据插入DML语法汇总 daizj hive DML 数据插入
Hive的数据插入DML语法汇总1、Loading files into tables语法：1) LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]解释：1)、上面命令执行环境为hive客户端环境下： hive>l
工厂设计模式 dcj3sjt126com 设计模式
使用设计模式是促进最佳实践和良好设计的好办法。设计模式可以提供针对常见的编程问题的灵活的解决方案。工厂模式工厂模式（Factory）允许你在代码执行时实例化对象。它之所以被称为工厂模式是因为它负责“生产”对象。工厂方法的参数是你要生成的对象对应的类名称。 Example #1 调用工厂方法（带参数） <?phpclass Example{
mysql字符串查找函数 dcj3sjt126com mysql
FIND_IN_SET(str,strlist) 假如字符串str 在由N 子链组成的字符串列表strlist 中，则返回值的范围在1到 N 之间。一个字符串列表就是一个由一些被‘,’符号分开的自链组成的字符串。如果第一个参数是一个常数字符串，而第二个是type SET列，则 FIND_IN_SET() 函数被优化，使用比特计算。如果str不在strlist 或st
jvm内存管理 easterfly jvm
一、JVM堆内存的划分分为年轻代和年老代。年轻代又分为三部分：一个eden,两个survivor。工作过程是这样的：e区空间满了后，执行minor gc，存活下来的对象放入s0, 对s0仍会进行minor gc，存活下来的的对象放入s1中，对s1同样执行minor gc，依旧存活的对象就放入年老代中；年老代满了之后会执行major gc，这个是stop the word模式，执行
CentOS-6.3安装配置JDK-8 gengzg centos
JAVA_HOME=/usr/java/jdk1.8.0_45 JRE_HOME=/usr/java/jdk1.8.0_45/jre PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib export JAVA_HOME
【转】关于web路径的获取方法 huangyc1210 Web 路径
假定你的web application 名称为news,你在浏览器中输入请求路径： http://localhost:8080/news/main/list.jsp 则执行下面向行代码后打印出如下结果： 1、 System.out.println(request.getContextPath()); //可返回站点的根路径。也就是项
php里获取第一个中文首字母并排序远去的渡口数据结构 PHP
很久没来更新博客了，还是觉得工作需要多总结的好。今天来更新一个自己认为比较有成就的问题吧。最近在做储值结算，需求里结算首页需要按门店的首字母A-Z排序。我的数据结构原本是这样的： Array ( [0] => Array ( [sid] => 2885842 [recetcstoredpay] =&g
java内部类 hm4123660 java 内部类匿名内部类成员内部类方法内部类
　在Java中，可以将一个类定义在另一个类里面或者一个方法里面，这样的类称为内部类。内部类仍然是一个独立的类，在编译之后内部类会被编译成独立的.class文件，但是前面冠以外部类的类名和$符号。内部类可以间接解决多继承问题,可以使用内部类继承一个类，外部类继承一个类，实现多继承。 &nb
Caused by: java.lang.IncompatibleClassChangeError: class org.hibernate.cfg.Exten zhb8015
maven pom.xml关于hibernate的配置和异常信息如下，查了好多资料，问题还是没有解决。只知道是包冲突，就是不知道是哪个包....遇到这个问题的分享下是怎么解决的。。 maven pom: <dependency> <groupId>org.hibernate</groupId> <ar
Spark 性能相关参数配置详解－任务调度篇 Stark_Summer spark cache cpu 任务调度 yarn
随着Spark的逐渐成熟完善, 越来越多的可配置参数被添加到Spark中来, 本文试图通过阐述这其中部分参数的工作原理和配置思路, 和大家一起探讨一下如何根据实际场合对Spark进行配置优化。由于篇幅较长，所以在这里分篇组织，如果要看最新完整的网页版内容，可以戳这里：http://spark-config.readthedocs.org/，主要是便
css3滤镜 wangkeheng html css
经常看到一些网站的底部有一些灰色的图标，鼠标移入的时候会变亮，开始以为是js操作src或者bg呢，搜索了一下，发现了一个更好的方法：通过css3的滤镜方法。 html代码： <a href='' class='icon'><img src='utv.jpg' /></a> css代码： .icon{-webkit-filter: graysc