https://www.percona.com/doc/percona-toolkit/2.2/pt-archiver.html
Link to the original document above; interested readers can refer to the original. (This translation reflects my personal understanding; if you disagree, feel free to reply. Please credit the source when reposting.)
NAME
pt-archiver - Archive rows from a MySQL table into another table or a file.
Usage
pt-archiver [OPTIONS] --source DSN --where WHERE
pt-archiver nibbles records from a MySQL table. The --source and --dest arguments use DSN syntax; if COPY is yes, --dest defaults to the key's value from --source.
Examples
Archive all rows from oltp_server to olap_server and to a file:
pt-archiver --source h=oltp_server,D=test,t=tbl --dest h=olap_server \
  --file '/var/log/archive/%Y-%m-%d-%D.%t' \
  --where "1=1" --limit 1000 --commit-each
Purge (delete) orphan rows from a child table:
pt-archiver --source h=host,D=db,t=child --purge \
  --where 'NOT EXISTS(SELECT * FROM parent WHERE col=child.col)'
RISKS
Percona Toolkit is mature, proven in the real world, and well tested, but all database tools can pose a risk to the system and the database server. Before using this tool, please:
Read the tool’s documentation
Review the tool’s known “BUGS”
Test the tool on a non-production server
Backup your production server and verify the backups
DESCRIPTION
pt-archiver is the tool I use to archive tables as described in http://tinyurl.com/mysql-archiving. The goal is a low-impact, forward-only job to nibble old data out of the table without impacting OLTP queries much. You can insert the data into another table, which need not be on the same server. You can also write it to a file in a format suitable for LOAD DATA INFILE. Or you can do neither, in which case it’s just an incremental DELETE.
pt-archiver is extensible via a plugin mechanism. You can inject your own code to add advanced archiving logic that could be useful for archiving dependent data, applying complex business rules, or building a data warehouse during the archiving process.
You need to choose values carefully for some options. The most important are --limit, --retries, and --txn-size.
The strategy is to find the first row(s), then scan some index forward-only to find more rows efficiently. Each subsequent query should not scan the entire table; it should seek into the index, then scan until it finds more archivable rows. Specifying the index with the 'i' part of the --source argument can be crucial for this; use --dry-run to examine the generated queries, and be sure to EXPLAIN them to see if they are efficient (most of the time you probably want to scan the PRIMARY key, which is the default). Even better, examine the difference in the Handler status counters before and after running the query, and make sure it is not scanning the whole table every query.
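As a concrete sketch of this workflow (the host, database, table, and index names below are hypothetical placeholders, not from the original):

```shell
# Hypothetical names throughout; substitute your own host, db, table, index.
# --dry-run prints the SQL pt-archiver would run, without executing it:
pt-archiver --source h=host,D=db,t=tbl,i=idx_created \
  --where "created_at < '2013-01-01'" \
  --limit 1000 --purge --dry-run

# EXPLAIN the printed SELECT to confirm it seeks into the index instead of
# scanning the table; compare Handler counters before and after a real run:
mysql -h host -e "SHOW GLOBAL STATUS LIKE 'Handler_read%'"
```

If Handler_read_rnd_next grows by roughly the table size on every query, the tool is scanning the whole table rather than seeking into the index.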
You can disable the seek-then-scan optimizations partially or wholly with --no-ascend and --ascend-first. Sometimes this may be more efficient for multi-column keys. Be aware that pt-archiver is built to start at the beginning of the index it chooses and scan it forward-only. This might result in long table scans if you're trying to nibble from the end of the table by an index other than the one it prefers. See --source and read the documentation on the i part if this applies to you.
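For example, with a large multi-column index (the index and column names here are hypothetical), you could ascend only its leftmost column, or disable the ascent entirely, and compare the generated queries with --dry-run:

```shell
# Hypothetical index idx_st_created(status, created_at); ascend only the
# leftmost column (status) to avoid the cost of ascending all columns:
pt-archiver --source h=host,D=db,t=tbl,i=idx_st_created \
  --where "status = 'done'" \
  --limit 1000 --purge --ascend-first --dry-run

# Or disable the seek-then-scan ascent entirely:
pt-archiver --source h=host,D=db,t=tbl,i=idx_st_created \
  --where "status = 'done'" \
  --limit 1000 --purge --no-ascend --dry-run
```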
Percona XtraDB Cluster
pt-archiver works with Percona XtraDB Cluster (PXC) 5.5.28-23.7 and newer, but there are three limitations you should consider before archiving on a cluster:
Error on commit
pt-archiver does not check for error when it commits transactions. Commits on PXC can fail, but the tool does not yet check for or retry the transaction when this happens. If it happens, the tool will die.
MyISAM tables
Archiving MyISAM tables works, but MyISAM support in PXC is still experimental at the time of this release. There are several known bugs with PXC, MyISAM tables, and AUTO_INCREMENT columns. Therefore, you must ensure that archiving will not directly or indirectly result in the use of default AUTO_INCREMENT values for a MyISAM table. For example, this happens with --dest if --columns is used and the AUTO_INCREMENT column is not included. The tool does not check for this!
Non-cluster options
Certain options may or may not work. For example, if a cluster node is not also a slave, then --check-slave-lag does not work. And since PXC tables are usually InnoDB, but InnoDB doesn’t support INSERT DELAYED, then --delayed-insert does not work. Other options may also not work, but the tool does not check them, therefore you should test archiving on a test cluster before archiving on your real cluster.
OUTPUT
If you specify --progress, the output is a header row, plus status output at intervals. Each row in the status output lists the current date and time, how many seconds pt-archiver has been running, and how many rows it has archived.
If you specify --statistics, pt-archiver outputs timing and other information to help you identify which part of your archiving process takes the most time.
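A sketch combining both options (host, database, and table names are placeholders); --progress takes the number of rows between status lines:

```shell
# Print a status line every 1000 rows, plus timing statistics at exit so
# you can see whether the SELECT, INSERT, or DELETE phase dominates:
pt-archiver --source h=host,D=db,t=tbl --dest h=archive_host \
  --where "1=1" --limit 500 --progress 1000 --statistics
```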
ERROR-HANDLING
pt-archiver tries to catch signals and exit gracefully; for example, if you send it SIGTERM (Ctrl-C on UNIX-ish systems), it will catch the signal, print a message about the signal, and exit fairly normally. It will not execute --analyze or --optimize, because these may take a long time to finish. It will run all other code normally, including calling after_finish() on any plugins (see "EXTENDING").
In other words, a signal, if caught, will break out of the main archiving loop and skip optimize/analyze.
OPTIONS
Specify at least one of --dest, --file, or --purge.
--ignore and --replace are mutually exclusive.
--txn-size and --commit-each are mutually exclusive.
--low-priority-insert and --delayed-insert are mutually exclusive.
--share-lock and --for-update are mutually exclusive.
--analyze and --optimize are mutually exclusive.
--no-ascend and --no-delete are mutually exclusive.
DSN values in --dest default to values from --source if COPY is yes.
--analyze
type: string
Run ANALYZE TABLE afterwards on --source and/or --dest.
Runs ANALYZE TABLE after finishing. The argument is an arbitrary string. If it contains the letter ‘s’, the source will be analyzed. If it contains ‘d’, the destination will be analyzed. You can specify either or both. For example, the following will analyze both:
--analyze=ds
See http://dev.mysql.com/doc/en/analyze-table.html for details on ANALYZE TABLE.
--ascend-first
Ascend only first column of index.
If you do want to use the ascending index optimization (see --no-ascend), but do not want to incur the overhead of ascending a large multi-column index, you can use this option to tell pt-archiver to ascend only the leftmost column of the index. This can provide a significant performance boost over not ascending the index at all, while avoiding the cost of ascending the whole index.
See “EXTENDING” for a discussion of how this interacts with plugins.
--ask-pass
Prompt for a password when connecting to MySQL.
--buffer
Buffer output to --file and flush at commit.
Disables autoflushing to --file and flushes --file to disk only when a transaction commits. This typically means the file is block-flushed by the operating system, so there may be some implicit flushes to disk between commits as well. The default is to flush --file to disk after every row.
The danger is that a crash might cause lost data.
The performance increase I have seen from using --buffer is around 5 to 15 percent. Your mileage may vary.
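A sketch of the trade-off (paths and names hypothetical): combining --buffer with a larger --txn-size trades crash-safety for speed, since at most the current uncommitted transaction's rows can be lost from the file:

```shell
# Flush the archive file only when each 1000-row transaction commits,
# instead of after every row (the default):
pt-archiver --source h=host,D=db,t=tbl \
  --file '/var/log/archive/%Y-%m-%d-%D.%t' \
  --where "1=1" --txn-size 1000 --buffer
```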
--bulk-delete
Delete each chunk with a single statement (implies --commit-each).
Delete each chunk of rows in bulk with a single DELETE statement. The statement deletes every row between the first and last row of the chunk, inclusive. It implies --commit-each, since it would be a bad idea to INSERT rows one at a time and commit them before the bulk DELETE.
The normal method is to delete every row by its primary key. Bulk deletes might be a lot faster. They also might not be faster if you have a complex WHERE clause.
This option completely defers all DELETE processing until the chunk of rows is finished. If you have a plugin on the source, its before_delete method will not be called. Instead, its before_bulk_delete method is called later.
WARNING: if you have a plugin on the source that sometimes doesn’t return true from is_archivable(), you should use this option only if you understand what it does. If the plugin instructs pt-archiver not to archive a row, it will still be deleted by the bulk delete!
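For instance (host, database, and table names hypothetical), the following purges rows in 1000-row chunks, one DELETE statement per chunk, committing after each chunk:

```shell
# --bulk-delete implies --commit-each; each chunk is removed with a single
# DELETE spanning the chunk's first and last index values, inclusive:
pt-archiver --source h=host,D=db,t=tbl --purge \
  --where "created_at < '2013-01-01'" \
  --limit 1000 --bulk-delete
```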
--[no]bulk-delete-limit
default: yes
Add --limit to --bulk-delete statement.
This is an advanced option and you should not disable it unless you know what you are doing and why! By default, --bulk-delete appends a --limit clause to the bulk delete SQL statement. In certain cases, this clause can be omitted by specifying --no-bulk-delete-limit. --limit must still be specified.