This article is reposted from: http://doc.opensuse.org/products/draft/SLES/SLES-tuning_sd_draft/cha.tuning.io.html
I/O scheduling controls how input/output operations are submitted to storage. SUSE Linux Enterprise Server offers various I/O algorithms, called elevators, suiting different workloads. Elevators can help to reduce seek operations, can prioritize I/O requests, or can make sure an I/O request is carried out before a given deadline.
Choosing the best suited I/O elevator depends not only on the workload but also on the hardware. Single ATA disk systems, SSDs, RAID arrays, or network storage systems, for example, each require different tuning strategies.
SUSE Linux Enterprise Server lets you set a default I/O scheduler at boot time, which can be changed on the fly per block device. This makes it possible to set different algorithms, for example, for the device hosting the system partition and the device hosting a database.
By default the CFQ (Completely Fair Queuing) scheduler is used. Change this default by entering the boot parameter
elevator=SCHEDULER
where SCHEDULER is one of cfq, noop, or deadline. See Section 13.2, “Available I/O Elevators” for details.
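To check whether a running system was booted with this parameter, the kernel command line can be inspected. The following is a minimal sketch; `elevator_from_cmdline` is a hypothetical helper name, and it simply prints the first `elevator=` value it finds in the string it is given.

```shell
# Print the value of the first elevator= parameter found in a kernel
# command line string; print nothing if the parameter is absent.
# elevator_from_cmdline is a hypothetical helper, not a system tool.
elevator_from_cmdline() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^elevator=//p' | head -n 1
}

# Usage on a running system:
#   elevator_from_cmdline "$(cat /proc/cmdline)"
```

If nothing is printed, no `elevator=` parameter was passed at boot.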
To change the elevator for a specific device in the running system, run the following command:
echo SCHEDULER > /sys/block/DEVICE/queue/scheduler
where SCHEDULER is one of cfq, noop, or deadline and DEVICE is the block device (sda, for example).
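A small wrapper can guard against typos in the scheduler or device name. This is only a sketch: `set_scheduler` is a hypothetical helper, and its optional third argument (an alternative sysfs root) is an assumption added so the function can be tried against a scratch directory instead of the real /sys/block.

```shell
# set_scheduler DEVICE SCHEDULER [SYSFS_ROOT]
# Hypothetical helper: writes SCHEDULER to DEVICE's queue/scheduler file,
# but only if that file lists SCHEDULER as an available choice.
# SYSFS_ROOT defaults to /sys/block; the override is for dry runs.
set_scheduler() {
    sched_file="${3:-/sys/block}/$1/queue/scheduler"
    if [ ! -w "$sched_file" ]; then
        echo "cannot write $sched_file (wrong device name, or not root?)" >&2
        return 1
    fi
    # The file lists all choices, e.g. "noop deadline [cfq]"; drop the brackets.
    available=$(sed 's/[][]//g' "$sched_file")
    case " $available " in
        *" $2 "*) echo "$2" > "$sched_file" ;;
        *) echo "scheduler '$2' is not offered here: $available" >&2
           return 1 ;;
    esac
}

# Usage (as root):
#   set_scheduler sda deadline
```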
Default Scheduler on IBM System z
On IBM System z the default I/O scheduler for a storage device is set by the device driver.
The elevators available on SUSE Linux Enterprise Server are listed in the following. Each elevator has a set of tunable parameters, which can be set with the following command:
echo VALUE > /sys/block/DEVICE/queue/iosched/TUNABLE
where VALUE is the desired value for the TUNABLE and DEVICE is the block device.
To find out which elevator is the current default, run the following command. The currently selected scheduler is listed in brackets:
jupiter:~ # cat /sys/block/sda/queue/scheduler
noop deadline [cfq]
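For scripting, the bracketed entry can be extracted from that line. The helper below is a sketch; `active_scheduler` is a hypothetical name, and it operates on the text it is given rather than reading sysfs itself.

```shell
# Print the scheduler name enclosed in brackets in a line such as
# "noop deadline [cfq]". Prints nothing if no bracketed entry exists.
# active_scheduler is a hypothetical helper name.
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Usage: report the active scheduler of every block device
#   for f in /sys/block/*/queue/scheduler; do
#       echo "$f: $(active_scheduler "$(cat "$f")")"
#   done
```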
CFQ (Completely Fair Queuing)
CFQ is a fairness-oriented scheduler and is used by default on SUSE Linux Enterprise Server. The algorithm assigns each thread a time slice in which it is allowed to submit I/O to disk. This way each thread gets a fair share of I/O throughput. It also allows assigning tasks I/O priorities which are taken into account during scheduling decisions (see man 1 ionice). The CFQ scheduler has the following tunable parameters:
/sys/block/<device>/queue/iosched/slice_idle
When a task has no more I/O to submit in its time slice, the I/O scheduler waits for a while before scheduling the next thread to improve locality of I/O. For media where locality does not play a big role (SSDs, SANs with lots of disks), setting /sys/block/<device>/queue/iosched/slice_idle to 0 can improve the throughput considerably.
/sys/block/<device>/queue/iosched/quantum
This option limits the maximum number of requests that are being processed by the device at once. The default value is 4. For storage with several disks, this setting can unnecessarily limit parallel processing of requests. Therefore, increasing the value can improve performance, although the latency of some I/O may increase because more requests are buffered inside the storage. When changing this value, you can also consider tuning /sys/block/<device>/queue/iosched/slice_async_rq (the default value is 2), which limits the maximum number of asynchronous requests (usually write requests) that are submitted in one time slice.
/sys/block/<device>/queue/iosched/low_latency
For workloads where the latency of I/O is crucial, setting /sys/block/<device>/queue/iosched/low_latency to 1 can help.
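The CFQ tunables above can be applied with a small helper that also reports the previous value, which makes it easier to revert an experiment. This is a sketch only: `set_tunable` is a hypothetical name, its optional fourth argument (an alternative sysfs root) is an assumption for dry runs, and the device name and values in the usage comment are illustrative, not recommendations.

```shell
# set_tunable DEVICE TUNABLE VALUE [SYSFS_ROOT]
# Hypothetical helper: writes VALUE to the elevator tunable of DEVICE and
# prints the old and new values. SYSFS_ROOT defaults to /sys/block.
set_tunable() {
    tunable_file="${4:-/sys/block}/$1/queue/iosched/$2"
    if [ ! -e "$tunable_file" ]; then
        echo "no such tunable: $tunable_file" >&2
        return 1
    fi
    old_value=$(cat "$tunable_file")
    echo "$3" > "$tunable_file" || return 1
    echo "$1 $2: $old_value -> $(cat "$tunable_file")"
}

# Usage (as root, with CFQ active on the illustrative device sdb):
#   set_tunable sdb slice_idle 0
#   set_tunable sdb quantum 8
#   set_tunable sdb low_latency 1
```

Values written this way do not survive a reboot or a change of elevator, so measure the workload before and after each change.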
NOOP
A trivial scheduler that just passes down the I/O that comes to it. Useful for checking whether complex I/O scheduling decisions of other schedulers are causing I/O performance regressions.
In some cases it can be helpful for devices that do I/O scheduling themselves, such as intelligent storage, or devices that do not depend on mechanical movement, like SSDs. Usually, the DEADLINE I/O scheduler is a better choice for these devices, but due to less overhead NOOP may produce better performance on certain workloads.
DEADLINE
DEADLINE is a latency-oriented I/O scheduler. Each I/O request is assigned a deadline. Usually, requests are stored in queues (read and write) sorted by sector numbers. The DEADLINE algorithm maintains two additional queues (read and write) where the requests are sorted by deadline. As long as no request has timed out, the “sector” queue is used. If timeouts occur, requests from the “deadline” queue are served until there are no more expired requests. Generally, the algorithm prefers reads over writes.
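The selection rule described above can be illustrated with a toy model. This is a deliberately simplified sketch, not the kernel's implementation: it uses a single list of requests instead of separate read and write queues, ignores batching and the current head position, and `next_request` is a hypothetical name. Each input line is "SECTOR DEADLINE", and the first argument is the current time in the same unit as the deadlines.

```shell
# next_request NOW   (reads "SECTOR DEADLINE" lines on standard input)
# Toy model: if the request with the earliest deadline has expired
# (deadline <= NOW), print its sector; otherwise print the lowest sector.
next_request() {
    sort -k2,2n | awk -v now="$1" '
        # After sorting by deadline, line 1 holds the earliest deadline.
        NR == 1 && $2 + 0 <= now + 0 { print $1; expired = 1; exit }
        # No expired request so far: track the lowest sector number.
        { if (lowest == "" || $1 + 0 < lowest + 0) lowest = $1 }
        END { if (!expired && lowest != "") print lowest }
    '
}

# Usage:
#   printf '100 50\n20 90\n300 10\n' | next_request 5    # nothing expired: sector order
#   printf '100 50\n20 90\n300 10\n' | next_request 15   # deadline 10 expired: served first
```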
This scheduler can provide superior throughput over the CFQ I/O scheduler in cases where several threads read and write and fairness is not an issue, for example, for several parallel readers from a SAN and for databases (especially when using “TCQ” disks). The DEADLINE scheduler has the following tunable parameters:
/sys/block/<device>/queue/iosched/writes_starved
Controls how many reads can be sent to disk before it is possible to send writes. A value of 3 means that three read operations are carried out for one write operation.
/sys/block/<device>/queue/iosched/read_expire
Sets the deadline (current time plus the read_expire value) for read operations in milliseconds. The default is 500.
/sys/block/<device>/queue/iosched/write_expire
Sets the deadline (current time plus the write_expire value) for write operations in milliseconds. The default is 5000.
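Before changing any of these values, it helps to record what the device currently uses. The following sketch lists every tunable of the active elevator for one device; `show_tunables` is a hypothetical helper, and its optional second argument (an alternative sysfs root) is an assumption for trying it outside /sys/block.

```shell
# show_tunables DEVICE [SYSFS_ROOT]
# Hypothetical helper: prints "name = value" for each file in the
# queue/iosched directory of DEVICE. SYSFS_ROOT defaults to /sys/block.
show_tunables() {
    for tunable in "${2:-/sys/block}/$1/queue/iosched"/*; do
        [ -f "$tunable" ] || continue   # skip if the glob matched nothing
        printf '%s = %s\n' "${tunable##*/}" "$(cat "$tunable")"
    done
}

# Usage:
#   show_tunables sda
```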
Most file systems (XFS, ext3, ext4, reiserfs) send write barriers to disk after fsync or during transaction commits. Write barriers enforce proper ordering of writes, making volatile disk write caches safe to use (at some performance penalty). If your disks are battery-backed in one way or another, disabling barriers may safely improve performance.
Sending write barriers can be disabled using the barrier=0 mount option (for ext3, ext4, and reiserfs), or using the nobarrier mount option (for XFS).
Disabling barriers when disks cannot guarantee caches are properly written in case of power failure can lead to severe file system corruption and data loss.
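To audit which mounted file systems currently run without barriers, the mount options can be scanned for the two options named above. This is a sketch: `barriers_disabled` is a hypothetical helper that only inspects a comma-separated option string, and how a given file system reports the setting among its mount options is an assumption that may vary by kernel version.

```shell
# barriers_disabled OPTIONS
# Hypothetical helper: succeeds (exit status 0) if the comma-separated
# mount option string contains barrier=0 or nobarrier, fails otherwise.
barriers_disabled() {
    case ",$1," in
        *,barrier=0,*|*,nobarrier,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: list mounted file systems whose options disable barriers
#   while read -r dev mnt fstype opts rest; do
#       barriers_disabled "$opts" && echo "$mnt ($fstype): barriers disabled"
#   done < /proc/mounts
```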
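Note: I recently took part in a MySQL operations course, and the instructor recommended this article as extended reading for study and technical improvement. I am recording it on my blog first and will digest it, learn from it, and build on it over time.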