$ chkconfig --list ksm
ksm             0:off   1:off   2:off   3:on    4:on    5:on    6:off
$ chkconfig --list ksmtuned
ksmtuned        0:off   1:off   2:off   3:on    4:on    5:on    6:off
$ sudo chkconfig --del ksm
$ sudo chkconfig --del ksmtuned
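Note that chkconfig --del only affects the next boot; on a running host ksmd can be stopped immediately through the KSM sysfs knob (writing 2 also un-merges the pages KSM has already shared):

# echo 2 > /sys/kernel/mm/ksm/run   # 0 = stop, 1 = run, 2 = stop and unmerge shared pages
# cat /sys/kernel/mm/ksm/run
2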
We also disabled it in the kernel by default with this patch:
From: Coly Li <bosong.ly@taobao.com>
Subject: kvm: set KSM_RUN_STOP by default
Date: Thu Mar 1 11:43:46 CST 2012
Patch-mainline: in-house

KSM spends quite a lot of CPU time scanning identified pages, and this
behavior has a negative performance impact on online-service RT
(response time). Most of our server configurations have more than
enough memory, so set ksmd to KSM_RUN_STOP by default for better RT.

Signed-off-by: Coly Li <bosong.ly@taobao.com>
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
---
Index: linux-2.6.32-220.4.2.el5/mm/ksm.c
===================================================================
--- linux-2.6.32-220.4.2.el5.orig/mm/ksm.c	2012-03-01 11:15:36.148283052 +0800
+++ linux-2.6.32-220.4.2.el5/mm/ksm.c	2012-03-01 11:27:02.571686841 +0800
@@ -2031,7 +2031,7 @@
 		goto out_free2;
 	}
 #else
-	ksm_run = KSM_RUN_MERGE; /* no way for user to start it */
+	ksm_run = KSM_RUN_STOP; /* no way for user to start it */
 #endif /* CONFIG_SYSFS */
$ ps -ef | grep qemu
zituan   13297 13235  0 15:47 pts/0    00:00:00 grep qemu
qemu     32029     1 10 Feb28 ?        18:36:57 /opt/virt/stefanha/x86_64-softmmu/qemu-system-x86_64 -S -M pc-1.0 -enable-kvm -m 4096 -smp 4,sockets=4,cores=1,threads=1 -name v134202.sqa.cm4 -uuid 6f98077f-ff01-02e0-85a5-1f04636fceb2 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/v134202.sqa.cm4.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -boot c -drive file=/store/image/v134202.sqa.cm4.img,copy-on-read=on,if=none,id=drive-virtio-disk0,format=qed -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,fd=22,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=00:16:3e:e8:86:ca,bus=pci.0,addr=0x3 -usb -vnc 0.0.0.0:1 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
qemu     32476     1  9 Feb27 ?        18:56:38 /opt/virt/stefanha/x86_64-softmmu/qemu-system-x86_64 -S -M pc-1.0 -enable-kvm -m 4096 -smp 4,sockets=4,cores=1,threads=1 -name v134175.sqa.cm4 -uuid 630d0bb6-88a8-3df0-5684-0119d3493805 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/v134175.sqa.cm4.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -boot c -drive file=/store/image/v134175.sqa.cm4.img,copy-on-read=on,if=none,id=drive-virtio-disk0,format=qed -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,fd=20,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=00:16:3e:e8:86:af,bus=pci.0,addr=0x3 -usb -vnc 0.0.0.0:0 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
$ taskset -p 32476
pid 32476's current affinity mask: ffff
By default, a KVM guest process has affinity to all CPUs, which brings in scheduling overhead, so we need to pin it to specific physical CPUs.
$ sudo taskset -p 0xf 32476
pid 32476's current affinity mask: ffff
pid 32476's new affinity mask: f
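Keep in mind that taskset -p only changes the affinity of the thread whose TID is given, so the vCPU threads that are already running keep their old masks. A sketch that walks /proc/<pid>/task and applies the same mask to every thread of the guest:

#!/bin/bash
# apply one affinity mask to every thread of a qemu process (sketch)
pid=32476
mask=0xf
for tid in /proc/$pid/task/*; do
    sudo taskset -p $mask "${tid##*/}"
done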
# cat cpumap.sh
#!/bin/bash
for i in `seq 0 15`; do
    echo -n "cpu$i "
    cat /sys/devices/system/cpu/cpu$i/cache/index"$1"/shared_cpu_map
done
[root@kvm156205.cm4 ~]# ./cpumap.sh 0
cpu0 000101
cpu1 000202
cpu2 000404
cpu3 000808
cpu4 001010
cpu5 002020
cpu6 004040
cpu7 008080
cpu8 000101
cpu9 000202
cpu10 000404
cpu11 000808
cpu12 001010
cpu13 002020
cpu14 004040
cpu15 008080
[root@kvm156205.cm4 ~]# ./cpumap.sh 1
cpu0 000101
cpu1 000202
cpu2 000404
cpu3 000808
cpu4 001010
cpu5 002020
cpu6 004040
cpu7 008080
cpu8 000101
cpu9 000202
cpu10 000404
cpu11 000808
cpu12 001010
cpu13 002020
cpu14 004040
cpu15 008080
[root@kvm156205.cm4 ~]# ./cpumap.sh 2
cpu0 000101
cpu1 000202
cpu2 000404
cpu3 000808
cpu4 001010
cpu5 002020
cpu6 004040
cpu7 008080
cpu8 000101
cpu9 000202
cpu10 000404
cpu11 000808
cpu12 001010
cpu13 002020
cpu14 004040
cpu15 008080
[root@kvm156205.cm4 ~]# ./cpumap.sh 3
cpu0 000f0f
cpu1 000f0f
cpu2 000f0f
cpu3 000f0f
cpu4 00f0f0
cpu5 00f0f0
cpu6 00f0f0
cpu7 00f0f0
cpu8 000f0f
cpu9 000f0f
cpu10 000f0f
cpu11 000f0f
cpu12 00f0f0
cpu13 00f0f0
cpu14 00f0f0
cpu15 00f0f0
From the L1/L2 masks (index 0-2), cpuN and cpuN+8 are hyperthread siblings of the same core; from the L3 masks (index 3), cpu0-3 and cpu8-11 share one L3 cache (mask 000f0f) while cpu4-7 and cpu12-15 share the other (mask 00f0f0). So we pin each vCPU of a guest onto physical CPUs that share the same L3 cache:
virsh # vcpupin v156130.cm4 0 4
virsh # vcpupin v156130.cm4 1 5
virsh # vcpupin v156130.cm4 2 6
virsh # vcpupin v156130.cm4 3 7
virsh # vcpupin v156130.cm4 4 14
virsh # vcpuinfo v156130.cm4
VCPU:           0
CPU:            4
State:          running
CPU time:       198834.1s
CPU Affinity:   ----y-----------

VCPU:           1
CPU:            5
State:          running
CPU time:       200200.1s
CPU Affinity:   -----y----------

VCPU:           2
CPU:            6
State:          running
CPU time:       200674.6s
CPU Affinity:   ------y---------

VCPU:           3
CPU:            7
State:          running
CPU time:       203948.8s
CPU Affinity:   -------y--------

VCPU:           4
CPU:            14
State:          running
CPU time:       202115.5s
CPU Affinity:   --------------y-
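Typing the pins one by one at the virsh prompt gets tedious with many guests; the same mapping can be scripted from the shell. A sketch using the vCPU:pCPU pairs chosen above:

#!/bin/bash
# pin each vCPU of the guest to its chosen physical CPU (sketch)
dom=v156130.cm4
for pair in 0:4 1:5 2:6 3:7 4:14; do
    virsh vcpupin "$dom" "${pair%:*}" "${pair#*:}"
done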
$ cat /proc/interrupts | grep LOC ; sleep 10 ; cat /proc/interrupts | grep LOC
LOC: 2125467766 2189539662 2208487131 2320778293 2342894859   Local timer interrupts
LOC: 2125507106 2189588412 2208547939 2320839647 2342958869   Local timer interrupts
$ echo 'scale=4;(2125507106-2125467766)/10' | bc
3934.0000
Oops... that is about 4k local timer interrupts per second on every CPU!
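The same arithmetic can be wrapped into a small script so the check is repeatable; a minimal sketch that measures the rate on the first CPU (the awk field picks the first counter column of the LOC line):

#!/bin/bash
# measure cpu0's local timer interrupt rate over a 10-second window (sketch)
before=$(awk '/LOC:/ {print $2}' /proc/interrupts)
sleep 10
after=$(awk '/LOC:/ {print $2}' /proc/interrupts)
echo "scale=1; ($after - $before) / 10" | bc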
samples  pcnt   function                        DSO
_______  _____  ______________________________  _____________
 690.00  25.8%  finish_task_switch              [kernel]
 312.00  11.7%  _spin_unlock_irqrestore         [kernel]
 262.00   9.8%  tick_nohz_restart_sched_tick    [kernel]
 230.00   8.6%  tick_nohz_stop_sched_tick       [kernel]
 121.00   4.5%  __do_softirq                    [kernel]
  89.00   3.3%  JVM_InternString                libjvm.so
  46.00   1.7%  inflate_fast                    libzip.so
  45.00   1.7%  schedule                        [kernel]
  44.00   1.6%  crc32                           libzip.so
  43.00   1.6%  copy_user_generic_unrolled      [kernel]
  40.00   1.5%  __GI_strlen                     libc-2.5.so
  34.00   1.3%  system_call_after_swapgs        [kernel]
samples  pcnt   function                     DSO
_______  _____  ___________________________  ___________
8639.00  56.0%  vmx_handle_exit              [kvm_intel]
1821.00  11.8%  kvm_arch_sync_events         [kvm]
 293.00   1.9%  ftrace_raw_event_kvm_mmio    [kvm]
 241.00   1.6%  intel_idle                   [kernel]
 187.00   1.2%  _spin_unlock_irqrestore      [kernel]
 160.00   1.0%  copy_user_generic_string     [kernel]
 150.00   1.0%  __hrtimer_start_range_ns     [kernel]
 148.00   1.0%  update_cfs_shares            [kernel]
Obviously, the KVM guest's tickless (NOHZ) mode generates too many timer interrupts here. We append the following to the kernel boot command line, then reboot:
nohz=off
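On these RHEL-style kernels that means editing the kernel line in /boot/grub/grub.conf; a hypothetical entry (kernel version and root device are placeholders, not taken from the hosts above):

title Linux
        root (hd0,0)
        kernel /vmlinuz-2.6.32-220.el6.x86_64 ro root=/dev/vg0/lv_root nohz=off
        initrd /initramfs-2.6.32-220.el6.x86_64.img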
# echo 'yes' > /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag
# echo 'always' > /sys/kernel/mm/redhat_transparent_hugepage/enabled
# echo 'never' > /sys/kernel/mm/redhat_transparent_hugepage/defrag
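Whether the guests' anonymous memory is actually being backed by huge pages can be verified from /proc/meminfo on the host:

# grep AnonHugePages /proc/meminfo   # grows as khugepaged collapses guest memory into 2 MB pages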
<disk type='block' device='disk'>
  <driver name='qemu' type='raw'/>
  <source dev='/dev/vg_vmms/lvm-v156129.cm4'/>
  <target dev='vda' bus='virtio'/>
</disk>
<interface type='bridge'>
  <mac address='00:16:3e:18:9c:81'/>
  <model type='virtio'/>
  <source bridge='bridge0'/>
</interface>
<memballoon model='virtio'>
  <alias name='balloon0'/>
</memballoon>
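Inside a guest booted with these definitions, lspci should list the paravirtual hardware; a quick check (exact device strings vary with the lspci version):

$ lspci | grep -i virtio    # expect the virtio block, net and balloon devices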
options igb max_vfs=7
intel_iommu=on
options kvm allow_unsafe_assigned_interrupts=1
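After reloading igb with max_vfs set (and rebooting with intel_iommu=on), the virtual functions should show up as ordinary PCI devices; a quick check (the device names depend on the NIC model):

# lspci | grep -i 'virtual function'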
# virsh nodedev-dettach pci_0000_01_11_4
# qemu-kvm -enable-kvm -smp 4 -drive file=/store/debian2.img,if=virtio -m 4096 -vnc :1 -device pci-assign,host=01:11.4
<hostdev mode='subsystem' type='pci'>
  <source>
    <address bus='0x05' slot='0x10' function='0x02'/>
  </source>
</hostdev>
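The same XML can also be hot-plugged into a running guest with virsh attach-device; a sketch, with the snippet saved to a file and <domain> standing in for the guest name:

# cat > vf-hostdev.xml <<EOF
<hostdev mode='subsystem' type='pci'>
  <source>
    <address bus='0x05' slot='0x10' function='0x02'/>
  </source>
</hostdev>
EOF
# virsh attach-device <domain> vf-hostdev.xml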