Summary: Besides a new version numbering scheme, Linux 3.0 also has several new features: Btrfs data scrubbing and automatic defragmentation, Xen dom0 support, unprivileged ICMP_ECHO, wake on WLAN, Berkeley Packet Filter JIT filtering, a memcached-like system for the page cache, sendmmsg(), a syscall that batches sendmsg() calls, and setns(), a syscall that allows better handling of lightweight virtualization systems such as containers. New hardware support has been added, for example: Microsoft Kinect, AMD Llano Fusion APUs, Intel iwlwifi 105 and 135, the Intel C600 serial-attached SCSI controller, Ralink RT5370 USB, several Realtek rtl81xx devices, and the Apple iSight webcam. Many other drivers and small improvements have been added.
Automatic defragmentation
COW (copy-on-write) filesystems have many advantages, but they also have some disadvantages, for example fragmentation. Btrfs lays out the data sequentially when files are written to the disk for the first time, but a COW design implies that any subsequent modification to the file must not be written on top of the old data. Instead it is placed in a free block, which causes fragmentation (RPM databases are a common case of this problem). Additionally, Btrfs suffers the fragmentation problems common to all filesystems.
Btrfs already offers alternatives to fight this problem. First, it supports online defragmentation using the command "btrfs filesystem defragment". Second, it has a mount option, -o nodatacow, that disables COW for data. Now Btrfs adds a third option, the -o autodefrag mount option. This mechanism detects small random writes into files and queues them up for an automatic defrag process, so the filesystem will defragment itself while it is used. It isn't suited to virtualization or big database workloads yet, but works well for smaller files such as rpm, sqlite or bdb databases. Code: (commit)
Scrub
"Scrubbing" is the process of checking the integrity of the data in the filesystem. This initial implementation of scrubbing will check the checksums of all the extents in the filesystem. If an error occurs (checksum or IO error), a good copy is searched for. If one is found, the bad copy will be rewritten. Code: (commit 1, 2)
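The repair logic described above can be illustrated with a toy model. This is only a sketch, not how Btrfs implements scrubbing: the Extent class and the scrub() function are hypothetical names, each extent is assumed to keep several mirrored copies, and zlib.crc32 stands in for the checksum (Btrfs itself uses CRC32C).

```python
import zlib
from dataclasses import dataclass
from typing import List


@dataclass
class Extent:
    """Toy extent: one stored checksum plus the mirrored copies of the data."""
    checksum: int
    copies: List[bytes]


def scrub(extents):
    """Check every copy of every extent and rewrite bad copies from a good one.

    Returns a (repaired, unrecoverable) pair counting individual copies.
    """
    repaired = unrecoverable = 0
    for extent in extents:
        # Search for any copy whose checksum still matches the stored one.
        good = next(
            (c for c in extent.copies if zlib.crc32(c) == extent.checksum),
            None,
        )
        for i, copy in enumerate(extent.copies):
            if zlib.crc32(copy) == extent.checksum:
                continue  # this copy is intact
            if good is None:
                unrecoverable += 1  # no good copy exists to repair from
            else:
                extent.copies[i] = good  # rewrite the bad copy
                repaired += 1
    return repaired, unrecoverable
```

In this model an extent whose copies are all damaged is only counted, since there is nothing to repair it from.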
Other improvements
- File creation/deletion speedup: The performance of file creation and deletion on Btrfs was very poor. The reason is that for each creation or deletion, Btrfs must do a lot of b+ tree insertions, such as the inode item, directory name item, directory name index and so on. Now Btrfs can delay some b+ tree insertions or deletions, which allows these modifications to be batched. Microbenchmarks of file creation have been sped up by ~15%, and file deletion by ~20%. Code: (commit)
- Do not flush csum items of unchanged file data: speeds up fsync. A sysbench workload doing "random write + fsync" went from 112.75 requests/sec to 1216 requests/sec. Code: (commit)
- Quasi-round-robin for space allocation in multidevice setups: the chunk allocator used to always allocate space on the devices in the same order. This led to a very uneven distribution, especially with RAID1 or RAID10 and an odd number of devices. Now Btrfs always sorts the devices before allocating, and allocates the stripes on the devices with the most available space. Code: (commit)
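The allocation policy in the last item can be sketched as a small simulation. This is a simplified model rather than the Btrfs chunk allocator: allocate_chunk(), the free_space dictionary and the fixed stripe_size are hypothetical names chosen for illustration.

```python
def allocate_chunk(free_space, num_stripes, stripe_size):
    """Place one chunk on the num_stripes devices with the most free space.

    free_space maps a device name to its free bytes and is updated in place.
    Returns the chosen device names, or None if too few devices have room.
    """
    # Sort the devices by available space, largest first, before allocating.
    candidates = sorted(
        (dev for dev, free in free_space.items() if free >= stripe_size),
        key=lambda dev: free_space[dev],
        reverse=True,
    )
    if len(candidates) < num_stripes:
        return None
    chosen = candidates[:num_stripes]
    for dev in chosen:
        free_space[dev] -= stripe_size
    return chosen
```

With three equally sized devices and two stripes per chunk (a RAID1-like layout), repeated calls rotate across all three devices instead of filling the first two, which is the uneven distribution the change is meant to avoid.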
recvmsg() and sendmsg() are the syscalls used to receive data from and send data to the network. In 2.6.33, Linux added recvmmsg(), a syscall that receives in a single call the data that would otherwise need multiple recvmsg() calls, improving throughput and latency in a number of scenarios. Now, an equivalent sendmmsg() syscall has been added. A microbenchmark saw a 20% improvement in throughput on UDP send and 30% on raw socket send.
Code: (commit)
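To show what batching looks like from userspace, here is a minimal sketch that calls sendmmsg() through ctypes. It assumes a Linux system whose C library exports sendmmsg() and the usual 64-bit glibc layout of struct msghdr; the IOVec, MsgHdr and MMsgHdr classes and the send_batch() helper are hypothetical names, and the socket is assumed to be already connected.

```python
import ctypes
import os
import socket


class IOVec(ctypes.Structure):
    _fields_ = [("iov_base", ctypes.c_void_p), ("iov_len", ctypes.c_size_t)]


class MsgHdr(ctypes.Structure):
    # Assumed layout of struct msghdr on 64-bit Linux with glibc.
    _fields_ = [
        ("msg_name", ctypes.c_void_p),
        ("msg_namelen", ctypes.c_uint),
        ("msg_iov", ctypes.POINTER(IOVec)),
        ("msg_iovlen", ctypes.c_size_t),
        ("msg_control", ctypes.c_void_p),
        ("msg_controllen", ctypes.c_size_t),
        ("msg_flags", ctypes.c_int),
    ]


class MMsgHdr(ctypes.Structure):
    _fields_ = [("msg_hdr", MsgHdr), ("msg_len", ctypes.c_uint)]


def send_batch(sock, payloads):
    """Send several non-empty datagrams on a connected socket with one syscall.

    Returns (number of messages sent, list of bytes sent per message).
    """
    libc = ctypes.CDLL(None, use_errno=True)
    n = len(payloads)
    buffers = [ctypes.create_string_buffer(p, len(p)) for p in payloads]
    iovecs = (IOVec * n)()
    msgs = (MMsgHdr * n)()
    for i, buf in enumerate(buffers):
        iovecs[i].iov_base = ctypes.addressof(buf)
        iovecs[i].iov_len = len(payloads[i])
        msgs[i].msg_hdr.msg_iov = ctypes.pointer(iovecs[i])
        msgs[i].msg_hdr.msg_iovlen = 1
    sent = libc.sendmmsg(sock.fileno(), msgs, n, 0)
    if sent < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return sent, [msgs[i].msg_len for i in range(sent)]
```

For example, on a connected datagram socket, send_batch(sock, [b"one", b"three"]) hands both datagrams to the kernel in a single call, where plain sendmsg() would need two.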
Finally, Linux has Xen dom0 support.
Recommended LWN article: Cleancache and Frontswap
Cleancache is an optional feature that can potentially increase page cache performance. It could be described as a memcached-like system, but for cache memory pages. It provides memory storage that is not directly accessible or addressable by the kernel, and it does not guarantee that the data will not vanish. It can be used by virtualization software to improve memory handling for guests, but it can also be useful for implementing things like a compressed cache.
Code: (commit), (commit)
Recommended LWN article: A JIT for packet filters
The Berkeley Packet Filter filtering capabilities, used by tools like libpcap/tcpdump, are normally handled by an interpreter. This release adds a simple JIT that generates native code when the filter is loaded in memory (something already done by other OSes, like FreeBSD). Administrators need to enable this feature by writing "1" to /proc/sys/net/core/bpf_jit_enable.
Code: (commit)
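A small helper can check and flip that switch. The sketch below assumes only the /proc path named above; the function names bpf_jit_enabled() and enable_bpf_jit() are hypothetical, and writing to the file normally requires root.

```python
BPF_JIT_SYSCTL = "/proc/sys/net/core/bpf_jit_enable"


def bpf_jit_enabled(path=BPF_JIT_SYSCTL):
    """Return True or False for the JIT setting, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip()) != 0
    except (OSError, ValueError):
        # Missing file (kernel without the JIT), no permission, or odd content.
        return None


def enable_bpf_jit(path=BPF_JIT_SYSCTL):
    """Write "1" to the sysctl file, as the text above describes."""
    with open(path, "w") as f:
        f.write("1\n")
```

The path parameter exists only so the helpers can be pointed at another file, for example when trying them out without root.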
Wake on Wireless is a feature that allows the system to go into a low-power state (e.g. ACPI S3 suspend) while the wireless NIC remains active and does varying things for the host, e.g. staying connected to an AP or searching for networks. The 802.11 stack has added support for it.
Code: (commit 1, 2)
Recommended LWN article: ICMP sockets
This release makes it possible to send ICMP_ECHO messages (ping) and receive the corresponding ICMP_ECHOREPLY messages without any special privileges, similar to what is implemented in Mac OS X. In other words, the patch makes it possible to implement a setuid-less and CAP_NET_RAW-less /bin/ping. Initially this functionality was written for Linux 2.4.32, but unfortunately it was never made public. The new functionality is disabled by default; Linux distributions that support it enable it at bootup, optionally restricting it to a group or a group range.
Code: (commit)
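From userspace, the new socket type is opened as a datagram socket with the ICMP protocol. The following is a minimal sketch under the assumption that the running kernel has the feature and that the caller's group is allowed by the net.ipv4.ping_group_range sysctl (a name not given in the text above); the helper names are hypothetical.

```python
import socket
import struct

ICMP_ECHO = 8  # ICMP type of an echo request


def inet_checksum(data):
    """Internet checksum (RFC 1071) over data."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(seq, payload=b"ping"):
    """Build an ICMP echo request with id 0 and the given sequence number."""
    header = struct.pack("!BBHHH", ICMP_ECHO, 0, 0, 0, seq)
    csum = inet_checksum(header + payload)
    return struct.pack("!BBHHH", ICMP_ECHO, 0, csum, 0, seq) + payload


def open_ping_socket():
    """Return an unprivileged ICMP socket, or None if the kernel refuses."""
    try:
        return socket.socket(socket.AF_INET, socket.SOCK_DGRAM,
                             socket.IPPROTO_ICMP)
    except OSError:
        return None
```

If open_ping_socket() returns a socket, a ping is a plain sendto() of build_echo_request(1) to an (address, 0) pair followed by recvfrom(); no raw socket or setuid binary is involved. A None result usually means the feature is disabled or the group restriction excludes the caller.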
Recommended LWN article: Namespace file descriptors
Linux supports different namespaces for many of the resources it handles; for example, lightweight forms of virtualization such as containers or systemd-nspawn show the virtualized processes a virtual PID different from the real PID. The same thing can be done with the filesystem directory structure, network resources, IPC, etc. The only way to set different namespace configurations was to pass different flags to the clone() syscall, but that approach did not allow one process to access another process' namespace. The setns() syscall solves that problem.
Code: (commit 1, 2, 3, 4, 5, 6)
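As a sketch of how setns() is used, the helper below opens another process' namespace file and joins it. It assumes the /proc/<pid>/ns/ files that accompany this syscall (not described in the text above) and a C library that exports setns(); ns_path() and enter_namespace() are hypothetical names, and the call typically needs CAP_SYS_ADMIN.

```python
import ctypes
import os


def ns_path(pid, kind):
    """Path of a namespace file, e.g. kind 'net', 'ipc' or 'uts'."""
    return "/proc/%d/ns/%s" % (pid, kind)


def enter_namespace(pid, kind, nstype=0):
    """Move the calling thread into the given namespace of process pid.

    nstype=0 accepts any namespace type; raises OSError if setns() fails.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    fd = os.open(ns_path(pid, kind), os.O_RDONLY)
    try:
        if libc.setns(fd, nstype) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
    finally:
        os.close(fd)
```

A management tool could call enter_namespace(pid, "net") with the pid of a process inside a container to run network configuration commands in that container's network namespace.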
Recommended LWN article: Waking systems from suspend
Alarm-timers are a hybrid style of timer, similar to high-resolution timers, but when the system is suspended, the RTC device is set to fire and wake the system when the soonest alarm-timer expires. The concept for alarm-timers was inspired by the Android Alarm driver, and the interface to userland uses the POSIX clock and timers interface, using two new clockids: CLOCK_REALTIME_ALARM and CLOCK_BOOTTIME_ALARM.
Code: (commit 1, 2)
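Because the new clocks are exposed through the ordinary POSIX clock interface, they can be read like any other clock. The sketch below assumes the numeric clock ids 8 and 9 from the kernel's <linux/time.h>, since Python's time module does not define constants for them; read_alarm_clock() is a hypothetical helper.

```python
import time

# Assumed values, taken from <linux/time.h>.
CLOCK_REALTIME_ALARM = 8
CLOCK_BOOTTIME_ALARM = 9


def read_alarm_clock(clock_id):
    """Return the clock's current time in seconds, or None if unsupported.

    The kernel may reject the clock, for instance when no RTC device is
    available, in which case clock_gettime() raises OSError.
    """
    try:
        return time.clock_gettime(clock_id)
    except OSError:
        return None
```

Reading the clock only shows that it exists; actually waking the system means arming a timer on one of these clocks through timer_create(), which typically requires the CAP_WAKE_ALARM capability.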
All the driver and architecture-specific changes can be found in the Linux_3.0_DriverArch page
Cache xattr security drop check for write: benchmarking on btrfs showed that a major scaling bottleneck on large systems is currently the xattr lookup on every write, which causes an additional tree walk, hitting some per-filesystem locks and scaling quite badly. This is also a problem in ext4, where it hits the global mbcache lock. Caching this check solves the problem (commit)
Increase SCHED_LOAD_SCALE resolution: with this extra resolution, the scheduler can handle deeper cgroup hierarchies and do better shares distribution and load balancing on larger systems (especially for low weight task groups) (commit), (commit)
Move the second half of ttwu() to the remote cpu: avoids having to take rq->lock and do the task enqueue remotely, saving a lot of cacheline transfers. A semaphore benchmark goes from 647278 worker burns per second to 816715 (commit)
Next buddy hint on sleep and preempt path: a worst-case benchmark consisting of 2 tbench client processes with 2 threads each running on a single CPU changed from 105.84 MB/sec to 112.42 MB/sec (commit)
Make mmu_gather preemptible (commit)
Batch activate_page() calls to reduce zone->lru_lock contention (commit)
tmpfs: implement generic xattr support (commit)
Memory cgroup controller:
Add memory.numastat api for numa statistics (commit)
Add the pagefault count into memcg stats (commit)
Reclaim memory from nodes in round-robin order (commit)
Remove the deprecated noswapaccount kernel parameter (commit)
Allow setting the network namespace by fd (commit)
Wireless
Add the ability to advertise possible interface combinations (commit)
Add support for scheduled scans (commit)
Add userspace authentication flag to mesh setup (commit)
New notification to discover mesh peer candidates. (commit)
Allow ethtool to set interface in loopback mode. (commit)
Allow no-cache copy from user on transmit (commit)
ipset: SCTP, UDPLITE support added (commit)
sctp: implement socket option SCTP_GET_ASSOC_ID_LIST (commit), implement event notification SCTP_SENDER_DRY_EVENT (commit)
bridge: allow creating bridge devices with netlink (commit), allow creating/deleting fdb entries via netlink (commit)
batman-adv: multi vlan support for bridge loop detection (commit)
pkt_sched: QFQ - quick fair queue scheduler (commit)
RDMA: Add netlink infrastructure that allows for registration of RDMA clients (commit)
BLOCK LAYER
Submit discard bio in batches in blkdev_issue_discard() - makes discarding data faster (commit)
EXT4
Enable "punch hole" functionality (recommended LWN article) (commit), (commit)
Add support for multiple mount protection (commit)
CIFS
Add support for mounting Windows 2008 DFS shares (commit)
Convert cifs_writepages to use async writes (commit), (commit)
Add rwpidforward mount option, which enables a mode where CIFS forwards the pid of the process that opened a file to any read and write operation (commit)
OCFS2
SSD trimming support (commit), (commit)
Support for moving extents (commit), (commit)
NILFS2
Implement resize ioctl (commit)
XFS
Add online discard support (commit)
caam - Add support for the Freescale SEC4/CAAM (commit)
padlock - Add SHA-1/256 module for VIA Nano (commit)
s390: add System z hardware support for CTR mode (commit), add System z hardware support for GHASH (commit), add System z hardware support for XTS mode (commit)
s5p-sss - add S5PV210 advanced crypto engine support (commit)
User Mode Linux: add earlyprintk support (commit), add ucast ethernet transport (commit)
xen: add blkback support (commit)
Allow the application of capability limits to usermode helpers (commit)
SELinux
add /sys/fs/selinux mount point to put selinuxfs (commit)
Make selinux cache VFS RCU walks safe (improves VFS performance) (commit)
perf stat: Add -d -d and -d -d -d options to show more CPU events (commit), (commit)
perf stat: Add --sync/-S option (commit)
rcu: priority boosting for TREE_PREEMPT_RCU (commit)
ulimit: raise default hard ulimit on number of files to 4096 (commit)
cgroups
Remove the namespace cgroup subsystem. It has been replaced by a compatibility flag, 'clone_children', where a newly created cgroup will copy the parent cgroup values. Userspace has to manually create a cgroup and add a task to the 'tasks' file (commit)
Make 'procs' file writable (commit)
kbuild: implement several W= levels (commit)
PM/Hibernate: Add sysfs knob to control size of memory for drivers (commit)
posix-timers: RCU conversion (commit)
coredump: add support for exe_file in core name (commit)
The original link: http://kernelnewbies.org/Linux_3.0