Original article: http://lwn.net/Articles/357658/
(The first few paragraphs, about how Google manages and tracks its kernel code, are fairly detailed housekeeping, so I have not translated them.)
Of the code Google has added to Linux, about three quarters consists of changes to the core kernel; device-driver code is a relatively small part of the total.
(At this stage of Linux's development, there are fewer and fewer new device drivers left to add.)
If Google is to develop in cooperation with the Linux community, it faces a series of problems. Keeping up with the mainline is hard - the code simply moves too fast. In a project this large, developers being asked to rework their submitted patches is also a real problem. Alan Cox's answer to that was simple: people will always ask for more, but sometimes the right thing to do is simply to tell them "no."
(Alan Cox is arguably the Linux kernel's number-two contributor and has since joined Intel. A CPU company like Intel seems like a good home for kernel developers.)
On CPU scheduling, Google found the switch to CFS (the Completely Fair Scheduler, merged in 2.6.23 and inspired by Con Kolivas's work) very painful - so painful that they went back and forward-ported the O(1) scheduler (the algorithm used before 2.6.23) onto 2.6.26 to get everything running again. The kernel's change to the semantics of sched_yield() also caused trouble, especially with Google's user-space locking. High-priority threads can upset a server's load balancing (balancing of tasks across the CPUs of a single machine, not distributed load balancing), even if they run only briefly. And load balancing matters: Google typically runs about 5000 threads on 16-32 core servers (an unusual workload!).
On memory management, newer kernels changed the handling of dirty data, leading to a flood of aggressive writeback. The system could easily get into a state where kswapd generated lots of small I/O operations that filled up the block device's request queue, starving other writeback; this problem should be fixed by the per-BDI writeback patches in 2.6.32.
(The idea behind per-BDI writeback is that there is no longer a single writeback path shared by all block devices; each backing device - roughly, each spindle, the real unit of work in the hardware - gets its own, so servers with many disks see much better I/O performance. Personally I wonder whether merging kswapd's small requests would also help.)
As mentioned above, Google starts a great many threads on each system - an unusual way to use it. They found that sending a signal to a large thread group causes heavy contention on the run-queue locks. Google also ran into contention on the mmap_sem semaphore (the semaphore in struct mm_struct that protects the mmap address space): a sleeping reader can block a writer, which in turn blocks other readers, until the system grinds to a halt. The kernel should be fixed so that it does not wait for I/O while holding that semaphore.
(The signal-to-thread-group problem Google describes is probably a "thundering herd" effect: many tasks sleep on one queue, and a single wakeup makes them all wake at once and pile onto the same resources. Personally I don't think this is a Linux bug; Google's way of using Linux is just unusual enough that kernel developers never noticed it.)
Google makes heavy use of the OOM killer to relieve overloaded servers. That brings its own trouble when a process holding a lock gets killed (the lock is never released, so other tasks block on it). Mike (the speaker) would very much like to know why the kernel goes to such lengths with the OOM killer instead of simply returning an error when a memory allocation fails.
(Mike is not the only one asking; the answer probably has to be dug out of the kernel mailing list. As for the "process killed while holding a lock" problem, Yahoo solved it back in the FreeBSD 4.11 days with a clever, lightweight trick. People assume Google's engineering is the best, but in fairness there are plenty of strong companies and engineers; they are just less vocal. In China, though, companies that improve their servers by hacking the kernel really are few and far between.)
(A paragraph on how Google classifies its kernel work is omitted here; I could not quite follow it.)
Google added a SCHED_GIDLE scheduling class, a truly idle class: if no spare CPU is available, tasks in this class simply do not run (they do not even compete for the CPU). To avoid priority-inversion problems, SCHED_GIDLE processes get a temporary priority boost when they sleep in the kernel (sleeping inside the kernel, that is, not a sleep() system call). Networking is managed with the HTB queueing discipline plus a set of bandwidth-control logic. For disks, they are working on proportional I/O scheduling.
(Suppose three processes A, B and C with priorities A > B > C. C runs first and grabs an important shared resource; A wants the same resource, so A waits for C to release it. But because B's priority is higher than C's, the scheduler switches to B before C is done. The net effect is that B runs ahead of A even though A has the higher priority - that is "priority inversion." The usual fix is priority inheritance: whoever holds the contended resource temporarily raises its priority - C is boosted to A's priority while it holds the resource, then drops back after releasing it. Put bluntly, the tasks sharing a resource had better run at comparable priority while they hold it, or trouble follows.)
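To make the note above concrete, here is a minimal user-space sketch (my own illustration, not Google's code) of the usual fix, priority inheritance: a mutex created with the PTHREAD_PRIO_INHERIT protocol boosts whoever holds it to the priority of the highest-priority waiter, so a medium-priority thread can no longer starve the holder.

/* Illustrative sketch: a priority-inheritance mutex (not Google's code). */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock;

static void *low_prio_worker(void *arg)
{
    pthread_mutex_lock(&lock);
    /* While a higher-priority thread blocks on `lock`, this thread inherits
     * that waiter's priority, so a medium-priority thread can no longer
     * preempt it indefinitely. */
    /* ... touch the shared resource ... */
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_mutexattr_t attr;
    pthread_t t;

    pthread_mutexattr_init(&attr);
    /* The key line: request priority inheritance for this mutex. */
    pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    pthread_mutex_init(&lock, &attr);

    pthread_create(&t, NULL, low_prio_worker, NULL);
    pthread_join(t, NULL);
    pthread_mutex_destroy(&lock);
    return 0;
}

Inside the kernel, rt-mutexes apply the same priority-inheritance idea; the SCHED_GIDLE boost-on-sleep described above is a different workaround for the same class of problem.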
Beyond that, a lot of Google's code is there for monitoring. They monitor all disk and network traffic, record it, and use it later to analyze their operations. Google has added many hooks to the kernel so that all disk I/O - including asynchronous writeback I/O - can be attributed back to applications. When Mike was asked whether they use tracepoints, the answer was "yes" - but, naturally, Google uses its own tracing scheme.
Google's kernel work still has many important goals for 2010:
They are excited about CPU limits, which would give "low-latency tasks" priority access to the CPU without letting them take over the whole system.
RPC-aware CPU scheduling, which includes inspecting incoming RPC traffic to decide which process to wake. (This has a distinctly distributed-OS flavor.)
Delayed scheduling. For many tasks latency is no big deal, yet when RPC messages arrive the kernel tries to run all of the woken tasks at once; because these messages are not spread across CPUs (the serving processes may be running on only a few of them), load ends up unevenly distributed among CPUs. Tasks should therefore be markable for "delayed scheduling": when woken, they are not put straight onto a run queue but wait until the next global load-balancing pass.
Idle-cycle injection. Sophisticated power management lets Google run its servers right at the edge of melting down - but not over it.
Better memory management is planned, including accounting of the kernel's own memory use.
"Offline memory." Mike stressed that it is getting harder and harder to buy cheap memory that actually works, so Google needs a way to mark bad memory and set it aside. The HWPOISON work may help them here.
On networking, Google wants better support for receive-side scaling - directing incoming traffic to specific queues (a sketch of the mainline steering interface follows after the storage item below). They also need to account for software-interrupt (softirq) time and charge it to the tasks responsible - network processing often involves a lot of softirq work. Google has already done a lot of work on congestion control; they have developed "network-unfriendly" congestion algorithms that work well in their data centers. An algorithm called "TCP pacing" slows the servers' outgoing traffic to avoid overloading switches.
(A company that runs its own data centers really is different; the network tuning is remarkably fine-grained.)
On storage, Google is putting a lot of effort into reducing block-layer overhead so that high-speed flash can be used well. Using flash at the block layer to accelerate storage is on the development plan. Google has considered adding a flash translation layer to the kernel, but the advice from the room was that the flash-handling logic belongs in the filesystem instead.
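As a concrete illustration of the receive-steering item above: the mainline kernel later (in 2.6.35, after this talk) gained RPS, a software form of receive-side scaling configured per RX queue through sysfs. The sketch below only shows that interface; the device name eth0 and the CPU mask are assumptions, not anything Google described.

/* Sketch: steer receive processing for one RX queue onto CPUs 0-3 using the
 * mainline RPS interface (added after this talk, in 2.6.35). The device name
 * "eth0" and the CPU mask are illustrative assumptions. */
#include <stdio.h>

int main(void)
{
    const char *path = "/sys/class/net/eth0/queues/rx-0/rps_cpus";
    FILE *f = fopen(path, "w");
    if (!f) {
        perror(path);
        return 1;
    }
    /* Bitmask of CPUs allowed to handle packets from this queue: 0xf = CPUs 0-3. */
    fprintf(f, "f\n");
    fclose(f);
    return 0;
}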
Mike wrapped up with a few "interesting problems." One is that Google would like to pin filesystem metadata in memory, the goal being to bound the time an I/O request takes. The time to read one block from disk is known, but if the metadata is not in memory, more than one I/O operation is needed (extra I/O to fetch the metadata first), which slows file reads down. Google currently works around this by reading data directly from the raw block device into user space (presumably with O_DIRECT, managing its own metadata cache in user space so the system page cache is bypassed), but they would like to stop doing that.
(I am not sure exactly which data Google means by "filesystem metadata"; metadata differs a lot between filesystems. Since they want to pin it in memory it presumably is not large, so what is wrong with caching it themselves? I hope to get a chance to ask Mike.)
The other problem is lowering the system-call overhead of giving the kernel caching advice with fadvise(); the details of the problem are not yet clear.
The talk was a success, and the Linux community learned a good deal from its biggest customer. If Google's plans to become more community-oriented come to fruition, the result will be a better kernel.
(Note: Google is probably the heaviest - and quite possibly the deepest - user of Linux among IT companies; a 30-person kernel team is impressive. Looking at companies at home, very few people or companies have contributed to open source - and, embarrassingly, I am one of them.)
There may be no single organization which runs more Linux systems than Google. But the kernel development community knows little about how Google uses Linux and what sort of problems are encountered there. Google's Mike Waychison traveled to Tokyo to help shed some light on this situation; the result was an interesting view on what it takes to run Linux in this extremely demanding setting.
Mike started the talk by giving the developers a good laugh: it seems that Google manages its kernel code with Perforce. He apologized for that. There is a single tree that all developers commit to. About every 17 months, Google rebases its work to a current mainline release; what follows is a long struggle to make everything work again. Once that's done, internal "feature" releases happen about every six months.
This way of doing things is far from ideal; it means that Google lags far behind the mainline and has a hard time talking with the kernel development community about its problems.
There are about 30 engineers working on Google's kernel. Currently they tend to check their changes into the tree, then forget about them for the next 18 months. This leads to some real maintenance issues; developers often have little idea of what's actually in Google's tree until it breaks.
And there's a lot in that tree. Google started with the 2.4.18 kernel - but they patched over 2000 files, inserting 492,000 lines of code. Among other things, they backported 64-bit support into that kernel. Eventually they moved to 2.6.11, primarily because they needed SATA support. A 2.6.18-based kernel followed, and they are now working on preparing a 2.6.26-based kernel for deployment in the near future. They are currently carrying 1208 patches to 2.6.26, inserting almost 300,000 lines of code. Roughly 25% of those patches, Mike estimates, are backports of newer features.
There are plans to change all of this; Google's kernel group is trying to get to a point where they can work better with the kernel community. They're moving to git for source code management, and developers will maintain their changes in their own trees. Those trees will be rebased to mainline kernel releases every quarter; that should, it is hoped, motivate developers to make their code more maintainable and more closely aligned with the upstream kernel.
Linus asked: why aren't these patches upstream? Is it because Google is embarrassed by them, or is it secret stuff that they don't want to disclose, or is it a matter of internal process problems? The answer was simply "yes." Some of this code is ugly stuff which has been carried forward from the 2.4.18 kernel. There are also doubts internally about how much of this stuff will be actually useful to the rest of the world. But, perhaps, maybe about half of this code could be upstreamed eventually.
As much as 3/4 of Google's code consists of changes to the core kernel; device support is a relatively small part of the total.
Google has a number of "pain points" which make working with the community harder. Keeping up with the upstream kernel is hard - it simply moves too fast. There is also a real problem with developers posting a patch, then being asked to rework it in a way which turns it into a much larger project. Alan Cox had a simple response to that one: people will always ask for more, but sometimes the right thing to do is to simply tell them "no."
In the area of CPU scheduling, Google found the move to the completely fair scheduler to be painful. In fact, it was such a problem that they finally forward-ported the old O(1) scheduler and can run it in 2.6.26. Changes in the semantics of sched_yield() created grief, especially with the user-space locking that Google uses. High-priority threads can make a mess of load balancing, even if they run for very short periods of time. And load balancing matters: Google runs something like 5000 threads on systems with 16-32 cores.
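The kind of user-space lock affected looks roughly like the sketch below (an illustration, not Google's code): a contended thread spins and calls sched_yield(), hoping the lock holder runs next. The O(1) scheduler made that assumption mostly hold; CFS changed what yielding means, and kernels of that era offered the /proc/sys/kernel/sched_compat_yield knob (mentioned in the comments further down) to restore something closer to the old behavior.

/* Sketch of a yield-based user-space spinlock whose behaviour depends on
 * sched_yield() semantics. Illustration only, not Google's locking code. */
#include <sched.h>
#include <stdatomic.h>

typedef struct { atomic_flag held; } yield_lock;
#define YIELD_LOCK_INIT { ATOMIC_FLAG_INIT }

static void yl_lock(yield_lock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        sched_yield();   /* hope the scheduler runs the current holder next */
}

static void yl_unlock(yield_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}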
On the memory management side, newer kernels changed the management of dirty bits, leading to overly aggressive writeout. The system could easily get into a situation where lots of small I/O operations generated by kswapd would fill the request queues, starving other writeback; this particular problem should be fixed by the per-BDI writeback changes in 2.6.32.
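For readers who want to poke at the writeback behavior being discussed, each backing device exposes per-BDI knobs under /sys/class/bdi/. The sketch below just reads them; the device number 8:0 (typically sda) is an assumption.

/* Sketch: inspect the per-backing-device (per-BDI) writeback knobs under
 * /sys/class/bdi/. The device number "8:0" is an illustrative assumption. */
#include <stdio.h>

static void show(const char *knob)
{
    char path[128], buf[64];
    FILE *f;

    snprintf(path, sizeof(path), "/sys/class/bdi/8:0/%s", knob);
    f = fopen(path, "r");
    if (f && fgets(buf, sizeof(buf), f))
        printf("%-14s %s", knob, buf);
    if (f)
        fclose(f);
}

int main(void)
{
    show("read_ahead_kb");  /* readahead window for this device */
    show("min_ratio");      /* minimum share of the dirty cache */
    show("max_ratio");      /* cap on this device's share of dirty memory */
    return 0;
}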
As noted above, Google runs systems with lots of threads - not an uncommon mode of operation in general. One thing they found is that sending signals to a large thread group can lead to a lot of run queue lock contention. They also have trouble with contention for the mmap_sem semaphore; one sleeping reader can block a writer which, in turn, blocks other readers, bringing the whole thing to a halt. The kernel needs to be fixed to not wait for I/O with that semaphore held.
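The contention pattern is easy to reproduce in outline: page faults on a file mapping take mmap_sem for reading and may sleep on disk I/O while holding it, while mmap()/munmap() calls take it for writing. The sketch below is a synthetic illustration of that mix, not Google's workload; the file name and sizes are made up.

/* Synthetic sketch of an mmap_sem-heavy workload: fault threads are readers
 * of the semaphore (and may sleep on I/O while holding it), the map thread
 * is a writer. "big-data-file" (>= 64 MB) is an assumed test file. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAP_LEN (64UL << 20)   /* 64 MB of file data to fault in */

static int fd;

static void *faulter(void *arg)          /* reader of mmap_sem */
{
    volatile char sum = 0;
    char *p = mmap(NULL, MAP_LEN, PROT_READ, MAP_PRIVATE, fd, 0);
    for (size_t off = 0; off < MAP_LEN; off += 4096)
        sum += p[off];                   /* each fault may block on disk I/O */
    munmap(p, MAP_LEN);
    return NULL;
}

static void *mapper(void *arg)           /* writer of mmap_sem */
{
    for (int i = 0; i < 1000; i++) {
        void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        munmap(p, 4096);
    }
    return NULL;
}

int main(void)
{
    pthread_t t[9];

    fd = open("big-data-file", O_RDONLY);
    for (int i = 0; i < 8; i++)
        pthread_create(&t[i], NULL, faulter, NULL);
    pthread_create(&t[8], NULL, mapper, NULL);
    for (int i = 0; i < 9; i++)
        pthread_join(t[i], NULL);
    return 0;
}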
Google makes a lot of use of the out-of-memory (OOM) killer to pare back overloaded systems. That can create trouble, though, when processes holding mutexes encounter the OOM killer. Mike wonders why the kernel tries so hard, rather than just failing allocation requests when memory gets too tight.
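Mike's "just fail the allocation" suggestion corresponds roughly to the kernel's strict-overcommit mode. A hedged sketch: with vm.overcommit_memory set to 2, malloc() starts returning NULL when the commit limit is reached instead of succeeding optimistically and leaving the OOM killer to clean up later. This is stock sysctl behavior, not anything from Google's patch set.

/* Sketch: under strict overcommit (vm.overcommit_memory = 2, set here via
 * the sysctl file, which requires root), allocations fail cleanly with NULL
 * rather than triggering the OOM killer later. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/sys/vm/overcommit_memory", "w");
    if (f) {
        fputs("2\n", f);                    /* 2 = don't overcommit */
        fclose(f);
    }

    size_t total = 0;
    for (;;) {
        void *p = malloc(64UL << 20);       /* 64 MB at a time */
        if (!p) {
            printf("allocation failed cleanly after %zu MB\n", total >> 20);
            break;                          /* no OOM kill involved */
        }
        memset(p, 1, 64UL << 20);           /* actually touch the memory */
        total += 64UL << 20;
    }
    return 0;
}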
So what is Google doing with all that code in the kernel? They try very hard to get the most out of every machine they have, so they cram a lot of work onto each. This work is segmented into three classes: "latency sensitive," which gets short-term resource guarantees, "production batch" which has guarantees over longer periods, and "best effort" which gets no guarantees at all. This separation of classes is done partly through the separation of each machine into a large number of fake "NUMA nodes." Specific jobs are then assigned to one or more of those nodes. One thing added by Google is "NUMA-aware VFS LRUs" - virtual memory management which focuses on specific NUMA nodes. Nick Piggin remarked that he has been working on something like that and would have liked to have seen Google's code.
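One stock way to get the fake-NUMA-node arrangement described here is the numa=fake= boot option plus cpusets to confine a job to chosen nodes and CPUs. The sketch below assumes the kernel was booted with something like numa=fake=32 and that the legacy cpuset filesystem is mounted at /dev/cpuset; the node and CPU numbers are arbitrary examples, and Google's actual container machinery is certainly more elaborate.

/* Sketch: pin the current process (and whatever it execs) to one fake NUMA
 * node using cpusets. Assumes a numa=fake= boot and a cpuset mount at
 * /dev/cpuset; node 3 and CPUs 6-7 are illustrative choices. */
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

static void write_file(const char *path, const char *val)
{
    FILE *f = fopen(path, "w");
    if (f) {
        fputs(val, f);
        fclose(f);
    }
}

int main(void)
{
    char pid[32];

    mkdir("/dev/cpuset/batch_job", 0755);
    write_file("/dev/cpuset/batch_job/mems", "3");    /* fake node 3 only */
    write_file("/dev/cpuset/batch_job/cpus", "6-7");  /* two CPUs for it */

    snprintf(pid, sizeof(pid), "%d\n", getpid());
    write_file("/dev/cpuset/batch_job/tasks", pid);   /* move ourselves in */

    /* exec the actual job here; it now allocates memory from node 3 only */
    return 0;
}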
There is a special SCHED_GIDLE scheduling class which is a truly idle class; if there is no spare CPU available, jobs in that class will not run at all. To avoid priority inversion problems, SCHED_GIDLE processes have their priority temporarily increased whenever they sleep in the kernel (but not if they are preempted in user space). Networking is managed with the HTB queueing discipline, augmented with a bunch of bandwidth control logic. For disks, they are working on proportional I/O scheduling.
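SCHED_GIDLE itself is Google-internal, but mainline has had SCHED_IDLE since 2.6.23, which likewise runs a task only when nothing else wants the CPU (without the boost-on-sleep tweak described above). A minimal sketch of putting a process into it:

/* Sketch: put the calling process into SCHED_IDLE, the mainline analogue of
 * Google's SCHED_GIDLE class. Not Google's implementation. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    struct sched_param sp = { .sched_priority = 0 };  /* must be 0 for SCHED_IDLE */

    if (sched_setscheduler(0, SCHED_IDLE, &sp) != 0) {
        perror("sched_setscheduler");
        return 1;
    }
    /* ... run best-effort background work here ... */
    return 0;
}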
Beyond that, a lot of Google's code is there for monitoring. They monitor all disk and network traffic, record it, and use it for analyzing their operations later on. Hooks have been added to let them associate all disk I/O back to applications - including asynchronous writeback I/O. Mike was asked if they could use tracepoints for this task; the answer was "yes," but, naturally enough, Google is using its own scheme now.
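For comparison, here is roughly what doing this with stock tracepoints (rather than Google's private hooks) can look like: enable the block:block_rq_issue event and stream the trace, whose lines carry the command name and PID that issued each request. Paths assume debugfs is mounted at /sys/kernel/debug; this is an illustration, not Google's scheme.

/* Sketch: attribute disk I/O to processes with stock tracepoints. Enables
 * block:block_rq_issue and streams trace_pipe; each line names the task
 * that issued the request. */
#include <stdio.h>

int main(void)
{
    FILE *en = fopen("/sys/kernel/debug/tracing/events/block/block_rq_issue/enable", "w");
    if (en) {
        fputs("1\n", en);
        fclose(en);
    }

    FILE *pipe = fopen("/sys/kernel/debug/tracing/trace_pipe", "r");
    if (!pipe)
        return 1;

    char line[512];
    while (fgets(line, sizeof(line), pipe))
        fputs(line, stdout);   /* e.g. "mysqld-1234 ... block_rq_issue 8,0 W ..." */

    fclose(pipe);
    return 0;
}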
Google has a lot of important goals for 2010; they include:
- CPU limits, giving latency-sensitive tasks priority access to the CPU without letting them take over the system.
- RPC-aware CPU scheduling; this includes inspection of incoming RPC traffic to determine which process should be woken in response.
- Delayed scheduling. For most threads latency is not that important, but the kernel tries to run them all as soon as RPC messages arrive; since those messages are not spread evenly across CPUs, load balancing suffers. Tasks marked for delayed scheduling would wait for the next global load-balancing pass instead of going straight onto a run queue.
- Idle cycle injection: aggressive power management, so that machines can be run right at the edge of melting down - but not beyond it.
- Better memory management, including accounting of kernel memory use.
- "Offline memory." It is increasingly hard to buy cheap memory which actually works, so Google needs to be able to set bad memory aside; the HWPOISON work may help here.
- Networking: better support for receive-side scaling (directing incoming traffic to specific queues), accounting of software-interrupt time charged to the tasks responsible, and continued congestion-control work, including the "network-unfriendly" algorithms used within Google's data centers and "TCP pacing" to keep output traffic from overloading switches.
- Storage: reducing block-layer overhead so it can keep up with high-speed flash, and possibly using flash for storage acceleration at the block layer. A flash translation layer in the kernel was considered, but the advice was to keep that logic in the filesystem.
Mike concluded with a couple of "interesting problems." One of those is that Google would like a way to pin filesystem metadata in memory. The problem here is being able to bound the time required to service I/O requests. The time required to read a block from disk is known, but if the relevant metadata is not in memory, more than one disk I/O operation may be required. That slows things down in undesirable ways. Google is currently getting around this by reading file data directly from raw disk devices in user space, but they would like to stop doing that.
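The workaround described - reading file data straight off the raw device from user space - typically looks something like this sketch (device path, offset, and block size are illustrative; the application has to supply its own metadata to know where to read):

/* Sketch: read a block directly from a raw device with O_DIRECT, bypassing
 * the page cache, so the number of disk I/Os per request is known.
 * Device path, offset and block size are illustrative. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BLK 4096   /* O_DIRECT requires block-aligned buffers, offsets, lengths */

int main(void)
{
    void *buf;
    int fd = open("/dev/sdb", O_RDONLY | O_DIRECT);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    if (posix_memalign(&buf, BLK, BLK) != 0)    /* aligned buffer for O_DIRECT */
        return 1;

    ssize_t n = pread(fd, buf, BLK, 128UL * BLK);  /* one known-cost disk read */
    printf("read %zd bytes\n", n);

    free(buf);
    close(fd);
    return 0;
}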
The other problem was lowering the system call overhead for providing caching advice (with fadvise()) to the kernel. It's not clear exactly what the problem was here.
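For reference, the advice calls in question are posix_fadvise() hints like these (file name illustrative); the complaint was about the per-call overhead when such advice is issued at high rates:

/* Sketch: the caching-advice calls under discussion. posix_fadvise() tells
 * the kernel which parts of a file will be needed soon or can be dropped. */
#include <fcntl.h>
#include <stdio.h>

int main(void)
{
    int fd = open("logfile", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Hint: the first 1 MB will be read soon, start readahead now. */
    posix_fadvise(fd, 0, 1 << 20, POSIX_FADV_WILLNEED);

    /* ... process that region ... */

    /* Hint: done with it, the page cache may drop those pages. */
    posix_fadvise(fd, 0, 1 << 20, POSIX_FADV_DONTNEED);

    close(fd);
    return 0;
}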
All told, it was seen as one of the more successful sessions, with the kernel community learning a lot about one of its biggest customers. If Google's plans to become more community-oriented come to fruition, the result should be a better kernel for all.
Perforce
Posted Oct 21, 2009 5:19 UTC (Wed) by bradfitz (subscriber, #4378) [Link]
So with stuff like that, or git-svn, if you're going to have a blessed
"central" repo anyway, who really cares if that repo is actually git, svn,
perforce, etc, as long as you can use your DVCS of choice at the edge?
The alternative is changing years of accumulated tools & checks every time a
new VCS comes out and you change your master repo's storage format.
*shrug*
Perforce
Posted Oct 21, 2009 22:02 UTC (Wed) by ianw (subscriber, #20143) [Link]
Although everyone always talks about getting rid of it, there is so much build, QA and release infrastructure built around it, I can't fathom it could ever happen. But, using git wrappers, us developers can pretty much forget that it's even there :)
Perforce
Posted Oct 21, 2009 6:07 UTC (Wed) by dlang (✭ supporter ✭, #313) [Link]
Perforce handles this well (for all its shortcomings in other areas)
Perforce
Posted Oct 21, 2009 7:46 UTC (Wed) by epa (subscriber, #39769) [Link]
Git sounds like it should cope well with large objects in the repository, but the general view is that it doesn't perform so well. I wonder why not.
Perforce
Posted Oct 21, 2009 9:25 UTC (Wed) by cortana (subscriber, #24596) [Link]
Fortunately, a short trip to #git revealed the cause of the problem: git compresses objects
before sending them to a remote repository; it simply ran out of virtual memory while
compressing some of the larger files.
There were two fixes.
1. Use a 64-bit version of git. I'd be happy to, but there isn't an x64 binary download available
from the msysgit web site.
2. Tell git not to perform the compression; 'echo * -delta > .git/info/attributes'. Somewhat
undocumented, but at least I will be able to search for this LWN comment if I ever run into this
problem again. :)
-delta
Posted Oct 21, 2009 23:31 UTC (Wed) by joey (subscriber, #328) [Link]
Looks to me to make large object commits fast, but git pull will still
compress the objects, and still tends to run out of memory when they're
large.
-delta
Posted Oct 21, 2009 23:51 UTC (Wed) by cortana (subscriber, #24596) [Link]
Presumably git-pull running out of memory would be a server-side issue? And in that case, if you're not running a sensible 64-bit operating system on your server then you deserve what you get... ;)
Perforce
Posted Oct 21, 2009 12:13 UTC (Wed) by dlang (✭ supporter ✭, #313) [Link]
git mmaps the files to access them, and the pack definition is limited to no more than 4G (and since the over-the-wire protocol for download is the same as a pack file you run into limits there)
4G is _huge_ for source code. especially with the compression that git does, but when you start storing binaries that don't diff against each other the repository size can climb rapidly.
this has been documented several times, but it seems to be in the category of 'interesting problem, we should do something about that someday, but not a priority' right now
Perforce
Posted Oct 21, 2009 14:30 UTC (Wed) by drag (subscriber, #31333) [Link]
Pretty shitty at everything else. Its too bad because I'd like to use it for synchronizing my desktop.
Perforce
Posted Oct 21, 2009 11:54 UTC (Wed) by jonth (subscriber, #4008) [Link]
First, some background. The company had grown over the course of its first three years to a two-site, 100 person company. Up to that point, we had used CVS with a bunch of bespoke scripts to handle branches and merges. We used GNATS (I _really_ wouldn't recommend this) as our bug tracking system. These decisions had been taken somewhat by default, but by 2005 it was clear that we needed something new that would scale to our needs in the future. Our requirements were:
a) Integration of bug tracking and source control management. For us, we felt that it was vital to understand that SCM is only half of the problem. I think that this tends to be overlooked in non-commercial environments.
b) Scalable to multi-site and 100s of users.
c) Ease of use.
d) Support.
e) Stability.
f) Speed.
g) Windows/Linux support. We're predominantly a Linux shop, but we have teams who write Windows drivers.
We looked at the following systems (in no particular order):
a) git. Git had been active for about 6 months when we started looking at it. We liked the design principles, but at that time there was no obvious way to integrate it into an existing bug tracking system. It also had no GUI then (although I'm a confirmed command line jockey, a GUI for these things definitely improves productivity) and there was no Windows version of git. Finally, the underlying storage was still somewhat in flux, and all in all, it seemed just too young to risk the future of the company on it.
b) Mercurial. Many of the problems we had with git also applied to Mercurial. However, even then it did integrate with Trac, so we could have gone down that route. In the end, like git, it was just too new to risk.
c) Clearcase/Clearquest. Too slow, too expensive, and rubbish multi-site support.
d) Bitkeeper. Nice solution, but we were scared of the "Don't piss Larry off" license.
e) Perforce/Bugzilla. Provided "out of the box" integration with Bugzilla, worked pretty well with multi-site using proxies, had a nice GUI, scaled well, was stable (our major supplier had used it for a few years), had client versions for Windows and Linux, and was pretty quick, too.
f) MKS. No better than CVS.
g) SVN. In many ways, similar to Perforce in terms of how it is used. In fact, one part of the company decided to use SVN instead of Perforce. However, this lasted for about 6 months. I don't know the details but due to some technical difficulties, they gave up and moved over to Perforce.
All in all, Perforce integrated with a customized version of Bugzilla, while not perfect (git/mercurial/bk's model of how branches work is more sensible I think), gave us the best fit to our needs. We now have ~200 users spread all over the world, with no real performance problems. The bug tracking integration works well. Perforce's commercial support is responsive and good, we've never lost any data and we can tune the whole system to our needs.
If we had to revisit the decision, it's possible that Mercurial/Trac would have fared better, but to be honest the system we chose has stood the test of time and so there is no reason to change.
Perforce
Posted Oct 21, 2009 12:16 UTC (Wed) by ringerc (subscriber, #3071) [Link]
BDB is great if used in the (optional) transactional mode on an utterly stable system where nothing ever goes wrong. In other words, in the real world I like to see its use confined to throw-away databases (caches, etc).
I've been using SVN with the fsfs backend for years both personally and in several OSS projects and I've been very happy. Of course, the needs of those projects and my personal needs are likely quite different to your company's.
Perforce
Posted Oct 21, 2009 19:21 UTC (Wed) by ceswiedler (subscriber, #24638) [Link]
People use Perforce because it works very well for centralized version control, and that's what a lot of companies need. It enforces user security, integrates with a lot of other software, can be backed up centrally, and has a lot of very good tools. On the other hand, it doesn't scale as well as DVCSs do, and can't be used offline.
Git it
Posted Oct 21, 2009 21:11 UTC (Wed) by man_ls (guest, #15091) [Link]
Git is lightning fast (at least for code, I don't know for binaries), it's distributed and (surprise surprise) it's addictive! The cycle of 'commit, commit, commit, push when you're ready' is amazingly productive. I'm using it in my first project as single developer and I wouldn't change it for anything else I've used -- including cvs, svn, ClearCase, AccuRev and a few others too shitty to mention.
Git it
Posted Oct 31, 2009 4:55 UTC (Sat) by Holmes1869 (guest, #42043) [Link]
That being said, I feel that some of the git features will only ever be used by people that take source control seriously. The people I work with check-in code without commit messages, mistakenly commit files that they forgot they changed (or other random files that ended up in their sandbox), and don't ever perform a simple 'svn diff' (or Subclipse comparison) just to make sure they are checking in what they want. Do you think these people care that they can re-order or squash a commit to create a single pristine, neat, atomic commit to fix exactly one particular bug? Probably not unfortunately. I hope to one day work with people that do care.
Perforce
Posted Oct 22, 2009 7:38 UTC (Thu) by cmccabe (guest, #60281) [Link]
I've worked with perforce, subversion, and git in the past. The three systems all have very different philosophies.
perforce has some abilities that none of the other systems have. When you start editing a file, it tells you who else has it open. You can look at their changes, too.
Both perforce and subversion can check out part of a repository without checking out the whole thing. Git can't do this. Arguably, you should use git subprojects to solve this problem. I've never actually done that, so I don't know how well it works.
Of course, git allows you to work offline, which neither perforce nor subversion can do. git also allows you to merge changes from one client to another ("branch," in git lingo). I've definitely been frustrated in the past by having to manually port changes from one perforce client to another-- even wrote scripts to automate it. What a waste.
"p4 merge" is a powerful command, much more powerful than "svn copy." p4 preserves the "x was integrated into y" relationships between files, whereas svn does not. Imagine a company that has branches for product 1.0, 2.0, and 3.0. It periodically integrates changes from 1.0 into 2.0, and 2.0 into 3.0. In this situation, the relative lack of sophistication of svn copy is a real Achilles heel. Imagine how much pain renaming a file in version 2.0 causes for the hapless svn copy user. Each time the build monkey does the integration from 1.0 to 2.0, he has to remember the files that were renamed. Except that with perforce, the system remembers it for him.
git I think has heuristics to detect this sort of thing. In general git was built from the ground up to do merging on a massive basis.
perforce also has excellent Windows support, a pile of GUI tools, and was about a dozen years earlier to the party. git and svn are catching up with these advantages, but it will take some time.
C.
Perforce
Posted Oct 29, 2009 3:05 UTC (Thu) by tutufan (guest, #60063) [Link]
Wow. I can almost hear the punch card reader in the background. Talk about an obsolete mindset. If I'm editing file X, do I really want to know whether somebody, somewhere, working on some idea that I have no idea about, is trying out something that also somehow involves file X, something that ultimately may never see the light of day? No.
If we get to the point of merging, I think about it then (if necessary).
Perforce
Posted Nov 4, 2009 21:44 UTC (Wed) by jengelh (subscriber, #33263) [Link]
Sure you could split it up, but uh, all too tightly integrated. Should anything go git in a future, I would guess all repositories will start with a fresh slate.
KS2009: How Google uses Linux
Posted Oct 21, 2009 9:13 UTC (Wed) by sdalley (subscriber, #18550) [Link]
Wow. Who actually wants to buy memory that doesn't work, or flips random bits on an off-day?
I'd prefer my code, documents and calculations not to be twiddled with behind my back, thank you.
Reminds me of a new improved version of an HP text editor I used once, backalong. We were maintaining a large Fortran program which started misbehaving in odd ways, then stopped working altogether. Turned out that each time you saved the file, the editor would lose a random line or two.
KS2009: How Google uses Linux
Posted Oct 21, 2009 9:37 UTC (Wed) by crlf (guest, #25122) [Link]
> Wow. Who actually wants to buy memory that doesn't work, or flips random bits on an off-day?
The issue is one of probability and large numbers. Memory errors are already common today, and the continued increase in density will not help matters tomorrow.
KS2009: How Google uses Linux
Posted Oct 21, 2009 12:28 UTC (Wed) by sdalley (subscriber, #18550) [Link]
So, one chance in three per annum of suffering a memory error on a given machine, roughly.
With ECC memory which Google use as standard, 19 out of 20 of these errors will be transparently corrected.
With non-ECC memory, as in commodity PCs, stiff biscuit every time.
KS2009: How Google uses Linux
Posted Oct 21, 2009 21:08 UTC (Wed) by maney (subscriber, #12630) [Link]
You imply that denser chips will cause higher error rates, but that is not what they found:
> We studied the effect of chip sizes on correctable and un-correctable errors, controlling for capacity, platform (DIMM technology), and age. The results are mixed. When two chip configurations were available within the same platform, capacity and manufacturer, we sometimes observed an increase in average correctable error rates and sometimes a decrease.
There were other, also mixed, differences when comparing only memory module sizes, but that mixes together differences in chip density and number of chips on the module - and quite possibly chip width as well.
> The best we can conclude therefore is that any chip size effect is unlikely to dominate error rates given that the trends are not consistent across various other confounders such as age and manufacturer.
Which, I think, summarizes decades of experience that refuted various claims that the ever-shrinking memory cells just had to lead to terrible error problems. I may still have an old Intel whitepaper on this from back in the days when chips sizes were measured in Kbits.
KS2009: How Google uses Linux
Posted Oct 21, 2009 12:30 UTC (Wed) by nye (guest, #51576) [Link]
Anyone who wants to buy real memory that exists in the physical world, really.
KS2009: How Google uses Linux
Posted Oct 21, 2009 14:23 UTC (Wed) by cma (subscriber, #49905) [Link]
What a waste of resources... Google could just work tied with the kernel community. Come on Google, what are you waiting for? Besides this fact, if linux kernel code is GPLv2 why don't they release their code and respect GPLv2 license terms?
KS2009: How Google uses Linux
Posted Oct 21, 2009 14:36 UTC (Wed) by drag (subscriber, #31333) [Link]
So the GPL is pretty irrelevant.
So it is just a business case of whether working with the kernel community is
going to be more profitable or not. And so far they decided that taking care
of stuff internally is a better approach. Maybe that will change.
GPL doesn't require, but maintenance kills you
Posted Oct 21, 2009 15:00 UTC (Wed) by dwheeler (guest, #1216) [Link]
Correct, the GPL doesn't require the release of this internal source code. However, the GPL does have an effect (by intent): Google cannot take the GPL'ed program, modify it, and sell the result as a proprietary program. Thus, what Google is doing is almost certainly wasting its own resources, by trying to do its own parallel maintenance. They could probably save a lot of money and time by working with the kernel developers; it's a short-term cost for long-term gain. And as a side-effect, doing so would help all other kernel users.
There's probably some stuff that will stay Google-only, but if they worked to halve it, they'd probably save far more than half their money. Google can do this, in spite of its long-term inefficiencies, because they have a lot of money... but that doesn't mean it's the best choice for them or anyone else.
Appliance kernel source?
Posted Oct 21, 2009 15:11 UTC (Wed) by dmarti (subscriber, #11625) [Link]
If you buy a Google Search Appliance, you should be able to request a copy of the source code to any GPL software on it. (Could be that they're maintaining a whole extra kernel for the GSA, though.)
Appliance kernel source?
Posted Oct 30, 2009 22:25 UTC (Fri) by cdibona (subscriber, #13739) [Link]
You guys are killing me, we've had this up at The GSA Mirror for years and years. Enjoy!
Chris DiBona
KS2009: How Google uses Linux
Posted Oct 21, 2009 21:57 UTC (Wed) by jmm82 (guest, #59425) [Link]
1. They are not using kernels that are close to linus git head.
2. Some code would not be wanted in the mainline kernel.
3. Some code is not good enough to get into the mainline kernel.
4. They don't want to have 30 people saying the code will only get in if it does this. Aka. They don't want to make it support the features they are not using.
5. Some code is proprietary and they want to protect the IP. As was stated above as long as they are not distributing the code the changes are their property.
6. A lot of their patches are code backported from mainline, so it is already in the kernel.
I think moving forward that you will see Google have a few developers working on mainline to try and influence future kernels because it will be financially cheaper to carry as few patches as possible. Also, I feel they will always have some patches that they feel are too valuable IP to give back and will continue to maintain those outside the mainline.
KS2009: How Google uses Linux
Posted Oct 23, 2009 22:13 UTC (Fri) by dvhart (guest, #19636) [Link]
1) sched_yield with CFS trouble
Does the /proc/sys/kernel/sched_compat_yield flag help? This is the third time I've run into sched_yield() behavior issues in the last couple weeks, all related to userspace locking. I'd really like to know what we (kernel developers) can do to make this recurring problem go away. Big applications often seem to have need of better performing locking constructs. Adaptive mutexes, implemented with futexes, seem like they could address a lot of this, with a couple exceptions: the spin time is not user configurable AFAIK, and the additional accounting, etc. done in support of POSIX seems to make even the uncontended calls into glibc too expensive. A common response seems to be that userspace locking isn't the right answer and they should rely on OS primitives. Unfortunately, as Chris Wright mentioned during Day 1, these developers have empirical evidence to the contrary. I was reviewing some for a different project today; sometimes the performance difference is staggering.
2) Mike asked why the kernel tries so hard to allocate memory - why not just fail to allocate if there is too much pressure. Why isn't disabling overcommit enough?
LWN web design
Posted Nov 9, 2009 9:33 UTC (Mon) by kragil (guest, #34373) [Link]
I think hiring a web designer from this century (new colours, css stuff) could really improve this site. I am not talking lot of JS or Flash, just a newer more modern look.
Kernel hackers seem to complain that new blood is lacking, but for an ignorant observer a lot of stuff seems stuck in 1996. (Just compare Rails, Gnome, KDE, etc. news and planet sites to what the kernel has...)
I won't even mention microblogging :)
LWN web design
Posted Nov 9, 2009 15:00 UTC (Mon) by anselm (subscriber, #2796) [Link]
In what way is the LWN.net design not »good«? It does what it is supposed to do in a very unobtrusive way -- unlike many of the newer sites. Not chasing after the latest visual fashions does not automatically make its layout, fonts, and colours »bad«. Exactly what about these do you think is keeping new users away?
(Having said that, table-based layout isn't exactly the done thing these days, but you only addressed »layout, fonts, and colours«, not HTML-level implementation, and as far as I'm concerned there's nothing whatsoever wrong with those. Also, registered users can tweak the colours to their own liking, and it probably wouldn't be impossible to allow the fonts to be tweaked, too.)
I'm sure that the LWN.net team will welcome constructive hints as to how to improve the LWN.net »experience« without giving up the strengths of the site, i.e., no-frills Linux info and commentary. For the time being, however, I'd much rather Jon & co. spend their time on giving us the best Linux news possible than chase the newest fads in web design. After all, people come here for the content, not to see great web design.
LWN web design
Posted Nov 9, 2009 15:57 UTC (Mon) by kragil (guest, #34373) [Link]
And I am no web designer, but AFAIK you can do a lot of nice stuff with just CSS (even rounded corners etc.). A lot of small changes done by a professional would certainly add up. And it would still be backwards compatible and no-frills.
Just read a few comments on the link above to look beyond your little bubble.
I don't think a more professional look would be bad for LWN. Quite the opposite.
LWN web design
Posted Nov 9, 2009 16:46 UTC (Mon) by anselm (subscriber, #2796) [Link]
Right. Rounded corners. Rounded corners will definitely make all the difference! Honestly, if you can't come up with better suggestions than this ...
I just had a look at the comments on the Digg article you quoted above and I'm not convinced that encouraging that sort of crowd to come here is something Jon & co. should spend time and energy on. If the web design is what keeps them away then I would surely recommend keeping things as they are.
(Incidentally, looking at the Digg site itself, I'll take current LWN.net over the Digg design any day, thank you very much. Maybe it's just me, but I like my fonts readable and navigation where I can actually find it. Also very incidentally, unlike you I in fact pay LWN.net money every month so they can keep doing what they are doing so well. I wonder how many of the Digg users you revere so much would -- even if LWN.net looked like Digg?)
LWN web design
Posted Nov 9, 2009 23:40 UTC (Mon) by quotemstr (subscriber, #45331) [Link]
> most websites that never change and don't adapt have a good change of dying
Yes, but that adaption doesn't necessarily have to come in the form of the latest design trends. Objectively speaking, LWN is perfectly usable. What you're objecting to is LWN's not following web fashion. Not following web fashion hasn't seemed to hurt craigslist, despite the existence of many more hip competitors.
LWN web design
Posted Nov 13, 2009 1:11 UTC (Fri) by foom (subscriber, #14868) [Link]
I just noticed the program "Marketplace":
> Marketplace... Craig's List. Without the Ugly.
> Love Craig's List but hate how painful and ugly it is? Me too. So I made Marketplace. It takes the pain and ugly out and leaves the good stuff in.
I agree with the other comments that LWN could do with some sprucing up. It doesn't really bother me that it's currently ugly, I still read (and pay for) it. But I wouldn't mind it being nicer, either, and it might keep other readers from running in terror.
LWN web design
Posted Nov 10, 2009 0:10 UTC (Tue) by anselm (subscriber, #2796) [Link]
If your resources are limited (as LWN.net's are) it makes sense to stay away from stuff that is essentially eye candy for people who must always have the latest and greatest, and concentrate on stuff that benefits all your readers, like compelling content. If I was in charge of HTML/CSS development for LWN.net, I would consider some changes but they would not in the first instance touch the visual appearance -- I would probably move to a purely CSS-based (instead of table-based) layout to make the site more accessible. I might change some of the visual design but only to improve readability, not to make substantial changes to the layout as it appears. IMHO, such changes would be worthwhile but they would not be changes for change's sake the way you seem to be advocating. (Feel free to suggest anything specific that will actually improve the reader's experience if you can think of something.)
As far as Google is concerned, when the site was new it looked completely unlike all the other search engines precisely because it went back to the basics and presented essentially a logo, text entry widget and search button. In spite of this »conservative attitude« it still went on to become the most popular search engine on the planet. Again, it was the content that made the difference, not the (lack of) bling; people were attracted more by the good results Google produced than they were turned off by the straightforward appearance. Also, in spite of not changing its appearance substantially during the last decade or so, www.google.com isn't likely to go away anytime soon, either.
Finally, the »big stream of money« from subscribers is, to a large degree, what keeps LWN.net going. Jon & co. may, in fact, be very interested in updating their web design but perhaps they can't afford to spend the time or money at this moment. So if you want them to be able to contemplate non-essential tasks like HTML/CSS hacking, instead of whining about how LWN.net will go away if they don't »adapt« you should really contribute to their bottom line by taking out a subscription, which will certainly have a much greater impact than any amount of whining.
LWN web design
Posted Nov 10, 2009 10:51 UTC (Tue) by anselm (subscriber, #2796) [Link]
This is getting silly. If you can't point to anything specific and constructive that you would actually change to improve LWN.net other than »use rounded corners, they're cool« this must mean that the site isn't so bad to begin with, so I'll politely and respectfully suggest you shut up.
LWN web design
Posted Nov 10, 2009 12:06 UTC (Tue) by kragil (guest, #34373) [Link]
Get a good web designer that knows modern web design (page layout, usability, style, colours, logo etc.) and have him/her improve the crappy first impression this site makes. That may or may not include rounded corners. I don't know as I am not a web designer, as I already mentioned and explained that that was just one tiny technical example, which still does not fit into your brain. All I know is that this site's design (unprofessional logo with green font, annoying flashy ads, black, red or blue text, grey and orange boxes, no advanced CSS etc.) undeniably makes a bad impression, which does not help anybody.
This doesn't have to cost a lot. There are a lot of talented young web monkeys out there that don't charge a lot per hour. The first thing, though, is to acknowledge that not everything is peachy.
LWN web design
Posted Nov 10, 2009 12:31 UTC (Tue) by hppnq (guest, #14462) [Link]
Another cup of Open Source, anyone?
LWN web design
Posted Nov 10, 2009 12:33 UTC (Tue) by quotemstr (subscriber, #45331) [Link]
My original point, if I may flesh it out a bit, is that the kind of person bothered by LWN's layout probably won't get much out of LWN's content in the first place. LWN's attraction to me is the deep, literate, and mature coverage, and to a lesser extent, the informative and useful comment section. I couldn't care less how the site looks, and would be just as happy (no, happier) if I could read it over NNTP. Changing lwn.net to pander to the Digg crowd would compromise what makes LWN worthwhile in the first place. Frankly, the kind of person who judges a site based on how Web 2.0 it is would find the articles here boring, and would post vapid comments saying so. It'd be an Eternal September.
LWN web design
Posted Nov 10, 2009 12:40 UTC (Tue) by nye (guest, #51576) [Link]
This idea may be absolutely unthinkable to you, but it is actually possible to appreciate good design without being a sub-literate fool, despite what your prejudices may lead you to feel.