The linux-kernel mailing list FAQ
Before you consider posting to the linux-kernel mailing list, please read at least the start of section 3 of this FAQ list.
These frequently asked questions are divided into various categories. Please contribute any category and Q/A that you may find relevant. You can also add your answer to any question that has already been answered, if you have additional information to contribute.
The official site is: http://www.tux.org/lkml/ (this is on the east coast of the U.S.A.). Many thanks to Sam Chessman and David Niemi for hosting the FAQ on a high-bandwidth, professionally managed Linux server. The following mirrors are available (and are updated at the same time as the official site):
- http://www.atnf.csiro.au/~rgooch/linux/docs/lkml/ in Sydney, Australia
- http://www.ras.ucalgary.ca/~rgooch/linux/docs/lkml/ in Calgary, Canada
- http://www.kernel.org/pub/linux/docs/lkml/ on the west coast of the U.S.A.
Hot off the Presses
vger.kernel.org has enabled ECN. You may need to switch ISP in order to receive linux-kernel email. See the section on ECN for more details.
Two digest forms of linux-kernel (a normal digest every 100KB and a once-daily digest) are available at http://lists.us.dell.com/.
Go to http://www.atnf.csiro.au/~rgooch/linux/docs/kernel-newsflash.html for newsflashes about official kernel releases.
NOTE: this page is no longer maintained. If there is an alternative page, please let me know.
Read this before complaining to linux-kernel about compile problems. Chances are a thousand other people have noticed and the fix is already published.
Index
- Basic Linux kernel documentation
- Contributors and some special expressions
- Related mailing lists
- Question Index
- General questions
- Driver specific questions
- Mailing list questions
- "How do I" questions
- "Who's who" questions
- CPU questions
- OS questions
- Compiler/binutils questions
- Feature specific questions
- "What's changed between kernels 2.0.x and 2.2.x" questions
- Primer documents
- Kernel Programming Questions
- Mysterious kernel messages
- Odd kernel behaviour
- Programming Religion
- User-space Programming Questions
- Answers
- Contributing
Basic Linux kernel documentation
The following are Linux kernel related documents, which you should take a look at before you post to the linux-kernel mailing list:
- The Linux Kernel Hackers' Guide, compiled by Michael K. Johnson of Red Hat fame. Includes among other documents selected Q/As from the linux-kernel mailing list.
- The Linux Kernel book, by David A. Rusling, available in various formats from the Linux Documentation Project and mirrors. Still being worked on, but explains clearly the main structure of the Linux kernel.
- The Linux FAQ by Robert Kiesling has many high quality Q/As.
- The Linux Kernel HOWTO by Brian Ward. Fundamental reading for anybody wanting to post to the linux-kernel mailing list.
- Various Linux HOWTOs on specific questions, such as the BogoMips mini-HOWTO by Wim van Dorst. These are all by definition LDP documents.
- The Linux kernel source code for any particular kernel version that you may be using. Note that there is a /Documentation directory which holds some very useful text files about drivers, etc. Also check the MAINTAINERS file in the kernel source root directory.
- Some drivers even have Web pages, with additional up to date information e.g. the network drivers by Donald Becker, etc. Check the Hardware section in the LDP site.
- Similarly, Linux implementations for some CPU architectures have dedicated Web pages, mailing lists, and sometimes even a HOWTO e.g. the Linux Alpha HOWTO by Neal Crook. Check the LDP site and its mirrors for Web links to the various architecture specific sites.
- Linux device drivers, a book written by Alessandro Rubini. C. Scott Ananian reviewed it for Amazon.com.
- Linux kernel internals, a book by Michael Beck (Editor) et al. Also reviewed for Amazon.com.
- Another useful site is: http://www.kernelnewbies.org/
- Here is a general guide on how to ask questions in a way that greatly improves your chances of getting a reply: http://www.catb.org/~esr/faqs/smart-questions.html. If you have a bug to report, you should also read http://www.chiark.greenend.org.uk/~sgtatham/bugs.html.
Extra instructions, specific to the Linux kernel, are available here.
Contributors and some special expressions
This is the list of contributors to this FAQ. They are listed in alphabetic order of their abbreviations, used in the Answers sections below to identify the author(s) of each answer.
- AC : Alan Cox
- AV : Alexander Viro
- ADB: Andrew D. Balsa
- CP : Colin Plumb
- DBE: Daniel Bergstrom
- DSM: David S. Miller (co-postmaster)
- DW : David Woodhouse
- JBG: Jan-Benedict Glaw
- KGB: Krzysztof G. Baranowski
- KO : Keith Owens
- MEA: Matti E. Aarnio (co-postmaster)
- MRW: Matthew Wilcox
- PG : Paul Gortmaker
- RC : Ralph Corderoy
- REG: Richard E. Gooch (FAQ maintainer)
- REW: Roger E. Wolff
- RML: Robert M. Love
- RRR: Rafael R. Reilova
- TAC: Thomas A. Cort
- TJ : Trevor Johnson
- TYT: Theodore Y. Ts'o
- VKh: Vassilii Khachaturov
Some English expressions for non-native English readers. Many of these (and far more) may be obtained from the Jargon File:
- AFAIK = As Far As I Know
- AKA = Also Known As
- ASAP = As Soon As Possible
- BTW = By The Way (used to introduce some piece of information or question that is on a different topic but may be of interest)
- COLA = comp.os.linux.announce (newsgroup)
- ETA = Estimated Time of Arrival
- FAQ = Frequently Asked Question
- FUD = Fear, Uncertainty and Doubt
- FWIW = For What It's Worth
- FYI = For Your Information
- IANAL = I Am Not A Lawyer
- IIRC = If I Recall Correctly
- IMHO = In My Humble Opinion
- IMNSHO = In My Not-So-Humble Opinion
- IOW = In Other Words
- LART = Luser Attitude Readjustment Tool (quoting Al Viro: "Anything you use to forcibly implant the clue into the place where luser's head is")
- LUSER = pronounced "loser", a user who is considered to indeed be a loser (idiot, drongo, wanker, dim-wit, fool, etc.)
- OTOH = On The Other Hand
- PEBKAC = Problem Exists Between Keyboard And Chair
- ROTFL = Rolling On The Floor Laughing
- RSN = Real Soon Now
- RTFM = Read The Fucking Manual (original definition) or Read The Fine Manual (if you want to pretend to be polite)
- TANSTAAFL = There Ain't No Such Thing As A Free Lunch (contributed by David Niemi, quoting Robert Heinlein in his science fiction novel 'The Moon is a Harsh Mistress')
- THX = Thanks (thank you)
- TIA = Thanks In Advance
- WIP = Work In Progress
- WRT = With Respect To
Related mailing lists
Some questions are better posted to related mailing lists on specific subjects. Posting to these mailing lists helps reduce the volume on the linux-kernel mailing list and also increases your chances of having your message read by an expert on the subject. Some people do not have the time to subscribe to the linux-kernel mailing list, as it is too general for them. Some related lists are:
Question Index
Section 1 - General questions
- Why do you use "GNU/Linux" sometimes and just "Linux" in other parts of the FAQ?
- What is an experimental kernel version?
- What is a production kernel?
- What is a feature freeze?
- What is a code freeze?
- What is a f.g.hhprei kernel?
- Where do I get the latest kernel source?
- Where do I get extra kernel patches?
- What is a patch?
- How do I make a patch suitable for the linux kernel list?
- How do I apply a patch?
- What's vger?
- What is a CVS tree? Where can I find more information about CVS?
- Is there a CVS tutorial?
- How do I get my patch into the kernel?
- Why does the kernel tarball contain a directory called linux/ instead of linux-x.y.z/ ?
- What's the difference between the official kernels and Alan Cox's -ac series of patches?
- What does it mean for a module to be tainted?
- What is this about GPLONLY symbols?
- Do I have to use GIT to send patches?
- Who maintains the kernel?
- The kernel doesn't compile cleanly. What shall I do ?
Section 2 - Driver specific questions
- Driver such and such is broken!
- Here is a new driver for hardware XYZ.
- Is there support for my card TW-345 model C in kernel version f.g.hh?
- Who maintains driver such and such?
- I want to write a driver for card TW-345 model C, how do I get started?
- I want to get the docs, but they want me to sign an NDA (Non-Disclosure Agreement).
- I want/need/must have a driver for card TW-345 model C! Won't anybody write one for me?
- What's this major/minor device number thing?
- Why aren't WinModems supported?
- Modern CPUs are very fast, so why can't I write a user mode interrupt handler?
- Do I need to test my driver against all distributions?
Section 3 - Mailing list questions
- How do I subscribe to the linux-kernel mailing list?
- How do I unsubscribe from the linux-kernel mailing list?
- Do I have to be subscribed to post to the list?
- Is there an archive for the list?
- How can I search the archive for a specific question?
- Are there other ways to search the Web for information on a particular Linux kernel issue?
- How heavy is the traffic on the list?
- What kind of question can I ask on the list?
- What posting style should I use for the list?
- Is the list moderated?
- Can I be ejected from the list?
- Are there any implicit rules on this list that I should be aware of?
- How do I post to the list?
- Does the list get spammed?
- I am not getting any mail anymore from the list! Is it down or what?
- Is there an NNTP gateway somewhere for the mailing list?
- I want to post a Great Idea (tm) to the list. What should I do?
- There is a long thread going on about something completely offtopic, unrelated to the kernel, and even some people who are in the "Who's who" section of this FAQ are mingling in it. What should I do to fight this "noise"?
- Can we have the Subject: line modified to help mail filters?
- Can we have a Reply-To: header automatically added to the list traffic?
- Can I post job offers/requests to the list?
- Why do I get bounces when I send private email to some people?
- Why don't you split the list, such as having one each for the development and stable series?
Section 4 - "How do I" questions
- How do I post a patch?
- How do I capture an Oops?
- How do I post an Oops?
- I think I found a bug, how do I report it?
- What information should go into a bug report?
- I found a bug in an "old" version of the kernel, should I report it?
- How do I compile the kernel?
- How do I check if the running kernel is tainted?
Section 5 - "Who's who" questions
Names are in alphabetical order (last name) to avoid stepping on toes.
If someone doesn't appear here, check /usr/src/linux/CREDITS.
- Who is in charge here?
- Why don't we have a Linux Kernel Team page, same as there are for other projects?
- Why doesn't answer my mails? Isn't that rude?
- Why do I get bounces when I send private email to some of these people?
- Who is Matti Aarnio?
- Who is H. Peter Anvin?
- Who is Donald Becker?
- Who is Alan Cox?
- Who is Richard E. Gooch?
- Who is Paul Gortmaker?
- Who is Bill Hawes?
- Who is Mark Lord?
- Who is Larry McVoy?
- Who is David S. Miller?
- Who is Linus Torvalds?
- Who is Theodore Y. Ts'o?
- Who is Stephen Tweedie?
- Who is Roger Wolff?
Some people have not yet contributed a few lines about themselves, and the policy of this FAQ dictates that nobody is going to write about anybody else without authorization. Hence the missing links, e.g. if you are not Linus, don't insist, we are not going to add your information about Linus.
Other OS developers:
- Who is Prof. Douglas Comer (Xinu)?
- Who is Richard M. Stallman aka RMS (GNU)?
- Who is Prof. Andrew Tanenbaum (MINIX)?
Section 6 - CPU questions
Is this a matter of taste or what?
- What is the "best" CPU for GNU/Linux?
- What is the fastest CPU for GNU/Linux?
- I want to implement the Linux kernel for CPU Hyper123, how do I get started?
- Why is my Cyrix 6x86/L/MX detected by the kernel as a Cx486?
- What about those x86 CPU bugs I read about?
- I grabbed the standard kernel tarball from ftp.kernel.org or some mirror of it, and it doesn't compile on the Sparc, what gives?
- Does the Linux kernel execute the Halt instruction to power down the CPU?
- I have a non-Intel x86 CPU. What is the [best|correct] kernel config option for my CPU?
- What CPU types does Linux run on?
Section 7 - OS questions
OS theory and practical issues mix.
- OS $toomuch has this Nice feature, so it must be better than GNU/Linux.
- Why doesn't the Linux kernel have a graphical boot screen like $toomuch OS?
- The kernel in OS CTE-variant has this Nice-very-nice feature, can I port it to the Linux kernel?
- How about adding feature Nice-also-very-nice to the Linux kernel?
- Are there more bugs in later versions of the Linux kernel, compared to earlier versions?
- Why does the Linux kernel source code keep getting larger and larger?
- The kernel source is HUUUUGE and takes too long to download. Couldn't it be split in various tarballs?
- What are the licensing/copying terms on the Linux kernel?
- What are those references to "bazaar" and "cathedral"?
- What is this "World Domination" thing?
- What are the plans for future versions of the Linux kernel?
- Why does it show BogoMips instead of MHz in the kernel boot message?
- I installed kernel x.y.z and package foo doesn't work anymore, what should I do?
- People talk about user space vs. kernel space. What's the advantage of each?
- What are threads?
- Can I use threads with GNU/Linux?
- You mean threads are implemented in user space? Why not in kernel space? Wouldn't that be more efficient?
- Can GNU/Linux machines be clustered?
- How well does Linux scale for SMP?
- Can I lock a process/thread to a CPU?
- How efficient are threads under Linux?
- How does the Linux networking/TCP stack work?
- Can we put the networking/TCP stack into user-space?
Section 8 - Compiler/binutils questions
Kernel compilation problems.
- I downloaded the newest kernel and it doesn't even compile! What's wrong?
- What are the recommended compiler/binutils for building kernels?
- Why the recommended compiler? I like xyz-compiler better.
- Can I compile the kernel with gcc 2.8.x, egcs, (add your xyz compiler here)? What about optimizations? How do I get to use -O99, etc.?
- I compiled the kernel with xyz compiler and get the following warnings/errors/strange behavior, should I post a bug report to the list? Should I post a patch?
- Why does my kernel compilation stop at random locations with: "Internal compiler error: program cc1 caught fatal signal 11."?
- What compiler flags should I use to compile modules?
- Why do I get unresolved symbols like foo__ver_foo in modules?
- Why do I get unresolved symbols with __bad_ in the name?
Section 9 - Feature specific questions
Miscellaneous kernel features questions.
- GNU/Linux Y2K compliance?
- What is the maximum file size supported under ext2fs? 2 GB?
- GGI/KGI or the Graphics Interface in Kernel Space debate?
- How do I get more than 16 SCSI disks?
- What's devfs and why is it a Good Idea (tm)?
- Linux memory management? Zone allocation?
- How many open files can I have?
- When will the Linux accept(2) bug be fixed?
- What about STREAMS? I noticed Caldera has a STREAMS package, when will that go in the kernel source proper?
- I need encryption and steganography. Why isn't it in the kernel?
- How about an undelete facility in the kernel?
- How about tmpfs for Linux?
- What is the maximum file size/filesystem size?
- Linux uses lots of swap while I still have stuff in cache. Isn't this wrong?
- Why don't we add resource forks/streams to Linux filesystems like NT has?
- Why don't we internationalise kernel messages?
Section 10 - "What's changed between kernels 2.0.x and 2.2.x" questions
- Size (source and executable)?
- Can I use a 2.2.x kernel with a distribution based on a 2.0.x kernel?
- New filesystems supported?
- Performance?
- New drivers not available under 2.0.x?
- What are those __initxxx macros?
- I have seen many posts on a "Memory Rusting Effect". Under what circumstances/why does it occur?
- Why does ifconfig show incorrect statistics with 2.2.x kernels?
- My pseudo-tty devices don't work any more. What happened?
- Can I use Unix 98 ptys?
- Capabilities?
- Kernel API changes
Section 11 - Primer documents
Please, if you wish to contribute a Q/A in this section, provide a very short answer defining the topic and then a URL to a longer text/Web page. That way we can have various URLs for a single Q, each with a different point of view. Another advantage of this approach is that each contributor has to sit down and write a coherent HTML page or text file. Having to structure a written answer gives ample time to think about the issues and the topic as a whole. It also allows frequent independent revisions, which would be impossible on the FAQ itself.
Note that writing the longer text/Web page on some relevant Linux kernel topic and providing a Q/A in this section confers you instant Guru status. Some people would *kill* for this. Now go and write your stuff. ;)
- What's a primer document and why should I read it first?
- How about having I/O completion ports?
- What is the VFS and how does it work?
- What's the Linux kernel's notion of time?
- Is there any magic in /proc/scsi that I can use to rescan the SCSI bus?
Section 12 - Kernel Programming Questions
Answers to common questions about kernel programming details. See also Tigran Aivazian's page on kernel programming.
- When is cli() needed?
- Why do I sometimes see a cli()-sti() pair, and sometimes a save_flags-cli()-restore_flags sequence?
- Can I call printk() when interrupts are disabled?
- What is the exact purpose of start_bh_atomic() and end_bh_atomic()?
- Is it safe to grab the global kernel lock multiple times?
- When do I need to initialise variables?
Section 13 - Mysterious kernel messages
We sometimes get these messages in our system logs and wonder what they mean...
- What exactly does a "Socket destroy delayed" mean?
- What do I do about "inconsistent MTRRs"?
- Why does my kernel report lots of "DriveStatusError BadCRC" messages?
- Why does my kernel report lots of "APIC error" messages?
Section 14 - Odd kernel behaviour
The kernel behaves in ways that seem odd...
- Why is kapmd using so much CPU time?
- Why does the 2.4 kernel report Connection refused when connecting to sites which work fine with earlier kernels?
- Why does the kernel now report zero shared memory?
- Why does lsmod report a use count of -1 for some modules? Is this a bug?
- Why doesn't the kernel see all of my RAM?
- I've mounted a filesystem in two different places and it worked. Why?
Section 15 - Programming Religion
Responses to suggestions about programming techniques and languages.
- Why is the Linux kernel written in C/assembly?
- Why don't we rewrite it all in assembly language for processor Mega666?
- Why don't we rewrite the Linux kernel in C++?
- Why is the Linux kernel monolithic? Why don't we rewrite it as a microkernel?
- Why don't we replace all the goto's with C exceptions?
- Why are the kernel developers so dismissive of new techniques?
Section 16 - User-space Programming Questions
Answers to common questions about user-space programming details, as it relates to the kernel/user-space interface (i.e. system calls). This does not cover questions on the C library nor any other library, as those questions are not related to the kernel.
- Why does setsockopt() double SO_RCVBUF?
Answers
Section 1 - General questions
- Why do you use "GNU/Linux" sometimes and just "Linux" in other parts of the FAQ?
- (ADB) In this FAQ, we have tried to use the word "Linux" or the expression "Linux kernel" to designate the kernel, and GNU/Linux to designate the entire body of GNU/GPL'ed OS software, as found in the various distributions. We prefer to call a cat, a cat, and a GNU, a GNU. ;-)
The purpose of the FAQ is to provide information on the Linux kernel and avoid debates on e.g. semantics issues. Further discussion of the relationship between GNU software and Linux can be found at http://www.gnu.org/gnu/linux-and-gnu.html.
BTW, it seems many people forget that the linux-kernel mailing list is a forum for discussion of kernel-related matters, not GNU/Linux in general; please do not bring up this subject on the list.
- What is an experimental kernel version?
- (ADB) Linux kernel versions are divided into two series: experimental (odd series e.g. 1.3.xx or 2.1.x) and production (even series e.g. 1.2.xx, 2.0.xx, 2.2.x, 2.4.x and so on). The experimental series are fast moving versions which are used to test new features, algorithms, device drivers, etc. By their very nature the experimental kernels may behave in unpredictable ways, so one may experience data losses, random machine lockups, etc.
- What is a production kernel?
- (ADB) Production or stable kernels have a well defined feature set, a low number of known bugs, and tried and proven drivers. They are released less frequently than the experimental kernels, but even so some "vintages" are considered better than others. GNU/Linux distributions are usually based on chosen stable kernel versions, not necessarily the latest production version.
- What is a feature freeze?
- (ADB) A feature freeze is when Linus announces on the linux-kernel list that he will not consider any more features until the release of a new stable kernel version. Usually the net effect of such an announcement is that on the following days people on the list propose a flurry of new features before Linus really enforces the feature freeze. ;-)
- What is a code freeze?
- (ADB) A code freeze is more restrictive than a feature freeze; it means only severe bug fixes are accepted. This is a short phase that usually precedes the creation of a new stable kernel tree.
- What is a f.g.hhprei kernel?
- (ADB) These are intermediate pre-release versions of version f.g.hh. Note that usually i < 5, but e.g. 2.0.34prei was available with i = 1 to 16. Sometimes "pre" is replaced by the initials of the developer putting together the kernel revision, e.g. 2.1.105ac4 means the 4th intermediate release of kernel version 2.1.105 by Alan Cox.
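The naming scheme described above, together with the odd/even series rule, can be sketched as a small shell function. This is only an illustration: parse_kver is a hypothetical helper name, not part of any kernel tooling, and it assumes a well-formed version string of the form f.g.hh optionally followed by a tag and a revision number.

```sh
# parse_kver is a hypothetical helper, shown only to illustrate the naming
# scheme. It assumes input of the form f.g.hh[tag[i]], e.g. 2.0.34pre16.
parse_kver() {
    ver=$1
    # Base version: the leading f.g.hh part.
    base=$(printf '%s\n' "$ver" | sed 's/^\([0-9]*\.[0-9]*\.[0-9]*\).*$/\1/')
    # Whatever follows the base version, e.g. "pre16" or "ac4".
    rest=${ver#"$base"}
    # Tag: the suffix with its trailing digits removed ("pre", "ac", ...).
    tag=$(printf '%s\n' "$rest" | sed 's/[0-9]*$//')
    # Revision: the digits that followed the tag.
    rev=${rest#"$tag"}
    # Second number odd = experimental series, even = production series.
    minor=$(printf '%s\n' "$base" | cut -d. -f2)
    if [ $(( $minor % 2 )) -eq 1 ]; then
        series=experimental
    else
        series=production
    fi
    printf '%s %s %s %s\n' "$base" "${tag:-release}" "${rev:-0}" "$series"
}

parse_kver 2.0.34pre16   # prints: 2.0.34 pre 16 production
parse_kver 2.1.105ac4    # prints: 2.1.105 ac 4 experimental
```

A plain release such as 2.2.5 is reported with the placeholder tag "release" and revision 0; those placeholders are a choice made for this sketch, not a kernel convention.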
- Where do I get the latest kernel source?
- (ADB) The primary site for the Linux kernel (experimental and production) sources is hosted by Transmeta (the company Linus Torvalds used to work for) on a dedicated Web server at http://www.kernel.org/. This site is mirrored across the world, and has pointers to mirrors for each country. You can go directly to a mirror for your country by going to http://www.CODE.kernel.org/ where "CODE" is the appropriate country code. For example, "au" is the country code for Australia, so the principal mirror site for Australia is http://www.au.kernel.org/
- (REG) You may also access tarballs and patches directly via ftp from ftp://ftp.CODE.kernel.org/pub/linux/kernel/ which is where Linus distributes his kernels from. Other notable kernel hackers have directories under the people directory, which is where they keep their kernel patches. The testing directory is where Linus puts pre-release patches. The pre-release patches are mainly intended for other developers, so they can stay in sync with changes in Linus' source tree. These are often highly experimental and may crash or cause filesystem corruption. Use at your own risk.
Note that Linus and Marcelo are using GIT to manage their kernel source trees, and it is more convenient for them to make snapshots of their latest trees available via GIT, rather than make patches. If you want access to these snapshots (which are merely a work in progress, and may be buggy), there are several access methods available:
CVS: :pserver:[email protected]:/home/cvs/linux-2.[45]
Subversion: svn://svn.kernel.org/linux-2.[46]/trunk
- (JBG) Linux is no longer maintained with the BitKeeper source code management system, but with GIT, a tool Linus wrote after BitKeeper was no longer available to all developers. You can browse Linus's latest kernel source as well as all other people's projects hosted on kernel.org. There's also a nice Overview of GIT and some helper tools as well as a complete Tutorial to get you into using GIT.
- Where do I get extra kernel patches?
- (REG) There are many places which provide various extra patches to the kernel for new features. One fairly good archive is available at: http://www.linuxhq.com/.
- What is a patch?
- (RRR) A patch file (as it refers to the Linux kernel) is an ASCII text file that contains the differences between the original code and the new code, plus some additional information such as filenames and line numbers. The patch program (man patch) can then apply the patch to an existing kernel source tree.
- How do I make a patch suitable for the linux kernel list?
- (REG) Here are some basic guidelines for posting patches. For information on how to generate patches, see the entry by RRR below.
- Ensure the patch does not have trailing control-M characters on each line. A number of broken tools used to encode patches add control-M for "DOS compatibility". This breaks many versions of patch, so be sure to configure your tools properly, or use unbroken tools, otherwise your patch will be silently deleted.
- Include the patch inline in your email, in plain text. Do not post it as a base64 MIME attachment. Many people will not be able to read your patch, and thus your patch will be deleted without comment.
- This FAQ previously advised posting a URL to a patch if the patch is large. This is no longer recommended. The preferred way to submit a large patch is to break it up into logical chunks, with a descriptive comment for each, and post each piece with a subject line like
"[PATCH] cleanup of foo driver [1/5]".
Do not start a new thread for each chunk - rather, post each chunk as a followup to the previous chunk. You may want to begin with an explanatory post, and label it something like
"[PATCH] cleanup of foo driver [0/5]".
See Documentation/SubmittingPatches for more information.
- If you want Linus or one of the primary maintainers (i.e. Marcelo, David) to apply your patch, you must Cc: them explicitly, otherwise your patch will be ignored.
- When sending patches to Linus or one of the primary maintainers, you must include the patch inline, in plain text, no matter how large the patch.
- If you want to send a patch to the list for comment, and also send it to Linus/primary maintainer for inclusion, and the patch is large, you may wonder how to reconcile the conflicting requirements. The solution is obvious: post the URL to the mailing list, wait for comments, and later send the patch, inline, to Linus/primary maintainer. Yes, this is more work for you. No, we don't care.
- If you have a mailer that eats whitespace or causes similar corruption, then FIX YOUR MAILER, don't expect to be able to take the easy solution and MIME encode your patch.
Finally, I've seen one person question the veracity of these guidelines, stating that the rules are rather more relaxed, and that this FAQ is being overzealous. Fortunately, the King Penguin himself responded to this, so I include his words here, so that there can be no doubt:
If I get a patch in an attachment (other than a "Text/PLAIN" type
attachment with no mangling and that pretty much all mail readers and
all tools will see as a normal body), I simply WILL NOT apply it unless
I have strong reason to. I usually wont even bother looking at it,
unless I expected something special from the sender.
Really. Don't send patches as attachments.
Linus
- A caveat applies for people using a Mozilla Mail client. Andrew Morton noted that Mozilla mangles spaces in column zero when patches are included in the message body. Fortunately, Mozilla Mail sends patch attachments as type text/plain or text/x-patch (depending on the presence of a file extension), so it's safe to send patches as attachments instead.
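The control-M problem mentioned in the guidelines above is easy to check for before posting. The following is a minimal sketch, assuming a POSIX shell with grep and tr; has_crlf and the file names are hypothetical, and the sample "patches" are just two-line text files created for the demonstration.

```sh
# has_crlf is a hypothetical helper: it succeeds if any line of the given
# file ends in a carriage return (control-M).
has_crlf() {
    cr=$(printf '\r')          # a literal carriage return character
    grep -q "$cr\$" "$1"
}

tmp=$(mktemp -d)
# Stand-in files for the demonstration (not real patches).
printf 'line one\nline two\n'     > "$tmp/clean.patch"
printf 'line one\r\nline two\r\n' > "$tmp/dos.patch"

for f in "$tmp/clean.patch" "$tmp/dos.patch"; do
    if has_crlf "$f"; then
        echo "$(basename "$f"): contains control-M, fix before posting"
    else
        echo "$(basename "$f"): ok"
    fi
done

# One way to repair the file. Note that this removes every carriage return
# in the file, not only those at the end of a line.
tr -d '\r' < "$tmp/dos.patch" > "$tmp/fixed.patch"
```

Fixing the tool that introduced the control-M characters is still the better cure; the tr step only repairs a file after the fact.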
- (RRR) To make a patch you use the diff program (read the info file for diff). The easiest way to do this is to set up two source trees under /usr/src, set a symlink "/usr/src/linux" to point to the modified tree, and diff one tree against the other. The file /usr/src/linux/Documentation/CodingStyle has more specific information; read it. Things to remember:
- Always specify unified (-u) diff format.
- Avoid making formatting changes to the source that make the diff needlessly larger. Watch out for editors that convert tabs to spaces or vice versa.
- Unless you have specific reasons, diff against the latest official source tree. Otherwise, your patch is likely to be ignored. Either way, specify in your post against what you've diff'ed.
- Make sure your diff includes only the intended changes in your patch, not every other patch you have made to your source tree. Usually patches are limited to a few files or directories. It is best to diff only the relevant files, e.g. if you only made changes to the file driver_xyz.c under drivers/net, then you would use the following commands (assuming the original source tree is named "linux-2.1.105", and the modified tree is pointed at by the symlink "linux"):
cd /usr/src
diff -u linux-2.1.105/drivers/net/driver_xyz.c \
linux/drivers/net/driver_xyz.c > my_patch
- The following two should go without saying: the arguments to diff are first source (the original, unmodified file(s)), and then destination (your modified version of the file(s)), otherwise you get a reversed patch (and lots of people wondering what you're smoking). Also, make sure your patch applies and compiles cleanly.
- Of course you need to set up two identical source directories to be able to diff the tree later. A nice trick -- requiring a little bit of consideration, though -- is to create the modified source tree from hard links to the original source tree:
tar xzvf linux-2.1.anything.tar.gz
mv linux linux-2.1.anything.orig
cp -al linux-2.1.anything.orig linux-2.1.anything
This will hardlink every source file from the original tree to a new location; it is very fast, since it does not need to create some 80+ megabytes of files. You can now apply patches to the linux-2.1.anything source tree: patch does not change the original files but moves them to filename.orig, so the contents of the hard-linked file will not be changed.
Assuming that your editor does the same thing, too (moving original files to backup files before writing out changed ones) you can also freely edit within the hardlinked tree. If your editor does not handle files this way, you need to make a copy of each file before editing it, like this:
cp driver_xyz.c temporary; mv temporary driver_xyz.c
You can use file permissions to remind you to do this. Just remove write permissions from all the files in the directory you are working in:
chmod -w *.c
The changed tree can be diffed at high speed, since most files don't just have identical contents, they are identical files in both trees. Naturally removing that tree is quite fast, too. Thanks to Janos Farkas for this trick.
- Finally, review the patch file (the format is not that complicated) before posting, and include all relevant information as to the nature of the patch. In particular, specify: why is this patch needed/useful, and what exactly does it fix/improve.
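The hard-link trick and the diff step described above can be tried out on a toy tree. This is a sketch under stated assumptions: it needs GNU cp (for the -al options) and diff, and the tree names linux-orig and linux, the file contents, and the my_patch file name are all made up for the demonstration.

```sh
work=$(mktemp -d)
cd "$work"

# A tiny stand-in for an unpacked kernel tree (hypothetical contents).
mkdir -p linux-orig/drivers/net
printf 'int x = 1;\n' > linux-orig/drivers/net/driver_xyz.c
printf 'all:\n'       > linux-orig/Makefile

# Create the working tree as hard links to the original tree.
cp -al linux-orig linux

# Break the hard link before editing, so the original file stays untouched.
cp linux/drivers/net/driver_xyz.c temporary
mv temporary linux/drivers/net/driver_xyz.c
printf 'int x = 2;\n' > linux/drivers/net/driver_xyz.c

# Original tree first, modified tree second. diff exits with status 1 when
# it finds differences, which is the expected outcome here.
diff -urN linux-orig linux > my_patch || [ $? -eq 1 ]

cat my_patch
```

Only driver_xyz.c shows up in my_patch; the Makefile is still the same file in both trees, so diff has nothing to report for it.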
- How do I apply a patch?
- (TAC) (From /usr/src/linux/README) You can upgrade between releases by patching. Patches are distributed in the traditional gzip and the new bzip2 format. To install by patching, get all the newer patch files, enter the top-level directory of the unpacked kernel source tree and execute:
gzip -cd patchXX.gz | patch -p1
or:
bzip2 -dc patchXX.bz2 | patch -p1
(repeat xx for all versions bigger than the version of your current source tree, in order) and you should be ok. You may want to remove the backup files (xxx~ or xxx.orig), and make sure that there are no failed patches (xxx# or xxx.rej). If there are, either you or me has made a mistake.
Alternatively, the script patch-kernel can be used to automate this process. It determines the current kernel version and applies any patches found. Use it thus:
scripts/patch-kernel .
The first argument in the command is the location of the kernel source. Patches are applied from the current directory, but an alternative directory can be specified as the second argument.
- (RRR) To apply kernel patches please take a look at the kernel README file (/usr/src/linux/README) under "Installing the kernel". There is also a good explanation on the Linux HQ Project site.
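Checking for leftovers after patching, as the README excerpt above suggests, is easy to script. This is a minimal sketch; the function names list_rejects and list_backups are made up for illustration, and the example runs against a scratch directory rather than a real kernel tree.

```shell
# List the files that patch may leave behind in a source tree.
# Usage: list_rejects DIR   -> failed patches (xxx.rej, xxx#)
#        list_backups DIR   -> backup files (xxx.orig, xxx~)
list_rejects() {
    find "$1" -type f \( -name '*.rej' -o -name '*#' \) | sort
}

list_backups() {
    find "$1" -type f \( -name '*.orig' -o -name '*~' \) | sort
}

# Example run against a scratch directory with made-up file names.
tree=$(mktemp -d)
touch "$tree/Makefile" "$tree/Makefile.orig" "$tree/sched.c.rej"

echo "failed patches:"
list_rejects "$tree"
echo "backup files:"
list_backups "$tree"
```

Any output from list_rejects means a patch did not apply cleanly and the tree should not be trusted until that is resolved. Backup files are harmless and can be deleted once you have looked over the list.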
- What's vger?
- (REG) "vger" is the name of the machine which hosts the LKML server. This server also hosts a number of other linux-related mailing lists. More information about the server is available at http://vger.kernel.org/
- What is a CVS tree? Where can I find more information about CVS?
- (REG) "CVS" is short for Concurrent Versions System, a Source Code Management system. Check out the CVS Bubbles page.
- Is there a CVS tutorial somewhere?
- (ADB) Here is a CVS tutorial which you can find online:
- An interactive CVS tutorial.
Getting a general idea of how CVS works takes about 15 minutes (highly recommended). Note that there are various graphical front ends to CVS, so you don't have to learn the usual assortment of cryptic commands.
- How do I get my patch into the kernel?
- (RRR) Depending on your patch there are several ways to get it into the kernel. The first thing is to determine which maintainer your code falls under (look in the MAINTAINERS file). If your patch is only a small bugfix and you're sure that it is 'obviously correct', then by all means send it to the appropriate maintainer and post it to the list. If there is urgency to the bugfix (e.g. a major security hole) you can also send it to Linus directly, but remember he's likely to ignore random patches unless they are "obviously correct" to him, have the maintainer's approval, or have been well tested and meet the first condition. In case you're wondering what constitutes well tested, here's another important bit: one purpose of the list is to get patches peer-reviewed and well-tested. Now, if your patch is relatively big, i.e. a rewrite of a large code section or a new device driver, then to conserve bandwidth and disk-space just post an announcement to the list with a link to the patch. Lastly, if you're not too sure about your patch yet, want some feedback from the maintainer, or wish to avoid open-season flaming on work-in-progress, then use private email.
- (REG) If there is no specific maintainer for the part of the kernel you want to patch, then you have three main options:
- send it to [email protected] and hope someone picks it up and feeds it to Linus, or maybe Linus himself will pick it up (don't count on it)
- send it to linux-kernel and Cc: Linus Torvalds and hope Linus will apply it. Note that Linus operates like a black box. Do not expect a response from him. You will need to check patches he releases to see if he applied your patch. If he doesn't apply your patch, you will need to resend it (often many times). If after weeks or months and many patch releases he still hasn't applied it, maybe you should give up. He probably doesn't like it
- send it to linux-kernel and Cc: Alan Cox. Alan is better at responding to email, and will queue your patch and resend it to Linus periodically, so you can forget about it. He also serves as a good taste tester. If Alan accepts your patch, it's more likely that Linus will too. If he doesn't like your patch, you will probably get an email saying so. Expect it to be terse.
- Why does the kernel tarball contain a directory called linux/ instead of linux-x.y.z/ ?
- (DW) Because that's the way Linus wants it. It makes applying many consecutive patches simpler, because the directory doesn't need to be renamed each time, and it also makes life easier for Linus.
- What's the difference between the official kernels and Alan Cox's -ac series of patches?
- (REG, contributed by Erik Mouw) Alan's kernel can be seen as a test bed for Linus' kernels. While Linus is very conservative and only applies obvious and well tested patches to the 2.4 kernel, Alan maintains a set of kernel patches that contains new concepts, more and/or newer drivers, and more intrusive patches. If the patches prove themselves stable, Alan submits them to Linus to include them into the official kernel.
- What does it mean for a module to be tainted?
- (REG, contributed by John Levon) Some vendors distribute binary modules (i.e. modules without available source code under a free software license). As the source is not freely available, any bugs uncovered whilst such modules are loaded cannot be investigated by the kernel hackers. All problems discovered whilst such a module is loaded must be reported to the vendor of that module, not the Linux kernel hackers and the linux-kernel mailing list. The tainting scheme is used to identify bug reports from kernels with binary modules loaded: such kernels are marked as "tainted" by means of the MODULE_LICENSE tag. If a module is loaded that does not specify an approved license, the kernel is marked as tainted. The canonical list of approved license strings is in linux/include/linux/module.h.
"oops" reports marked as tainted are of no use to the kernel developers and will be ignored. A warning is output when such a module is loaded. Note that you may come across module source that is under a compatible license, but does not have a suitable MODULE_LICENSE tag. If you see a warning from modprobe or insmod for a module under a compatible license, please report this bug to the maintainers of the module, so that they can add the necessary tag.
- (KO) If a symbol has been exported with EXPORT_SYMBOL_GPL then it appears as unresolved for modules that do not have a GPL compatible MODULE_LICENSE string, and prints a warning. A module can also taint the kernel if you do a forced load. This bypasses the kernel/module verification checks and the result is undefined, when it breaks you get to keep the pieces.
- (KO) According to Alan Cox, a license of "BSD without advertisement clause" is not a suitable free software license. This license type allows binary only modules without source code. Any modules in the kernel tarball with this license should really be "Dual BSD/GPL".
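The tainting rule described above boils down to a lookup of the MODULE_LICENSE string against a fixed list. Here is a rough sketch of that check. Note that the list of approved strings is an assumption written from memory, not copied from the kernel; the canonical list is in linux/include/linux/module.h and may differ in your kernel version. The function name license_is_approved is made up.

```shell
# Return 0 if a MODULE_LICENSE string would NOT taint the kernel.
# ASSUMPTION: this list is illustrative; check include/linux/module.h
# in your own tree for the authoritative strings.
license_is_approved() {
    case "$1" in
        "GPL"|"GPL v2"|"GPL and additional rights"|"Dual BSD/GPL"|"Dual MPL/GPL")
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# "Proprietary" is just a stand-in for any string not on the list.
for lic in "GPL" "Dual BSD/GPL" "BSD without advertisement clause" "Proprietary"; do
    if license_is_approved "$lic"; then
        echo "$lic: approved"
    else
        echo "$lic: taints the kernel"
    fi
done
```

The match is on the exact string, which is why a module under a compatible license but with a missing or misspelled tag still triggers the warning.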
- What is this about GPLONLY symbols?
- (REG) By default, symbols are exported using EXPORT_SYMBOL, so they can be used by loadable modules. During the 2.4 series, a new export directive EXPORT_SYMBOL_GPL was added. This is almost the same thing, except that the symbol can only be accessed by modules which have a GPL compatible licence (note that this includes dual-licenced BSD/GPL code). This new directive was added for these reasons:
- To clarify the ambiguous legal ground on which non-GPL (particularly proprietary) modules lie. A strict reading of the GPL prohibits loading proprietary modules into the kernel. While Linus has consistently stated that proprietary modules are allowed (i.e. he has granted an explicit exemption), it is not clear that he is able to speak for all developers who have contributed to the Linux kernel. While many think Linus' edict means that all contributed code falls under this exemption granted by Linus, not everyone agrees that this is a legally sound argument. The new EXPORT_SYMBOL_GPL directive makes the licence conditions explicit, and thus removes the legal ambiguity.
- To allow choice for developers who wish, for their own reasons, to contribute code which cannot be used by proprietary modules. Just as a developer has the right to distribute code under a proprietary licence, so too may a developer distribute code under an anti-proprietary licence (i.e. strict GPL).
Note that Linus has stated that existing symbols will not be switched to GPL-only. Developers of proprietary modules for Linux need not fear. Furthermore, it is quite unlikely that Linus will look favourably upon the introduction of new core driver APIs which are restricted to GPL-only modules. This would not be in the best interests of Linux. Linus has forwarded me a message he sent to someone else to clarify his views. Note that since that time, several developers have eroded the number of non-GPL only symbols by writing new (usually better) infrastructure and interfaces and deprecating the older interfaces. The newer interfaces are often tagged as GPL-only. In addition, there are some "kernel janitors" who aggressively submit patches to remove all symbols (whether GPL-only or not) which are not used by code shipped with the kernel source tree.
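If you are curious how many symbols in a given tree are exported each way, grep can give a rough count. This is only a sketch: the function name count_exports is made up, the sample file is scratch data rather than kernel code, and a plain text match will also count occurrences inside comments.

```shell
# Count plain and GPL-only export directives below a directory.
# Usage: count_exports DIR
count_exports() {
    # The trailing "(" keeps EXPORT_SYMBOL from also matching EXPORT_SYMBOL_GPL.
    plain=$(grep -rhoE 'EXPORT_SYMBOL[[:space:]]*\(' "$1" | wc -l | tr -d ' ')
    gpl=$(grep -rhoE 'EXPORT_SYMBOL_GPL[[:space:]]*\(' "$1" | wc -l | tr -d ' ')
    echo "plain=$plain gpl_only=$gpl"
}

# Scratch file with made-up symbol names.
exp_tree=$(mktemp -d)
cat > "$exp_tree/example.c" <<'EOF'
EXPORT_SYMBOL(example_read);
EXPORT_SYMBOL(example_write);
EXPORT_SYMBOL_GPL(example_internal_hook);
EOF

count_exports "$exp_tree"
```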
- Do I have to use GIT to send patches?
- (REG) Absolutely not. Some kernel developers, including Linus and Marcelo, have chosen to use GIT to manage their kernel source trees, but this does not mean you need to use GIT yourself to maintain your trees or submit patches. Many notable kernel developers continue to maintain their source trees using other tools and techniques, and continue to send conventional patches.
- Who maintains the kernel?
- (REG) Originally, Linus Torvalds maintained the kernel. As the kernel has matured, he has delegated maintenance for older stable versions to others, while he continues development of the latest "bleeding edge" release. As of 27-MAY-2002, the following kernel versions are maintained by these people:
- 2.0 David Weinehall
- 2.2 Alan Cox
- 2.4 Marcelo Tosatti
- 2.6 Linus Torvalds
- The kernel doesn't compile cleanly. What shall I do?
- (REG) First make sure you have the latest version of that kernel series. Perhaps a pre-patch already has a fix. If not, search the list archives for a fix. Don't contribute to noise on the list by asking a question that may already have been answered.
If the problem has not yet been fixed, try digging into the code yourself and post a fix to the mailing list. You'll be famous! Beware that making broken code compile just for the sake of a clean 'make bzImage modules' doesn't count as a fix, and your fix will be discarded, ignored or flamed.
Section 2 - Driver specific questions
- Driver such and such is broken!
- (RRR) Try to be more specific. Please provide information on your particular setup (see the Q: "How do I make a bug report?"). Also see the Q: "kernel x.y.z broken!" below.
- (ADB) That's the worst possible way to start a thread. Please try to reach the author of the driver first and report the "broken" driver to him. Constructive criticism is welcome, usually.
- Here is a new driver for hardware XYZ.
- (REW) Good work! Please try to find a few people that also have the XYZ hardware and have them test it on their configuration (e.g. by posting a message on a newsgroup). No, it won't go in the standard kernel before some people have tested it.
Testing will take a while. In the mean time, kernel development will continue, and you will have to rewrite your patch for the most recent version before Linus might consider it.
As a whole new driver is most likely more than a few pages long, we'd prefer it if you would put the actual driver up for ftp instead of posting it to the list. Post the URL and the description that tells us what your driver does for which hardware.
- Is there support for my card TW-345 model C in kernel version f.g.hh?
- (REW) First check if your card is detected at boot time. It usually is. Second see if you might need to configure something like modules.conf for your card. Third see if there is a file with the card name in the kernel sources. (e.g. you have a Buslogic card, and there is a buslogic.c file in the kernel sources, you're in luck.). Next, grep for the manufacturer name through ALL the kernel sources. And try the model number of your card. Also try to find the largest chip on your card and grep for the chip number on that thing. Realize that 53C80 chips might be named 5380 in the kernel. Other chips don't have their middle name removed.
Nothing yet? Now check DejaNews, using the same arguments you used to grep the kernel source. There is a 99.99% chance that somebody has exactly the same card TW-345 model C.
Ok. That's what you can do without bothering anyone. If all this doesn't lead somewhere, you should really ask this question on a newsgroup like comp.os.linux.hardware.
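The grep step from the answer above can be wrapped in a small helper so that several spellings are tried in one go. This is a sketch only: the function name find_driver_hints is made up, and the scratch tree below stands in for your real kernel sources (usually /usr/src/linux).

```shell
# List source files that mention any of the given terms, ignoring case.
# Usage: find_driver_hints SRC_DIR term...
find_driver_hints() {
    src=$1
    shift
    for term in "$@"; do
        grep -rli -- "$term" "$src"
    done | sort -u
}

# Scratch tree with made-up files standing in for the kernel sources.
src_tree=$(mktemp -d)
mkdir -p "$src_tree/drivers/scsi"
echo '/* generic 5380 support */' > "$src_tree/drivers/scsi/example5380.c"
echo '/* something unrelated */' > "$src_tree/drivers/scsi/other.c"

# Try the chip number both with and without its middle letter.
find_driver_hints "$src_tree" 53C80 5380
```

Pass the manufacturer name, the model number and the chip number together; any file that turns up is worth opening and reading the header comment.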
- Who maintains driver such and such?
- (RRR) Have a look at the /usr/src/linux/MAINTAINERS file; this is the most authoritative source. Also check the source code for the driver itself; in both cases, check the latest version of the kernel that you have available. Some drivers have specific Web pages and sometimes even a dedicated mailing list. Check those first. If you cannot contact the maintainer then as a last resort post a short message to the list. In any case, keep in mind that maintainers are usually very busy people and most of them work on Linux for free and in their spare time, so don't expect an immediate response. Some maintainers get just too many mails in too small periods of time to be able to answer them all, so please be kind to them.
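Looking something up in MAINTAINERS can be done with a one-line awk filter. The sketch below assumes that entries in the file are separated by blank lines; the function name maintainer_entries and the sample entries (names and addresses) are made up for the demonstration.

```shell
# Print every blank-line-separated MAINTAINERS entry that mentions a keyword.
# Usage: maintainer_entries KEYWORD [FILE]
maintainer_entries() {
    # RS='' puts awk in paragraph mode, so $0 is one whole entry.
    awk -v RS='' -v ORS='\n\n' -v kw="$1" \
        'index(tolower($0), tolower(kw))' "${2:-/usr/src/linux/MAINTAINERS}"
}

# Made-up sample file; the people and addresses are fictional.
mfile=$(mktemp)
cat > "$mfile" <<'EOF'
EXAMPLE SCSI DRIVER
P: A. Hacker
M: hacker@example.org
S: Maintained

EXAMPLE NETWORK DRIVER
P: B. Coder
M: coder@example.org
S: Odd Fixes
EOF

maintainer_entries scsi "$mfile"
```

The match ignores case and looks at the whole entry, so searching for a person's name or a mailing list address works as well as searching for a driver name.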
- I want to write a driver for card TW-345 model C, how do I get started?
- (REW) Good initiative! First a piece of advice: are you up to this? Ten times as many projects like this get started as get finished. Also, make sure that you're not doing double work. Make sure that such a driver is not already available: read Q/A 2.3 above...
First prepare yourself. Get the docs, read them (OK, you're allowed to start skipping stuff if you've gotten to the part "detailed register descriptions"). Next, get the Linux kernel source, find a driver that drives similar hardware to the one you're going to work on, and read THAT. (I usually use the smallest one I can find: wc -l *.c | sort -n | head -4).
Ok. You've thought about it. Now the question is, do you have technical documentation for your card? You can reverse engineer the driver for MS operating systems, but having the documentation is MUCH easier.
In the dark old ages (70s to middle of the 80s), you got a complete technical description with every card you could get. This is no longer the case. Anyway, contact your vendor and politely ask them for the "device driver kit" or the "technical manual" for the card.
Try the head office and your local office at the same time. Local offices occasionally have bad photo copies that they give out before you get an official rejection from the head office. In that case whom you got the documentation from becomes confidential information. Don't put the guy's name in the source.
If you can't get the technical documentation, consider giving up and investing in a competitor's product (and tell the manufacturer about this). Not given up yet? Ok. Next step is to find out what the DOS driver does. Try to get the card to work while running its driver under a DOS/Windows emulator (dosemu or WINE). This will allow you to program these tools to log the I/O accesses of the driver. This will give you a large list of I/O accesses that the driver did. If you're good, you might be able to see patterns, and deduce how the driver works. From there you might be able to write a working driver. Good luck! You'll need it.
- I want to get the docs, but they want me to sign an NDA (Non-Disclosure Agreement).
- (REW) Some people find this a tremendous problem. Some companies just want to know who has the docs to their hardware, and don't mind if you write a GPL-ed driver. In that case, there is really no problem: just tell them what you intend to do and ask them to acknowledge in writing that they've understood what you're saying. In that case, you can get your driver into the standard kernel, but you cannot send out the docs to anybody who wants to work on the driver. They will have to rely on the comments in the source.
Other companies (just like Netscape) have themselves signed NDAs that forbid them to disclose information to you.
Some really think that they have trade secrets in the interface towards the software, and intend to keep them secret. Those won't allow you to write a driver and then put the source on the net. Be careful with these.
- (ADB) The first and only NDA I ever received instantly found its way to the wastebasket. I would advise anybody who gets an NDA to refuse to sign it, if it refers to anything that may/will be put under GNU/GPL. Of course, for contract work this doesn't apply.
- I want/need/must have a driver for card TW-345 model C! Won't anybody write one for me?
- (REW) Some Linux developers will settle for a beer, and develop the driver for you. Others want a "free sample" of the hardware and will then go ahead and write the driver.
If you need more than a few of the cards or you manufacture the cards yourself, you can consider paying one of the commercial Linux device driver companies to get a commercially backed, officially maintained device driver.
- What's this major/minor device number thing?
- (REG) Device numbers are the traditional Unix way to provide a mapping between the filesystem and device drivers. A device number is a combination of a major number and a minor number. Currently Linux has 8 bit majors and minors. When you open a device file (character or block device) the kernel takes the major number from the inode and indexes into a table of driver structure pointers. The specific driver structure is then used to call the driver open() method, which in turn may interpret the minor number. There are two tables: one for character devices and one for block devices, each holding at most 256 entries. Obviously, there must be agreement between device numbers used in a driver and files in /dev. The kernel source has the file Documentation/devices.tex which lists all the official major and minor numbers. H. Peter Anvin (HPA) maintains this list. If you write a new driver (for public consumption), you will need to get a major number allocated by HPA. See the Q/A on devfs for an improved (IMHO) mechanism for handling device drivers.
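With 8 bit majors and minors, a device number fits in 16 bits, and splitting it is simple bit arithmetic. The sketch below assumes the traditional layout with the major number in the high byte and the minor number in the low byte; the function names are made up for illustration.

```shell
# Pack and unpack a 16-bit device number made of an 8-bit major and minor.
# ASSUMPTION: major lives in the high byte, minor in the low byte.
dev_make()  { echo $(( (($1 & 255) << 8) | ($2 & 255) )); }
dev_major() { echo $(( ($1 >> 8) & 255 )); }
dev_minor() { echo $(( $1 & 255 )); }

# /dev/null is conventionally character device major 1, minor 3.
dev=$(dev_make 1 3)
echo "device number $dev: major $(dev_major "$dev"), minor $(dev_minor "$dev")"
```

The 256-entry limit on each driver table follows directly from the 8-bit major: there is one slot per possible major number.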
- Why aren't WinModems supported?
- (REG, quoting Edward S. Marshall) The problem is the lack of specifications for this hardware. Most companies producing so-called "WinModems" refuse to provide specifications which would allow non-Microsoft operating systems to use them.
The basic issue is that they don't work like a traditional modem; they don't have a DSP, instead making the CPU do all the work. Hence, you can't talk to them like a traditional modem, and you need to run the modem driver as a realtime task, or you'll have serious data loss issues under any kind of load. They're simply a poor design.
- (REG) Note that some people have been putting effort into reverse engineering some WinModems, so you may be lucky and find that yours is now supported. If not, it's time to get a refund and buy a real modem.
Note that modems have to be approved by the appropriate statutory or regulatory body for standards compliance (to make sure they don't send crap down the line and blow up the exchange). With WinModems, the driver software needs to be certified as well as the hardware. It's harder to get approval for Open Source drivers, since it usually costs money to obtain approval. Also, in theory, it's easier to modify an Open Source driver, so it would no longer be compliant. In reality, 99.999% of users don't even know there is source code for the driver, so "Standards Compliance" may well be a smoke-screen for manufacturers who don't want to bother with non-WinTel systems. If certification were the only problem, manufacturers could release binary-only drivers.
- (DW) The good news is that a certain amount of WinModem hardware is now supported. The bad news is that that is just the tip of the iceberg. Although the WinModems can now be used, they have functionality similar to that of a sound card - all the modulation and demodulation has to be performed by the host CPU. Work is progressing on this front too - see http://www.linmodems.org/ for more up-to-date information.
- Modern CPUs are very fast, so why can't I write a user mode interrupt handler?
- (REG, quoting Pete Zaitcev) This is not a question of having enough CPU cycles to waste them on mode switches. Rather, the current Linux architecture does not allow it. User processes run with interrupts enabled. Thus, any interrupt handler must deactivate the particular interrupt source before a process is scheduled to run, or an interrupt storm results. The deactivation is done in a device specific manner, so at least a small device driver must be present in kernel mode.
- Do I need to test my driver against all distributions?
- (REG, MEA) There are minor detail changes in between each kernel version (even in stable series), and depending on what configuration options are used (basically SMP or not), certain things like spinlocks may or may not reserve space in structures, and may or may not need to be called (are even optimized away in non-SMP systems), meaning that a binary driver compiled for SMP might not work with a non-SMP kernel. And vice versa.
Also, different vendors tend to inject different things into their kernel patch-sets, which again may subtly change data layouts, etc. In stable kernel series, maintainers take great pains to ensure that the data layouts of in-kernel APIs (and the API calls themselves) are not changed. Nevertheless, something may change, causing binary drivers to fail in mysterious ways.
Subtle memory changes may appear with i386-PAE mode (large memory machines which can't map all of RAM into the kernel at the same time).
Because of these differences, a driver compiled for one version of the kernel, or one vendor's kernel, is not likely to work with another kernel. Thus, if you are distributing a binary-only driver, you will have a significant support load compiling drivers for different kernels. If you are distributing a driver in source form, then, provided the driver is well-written (i.e. does not make assumptions about byte ordering or word sizes and uses standard kernel interfaces), the driver should be portable across kernel versions and architecture types. It will of course have to be compiled by end-users for their particular kernel. Distribution maintainers are likely to provide pre-compiled drivers, thus most end-users won't need to compile the driver themselves.
Section 3 - Mailing list questions
The linux-kernel mailing list is for discussion of the development of the Linux kernel itself. Questions about administration of a Linux based system, programming on a Linux system or questions about a Linux distribution are not appropriate.
"Test" messages are very, very inappropriate on the lkml or any other list, for that matter. If you want to know whether the subscribe succeeded, wait for a couple of hours after you get a reply from the mailing list software saying it did. You'll undoubtedly get a number of list messages. If you want to know whether you can post, you must have something important to say, right? After you have read the following paragraphs, compose a real letter, not a test message, in an editor, saving the body of the letter in the off chance your post doesn't succeed. Then post your letter to lkml. Please remember that there are quite a number of subscribers, and it will take a while for your letter to be reflected back to you. An hour is not too long to wait.
(REG) The essential point to remember when posting to the linux-kernel mailing list is that there are a lot of very busy people reading the list. No matter how important you think you are, it is most likely that there are many people on the list who are more important than you. "Important" is not measured by the amount of money you have, how much your question is worth to your company or how desperate you are for an answer, rather, it is measured by how much you contribute to the linux kernel.
With that in mind, you should make sure that you are not wasting the time of other people on the list. Write for maximum efficiency of reading. It doesn't matter if it takes twice as long for you to compose a more readable message, if it halves the time a hundred key kernel developers spend trying to decode your message. Ignoring good taste and consideration is most likely to result in you being ignored.
- How do I subscribe to the linux-kernel mailing list?
- How do I unsubscribe from the linux-kernel mailing list?
- (ADB) At the bottom of each and every message sent by the linux-kernel mailing list server one can read:
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
See http://vger.kernel.org/majordomo-info.html for information on Majordomo.
- Do I have to be subscribed to post to the list?
- (ADB) No, you don't have to be subscribed to the list to post to it. The address of the list is [email protected]. And you should indicate on your message that you wish to be personally CC'ed the answers/comments posted to the list in response to your posting.
- (REG) It is, however, generally considered good netiquette to be subscribed to a list (or a newsgroup for that matter) and lurk for a while before posting. That way you can learn what's considered an appropriate post and what isn't.
Don't treat the list as your personal helpdesk. Remember that the list is a community.
- Is there an archive for the list?
- (REG) There are many. Here are some:
- http://www.uwsg.indiana.edu/hypermail/linux/kernel/ has a search by word/subject capability.
- http://marc.theaimsgroup.com/?l=linux-kernel keeps a collection of Linux related list archives.
- http://lkml.org/ is another archive with latest kernels, latest messages and hottest messages tables.
- http://groups.google.com/groups?hl=en&q=fa.linux.kernel&meta= is a Google interface to the fa.linux.kernel newsgroup, which is in turn fed from the mailing list.
- http://gossamer-threads.com/lists/linux/kernel/ has an easy interface and an appealing format (click on a thread, shows all posts in a thread with posts clearly delimited).
- How can I search the archive for a specific question?
- (ADB) Use simple keywords which refer to the issue that matters to you. For example, if you are investigating an oops that happens whenever you plug in a network adapter NIC-007, use "NIC-007" or "oops NIC-007". As soon as you have found a link to a message that interests you, try to follow the thread. Remember that you will almost always get more information by carefully searching the archive than by posting a question to the list itself.
- Are there other ways to search the Web for information on a particular Linux kernel issue?
- (ADB) Sure. Before you check the list archives, you can search DejaNews and AltaVista (simultaneously, if your browser allows you to open various windows). You can also follow some links on the Linux Documentation Project site.
- How heavy is the traffic on the list?
- List traffic is very heavy; the average number of messages per day is ~400 [07/2007 - 02/2008]. That's over 12 000 messages a month!!!
- (ADB) You really don't want to read each and every posting to the list. If you are concerned with list traffic, I suggest you temporarily try the digest lists, which will be much easier on your mailbox (thanks to A. Wik for this suggestion).
- (REG) There is a weekly summary called "Kernel coverage at LWN.net" at http://lwn.net/Kernel/, which can save you a lot of time.
- What kind of question can I ask on the list?
- (ADB) The basic rule is to avoid asking questions that have been asked before, or that are irrelevant to other list users, or that are off topic. Please use your good sense.
- (REG) Remember that this is a list for the discussion of kernel development. If you have some ideas or bug reports to contribute, this is the place. User space issues are not appropriate for this forum. If you find a bug in the C library or some application, it doesn't belong on linux-kernel.
- What posting style should I use for the list?
- (REG, contributed by [email protected]) When following up a post on the kernel mailing list, please think before you quote. Since everybody else on the list also got the original post, don't quote it entirely. Highlight only the points that you really need to understand your arguments. Make sure the quoted part is recognizable as such, by ensuring each quoted line starts with a > (or more >>, in case of multi-level quoting). Don't quote signatures, entire patches, entire config files or entire posts. Don't quote the standard signature. The kernel-list is crowded enough already, let's take care!
- (REG) Be aware that your message is far more likely to be deleted without being read if you have too much quoted material before your reply.
- (REG) And please reply after the quoted text, not before it (as per RFC 1855). It's very confusing to see a reply before the quoted context. And it's embarrassing: it makes you look like a newbie. Change your mailer if necessary, if the one you have makes it hard to do reply-after-quoting.
I know some people like to quote the entire message they are replying to, so they put their reply right at the top so people won't give up after the first page of quoted material. Don't do it. It's annoying. Just learn to stop quoting everything. No-one wants to see it all anyway (list archives allow people to see everything if they missed it). You're not helping yourself anyway, as you're more likely to be ignored if you reply-before-quoting.
- (REG) Please don't use tabs or multiple spaces to quote text. Use the "> " sequence instead. Using whitespace to quote text makes it difficult to differentiate between what's quoted and the reply. And don't try to be cute or "different" and use some other character like "}" or whatever. Again, it's confusing. It wastes people's time. Write for maximum efficiency of reading.
- (REG) Please try to have halfway reasonable spelling and grammar. When reading text with really bad spelling or grammar, people stall while trying to parse your post. Don't think you're being "artistic" by stripping out all punctuation characters. Linux-kernel is not an online gallery, it's a communications medium. Write for maximum efficiency of reading.
- (REG) Please don't have long, inflammatory, controversial or offensive signatures (see RFC 1855). The rule of thumb is no more than 4 lines of 80 characters each.
- (PG) Don't attach huge files to your post. One major culprit is people attaching their kernel .config file to their post. These can be in excess of 1000 lines, and will grow more as kernel options are continuously added. If the contents of your .config file are relevant to your post then attach the output of
grep ^C .config
or
grep "=[ym]" .config
- (MEA) Some structures are forbidden as they appear to be used way too much in SPAM mail. Specifically, messages with Content-Type: text/html either as the only (primary) message, or as ANY of component sub-messages are considered spam, and rejected outright without any info to the sender.
Also, any message with header matching the regular expression: X-Mailing-List:.*@vger.kernel.org is considered to be LOOPING somewhere, and is thus diverted to list-owner.
- (REG) If you are stuck using Microsoft Outlook or Outlook Express, which have flawed quoting algorithms, you should apply one of the following fixes:
- http://home.in.tum.de/~jain/software/oe-quotefix/ for Outlook Express
- http://home.in.tum.de/~jain/software/outlook-quotefix/ for Outlook
These fixes make these mailers more standards compliant.
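Several of the posting rules above are mechanical enough to check before you hit send. The helpers below are a rough sketch, not what the list server actually runs: the function names and sample messages are made up, and the filter check simply scans the whole file, so body text that happens to look like a header can give a false alarm.

```shell
# Quote text for a reply by prefixing every line with "> ".
quote_text() { sed 's/^/> /'; }

# Return 0 if MESSAGE_FILE would trip one of the filters described above:
# an HTML part anywhere in the message, or a header showing a mail loop.
post_would_be_filtered() {
    grep -Eiq '^Content-Type:[[:space:]]*text/html' "$1" && return 0
    grep -Eq '^X-Mailing-List:.*@vger\.kernel\.org' "$1" && return 0
    return 1
}

# Return 0 if SIGNATURE_FILE is at most 4 lines of at most 80 characters.
signature_ok() {
    awk 'length($0) > 80 { bad = 1 } END { if (NR > 4 || bad) exit 1 }' "$1"
}

# Made-up sample messages for a quick trial.
plain_msg=$(mktemp)
html_msg=$(mktemp)
loop_msg=$(mktemp)
printf '%s\n' 'Content-Type: text/plain' '' 'Hello.' > "$plain_msg"
printf '%s\n' 'Content-Type: text/html; charset=us-ascii' '' '<p>Hi</p>' > "$html_msg"
printf '%s\n' 'X-Mailing-List: linux-kernel@vger.kernel.org' '' 'Hello again.' > "$loop_msg"

for f in "$plain_msg" "$html_msg" "$loop_msg"; do
    if post_would_be_filtered "$f"; then
        echo "$f: would be filtered"
    else
        echo "$f: looks fine"
    fi
done
printf '%s\n' 'first quoted line' 'second quoted line' | quote_text
```

A message that passes these checks can still be a bad post, of course. Trimming the quoted material and replying after it remain up to you.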
- Is the list moderated?
- (ADB) No, the linux-kernel list is not moderated.
- Can I be ejected from the list?
- (ADB) It is technically possible, but I have never heard of anybody being ejected from the linux-kernel list.
- (REW) But you will if you post questions or answers that are asked and answered on this FAQ. ;-)
- (MEA) Oh definitely, all you need to have is a malfunctioning email system which does not accept email to you -- e.g. check your domain backup MX servers by using the tool at: http://vger.kernel.org/mxverify.html
It is known that over the years the keepers of vger's lists have removed some people after getting sufficiently annoyed with them, but there you really must try to exceed yourself, and will likely get lots of peer pressure before getting kicked off.
Another way to quickly get yourself removed is to use the program called "fetchmail" -- which in itself is not all that bad, but apparently it is far too easy to accidentally re-post email to addresses which the visible RFC 822 headers contain -- that is, what the original sender used, like: To: [email protected] The result is duplicate messages on the mailing list. If you let that happen, you can be quite sure that your subscription will be removed as soon as possible.
- Are there any implicit rules on this list that I should be aware of?
- (ADB) Here are a few implicit rules which you should be aware of:
- Stick to the subject. This is a Linux kernel list, mainly for developers.
- Use English only!
- Don't post in HTML format! If you are using IE or Netscape, please turn off HTML formatting for your posts to the kernel list.
- If you use that other OS, make sure your mailer doesn't use Charset="Windows*" as those posts will be blocked.
- If you will be asking a question, before you post to the list, try to find the answer in the available documentation or in the list archives. Just remember that 99% of the questions on this list have already been answered at least once. Usually the first answer is the most detailed, so the archives contain far better information than you will get from somebody who has answered the same question a dozen times or more.
- Be precise, clear and concise, whether asking a question or making a comment or announcing a bug, posting a patch or whatever. Post facts, avoid opinions.
- Be nice, there is no need to be rude. Avoid expressions that may be interpreted as aggressive towards other list participants, even if the subject being treated is particularly relevant to you and/or controversial.
- Don't drag on with controversies. Don't try to have the last word. You will eventually have the last word, but meanwhile you'll have lost all your sympathy credit.
- A line of code is worth a thousand words. If you think of a new feature, implement it first, then post to the list for comments.
- It's very easy to criticize someone else's code, but when you write something for the first time, it's not that simple. If you find a bug, a mistake, or something that could be perfected, don't immediately post a comment such as "This piece of code is crap, how did it get into the kernel?". Contact the author of the code, explain the issue, and try to get the point across in a simple, humble way. Do that a few times and you will get a lot of credit as a good code debugger. Then when you write a piece of code people will pay attention to you.
- Don't flame beginners that ask the wrong questions. This adds noise to the list. Send them a private mail pointing them to a source of information e.g. this FAQ.
- (MEA) If you post HTML, your email won't make it to the lists (see section 3.9).
- (RC) Ensure your email doesn't match any of the regular expressions in vger's Majordomo's taboo list of regular expressions else it will be silently dropped. This matches seemingly innocuous words like `Deutschland' as in `Sitecom Deutschland GmbH'.
- (REG) Don't post any religious or political material, including in your signature. Doing it in the body of a message will anger people, as it's always off-topic and is a waste of bandwidth (remember that even in the 21st century, many people are still being gouged by the second for bandwidth by their ISP or telco or both).
Including this unwanted material in your signature is less obnoxious, but is pointless at best (preaching to the converted). Most people will ignore it, and many will be prone to ignore the content of your message, recognising you are a wanker. If you want to be taken seriously, leave the soap-box at home. Limit your posts to technical issues.
- How do I post to the list?
- Does the list get spammed?
- (ADB) The linux-kernel list is no longer spammed, you will rarely if ever find a commercial posting to the list itself. OTOH once you post to the list, expect to get a few undesirable mails in the following days. Unfortunately some people watch the list and think it's a good idea to pick names from it. There are many ways to avoid spam, check the dedicated anti spam sites on the list. I learned many things this way.
- (REW) Although the list maintainers do their best to keep the list spam free, it is not possible to do this 100%. Some of the good kernel development people cannot keep up with the volume on linux-kernel. But they do occasionally post. Therefore we need to keep the submissions open for "everybody". Some of the other important people have two or three Email addresses. They too need to post from different addresses. Consequently something that looks like a submission from a valid return address tends to go on the list. There is nothing an automated filtering system can do about it.
The end result is about one spam a month. It happens. The maintainer will get a flood of mail about it and he will block the domain it came from. Please don't bother the list about it, don't add noise. Don't post "This guy is a jerk if he spams this list". Don't post "I traced him, you can mail bomb him at this address". Don't post "I traced him, bother his postmaster at such and such".
- I am not getting any mail anymore from the list! Is it down or what?
- (ADB) Majordomo is an intelligent mail list server. If for any reason your email address is unavailable, after some retries you will be automatically unsubscribed.
- (REW) On the other hand, accidents with the mailing list server have happened. These have wiped out the whole subscription list once or twice. Just resubscribe. Majordomo will get you a nice note saying you're still subscribed if suddenly everybody went dumb. Don't post "Just testing: Is the list working? I didn't get any mail for a few days now".
- (MEA) You may get unsubscribed because MTAs relaying traffic to you get bounces for some reason. One thing to verify is that your email routing data in the DNS is valid, e.g. feed your address to the input box at: http://vger.kernel.org/mxverify.html
- (MEA) VGER and/or one of its fanout boxes may be in overload. Usually system keepers notice the situation, and it becomes fixed within 1-2 days without messages being lost, but we don't track the entire world. Asking help from [email protected] could expedite the issue. Asking help on lists WILL NOT help, doing so just puts more load on the system!
- Is there an NNTP gateway somewhere for the mailing list?
- (RRR) Yes there is the newsgroup fa.linux.kernel, but you can only read the mailing list there, not post directly. Posting to the list must go by email to [email protected].
Here's the dejanews URL, if your NNTP host doesn't have the group: http://www.dejanews.com/bg.xp?level=fa.linux.kernel
- (REG) Unfortunately not all news servers have the fa hierarchy. You can access fa.linux.kernel by going to http://groups.google.com/groups?hl=en&q=fa.linux.kernel&meta=
- (REG, contributed by Gunter Ohrner) Yes, GMANE offers a bidirectional gateway at nntp:gmane.linux.kernel at server news.gmane.org with additional web access at http://blog.gmane.org/gmane.linux.kernel.
- I want to post a Great Idea (tm) to the list. What should I do?
- (REG) OK, that's great. Now:
- First make sure that your idea is relevant to kernel development. Perhaps your idea is better implemented in the C library, or perhaps in a new library? Before posting to linux-kernel, be sure it really is a kernel issue.
- OK, so you have this great idea for the kernel. Are you sure someone hasn't thought of it before? Reading all of this document is a good starting point. Also search the mailing list archives to see if that topic has been raised before.
- Now you have verified that you have an idea no one has suggested before. For the best response, code up an implementation/kernel patch and post that to the kernel list when you announce your idea. If you provide code, you can be sure someone will try it out and give you comments. If you don't know anything about kernel hacking, this is a good time to start learning :-) By the time you've implemented your idea, you'll be able to call yourself a Linux Guru.
- If you really can't code something up, but still would like your idea implemented, post a message to the kernel list. Be as clear and precise as possible, so that people can understand your idea quickly. If you are lucky, someone who likes your idea may find the time to implement it. If nobody steps forward to implement it, you're out of luck: remember, we're all volunteers and we all have too many things to do as it is.
- If you get a negative response to your idea, don't get offended, after all, we all have different notions on what is a Good Idea (tm) and a Bad Idea (tm). If someone is rude to you, please resist the temptation to carry on a war on the list. Instead, email them privately saying that you don't like their rudeness. If everybody is polite, but just strongly disagrees with your idea, be careful not to push it too hard. If people haven't understood the point you are making, try explaining it a different way. But if people understand your idea but maintain it is flawed, it's time to stop pushing it. Pushing harder will just get you ignored.
- If you're convinced you're right, despite what everybody else says, stop talking about it and implement it! If you're right, you'll have the last laugh.
- (ADB) Good code (i.e. documented, elegant, efficient) and some benchmarking data showing your Great Idea performs well will go a long way to show you're right.
- There is a long thread going on about something completely offtopic, unrelated to the kernel, and even some people who are in the "Who's who" section of this FAQ are mingling in it. What should I do to fight this "noise"?
- (REW, ADB) Ignore it.
- (REG) Don't send a response to the kernel list under any circumstances. If you feel compelled to respond, do so privately informing the person that the message was offtopic. Or set up a procmail recipe to drop all messages for that thread: that way you'll never see the thread again.
- Can we have the Subject: line modified to help mail filters?
- (REG) The usual proposition is that a string like [LINUX-KERNEL] is prepended to the subject line.
This question has been raised many times before, and the answer has always been "no" or "there are better ways to filter email". The majority of the developers, and all (?) of the list maintainers take this position. Some of the reasons are:
- It would increase the size of the Subject: line. This is a problem, as it limits the amount of useful information that can be seen in the Subject: line, making it harder to scan through a list of subject lines looking for interesting subjects.
- It can lead to the Subject: line from hell, since some mailers and users don't behave sanely. Imagine the following:
RE: [LINUX-KERNEL] Re: [LINUX-KERNEL] RE: [LINUX-KERNEL] Re: [LINUX-KERNEL] Critical security flaw in 2.666.0
That's a lot of characters. The useful information will very likely be lost due to line truncation.
- It doesn't work for cross-posted messages, as the subject line for a single email will change depending on which list it was sent via. Not only can this confuse simple-minded filtering recipes, it can also break threaded mail readers (people may end up reading the same message twice).
- Cross-posting will make the Subject: line from hell problem more frequent. Imagine the following:
RE: [LINUX-KERNEL] Re: [LINUX-SCSI] RE: [LINUX-KERNEL] Re: [LINUX-SCSI] RE: [LINUX-KERNEL] Re: [LINUX-SCSI] Critical security flaw in 2.666.0
See? It just gets worse. Give it up, Subject: line modification is a bad idea.
- The correct way to filter is to base your recipe on the X-Mailing-List: line, which should always have "[email protected]".
An example procmail recipe would look like this:
# Linux-kernel list
:0: /var/lib/emacs/lock/!home!fred!mfilter!linux!kernel
* ^X-Mailing-List:.*linux-kernel@vger\.kernel\.org
/home/fred/mfilter/linux/kernel
People subscribed to [email protected], which uses GNU Mailman, may want to use something like this:
# linux-kernel-digest
:0
* ^X-BeenThere: linux-kernel-digest@lists\.us\.dell\.com
/home/fred/mfilter/linux/kernel-digest
People using mailagent might try this in their .rules file (thanks to Martin Smith):
To CC: /[email protected]/
{ SPLIT -adi ~/Kernel }
Similarly to procmail you can omit the mail folder from the split command. This causes the split messages to go back into the mailagent queue for further processing. Most mailers with filtering capabilities can be similarly configured. If not, then you can simply install procmail. If perchance you're running a damaged OS that can't filter properly, and there is no procmail port for it, then you should either upgrade, or accept that you won't be able to filter linux-kernel. Don't bother asking for a subject line modification.
- If you really want to get the feel of a toy mailing list, you can write a procmail recipe which will modify the Subject: line.
An example procmail recipe would look like this:
# Linux-kernel list
# f: use sed as a filter; h: feed it the header only; w: wait for it to finish
:0 fhw
* ^X-Mailing-List:.*linux-kernel@vger\.kernel\.org
| sed -e 's/^Subject: /Subject: [TOY-LINUX-KERNEL] /'
Warning: if you do this, be careful to edit your Subject: line when replying to messages from the list, otherwise you risk being ignored or kill-filed.
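If you do go down this road, a filter that touches only the Subject: header and never adds the tag twice helps to avoid a home-grown Subject: line from hell. The following is a minimal sketch in shell and awk, not a recipe taken from the list maintainers: tag_subject and TAG are made-up names, and it assumes the message arrives on standard input with the header ending at the first blank line.

```shell
# Sketch: tag only the Subject: header of a message read from stdin.
# TAG is a made-up local tag, not something the list itself uses.
TAG='[TOY-LINUX-KERNEL]'

tag_subject() {
    awk -v tag="$TAG" '
        BEGIN { in_header = 1 }
        # The first blank line ends the header; body lines are left alone.
        in_header && /^$/ { in_header = 0 }
        # Add the tag once, and only if the line does not carry it already.
        in_header && /^Subject: / && index($0, tag) == 0 {
            sub(/^Subject: /, "Subject: " tag " ")
        }
        { print }
    '
}
```

It could be used as the pipe target of a procmail filtering recipe in place of the sed command, or tried by hand with tag_subject < message.txt.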
- Can we have a Reply-To: header automatically added to the list traffic?
- (DW) Some mailing lists automatically add a Reply-To: header to the mails which go through them, forcing people to reply to the list, rather than replying personally to the original poster. This is a bad idea for many reasons which won't be listed here. See Chip Rosenthal's excellent summary Reply-To: Munging Considered Harmful for more explanation.
- Can I post job offers/requests to the list?
- (REG) Of course not! This is a technical development list, not a job exchange. You may find this site useful: http://www.hotlinuxjobs.com/
- Why do I get bounces when I send private email to some people?
- (REG) This could be for a variety of reasons, such as temporary problems with mail delivery. Your email may also be blocked (permanently rejected) by that individual or their ISP. This often happens if you send email from a machine or domain which is listed in the MAPS RBL, DUL and ORBS lists. These lists have been set up to protect people against spam. See http://www.mail-abuse.org/ for more information on these lists.
NOTE that these lists aren't trying to block you personally, they are trying to block known spammers or spammer-friendly sites (RBL and ORBS), or uncontrolled dial-up users (DUL). If you are being blocked, it probably means you have the misfortune to be using an ISP that is not a good net citizen and thus has been added to the RBL or ORBS lists. In some cases, you may be blocked because your ISP has volunteered their dial-up IP address ranges to the DUL, in which case you should be using their approved mail relay rather than sending email out directly from your host.
You must NOT post a message to the kernel list about this, as the people there cannot and will not help you. Nor should you use the list as a means of getting a message through to the individual you are trying to contact. This is not what the list is for.
If you are intent on making a fool of yourself in public, follow the same path as too many others before you, and complain on the kernel list about how unfair it is that you are being blocked because your ISP is bad. Expect sympathy from some, flames from others and silence from most. The net gain will be that your mail will still be blocked by the anti-spam lists, many people will ignore you in future emails (because you've made a fool of yourself), and you may find yourself in the killfiles of some people (i.e. you personally are being blocked because some people are fed up with you and don't want to hear anything more from you).
If you actually want your mails to no longer be blocked, get your ISP to clean up their act, or switch to a decent ISP. If you are required to use your ISP's mail relay, but it is crippled somehow, complain to your ISP or switch to one with competent staff.
If your ISP is unresponsive and you don't have an alternative ISP you could switch to, you'll just have to accept that an increasing fraction of people will block your email (as more and more people subscribe to the anti-spam lists). There's no point in shouting at the people who are defending themselves against spam (no-one is obliged to receive any and all email), go pester the spammers instead.
- Why don't you split the list, such as having one each for the development and stable series?
- (REG, by "hacksaw") It's true that the lkml is a high traffic list and can be a lot to handle. However, splitting the list wouldn't help, since most developers would just subscribe to both lists. In fact, there would then be extra traffic, because of the number of issues that hit both the development and stable kernels, or even farther back!
Section 4 - "How do I" questions
- How do I post a patch?
- (ADB) I assume you made the patch following the general instructions found above. Now write a short post describing your patch, the version of the kernel it applies to, your tests, the feedback you would like to get, etc. This should fit in 10 lines. Attach your patch and a one line README file describing it very succinctly, and mentioning your name and email (either as two ASCII files or as a MIME encoded tarball). In the subject of your post, put: [PATCH] , kernel . Send. Wait.
The small README file ensures that your patch will not start circulating around the net without people noticing your name. If you don't care about copyright and/or your patch is trivial, you can skip tarring the files, just gzip the patch file and attach it to your post.
- (REG) Note that Linus does not read linux-kernel very much. So if you want him to see a patch, you will need to send it to him directly (say by Cc:ing him if you post to the list). Note that Linus likes to be able to read patches in plain ASCII, so anything that is uuencoded or MIMEd is likely to go straight to the bit-bucket. If because your patch is large you only send a URL, send a plain-text copy to Linus privately.
Also note that Linus drops patches silently when he is too busy (which is always:-), so if you don't see it in the next kernel patch, send it again. Oh, and don't expect him to tell you he's applied the patch, either.
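For the tarball option mentioned above (patch plus a one-line README), the packaging step might be sketched as follows. This is only an illustration: bundle_patch is a made-up helper, the README wording is an assumption, and the tarball is named after the patch file. Remember the note above that Linus wants plain ASCII, so this form is for the list, not for him.

```shell
# Sketch: bundle a patch and a one-line README into a gzipped tarball.
# Usage: bundle_patch PATCH_FILE "one-line description" "Name <email>"
# Creates <patch file name>.tar.gz in the current directory.
bundle_patch() (
    patch=$1; desc=$2; author=$3
    name=$(basename "$patch")
    # Stage both files in a scratch directory so the tarball has no paths.
    work=$(mktemp -d) || exit 1
    cp "$patch" "$work/$name" || exit 1
    printf '%s, by %s\n' "$desc" "$author" > "$work/README"
    tar -czf "$name.tar.gz" -C "$work" "$name" README
    status=$?
    rm -rf "$work"
    exit $status
)
```

For a trivial patch, gzip on the patch file alone does the job, as described above.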
- How do I capture an Oops?
- (REG, quoting Keith Owens) If an Oops is recoverable then the text appears first in the kernel message buffer (/proc/kmsg). You can use the dmesg command to print the contents but most of the time klogd and syslogd will automatically capture the Oops and write it to your log files.
Sometimes an Oops is so bad that the kernel is completely hung. When this occurs, almost anything that requires kernel support is also dead. In particular most interrupt driven subsystems are unusable, especially after the dreaded "Aiee, killing interrupt handler" message. Since most disk controllers use interrupts, no disk I/O is possible so the Oops does not get written to the log files. The same problem applies to logging over the network, most network cards require interrupt handlers.
In a complete hang, you have three options.
- Write the Oops down by hand from the screen and type it in after you have rebooted. This is the only option if you have not planned for a kernel hang.
- If you plan ahead and install a serial console linked to another machine (read linux/Documentation/serial-console.txt) then you can capture the Oops report on the other machine. By far the easiest and most reliable option.
- Since kernel 2.3.10 it has also been possible to use a parallel port line printer as a console. You can either attach a real printer, or another computer with EPP (Enhanced Parallel Port) support, which pretends to be a printer.
- There have been patches on linux-kernel to save the log somewhere in hardware. Unfortunately these patches are very hardware specific. Search the l-k archives for "Oops assist", "OOPS output over reboot" and "KMSGDUMP". Most of these patches require that the keyboard still works and even that can be useless when the kernel hangs.
Other operating systems can save the log even when the machine hangs, why doesn't Linux? Any OS that can save the log after a catastrophic kernel failure must do so without kernel support, that typically means using the underlying hardware. Alas the ix86 hardware does not provide enough support for this, in particular most BIOS will clear memory on reset, destroying any data in storage.
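For the recoverable case, where the Oops text ends up in the kernel message buffer or your log files, you can cut the report out of a saved log rather than posting the whole file. The sketch below is an assumption-laden helper, not part of any kernel tool: extract_oops is a made-up name, and it assumes a report starts at a line containing "Oops" and fits in that line plus a fixed number of following lines (40 unless you say otherwise).

```shell
# Sketch: print each Oops report found in a saved log file.
# Usage: extract_oops LOGFILE [LINES]
# Assumption: a report is the line containing "Oops" plus the lines that
# follow it, LINES in total (default 40).
extract_oops() {
    awk -v n="${2:-40}" '
        /Oops/   { left = n }        # start (or restart) a report here
        left > 0 { print; left-- }   # print while inside the report
    ' "$1"
}
```

For example, save the buffer with dmesg > dmesg.txt and run extract_oops dmesg.txt. Check the result by eye, since the end of an Oops report is not marked and the line count is only a guess.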
- How do I post an Oops?
- (ADB) Assuming you have found a genuine Oops (those are rare nowadays, but they happen), you should post the relevant portions of your system log, kernel configuration file and kernel symbol map, plus a description of your hardware and the circumstances under which the Oops occurred. Can the Oops be triggered by any particular method? Did it happen after you changed any part of your hardware configuration?
Don't post your oops report before you have checked linux/Documentation/oops-tracing.txt, the relevant paragraphs in linux/README, the ksymoops C program in linux/scripts/ksymoops which has another README, and the gdb man and info pages (thanks to Paul Kimoto for this tip). These documents describe the basic procedure for kernel oops tracing. Good trace info makes it much easier to understand and solve apparently weird oopses.
- (REG) Don't even bother posting an Oops if you haven't run it through ksymoops to decode the symbol addresses. The report will be ignored because it contains too little useful information.
Make sure you copy the correct System.map file into /boot or into the modules directory, otherwise you will get incorrect results.
- (REG, quoting "The Doctor What") There are some situations that make a kernel oops useless. The two most obvious are if you are overclocking your CPU or running VMWARE's vmmon. The reason is that overclocking can introduce random bit errors, while VMWARE's vmmon has the ability to (and does) change parts of the kernel. In both cases, data in the kernel, as reported by the oops, won't be useful.
- I think I found a bug, how do I report it?
- (ADB) A bug differs very slightly from an oops, actually. An oops is when the kernel detects that something has gone wrong. A bug is when something (in the kernel, presumably) doesn't behave the way it should, either with a driver or in some kernel algorithm. If you can detect this misbehaviour, you may or may not be getting an oops.
Perhaps the most important step is to determine under which conditions this misbehaviour can be triggered, and whether it is reproducible.
- What information should go in a bug report?
- (ADB) Does it affect system security? Is it related to a specific driver/hardware configuration? Did you manage to identify the piece(s) of kernel code concerned? It really depends on the kind of bug you found.
- (TYT) Please follow general good bug reporting guidelines: remember, the developers don't have access to your system, and they're not mind readers. Tell us which kernel version, and what your hardware is (if you're not sure, more detail is better than less). At the very least, tell us what processor and motherboard you have, how much memory, how many and what kind of disks (IDE, SCSI, etc.), what kind of disk controllers you have, what other expansion boards (specify whether they're PCI or ISA or some other bus). Also useful: what version of gcc and binutils were used to compile the kernel.
Try to find a simple, reliable way to trigger the problem. Telling the developer that they have to set up some complicated application environment (especially if it involves some ghastly expensive proprietary software like SAP or Oracle :-) may cause the developer to hit the 'd' key and move on.
In general, raw data is better than jumping to conclusions. If you want to give your guesses in your bug reports, they're of course welcome, but this is not a substitute for raw data. Many problems are not what they first seem. A hardware problem can masquerade as a VM problem. A device driver or VM problem can cause the filesystem code to notice a discrepancy, and flag a warning. Even if you're sure that the problem isn't a hardware problem, or some other theory that the developer advances, the scientific method demands that you do a test to rule these sorts of things out. Sometimes, you will get surprised.....
If you get a kernel oops message, it's useless unless you give us the proper symbolic information. This used to mean sending relevant pieces out of System.map. Fortunately, with the latest syslogd/klogd, this is much simpler (check the man page of klogd to see if your version has this feature; if it doesn't, you should upgrade to the latest version, and probably to a modern distribution). Make sure that you have the System.map file installed in the appropriate place so that klogd can find it (the standard search path is in the /boot, /, and /usr/src/linux directories).
If the system oopses and then dies without a chance for klogd to record the information into a syslog file, copy down the oops message exactly, and then use ksymoops (see the man page) to get the symbolic information out. Remember, the raw numbers by themselves will generally not be useful.
If you can, try to isolate the problem to a specific kernel version. Knowledge that it worked in version 2.2.17, as well as 2.3.0-test6, but it stopped working in 2.3.0-test7-pre1, is extremely helpful, and will save developers a lot of time. (If you're comfortable dissecting patches, feel free to take apart the individual file changes and try to isolate the problem to a particular change.)
- (REG) You did of course read REPORTING-BUGS from the kernel source tree first, didn't you?
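Some of the basics asked for above (kernel version, the compiler that built the kernel, binutils version, and whether a System.map sits where klogd looks for it) can be collected with a few commands. This is a sketch under stated assumptions, not an official reporting script: find_system_map and bug_report_basics are made-up names, the directory list is the search path quoted above, and /proc/version is used because it records the compiler the running kernel was built with. The hardware description still has to be written by you.

```shell
# Sketch: print the path of the first System.map found in the given dirs.
find_system_map() {
    for dir in "$@"; do
        if [ -f "$dir/System.map" ]; then
            echo "$dir/System.map"
            return 0
        fi
    done
    return 1
}

# Sketch: gather the software details a bug report should state.
bug_report_basics() {
    echo "Kernel: $(uname -r)"
    # /proc/version names the gcc that compiled the running kernel.
    if [ -r /proc/version ]; then
        echo "Built with: $(cat /proc/version)"
    fi
    # ld may be missing on the machine you report from.
    if command -v ld >/dev/null 2>&1; then
        echo "binutils: $(ld --version 2>/dev/null | head -n 1)"
    fi
    if map=$(find_system_map /boot / /usr/src/linux); then
        echo "System.map: $map"
    else
        echo "System.map: not found in /boot, / or /usr/src/linux"
    fi
}
```

Paste the output of bug_report_basics into your report. Finding a System.map only shows that one exists; you still have to make sure it belongs to the kernel that produced the oops.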
- I found a bug in an "old" version of the kernel, should I report it?
- (CP) Only if it hasn't been fixed yet. The best thing to do is to try to repeat it with a new version of the kernel. If you can't do that, you have to figure out whether it's been fixed yet. The kernel release announcements and patch descriptions from Jitterbug are also useful. Failing that, look for discussion of the bug in linux-kernel and check the patches between your kernel and the latest ones for relevant changes.
If you can't find your bug mentioned, and you're not running a truly ancient kernel, posting a bug report is worthwhile. You can probably expect a request of the form "try it with the latest kernel" or "try it with this patch" in response. If there's a reason why you can't run the latest kernel (like it's your main dialin server and you don't want to mess with it), saying it in your original report will save some explaining later.
- How do I compile the kernel?
- How do I check if the running kernel is tainted?
Section 5 - "Who's who" questions
- Who is in charge here?
- (ADB) Do you mean "Who takes decisions relative to the mailing list?" or do you mean "Who takes decisions relative to the Linux kernel"? If the former: there is relatively little to decide when it comes to the mailing list. Majordomo, once correctly set up, will manage the list in an autonomous fashion. In any case, you can always reach the Majordomo-owner for the list, if you have a very specific question about the list mechanism itself. When it comes to kernel development management and decision making, see the answer to Question 7.8 below.
- Why don't we have a Linux Kernel Team page, same as there are for other projects?
- (ADB) Perhaps because there is no Linux Kernel Team, per se. Also because so many people contributed to the Linux kernel that it would be a tough task to set up and maintain such a page. Finally, although this is not a rule, most Linux kernel contributors prefer to keep a low profile, for various reasons.
- Why doesn't answer my mails? Isn't that rude?
- (ADB) Probably because of sheer lack of time to answer each email that gets sent to them. What would you do if you got 1000 mails in your mailbox, from one day to the next? They don't mean to be rude, however.
One hint: if you attach to your mail a genuinely useful piece of good quality code that you wrote, there are good chances that it will be answered (choose a good subject line, too). If you ask a dozen beginner's questions, the truth is, there are zero chances that you will get even the simplest reply pointing to some source of information.
Aside from that, you may get "mail rejected" error messages if you try to contact some major contributors of the list. It is due to the spam filtering systems used by them. Please complain about it to your ISP and don't post to the list about spam!!
- (REG) Some people also have very aggressive mail filtering which rejects (non-list) messages from people they don't know, asking for a re-send with a password (this stops SPAM dead). If you mail to someone and receive such an automatic response, don't get upset. Remember, a person's mailbox is their personal property.
Also, some people maintain "guru lists" and only read posts on linux-kernel by someone on their guru list, other people's posts go to /dev/null. This is done because there are too many questions asked on linux-kernel which shouldn't be (which is why people should read this FAQ first!), and people can't cope with the load. If you post to the list and want to make sure a specific individual will see the message, Cc: that person.
- Why do I get bounces when I send private email to some of these people?
- (REG) Some people, like Alan Cox, bounce messages. Read this to find out why and what you can do about it.
- Who is Matti Aarnio?
- (MEA) He is principally a ZMailer hacker, and a co-postmaster of vger.kernel.org.
Sometimes he also finds cycles to hack on the kernel, and you see some patches from him (e.g. the initial work on the Large File Summit, files over 2G in size, was his).
- Who is H. Peter Anvin?
- Who is Donald Becker?
- Who is Alan Cox?
- (AC) Alan Cox supervises the 2.0.34/35/36 kernel releases, works on the Mac68K port, the SGI port, 2.0 networking, modular sound, video capture and helps collect up and sort patches to the kernel. He gets to do all this and sleep because the nice guys at Red Hat pay him to hack Linux.
- Who is Richard E. Gooch?
- (REG himself) "I've written various utilities and kernel patches which you can find here including the MTRR, devfs and fastpoll patches. My PhD in Computer Science was on the topic of Astronomical Visualization, which is my current research interest. This is what I work on when I don't get distracted by kernel hacking. See my home page to find out more about me."
- Who is Paul Gortmaker?
- (ADB, OK'ed by Paul) Paul has contributed various pieces of kernel code over the last few years, among other things the Real Time Clock driver. He is also the maintainer of the 8390 based network drivers (NE-2000, etc.), and wrote the Linux Ethernet HOWTO and the Boot-Prompt HOWTO.
- Who is Mark Lord?
- Who is Larry McVoy?
- Who is David S. Miller?
- (DSM) David Miller is mainly known for the porting work he has done, primarily for the 32-bit and 64-bit Sparc platforms although he has made significant contributions to the MIPS effort as well. He is also the current maintainer of the IP networking layer in the kernel and likes to address general performance and scalability problems all over as his time permits.
- Who is Linus Torvalds?
- Who is Theodore Y. Ts'o?
- (TYTSO) Theodore Ts'o has over the years written, rewritten, or supported Posix Job Control, the high level tty driver, the serial driver, the ramdisk support, e2fsck/e2fsprogs, and other bits and pieces of code in and near the kernel. He is currently a member of the Technical Board of Linux International. His day job at MIT is concerned with Kerberos and other network security and I/T architecture issues. He is also a member of the Internet Engineering Task Force, where he serves as a member of the Security Area Directorate.
- Who is Rogier Wolff?
- (REW himself) "I wrote the kmalloc that still drives linux-2.0.x. I wrote the Specialix and Olicom device drivers. I currently write Linux device drivers for a living. Contact me if you need one."
Other OS developers
Rogier Wolff (REW) suggested we add a section on OS developers who influenced/preceded the design of Linux.
- Who is Prof. Douglas Comer?
- (Prof. Comer) Dr. Douglas Comer is a full professor of Computer Science at Purdue University, where he teaches courses on operating systems and computer networks. He has written numerous research papers and textbooks, and currently heads several networking research projects.
He has been involved in TCP/IP and internetworking since the late 1970s, and is an internationally recognized authority. He designed and implemented X25NET and Cypress networks, and the Xinu operating system. He is director of the Internetworking Research Group at Purdue, editor of Software - Practice and Experience, and a former member of the Internet Architecture Board.
Dr. Comer completed the original version of Xinu (and wrote "The Xinu approach" book) in 1979. Since then, Xinu has been expanded and ported to a wide variety of platforms, including: IBM PC, Macintosh, Digital Equipment Corporation VAX and DECStation 3100, Sun Microsystems Sun 2, Sun 3 and Sparcstations, and Intel Pentium. It has been used as the basis for many research projects. Furthermore, Xinu has been used as an embedded system in products by companies such as Motorola, Mitsubishi, Hewlett-Packard, and Lexmark. There is a full TCP/IP stack, and even the original version of Xinu (for the PDP-11) supported arbitrary processes and network I/O.
- Who is Richard M. Stallman?
- (RMS) Richard Stallman is the founder of the GNU project, launched in 1984 to develop the free operating system GNU (an acronym for "GNU's Not Unix"), and thereby give computer users the freedom that most of them have lost. GNU is free software: everyone is free to copy it and redistribute it, as well as to make changes either large or small.
Today, Linux-based variants of the GNU system, based on the kernel Linux developed by Linus Torvalds, are in widespread use. There are estimated to be over 10 million users of GNU/Linux systems today.
Richard Stallman is the principal author of the GNU C Compiler, a portable optimizing compiler which was designed to support diverse architectures and multiple languages. The compiler now supports over 30 different architectures and 7 programming languages.
Stallman also wrote the GNU symbolic debugger (GDB), GNU Emacs, and various other GNU programs.
Stallman received the Grace Hopper Award from the Association for Computing Machinery for 1991 for his development of the first Emacs editor in the 1970s. In 1990 he was awarded a MacArthur Foundation fellowship, and in 1996 an honorary doctorate from the Royal Institute of Technology in Sweden. In 1998 he received the Electronic Frontier Foundation's Pioneer award along with Linus Torvalds.
- Who is Prof. Andrew Tanenbaum?
- (Prof. Tanenbaum) Andrew S. Tanenbaum has an S.B. degree from MIT and a Ph.D. from the University of California at Berkeley. He is currently a Professor of Computer Science at the Vrije Universiteit in Amsterdam, The Netherlands, where he heads the Computer Systems Group.
His current research focuses primarily on the design of wide-area distributed systems that scale to millions of users. These research projects have led to over 70 refereed papers in journals and conference proceedings. He is also the author of five books.
Prof. Tanenbaum has also produced a considerable volume of software. He was the principal architect of the Amsterdam Compiler Kit, a widely-used toolkit for writing portable compilers, and MINIX, a small UNIX-like operating system for operating systems courses.
Prof. Tanenbaum is a Fellow of the ACM, a Senior Member of the IEEE, a member of the Royal Netherlands Academy of Arts and Sciences, and winner of the ACM Karl V. Karlstrom Outstanding Educator Award.
Section 6 - CPU questions
- What is the "best" CPU for GNU/Linux?
- (REW) There is no "best" CPU. The choice of CPU always depends on your price/performance/technical requirements. On the x86 side, we have Intel, AMD, Cyrix and IDT/Centaur, with various models available. All of these work.
Besides the x86 processors, the Linux kernel runs on 68k processors, MIPS R3000 and R4000, Power PC, ARM, Alpha and Sparc processors. There are lots of different ways to build a computer around a processor. If you have an x86, they built a PC around it. Don't go around buying second hand R4000 computers because the Linux kernel runs on the R4000 processor. Check the latest Linux kernel revision to see if the specific computer you're buying is supported.
- (ADB) OK, the Linux kernel is a good start. Now, there is a huge difference between kernel support and a ready-to-install distribution. Only four architectures have widely available, reasonably homogeneous distributions: x86 (or i386), Alpha, Sparc and Power-PC. And the Alpha and Sparc distributions that exist still have some rough edges. IOW, if you don't want to spend a lot of time installing and fine-tuning GNU/Linux, and you have a limited budget, your "best" choice is an x86 machine. If you have very specific needs (e.g. a hand-held computer running Linux, where the low power ARM architecture would be the ideal choice, or a workstation dedicated to scientific applications, where an Alpha or a Sparc would provide superior performance), check the various architectures, list your specific requirements, and make a choice. Nowadays Alpha 21164 machines are much more affordable than one or two years ago, but it's certainly harder to put one together than your average PC clone.
- What is the fastest CPU for GNU/Linux?
- (REW, ADB) The CPU field is very active in terms of technological developments. New CPU models, new architectures, new manufacturing technologies keep pushing the state of the art. WRT GNU/Linux, it is a general consensus that Alpha machines usually provide the best floating point performance, when the actually shipping hardware available at any given point in time is compared (June 1998: the 21164/600).
However for non floating point applications the issue is not as clear-cut. Very high clock rate x86 machines (e.g. Pentium-II/400) provide impressive integer performance, for use in e.g. databases or Web server applications.
For 3D rendering applications you may want to consider the GNU/GPL Mesa OpenGL compatible library, which has support for some graphics accelerator chips.
Also note that some applications are not CPU bound. Check the exact bottleneck in your case.
- I want to implement the Linux kernel for CPU Hyper123, how do I get started?
- (ADB) Is Hyper123 supported by gcc, or at least is the Hyper architecture supported by gcc? Do you have a target machine with a well defined architecture? If you have answered yes to both questions proceed to REW's answer. If you have answered no to either or both, don't even bother getting started. This is a major project, not exactly the kind of thing you do over the weekend. Quoting from a SparcLinux paper by Miguel de Icaza:
"Thanks to having an international team of developers and support people, when the first Linux/SPARC distribution on CD went out we had a very strong port: a port that had taken only 22 months to engineer and complete (starting from scratch up to releasing the operating system on a bootable CD-ROM)."
- (REW) Ouch. Difficult task. Besides having to write support for the processor, you will also have to write the boot sequence to get things going. And a few device drivers.
You're not running away screaming yet? Good. Make sure you get the programmer's manual for Hyper123, and data sheets for all the peripheral ICs. Make sure you have the docs for the computer that you're working on (addresses, registers for the stuff on the motherboard).
After that, start on learning the processor, by writing the boot program. Try booting a simple program that says "hello world". That will also allow you to write a console device driver.
Next, there is the hard part: get Linux to compile and run on the processor. Make a new arch directory and start putting things in there that implement whatever needs implementing on your processor.
- Why is my Cyrix 6x86/L/MX/MII detected by the kernel as a Cx486?
- (RRR, ADB) Cyrix 6x86 CPUs are different in many ways from Pentium (tm) and AMD K5/K6 (tm) CPUs, so special code must be included for adequate CPU detection, setup and reporting. Cyrix 6x86 support isn't perfect in kernels 2.0.x up to 2.0.34. From 2.0.35 on things should get much better ('cause we're working on it ;) ). Similarly, late 2.1.1xx kernels should fully support the Cyrix CPUs. Please check the Linux Cyrix 6x86 HOWTO site for details and patches.
- What about those x86 CPU bugs I read about?
- (ADB) There are basically three known bugs that affect x86 processors, and each CPU design got its fair share it seems:
- The Intel Pentium F00F "Death" bug, affects ALL Pentium and Pentium MMX CPUs. Linus implemented the Intel recommended workaround for this bug a few days after the bug was first reported in the newsgroups. All recent kernels will report and work around the bug.
- The AMD K6 "sig11" bug, affects only a few K6 revisions. Was diagnosed by Benoit Poulot-Cazajous. There is no workaround, but you can get your processor exchanged by contacting AMD. 2.2.x kernels will detect buggy K6 processors and report the problem in the kernel boot message. Recently, a new K6 bug has been reported on the linux-kernel list. Benoit is checking into it.
- The Cyrix 6x86(Classic, L, MX) "Coma" bug, affects ALL Cyrix 6x86 CPUs. I proposed a simple workaround which is implemented as a user space boot option, a few hours after the bug was reported on the linux-kernel mailing list. See the Linux Cyrix 6x86 HOWTO site for details. Cyrix was notified of the bug, and their new MII CPUs are not affected by this problem anymore.
- I grabbed the standard kernel tarball from ftp.kernel.org or some mirror of it, and it doesn't compile on the Sparc, what gives?
- (DSM) Often the Sparc port diverges because of the high rate of changes made to that port. Changes can also happen to major interfaces in the kernel without the Sparc port being updated at the same time. Eventually the Sparc port maintainers do merge all of their work into the standard tree, at which point it will compile. In any event, trees which will compile just fine are available via two mechanisms: the vger CVS tree (accessible via read-only anonymous CVS) and pre-made tarballs of known working stable or test kernel trees. Check:
- ftp://vger.kernel.org/pub/linux/README.CVS and
- ftp://vger.kernel.org/pub/linux/Sparc/kernel/v2.{0,1}/
- Does the Linux kernel execute the Halt instruction to power down the CPU?
- (REG, ADB) Yes. The Linux kernel will execute the Halt instruction when the machine is idle (check the code for the idle_task in sched.c). It has done so since the earliest i386 implementation, even though on the i386 we didn't care about power saving; it's just that halting the CPU is the Right Thing (tm) to do when there is no other task that must be run.
On the Pentium, K6 and C6 CPUs, power consumption gets automatically reduced from an average 12-24 Watts operating power down to 2-3 Watts when the processor is Halted. On the Cyrix 6x86 CPUs, Halt state power consumption can be further reduced down to 150 mW by enabling the Suspend-on-Halt feature.
Reduced power consumption means cooler, more reliable machine operation and longer component life. And it saves trees too.
- I have a non-Intel x86 CPU. What is the [best|correct] kernel config option for my CPU?
- (ADB) For 386 class machines, compile as a 386. For 486-class machines, compile as a 486.
For the Cyrix 6x86 family CPUs and the AMD K5 and K6, you should probably compile the kernel as a Pentium or PPro. The only difference between the Pentium (-M586) and PPro (-M686) compile options is in the string operations (AFAIK). The Pentium option uses a header file that breaks down the complex string opcodes into simpler operations (which are faster on the Intel Pentium and Pentium MMX).
The PPro option uses the complex opcodes, but should be slightly faster than a Pentium because the PPro has deeper, smarter pipelines.
The same rules apply to the 6x86 family and the K5/K6, but the difference in speed is minimal between the Pentium and PPro kernel config options on these CPUs (PPro should be slightly better).
The 486 kernel config option (-M486) should not be used for anything above a 486-class CPU. This option sets code alignment options that work well on the 486, but that cause excessive NOP padding on 586 and above class machines. Usually, the 6x86 speculative execution capabilities will just optimize this padding at run time, but the NOP opcodes still take precious L1/L2 cache space (same applies to the K6; I am not 100% sure of what the K5 does).
The 386 config option (-M386) does not suffer from excessive padding, but does not produce code optimized for recent x86 CPUs either, so it is also deprecated, except for kernels included in GNU/Linux distributions which must run on the widest possible range of machines.
- What CPU types does Linux run on?
- (REG) Quite a few. Below is the list for kernel 2.4.18. Note that for some CPUs advanced development is kept outside the mainline kernel, and changes are merged into the mainline periodically. The WWW pages for these projects are listed as well.
- alpha, by DEC (now Compaq). http://www.alphalinux.org/
- ARM http://www.arm.linux.org.uk/
- Cris (AXIS) http://developer.axis.com/software/linux/
- x86 (32 bit, aka IA32, aka i386)
- x86-64 (64 bit extension to x86 by AMD) http://www.x86-64.org/
- IA64 (aka Itanium aka Itanic, by Intel and HP) http://www.linuxia64.org/
- M68K (Motorola M68000 family) http://www.linux-m68k.org/
- MIPS (32 bit and 64 bit) http://www.linux.sgi.com/
- PA-RISC (by HP) http://www.parisc-linux.org/
- Power PC 32 bit http://www.penguinppc.org/ and 64 bit http://penguinppc64.org/
- S390/S390x (IBM mainframe) http://linux.s390.org/
- SuperH (by Hitachi) http://www.linux-sh.org/
- Sparc (32 and 64 bits) http://sunsite.tut.fi/SPARCLinux/ and http://www.ultralinux.org/
- OpenRISC (unfinished) http://www.opencores.org/projects.cgi/web/or1k/openrisc_1200
- Emotion Engine (SONY Playstation 2) http://playstation2-linux.com/
- ColdFire by Motorola (incompatible derivative of MC68000) http://www.uclinux.org/ports/coldfire/
- VAX (DEC) http://linux-vax.sourceforge.net/
- TMS320 Digital Signal Processor (Texas Instruments) http://www.dsplinux.net/
- 8088 / 8086 / 80286 (INTEL) http://elks.sourceforge.net/
- ITRON (Japanese CPU used by DoCoMo in 3G mobile phones) http://www.emblix.org/english/etop.html
- General CPU http://www.cyut.edu.tw/~ckhung/resource/linux_ports.html
Section 7 - OS questions
- OS $toomuch has this Nice feature, so it must be better than GNU/Linux.
- (ADB) Sorry, but this simply means that OS $toomuch was designed with a given set of objectives and priorities, and GNU/Linux was designed with a different one. Neither is better than the other and also note that I am not referring to the respective implementations. But please, no OS comparisons on the linux-kernel list. Check the newsgroups instead, particularly comp.os.linux.advocacy which is dedicated to that kind of debate.
- Why doesn't the Linux kernel have a graphical boot screen like $toomuch OS?
- (ADB) Because it doesn't need one. You can add that feature to the boot loader code, if you want to. The Linux kernel has no graphics primitives, just like any UNIX kernel.
- The kernel in OS CTE-variant has this Nice-very-nice feature, can I port it to the Linux kernel?
- (ADB) Sure, you can do (almost) anything you want with Free Software. Oh, OS CTE-variant is not Free Software?
- How about adding feature Nice-also-very-nice to the Linux kernel?
- (ADB) You should probably read the definition of creeping featurism first. Related concepts, in increasing order of obfuscation: the KISS rule-of-thumb, the "Small is Beautiful" concept, Occam's Razor and Complexity Theory. A good book to read on these concepts as they apply to OS design is "The Mythical Man-Month" by Frederick P. Brooks, Jr.
- Are there more bugs in later versions of the Linux kernel, compared to earlier versions?
- (ADB) There are no more known bugs in later kernel versions than in earlier kernel versions. However, the Linux kernel source code has been growing at a constant rate. As a general rule, large pieces of code tend to have undetected bugs. OTOH, the core code for the Linux kernel seems to have stabilized at around 16 thousand lines of C code, according to Larry McVoy.
- (REW) I'd say more than 23 thousand lines in 2.1.x. Add together the totals from kernel, mm, arch//, subtract fpu-emulation.
- Why does the Linux kernel source code keep getting larger and larger?
- (ADB) There are four causes for this unbounded growth:
- New architectures are implemented. This is usually OK, because the code that is specific to each architecture is (in theory, at least) separate from the others. Common code doesn't grow.
- New drivers are implemented. Again, this is OK, because each driver has different source files, and those are selectively compiled in the kernel executable or built as modules according to the specified kernel configuration.
- Old code gets adequately documented. Adding comments and documentation increases the size of the source, but it's still a Good Idea (tm).
- Creeping featurism. It's generally considered a Bad Idea (tm) to keep adding more features to an already working piece of code.
- The kernel source is HUUUUGE and takes too long to download. Couldn't it be split in various tarballs?
- (REG) The kernel (as of 2.1.110) has about 1.5 million lines of code in *.c, *.h and *.S files. Of those, about 253 k lines (17%) are in the architecture-specific subdirectories, and about 815 k lines (54%) are in platform-independent drivers. If, like most people, you are only interested in i386, you could save about 230 k lines by removing the other architecture-specific trees. That is a 15% saving, which is not that much, really. The "core" kernel and filesystems take up about 433 k lines, or around 29%.
If you want to start pruning drivers away, the problem becomes much harder, since most of that code is architecture independent. Or at least, is supposed to be/will be. There is some driver code which probably should be moved to an i386-specific subdirectory, and perhaps over time it will be (it will take a lot of work!), but you need to be careful. PCI cards for example should be architecture independent. Throwing out the non i386-specific drivers will save around 97 k lines, a saving of about 6%.
But the most important argument for/against splitting the kernel sources is not about how much space/download time you could save. It's about the work involved for Linus or whoever will be putting together the kernel releases. Building tarballs (compressed tarfiles) of the whole kernel already represents a considerable amount of work; splitting it into various architecture-dependent tarballs would dramatically increase this effort and would probably pose serious maintainability problems too.
If you are really desperate for a reduced kernel, set up some automated procedure yourself, which takes the patches which are made available, applies them to a base tree and then tars up the tree into multiple components. Once you've done all this, make it available to the world as a public service. There will be others who will appreciate your efforts.
Under no circumstances should you complain to the kernel list. I promise you that Linus and the core developers will completely ignore such messages, so whinging about it is a complete waste of bandwidth. The only message on this subject that should be posted is an announcement of a new service providing split kernel sources.
- What are the licensing/copying terms on the Linux kernel?
- (RRR) In the root directory of the Linux kernel source tree (e.g. /usr/src/linux/), you will find a file COPYING. The file states that the Linux kernel is placed under the GNU General Public License (version 2), a copy of which is provided. If still in doubt, post to the appropriate forums (such as gnu.misc.discuss) or ask a lawyer, but don't ask about it on the linux-kernel list.
- What are those references to "bazaar" and "cathedral"?
- (ADB) These terms are used to describe two different development models adopted by the Free Software community, and were first coined by Eric S. Raymond. You should check his original article.
Note that Eric's article describes two among an infinite range of possible different development models. You could for example create new "Versailles", "Great Wall of China" or "Pyramid of Kheops" software development models. As long as the end result is under a GNU/GPL license, it will still be Free Software.
- What is this "World Domination" thing?
- (ADB) Geek humor? Please don't take this seriously! This is just a way of saying that there are more and more people using GNU/Linux all over the world i.e. that the Free Software movement is gaining momentum. Note that the "Free" in Free Software refers to freedom, just about the opposite of what's implied by "World Domination".
- (REW) This is a reference to an interview of Linus some years ago. After being pretty modest about the success that Linux was enjoying he concluded the interview with the remark: "Of course, what I really want is total world domination."
I've been browsing the net for the reference for this. http://www.ukuug.org/newsletter/63/[email protected] and http://www.linuxgazette.com/issue15/lg_toc15.html are close but not quite close enough.
Linus has referred back to this remark often enough.
- What are the plans for future versions of the Linux kernel?
- (ADB) Linus would be the best person to ask, but I don't know if he would have the time and patience to answer this question. There are some development issues that can be mentioned, though:
- PnP support in the kernel. Right now one can get PnP support using the isapnptools user space package and manually tuning the I/O, IRQ and DMA channel allocation, but future Linux kernels will do that for you.
- Improved SMP support.
- Improved 64 bit code support.
- Improved POSIX support.
- Improved APM support.
- Why does it show BogoMips instead of MHz in the kernel boot message?
- (ADB) On some processor architectures it is very difficult to find out the clock speed of the processor, and since the kernel does not depend on determining the MHz rating of a processor to operate correctly, MHz simply do not get calculated at boot time. OTOH, BogoMips get calculated because the kernel bases itself on BogoMips data to implement small time delays (busy loops) needed by various drivers in different circumstances. Note that neither BogoMips nor MHz measure processor performance in any way. See the BogoMips HOWTO by Wim van Dorst for an accurate description of BogoMips. Also take a look at the Linux Benchmarking HOWTO (shameless plug) if you want some basic information on Linux performance measurements.
Sometimes your BogoMips reading will vary by as much as 30%, from one kernel to another. This is due to changes in the alignment of the BogoMips calibration loop, which interacts with cache behavior. Richard B. Johnson has recently proposed a small patch that takes care of this problem.
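The calibration idea can be illustrated from user space. The sketch below is not the kernel's calibrate_delay() routine: the function names are hypothetical, it times an empty loop with clock_gettime() rather than the timer interrupt, and the division by 500000 is only meant to mimic the way the boot figure is scaled from loops per second. Treat the result as a rough, machine-dependent number.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Hypothetical helper: spin for `loops` iterations. The volatile counter
 * keeps the compiler from optimizing the empty loop away. */
static void busy_loop(unsigned long loops)
{
    volatile unsigned long i;

    for (i = 0; i < loops; i++)
        ;
}

/* Return the wall-clock seconds taken by one run of busy_loop(loops). */
static double time_loop(unsigned long loops)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    busy_loop(loops);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Double the loop count until a run lasts at least 10 ms, then scale the
 * measurement to loops per second and divide by 500000 (assumed scaling). */
double estimate_bogomips(void)
{
    unsigned long loops = 1UL << 12;
    double elapsed = time_loop(loops);

    while (elapsed < 0.01 && loops < (1UL << 30)) {
        loops <<= 1;
        elapsed = time_loop(loops);
    }
    if (elapsed <= 0.0)
        return 0.0;
    return ((double)loops / elapsed) / 500000.0;
}
```

Running estimate_bogomips() several times, or after changing compiler options, typically gives noticeably different values, which is the same kind of sensitivity to loop alignment and cache behavior described above.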
- I installed kernel x.y.z and package foo doesn't work anymore, what should I do?
- (RRR) Check out /usr/src/linux/Documentation/Changes and make sure you have the recommended versions (or newer) of the relevant software. This is very important. A lot of things are evolving on Linux and newer versions of the kernel may break older packages (especially on the development kernels). If you are using development kernels, keep an eye out for reports on the kernel list. If all else fails, post a bug report (see Q/A on bug reports) to the list.
- People talk about user space vs. kernel space. What's the advantage of each?
- (REG) User space is what all user (including root) programs run in. It is fully virtual memory (i.e. normally swappable). The X server is in user space, for example. So is your shell. Kernel space is the domain of the kernel (wow!), device drivers and hardware interrupt handlers. Kernel memory is non-swappable (i.e. it's REAL RAM), and hence should be used sparingly. Also, operations performed in kernel space are not pre-emptible: this means other processes are prevented from running until the operation completes.
Some people think that it's better to implement stuff in kernel space ("so that everyone has it"). In general this is a Bad Idea (tm) (see "creeping featurism" above), since kernel space resources are more "heavy" than user space resources. For example, coding a Mandelbrot fractal generator in kernel space is a *really stupid* idea.
The job of the kernel is to provide a safe and simple interface to hardware and give different processes a fair share of the resources, and to arbitrate access to resources/hardware.
Many ideas are best implemented in user space, with perhaps the absolute minimum of kernel support. The only exceptions to this principle are where it is particularly complicated or inefficient to implement the solution in user space only. This is why filesystems are in the kernel (you *could* put them in user space implemented as daemons), because a kernel implementation is *much* faster.
Note that you can make user space memory non-swappable by using the mlock(2) system call. This is a privileged operation and should not be used trivially.
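As a minimal sketch of the mlock(2) usage mentioned above, the function below pins a single page in RAM, touches it, and releases it again. The function name is hypothetical. The call can fail with EPERM, ENOMEM or EAGAIN depending on privileges and the RLIMIT_MEMLOCK resource limit of the system, so the error is returned to the caller rather than hidden.

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical demo: lock one page of user space memory so it cannot be
 * swapped out. Returns 0 on success, or the error number of the call
 * that failed. */
int lock_page_demo(void)
{
    long page = sysconf(_SC_PAGESIZE);
    void *buf = NULL;
    int err;

    if (page <= 0)
        return EINVAL;

    /* mlock() works on whole pages, so use a page-aligned buffer.
     * posix_memalign() returns its error code directly. */
    err = posix_memalign(&buf, (size_t)page, (size_t)page);
    if (err != 0)
        return err;

    if (mlock(buf, (size_t)page) != 0) {
        err = errno;            /* save before free() can disturb it */
        free(buf);
        return err;
    }

    memset(buf, 0xA5, (size_t)page);   /* the page is now resident */
    /* ... use the buffer for data that must never hit swap ... */

    munlock(buf, (size_t)page);
    free(buf);
    return 0;
}
```

Locking only what is strictly needed, and unlocking it as soon as possible, keeps the amount of unswappable memory small, in line with the advice above.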
- What are threads?
- (ADB) Very briefly, threads are "lightweight" processes (see the definition of a process in the Linux Kernel book) that can run in parallel. This is a lot different from standard UNIX processes that are scheduled and run in sequential fashion. More threads information can be found here or in the excellent Linux Parallel Processing HOWTO by Professor Hank Dietz.
- (REG) When people talk about threads, they usually mean kernel threads i.e. threads that can be scheduled by the kernel. On SMP hardware, threads allow you to do things truly concurrently (this is particularly useful for large computations). However, even without SMP hardware, using threads can be good. It can allow you to divide your problems into more logical units, and have each of those run separately. For example, you could have one thread blocked reading from a socket while another reads something from disk. Neither operation has to delay the other. Read "Threads Primer" by Bill Lewis and Daniel J. Berg (Prentice Hall, ISBN 0-13-443698-9).
- Can I use threads with GNU/Linux?
- (REG) Yes! The Linux kernel has the clone(2) system call, which provides the underlying mechanism for implementing a threads library. And Xavier Leroy has provided us with LinuxThreads, a POSIX 1003.1c implementation of threads for the Linux kernel.
If you have a libc 5 system, you'll need to install LinuxThreads if it is not already installed. You can get the LinuxThreads library here.
If you have a libc 6 (aka glibc 2) system, you shouldn't need to do anything. Glibc has LinuxThreads merged in.
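A minimal sketch of what using the POSIX threads interface looks like is given below. The structure and function names are hypothetical, and the example only assumes the standard pthread_create() and pthread_join() calls: it splits an array in two, sums one half in a new thread and the other half in the calling thread.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical job description: a slice of an array and its sum. */
struct sum_job {
    const int *data;
    size_t count;
    long total;
};

/* Thread entry point: add up the slice described by the job. */
static void *sum_worker(void *arg)
{
    struct sum_job *job = arg;
    size_t i;

    job->total = 0;
    for (i = 0; i < job->count; i++)
        job->total += job->data[i];
    return NULL;
}

/* Sum `count` ints using two threads. Returns 0 on success and stores the
 * sum in *result, or returns the error code reported by the threads call. */
int parallel_sum(const int *data, size_t count, long *result)
{
    pthread_t tid;
    struct sum_job first = { data, count / 2, 0 };
    struct sum_job second = { data + count / 2, count - count / 2, 0 };
    int err;

    err = pthread_create(&tid, NULL, sum_worker, &first);
    if (err != 0)
        return err;

    sum_worker(&second);            /* the calling thread does the rest */

    err = pthread_join(tid, NULL);  /* wait for the first half */
    if (err != 0)
        return err;

    *result = first.total + second.total;
    return 0;
}
```

On most systems this needs to be built with the compiler's threads option (for example -pthread with gcc). Each thread works on its own job structure, so no locking is needed until the results are combined after the join.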
- You mean threads are implemented in kernel space in GNU/Linux? Why not a hybrid kernel/user space implementation? Wouldn't that be more efficient?
- (REG) It is not clear that there is any significant benefit for Linux to have a hybrid threading library. If we look at Solaris Threads, they have a hybrid scheme, and claim that is an advantage. Well, yes, I suppose so, given their environment (the Solaris 2 kernel). They have a very heavy kernel, so a pure kernel space implementation would be too slow (remember the time it takes to enter/exit the kernel). Linux, on the other hand, has a very efficient kernel, so the difference between a kernel context switch under Linux and a user space context switch under Solaris 2 is pretty small. Also note that Solaris Threads took a long time to get right, because of problems such as signal delivery to threads. With a pure kernel threads implementation, signal delivery is much simpler. Fixing the signal delivery problems with Solaris Threads increased the complexity of their library, leading to bloat and performance loss. We don't want to make the same mistakes.
Now, you may argue that a hybrid scheme under Linux would be even better. Maybe. Prove it. Code it and benchmark it. In any case, this is a discussion that is not relevant to the kernel, since a hybrid scheme is built on top of kernel threads (Solaris 2 builds their threads on top of LWPs (Light Weight Processes) too). It's a user space issue, so please, keep it off the kernel list.
BTW: if you do manage to code something up and it is much faster than pure kernel space threads, you may need some kind of extra kernel support (depending on how you implement things). If that happens, then come and talk about it on the kernel list.
The Linux philosophy is to optimize the kernel first, so that all possible implementations can share the benefits.
- Can GNU/Linux machines be clustered?
- (REG) Different people mean different things when they talk about clustering. Some people want transparent fault tolerance and load balancing of general applications, others want parallel processing of a single job. Most people who talk about fault tolerance expect hardware and OS support of this (if a node goes down, the OS will automatically migrate the application to another node). This is not (yet) available for Linux.
You can write a fault tolerant application for a network of computers without direct OS support: you just need to structure your application appropriately. Note that a fault tolerant distributed application may also be a parallel, multithreaded application.
The Beowulf project provides an API and system management software to write parallel distributed applications on a network of Linux machines. The main emphasis here is on parallelism to get maximum processing power, although fault tolerance is possible too. An example of a Beowulf clustered Linux system is Avalon, which has just been listed among the world's 500 most powerful supercomputers.
Beowulf clusters deliver GFLOPS using arrays of commodity computers. It is an incredibly cheap and elegant way to get significant computing power for e.g. scientific applications.
- (ADB) Also check the Parallel Processing HOWTO by Professor Hank Dietz.
- (REG) In June 2000, Mission Critical Linux released Kimberlite, which they describe as an "open source linux clustering capability". Tim Burke, their Cluster Architect, describes it thus:
A Kimberlite Cluster provides support for two server nodes connected to a shared SCSI or Fibre Channel storage subsystem, in an active-active failover environment. The software provides the ability to detect when either node leaves the cluster, and will automatically trigger recovery scripts which perform the procedures necessary to restart applications on the remaining node. When the node rejoins the cluster, applications can be moved back to it, manually or automatically, if required. Sample recovery scripts are provided. Kimberlite is designed to deliver the highest levels of data integrity and be extremely robust. It is suitable for deployment in any environment that requires high availability for un-modified Linux applications.
- How well does Linux scale for SMP?
- (REG) Reasonably well. Kernel version 2.2 has much better scalability than version 2.0. People are running 4 processor Intel Xeon machines and 14 processor UltraSparc machines. Version 2.2 still has a global kernel lock, but this is often released quite quickly (for example, when the process blocks waiting for a resource and/or data), so the net result is that it is quite unlikely for two processors to compete for the global lock. Experiments with 14 processor UltraSparc machines show that Linux scales well, indicating that the current locking strategy is not hurting us for these machines.
Also consider that for parallel processing jobs, the kernel is not involved, so even Kernel v2.0 scaled well for these applications. When we talk about SMP scalability, we are referring to how many IO operations the kernel can perform at the same time.
Unfortunately some hysterical NT supporters continue to spread FUD that Linux does not scale well on SMP. Efforts to insert a bit of truth have generally fallen on deaf ears. If someone tells you that NT scales better than Linux, ignore them. They're operating in a fact-free zone. Tests indicate that NT has trouble scaling to 4 processors. There really is no competition.
Note that kernel version 2.3 has replaced the remaining global kernel lock with finer grained locking. This allows Linux to scale well to 64 processor machines and beyond.
- Can I lock a process/thread to a CPU?
- (RML) Yes, as of 2.5.8 the Linux kernel supports binding a process to a particular CPU. Patches exist for the 2.4 kernel series but are not yet merged (as of 28-APR-2002). This is called "task CPU affinity" and the interface was implemented via the following syscalls:
int sched_setaffinity(pid_t pid, unsigned long len, unsigned long *mask)
int sched_getaffinity(pid_t pid, unsigned long len, unsigned long *mask)
which set and get a given task's CPU affinity, respectively. Utilities for manipulating affinity and the patches for 2.4 are available at kernel.org. The interface allows any task's affinity to be retrieved, although only the task's uid (or root) can change the affinity. The calls assure the task has been successfully scheduled to a valid CPU before returning.
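As a rough illustration of task CPU affinity from user space, the sketch below uses Python's os.sched_getaffinity() and os.sched_setaffinity(), which wrap these syscalls on Linux. The helper names pin_to_one_cpu and restore_affinity are made up for this example, and choosing the lowest-numbered allowed CPU is an arbitrary assumption.

```python
import os


def pin_to_one_cpu(pid=0):
    """Restrict `pid` (0 means the calling process) to a single CPU.

    Returns (previous_mask, chosen_cpu) so the caller can undo the change.
    """
    previous = os.sched_getaffinity(pid)  # set of CPUs the task may run on
    chosen = min(previous)                # assumption: any allowed CPU will do
    os.sched_setaffinity(pid, {chosen})   # task may now run only on `chosen`
    return previous, chosen


def restore_affinity(previous, pid=0):
    """Put back an affinity mask saved by pin_to_one_cpu()."""
    os.sched_setaffinity(pid, previous)
```

A caller would typically pin, do its CPU-sensitive work, and then call restore_affinity() with the saved mask. Changing another user's task still requires the appropriate privileges, as described above.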
- How efficient are threads under Linux?
- (REG) Incredibly. Compared with all the other kernel-based thread implementations, Linux is probably the fastest. Each thread takes only 8 kiB of kernel memory for the stack, and thread creation and context switching are very fast. I have measured less than 1 microsecond context switch times on an old Pentium/MMX 200 (see http://www.atnf.csiro.au/~rgooch/benchmarks/linux-scheduler.html for more details). However, the Linux scheduler is designed to work well with a small number of running threads. Best results are obtained when the number of running threads equals the number of processors.
Avoid the temptation to create large numbers of threads in your application. Threads should only be used to take advantage of multiple processors or for specialised applications (e.g. low-latency real-time), not as a way of avoiding programmer effort (writing a state machine or an event callback system is quite easy). A good rule of thumb is to have up to 1.5 threads per processor and/or one thread per RT input stream. On a single processor system, a normal application would have at most two threads; over 10 threads is seriously flawed, and hundreds or thousands of threads is progressively more insane.
A common request is to modify the Linux scheduler to better handle large numbers of running processes/threads. This is always rejected by the kernel developer community because it is, frankly, stupid to have large numbers of threads. Many noted and respected people will extol the virtues of large numbers of threads. They are wrong. Some languages and toolkits create a thread for each object, because it fits into a particular ideology. A thread per object may be appealing in the abstract, but is in fact inefficient in the real world. Linux is not a good computer science project. It is, however, good engineering. Understand the distinction, and you will understand why many widely acclaimed ideas in computer science are held with contempt in the Linux kernel developer community.
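To give an idea of what an event callback system looks like in practice, here is a minimal single-threaded sketch using Python's selectors module. It services several input streams from one thread instead of creating one thread per stream. The function names, the use of socketpair() as stand-in input streams, and the 1 second timeout are all assumptions made for the illustration.

```python
import selectors
import socket


def run_event_loop(sel, max_events):
    """Dispatch ready callbacks from a single thread until max_events are handled."""
    handled = 0
    while handled < max_events:
        events = sel.select(timeout=1.0)
        if not events:
            break                      # nothing became ready in time
        for key, _mask in events:
            callback = key.data        # the callback registered for this stream
            callback(key.fileobj)
            handled += 1
    return handled


def make_reader(name, log):
    """Build a callback that records whatever arrives on its stream."""
    def on_readable(sock):
        log.append((name, sock.recv(4096)))
    return on_readable


def demo():
    """Serve three input streams with one thread and no locking."""
    sel = selectors.DefaultSelector()
    log, pairs = [], []
    for i in range(3):
        writer, reader = socket.socketpair()
        reader.setblocking(False)
        sel.register(reader, selectors.EVENT_READ, make_reader("stream%d" % i, log))
        pairs.append((writer, reader))
    for i, (writer, _reader) in enumerate(pairs):
        writer.sendall(b"msg%d" % i)
    run_event_loop(sel, max_events=3)
    for writer, reader in pairs:
        sel.unregister(reader)
        writer.close()
        reader.close()
    sel.close()
    return sorted(log)
```

Each stream gets a callback rather than a thread, so the number of running threads stays at one no matter how many streams are registered.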
- How does the Linux networking/TCP stack work?
- (REG) The best guide may be found in the Linux kernel sources. A popular reference is "TCP/IP Illustrated" (volumes 1 to 3) by W. Richard Stevens, which explains much of the theory and practice behind TCP/IP. This material is based on the BSD implementation, which differs from Linux in fundamental ways. Nevertheless, it is an excellent reference.
- Can we put the networking/TCP stack into user-space?
- (REG) The short answer is no, because this would slow it down (see the monolithic versus microkernel debate for reasons why). The longer answer involves the motivations behind the question. Some people want to inspect every packet, and think it's easier to do in user-space. In fact, the kernel has a network packet filtering API (Linux Socket Filter (LSF), which is an easier-to-use implementation of the Berkeley Packet Filter (BPF)). The LSF allows you to capture some or all packets and pass them to user-space. This yields the advantages of a kernel-based networking stack, but still allows you to inspect packets in user-space if needed.
One reason people want to inspect packets is to perform firewalling. In this case, a far superior solution is available, using the Netfilter infrastructure. This is a kernel-level firewalling/NAT solution which is fast and reliable. You may create both stateful and stateless firewalling configurations. This infrastructure was introduced during the 2.3.x development cycle.
Section 8 - Compiler/binutils questions
- I downloaded the newest kernel and it doesn't even compile! What's wrong?
- (REG) First check the kernel newsflash page at http://www.atnf.csiro.au/~rgooch/linux/docs/kernel-newsflash.html where late-breaking patches may be posted.
- (DW) Do not post any details of the compile failure to the mailing list unless you have first checked the archives to ensure that the question hasn't been asked already.
Normally, if Linus allows a simple typo into a release kernel which prevents it from compiling, a patch is posted to the list within hours, yet still there are clueless idiots who continue to ask about it for weeks thereafter.
Do not do this. We will find out where you live, and we will come to your house and knock on your door at three o'clock in the morning to ask you stupid questions. Repeatedly, if needs be.
REW's note below also says this; but evidently not explicitly enough. Some people are just too stupid, I guess.
- (RRR) Make sure you are compiling with the recommended version of gcc with default optimization flags (IOW, leave the Makefiles alone) and a recent binutils. The binutils package is the one that contains the assembler (gas) and linker (ld). See Documentation/Changes for more info. If that works, then experiment with different compilers/optimizations.
- (REW) Linus cannot test every permutation of drivers and options. He's a selfish little guy. He just compiles the version that runs on his computers, and then releases it. Actually, he sometimes even doesn't compile it before releasing. He's a busy man. Give him a break. Wait for half a day. Someone will post a patch that will fix it within that time. If that doesn't happen for more than a day, fix it yourself, and post the patch to linux-kernel. If you don't have the expertise to do this yourself, please wait for another day, before reporting the problem.
Please check if it hasn't been reported before. Most companies have a help desk that keeps the end users from bothering the developers. Linux is different: You get to talk to the developers. But don't waste everybody's time by posting stuff that has been reported already.
- (DBE) Not all ports of the Linux kernel to different hardware platforms are fully merged into the official tree at kernel.org. If you have problems compiling a kernel for a non-i386 architecture please check the related Web pages and mailing-lists for that specific port.
- What are the recommended compiler/binutils for building kernels?
- (REG) This depends on the kernel version. Until 26-OCT-2000, gcc 2.7.2.3 was the recommended compiler for all kernels. On this date, Linus announced that gcc 2.91.66 (aka egcs 1.1.2) is the recommended compiler for 2.4.x kernels up to version 2.4.9. Gcc 2.95.3 is the recommended compiler for kernel 2.4.10 and later.
The recommended binutils is 2.9.1.0.25. Avoid binutils versions from 2.8.1.0.25 to 2.9.1.0.2; these were beta releases and are known to be buggy.
Always see the Documentation/Changes file for details.
- Why the recommended compiler? I like xyz-compiler better.
- (RRR) Quick Answer: it's what Linus uses. Real Answer: the recommended compiler has been extensively tested and proven to be a very stable compiler. What is at issue is not whether other compilers can optimize better, but whether they will compile the kernel correctly. Current kernels and compilers are very complex pieces of software. There are just too many ways that the two can interact and cause trouble (a recent example: gcc 2.8.x and kernel 2.0.x). By keeping constant one of the variables - the compiler - kernel developers can concentrate on the kernel. If both the compiler and kernel are changing then it's anyone's guess what went wrong.
- Can I compile the kernel with gcc 2.8.x, egcs, (add your xyz compiler here)? What about optimizations? How do I get to use -O99, etc.?
- (RRR) Sure, it's your kernel. But if it doesn't work, you get to fix it. Seriously now, there is really no point in compiling a production kernel with an experimental compiler. Production kernels should only be compiled with the recommended compiler. Newer compilers are known to break the 2.0 series kernels; known symptoms of this breakage are hwclock and the X server segfaulting.
Compiling a 2.0 kernel with egcs or gcc 2.8, even after applying the workaround of copying the ioport.c file from a late 2.1 kernel to 2.0, is not recommended and will inevitably lead to unpredictable kernel behaviour.
Regarding 2.1 kernels, they usually compile fine with other compiler versions, but do NOT complain to the list if you are not using the recommended compiler. Linux developers have enough work tracking kernel bugs without also being swamped with compiler-related bugs.
If you want to play with the optimization options, you need to hack the Makefile in arch/i386/Makefile (assuming you have an x86 processor), but if it breaks... well, you should know the answer by now. Also keep in mind that some demented optimizations (such as -O99) may even produce slower and bigger kernels, due to gratuitous loop unrolling and function inlining.
- (ADB) I think the standard Paul Gortmaker disclaimer (?) is: "If it breaks, you get to keep the pieces." ;-)
- I compiled the kernel with xyz compiler and get the following warnings/errors/strange-behavior, should I post a bug report to the list? Should I post a patch?
- (RRR) In general, no, unless you get these with the recommended compiler. A few exceptions:
Everyone welcomes code cleanup patches. For instance, newer compilers may complain a lot more, and some of these warnings may even be warranted (e.g. ambiguous use of else statements); fixing these is a good thing.
There could be some aging code around that makes too many assumptions about the compiler (especially true of inline assembly), and some of the newer compilers break these statements. Fixing these is also a good thing, but be very sure you are really fixing a bug in the kernel. Workarounds for other compilers will be ignored (if the compiler is buggy, fix the compiler!).
- Why does my kernel compilation stop at random locations with: "Internal compiler error: program cc1 caught fatal signal 11."?
- (REW) Sometimes bad hardware causes this. Read the Web page at http://www.BitWizard.nl/sig11/ about this. The important word here is random. If it stops at the same place every time, the kernel source might have a glitch or your compiler might be bad. The Web page is mostly about the random error source: hardware. There is a bunch of different error messages that you can get if you have bad or marginal hardware.
- (ADB) Overclocked processors very often fail long compilations with a sig11, because a long gcc compilation puts more strain on the processor. As the processor heats up, it may attain a point where internal timings get out of spec. At this point, something gives and you get a sig11.
Also, some old K6 revisions would sig11 when compiling large programs if more than 32 MB of RAM was installed on the Linux box. AMD will exchange these faulty processors for free. Benoit Poulot-Cazajous correctly diagnosed the problem and devised an ingenious test for this bug that is run at boot time in 2.2.x kernels.
- What compiler flags should I use to compile modules?
- (REG) At the very least, you need these: -O2 -DMODULE -D__KERNEL__ -DLINUX -Dlinux
- (KO) I don't advise compiling modules by hand if the directory is in the kernel source tree. The rest of the Makefile system will not know about the extra modules so it will not recompile them if the config changes nor will it install the modules. The best method (until the 2.5 Makefile rewrite) is to add the directory into the kernel Makefile system.
Create a kernel Makefile in your new directory. Example:
#
# Example Makefile for your own modules
#
SUB_DIRS :=
MOD_SUB_DIRS := $(SUB_DIRS)
ALL_SUB_DIRS := $(SUB_DIRS)
M_OBJS := example-module1.o example-module2.o
include $(TOPDIR)/Rules.make
Edit the Makefile in the parent directory to add your subdirectory to the SUB_DIRS list. make dep, make modules and make modules_install will automatically handle your modules.
- (VKh) If you have a local makefile with which you wish to build your module not linked under the kernel tree in the proper way, you still can "ride" on the master Makefile.
This way the local Makefile does not need to hardwire the kernel compilation options of your particular machine. That is, once you reconfigure the kernel, your driver will be compiled with the correct set of new flags when you do a local "make".
This is what you can do on 2.2 (Makefile excerpt follows):
EXTRA_CFLAGS := -DDEBUG -DLINUX -I/usr/src/foo/include
MI_OBJS := your-module.o
O_TARGET := your-module.o
O_OBJS := your1.o your2.o
# Reuse Linux kernel master makefile on this directory
ifdef MAKING_MODULES
include $(TOPDIR)/Rules.make
else
all::
	cd '/usr/src/linux' && make modules SUBDIRS=$(PWD)
endif
In 2.4 the syntax is different. Rename MI_OBJS to obj-m and O_OBJS to obj-y to achieve the same goal there:
obj-m := your-module.o
O_TARGET := your-module.o
obj-y := your1.o your2.o
- Why do I get unresolved symbols like foo__ver_foo in modules?
- Why do I get unresolved symbols with __bad_ in the name?
- (REG) This is an indication that a function has been called with an invalid parameter. In some cases, these invalid parameters can be detected at compile time (through clever use of preprocessor tricks), so the preprocessor will modify the called function name into an invalid one. This will prevent the final link stage from completing (or will prevent the module from loading).
OK, so now that you know why, go forth and pester the maintainer of the section of code that is making the invalid function call. You should check the CREDITS and MAINTAINERS files to determine the maintainer.
Section 9 - Feature specific questions
- GNU/Linux Y2K compliance?
- (ADB) Y2K compliance under GNU/Linux is a multi-level problem.
- Applications. Check your application sources for routines that only operate on/test the last two digits of the year field/variable(s). Obviously the problem here is that 2000 > 1999, but 00 < 99. Unfortunately, poor programming practices are just as common and unavoidable as death and taxes...
- Libraries. Libc5 and glibc are known to be Y2K compliant. Alan Cox mentioned that libc4 had some problems.
- Kernel. The Linux kernel is y2k compliant. BTW the code snippet in the /arch/i386/kernel/time.c will force those non-y2k compliant RTC implementations to the correct date on 00:00:00 Jan 1, 2000. It's been there for quite some time, now, nice and quiet; added by Alan Modra circa 1994!
- BIOS. On x86 PC machines, some BIOSes will wrap back to 1900 upon boot; later versions will correctly wrap the RTC clock to 2000. This is a rather critical problem in embedded systems if they are not running Linux; if they are running Linux this is solved by Alan Modra's code snippet mentioned above. :-)
- Hardware. The standard PC RTC chip will not wrap the century. Wrapping must be done in software/BIOS. The chip will store the century data, but it just won't increment it on 00:00:00 Jan 1, 2000. Same issue as BIOS WRT embedded systems.
Testing the kernel, the BIOS and the RTC hardware is relatively easy if you are allowed to reboot the machine; just enter the CMOS setup routine and set the time to Dec 31 1999, 23:58:00. Boot and check what happens.
Checking applications and libraries takes a lot more work... Especially checking applications when you don't have access to the source code :( The only way is simulation. But this is getting off topic: if you don't have access to the source code, then it's not relevant to GNU/Linux. ;)
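The application-level problem (2000 > 1999, but 00 < 99) can be shown in a few lines. This is only an illustrative sketch of the kind of routine to look for when auditing sources; both function names are invented for the example.

```python
def is_later_two_digit(year_a, year_b):
    """Buggy comparison: only looks at the last two digits of each year."""
    return (year_a % 100) > (year_b % 100)


def is_later(year_a, year_b):
    """Correct comparison: uses the full year."""
    return year_a > year_b
```

With these definitions, is_later(2000, 1999) is true, while is_later_two_digit(2000, 1999) is false because it compares 0 against 99.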
- What is the maximum file size supported under ext2fs? 2 GB?
- (REW, AC) In the 2.0.x kernels, maximum file size (not to be confused with partition sizes, which can be much larger) under ext2fs is 2GB. Larger files are only supported on 64-bit architectures (Alpha and UltraSPARC) in late 2.1.1xx kernels.
Files larger than 2GB are difficult to support on 32-bit architectures. This will probably be implemented in the 2.3 kernel series.
- GGI/KGI or the Graphics Interface in Kernel Space debate?
- (REG, ADB) GGI/KGI information can be found here. The GGI/KGI developers warn against useless debates on the kernel list.
- How do I get more than 16 SCSI disks?
- (REG) Get kernel version 2.2.0 or later.
- What's devfs and why is it a Good Idea (tm)?
- (REG) OK, pushing my own barrow here. Devfs allows device drivers to have a direct link with device special files (what you see in /dev). The current dependence on major/minor numbers to provide this link poses scalability and performance problems. Devfs also only has device nodes for devices that you have available. Read the devfs FAQ for more details. Note that devfs went into the official 2.3.46 kernel.
- Linux memory management? Zone allocation?
- (ADB) Rik van Riel has set up a nice page on Linux memory management. It has a link to an excellent tutorial on virtual memory.
- How many open files can I have?
- (REG) With kernels 2.0.x you can have 256 open FDs (file descriptors). With 2.2.x you can have 1024. Various patches exist which allow you to increase these limits. Note that this can break select(2).
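On current systems a process can query its own descriptor limit with getrlimit(2). The sketch below does that through Python's resource module, and also shows why raising the limit can break select(2): an fd_set is a fixed-size bitmap, so a descriptor numbered FD_SETSIZE or higher cannot be represented in it. The value 1024 for FD_SETSIZE is the usual glibc value and is an assumption here, as are the function names.

```python
import resource

FD_SETSIZE = 1024  # assumption: the usual glibc value; check <sys/select.h>


def open_file_limits():
    """Return the (soft, hard) per-process limits on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def usable_with_select(fd, fd_setsize=FD_SETSIZE):
    """True if descriptor number `fd` fits into a select(2) fd_set."""
    return 0 <= fd < fd_setsize
```

A program that may hold more than FD_SETSIZE descriptors open would need to check each descriptor this way before passing it to select(2), or use an interface without that restriction such as poll(2).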
- When will the Linux accept(2) bug be fixed?
- (REG) Firstly, this is not a bug in the Linux kernel, despite the fact that the "Sendmail 8.9.0 Known Bugs List" states there is a bug with Linux accept(2). The Linux accept(2) call can return the ETIMEDOUT error when there are system resource problems. This is not wrong, just different from what Sendmail expects. Since accept(2) is not part of the POSIX standard, it cannot be claimed that Linux is violating it. I'm told that the Single UNIX Specification, Version 2 (SUSv2), which is much newer, implicitly prohibits ETIMEDOUT. Nevertheless, the networking hackers are not inclined to change this behaviour. They seem to prefer to follow POSIX in this, perhaps following the maxim the great thing about standards is that there are so many to choose from. Note also that BSD documents slightly different behaviour from SUSv2. It is prudent for an application to deal gracefully with unexpected error codes.
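As an illustration of dealing gracefully with unexpected error codes, the following sketch retries accept() when it fails with an error that is plausibly transient, instead of treating every failure as fatal. The set of errno values regarded as transient, the retry count, and the function name are all assumptions for this example; a real server would choose its own policy.

```python
import errno

# Assumption: errors in this set are worth retrying; the exact set is a policy choice.
TRANSIENT_ERRNOS = {
    errno.ETIMEDOUT,
    errno.ECONNABORTED,
    errno.EINTR,
    errno.ENOBUFS,
    errno.ENOMEM,
}


def accept_tolerant(listener, max_retries=5):
    """Call listener.accept(), retrying a bounded number of times on transient errors."""
    attempts = 0
    while True:
        try:
            return listener.accept()
        except OSError as exc:
            attempts += 1
            # Re-raise anything that is not known to be transient, or if we keep failing.
            if exc.errno not in TRANSIENT_ERRNOS or attempts > max_retries:
                raise
```

The listener only needs an accept() method, so the same function works with a real listening socket or with a test double.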
- What about STREAMS? I noticed Caldera has a STREAMS package, when will that go in the kernel source proper?
- (REG) STREAMS allow you to "push" filters onto a network stack. The idea is that you can have a very primitive network stream of data, and then "push" a filter ("module") that implements TCP/IP or whatever on top of that. Conceptually, this is very nice, as it allows clean separation of your protocol layers. Unfortunately, implementing STREAMS poses many performance problems. Some Unix STREAMS based server telnet implementations even ran the data up to user space and back down again to a pseudo-tty driver, which is very inefficient.
STREAMS will never be available in the standard Linux kernel, it will remain a separate implementation with some add-on kernel support (that comes with the STREAMS package). Linus and his networking gurus are unanimous in their decision to keep STREAMS out of the kernel. They have stated several times on the kernel list when this topic comes up that even optional support will not be included.
- (REW, quoting Larry McVoy) "It's too bad, I can see why some people think they are cool, but the performance cost - both on uniprocessors and even more so on SMP boxes - is way too high for STREAMS to ever get added to the Linux kernel."
Please stop asking for them, we have agreement amongst the head guy, the networking guys, and the fringe folks like myself that they aren't going in.
- (REG, quoting Dave Grothe, the STREAMS guy) STREAMS is a good framework for implementing complex and/or deep protocol stacks having nothing to do with TCP/IP, such as SNA. It trades some efficiency for flexibility. You may find the Linux STREAMS package (LiS) to be quite useful if you need to port protocol drivers from Solaris or UnixWare, as Caldera did.
The Linux STREAMS (LiS) package is available for download if you want to use STREAMS for Linux. The following site also contains a dissenting view, which supports STREAMS.
- I need encryption and steganography*. Why isn't it in the kernel?
- (TJ) Note that this section was written in 2000/2001, and the laws in various countries have changed since then. Updates would be appreciated.
In France and Russia, strong encryption is essentially illegal; using it there requires a license which is seldom granted. The United States has cumbersome restrictions on exporting such software (it's considered a "munition"--see http://www.epic.org/crypto/export_controls/ ). Having these features in the standard kernel would therefore cause great inconvenience to people in those countries. However, separate programs and patches to the kernel are available at:
- ftp://ftp.csua.berkeley.edu/pub/cypherpunks/filesystems/linux/
- http://www.freeswan.org/
- http://www.quick.com.au/ftp/pub/sjg/
- http://www.ssh.org/
- http://web.mit.edu/kerberos/www/
- http://tcfs.dia.unisa.it/
- ftp://ftp.tik.ee.ethz.ch/pub/packages/skip/
(*) Steganography is disguising sensitive data as noise in a digitized image, sound file, or the like.
- How about an undelete facility in the kernel?
- (REG) This idea keeps being raised again and again. There is no need for kernel support to do this. You can easily do it in user space. There are replacement versions of the rm utility which will move/copy files to a wastebasket area instead of actually deleting. If you're really keen, you could implement a wrapper for the unlink system call, and use LD_PRELOAD to override the function for all applications. This has been done by Manuel Arriaga and is called "libtrash". It is available at: http://m-arriaga.net/software/libtrash/
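The user-space approach can be sketched in a few lines: instead of unlinking a file, move it into a wastebasket directory. This is not how libtrash or any particular rm replacement is implemented; the function name and the numeric suffix used to avoid name clashes are assumptions for the illustration.

```python
import os
import shutil


def move_to_wastebasket(path, wastebasket):
    """Move `path` into `wastebasket` instead of deleting it; return the new location."""
    os.makedirs(wastebasket, exist_ok=True)
    base = os.path.basename(os.path.normpath(path))
    target = os.path.join(wastebasket, base)
    suffix = 0
    # Do not clobber an earlier "deleted" file that had the same name.
    while os.path.lexists(target):
        suffix += 1
        target = os.path.join(wastebasket, "%s.%d" % (base, suffix))
    shutil.move(path, target)
    return target
```

Undeleting is then just a matter of moving the file back out of the wastebasket, with no kernel involvement at all.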
- How about tmpfs for Linux?
- (REG) The 2.4 series kernels have introduced a tmpfs. The old SysV shared memory code has been replaced with a new shm file-system, which is much simpler and cleaner, thanks to the improved VFS. Since the shm code can be shared to create a tmpfs, this was done. You may find tmpfs useful if you have an embedded system which has the root file-system on a read-only media but needs a writable file-system.
- (REG) Prior to the introduction of tmpfs, many people asked for its development, on the grounds that it would be faster than /tmp in a conventional file-system. This was never considered a valid reason for tmpfs development, because the Linux ext2 filesystem is so good that it outperforms tmpfs (memory-based filesystems) in other operating systems. Jim Nance has posted a comparison to linux-kernel. Here is an extract of his message:
The original question is enough of an FAQ that I thought it would be
good to have real numbers rather than just my assurances that Linux
has a fast FS layer. Therefore I wrote a benchmarking program that
creates/writes/destroys files and ran it under several operating
systems and on several types of file systems. I have included that
program as an attachment to this mail. Here are the results:
OS                 Hardware  FS Type  Loops/Second
--------------------------------------------------
Linux 2.2.5-ac6    1         nfs             16.33
Linux 2.2.5-ac6    1         arla            73.67
Linux 2.2.5-ac6    1         ext2         15383.32
Solaris 2.6        2         afs             71.33
Solaris 2.6        2         nfs             10.00
Solaris 2.6        2         ufs             23.67
Solaris 2.6        2         tmpfs         9162.32
Digital Unix 4.0D  3         afs             49.33
Digital Unix 4.0D  3         nfs             14.67
Digital Unix 4.0D  3         ufs             28.67
Digital Unix 4.0D  3         memfs         3062.66
Linux 2.0.33       4         afs             69.33
Linux 2.0.33       4         nfs             15.00
Linux 2.0.33       4         ext2          2218.33
Hardware:
1 -> 333 MHz PII, 512M ram, Compaq WDE4360W disk
2 -> Ultra450 class Sun server (300MHz?)
3 -> Personal Workstation 600 AU. 600 MHz alpha. 1.5G ram
4 -> 75 MHz Pentium, 32M ram, Seagate ST31200N disk
Notice how Linux writing to an ext2 file system is significantly
faster than any other OS/FS combination. The next closest is Solaris
writing to tmpfs, and it's still far behind ext2. It's also good to
notice how slow both Solaris and Digital Unix are on their local file
systems. This is probably why both have a RAM-based file system.
Please note that this benchmark is intended to measure the time it
takes to create and delete files, which is expensive on most non-linux
systems. It does not indicate anything about the data I/O rate to an
existing file.
It would be interesting to see a comparison between Linux ext2fs and tmpfs.
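For anyone who wants to make such a comparison, here is a rough sketch of a create/write/delete benchmark of the kind described in the message. It is not the program that was attached to the original mail; the function name, the 1 KiB payload, and the one second duration are assumptions.

```python
import os
import time


def create_delete_loops_per_second(directory, duration=1.0, payload=b"x" * 1024):
    """Repeatedly create, write, and delete one file in `directory`.

    Returns the number of complete loops per second over roughly `duration` seconds.
    """
    path = os.path.join(directory, "bench-%d.tmp" % os.getpid())
    loops = 0
    start = time.perf_counter()
    deadline = start + duration
    while time.perf_counter() < deadline:
        with open(path, "wb") as f:   # create and write
            f.write(payload)
        os.unlink(path)               # destroy
        loops += 1
    elapsed = time.perf_counter() - start
    return loops / elapsed
```

Running it once with a directory on ext2 and once with a directory on tmpfs, on the same machine, would give the two figures to compare. Like the original, it measures file creation and deletion cost, not the data I/O rate to an existing file.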
- (REG, by Adam Sulmicki) If after reading all the above you still feel you need tmpfs, and you're stuck in the stone age with a 2.2 kernel, read on. However, keep in mind it is more of a hack than true tmpfs.
The magic way to do it is:
- compile ramdisk support into kernel, the option is: CONFIG_BLK_DEV_RAM=y
- Run the following command to create a 2 MB ext2 ram disk: /sbin/mke2fs -vm0 /dev/ram 2048
- mount it: mount /dev/ram /tmp
And you are done.
- What is the maximum file size/filesystem size?
- (REG) Maximum file size depends on the block size on your filesystem. For ext2 (and UFS, SysVFS and similar filesystems), the limits are:
Block size    Maximum file size (GiBytes)
512 B         2
1 kiB         16
2 kiB         128
4 kiB         1024
8 kiB         8192 (PAGE_SIZE must be >= 8 kiB)
plus a small amount. The limitation is due to the classic triply-indirect addressing scheme. In the future, ext2 will have extent-based addressing, which will overcome this problem. The limit for a single filesystem (partition) on a 32 bit CPU is 4 Gi blocks. Each block is 512 Bytes, so that works out to 2 TiB. For 64 bit CPUs, the limits are bigger than you can imagine.
- Linux uses lots of swap while I still have stuff in cache. Isn't this wrong?
- (MRW) Not really. Linux will page out processes which haven't been used for a long time (e.g. lpd on many systems) in favour of retaining data from files which have been used recently (e.g. header files while compiling a big program). This is more efficient. Trust us, we're engineers.
- Why don't we add resource forks/streams to Linux filesystems like NT has?
- (REG) Resource forks (aka "named streams") are a way of storing multiple "streams" of data in a file. Each stream may be read, written and seeked in just like in files with only one stream of data. Resource forks are used to store ancillary data with files (such as which icon to display for the file when using a graphical filemanager). These extra streams of data may be manipulated by any user who has write access to the file, just as the "primary" stream can be manipulated.
Unix only supports one "stream" of data per file. Adding support for multiple streams to the Linux kernel is not considered to be especially difficult. However, files with multiple data streams would break a large number of user-space programmes (which currently only manipulate the "primary" stream) and protocols (such as ftp, http, email, NFS and many more). A number of new utilities would need to be written, and a large number of shell scripts would have to be audited for correctness in a multiple-stream world. Because of this massive breakage, many kernel developers consider resource forks to be a bad idea.
Rather than add kernel support, a user-space library could be written which provided easy management of multiple streams of data for applications, while still storing the data in a single Unix file. If someone wants to write such a library, please do so. Once it's completed, send an email to the FAQ maintainer.
Note that the GNUstep/Foundation library has the NSBundle class, which provides this functionality. A number of APIs to this class for different languages are available:
- Objective-C has GNUstep at: http://www.gnustep.org/
- Java has JIGS at: http://www.gnustep.it/jigs/
- Smalltalk has StepTalk at: http://www.gnustep.org/experience/StepTalk.html
- Scheme has gstep-guile at: http://www.tiptree.demon.co.uk/gstep/guile/gstep_guile_toc.html
Note that a separate problem is the storage of "extended attributes". These are attributes like file permissions (such as ACLs and POSIX capability sets), which have limited size, and tend to be read and written atomically (i.e. you can't read or write part of the attribute nor seek in it). These usually require special privileges to modify. Also, you normally don't want to copy these attributes when copying files around, thus these extended attributes don't present the problems of massive breakage that resource forks would.
- Why don't we internationalise kernel messages?
- (REG) There are several reasons why this should not be done:
- It would bloat the kernel sources
- It would drastically increase the cost of maintaining the kernel message database
- Kernel message output would slow down
- English is the language in which the kernel sources are written, and thus is the language in which kernel messages are written. Developers cannot be expected to provide translations
- Bug reports should be submitted in English, and that includes kernel messages. If kernel messages were to be output in some other language, most developers could not help in fixing bugs
- Translation can be performed in user-space, there is no need to change the kernel
Finally, it will not be done. No core developer supports this. Neither does Linus. Don't even ask.
Section 10 - "What's changed between kernels 2.0.x and 2.2.x" questions
- Size (source and executable)?
- (REW) I use the following to quickly estimate the size of a project:
cat `find . -name \*.c -o -name \*.h -o -name \*.S` | wc -l
I get 811985 (lines of code, including comments and blank lines) when I run this on the 2.0.33 kernel source, and 1460508 when I run this on a 2.1.106 kernel.
This means that the Linux kernel qualifies as an "extremely large" software product, requiring the effort of 200 to 500 programmers for 5 to 10 years. [Richard Fairley: Software Engineering Concepts, McGraw-Hill, 1985, page 11].
Actually, the Linux kernel is now 7 years old, and has seriously involved 100 to 1000 programmers. (i.e. not counting those that have contributed a "one line fix"). This is my personal guess, so feel free to disagree or tell me otherwise.
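The same estimate can be made without relying on shell quoting or command-line length limits. The sketch below walks a source tree and counts newline characters in .c, .h and .S files, which is what wc -l counts. The function name is invented for this example.

```python
import os

SOURCE_SUFFIXES = (".c", ".h", ".S")


def count_source_lines(root, suffixes=SOURCE_SUFFIXES):
    """Count newline characters in all files under `root` whose names end in `suffixes`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffixes):
                with open(os.path.join(dirpath, name), "rb") as f:
                    total += f.read().count(b"\n")   # same thing wc -l counts
    return total
```

Calling count_source_lines("/usr/src/linux") would give a figure comparable to the ones quoted above, including comments and blank lines.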
- (ADB) I can't compare actual kernel footprints of 2.0.x vs. 2.1.x, but I think it's worth mentioning that 2.1.x kernels have the ability to "jettison" kernel initialization code, freeing the corresponding physical memory. So, even though the executable is certainly larger for 2.1.x kernels, you may actually get a smaller memory footprint.
- Can I use a 2.2.x kernel with a distribution based on a 2.0.x kernel?
- (REW) Yes. However some applications may need upgrading. Read /usr/src/linux/Documentation/Changes before you complain about something not working. Also note that the 2.1.(x+1) kernel may need a different set of upgrades than 2.1.x, so you should check the Changes file every single time you upgrade your Linux kernel.
- New filesystems supported?
- NTFS (read-only). Allows read-only access to Windows NT (tm) partitions.
- Coda. Coda is an advanced experimental distributed file system with features such as server replication and disconnected operation for laptops. Note that Coda is also available for 2.0.x kernels as an add-on package. Check the Coda Web site for more information.
- Performance?
- (REG) Here are some performance optimizations which are only available on 2.2.x kernels:
- MTRRs. MTRRs are registers in PPro and Pentium II CPUs which define memory regions with distinct properties. The default mode for PCI memory accesses is "uncacheable" which means memory and I/O addresses on a PCI peripheral are not cached. For linear frame buffers, a better mode is "write-combining" which allows the CPU to re-order and slightly delay writes to memory so that they can be done in blocks. If you are writing to the PCI bus, you then use PCI burst mode transfers, which are a few times faster.
- Finer grained locking. Most instances of the global SMP spinlock have been replaced with finer grained locking. This gives much better concurrency.
- User buffer checks. Replaced the old, painful way of checking if user buffers passed to syscalls were legal by a kernel exception handler. The kernel now assumes a buffer is OK. If not, an exception handler catches the fault and returns -EFAULT to user space. The advantage is that legal buffers no longer need to be carefully checked, which is much faster. The old scheme was also suffering from race conditions under SMP.
- New directory entry cache (dcache). This makes file lookups much faster.
Example: time find /usr -name gcc -print
2.1.104: cold cache: 0.180u 0.460s 0:15.02 4.2% 0+0k 0+0io 85pf+0w
2.1.104: warm cache: 0.100u 0.150s 0:00.25 100.0% 0+0k 0+0io 72pf+0w
2.0.33: cold cache: 0.100u 0.660s 0:14.87 5.1% 0+0k 0+0io 85pf+0w
2.0.33: warm cache: 0.090u 0.600s 0:00.69 100.0% 0+0k 0+0io 72pf+0w
Note /usr had 17750 files/directories. We see how with a cold cache (no disc blocks cached) there is very little difference. However, once the cache is warm, we see a fourfold reduction in system time. This is because inode lookups are not needed when a dcache entry is available. Tests performed on a Pentium/MMX 200.
- New drivers not available under 2.0.x?
- (XXX) Please add your answer here...
- What are those __initxxx macros?
- (KGB) __initfunc() for example is a macro used to put its first argument (a function) into a special ELF section that is dropped from memory once the driver's initialization is over.
So if you write an initialization function, whose code will never be used again after your driver is initialized, you can use __initfunc() around its declaration in order to reduce your kernel memory footprint by a few KB of memory. Similarly, __initdata() is used for variables, arrays, strings, etc. For implementation details and examples please consult the file include/linux/init.h from a 2.2.x source tree.
The main idea here is that the kernel memory is not swappable. Jettisoning useless code represents a nice way to save RAM.
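As a rough illustration of the mechanism, here is a user-space sketch (not kernel code) that uses the GCC section attribute to place a function and a variable into separately named ELF sections. The macro names DEMO_INIT and DEMO_INITDATA, the section names and the function demo_probe_irq() are all made up for this example; the real definitions are in include/linux/init.h. It assumes GCC on an ELF platform.

```c
/* Hypothetical stand-ins for the kernel's init macros; the macro
 * names and section names are assumptions for this sketch. */
#define DEMO_INIT     __attribute__((section(".text.demo_init")))
#define DEMO_INITDATA __attribute__((section(".data.demo_init")))

/* Data that is only needed during setup goes into its own section. */
static int demo_default_irq DEMO_INITDATA = 5;

/* Code that is only needed during setup goes into its own section.
 * In user space nothing ever frees these sections; the point is
 * only to show how code and data get grouped by the compiler. */
int DEMO_INIT demo_probe_irq(int requested_irq)
{
    /* Fall back to the default when no IRQ was requested. */
    return requested_irq > 0 ? requested_irq : demo_default_irq;
}
```

Grouping all such code and data into dedicated sections is what makes it possible to release the whole lot in one go once initialization is finished.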
- I have seen many posts on a "Memory Rusting Effect". Under what circumstances/why does it occur?
- Why does ifconfig show incorrect statistics with 2.2.x kernels?
- (TJ) This is in linux/Documentation/Changes that comes with the kernel sources:
"For support for new features like IPv6, upgrade to the latest
net-tools. This will also fix other problems. For example,
the format of /proc/net/dev changed; as a result, an older ifconfig
will incorrectly report errors."
- My pseudo-tty devices don't work any more. What happened?
- (TJ) Support for ptys using a major number of 4 was dropped in Linux 2.1.115. Replace your device files with ones using the new major numbers, 2 and 3. They will work with later 1.3 versions of Linux, and any 2.x version.
- (REG) If you use devfs, then this problem magically goes away.
- Can I use Unix 98 ptys?
- (TJ, with much information provided by H. Peter Anvin) Yes, but only if you have a kernel and libc which support them, and if your applications are written and compiled to use them. They will be supported by Linux 2.2 and glibc 2.1. This is in Documentation/Changes that comes with the kernel sources.
There is also the new standalone libpt by Duncan Simpson which implements the Unix98 PTY API independently of libc (check the Incoming directory on metalab.unc.edu/Linux and mirrors). You still need to have your apps compiled to use this API, of course.
- Capabilities?
- (TJ) There's a FAQ on capabilities under Linux at ftp://ftp.guardian.no/pub/free/linux/capabilities/capfaq.txt.
- Kernel API changes
- (REG) Some parts of the kernel API (programming interface) have changed from v2.0 to v2.2. This is relevant to the authors of 3rd party device drivers, filesystems and other code. So called "3rd party" code is any kernel code which is not distributed with the official kernel tarball that Linus distributes. A quick reference for programmers wishing to port their code to v2.2 is available here. Note that this document is not relevant for programmes running in user space.
If you want to port your drivers to the 2.4 series kernel, then read this, which tells you how to port code from 2.2 to 2.4.
Section 11 - Primer documents
- What's a primer document and why should I read it first?
- (REG) From time to time various technical debates start on the linux kernel list. Some of these are about quite important topics, however often these debates are repeated every few months or so and much of the same ground is covered each time around. Other times, questions about how some part of the Linux kernel works are posted. Often we see the same old questions time and time again. Don't get me wrong: these are often reasonable questions, it's just that seeing them over and over is something we'd rather avoid.
This section has some primer document links on various topics that should be read before starting a debate or posing a question (which itself can lead to a debate). This is not an attempt to censor debate, rather, it's an attempt to get you familiar with the current arguments so that you can contribute something new without going over old ground. If it's just a question you have, hopefully we can explain it clearly once, in a single document, and then point everybody to it.
- How about having I/O completion ports?
- (REG) The existing UNIX semantics - select(2) and poll(2) - for polling for activity on FDs do not scale very well: the overhead is too high with large numbers of FDs. Here is a primer document which explains some of the problems and explores some solutions.
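To see where the overhead comes from, consider this minimal sketch of a poll(2) based loop body. The function name count_readable() is made up for the example. Note that the caller has to hand the complete descriptor array to poll() on every call and then scan all of it again afterwards, even if only a single descriptor saw activity.

```c
#include <poll.h>

/* Poll every descriptor in the array once and count how many are
 * readable.  Both the poll() call and the scan afterwards touch
 * all nfds entries, however few of them are actually active. */
int count_readable(struct pollfd *fds, nfds_t nfds, int timeout_ms)
{
    int ready = 0;
    nfds_t i;

    if (poll(fds, nfds, timeout_ms) < 0)
        return -1;              /* errno describes the failure */

    for (i = 0; i < nfds; i++)  /* linear scan on every call */
        if (fds[i].revents & POLLIN)
            ready++;
    return ready;
}
```

With a handful of FDs this cost is negligible; with thousands of mostly idle FDs it is paid in full on every iteration.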
- What is the VFS and how does it work?
- (REG) The VFS (Virtual FileSystem or Virtual Filesystem Switch, depending on who you talk to) is basically the Linux filesystem layer. It incorporates the dentry cache and standard UNIX file semantics. It also contains a "switch" to specific filesystem types (ext2, vfat, iso9660 and so on), which is why Linux supports so many different filesystems. Read this VFS primer document if you want to know more.
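The "switch" idea can be sketched in a few lines of plain C: generic code looks up a filesystem type and then calls through a table of function pointers, without knowing anything about the implementation behind it. Everything below (the structure, the "alphafs" and "betafs" types and their block sizes) is invented for illustration and is not the kernel's actual VFS interface.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-type method table, not the real VFS structures. */
struct demo_fs_type {
    const char *name;           /* name used to select the type */
    int (*block_size)(void);    /* "method" supplied by the type */
};

static int alphafs_block_size(void) { return 1024; }
static int betafs_block_size(void)  { return 2048; }

static const struct demo_fs_type demo_fs_table[] = {
    { "alphafs", alphafs_block_size },
    { "betafs",  betafs_block_size  },
};

/* The "switch": generic code finds the type by name ... */
const struct demo_fs_type *demo_find_fs(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(demo_fs_table) / sizeof(demo_fs_table[0]); i++)
        if (strcmp(demo_fs_table[i].name, name) == 0)
            return &demo_fs_table[i];
    return NULL;                /* unknown filesystem type */
}

/* ... and then calls through the table, ignorant of the details. */
int demo_block_size(const char *name)
{
    const struct demo_fs_type *fs = demo_find_fs(name);

    return fs ? fs->block_size() : -1;
}
```

Supporting another filesystem in this scheme means adding one more table entry; the generic code does not change.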
- What's the Linux kernel's notion of time?
- (ADB) I have tried to put together some information on this topic, which you can find here. Colin Plumb is working on new code for the Linux kernel software clock.
- Is there any magic in /proc/scsi that I can use to rescan the SCSI bus?
- (TJ) The text below is from drivers/scsi/scsi.c.
/*
* Usage: echo "scsi add-single-device 0 1 2 3" >/proc/scsi/scsi
* with "0 1 2 3" replaced by your "Host Channel Id Lun".
* Consider this feature BETA.
* CAUTION: This is not for hotplugging your peripherals. As
* SCSI was not designed for this you could damage your
* hardware !
* However perhaps it is legal to switch on an
* already connected device. It is perhaps not
* guaranteed this device doesn't corrupt an ongoing data transfer.
*/
For a typical discussion of this topic, see http://jpj.net/~trevor/linux/rescan_scsi.txt.gz.
Section 12 - Kernel Programming Questions
- When is cli() needed?
- (ADB) cli() is a kernel wide function that disables maskable interrupts, whereas sti() is the equivalent function that enables maskable interrupts. Some routines must be run with interrupts disabled, because some peripherals need a guaranteed access sequence, or because the routine is not reentrant and could be reentered from an interrupt, etc. You should never use cli() in a user space program/daemon.
- (REW) The use of cli() is no longer encouraged. On a single processor, this simply clears an internal CPU flag, which is ANDed with the Maskable Interrupt Request pin. On SMP systems it is quite troublesome to keep ALL processors from servicing interrupts if one processor wants to do something uninterrupted. Currently we try to do locking on a much finer scale. For example, you should put a spinlock on the record that describes THIS INSTANCE of the device that needs the handling without accesses to other registers (e.g. from the interrupt routine). Besides preventing the overhead of trying to keep the other CPUs from handling interrupts, this allows the other CPUs to service interrupts from a second card of the same type in the same machine.
- Why do I see sometimes a cli()-sti() pair, and sometimes a save_flags-cli()-restore_flags sequence?
- (RRR) The cli()-sti() pair assumes that interrupts were enabled when execution of the code began, and thus proceeds to reenable them at the end. The save_flags-cli-restore_flags sequence doesn't make this assumption. Since the interrupt flag is one of the flags saved by save_flags(), it will be correctly restored to its previous state by restore_flags(). This is critical for code that may be called with interrupts either on or off.
Using save_flags-cli-restore_flags does incur a very slight overhead compared to the cli()-sti() pair, which may be significant for speed critical code (apart from being superfluous if it's known a priori that the code will never be called with interrupts off).
- (REG) Note that on UP systems cli(), sti() and restore_flags() operate immediately. However, on SMP systems, these functions may have to wait for the global IRQ lock (when another CPU has disabled interrupts). Other than this difference, these functions are SMP safe. It is also safe to call cli() multiple times on one CPU: the global IRQ lock is only grabbed the first time.
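The difference between the two sequences is easy to demonstrate with a toy model. The sketch below simulates the interrupt flag with an ordinary variable in user space; the sim_* functions are hypothetical stand-ins, not the kernel primitives, and the model ignores SMP entirely.

```c
/* Simulated interrupt-enable flag (1 = enabled, 0 = disabled). */
static int sim_irq_enabled = 1;

static void sim_cli(void)                { sim_irq_enabled = 0; }
static void sim_sti(void)                { sim_irq_enabled = 1; }
static void sim_save_flags(int *flags)   { *flags = sim_irq_enabled; }
static void sim_restore_flags(int flags) { sim_irq_enabled = flags; }

/* cli()-sti() pattern: always leaves interrupts enabled,
 * whatever state the caller was in. */
void critical_cli_sti(void)
{
    sim_cli();
    /* ... work that must not be interrupted ... */
    sim_sti();
}

/* save_flags-cli-restore_flags pattern: leaves the interrupt
 * state exactly as the caller had it. */
void critical_save_restore(void)
{
    int flags;

    sim_save_flags(&flags);
    sim_cli();
    /* ... work that must not be interrupted ... */
    sim_restore_flags(flags);
}

/* Run a critical section starting from the given interrupt state
 * and report the state afterwards. */
int irq_state_after(void (*section)(void), int initially_enabled)
{
    sim_irq_enabled = initially_enabled;
    section();
    return sim_irq_enabled;
}
```

Calling irq_state_after(critical_cli_sti, 0) returns 1: the routine was entered with interrupts off and silently turned them back on. The save/restore variant returns 0 in the same situation.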
- Can I call printk() when interrupts are disabled?
- (REG) Yes, you can, although you should be careful. Older kernels had the infamous cli()-sti() pair in printk(), so you would get enabled interrupts when returning from printk(), whether printk() was called with interrupts disabled or enabled; whereas recent kernels (e.g. 2.1.107) restore the flags when printk() is finished. You have to know which version of the kernel you are coding for. Read the Source, Luke. Also note that in 2.2.x kernels, printk() grabs a spinlock for SMP machines to avoid any possible deadlocks.
- What is the exact purpose of start_bh_atomic() and end_bh_atomic()?
- (REG, quoting Krzysztof G. Baranowski) To protect your code from being interrupted by a bottom half handler. It is mostly used in syscalls and functions called from userspace and is better than a cli/sti pair, because most of the time there is no need to mask interrupts at the hardware level.
- Is it safe to grab the global kernel lock multiple times?
- (REG) Yes. The global kernel lock is recursive per process. That means a process can grab the lock multiple times and not deadlock. The lock is released when unlock_kernel() is called as many times as lock_kernel() was called.
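The counting behaviour can be pictured with a very small model. This sketch only tracks the nesting depth for a single owner; it does not implement any mutual exclusion, and the demo_* names are invented for the example rather than taken from the kernel.

```c
/* Hypothetical single-owner model of a recursive lock: a depth
 * counter that is only "really" released when it returns to zero. */
struct demo_recursive_lock {
    int depth;      /* how many times the owner has taken the lock */
};

void demo_lock(struct demo_recursive_lock *l)
{
    l->depth++;     /* re-acquiring by the owner never blocks */
}

/* Returns 1 if this call actually released the lock, 0 if the
 * owner still holds it from an outer demo_lock() call. */
int demo_unlock(struct demo_recursive_lock *l)
{
    if (l->depth > 0)
        l->depth--;
    return l->depth == 0;
}
```

After two demo_lock() calls, the first demo_unlock() returns 0 and only the second returns 1, which mirrors the rule that the unlock calls must balance the lock calls.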
- When do I need to initialise variables?
- (REG) All variables should be initialised (implicitly or explicitly) before they are read from. Automatic variables are placed on the stack, and thus will have a random initial value. This means that you need to manually initialise them.
Static variables are placed in the .bss section, which is initialised to zero by the kernel (at the start of the boot sequence). If the initial value of a static variable should be zero, you don't need to do anything. If it should be a non-zero value, you will need to initialise it. Note that you should not explicitly initialise a static variable to zero, as this will increase the size of the kernel image, which causes problems for embedded systems.
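A short sketch of the three cases, using made-up names (demo_call_count, demo_limit, demo_within_limit). It is ordinary C and behaves the same way in user space, where static storage without an initialiser is also guaranteed to start at zero.

```c
/* Static, no initialiser: starts at zero, nothing to write. */
static int demo_call_count;

/* Static with a non-zero initial value: must be initialised
 * explicitly. */
static int demo_limit = 3;

/* Returns 1 while the number of calls is within the limit,
 * 0 afterwards. */
int demo_within_limit(void)
{
    int within;                 /* automatic: no defined initial value */

    within = 0;                 /* so assign it before the first read */
    demo_call_count++;
    if (demo_call_count <= demo_limit)
        within = 1;
    return within;
}
```

Writing "static int demo_call_count = 0;" would behave identically; the advice above is about image size, not about correctness.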
Section 13 - Mysterious kernel messages
- What exactly does a "Socket destroy delayed" mean?
- (TJ, from a post by Henner Eisen) Sometimes you may get:
Jul 25 22:14:02 zero kernel: Socket destroy delayed (r=212 w=0)
in /var/log/messages.
It means that the kernel cannot free the internal data structures associated with a released socket because there are still socket data buffers (in the above case 212 bytes read memory) accounted to the socket. For this reason, destroying is delayed and tried again later. At some point, after the remaining sk_buffs accounted to the socket are freed, destroying should succeed. Also:
It keeps spitting that out about every 5 seconds or so. the only way to fix it is to reboot. It doesn't happen very often, but I'd like to find out what's causing it.
This might indicate a problem where some kernel entity (i.e. a protocol module or network device driver), which is responsible for freeing an sk_buff, fails to do so. To help track down the problem, try to find out under which circumstances the messages start to appear (in particular, which program closed a socket right before the messages appear, which network protocol it uses, and which network device drivers are involved).
- What do I do about "inconsistent MTRRs"?
- (REG) Sometimes you may get:
mtrr: your CPUs had inconsistent ... MTRR settings
mtrr: probably your BIOS does not setup all CPUs
In English, using "had" as past or past perfect tense commonly implies that the condition no longer exists. While it isn't absolutely proper, it is very common. The MTRRs were inconsistent, but they aren't anymore. The kernel fixed them up. Everything is fine now.
- Why does my kernel report lots of "DriveStatusError BadCRC" messages?
- Why does my kernel report lots of "APIC error" messages?
- (REG, contributed by Mark Hahn) You may get messages like: APIC error on CPU1: 00(08).
APIC is the hardware that ia32 systems use to communicate between CPUs to handle low-level events like interrupts and TLB flushes. APIC messages are checksummed, and automatically retried when they fail. This message indicates that a transaction failed; it's only a problem when there are many of them. The APIC checksum is quite weak, so even a few failures is a cause for concern, since it implies that some corruption has likely gone undetected. Assuming you're not forcing your motherboard to use an invalid system clock (i.e. AGP other than 66 MHz), this is strictly a physical design flaw in your motherboard. The Abit BP6 is notorious for this flaw, but it's not unheard of on other boards (such as the Gigabyte BXD), and it's possible on any board that uses APICs.
You can force the kernel not to use the APIC with the "noapic" kernel option. This also forces CPU0 to handle all interrupts.
Section 14 - Odd kernel behaviour
- Why is kapmd using so much CPU time?
- (REG) Don't worry, it's not stealing valuable CPU time from other processes. It's just consuming idle cycles (normally charged to the idle task, which is displayed differently in top).
Normally, when your system is idle, the system idle task is run, and this is shown as idle time (i.e. the "unused" CPU time is not charged to a specific process). With APM (Advanced Power Management), a special idle task (kapmd) is required so that greater power saving techniques can be enabled. So now, the "unused" CPU time is charged to the kapmd task instead.
- Why does the 2.4 kernel report Connection refused when connecting to sites which work fine with earlier kernels?
- Why does the kernel now report zero shared memory?
- (REG, contributed by Erik Mouw) Yes, the processes still share memory, but due to changes to the VM in 2.4 it became too CPU intensive to calculate the total amount of shared memory. In order not to break the userland tools, the "MemShared" field in /proc/meminfo was set to 0.
- Why does lsmod report a use count of -1 for some modules? Is this a bug?
- (REW) There are several possibilities. First:
- (DW) No, this is not necessarily a bug. A module may report a use count of -1 if it has a can_unload function, which is called when necessary by the system to determine if it is safe to unload the module.
- (REW) But then again, it could be a bug anyway. In that case, you'd normally see the usage count at 0 (or more when it's actually used), and when "something" happens, the usage may drop below zero. If you can repeat this, please drop the driver maintainer an Email. Some modules lack the code to unload. They will deliberately set their usage count to -1 to prevent unloading.
- Why doesn't the kernel see all of my RAM?
- (REG, based on contribution from Mark Hahn) Some distributions (like RedHat 6.1) are quite old, and use a 2.2 kernel which has not fundamentally changed since mid-to-late 1998. Way back then, the safe thing for the kernel to do was trust the standard BIOS memory detection mechanism. That BIOS call returns memory size as a 16 bit count of 1 KiB chunks, leading to a 64 MiB limit. Modern kernels (2.4 is the current stable kernel) use more modern BIOS calls that can detect all your memory, and even keep track of which memory is used by the BIOS itself. So your best option is to install a modern kernel. You can work around the 64 MiB limit with obsolete kernels by telling the kernel how much memory you have, using the mem= boot argument. For example, if you have 128 MiB of RAM, you would type mem=128M at the lilo prompt, or you can have lilo use the argument automatically (add append="mem=128M" to your /etc/lilo.conf file).
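The 64 MiB figure follows directly from the width of the counter. Here is the arithmetic as a small sketch; the function names are invented for the example, and the check is only a rough rule of thumb, not a description of what any particular BIOS does at the boundary.

```c
/* Number of distinct values a 16 bit field can hold. */
#define DEMO_16BIT_VALUES (1UL << 16)

/* Ceiling, in MiB, on what a 16 bit count of 1 KiB chunks
 * can describe. */
unsigned long demo_limit_mib(void)
{
    unsigned long limit_kib = DEMO_16BIT_VALUES * 1; /* 1 KiB per count */

    return limit_kib / 1024;                         /* KiB -> MiB */
}

/* Would a machine with this much RAM need the mem= argument on a
 * kernel that relies on the 16 bit count? */
int demo_needs_mem_option(unsigned long ram_mib)
{
    return ram_mib > demo_limit_mib();
}
```

So a 128 MiB machine is well past what the old call can express, which is why mem=128M is needed there.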
- I've mounted a filesystem in two different places and it worked. Why?
- (AV, paraphrased by William Stearns) Because you've asked the kernel to do that. Yes, it works. No, it's not a bug. To unmount it from either mountpoint, simply run umount on that mountpoint. Repeat for each mountpoint on which you do not wish the filesystem mounted.
Section 15 - Programming Religion
- Why is the Linux kernel written in C/assembly?
- (ADB) For many reasons, some practical, others theoretical. The practical reasons first: when Linus began writing Linux, what he had available was a 386, Minix (a minimal OS designed by Andrew Tanenbaum for OS design teaching purposes) and gcc. The theoretical reasons: some small parts of any OS kernel will always be written in assembly language, because they are too dependent on the hardware to be coded in C; for example, CPU and virtual memory setup. Or because we are dealing with very short routines that must be implemented in the fastest possible code e.g. the stubs for the "top half" interrupt handlers. WRT C, OS designers (since Thompson and Ritchie first wrote UNIX) have traditionally used C to implement as many OS kernel routines as possible. In this sense C can be considered the "canonical" language for OS kernel implementation, and particularly for UNIX variants.
- Why don't we rewrite it all in assembly language for processor Mega666?
- (ADB) Basically because we wouldn't gain much in terms of efficiency, but would lose a lot in terms of ease of maintenance and readability of the source code. Gcc is actually quite efficient, when we look at the assembler code generated. You are referred to Andrew Tanenbaum's book "Structured Computer Organization", 3rd ed., pages 401-404, for a more detailed comparison of the use of high level languages vs. assembly language in the implementation of OS's. There are a number of references on the subject at the end of the book, too.
- Why don't we rewrite the Linux kernel in C++?
- (ADB) Again, this has to do with practical and theoretical reasons. On the practical side, when Linux got started gcc didn't have an efficient C++ implementation, and some people would argue that even today it doesn't. Also there are many more C programmers than C++ programmers around. On theoretical grounds, examples of OS's implemented in Object Oriented languages are rare (Java-OS and Oberon System 3 come to mind), and the advantages of this approach are not quite clear cut (for OS design, that is; for GUI implementation KDE is a good example that C++ beats plain C any day).
- (REW) In the dark old days, in the time that most of you hadn't even heard of the word "Linux", the kernel was once modified to be compiled under g++. That lasted for a few revisions. People complained about the performance drop. It turned out that compiling a piece of C code with g++ would give you worse code. It shouldn't have made a difference, but it did. Been there, done that.
- (REG) Today (Nov-2000), people claim that compiler technology has improved so that g++ is no longer a worse compiler than gcc, and so feel this issue should be revisited. In fact, there are five issues. These are:
- Should the kernel use object-oriented programming techniques? Actually, it already does. The VFS (Virtual Filesystem Switch) is a prime example of object-oriented programming techniques. There are objects with public and private data, methods and inheritance. This just happens to be written in C. Another example of object-oriented programming is Xt (the X Intrinsics Toolkit), also written in C. What's important about object-oriented programming is the techniques, not the languages used.
- Should the kernel be rewritten in C++? This is likely to be a very bad idea. It would require a very large amount of work to rewrite the kernel (it's a large piece of code). There is no point in just compiling the kernel with g++ and writing the odd function in C++; this would just result in a confusing mix of C and C++ code. Either the kernel is left in C, or it's all moved to C++.
To justify the enormous effort in rewriting the kernel in C++, significant gains would need to be demonstrated. The onus is clearly on whoever wants to push the rewrite to C++ to show such gains.
- Is it a good idea to write a new driver in C++? The short answer is no, because there isn't any support for C++ drivers in the kernel.
- Why not add a C++ interface layer to the kernel to support C++ drivers? The short answer is why bother, since there aren't any C++ drivers for Linux. However, if you are bold enough to consider writing a driver in C++ and a support layer, be aware that this is unlikely to be well received in the community. Most of the kernel developers are unconvinced of the merits of C++ in general, and consider C++ to generate bloated code. Also, it would result in a confusing mix of C and C++ code in the kernel. Any C++ code in the kernel would be a second-class citizen, as it would be ignored by most kernel developers when changes to internal interfaces are made. A C++ support layer would frequently be broken by such changes (as whoever is making the changes would probably not bother fixing the C++ code to match), and thus would require a strong commitment from someone to regularly maintain it.
- Can we make the kernel headers C++-friendly? This is the first step required for supporting C++ drivers, and on the face seems quite reasonable (it is not a C++ support layer). This has the problem that C++ reserves keywords which are valid variable or field names in C (such as private and new). Thus, C++ is not 100% backwards compatible with C. In effect, the C++ standards bodies would be dictating what variable names we're allowed to have. From past behaviour, the C++ standards people have not shown a commitment to 100% backwards compatibility. The fear is that C++ will continue to expand its claim on the namespace. This would generate an ongoing maintenance burden on the kernel developers.
Note that someone once submitted a patch which performed this "cleaning up". It was ~250 kB in size, and was quite invasive. The patch did not generate much enthusiasm.
Apparently, someone has had the temerity to label the above paragraph as "a bit fuddy". So Erik Mouw did a short back-of-the-envelope calculation to show that searching the kernel sources for possible C++ keywords is a nightmare. Here is his calculation and comments (dated April 2002):
% find /usr/src/linux-2.4.19-pre3-rmap12h -name "*.[chS]" | \
xargs cat | wc -l
4078662
So there's over 4 million lines of kernel source. Let's assume 10% is
comments, so there's about 3.6 million lines left. Each of those lines
has to be checked for C++ keywords. Assume that you can do about 5
seconds per line (very optimistic), work 24 hours per day, and 7 days
a week:
3600000 lines * (5 s / line) * (1 hour / 3600 s) * (1 day / 24 hours) * (1 week / 7 days) = 29.8 weeks
Sounds like a nightmare to me. You can automate large parts of this,
but you'll need to write a *very* intelligent search-and-replace tool
for that. Better use that time in a more efficient way by learning C.
Note that this is the time required to do a proper manual audit of the code. You could cheat and forgo the auditing process, and instead just compile with C++ and fix all compiler errors, figuring that the compiler can do most of the work. This would still be a major effort, and has the problem that there may be uses of some C++ keywords which don't generate a compiler error, but do generate unintended code. In other words, introduced bugs. That is not a risk the kernel development community is prepared to take.
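For those who want to check the back-of-the-envelope figure quoted above, or plug in their own line count and reading speed, the unit conversion is trivial to code up. The function name demo_audit_weeks() is made up for this sketch.

```c
/* Weeks of round-the-clock work needed to read `lines` lines of
 * source at `seconds_per_line` seconds each. */
double demo_audit_weeks(double lines, double seconds_per_line)
{
    double seconds = lines * seconds_per_line;
    double hours   = seconds / 3600.0;  /* 3600 s per hour */
    double days    = hours / 24.0;      /* 24 hours per day */

    return days / 7.0;                  /* 7 days per week */
}
```

demo_audit_weeks(3600000, 5) comes out at a little under 29.8 weeks, in line with the estimate; anyone working an 8 hour day rather than 24 should multiply by three.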
My personal view is that C++ has its merits, and makes object-oriented programming easier. However, it is a more complex language and is less mature than C. The greatest danger with C++ is in fact its power. It seduces the programmer, making it much easier to write bloatware. The kernel is a critical piece of code, and must be lean and fast. We cannot afford bloat. I think it is fair to say that it takes more skill to write efficient C++ code than C code. Not every contributor to the linux kernel is an uber-guru, and thus will not know the various tricks and traps for producing efficient C++ code.
- (REG) Finally, while Linus maintains the development kernel, he is the one who makes the final call. In case there are any doubts on what his opinion is, here is what he said in 2004:
In fact, in Linux we did try C++ once already, back in 1992.
It sucks. Trust me - writing kernel code in C++ is a BLOODY STUPID IDEA.
The fact is, C++ compilers are not trustworthy. They were even worse in 1992, but some fundamental facts haven't changed:
- the whole C++ exception handling thing is fundamentally broken. It's _especially_ broken for kernels.
- any compiler or language that likes to hide things like memory allocations behind your back just isn't a good choice for a kernel.
- you can write object-oriented code (useful for filesystems etc) in C, _without_ the crap that is C++.
In general, I'd say that anybody who designs his kernel modules for C++ is either
- (a) looking for problems
- (b) a C++ bigot that can't see what he is writing is really just C anyway
- (c) was given an assignment in CS class to do so.
Feel free to make up (d).
- Why is the Linux kernel monolithic? Why don't we rewrite it as a microkernel?
- (REG) The short answer is why should we? The longer answer is that experience has shown that microkernels have poor performance compared to monolithic kernels. Microkernels have a fundamental design problem, where different components of the kernel cannot interact without passing a privilege barrier (which is expensive). Microkernel advocates claim this is a feature, as it increases modularity and protects one part of the kernel from another. Whether this is a feature or a mis-feature is in the eye of the beholder, but it is clear that there is a performance cost inherent in the microkernel design. This is a cost the Linux kernel developers (and apparently, the users) are unwilling to bear.
There are projects which have ported the Linux kernel to generic microkernels (such as Mach3), usually making Linux a "personality". There are also other projects to create microkernel-based Unix-like implementations. Here is a short list:
- MkLinux was funded by Apple, and runs Linux on PowerPC Macs. It is available at: http://www.mklinux.org/. An x86 version is also available. Note that there is now a native Linux kernel for the PowerPC which is much faster, and is actively maintained. MkLinux has become a historical footnote.
- The Hurd is a microkernel-based Unix, and is supposed to be the promised GNU kernel. It sits on top of Mach3. The Debian Project provides a full distribution for the Hurd.
- FIASCO is another project for creating MicroKernel LINUX. See http://os.inf.tu-dresden.de/fiasco/ for details.
There is a historical Usenet thread related to this subject, dating back to 1992, with posts from Linus, Andrew Tanenbaum, Roger Wolff, Theodore Y. Ts'o, David Miller and others. Nice reading on a rainy afternoon. It's fascinating to see how some predictions (which seemed rather reasonable at the time) have proved wrong over the years (for example, that we would all be using RISC chips by 1998).
- Why don't we replace all the goto's with C exceptions?
- (REG) Admittedly, all those goto's do look a bit ugly. However, they are usually limited to error paths, and are used to reduce the amount of code required to perform cleanup operations. Replacing these with Politically Correct if-then-else blocks would require duplication of cleanup code. So switching to if-then-else blocks might be good Computer Science theory, but using goto's is good Engineering. Since the Linux kernel is one designed to be used, rather than to demonstrate theory, sound engineering principles take priority.
So now we come to the suggestion for replacing the goto's with C exception handlers. There are two main problems with this. The first is that C exceptions, like any other powerful abstraction, hide the costs of what is being done. They may save lines of source code, but can easily generate much more object code. Object code size is the true measure of bloat. A second problem is the difficulty in implementing C exceptions in kernel-space. This is covered in more detail below.
- (REG, quoting Keith Owens) The exceptions patch has to use assembler to walk the stack frames. Exceptions are being touted as a replacement for goto in new driver code but the sample patch only works for i386. No arch independent code can use exceptions until you have arch specific code that does the equivalent of longjmp for _all_ architectures.
Doing longjmp in the kernel is _hard_, I know because I had to do it for kdb on i386 and ia64. The kernel does things differently from user space and sometimes the arch maintainers decide to change the internal register usage. They are allowed to do this because it only affects the kernel, but any change to kernel register usage will probably require a corresponding change to setjmp/longjmp.
So you have arch dependent code which has to be done for all architectures before any driver can use it and the code has to be kept up to date by each arch maintainer. Tell me again why the existing mechanisms are not working and why we need exceptions? IOW, what existing problem justifies all the extra arch work and maintenance?
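For readers who have not met the goto-based cleanup idiom defended in the first answer above, here is a minimal user-space sketch. The function demo_two_step_setup() and its parameters are invented for the example; malloc() and free() stand in for whatever resources a real driver would acquire.

```c
#include <stdlib.h>

/* Hypothetical setup routine that acquires two buffers and then
 * performs a final step.  On any failure it jumps to the label
 * that releases what was already acquired, so each cleanup action
 * is written exactly once. */
int demo_two_step_setup(size_t a_size, size_t b_size, int fail_last_step)
{
    char *a, *b;
    int err = -1;

    a = malloc(a_size);
    if (a == NULL)
        goto out;

    b = malloc(b_size);
    if (b == NULL)
        goto out_free_a;

    if (fail_last_step)         /* stand-in for a final step failing */
        goto out_free_b;

    err = 0;                    /* success; this demo keeps nothing,
                                 * so it falls through the cleanup */
out_free_b:
    free(b);
out_free_a:
    free(a);
out:
    return err;
}
```

Written with nested if-then-else blocks, the free() calls would have to be repeated in each failure branch, and every new resource would add another copy.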
- Why are the kernel developers so dismissive of new techniques?
- (REG) This is a complaint that is raised periodically, usually shortly after some debate or flamewar following on from a suggestion to use a "new" technique. Often one or more noted kernel developers will shoot down the idea with a dismissive "that's a dumb idea" or "all pain, no gain", without a detailed explanation of why it's a bad idea. This does indeed look arrogant and dismissive, and gives the impression that the kernel developers are a pack of old dogs unwilling to learn new tricks. This perception is compounded by proclamations made by various computer science teachers about the positive value of the proposed new technique.
It should be noted, however, that kernel developers are exceptionally busy people, and generally prefer writing code to engaging in lengthy discussions about why some idea is not good (at least for the kernel). Further, it's fairly likely that the "new" technique that is being proposed has already been evaluated, and found to be inadequate/inappropriate for the kernel. Or perhaps the developer has had prior experience with this technique and found it lacking.
If you are convinced that your favourite technique has value, you have to prove it. You can't demand that other people spend the time explaining to you why they think it's a bad idea. You have to do the hard work yourself to show you're right. Code up a patch and benchmark it compared to the standard kernel. Be prepared to defend your patch in a broader context, and demonstrate that it doesn't have costly side-effects. Remember that many micro-optimisations result in macro slowdowns.
Finally, some personal advice. Coding up a controversial patch and proving you're right is a time-consuming task. Because of this, avoid pushing ideas which you read in a book or heard from some CS notable. Stick to pushing ideas with which you have either had prior experience, or which you have spent a lot of time thinking about. This will increase your chances of picking a winner, and decrease your frustration levels.
Section 16 - User-space programming questions
- Why does setsockopt() double SO_RCVBUF?
Contributing
Contributions are welcome on this FAQ. These can be submitted, preferably in diff -u format (against this HTML document source), by email to Richard (see the Contributors section above).
Sometimes, we may feel your contribution is controversial and/or incomplete and/or could be improved somehow. Also, the turnaround time has a wide range, from hours to months, depending on how busy Richard is. Please do not email him to chase changes as it slows him down. Suggestions and patches are queued, and will be processed eventually. Acknowledgements are usually sent when the change is made. Please be patient: FAQ updates are rarely urgent. Note that small, "obviously correct" patches are likely to be processed faster, and often jump the queue ahead of larger patches.
Last updated on 17 Oct 2009 by Richard Gooch. This document is GPL'ed by its various contributors.