Original article: http://www.linux.org/article/view/text-processing-and-manipulation
(Usage examples for the grep, awk, sed, uniq, sort and cut commands)
Other text manipulation tools at your disposal
In your day-to-day work with a Linux server, you may find yourself needing to view, manipulate and change text data without "editing" files, per se. In this section we will look at various tools that come with standard Linux distributions for dealing with text data.
Reading and changing a data source: An example
A Linux machine generates a lot of data in its day-to-day activity, and most of it matters for your management of the machine. Let's take an example. If you have the Apache web server installed, it constantly pumps data to two important files, access and errors. The access file logs every visit to your web site. Every time somebody (or something) visits your website, Apache creates an entry in this file with the date, the time and information about the file that was requested. The errors file is there to record errors such as server misconfiguration or faulty CGI scripts.
You can find good software out there for processing your access file. It will tell you how many hits you got on your webpage. It will even automatically look for requests for missing or non-existent web pages, known as 404s. Some of this software is very good at generating informative reports of traffic, warnings of missing pages and other website statistics, but it is aimed at website administrators in general and as such presents its data in a general way. At times, though, you may want or need to do a bit of hands-on evaluation of the data in the file, to go beyond what these web stats packages offer.
I can grep that!
Any information that you might want to get from a file is at your fingertips, thanks to grep. Despite rumors that grep is Vulcan for "find this word", the name comes from the ed editor command g/re/p: globally search for a regular expression and print the matching lines. And now that you know that, I'm sure you feel a lot better. Actually, the more proficient you are at dealing with regular expressions, the better you will be at systems administration. Let's look at another example from our Apache access file.
Let's say you're interested in looking at the number of visits to your website for the month of June. You would do a simple grep on the access file, like so:
grep -c 'Jun/2003' access
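As a minimal sketch of the counting idea, the script below builds a small, made-up access log (the entries, addresses and temporary file are hypothetical, written in the common log format) and counts the June entries with grep -c. The second count, for June requests that returned a 404, is an extra illustration and assumes the status code appears right after the quoted request.

```shell
#!/bin/sh
# Build a tiny, made-up access log; these entries are illustrative only.
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - [30/May/2003:23:59:01 +0200] "GET /index.html HTTP/1.0" 200 1043
10.0.0.2 - - [01/Jun/2003:00:03:17 +0200] "GET /index.html HTTP/1.0" 200 1043
10.0.0.3 - - [15/Jun/2003:12:40:55 +0200] "GET /missing.html HTTP/1.0" 404 210
10.0.0.2 - - [02/Jul/2003:08:15:00 +0200] "GET /about.html HTTP/1.0" 200 877
EOF

# -c prints the number of matching lines instead of the lines themselves.
june_hits=$(grep -c 'Jun/2003' "$log")

# Piping into a second grep narrows the count to June requests
# whose status field (right after the quoted request) is 404.
june_404=$(grep 'Jun/2003' "$log" | grep -c '" 404 ')

echo "June hits: $june_hits"
echo "June 404s: $june_404"
rm -f "$log"
```

Note that grep -c counts matching lines, not individual matches, which is what you want for a log with one request per line.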
grep -c `date +%b` access
grep -c `date +%d/%b` access
grep -c `date +%d/` access
zgrep -c `date +%d/` access_062003.gz
grep `date +%b/` access | gzip -c > access_01-20jul.gz
grep -c '^From:' /var/spool/mail/penguin
From: [email protected]
From: [email protected]
From: "F. Scott Free" <[email protected]>
From: "Sarah Doktorindahaus" <[email protected]>
grep '[0-9]\{3\}-[0-9]\{4\}' inbox
grep 'bash$' /etc/passwd
root:x:0:0:root:/root:/bin/bash
mike:x:500:500:mike:/home/mike:/bin/bash
dave:x:501:501:dave:/home/dave:/bin/bash
laura:x:502:502:laura:/home/laura:/bin/bash
jeff:x:503:503:jeff:/home/jeff:/bin/bash
ps uax | grep $USER
ps -l | grep Oct
ps uax | awk '/mike/'
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 3355 0.0 0.1 1344 120 ? S Jul27 0:00 crond
ps uax | awk '/root/ {print $1,$2,$4}'
root 1 0.3
root 2 0.0
root 3 0.0
root 4 0.0
root 9 0.0
root 5 0.0
root 6 0.0
root 7 0.0
root 8 0.0
root 10 0.0
root 11 0.0
root 3466 0.0
root 3467 0.0
root 3468 0.0
root 3469 0.0
root 3512 0.0
root 3513 7.9
root 14066 0.0
ps uax | awk '/3513/ {print $1,$2,$4,$11}'
root 3513 7.6 /usr/X11R6/bin/X
ps uax | awk '/^mike/ { x += $4 } END { print "total memory: " x }'
total memory: 46.8
ls -l | awk '/jpg/ { x += $5 } END { print "total bytes: " x }'
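The pattern-plus-accumulator idiom used in the two awk commands above can be tried on fixed input. The sketch below feeds awk a made-up ls -l style listing (file names, sizes and owner are hypothetical) and sums the fifth field for the JPEG files. It anchors the pattern to the end of the line with \.jpg$, an assumption that is slightly stricter than the plain /jpg/ match, so that "jpg" appearing elsewhere on a line is not counted.

```shell
#!/bin/sh
# A made-up "ls -l" listing; field 5 is the size in bytes.
listing=$(mktemp)
cat > "$listing" <<'EOF'
-rw-r--r-- 1 mike mike  20480 Jul 27 10:01 beach.jpg
-rw-r--r-- 1 mike mike   1024 Jul 27 10:02 notes.txt
-rw-r--r-- 1 mike mike  51200 Jul 27 10:05 sunset.jpg
EOF

# For every line ending in .jpg, add field 5 to sum.
# The END block runs once after the last line; "sum + 0" prints 0
# rather than an empty string when nothing matched.
total=$(awk '/\.jpg$/ { sum += $5 } END { print sum + 0 }' "$listing")

echo "total bytes: $total"
rm -f "$listing"
```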
mail **Never logged in**
news **Never logged in**
uucp **Never logged in**
lastlog | sed '/Never/d' > last_logins
Username  Port   From           Latest
fred      pts/3  s57a.acme.com  Mon Jul 14 08:45:49 +0200 2011
sarah     pts/2  s52b.acme.com  Mon Jul 14 08:01:27 +0200 2011
harry     pts/6  s54d.acme.com  Mon Jul 14 07:56:20 +0200 2011
carol     pts/4  s53e.acme.com  Mon Jul 14 07:57:05 +0200 2011
carlos    pts/5  s54a.acme.com  Mon Jul 14 08:07:41 +0200 2011
cat last_logins | sed 's/08/07/g' > last_logins2
sed '/fred/s/08/07/g'
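The two sed forms used here, deleting lines that match a pattern and restricting a substitution to lines that match an address, can be sketched on a small made-up login list (the three entries and the temporary file are hypothetical). The substitution below targets " 08:" with a leading space, an assumption made so that only the hour field is touched.

```shell
#!/bin/sh
# A made-up login list with one "Never logged in" entry.
logins=$(mktemp)
cat > "$logins" <<'EOF'
fred pts/3 s57a.acme.com Mon Jul 14 08:45:49
mail **Never logged in**
sarah pts/2 s52b.acme.com Mon Jul 14 08:01:27
EOF

# /Never/d deletes every line matching "Never"; count what is left.
# tr strips the padding some wc implementations add.
kept=$(sed '/Never/d' "$logins" | wc -l | tr -d ' ')

# The /fred/ address limits the substitution to fred's line.
# With -n and the p flag, only lines actually changed are printed.
fixed=$(sed -n '/fred/s/ 08:/ 07:/p' "$logins")

echo "lines kept: $kept"
echo "$fixed"
rm -f "$logins"
```

Sarah's line also contains 08, but it is left alone because the address does not match it.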
cat access | sed '/default.ida/!d; /\/Jul/!d' > MS_Exploits_July
cat access | sed '/^.\{200\}/d' > normal_traffic
cat access | sed -n '/^.\{220\}/p' > MS_exploits
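Filtering by line length relies on the interval expression \{N\}, which in sed's basic regular expressions needs the backslashes. The sketch below uses three made-up request lines and a hypothetical threshold of 30 characters, much smaller than the 200 and 220 used above, so the effect is easy to see.

```shell
#!/bin/sh
# Three made-up request lines; only the middle one is 30+ characters long.
sample=$(mktemp)
cat > "$sample" <<'EOF'
GET /index.html
GET /default.ida?XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
GET /about.html
EOF

# ^.\{30\} matches any line with at least 30 characters.
# "d" deletes those lines, leaving the short (normal) requests.
short=$(sed '/^.\{30\}/d' "$sample")

# -n suppresses default output; "p" prints only the long lines.
long=$(sed -n '/^.\{30\}/p' "$sample")

short_count=$(printf '%s\n' "$short" | wc -l | tr -d ' ')
echo "short lines: $short_count"
echo "long line: $long"
rm -f "$sample"
```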
echo `date +%y-%m-%d_AT_%T` No change >> sleep_experiment_43B
echo `date +%y-%m-%d_AT_%T` subject moved right arm >> sleep_experiment_43B
03-08-09_AT_23:10:16 No change
03-08-09_AT_23:20:24 No change
03-08-09_AT_23:30:29 No change
03-08-09_AT_23:40:31 No change
03-08-09_AT_23:50:33 No change
03-08-09_AT_00:00:34 No change
03-08-09_AT_00:10:35 No change
03-08-09_AT_00:20:37 No change
03-08-09_AT_00:30:05 subject rolled over
03-08-09_AT_00:40:12 No change
03-08-09_AT_00:50:13 No change
03-08-09_AT_01:00:50 subject moved left leg
03-08-09_AT_01:10:17 No change
03-08-09_AT_01:20:18 No change
03-08-09_AT_01:30:19 No change
03-08-09_AT_01:40:20 No change
03-08-09_AT_01:50:47 subject moved right arm
03-08-09_AT_02:00:11 No change
03-08-09_AT_02:10:20 subject scratched nose
uniq -f 1 sleep_experiment_43B
03-08-09_AT_23:10:16 No change
03-08-09_AT_00:30:05 subject rolled over
03-08-09_AT_00:40:12 No change
03-08-09_AT_01:00:50 subject moved left leg
03-08-09_AT_01:10:17 No change
03-08-09_AT_01:50:47 subject moved right arm
03-08-09_AT_02:00:11 No change
03-08-09_AT_02:10:20 subject scratched nose
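One detail worth keeping in mind is that uniq only compares adjacent lines, so a repeated entry shows up again whenever something else comes in between. The sketch below illustrates this on a shortened, made-up observation log (times and entries are hypothetical), and shows one possible way, using cut and sort -u, to list each kind of entry exactly once when the timestamps do not matter.

```shell
#!/bin/sh
# A shortened, made-up observation log: timestamp, then the note.
log=$(mktemp)
cat > "$log" <<'EOF'
23:10 No change
23:20 No change
23:30 subject rolled over
23:40 No change
23:50 No change
EOF

# -f 1 skips the first field (the timestamp) when comparing lines.
# Each run of adjacent duplicates collapses to its first line, so
# "No change" appears once before and once after the event.
collapsed=$(uniq -f 1 "$log")

# Dropping the timestamp and sorting first makes duplicates adjacent,
# so every distinct note is listed exactly once.
distinct=$(cut -d' ' -f2- "$log" | sort -u)

collapsed_count=$(printf '%s\n' "$collapsed" | wc -l | tr -d ' ')
distinct_count=$(printf '%s\n' "$distinct" | wc -l | tr -d ' ')
echo "$collapsed"
echo "runs: $collapsed_count, distinct notes: $distinct_count"
rm -f "$log"
```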
lastlog | uniq -f 4
chocolate
ketchup
detergent
cola
chicken
mustard
bleach
ham
rice
bread
croissants
ice-cream
hamburgers
cookies
spaghetti
sort grocery_list
chocolate aisle 3
ketchup aisle 9
detergent aisle 6
cola aisle 5
chicken meat dept
mustard aisle 9
bleach aisle 6
ham deli counter
rice aisle 4
bread aisle 1
croissants aisle 1
ice-cream aisle 2
hamburgers meat dept
cookies aisle 3
spaghetti aisle 4
sort -k 3 grocery_list
bread aisle 1
croissants aisle 1
ice-cream aisle 2
chocolate aisle 3
cookies aisle 3
rice aisle 4
spaghetti aisle 4
cola aisle 5
bleach aisle 6
detergent aisle 6
ketchup aisle 9
mustard aisle 9
ham deli counter
chicken meat dept
hamburgers meat dept
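Sorting on a field other than the first is done with the -k option, which is the current form of the older +N syntax. The sketch below sorts a shortened, made-up version of the list by its third field; the temporary file and the LC_ALL=C setting are assumptions added here to make the ordering reproducible across locales.

```shell
#!/bin/sh
# A shortened, made-up grocery list: item, then where to find it.
list=$(mktemp)
cat > "$list" <<'EOF'
ketchup aisle 9
bread aisle 1
cola aisle 5
ham deli counter
EOF

# -k 3 makes the sort key start at field 3 and run to the end of the line.
# LC_ALL=C forces plain byte ordering, so digits sort before letters.
by_location=$(LC_ALL=C sort -k 3 "$list")

first=$(printf '%s\n' "$by_location" | head -n 1)
last=$(printf '%s\n' "$by_location" | tail -n 1)
echo "$by_location"
rm -f "$list"
```

The comparison is textual, so a two-digit aisle such as 10 would sort before aisle 2; adding the n modifier (sort -k 3n) compares the key numerically instead.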
cat /var/log/mail.log | sort -r |more
cat access | cut -c1-16 > IP_visitors
cat /etc/passwd | grep bob | cut -f1,3 -d":"
bob:1010
cat access | cut -f1-2 -d" " | sort | uniq | wc -l
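The cut, sort, uniq and wc pipeline above can be tried on fixed input. The sketch below uses a small, made-up access log (addresses and requests are hypothetical) and assumes the visitor's address is the first space-separated field. The uniq -c variant at the end is an extra illustration that ranks addresses by number of requests.

```shell
#!/bin/sh
# A tiny, made-up access log: two distinct visitors, three requests.
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - [01/Jun/2003:00:03:17 +0200] "GET / HTTP/1.0" 200 1043
10.0.0.2 - - [01/Jun/2003:00:04:00 +0200] "GET / HTTP/1.0" 200 1043
10.0.0.1 - - [01/Jun/2003:00:05:42 +0200] "GET /about.html HTTP/1.0" 200 877
EOF

# cut keeps field 1 (the address); sort makes duplicates adjacent so
# uniq can collapse them; wc -l counts the remaining lines.
visitors=$(cut -d' ' -f1 "$log" | sort | uniq | wc -l | tr -d ' ')

# uniq -c prefixes each address with its count; sort -rn puts the
# busiest address first.
busiest=$(cut -d' ' -f1 "$log" | sort | uniq -c | sort -rn | head -n 1)

echo "unique visitors: $visitors"
echo "busiest: $busiest"
rm -f "$log"
```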