LinuxCBT Awk & Sed Edition Notes

###FEATURES COMMON TO BOTH AWK & SED###

 1. Both are scripting languages
 2. Both work primarily with text files
 3. Both are programmable editors
 4. Both accept command-line options and can be scripted (-f script_name)
 5. Both GNU versions support POSIX RegExes: sed uses basic (grep-style) syntax by default and extended (egrep-style) syntax with -E; awk uses extended syntax
 6. Lineage = ed (editor) -> sed -> awk

###SED's FEATURES###
 1. Non-interactive editor
 2. Stream Editor
  a. Manipulates input - performing edits as instructed
  b. Sed accepts input on/from: STDIN (Keyboard), File, Pipe (|)
 3. Sed loops through ALL lines of the input stream or file, by default
 4. Does NOT operate on the source file, by default. (Will NOT clobber the original file, unless instructed to do so)
 5. Supports addresses to indicate which lines to operate on: /^$/d - deletes blank lines
 6. Stores the active (current) line in the 'pattern space' and maintains a 'hold space' for temporary storage
 7. Used primarily to perform Search-and-Replaces

###AWK's FEATURES###
 1. Field processor based on whitespace, by default
 2. Used for reporting (extracting specific columns) from a data feed
 3. Supports programming constructs:
  a. loops (for, while, do)
  b. conditions (if, else)
  c. arrays (lists)
  d. functions (string, numeric, user-defined)
 4. Automatically splits each line into fields for later usage - $1, $2, $3, etc. (based on the current delimiter)
 5. Automatically loops through input like Sed, making lines available for processing
 6. Ability to execute shell commands using the 'system()' function


###REGULAR EXPRESSIONS (RegEx) REVIEW###
Regular Expressions (RegExes) are key to mastering Awk & Sed

###METACHARACTERS###
^ - anchors the match to the beginning of a line
 a. sed -ne '/^dog/p' animals.txt

$ - anchors the match to the end of a line
 a. sed -ne '/dog$/p' animals.txt

Task: Match lines that contain only 'dog':
 a. sed -ne '/^dog$/p' animals.txt
 b. sed -ne '/^dog$/p' - reads from STDIN; press Enter after each line, terminate with CTRL-D
 c. cat animals.txt | sed -ne '/^dog$/p'
 d. cat animals.txt | sed -ne '/^dog$/Ip' - Prints matches case-insensitively
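A minimal check of the anchors, assuming GNU sed (for the 'I' flag). The sample lines are hypothetical and are piped in with printf, so no animals.txt is needed:

```shell
# Hypothetical sample lines standing in for animals.txt
lines='dog\nDog\nhotdog\ndog1\n'

printf "$lines" | sed -ne '/^dog/p'    # starts with 'dog': dog, dog1
printf "$lines" | sed -ne '/dog$/p'    # ends with 'dog': dog, hotdog
printf "$lines" | sed -ne '/^dog$/p'   # exactly 'dog': dog
printf "$lines" | sed -ne '/^dog$/Ip'  # exactly 'dog', any case (GNU sed): dog, Dog
```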

. - matches any character (typically except new line)
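 a. sed -ne '/^d...$/Ip' animals.txt - lines of exactly 4 characters beginning with 'd'
 b. sed -ne '/^d.../Ip' animals.txt - lines beginning with 'd' followed by at least 3 more characters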
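The same printf approach shows that each '.' stands for exactly one character (sample lines are hypothetical):

```shell
# Hypothetical sample lines
printf 'deer\ndog\nduck2\n' | sed -ne '/^d...$/p'   # exactly 4 characters: deer
printf 'deer\ndog\nduck2\n' | sed -ne '/^d.../p'    # 4 or more characters: deer, duck2
```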

###REGEX QUANTIFIERS###
* - 0 or more matches of the previous character
+ - 1 or more matches of the previous character
? - 0 or 1 of the previous character

 a. sed -ne '/^d.\+/Ip' animals.txt
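Note: In sed's default (basic) RegEx syntax, escape '+' and '?' with '\' (\+, \? are GNU extensions); '*' needs no escape. With 'sed -E' (extended syntax), none of them need escaping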
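A sketch comparing the quantifiers on hypothetical input; '\+' and '\?' in basic syntax assume GNU sed, while 'sed -E' accepts the unescaped forms:

```shell
# Hypothetical sample lines: d, do, doo
words='d\ndo\ndoo\n'
printf "$words" | sed -ne '/^do*$/p'       # * (0 or more): d, do, doo
printf "$words" | sed -ne '/^do\+$/p'      # \+ (1 or more, GNU BRE): do, doo
printf "$words" | sed -ne '/^do\?$/p'      # \? (0 or 1, GNU BRE): d, do
printf "$words" | sed -E -ne '/^do+$/p'    # -E: + needs no backslash: do, doo
```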

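###CHARACTER CLASSES###
Allow matching any ONE character from a set or range of characters
 a. [0-9]
 b. [a-z], [A-Z], [a-zA-Z]

 a. sed -ne '/^d.\+[0-9]/Ip' animals.txt - lines beginning with 'd' (any case), followed by 1 or more characters and a digit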

Note: Character Classes match 1, and only 1 character
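A short sketch of the character-class example above, plus a check that one class consumes exactly one character (sample lines are hypothetical; 'I' assumes GNU sed):

```shell
# Hypothetical sample lines
printf 'deer12\ndog\nDuck7\ncat9\n' | sed -ne '/^d.\+[0-9]/Ip'   # deer12, Duck7
# '[a-z][0-9]' needs exactly one letter followed by exactly one digit
printf 'a1\nab\na12\n' | sed -ne '/^[a-z][0-9]$/p'                # a1
```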


###INTRO TO SED###
Usage:
 1. sed [options] 'instruction' file | PIPE | STDIN
 2. sed -e 'instruction1' -e 'instruction2' ...
 3. sed -f script_file_name file
Note: Supply Sed instructions in one of the following ways:
 1. Command-line
 2. Script File

Note: Sed accepts instructions in the form '/pattern_to_match/action'
###Print Specific Lines of a file###
Note: '-e' is optional if there is only 1 instruction to execute
sed -ne '1p' animals.txt - prints first line of file
sed -ne '2p' animals.txt - prints second line of file
sed -ne '$p' animals.txt - prints the last line of file
sed -ne '2,4p' animals.txt - prints lines 2-4 from file
sed -ne '1!p' animals.txt - prints ALL EXCEPT line #1
sed -ne '1,4!p' animals.txt - prints ALL EXCEPT line 1 - 4
sed -ne '/dog/p' animals.txt - prints ALL lines containing 'dog' - case-sensitive
sed -ne '/dog/Ip' animals.txt - prints ALL lines containing 'dog' - case-insensitive
sed -ne '/[0-9]/p' animals.txt - prints ALL lines with AT LEAST 1 numeric
sed -ne '/cat/,/deer/p' animals.txt - prints the range of lines from the first line containing 'cat' through the next line containing 'deer'
sed -ne '/deer/,+2p' animals.txt - prints the line with 'deer' plus 2 extra lines
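The addresses above can be tried against a hypothetical six-line input; 'addr,+N' is a GNU sed extension:

```shell
# Hypothetical six-line input: cat, dog, deer, fox, owl, bat
animals='cat\ndog\ndeer\nfox\nowl\nbat\n'
printf "$animals" | sed -ne '2,4p'           # dog, deer, fox
printf "$animals" | sed -ne '1,4!p'          # owl, bat
printf "$animals" | sed -ne '$p'             # bat
printf "$animals" | sed -ne '/dog/,/fox/p'   # dog, deer, fox
printf "$animals" | sed -ne '/deer/,+2p'     # deer, fox, owl
```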

###Delete Lines using Sed Addresses###
sed -e '/^$/d' animals.txt - deletes blank lines from file
Note: Drop '-n' when deleting, so the remaining lines are printed

sed -e '1d' animals.txt - deletes the first line from animals.txt
sed -e '1,4d' animals.txt - deletes lines 1-4 from animals.txt
sed -e '1~2d' animals.txt - deletes every 2nd line, starting with line 1 (lines 1, 3, 5...) - GNU 'first~step' address
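A sketch of the delete examples on hypothetical input; the 'first~step' address assumes GNU sed:

```shell
printf 'cat\n\ndog\n\ndeer\n' | sed -e '/^$/d'   # cat, dog, deer
printf 'l1\nl2\nl3\nl4\nl5\n' | sed -e '1~2d'    # l2, l4 (lines 1, 3, 5 deleted)
printf 'l1\nl2\nl3\nl4\nl5\n' | sed -e '1,4d'    # l5
```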

###Save Sed's Changes using Output Redirection###

sed -e '/^$/d' animals.txt > animals2.txt - deletes blank lines and writes the result to a new file, 'animals2.txt'


###SEARCH & REPLACE USING Sed###
General Usage:
sed -e 's/find/replace/g' animals.txt - replaces 'find' with 'replace'
Note: Left Hand Side (LHS) supports literals and RegExes
Note: Right Hand Side (RHS) supports literals and back references

Examples:
sed -e 's/LinuxCBT/UnixCBT/' - replaces 'LinuxCBT' with 'UnixCBT', reading STDIN and writing to STDOUT
sed -e 's/LinuxCBT/UnixCBT/I' - same as above, case-insensitive

Note: Only the FIRST match on each line is replaced, unless 'g' is appended: s/find/replace/g
sed -e 's/LinuxCBT/UnixCBT/Ig' - replaces EVERY occurrence of 'LinuxCBT' with 'UnixCBT' (case-insensitive)
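One hypothetical line containing both spellings makes the effect of the 'I' and 'g' flags visible (GNU sed assumed for 'I'):

```shell
text='linuxcbt and LinuxCBT\n'
printf "$text" | sed -e 's/LinuxCBT/UnixCBT/'     # linuxcbt and UnixCBT
printf "$text" | sed -e 's/LinuxCBT/UnixCBT/I'    # UnixCBT and LinuxCBT (first match only)
printf "$text" | sed -e 's/LinuxCBT/UnixCBT/Ig'   # UnixCBT and UnixCBT
```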

Task:
 1. Remove ALL blank lines
 2. Substitute 'cat', regardless of case, with 'Tiger'

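sed -e '/^$/d' -e 's/cat/Tiger/Ig' animals.txt - removes blank lines & substitutes 'cat' with 'Tiger'
OR sed -e '/^$/d; s/cat/Tiger/Ig' animals.txt - does the same as above
Note: Simply separate multiple commands with semicolons

Note: Whenever using the '-n' option, you MUST specify the print flag 'p' to see any output (without '-n', 'p' prints changed lines twice)
sed -ne '/^$/d; s/cat/Tiger/Igp' animals.txt - prints ONLY the lines where 'cat' was replaced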
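A sketch on hypothetical input contrasting three forms: no '-n', '-n' with 'p', and 'p' without '-n':

```shell
# Hypothetical input with a blank line
input='cat\n\nCat5\ndog\n'
printf "$input" | sed -e '/^$/d; s/cat/Tiger/Ig'      # Tiger, Tiger5, dog
printf "$input" | sed -ne '/^$/d; s/cat/Tiger/Igp'    # Tiger, Tiger5
printf "$input" | sed -e '/^$/d; s/cat/Tiger/Igp'     # Tiger, Tiger, Tiger5, Tiger5, dog
```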

###Update Source File - Backup Source File###
sed -i.bak -e '/^$/d; s/cat/Tiger/Ig' animals.txt - performs as above, but edits the source file in place and saves the original as animals.txt.bak
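A sketch of the in-place edit on a throwaway file created with mktemp, so the real animals.txt is untouched; '-i' assumes GNU sed:

```shell
f=$(mktemp)                        # temporary stand-in for animals.txt
printf 'cat\n\ndog\n' > "$f"
sed -i.bak -e '/^$/d; s/cat/Tiger/Ig' "$f"
cat "$f"        # Tiger, dog
cat "$f.bak"    # original content: cat, (blank line), dog
rm -f "$f" "$f.bak"
```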


###Search & Replace (Text Substitution) Continued###
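sed -e '/address/s/find/replace/g' file - performs the substitution only on lines matching the address
sed -e '/Tiger/s/dog/mutt/g' animals.txt - substitutes 'dog' with 'mutt' on lines containing 'Tiger'; prints all lines
sed -ne '/Tiger/s/dog/mutt/gp' animals.txt - substitutes 'dog' with 'mutt' where line contains 'Tiger'; prints only changed lines
sed -e '/Tiger/s/dog/mutt/gI' animals.txt - as above, matching 'dog' case-insensitively
sed -e '/^Tiger/s/dog/mutt/gI' animals.txt - updates lines that begin with 'Tiger'
sed -e '/^Tiger/Is/dog/mutt/gI' animals.txt - updates lines that begin with 'Tiger' (address also case-insensitive)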
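Hypothetical sample lines showing how the address limits which lines the substitution touches:

```shell
lines='Tiger dog\ntiger Dog\ncat dog\n'
printf "$lines" | sed -e '/Tiger/s/dog/mutt/g'        # Tiger mutt, tiger Dog, cat dog
printf "$lines" | sed -e '/^Tiger/Is/dog/mutt/gI'    # Tiger mutt, tiger mutt, cat dog
```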

###Focus on the Right Hand Side (RHS) of Search & Replace Function in SED###
Note: Sed reserves a few characters on the RHS to help with substitutions based on the pattern matched on the LHS
& = the entire text matched by the LHS pattern (with '.*', that is the whole line in the pattern space)

Task:
Prefix each line with the word 'Animal '
sed -ne 's/.*/&/p' animals.txt - replaces the matched pattern with itself (output unchanged)
sed -ne 's/.*/Animal &/p' animals.txt - prefixes 'Animal ' to each line
sed -ne 's/.*/Animal: &/p' animals.txt - prefixes 'Animal: ' to each line
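A sketch of '&' on hypothetical input; the second command shows that '&' holds only the matched text, not necessarily the whole line:

```shell
printf 'cat\ndog\n' | sed -ne 's/.*/Animal: &/p'   # Animal: cat, Animal: dog
printf 'dog12\n' | sed -e 's/[0-9]\+/(&)/'         # dog(12)
```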

sed -ne 's/.*[0-9]/&/p' animals.txt - prints lines containing at least 1 digit (the line itself is unchanged)
sed -ne 's/.*[0-9]\{1\}/&/p' animals.txt - same result as above: '\{1\}' means exactly one [0-9], but more digits may still appear; use '/[a-z][0-9]$/Ip' for lines ending in exactly 1 digit
sed -ne 's/[a-z][0-9]\{4\}$/&/pI' animals.txt - prints lines ending in a letter followed by exactly 4 digits
sed -ne 's/[a-z][0-9]\{1,4\}$/&/pI' animals.txt - prints lines ending in a letter followed by at least 1, up to 4 digits
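The interval examples, written as address-only commands (they select the same lines as the s/.../&/p forms above), on hypothetical input:

```shell
lines='dog1\ndeer1234\nbat12345\nowl\n'
printf "$lines" | sed -ne '/[a-z][0-9]\{4\}$/Ip'     # deer1234
printf "$lines" | sed -ne '/[a-z][0-9]\{1,4\}$/Ip'   # dog1, deer1234
```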

###Grouping & Backreferences###
Note: Capture parts of a match into backreferences using escaped parentheses: \(RegEx\)
sed -ne 's/\(.*\)\([0-9]\)/&/p' animals.txt - creates 2 backreferences: \1 & \2 (output unchanged, since only & is used)
sed -ne 's/\(.*\)\([0-9]\)$/\1/p' animals.txt - creates \1 & \2, but outputs only \1 (strips the final digit)
sed -ne 's/\(.*\)\([0-9]\)$/\2/p' animals.txt - creates \1 & \2, but outputs only \2 (the final digit)
sed -ne 's/\(.*\)\([0-9]\)$/\1 \2/p' animals.txt - outputs \1 and \2 separated by a space
Note: '.*' is greedy, so \2 captures only the LAST digit on the line
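A sketch of the greedy-match effect on a hypothetical line; the second command swaps in '[^0-9]*' for '.*' so that \2 receives all trailing digits:

```shell
printf 'deer12\n' | sed -ne 's/\(.*\)\([0-9]\)$/\1 \2/p'         # deer1 2
printf 'deer12\n' | sed -ne 's/\([^0-9]*\)\([0-9]*\)$/\1 \2/p'   # deer 12
```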


###Apply Changes to Multiple Files###
Sed accepts multiple input files; the shell expands globs (*, ?) into that file list
sed -ne 's/\(.*\)\([0-9]\)$/\1 \2/p' animals*.txt - applies the same substitution to every file matching animals*.txt

###Sed Scripts###
Note: Sed supports scripting: 1 or more instructions can be stored in a file (one per line) and run with '-f'

sed -f script_file_name text_file

sed -f animals.sed animals.txt

Task:
Perform multiple transformations on animals.txt
1. /^$/d - removes blank lines
2. s/dog/frog/Ig - substitute globally 'dog' with 'frog' - (case-insensitive)
3. s/tiger/lion/Ig - substitute globally 'tiger' with 'lion' - (case-insensitive)
4. s/.*/Animals: &/ - prefixes 'Animals: ' to each line
5. s/animals/mammals/Ig - replaces 'Animals' with 'mammals' (case-insensitive)
6. s/\([a-z]\)[0-9]\+$/\1/I - strips trailing digits that follow a letter (e.g. 'frog12' -> 'frog')
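A sketch of the six transformations saved to a temporary script file and run with '-f'; the file name comes from mktemp (standing in for animals.sed) and the input lines are hypothetical:

```shell
script=$(mktemp)    # temporary stand-in for animals.sed
cat > "$script" <<'EOF'
/^$/d
s/dog/frog/Ig
s/tiger/lion/Ig
s/.*/Animals: &/
s/animals/mammals/Ig
s/\([a-z]\)[0-9]\+$/\1/I
EOF
printf 'dog12\n\nTiger\n' | sed -f "$script"   # mammals: frog, mammals: lion
rm -f "$script"
```

Each rule runs on the result of the previous one, which is why step 5 can rewrite the 'Animals:' prefix added by step 4.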

Sed Scripting Rules:
 1. Sed applies ALL rules to each line, in the order they appear in the script
 2. Each change is made to the pattern space, so later rules see the results of earlier ones
 3. Sed ALWAYS works with the current line
 

###Awk - Intro###
Features:
 1. Reporter
 2. Field Processor
 3. Supports Scripting
 4. Programming Constructs
 5. Default delimiter is whitespace
 6. Supports: Pipes, Files, and STDIN as sources of input
 7. Automatically tokenizes processed columns/fields into the variables: $1, $2, $3 .. $n
 8. Supports GREP and EGREP RegExes

Usage:
awk '{instructions}' file(s)
awk '/pattern/ { procedure }' file
awk -f script_file file(s)


Tasks:
Note: $0 represents the current record or row
1. Print entire rows, one at a time, from an input file (animals.txt)
 a. awk '{ print $0 }' animals.txt

2. Print specific columns from (animals.txt)
 a. awk '{ print $1 }' animals.txt - prints the 1st column from the file

3. Print multiple columns from (animals.txt)
 a. awk '{ print $1; print $2; }' animals.txt - prints each column on its own line
 b. awk '{ print $1,$2; }' animals.txt - prints both columns on one line, separated by OFS (a space)

4. Print columns from lines containing 'deer' using RegEx Support
 a. awk '/deer/ { print $0 }' animals.txt

5. Print columns from lines containing digits
 a. awk '/[0-9]/ { print $0 }' animals.txt

6. Remove blank lines with Sed and pipe output to awk for processing
 a. sed -e '/^$/d' animals.txt | awk '/[0-9]/ { print $0 }'

7. Print blank lines
 a. awk '/^$/ { print }' animals.txt
 b. awk '/^$/ { print $0 }' animals.txt

8. Print ALL lines beginning with the animal 'dog', case-insensitive (awk has no 'I' flag, so lowercase the line first)
 a. awk 'tolower($0) ~ /^dog/ { print }' animals.txt
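A sketch of the printing tasks on hypothetical whitespace-separated input:

```shell
lines='dog 12\ncat 7\nDog 3\n'
printf "$lines" | awk '{ print $1, $2 }'         # dog 12 / cat 7 / Dog 3
printf "$lines" | awk '{ print $1; print $2 }'   # one field per line
printf "$lines" | awk 'tolower($0) ~ /^dog/'     # dog 12 / Dog 3 (no action means print)
```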

###Delimiters###
Default delimiter: whitespace (space, tabs)
Use '-F' to change the default delimiter

Task:

1. Parse /etc/passwd using awk
 a. awk -F: '{ print }' /etc/passwd
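 b. awk -F: '{ print $1, $5 }' /etc/passwd - prints the username and comment (GECOS) fields

2. Use a character class to set several delimiters at once
 a. awk -F'[:;,\t]' '{ print $1, $2 }' file - splits on colon, semicolon, comma, or tab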
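A sketch using a hypothetical passwd-style line rather than the real /etc/passwd, followed by a character-class delimiter:

```shell
# Hypothetical passwd-style line, so the result does not depend on the local /etc/passwd
printf 'linuxcbt:x:1000:1000:Dean Davis:/home/linuxcbt:/bin/bash\n' | awk -F: '{ print $1, $5 }'   # linuxcbt Dean Davis
# A character class as the delimiter: split on ':', ';' or ','
printf 'cat:mammal;4,legs\n' | awk -F'[:;,]' '{ print $2, $3 }'   # mammal 4
```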


###Awk Scripts###
Features:
 1. Ability to organize patterns and procedures into a script file
 2. The patterns/procedures are much neater and easier to read
 3. Less information is placed on the command-line
 4. By default, loops through lines of input from various sources: STDIN, Pipe, files
 5. '#' is the default comment character
 6. Able to perform matches based on specific fields

 Awk scripts consist of 3 parts:
  1. Before (denoted using: BEGIN) - Executed prior to FIRST line of input being read
  2. During (Main Awk loop) - Focuses on looping through lines of input
  3. After (denoted using: END) - Executed after the LAST line of input has been processed
Note: BEGIN and END components of Awk scripts are OPTIONAL
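A minimal sketch of the three parts of an awk program, run inline on hypothetical input:

```shell
printf 'dog\ncat\ndeer\n' | awk '
  BEGIN { print "Start" }        # runs before the first line is read
        { print NR ": " $0 }     # main loop: once per input line
  END   { print "Lines: " NR }   # runs after the last line
'
# Start / 1: dog / 2: cat / 3: deer / Lines: 3
```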

Tasks:
1. Print to the screen some useful information without reading input (STDIN, Pipe, or File)
 a. awk 'BEGIN { print "Testing Awk without input file" }'

2. Set system variable: FS to colon in BEGIN block
 a. awk 'BEGIN { FS = ":" ; print "Testing Awk without input file" }'
 b. awk 'BEGIN { FS = ":" ; print FS }'

3. Write script to extract rows which contain 'deer' from animals.txt using RegEx
 a. awk -f animals.awk animals.txt

4. Parse /etc/passwd
 a. print entire lines - { print }
 b. print specific columns - { print $1,$5 }
 c. print specific columns for a specific user - /linuxcbt/ { print $1,$5 }
 d. print specific columns for a specific user matching a given column - $1 ~ /linuxcbt/ { print $1, $5 }
 e. test column #7 for the string 'bash' - $7 ~ /bash/
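A sketch of the /etc/passwd tasks as a script file, using hypothetical sample records and a temporary file in place of a named .awk script:

```shell
awkfile=$(mktemp)    # hypothetical stand-in for a script such as passwd.awk
cat > "$awkfile" <<'EOF'
BEGIN { FS = ":" }
# print username and comment fields for bash users named linuxcbt
$1 ~ /linuxcbt/ && $7 ~ /bash/ { print $1, $5 }
EOF
printf 'root:x:0:0:root:/root:/bin/bash\nlinuxcbt:x:1000:1000:Dean Davis:/home/linuxcbt:/bin/bash\n' |
  awk -f "$awkfile"     # linuxcbt Dean Davis
rm -f "$awkfile"
```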


###Awk Variables###
Awk features 3 types of variables:
 1. System - e.g. FILENAME, RS, ORS...
 2. Scalars - e.g. a = 3
 3. Arrays - e.g. variable_name[n]

Note: Variables do not need to be declared. Awk automatically registers them in memory
Note: Variable names ARE case-sensitive

System Variables:
 1. FILENAME - name of current input file
 2. FNR - record number within the current input file (resets for each file, unlike NR)
 3. FS - field separator - defaults to whitespace - can be a single character or a RegEx
 4. OFS - output field separator - defaults to a single space
 5. NF - number of fields in the current record
 6. NR - current record number, counted across all input files (in the END block it holds the total number of records read)
 7. RS - record separator - defaults to a newline
 8. ORS - output record separator - defaults to a newline
 9. ARGV - array of command-line arguments - ARGV[0] is the program name ('awk'), ARGV[1] is the first argument
10. ARGC - number of elements in ARGV (including ARGV[0])
11. ENVIRON - array of environment variables, e.g. ENVIRON["HOME"]


Tasks:
1. print key system variables
 a. print FILENAME (not yet set in the BEGIN block; use it in the main loop or END block)
 b. print NF - number of fields per record
 c. print NR - current record number
 d. print ARGC - prints the total number of command-line arguments
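A sketch comparing NR, FNR and NF across two temporary files (the file contents are hypothetical; FILENAME is left out because mktemp names vary):

```shell
a=$(mktemp); b=$(mktemp)        # two temporary input files
printf 'dog 1\ncat 2\n' > "$a"
printf 'deer 3 x\n' > "$b"
awk '{ print NR, FNR, NF } END { print "Total:", NR, "ARGC:", ARGC }' "$a" "$b"
# 1 1 2 / 2 2 2 / 3 1 3 / Total: 3 ARGC: 3
rm -f "$a" "$b"
```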

Scalar Variables:
variable_name = value

age = 50

Note: Set scalars in the BEGIN section; they can, if required, also be set in the main loop

{ ++age } - increments variable 'age' by 1, for each iteration of the main loop (component 2 of 3)

Set variable to string using double quotes:
fullname = "Dean Davis"

Concatenate values by placing them side by side (awk has no concatenation operator); include " " to add a space
fullname = "Dean" " " "Davis"
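A sketch combining scalar initialisation in BEGIN, a per-line increment in the main loop, and string concatenation (the three input lines are hypothetical):

```shell
printf 'a\nb\nc\n' | awk '
  BEGIN { age = 50; fullname = "Dean" " " "Davis" }   # side-by-side strings concatenate
        { ++age }                                     # +1 for every input line
  END   { print fullname, age }                       # Dean Davis 53
'
```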

Array Variables:
Features:
 1. List of information

Task:
 1. Define an array variable to store various ages
  a. age[0] = 50
 2. Use split function to auto-build an array
  a. n = split(string, array, separator) - fills array[1]..array[n] and returns n, the number of elements
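A sketch of split() on a hypothetical colon-separated string:

```shell
awk 'BEGIN {
  n = split("cat:dog:deer", animals, ":")   # n is the number of elements created
  for (i = 1; i <= n; i++) print i, animals[i]
}'
# 1 cat / 2 dog / 3 deer
```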

###Operators###
Features:
 1. Provides comparison tools for expressions
 2. Generally 2 types:
  a. Relational - ==, !=, <, >, <=, >=, ~ (RegEx match), !~ (RegEx NOT match)
  b. Boolean - ||(OR), &&(AND), !(NOT) - Combines comparisons

 3. Print something if the current record number is > 10
  a. NR > 10 { print "Current Record Number is greater than 10" }
 4. Extract records with ONLY 2 fields
  a. NF == 2 { print }

 5. Find records that have at least 2 fields and are positioned at record 5 or higher
  a. NF >= 2 && NR >= 5 { print }
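A sketch of the operator tasks on hypothetical records (line 5 has one field, line 6 has three):

```shell
# Hypothetical records: 1 "a b", 2 "c", 3 "d e", 4 "f g h", 5 "i", 6 "j k l"
recs='a b\nc\nd e\nf g h\ni\nj k l\n'
printf "$recs" | awk 'NF == 2 { print }'              # a b / d e
printf "$recs" | awk 'NF >= 2 && NR >= 5 { print }'   # j k l
printf "$recs" | awk '$1 ~ /^[a-c]$/ && NF < 2'       # c
```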

###Loop###
Features:
 1. Support for: while, do, and for

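while - repeats the action while the condition is true (the action must eventually make it false):
 { i = 1; while (i <= NF) { print $i; i++ } } - prints each field on its own line

For:
 for (i=1; i <=10; ++i) print i

do - like while, but performs the action at least once before testing the condition:
 do action while (condition)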
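A sketch of the three loop forms; in each one the loop variable changes inside the loop, so every loop terminates:

```shell
printf 'dog cat deer\n' | awk '{ i = 1; while (i <= NF) { print "while:", $i; i++ } }'
awk 'BEGIN { for (i = 1; i <= 3; ++i) print "for:", i }'
awk 'BEGIN { i = 10; do { print "do:", i } while (i < 3) }'   # body runs once even though 10 < 3 is false
```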

###Processing Records with Awk###
Task:
 1. Process multiple delimiters in the same file (across records)
  a. awk -F "[:; ]" '{ print }' animals2.txt
  b. awk 'BEGIN { FS="[ ;:]" }; { print $2 }' animals2.txt
  c. awk -f script_name animals2.txt
 2. Process multiple delimiters on the same line
  a. Note: Script does NOT change, however, input file DOES
 3. Normalize the Output Field Separator (OFS) - applies when fields are printed with commas, e.g. print $1, $2
   BEGIN { OFS=":" }

 4. Build animalclass array from the list of classes in animals2.txt
  a. { animalclass[NR] = $2 } - place in main loop - builds animalclass array

 5. Extract Daemon entries from /var/log/messages
  a. extract kernel messages
   a1. awk -f test.awk /var/log/messages
   a2. awk -f ~linuxcbt/test.awk messages | awk '$8 ~ /error/ { print $5,$6,$7,$8,$9 }'
   a3. awk -f ~linuxcbt/test.awk messages | awk 'BEGIN { print "HERE ARE THE ERROR MESSAGES" }; $8 ~ /error/ { print $5,$6,$7,$8 }; END { print "Process Complete" }'
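A sketch of tasks 1 to 4 together: multiple delimiters, OFS, and building an array in the main loop, using hypothetical animals2.txt-style lines:

```shell
# Hypothetical animals2.txt-style lines mixing ; and : delimiters
printf 'cat;mammal:4\nowl:bird;2\n' | awk '
  BEGIN { FS = "[;:]"; OFS = ":" }              # split on ; or :, join output fields with :
        { animalclass[NR] = $2; print $1, $2 }  # build the array in the main loop
  END   { for (i = 1; i <= NR; i++) print "class", i, animalclass[i] }
'
# cat:mammal / owl:bird / class:1:mammal / class:2:bird
```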

###Printf Formatting###
Features:
 1. Ability to control the width of fields in the output

Usage:
 printf("format", arguments)
Supported Print Formats include:
 1. %c - ASCII Characters
 2. %d - decimal integers - any fractional part is truncated (NOT rounded)
 3. %f - Floating Point
 4. %s - Strings
Note: printf does NOT append a newline automatically
This means you'll need to include the newline escape sequence \n in the "format" section of the printf function

Note: Default output is right-justified; use '-' to indicate left-justification
General format section:
%[-]width.precision[cdfs]
width - sets the minimum width of the column to be output
precision - for %f, sets the number of places to the right of the decimal point
precision - for %s, sets the maximum number of characters printed from the string
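A sketch showing each conversion and the width, precision and '-' modifiers; the brackets only mark the field edges:

```shell
awk 'BEGIN {
  printf("[%c]\n", 65)             # [A]: %c prints the character with that code
  printf("[%d]\n", 3.99)           # [3]: %d truncates the fraction
  printf("[%.2f]\n", 3.14159)      # [3.14]
  printf("[%8s]\n", "dog")         # [     dog]: right-justified, width 8
  printf("[%-8s]\n", "dog")        # [dog     ]: left-justified
  printf("[%.3s]\n", "reindeer")   # [rei]: precision limits characters printed
}'
```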


Examples | Tasks:
 1. print "Testing printf" from the command-line
  a. awk 'BEGIN { printf("Testing printf\n") }'

 2. read 'animals.txt' and format the output with printf
  a. awk 'BEGIN { printf("Here is the output\n") } { printf("%s\t%s\n", $1, $2) }' animals.txt

 3. Apply width and precision to task #2
  a. awk 'BEGIN { printf("Here is the output\n") } { printf("%.3s\t%.4s\n", $1, $2) }' animals.txt
  b. awk 'BEGIN { printf("Here is the output\n") } { printf("%20s\t%20s\n", $1, $2) }' animals.txt

 4. Left-justify task #3
  a. awk 'BEGIN { printf("Here is the output\n") } { printf("%-20s\t%-20s\n", $1, $2) }' animals.txt

 5. Parse animals_with_prices.txt file and properly represent strings, decimals and floating point values
  a. awk 'BEGIN { printf("Here is the output\n\n") } { printf("%-5s\t$%.2f\n", $1, $2) }' animals_with_prices.txt

 6. Format using printf animals2.txt
  a. END { for (i = 1; i <= NR; i++)
     printf("%-10s %1d %-2s %-10s\n", "Animal Class", i, ": ", animalclass[i]) }

 7. Apply upper and lower-case formatting to printf values
  a. printf("%-10s %1d %-2s %-10s\n", "ANIMAL CLASS", i, ": ", toupper(animalclass[i]))
  b. printf("%-10s %1d %-2s %-10s\n", "ANIMAL CLASS", i, ": ", tolower(animalclass[i]))

 8. Format output from /var/log/messages
  a. Extract date, time, server and daemon columns, include a header
   BEGIN { printf("%-15s %-8s %10s\n", "Date", "Server", "Daemon") }
   /kernel/ { printf("%3s %2s %8s %-9s %10s\n", $1,$2,$3,$4,$5) }
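A sketch of tasks 5 to 7 on hypothetical name/price lines; the array name 'animal' and the simpler format strings are illustrative choices, not the course files:

```shell
# Hypothetical name/price lines standing in for animals_with_prices.txt
printf 'cat 12.5\ndog 7\n' | awk '
  { printf("%-5s $%.2f\n", $1, $2); animal[NR] = $1 }   # left-justified name, 2 decimals
  END { for (i = 1; i <= NR; i++) printf("%s %d: %s\n", "ANIMAL", i, toupper(animal[i])) }
'
# cat   $12.50 / dog   $7.00 / ANIMAL 1: CAT / ANIMAL 2: DOG
```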

###Additional Sed & Awk Examples###
Task:
 1. Update PHP web pages to remove 'Shipping: Free' wherever it exists
  a. Code to remove: <b>Shipping</b>:&nbsp;Free<br>
   sed -i.bak -e 's/<b>Shipping<\/b>:&nbsp;Free<br>//' products_linuxcbt_security_edition.php

  b. Apply the change to ALL product files and create .new output files without clobbering the source files
   for i in products_*php; do sed -e 's/<b>Shipping<\/b>:&nbsp;Free<br>//' "$i" > "$i.new"; done

 2. Strip '.new' suffix from newly generated files
  a. echo "products_linuxcbt.php.new" | sed -e 's/\.new//'
  b. for i in `ls -A products_*new | sed -e 's/\.new//'`; do echo "$i"; done
  c. for i in `ls -A products_*new | sed -e 's/\.new//'`; do mv "$i.new" "$i"; done

 3. Remove 'Free Shipping' from faq.php file
  a. Code to remove: <li>Free Shipping
  b. sed -e 's/<li>Free Shipping//' faq.php > faq.php.new
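A sketch of the product-file workflow in a scratch directory with one hypothetical file; it iterates over a glob instead of parsing ls output and swaps in the shell's ${f%.new} expansion for the sed-based suffix stripping:

```shell
dir=$(mktemp -d)    # scratch directory; the file name below is hypothetical
(
  cd "$dir" || exit 1
  printf 'Item<b>Shipping</b>:&nbsp;Free<br>\n' > products_demo.php
  for i in products_*php; do sed -e 's/<b>Shipping<\/b>:&nbsp;Free<br>//' "$i" > "$i.new"; done
  for f in products_*.new; do mv "$f" "${f%.new}"; done   # strip the .new suffix in the shell
  cat products_demo.php    # Item
)
rm -rf "$dir"
```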


Use Awk & Sed Together to update specific rows in /var/log/messages
Task:
 a. Select kernel messages for September 3 ('$2 == 3' avoids also matching 13, 23, 30)
  awk '$1 ~ /Sep/ && $2 == 3 && $5 ~ /kernel/ { print }' /var/log/messages
 b. Count them and expand the month name with Sed (no '-n', so the total line is printed too)
  awk '$1 ~ /Sep/ && $2 == 3 && $5 ~ /kernel/ { ++total; print } END { print "Total Records Updated: " total }' /var/log/messages | sed -e 's/^Sep/September/'
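A sketch of the awk-plus-sed pipeline on hypothetical syslog-style lines; the 13 September line shows why '$2 == 3' is safer than '$2 ~ /3/':

```shell
# Hypothetical syslog-style lines standing in for /var/log/messages
log='Sep  3 10:00:01 srv1 kernel: eth0 up\nSep 13 10:00:02 srv1 kernel: eth0 down\nSep  3 10:00:03 srv1 sshd: login\n'
printf "$log" |
  awk '$1 ~ /Sep/ && $2 == 3 && $5 ~ /kernel/ { ++total; print } END { print "Total Records Updated: " total }' |
  sed -e 's/^Sep/September/'
# September  3 10:00:01 srv1 kernel: eth0 up
# Total Records Updated: 1
```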

###Windows Support for GNU Sed & Awk###
Download GNU Sed & Awk from: http://gnuwin32.sourceforge.net

Windows Stuff:
gawk "BEGIN { max=ARGV[1]; for (i=1;i<=max;++i) print i }" 10 - reads 10 from ARGV[1] and passes it to 'max' var for use in the 'for' loop
