Arrays
Arrays are subscripted with an expression between square brackets ([ and ]). If the expression is an expression list (expr, expr ...) then the array subscript is a string consisting of the concatenation of the (string) value of each expression, separated by the value of the SUBSEP variable. This facility is used to simulate multiply dimensioned arrays. For example:
i = "A"; j = "B"; k = "C"
x[i, j, k] = "hello, world\n"
assigns the string "hello, world\n" to the element of the array x which is indexed by the string "A\034B\034C". All arrays in AWK are associative, i.e. indexed by string values. The special operator in may be used in an if or while statement to see if an array has an index consisting of a particular value.
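Note: array subscripts go inside square brackets, and a subscript may be an expression list. The values in the list are joined with the value of SUBSEP, which defaults to "\034" (the comma only separates the expressions in the source; it is not part of the key). All arrays in AWK are associative arrays, i.e. hash tables, used much like a map in C++, with strings as keys. The in operator is typically used in if and while conditions: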
if (val in array)
    print array[val]
If the array has multiple subscripts, use (i, j) in array. The in construct may also be used in a for loop to iterate over all the elements of an array. An element may be deleted from an array using the delete statement. The delete statement may also be used to delete the entire contents of an array, just by specifying the array name without a subscript.
Note: the delete statement can remove a single array element or clear the whole array.
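The array behaviour described above can be tried out with a small script. This is only a sketch: it assumes an awk is available on PATH, and the array names x and y and the other variable names are made up for illustration.

```sh
out=$(awk 'BEGIN {
    i = "A"; j = "B"; k = "C"
    x[i, j, k] = "hello"

    # The stored key is the three values joined by SUBSEP.
    key = i SUBSEP j SUBSEP k
    if (key in x)
        print "joined key found"

    # Membership test for an element with multiple subscripts.
    if ((i, j, k) in x)
        print "tuple found"

    # Iterate over the array and split each key back into its parts.
    for (idx in x) {
        n = split(idx, part, SUBSEP)
        print n, part[1], part[2], part[3]
    }

    # Delete a single element, then check that it is gone.
    delete x[i, j, k]
    state = ((i, j, k) in x) ? "still there" : "deleted"
    print state

    # Delete an entire array by naming it without a subscript.
    y["a"] = 1; y["b"] = 2
    delete y
    count = 0
    for (idx in y)
        count++
    print "y has", count, "elements"
}')
printf '%s\n' "$out"
```

Splitting a key on SUBSEP only works when the subscript values themselves do not contain the SUBSEP character.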
Variable Typing And Conversion
Variables and fields may be (floating point) numbers, or strings, or both. How the value of a variable is interpreted depends upon its context. If used in a numeric expression, it will be treated as a number, if used as a string it will be treated as a string. To force a variable to be treated as a number, add 0 to it; to force it to be treated as a string, concatenate it with the null string. When a string must be converted to a number, the conversion is accomplished using strtod(3). A number is converted to a string by using the value of CONVFMT as a format string for sprintf(3), with the numeric value of the variable as the argument. However, even though all numbers in AWK are floating-point, integral values are always converted as integers. Thus, given:
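Note: a variable can hold a number (floating point), a string, or both at once. As in most interpreted languages, how it is treated depends on context. To force a conversion, add 0 to get a number or concatenate the empty string to get a string. Pay attention to how the conversion works: when a number is converted to a string, the result is formatted according to CONVFMT, except that integral values are always converted as integers, floating point or not.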
CONVFMT = "%2.2f"
a = 12
b = a ""
the variable b has a string value of "12" and not "12.00". Gawk performs comparisons as follows: If two variables are numeric, they are compared numerically. If one value is numeric and the other has a string value that is a numeric string, then comparisons are also done numerically. Otherwise, the numeric value is converted to a string and a string comparison is performed. Two strings are compared, of course, as strings.
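In short, awk compares two values as follows:
1. If both are numbers, they are compared numerically.
2. If one is a number and the other is a numeric string (a string such as "1234" that looks like a number), they are compared numerically.
3. In all other cases, they are compared as strings.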
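The conversion and comparison rules above can be checked with a few lines. A minimal sketch, assuming an awk on PATH; the variable names are illustrative, and each comparison prints 1 for true and 0 for false.

```sh
out=$(awk 'BEGIN {
    CONVFMT = "%2.2f"
    a = 12;      b = a ""   # integral value: converted as an integer
    c = 3.14159; d = c ""   # non-integral value: formatted with CONVFMT
    print b, d

    x = 10; y = 9
    # Two numbers are compared numerically: 10 < 9 is false.
    print "numeric:", (x < y)

    # A string constant forces a string comparison: "10" sorts before "9".
    print "string:", ("10" < y)

    # Adding 0 forces the string back to a number before comparing.
    s = "10"
    print "forced:", (s + 0 < y)
}')
printf '%s\n' "$out"
```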
The idea of numeric string only applies to fields, getline input, FILENAME, ARGV elements, ENVIRON elements and the elements of an array created by split() that are numeric strings. The basic idea is that user input, and only user input, that looks numeric, should be treated that way. Uninitialized variables have the numeric value 0 and the string value "" (the null, or empty, string).
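Note: numeric-string status has a narrow scope. Per the text above, only user input (fields, getline input, FILENAME, ARGV and ENVIRON elements) and array elements created by split() can be numeric strings.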
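To see the difference between input that looks numeric and an ordinary string, one can feed two fields through awk. A rough sketch, assuming an awk on PATH; s1, s2 and u are illustrative names, and u is deliberately never assigned.

```sh
out=$(printf '10 9\n' | awk '{
    # $1 and $2 come from input and look numeric, so they compare as numbers.
    print "fields:", ($1 < $2)

    # Concatenating "" yields plain strings, so the comparison is textual.
    s1 = $1 ""; s2 = $2 ""
    print "strings:", (s1 < s2)

    # An uninitialized variable equals both 0 and the empty string.
    print "unset:", (u == 0), (u == "")
}')
printf '%s\n' "$out"
```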
Octal and Hexadecimal Constants
Starting with version 3.1 of gawk, you may use C-style octal and hexadecimal constants in your AWK program source code. For example, the octal value 011 is equal to decimal 9, and the hexadecimal value 0x11 is equal to decimal 17.
Note: same as in C; brevity is a virtue.
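A quick check of the two constants mentioned above. This sketch assumes gawk itself is installed and running in its default mode, since other awk implementations may read 011 as decimal 11.

```sh
out=$(gawk 'BEGIN {
    print 011, 0x11     # octal and hexadecimal constants in program source
    print 011 + 0x11    # once parsed they are ordinary numbers
}')
printf '%s\n' "$out"
```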
String Constants
String constants in AWK are sequences of characters enclosed between double quotes ("). Within strings, certain escape sequences are recognized, as in C. These are:
\\ A literal backslash.
\a The alert character; usually the ASCII BEL character.
\b backspace.
\f form-feed.
\n newline.
\r carriage return.
\t horizontal tab.
\v vertical tab.
\xhex digits
The character represented by the string of hexadecimal digits following the \x. As in ANSI C, all following
hexadecimal digits are considered part of the escape sequence. (This feature should tell us something about
language design by committee.) E.g., "\x1B" is the ASCII ESC (escape) character.
\ddd The character represented by the 1-, 2-, or 3-digit sequence of octal digits. E.g., "\033" is the ASCII ESC
(escape) character.
\c The literal character c.
The escape sequences may also be used inside constant regular expressions (e.g., /[ \t\f\n\r\v]/ matches whitespace
characters).
In compatibility mode, the characters represented by octal and hexadecimal escape sequences are treated literally
when used in regular expression constants. Thus, /a\52b/ is equivalent to /a\*b/.
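A few of the escape sequences above can be exercised as follows. This is a sketch that assumes gawk is installed (the \x escape is not guaranteed in every awk); the variable names s and n are illustrative.

```sh
out=$(gawk 'BEGIN {
    # Octal and hexadecimal escapes can name the same character (ESC here).
    print ("\033" == "\x1B")

    # Each escape sequence stands for a single character.
    print length("\t\n\\")

    # Escapes also work inside constant regular expressions.
    s = "a\tb c"
    n = gsub(/[ \t\f\n\r\v]/, "_", s)
    print n, s
}')
printf '%s\n' "$out"
```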