$ cat file
Item1,200
Item2,500
Item3,900
Item2,800
Item1,600
1. To find the total sum of all the values in the second column:
$ awk -F"," '{x+=$2}END{print x}' file
3000
The delimiter (-F) is a comma since this is a comma-separated file. x+=$2 is shorthand for x=x+$2. As each line is parsed, the second column ($2), which is the price, is added to the variable x. At the end, x contains the sum. This is the same as the earlier awk example of finding the sum of all numbers in a file.
2. To find the total sum of one particular group alone, in this case "Item1":
$ awk -F, '$1=="Item1"{x+=$2;}END{print x}' file
800
This gives us the total sum of all the records belonging to "Item1". In the earlier example, no condition was specified since we wanted awk to work on every line or record. In this case, we want awk to work only on the records whose first column ($1) is equal to Item1.
3. If the group name to match is held in a shell variable:
$ VAR="Item1"
$ awk -F, -v inp="$VAR" '$1==inp{x+=$2;}END{print x}' file
800
-v passes the value of the shell variable to awk as the awk variable inp, and the rest is the same as the last example. Quoting "$VAR" keeps a value containing spaces in one piece.
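A minimal sketch of the same command when the group may not exist in the file. The temporary file and the group name "Item9" are assumptions made for illustration. Printing x+0 instead of x forces a numeric 0 when no record matched, where a plain print x would print an empty line.

```shell
# Sample data from this article, written to a temporary file.
data=$(mktemp)
printf '%s\n' Item1,200 Item2,500 Item3,900 Item2,800 Item1,600 > "$data"

VAR="Item1"
hit=$(awk -F, -v inp="$VAR" '$1==inp{x+=$2} END{print x+0}' "$data")

VAR="Item9"   # hypothetical group that is not present in the file
# x is never assigned here, so x+0 yields 0 rather than an empty string.
miss=$(awk -F, -v inp="$VAR" '$1==inp{x+=$2} END{print x+0}' "$data")

echo "$hit $miss"
rm -f "$data"
```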
4. To find the unique values of the first column:
$ awk -F, '{a[$1];}END{for (i in a)print i;}' file
Item1
Item2
Item3
Arrays in awk are associative, which is a very powerful feature. An associative array has indexes and a value corresponding to each index. Example: a["Jan"]=30 means that in the array a, "Jan" is an index with the value 30. In our case, we use only the index, without a value. So the statement a[$1] works like this: when the first record is processed, the index "Item1" is stored in the array named a. The second record adds a new index "Item2", and the third adds "Item3". During the 4th record, since the "Item2" index is already there, no new index is added, and the same holds for the 5th record with "Item1".
To understand the for loop better, look at this:
for (i in a)
{
  print i;
}
Note: The order of the output in the above command may vary from system to system. Associative arrays do not store the indexes in sequence and hence the order of the output need not be the same in which it is entered.
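If a stable order is needed, one option is to pipe the output through sort. This is a sketch using the sample data of this article in a temporary file (an assumption made for illustration), not the only way to order the result.

```shell
# Sample data from this article, written to a temporary file.
data=$(mktemp)
printf '%s\n' Item1,200 Item2,500 Item3,900 Item2,800 Item1,600 > "$data"

# The traversal order of "for (i in a)" is unspecified, so sort the result.
uniq_sorted=$(awk -F, '{a[$1]} END{for (i in a) print i}' "$data" | sort)

echo "$uniq_sorted"
rm -f "$data"
```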
5. To find the sum of each individual group, i.e., to sum all records belonging to Item1 alone, Item2 alone, and so on:
$ awk -F, '{a[$1]+=$2;}END{for(i in a)print i", "a[i];}' file
Item1, 800
Item2, 1300
Item3, 900
a[$1]+=$2 can also be written as a[$1]=a[$1]+$2. It works like this: when the first record is processed, a["Item1"] is assigned 200 (a["Item1"]=200). When the second "Item1" record is processed, a["Item1"] becomes 800 (200+600), and so on. In this way, every index in the array ends up with the sum of its group as the associated value.
6. To print every record, followed by a total record at the end:
$ awk -F"," '{x+=$2;print}END{print "Total,"x}' file
Item1,200
Item2,500
Item3,900
Item2,800
Item1,600
Total,3000
This is the same as the first example except that, along with adding the value every time, every record is also printed, and at the end the "Total" record is printed.
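7. To print the maximum or the biggest value of every group:
$ awk -F, '{if (a[$1] < $2)a[$1]=$2;}END{for(i in a){print i,a[i];}}' OFS=, file
Item1,600
Item2,800
Item3,900
Before storing the value ($2) in the array, the current second column value is compared with the existing value and stored only if the value in the current record is bigger. In the end, the array contains only the maximum value for every group. This relies on an unset a[$1] comparing as 0, so it assumes the values are positive. For the same reason, finding the smallest element takes more than changing the "less than" (<) symbol to "greater than" (>): no positive value is smaller than 0, so nothing would ever be stored. The first value of each group has to be stored unconditionally, for example with the condition !($1 in a) || a[$1] > $2.
The syntax of the if-else statement in awk is the same as in C: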
if (condition)
{
<code for true condition >
}else{
<code for false condition>
}
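A sketch of the smallest-value variant of the maximum example above. The extra test !($1 in a) is a choice made here so that the first value of each group is always stored; the temporary file is likewise an assumption for illustration. The output is sorted only to make the order predictable.

```shell
# Sample data from this article, written to a temporary file.
data=$(mktemp)
printf '%s\n' Item1,200 Item2,500 Item3,900 Item2,800 Item1,600 > "$data"

# Store $2 when the group is new, or when $2 is smaller than the stored value.
mins=$(awk -F, '!($1 in a) || $2 < a[$1] {a[$1]=$2}
                END{for (i in a) print i, a[i]}' OFS=, "$data" | sort)

echo "$mins"
rm -f "$data"
```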
8. To find the count of entries against every group:
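$ awk -F, '{a[$1]++;}END{for (i in a)print i, a[i];}' file
Item1 2
Item2 2
Item3 1
a[$1]++ increments the counter of the current group every time a record of that group is read, so at the end each index holds the number of entries in its group.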
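Combining the per-group sum of example 5 with this count gives a per-group average. This is a sketch: the array names sum and cnt and the temporary file are assumptions chosen for illustration, and the output is sorted only to make the order predictable.

```shell
# Sample data from this article, written to a temporary file.
data=$(mktemp)
printf '%s\n' Item1,200 Item2,500 Item3,900 Item2,800 Item1,600 > "$data"

# Keep a running sum and a running count per group, then divide at the end.
avgs=$(awk -F, '{sum[$1]+=$2; cnt[$1]++}
                END{for (i in sum) print i, sum[i]/cnt[i]}' OFS=, "$data" | sort)

echo "$avgs"
rm -f "$data"
```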
9. To print only the first record of every group:
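$ awk -F, '!a[$1]++' file
Item1,200
Item2,500
Item3,900
This one is a little tricky. In this awk command, there is only a condition and no action statement. As a result, if the condition is true, the current record gets printed by default. The first time a group is seen, a[$1] is unset and evaluates to 0, so !a[$1] is true and the record is printed. The post-increment then makes a[$1] non-zero, so later records of the same group fail the condition.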
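To make the shorthand easier to follow, here is a sketch that runs it next to a spelled-out equivalent with an explicit if and a separate increment. The long form is an illustration written for this comparison, and the temporary file is an assumption.

```shell
# Sample data from this article, written to a temporary file.
data=$(mktemp)
printf '%s\n' Item1,200 Item2,500 Item3,900 Item2,800 Item1,600 > "$data"

# Short form: a condition only, the default action prints the record.
short=$(awk -F, '!a[$1]++' "$data")

# Long form: print when the counter is still 0, then increment it.
long=$(awk -F, '{ if (a[$1] == 0) print; a[$1]++ }' "$data")

echo "$short"
rm -f "$data"
```

Both commands print the records in file order, since no for (i in a) loop is involved.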
10. To join or concatenate the values of all group items. Join the values of the second column with a colon separator:
$ awk -F, '{if(a[$1])a[$1]=a[$1]":"$2; else a[$1]=$2;}END{for (i in a)print i, a[i];}' OFS=, file
Item1,200:600
Item2,500:800
Item3,900
This if condition is pretty simple: if there is already some value in a[$1], append the current value to it using a colon delimiter; otherwise just assign the current value to a[$1], since this is the first value.
if(a[$1]) a[$1]=a[$1]":"$2; else a[$1]=$2
The same can be achieved using the awk ternary operator, which works the same way as in the C language.
$ awk -F, '{a[$1]=a[$1]?a[$1]":"$2:$2;}END{for (i in a)print i, a[i];}' OFS=, file
Item1,200:600
Item2,500:800
Item3,900
The ternary operator is a short form of the if-else condition. An example of the ternary operator: x=x>10?"Yes":"No" means that if x is greater than 10, "Yes" is assigned to x, else "No" is assigned.
Concatenate variables in awk:
One more thing to notice is the way string concatenation is done in awk. There is no concatenation operator: to concatenate 2 variables, write them next to each other with a space in between.
Examples:
z=x y      #to concatenate x and y
z=x":"y    #to concatenate x and y with a colon separator.
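The two forms can be tried out in a BEGIN block, which needs no input file. The values given to x and y here are assumptions chosen for illustration.

```shell
# Concatenate a string and a number, without and with a separator.
out=$(awk 'BEGIN {
    x = "Item"; y = 1
    z = x y        # plain concatenation
    w = x ":" y    # concatenation with a colon separator
    print z
    print w
}')

echo "$out"
```

The first line printed is Item1 and the second is Item:1.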