Fedora provides awk as part of its default installation in all its editions, including immutable ones like Silverblue. But you may be asking, what is awk and why would you need it?
Awk is a data-driven programming language that acts when it matches a pattern. On Fedora, and most other distributions, the implementation used is GNU awk, or gawk. Read on for more about this language and how to use it.
A brief history of awk
Awk began at Bell Labs in 1977. Its name is an acronym from the initials of the designers: Alfred V. Aho, Peter J. Weinberger, and Brian W. Kernighan.
The specification for awk in the POSIX Command Language and Utilities standard further clarified the language. Both the gawk designers and the original awk designers at Bell Laboratories provided feedback for the POSIX specification.
From The GNU Awk User’s Guide
For a more in-depth look at how awk/gawk ended up being as powerful and useful as it is, see The GNU Awk User’s Guide quoted above. Numerous individuals have contributed to the current state of gawk. Among those are:
- Arnold Robbins and David Trueman, the creators of gawk
- Michael Brennan, the creator of mawk, which later was merged with gawk
- Jürgen Kahrs, who added networking capabilities to gawk in 1997
- John Haque, who rewrote the gawk internals and added an awk-level debugger in 2011
Using awk
The following sections show various ways of using awk in Fedora.
At the command line
The simplest way to invoke awk is at the command line. You can search a text file for a particular pattern and print every line of the file where that pattern appears anywhere in the line. As an example, use cat to take a look at the command history file in your home directory:
$ cat ~/.bash_history
There are probably many lines scrolling by right now.
Awk helps with this type of file quite easily. Instead of printing the entire file out to the terminal like cat, you can use awk to find something of specific interest. For this example, type the following at the command line if you’re running a standard Fedora edition:
$ awk '/dnf/' ~/.bash_history
If you’re running Silverblue, try this instead:
$ awk '/rpm-ostree/' ~/.bash_history
In both cases, more data likely appears than you really want. That’s no problem for awk, since the pattern is a regular expression and you can make it more specific. To see only installs, change the search pattern to one of these:
$ awk '/rpm-ostree install/' ~/.bash_history
$ awk '/dnf install/' ~/.bash_history
Every entry in your bash command line history that contains the specified pattern, at any position along the line, appears. Awk works on one line of a data file at a time. It tests the line against the pattern, performs the action if the line matches, then moves to the next line until the end of file (EOF) is reached.
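To make the pattern and action cycle concrete, here is a small sketch that runs on made-up input instead of your real history file. The sample lines and the count_installs helper name are assumptions for illustration only.

```shell
# Hypothetical sample standing in for a history file.
history_sample='sudo dnf install vim
ls -l
sudo dnf update
dnf install htop'

# Pattern only: with no action given, awk prints each matching line.
printf '%s\n' "$history_sample" | awk '/dnf install/'

# Pattern plus action: count the matches, then report once at end of input.
# "n + 0" makes awk print 0 instead of an empty string when nothing matched.
count_installs() {
    awk '/dnf install/ { n++ } END { print n + 0 }'
}
printf '%s\n' "$history_sample" | count_installs
```

The first command prints the two install lines, and the second prints how many there were. The END block runs only after the last line has been read.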
From an awk program
Using awk at the command line as above is not much different than piping output to grep, like this:
$ cat .bash_history | grep 'dnf install'
The end result of printing to standard output (stdout) is the same with both methods.
Awk is a programming language, and the command awk is an interpreter of that language. The real power and flexibility of awk is that you can write programs with it, and combine them with shell scripts to create even more powerful programs. For more feature-rich development with awk, you can also incorporate C or C++ code using gawk’s dynamic extensions.
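As one possible sketch of combining awk with a shell script, the function below passes a shell argument into an awk program with the -v option. The function name count_cmd and the sample input are made up for this example.

```shell
# Hypothetical helper: count the lines whose first word equals the given command.
# -v hands the shell value to awk as a variable, so it is never spliced
# into the awk program text.
count_cmd() {
    awk -v cmd="$1" '$1 == cmd { n++ } END { print n + 0 }'
}

# Made-up input in place of a real history file.
printf 'ls -l\ncd /tmp\nls\n' | count_cmd ls
```

Here $1 inside the single quotes is awk’s first field, while "$1" outside them is the shell function’s first argument. Keeping the awk program in single quotes stops the shell from touching it.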
Next, to show the power of awk, let’s create two awk program files that print the header and draw five numbers for the first row of a bingo card.
The first file prints out the header of the bingo card. For this example it is called bingo-title.awk. Use your favorite editor to save this text as that file name:
BEGIN {
    print "B\tI\tN\tG\tO"
}
Now the title program is ready. You could try it out with this command:
$ awk -f bingo-title.awk
The program prints the letters of the word BINGO with a tab (\t) between them. For the number selection, let’s use one of awk’s built-in numeric functions, rand(), and one of its control statements, for. (An earlier version of this program also used switch, but the edited version below does not need it.)
The title of the second awk program is bingo-num.awk. Enter the following into your favorite editor and save with that file name:
@include "bingo-title.awk"
BEGIN {
    # Pick one number for each of the five columns
    for (i = 1; i <= 5; i++) {
        b = int(rand() * 15) + (15*(i-1))
        printf "%s\t", b
    }
}
The @include statement in the file tells the interpreter to process the included file first. In this case the interpreter processes the bingo-title.awk file, so the title prints out first. Note that @include is a gawk feature.
Running the test program
Now enter the command to pick a row of bingo numbers:
$ awk -f bingo-num.awk
Output appears similar to the following. Note that the rand() function in awk is not ideal for truly random numbers. It’s used here only for example purposes.
$ awk -f bingo-num.awk
B I N G O
13 23 34 53 71
In this example, we created two programs that contain only BEGIN sections, whose actions work on data generated within the awk program itself. More work is needed to satisfy the rules of Bingo. The reader is encouraged to fix the programs so they reliably pick bingo numbers. The awk function srand() is a good place to start.
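As a starting point, here is one possible sketch of a seeded row. It assumes the common 75-number card, where column B holds 1-15, I holds 16-30, N holds 31-45, G holds 46-60 and O holds 61-75. The bingo_row function name is made up for this example, and the sketch puts everything in one BEGIN block rather than using @include so that it runs on any POSIX awk.

```shell
# Sketch: print the header and one row, each number from its column's range.
bingo_row() {
    awk 'BEGIN {
        srand()    # seed the generator; with no argument the time of day is used
        print "B\tI\tN\tG\tO"
        for (i = 1; i <= 5; i++) {
            # int(rand() * 15) is 0..14; adding 1 shifts it to 1..15,
            # and 15 * (i - 1) moves it into the range for column i
            n = int(rand() * 15) + 1 + 15 * (i - 1)
            printf "%d%s", n, (i < 5 ? "\t" : "\n")
        }
    }'
}
bingo_row
```

Because srand() typically seeds from a clock with one-second resolution, two runs within the same second can print the same row. Filling a whole card would also need a check that no number repeats within a column, which this sketch does not attempt.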
Final examples
Awk can be useful even for mundane daily search tasks, like listing all the flatpaks from org.gnome on the Flathub repository (provided you have the Flathub repository set up). The command to do that would be:
$ flatpak remote-ls flathub --system | awk '/org\.gnome/'
A listing appears that shows all output from remote-ls that matches the org.gnome pattern. To see flatpaks already installed from org.gnome, enter this command:
$ flatpak list --system | awk '/org\.gnome/'
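A pattern like this matches anywhere on the line. When the output is arranged in columns, awk can also test a single field and print just the column you care about. The sketch below uses made-up, tab-separated sample data, since the exact column layout of the flatpak output is not shown here. The sample_apps and gnome_names names are hypothetical.

```shell
# Hypothetical two-column sample: a name, a tab, then an application ID.
sample_apps() {
    printf 'Calculator\torg.gnome.Calculator\n'
    printf 'Firefox\torg.mozilla.firefox\n'
    printf 'Maps\torg.gnome.Maps\n'
}

# Print the first column of each line whose second column starts with org.gnome.
# -F sets the field separator; the dots are escaped so they match literally.
gnome_names() {
    awk -F'\t' '$2 ~ /^org\.gnome\./ { print $1 }'
}

sample_apps | gnome_names
```

With this sample, only Calculator and Maps are printed.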
Awk is a powerful and flexible programming language that fills a niche with text file manipulation exceedingly well.
boaboa
Do you mean awk '/dnf/' ~/.bash_history? There is confusion between slash and backslash.
Stephen Snow
Sorry, no I mean what was typed, but the regex will need to be single quoted for bash. I use zshell as my shell of choice and sometimes I forget to adapt what I do to bash since they are generally interchangeable, this would be one of the exceptions. Yes, you are correct, that is exactly what I meant. The command in bash needs the regex single quoted.
Todd Lewis
That particular regex does not need to be quoted in bash.
Stephen Snow
You are correct, it works without the single quotes. I was more referring to the regex with two words in it.
Todd Lewis
Stephen, the command as presented doesn’t work with zsh or bash.
Stephen Snow
Yeah, it would appear I was remiss in checking after edits. You are again correct: the command is not supposed to have \ in the regex. It must be /, since awk will interpret the \ plus the following character as an escape sequence and fault. So the commands should be like this
and
as example.
Tomas
Those should most likely be forward slashes in the quote marks:
awk '\dnf\' ~/.bash_history
Stephen Snow
Again, no they are part of the regular expression used by awk/gawk as the test when searching a file. Bash will need the single quotes, but if you use zshell for instance it doesn’t. My apologies, you are correct, I really need to check edits. Anyway the backslashes in the regex need to be forward slashes. The single quotes are only needed when the regex uses more than one word as in the '/dnf install/' test.
Sebastiaan Franken
Sadly the first few examples don’t work for me. They throw the following errors:
awk: cmd. line:1: ^ backslash not last character on line
Stephen Snow
Hello,
Sorry for the confusion on the first two usage examples. On some systems the shell will give issues, such as bash with backslashes or slashes. In those instances simply single quote your regex, and you should get the correct result.
Stephen Snow
Hello to those who have had the misfortune of using my messed up bits of code in the article. I have fixed the errors and they should be good to try now. Sorry for the confusion, and thank you to all who have posted their comments trying to help me find them.
Marcelo Mafra
The line 5 of the bingo-num.awk file reads:
for (i = 1; i < = 5; i++) {
should be:
for (i = 1; i <= 5; i++) {
Probably a copy & paste problem.
Stephen Snow
Hello,
Yes it must be. Originally, it was a less elegant version that used a switch statement, the editors cleaned it up, but I didn’t check it afterwards. I was hoping readers would pick it apart and offer better solutions. I wrote the article mainly to generate some interest around awk/gawk since it is a pretty cool programming language for specific uses. The Bingo example really needs to be fixed in any event, it will give the same results for each line if you were to try to fill a bingo card with numbers it picks.
Gregory Bartholomew
Here is an improved version of your BINGO example:
Stephen Snow
Thanks Gregory,
you should do an article about making a game with perl.
Gregory Bartholomew
Correction: I’m not a BINGO player, I didn’t realize that there were limitations to the range of numbers that could appear in each column. Here is a version of that PERL script that takes into account the range limits for anyone who might actually be interested:
Gregory Bartholomew
So, apparently zero isn’t allowed either. One last try:
Stephen Snow
Yeah, funny thing is I don’t play it either and had to look up the rules on the wikipedia page. If you want an introduction to Bingo and simultaneous exposure to a bit of Canadiana, check out Stompin’ Tom Connors song Sudbury Saturday Night for your edification. On a related note, the intent I had was for the readers to make a better Bingo program with awk or gawk.
alain
What’s the difference?
johanh
When comparing to using grep, I would simply have compared to "grep 'dnf install' .bash_history". No need for cat and pipe in this case.
Stephen Snow
Good point,
I wasn’t really focused on cat or grep for that matter.
jnagy
grep searches for PATTERN in each FILE.
grep 'dnf install' .bash_history
It isn’t necessary to use cat and a pipe.
Bill Chatfield
Thanks for the article. A follow-up with some more advanced usage would be nice. I am interested in learning more. I never learned awk as I went straight to Perl after trying to write scripts in bash. Perl does everything that bash, sed and awk do. Once the code gets more complicated than running a few commands, a language like Perl is better. I looked at Python but it was too slow and not mature enough at the time. It is still too slow. But, I’ve found very few other people who can understand Perl. So, now I end up writing bash just so that other people can understand and modify what I did, even though the code is longer and not as straightforward.
Stephen Snow
Thank you, I was not very familiar with awk prior to writing the article which I began last Wednesday evening. By Thursday morning I realized that I wasn’t even going to scratch the surface of potential in awk/gawk. I thought Perl was pretty much ubiquitous in the realm of system administration. I am not as familiar with Perl as bash or zsh scripting, and I don’t feel that I am much more than a power user with them. As for awk/gawk I think a more advanced article would be a good thing, but I would like a bit of time playing with it to explore potential before writing something of greater detail. I would like to see someone do a Perl intro/intermediate article, it is a language I have intended on getting more familiar with, just never seemed to do it.
Ivan
Stephen, during normal work, I cannot live without awk! It is so powerful!
Parsing files, filtering!
For example, if you want to print field 2 and then field 1 of /etc/passwd (a stupid example), you can do:
awk -F: '{print $2, $1}' /etc/passwd
-F: means "use : as separator"
Fast and simple!
johanh
Be aware that using awk could be much slower and consume more resources than e.g. using
cut -d: -f2,1 /etc/passwd
So if you are using it for processing a lot of data, you should rather use “cut” in this example. This is because awk is a larger program and meant to be used for much more than parsing a simple field. (But I often use it myself in this way.)
Osvaldo Marques Jr
Sorry, but the cut command outputs the fields in the order they appear in the file, not in the order requested.
Sebastiaan Franken
Just out of curiosity: what kind of astronomical workloads do you run on your machine(s) that Python is “too slow”?
silber
I like to look at awk scripts as programs. I tested on Ubuntu 16.04:
command: which awk
returns: /usr/bin/awk
Now with text editor open bingo-num.awk file and insert at top of file:
#!/usr/bin/awk -f
Save and exit the file. Add execute permissions:
chmod 700 bingo-num.awk
And now the awk program can be executed directly from the shell:
./bingo-num.awk
silber
My first contact with awk was about 20 years ago, when I needed to print only some columns of a file. For example, let’s say I need to print the first and third columns of the passwd file:
awk -F: '{print $1, $3}' /etc/passwd
The default column delimiter in awk is space/tab, but passwd uses ":" as the column delimiter, so -F: defines the delimiter.
silber
Adding a header and footer to a text file:
gawk -f bingo-num.awk | gawk 'BEGIN {print "my header"} {print $0} END {print "my footer"}'
It prints "my header" as the first line. Then it prints all of the lines in the file ($0 stands for the whole line), and after the file it prints "my footer".
silver
To ignore the first line of the file:
gawk -f bingo-num.awk | gawk 'NR > 1'
or a slightly longer syntax
gawk -f bingo-num.awk | gawk 'NR > 1 {print}'
or the full syntax
gawk -f bingo-num.awk | gawk 'NR > 1 {print $0}'
Jürgen
Hi Stephen, thanks a lot for your blog post!
You got a typo in your code examples:
$ awk '/dnf isntall/' ~/.bash_history
=> should read "install" instead of "isntall"
Stephen Snow
Hi Jürgen,
Thanks for catching that. It’s fixed now!
mythcat
This is a simple intro.
The awk tool (known as a special-purpose programming language) is a simple tool with many possibilities into the area of text processing.
I expected to see more examples with special issues…
Stephen Snow
Indeed, I barely scratched the surface of what you can do with awk/gawk. I would like to delve into it in more detail, maybe for a future article. For that though, I would want to use it for a while to explore its potential.
rapra
I realized the awesome power of 'awk' when I had to perform numerical operations on some of the columns of a 32-column, 2000-line data file.
With ‘awk’, it was so easy, fast and elegant, and best of all, it is a one line command.
Stevko
Yes, simple numerical stuff.
Let’s say you have file like this (every line has person and how much of something they got):
Alice 20
Bob 5
Alice 4
Alice 6
Eva 67
Alice 9
Bob 10
To sum them per person in awk you would use script.awk with:
{sums[$1] += $2;}
END {for (f in sums) {print f " has " sums[f];}}
and run
awk -f script.awk inputfile
and get the output
Eva has 67
Bob has 15
Alice has 39
You can write the script into command line if you want (instead of separate file).
Nita Mathews
Nice and beautiful article. Thank you for sharing with us.