Awk utility in Fedora

Posted by Stephen Snow on April 29, 2019

Fedora provides awk as part of its default installation in all its editions, including immutable ones like Silverblue. But you may be asking, what is awk and why would you need it?

Awk is a data-driven programming language that acts when it matches a pattern. On Fedora, and most other distributions, the awk command is GNU awk, also known as gawk. Read on for more about this language and how to use it.

A brief history of awk

Awk began at Bell Labs in 1977. Its name is an acronym from the initials of the designers: Alfred V. Aho, Peter J. Weinberger, and Brian W. Kernighan.

The specification for awk in the POSIX Command Language and Utilities standard further clarified the language. Both the gawk designers and the original awk designers at Bell Laboratories provided feedback for the POSIX specification.
From The GNU Awk User’s Guide

For a more in-depth look at how awk/gawk ended up being as powerful and useful as it is, see The GNU Awk User’s Guide quoted above. Numerous individuals have contributed to the current state of gawk. Among those are:

Arnold Robbins and David Trueman, who reworked gawk for compatibility with the newer awk (Paul Rubin wrote the original gawk in 1986)
Michael Brennan, the creator of mawk, a separate implementation of awk
Jürgen Kahrs, who added networking capabilities to gawk in 1997
John Haque, who rewrote the gawk internals and added an awk-level debugger in 2011

Using awk

The following sections show various ways of using awk in Fedora.

At the command line

The simplest way to invoke awk is at the command line. You can search a text file for a particular pattern and print every line that contains a match anywhere in it. As an example, use cat to take a look at the command history file in your home directory:

$ cat ~/.bash_history

There are probably many lines scrolling by right now.

Awk helps with this type of file quite easily. Instead of printing the entire file out to the terminal like cat, you can use awk to find something of specific interest. For this example, type the following at the command line if you’re running a standard Fedora edition:

$ awk '/dnf/' ~/.bash_history

If you’re running Silverblue, try this instead:

$ awk '/rpm-ostree/' ~/.bash_history

In both cases, more data likely appears than you really want. That’s no problem for awk, since the pattern is a regular expression and can be made more specific. To see only the installs, for example, change the search pattern to one of these:

$ awk '/rpm-ostree install/' ~/.bash_history
$ awk '/dnf install/' ~/.bash_history

Every entry in your bash command history that contains the specified pattern, at any position along the line, appears. Awk works on one line of a data file at a time: it checks the line against the pattern, performs the action if the line matches, then moves to the next line until the end of file (EOF) is reached. When no action is given, as here, awk prints the matching line.
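
The match-then-act cycle can also be written out explicitly. The following is a minimal sketch, not one of the original examples. It feeds three made-up history lines to awk through printf, an assumption made so the result doesn't depend on your own history file. It pairs the pattern with an action that prints the line number (NR) in front of the matching line ($0).

```shell
# Three made-up history lines stand in for ~/.bash_history (illustrative only).
printf '%s\n' 'ls -l' 'sudo dnf install gawk' 'dnf search vim' |
    awk '/dnf install/ { print NR ": " $0 }'
```

Only the second line matches the pattern, so the output is 2: sudo dnf install gawk.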

From an awk program

Using awk at the command line as above is not much different than piping output to grep, like this:

$ cat .bash_history | grep 'dnf install'

The end result of printing to standard output (stdout) is the same with both methods.

Awk is a programming language, and the awk command is an interpreter for that language. The real power and flexibility of awk is that you can write programs in it and combine them with shell scripts to create even more powerful programs. For more feature-rich development, gawk also lets you incorporate C or C++ code through dynamic extensions.
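
As a small illustration of combining awk with a shell script, here is a sketch under stated assumptions: the function name count_installs and the sample history lines are hypothetical. The shell function wraps an awk program that counts matching lines and reports the total from an END block, which runs after the last line has been read.

```shell
#!/bin/sh
# count_installs is a hypothetical helper name, not a standard command.
# It reads history lines on standard input and counts "dnf install" commands.
count_installs() {
    # n + 0 forces a numeric result, so zero matches prints 0 rather than an empty line.
    awk '/dnf install/ { n++ } END { print n + 0 }'
}

# Made-up sample data; with a real file this would be: count_installs < ~/.bash_history
printf '%s\n' 'dnf install gawk' 'ls' 'sudo dnf install vim' | count_installs
```

With the sample data above, two of the three lines match, so the script prints 2.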

Next, to show the power of awk, let’s make a couple of program files to print the header and draw five numbers for the first row of a bingo card. To do this we’ll create two awk program files.

The first file prints out the header of the bingo card. For this example it is called bingo-title.awk. Use your favorite editor to save this text as that file name:


BEGIN {
    print "B\tI\tN\tG\tO"
}

Now the title program is ready. You could try it out with this command:

$ awk -f bingo-title.awk

The program prints the word BINGO, with a tab character (\t) between the letters. For the number selection, let’s use one of awk’s built-in numeric functions, rand(), and the for control statement. (The original version also used switch, but the editor changed my program, so no switch statement is used this time.)

The title of the second awk program is bingo-num.awk. Enter the following into your favorite editor and save with that file name:


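@include "bingo-title.awk"

BEGIN {
    # Draw one number per column; column i covers a block of 15 values.
    for (i = 1; i <= 5; i++) {
        b = int(rand() * 15) + (15 * (i - 1))
        printf "%d\t", b
    }
    # End the row with a newline.
    print ""
}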

The @include statement, a gawk feature, tells the interpreter to process the included file first. In this case the interpreter processes the bingo-title.awk file, so the title prints out first.
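
The effect on ordering can be pictured with two BEGIN blocks in a single program, since awk runs BEGIN blocks in the order they appear. This is only a sketch of that ordering with placeholder text. It does not use @include itself.

```shell
# Two BEGIN blocks run top to bottom, much like an included file followed by
# the file that includes it. The printed strings are placeholders.
awk 'BEGIN { print "title first" }
     BEGIN { print "numbers second" }'
```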

Running the test program

Now enter the command to pick a row of bingo numbers:

$ awk -f bingo-num.awk

Output appears similar to the following. Note that the rand() function in awk is not ideal for truly random numbers: with gawk, it produces the same sequence on every run unless srand() is called first. It’s used here only for example purposes.


$ awk -f bingo-num.awk
B   I   N   G   O
13  23  34  53  71

In the example, we created two programs containing only BEGIN sections, whose actions manipulate data generated within the awk program itself. More work is needed to satisfy the rules of Bingo and achieve the desired results. The reader is encouraged to fix the programs so they reliably pick bingo numbers; the awk function srand() is a good place to start.
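
One possible direction for that exercise, offered as a sketch rather than a finished solution: seed the generator with srand() and shift each column's range so it starts at 1 instead of 0. The function name bingo_row is hypothetical, and the program is inlined in a shell function instead of using the two .awk files. It still does not guarantee unique numbers down a column, which a full card would need.

```shell
#!/bin/sh
# bingo_row is a hypothetical name; it prints one row of five bingo numbers.
bingo_row() {
    awk 'BEGIN {
        srand()                         # seed from the current time of day
        for (i = 1; i <= 5; i++) {
            low = 15 * (i - 1) + 1      # lowest number allowed in column i
            b = low + int(rand() * 15)  # low .. low + 14
            printf "%d%s", b, (i < 5 ? "\t" : "\n")
        }
    }'
}

bingo_row
```

Because srand() with no argument seeds from the clock, two runs within the same second can still print the same row.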

Final examples

Awk can be useful even for mundane daily search tasks, like listing all Flatpaks from org.gnome on the Flathub repository (provided you have the Flathub repository set up). The command to do that would be:

$ flatpak remote-ls flathub --system | awk '/org\.gnome/'

A listing appears that shows all output from remote-ls that matches the org.gnome pattern. To see flatpaks already installed from org.gnome, enter this command:

$ flatpak list --system | awk '/org\.gnome/'
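
Once a pattern is paired with an action, awk can also print just part of each matching line. The sketch below uses two made-up, tab-separated lines in place of real flatpak output, whose exact columns are not assumed here. $1 refers to the first field, and -F '\t' sets the field separator to a tab.

```shell
# Made-up, tab-separated sample lines; real flatpak output may differ.
printf 'Calculator\torg.gnome.Calculator\nVLC\torg.videolan.VLC\n' |
    awk -F '\t' '/org\.gnome/ { print $1 }'
```

Only the first sample line matches, so the command prints Calculator.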

Awk is a powerful and flexible programming language that fills a niche with text file manipulation exceedingly well.


Stephen Snow

Industrial controls and instrumentation solution provider. A long time Fedora Linux user.

39 Comments

boaboa

You mean awk '/dnf/' ~/.bash_history? There is confusion between slash and backslash.

April 29, 2019
- Stephen Snow
  
  Sorry, no, I mean what was typed, but the regex will need to be single quoted for bash. I use zshell as my shell of choice, and sometimes I forget to adapt what I do to bash since they are generally interchangeable; this would be one of the exceptions. Yes, you are correct, that is exactly what I meant. The command in bash needs the regex single quoted so it then looks like
  
  awk '/dnf/' ~/.bash_history
  
  instead.
  
  April 29, 2019
  - Todd Lewis
    
    That particular regex does not need to be quoted in bash.
    
    April 29, 2019
    - Stephen Snow
      
      You are correct, it works without the single quotes. I was more referring to the regex with two words in it.
      
      April 29, 2019
  - Todd Lewis
    
    Stephen, the command as presented doesn’t work with zsh or bash.
    
    April 29, 2019
    - Stephen Snow
      
      Yeah, it would appear I was remiss in checking after edits. You are again correct: the command is not supposed to have \ in the regex. It must be /, since awk will interpret the \ plus the following character as an escape sequence and fault. So the commands should be like this
      
      awk /dnf/ ~/.bash_history
      
      and
      
      awk '/rpm-ostree install/' ~/.bash_history
      
      as example.
      
      April 29, 2019
Tomas

Those should most likely be forward slashes in the quote marks:

awk '\dnf\' ~/.bash_history

April 29, 2019
- Stephen Snow
  
  ~~Again, no they are part of the regular expression used by awk/gawk as the test when searching a file.~~ ~~Bash will need the single quotes, but if you use zshell for instance it doesn’t.~~
  My apologies, you are correct, I really need to check edits. Anyway, the backslashes in the regex need to be forward slashes. The single quotes are only needed when the regex uses more than one word, as in the '/dnf install/' test.
  
  April 29, 2019
Sebastiaan Franken

Sadly the first few examples don’t work for me. They throw the following errors:

awk: cmd. line:1: \dnf install\
awk: cmd. line:1: ^ backslash not last character on line

April 29, 2019
- Stephen Snow
  
  Hello,
  
  Sorry for the confusion on the first two usage examples. On some systems the shell will give issues, such as bash with backslashes or slashes. In those instances, simply single quote your regex, making the command
  
  awk '/dnf install/' ~/.bash_history
  
  , and you should get the correct result.
  
  April 29, 2019
Stephen Snow

Hello to those who have had the misfortune of using my messed-up bits of code in the article. I have fixed the errors and they should be good to try now. Sorry for the confusion, and thank you to all who have posted their comments trying to help me realize them.

April 29, 2019
Marcelo Mafra

The line 5 of the bingo-num.awk file reads:
for (i = 1; i <= 5; i++) {
should be:
for (i = 1; i <= 5; i++) {
Probably a copy & paste problem.

April 29, 2019
- Stephen Snow
  
  Hello,
  Yes, it must be. Originally, it was a less elegant version that used a switch statement; the editors cleaned it up, but I didn’t check it afterwards. I was hoping readers would pick it apart and offer better solutions. I wrote the article mainly to generate some interest around awk/gawk, since it is a pretty cool programming language for specific uses. The Bingo example really needs to be fixed in any event: it will give the same results for each line if you try to fill a bingo card with the numbers it picks.
  
  April 29, 2019
  - Gregory Bartholomew
    
    Here is an improved version of your BINGO example:
    
    perl -e 'print "B\tI\tN\tG\tO\n"; while ($i++ < 5) { print int rand 100, "\t"; }; print "\n";'
    
    April 29, 2019
    - Stephen Snow
      
      Thanks Gregory,
      you should do an article about making a game with perl.
      
      April 29, 2019
  - Gregory Bartholomew
    
    Correction: I’m not a BINGO player, I didn’t realize that there were limitations to the range of numbers that could appear in each column. Here is a version of that PERL script that takes into account the range limits for anyone who might actually be interested:
    
    perl -e 'print "B\tI\tN\tG\tO\n"; do { print ((int rand 16) + ($i*15)); print "\t"; } while (++$i < 5); print "\n";'
    
    April 29, 2019
  - Gregory Bartholomew
    
    So, apparently zero isn’t allowed either. One last try:
    
    perl -e 'print "B\tI\tN\tG\tO\n"; do { print (1 + (int rand 15) + ($i*15)); print "\t"; } while (++$i < 5); print "\n";'
    
    April 29, 2019
    - Stephen Snow
      
      Yeah, funny thing is I don’t play it either and had to look up the rules on the Wikipedia page. If you want an introduction to Bingo and simultaneous exposure to a bit of Canadiana, check out Stompin’ Tom Connors’ song Sudbury Saturday Night for your edification. On a related note, my intent was for the readers to make a better Bingo program with awk or gawk.
      
      April 29, 2019
- alain
  
  What’s the difference?
  
  April 30, 2019
johanh

When comparing to using grep, I would simply have compared to "grep 'dnf install' .bash_history". No need for cat and pipe in this case.

April 29, 2019
- Stephen Snow
  
  Good point,
  I wasn’t really focused on cat or grep for that matter.
  
  April 29, 2019
jnagy

grep searches for PATTERN in each FILE.
grep 'dnf install' .bash_history
It isn’t necessary to use cat and a pipe.

April 29, 2019
Bill Chatfield

Thanks for the article. A follow-up with some more advanced usage would be nice. I am interested in learning more. I never learned awk, as I went straight to Perl after trying to write scripts in bash. Perl does everything that bash, sed and awk do. Once the code gets more complicated than running a few commands, a language like Perl is better. I looked at Python, but it was too slow and not mature enough at the time. It is still too slow. But I’ve found very few other people who can understand Perl. So now I end up writing bash just so that other people can understand and modify what I did, even though the code is longer and not as straightforward.

April 29, 2019
- Stephen Snow
  
  Thank you, I was not very familiar with awk prior to writing the article which I began last Wednesday evening. By Thursday morning I realized that I wasn’t even going to scratch the surface of potential in awk/gawk. I thought Perl was pretty much ubiquitous in the realm of system administration. I am not as familiar with Perl as bash or zsh scripting, and I don’t feel that I am much more than a power user with them. As for awk/gawk I think a more advanced article would be a good thing, but I would like a bit of time playing with it to explore potential before writing something of greater detail. I would like to see someone do a Perl intro/intermediate article, it is a language I have intended on getting more familiar with, just never seemed to do it.
  
  April 30, 2019
  - Ivan
    
    Stephen, during normal work, I cannot live without awk! It is so powerful!
    Parsing files, filtering!
    For example, if you want to print field 2 and then field 1 of /etc/passwd (a stupid example), you can do:
    awk -F: '{print $2, $1}' /etc/passwd
    
    -F: means "use : as separator"
    Fast and simple!
    
    April 30, 2019
    - johanh
      
      Be aware that using awk could be much slower and more resource-hungry than, e.g., using
      
      cut -d: -f2,1 /etc/passwd
      
      So if you are using it for processing a lot of data, you should rather use “cut” in this example. This is because awk is a larger program and meant to be used for much more than parsing a simple field. (But I often use it myself in this way.)
      
      April 30, 2019
      - Osvaldo Marques Jr
        
        Sorry, but the cut command puts the fields in the order of the file, not the order requested.
        
        May 1, 2019
- Sebastiaan Franken
  
  Just out of curiosity: what kind of astronomical workloads do you run on your machine(s) that Python is “too slow”?
  
  April 30, 2019
silber

I like to look at awk scripts as programs. I tested on Ubuntu 16.04:
command: which awk
returns: /usr/bin/awk
Now with text editor open bingo-num.awk file and insert at top of file:
#!/usr/bin/awk -f

Save and exit the file. Add execute permissions:
chmod 700 bingo-num.awk

And now awk program can be directly executed from shell:
./bingo-num.awk

April 30, 2019
silber

My first contact with awk was about 20 years ago, when I needed to print only some columns out of a file. For example, let’s say I need to print the first and third columns of the passwd file:
awk -F: '{print $1, $3}' /etc/passwd

Default column delimiter in awk is space/tab, but passwd uses “:” as column delimiters, so -F: defines the delimiters.

April 30, 2019
silber

Adding some header and footer in text file:
gawk -f bingo-num.awk | gawk 'BEGIN {print "my header"} {print $0} END {print "my footer"}'

It prints “my header” in first line. Then print all of the lines in file ($0 is for print all columns) and after the file it prints “my footer”.

April 30, 2019
silver

To ignore first line from the file:
gawk -f bingo-num.awk | gawk 'NR > 1'
or little bit longer syntax
gawk -f bingo-num.awk | gawk 'NR > 1 {print}'
or full syntax
gawk -f bingo-num.awk | gawk 'NR > 1 {print $0}'

April 30, 2019
Jürgen

Hi Stephen, thanks a lot for your blog post!

You got a typo in your code examples:
$ awk '/dnf isntall/' ~/.bash_history
=> should read “install” instead of “isntall”

April 30, 2019
- Stephen Snow
  
  Hi Jürgen,
  Thanks for catching that. It’s fixed now!
  
  April 30, 2019
mythcat

This is a simple intro.
The awk tool (known as a special-purpose programming language) is a simple tool with many possibilities into the area of text processing.
I expected to see more examples with special issues…

April 30, 2019
- Stephen Snow
  
  Indeed, I barely scratched the surface of what you can do with awk/gawk. I would like to delve into it in more detail, maybe for a future article. For that, though, I would want to use it for a while to explore its potential.
  
  May 1, 2019
rapra

I realized the awesome power of ‘awk’ when I had to perform numerical operations on some of the columns of the 32 column 2000 line data file.
With ‘awk’, it was so easy, fast and elegant, and best of all, it is a one line command.

April 30, 2019
- Stevko
  
  Yes, simple numerical stuff.
  Let’s say you have file like this (every line has person and how much of something they got):
  Alice 20
  Bob 5
  Alice 4
  Alice 6
  Eva 67
  Alice 9
  Bob 10
  
  To sum them all in awk, you would use script.awk with:
  
  {sums[$1] += $2;}
  END {for (f in sums) {print f " has " sums[f];}}
  
  and run
  awk -f script.awk inputfile
  and get the output
  Eva has 67
  Bob has 15
  Alice has 39
  
  You can write the script into command line if you want (instead of separate file).
  
  May 1, 2019
Nita Mathews

Nice and beautiful article. Thank you for sharing with us.

May 26, 2019