Awk utility in Fedora

Fedora provides awk as part of its default installation in all its editions, including immutable ones like Silverblue. But you may be asking: what is awk, and why would you need it?

Awk is a data-driven programming language that acts when it matches a pattern. On Fedora, and most other distributions, the implementation used is GNU awk, or gawk. Read on for more about this language and how to use it.

A brief history of awk

Awk began at Bell Labs in 1977. Its name is an acronym from the initials of the designers: Alfred V. Aho, Peter J. Weinberger, and Brian W. Kernighan.

The specification for awk in the POSIX Command Language and Utilities standard further clarified the language. Both the gawk designers and the original awk designers at Bell Laboratories provided feedback for the POSIX specification.

From The GNU Awk User’s Guide

For a more in-depth look at how awk/gawk ended up being as powerful and useful as it is, see The GNU Awk User’s Guide. Numerous individuals have contributed to the current state of gawk. Among them are:

  • Arnold Robbins and David Trueman, the creators of gawk
  • Michael Brennan, the creator of mawk, another awk implementation
  • Jürgen Kahrs, who added networking capabilities to gawk in 1997
  • John Haque, who rewrote the gawk internals and added an awk-level debugger in 2011

Using awk

The following sections show various ways of using awk in Fedora.

At the command line

The simplest way to invoke awk is at the command line. You can search a text file for a particular pattern and print each line of the file that contains a match. As an example, use cat to take a look at the command history file in your home directory:

$ cat ~/.bash_history

There are probably many lines scrolling by right now.

Awk helps with this type of file quite easily. Instead of printing the entire file out to the terminal like cat, you can use awk to find something of specific interest. For this example, type the following at the command line if you’re running a standard Fedora edition:

$ awk '/dnf/' ~/.bash_history

If you’re running Silverblue, try this instead:

$ awk '/rpm-ostree/' ~/.bash_history

In both cases, more data likely appears than you really want. That’s no problem for awk, since its patterns are regular expressions. Continuing the previous example, you can narrow the pattern so that it matches only install commands. Try changing the search pattern to one of these:

$ awk '/rpm-ostree install/' ~/.bash_history
$ awk '/dnf install/' ~/.bash_history

Every entry in your bash command line history that contains the pattern, at any position along the line, appears. Awk works on one line of a data file at a time. It tests the line against the pattern, performs the action if the line matches, then moves to the next line until the end of file (EOF) is reached. When no action is given, as here, the default action is to print the matching line.
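
To make the pattern-then-action cycle visible, here is a minimal sketch that adds an explicit action to the pattern. It uses a temporary file with made-up history entries (an assumption for illustration) rather than your real ~/.bash_history, and prints the line number (NR) in front of each matching line ($0).

```shell
# Build a small stand-in for ~/.bash_history (hypothetical contents).
history_file=$(mktemp)
printf '%s\n' \
    'sudo dnf install vim' \
    'ls -l' \
    'sudo dnf upgrade' \
    'sudo dnf install gawk' > "$history_file"

# The pattern /dnf install/ selects lines; the action in braces runs only
# for those lines and prints the record number followed by the line itself.
matches=$(awk '/dnf install/ { print NR ": " $0 }' "$history_file")
echo "$matches"

rm -f "$history_file"
```

Lines 2 and 3 of the sample file are read but skipped, because the action only runs for lines that match the pattern.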

From an awk program

Using awk at the command line as above is not much different from piping output to grep, like this:

$ cat .bash_history | grep 'dnf install'

The end result of printing to standard output (stdout) is the same with both methods.

Awk is a programming language, and the command awk is an interpreter of that language. The real power and flexibility of awk is that you can write programs with it and combine them with shell scripts to create even more powerful programs. For more feature-rich development with gawk, you can also incorporate C or C++ code using dynamic extensions.
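
As one possible illustration of combining awk with a shell script, the sketch below wraps an awk program in a shell function. The function name count_matches and the sample data are hypothetical, not from the article. The shell passes its argument to awk with the standard -v option, and awk does the counting.

```shell
# count_matches WORD FILE: hypothetical helper that counts the lines of
# FILE containing the literal string WORD.
count_matches() {
    # index() returns a non-zero position when word occurs in the line;
    # "n + 0" forces a numeric 0 when nothing matched.
    awk -v word="$1" 'index($0, word) { n++ } END { print n + 0 }' "$2"
}

# Sample data standing in for a real history file.
sample=$(mktemp)
printf '%s\n' 'sudo dnf install vim' 'ls' 'sudo dnf install gawk' > "$sample"

installs=$(count_matches 'dnf install' "$sample")
echo "dnf install appears on $installs lines"

rm -f "$sample"
```

Using index() instead of a regular expression is a design choice here: the shell argument is treated as plain text, so characters such as "." are not interpreted as regex operators.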

Next, to show the power of awk, let’s make a couple of program files to print the header and draw five numbers for the first row of a bingo card. To do this we’ll create two awk program files.

The first file prints out the header of the bingo card. For this example it is called bingo-title.awk. Use your favorite editor to save this text as that file name:


BEGIN {
    print "B\tI\tN\tG\tO"
}

Now the title program is ready. You could try it out with this command:

$ awk -f bingo-title.awk

The program prints the word BINGO, with a tab character (\t) between the letters. For the number selection, let’s use one of awk’s built-in numeric functions, rand(), together with the for control statement. (The original version also used a switch statement, but it was edited out of the program, so no switch statement appears this time.)

The title of the second awk program is bingo-num.awk. Enter the following into your favorite editor and save with that file name:


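@include "bingo-title.awk"
BEGIN {
    # Draw one number for each of the five columns.
    for (i = 1; i <= 5; i++) {
        # Column i gets a value from 15*(i-1) through 15*(i-1)+14.
        b = int(rand() * 15) + (15*(i-1))
        printf "%s\t", b
    }
    # Finish the row with a newline.
    print ""
}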

The @include statement in the file tells the interpreter to process the included file first. In this case the interpreter processes the bingo-title.awk file, so the title prints out first. Note that @include is a gawk feature.
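
For awk implementations without @include, a similar effect can be had by passing several -f options, which awk reads in order. The sketch below assumes two throwaway program files with hypothetical names (title.awk and row.awk) created in a temporary directory. The second file only prints a placeholder, to keep the example short.

```shell
# Create two small program files in a scratch directory (names are made up).
workdir=$(mktemp -d)
cat > "$workdir/title.awk" <<'EOF'
BEGIN { print "B\tI\tN\tG\tO" }
EOF
cat > "$workdir/row.awk" <<'EOF'
BEGIN { print "row goes here" }
EOF

# awk concatenates the program files in the order given, so the BEGIN
# block from title.awk runs before the one from row.awk.
combined=$(awk -f "$workdir/title.awk" -f "$workdir/row.awk")
echo "$combined"

rm -rf "$workdir"
```

With this approach the second file must not contain its own @include line for the first, or gawk would see the same file requested twice.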

Running the test program

Now enter the command to pick a row of bingo numbers:

$ awk -f bingo-num.awk

Output appears similar to the following. Note that the rand() function in awk is not ideal for truly random numbers, and in gawk it produces the same sequence on every run unless srand() is called first. It’s used here only for example purposes.


$ awk -f bingo-num.awk
B   I   N   G   O
13  23  34  53  71

In the example, we created two programs that contain only BEGIN sections, whose actions manipulate data generated within the awk program itself. To satisfy the rules of Bingo, more work is needed. The reader is encouraged to fix the programs so they can reliably pick bingo numbers. The awk function srand() is a good place to start.
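
One possible direction is sketched below. It is not the author’s solution. It assumes the usual Bingo column ranges (B is 1 to 15, I is 16 to 30, and so on up to O at 61 to 75) and seeds the generator with srand() so that the row changes between runs. It still draws only a single row and does nothing to prevent repeated numbers across rows.

```shell
# Draw one row of bingo numbers, one per column.
row=$(awk 'BEGIN {
    srand()    # seed from the current time of day
    for (i = 1; i <= 5; i++) {
        # Column i draws from 15*(i-1)+1 through 15*i.
        b = int(rand() * 15) + 1 + 15 * (i - 1)
        # Tab between numbers, newline after the last one.
        printf "%d%s", b, (i < 5 ? "\t" : "\n")
    }
}')

printf 'B\tI\tN\tG\tO\n%s\n' "$row"
```

Because srand() with no argument seeds from the clock, two runs started within the same second can still produce the same row.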

Final examples

Awk can be useful even for the mundane daily search tasks that you encounter, like listing all flatpaks from org.gnome on the Flathub repository (provided you have the Flathub repository set up). The command to do that would be:

$ flatpak remote-ls flathub --system | awk '/org.gnome/'

A listing appears that shows all output from remote-ls that matches the org.gnome pattern. To see flatpaks already installed from org.gnome, enter this command:

$ flatpak list --system | awk '/org.gnome/'
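
One detail worth knowing about that pattern: in a regular expression an unescaped dot matches any single character, so org.gnome can match more than the literal text. The sketch below uses a few made-up application IDs (not real flatpak output) to compare the loose pattern with one that escapes the dots.

```shell
# Hypothetical application IDs, one per line.
sample='org.gnome.Calculator
org.gnome.Maps
com.example.orgagnome
org.kde.kate'

# Unescaped dot: also matches "orgagnome", where "a" stands in for the dot.
loose=$(printf '%s\n' "$sample" | awk '/org.gnome/')

# Escaped dots: only the literal text "org.gnome." matches.
strict=$(printf '%s\n' "$sample" | awk '/org\.gnome\./')

echo "loose:";  echo "$loose"
echo "strict:"; echo "$strict"
```

For everyday searches the loose pattern is usually good enough. The escaped form matters when a false match would cause trouble, for example in a script.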

Awk is a powerful and flexible programming language that fills a niche with text file manipulation exceedingly well.


39 Comments

  1. boaboa

    you mean  awk ‘/dnf/’ ~/.bash_history ? There is confusion between slash and antislash.

    • Sorry, no I mean what was typed, but the regex will need to be single quoted for bash. I use zshell as my shell of choice and sometimes I forget to adapt what I do to bash since they are generally interchangeable, this would be one of the exceptions. Yes, you are correct, that is exactly what I meant. The command in bash needs the regex single quoted so it then looks like

      awk '/dnf/' ~/.bash_history

      instead.

      • Todd Lewis

        That particular regex does not need to be quoted in bash.

        • You are correct, it works without the single quotes. I was more referring to the regex with two words in it.

      • Todd Lewis

        Stephen, the command as presented doesn’t work with zsh or bash.

        • Yeah, it would appear I was remiss in checking after edits. You are again correct, the command is not supposed to have \ in the regex, it must be /, since awk will interpret the \ plus the following character as an escape sequence and fault. So the commands should be like this

          awk /dnf/ ~/.bash_history

          and

          awk '/rpm-ostree install/' ~/.bash_history

          as example.

  2. Tomas

    Those should most likely be forward slashes in the quote marks:

    awk ‘\dnf\’ ~/.bash_history

    • Again, no they are part of the regular expression used by awk/gawk as the test when searching a file. Bash will need the single quotes, but if you use zshell for instance it doesn’t.
      My apologies, you are correct, I really need to check edits. Anyway, the backslashes in the regex need to be forward slashes. The single quotes are only needed when the regex uses more than one word, as in the '/dnf install/' test.

  3. Sebastiaan Franken

    Sadly the first few examples don’t work for me. They throw the following errors:

    awk: cmd. line:1: \dnf install\
    awk: cmd. line:1: ^ backslash not last character on line
    • Hello,

      Sorry for the confusion on the first two usage examples. On some systems the shell will cause issues, such as bash with backslashes or slashes. In those instances simply single quote your regex, making the command

      awk '/dnf install/' ~/.bash_history

      , and you should get the correct result.

  4. Hello to those who have had the misfortune of using my messed up bits of code in the article. I have fixed the errors and they should be good to try now. Sorry for the confusion, and thank you to all who have posted their comments trying to help me realize them.

  5. Marcelo Mafra

    The line 5 of the bingo-num.awk file reads:
    for (i = 1; i <= 5; i++) {
    should be:
    for (i = 1; i <= 5; i++) {
    Probably a copy & paste problem.

    • Hello,
      Yes it must be. Originally, it was a less elegant version that used a switch statement, the editors cleaned it up, but I didn’t check it afterwards. I was hoping readers would pick it apart and offer better solutions. I wrote the article mainly to generate some interest around awk/gawk since it is a pretty cool programming language for specific uses. The Bingo example really needs to be fixed in any event, it will give the same results for each line if you were to try to fill a bingo card with numbers it picks.

      • Here is an improved version of your BINGO example:

        perl -e 'print "B\tI\tN\tG\tO\n"; while ($i++ < 5) { print int rand 100, "\t"; }; print "\n";'

      • Correction: I’m not a BINGO player, I didn’t realize that there were limitations to the range of numbers that could appear in each column. Here is a version of that PERL script that takes into account the range limits for anyone who might actually be interested:

        perl -e 'print "B\tI\tN\tG\tO\n"; do { print ((int rand 16) + ($i*15)); print "\t"; } while (++$i < 5); print "\n";'
      • So, apparently zero isn’t allowed either. One last try:

        perl -e 'print "B\tI\tN\tG\tO\n"; do { print (1 + (int rand 15) + ($i*15)); print "\t"; } while (++$i < 5); print "\n";'
        • Yeah, funny thing is I don’t play it either and had to look up the rules on the wikipedia page. If you want an introduction to Bingo and simultaneous exposure to a bit of Canadiana, check out Stompin’ Tom Connors song Sudbury Saturday Night for your edification. On a related note, the intent I had was for the readers to make a better Bingo program with awk or gawk.

    • alain

      What’s the difference?

  6. johanh

    When comparing to using grep, I would simply have compared to “grep 'dnf install' .bash_history”. No need for cat and pipe in this case.

  7. jnagy

    grep searches for PATTERN in each FILE.
    grep 'dnf install' .bash_history
    It’s not necessary to use cat and a pipe.

  8. Thanks for the article. A follow-up with some more advanced usage would be nice. I am interested in learning more. I never learned awk as I went straight to Perl after trying to write scripts in bash. Perl does everything that bash, sed and awk do. Once the code gets more complicated than running a few commands, a language like Perl is better. I looked at Python but it was too slow and not mature enough at the time. It is still too slow. But I’ve found very few other people who can understand Perl. So now I end up writing bash just so that other people can understand and modify what I did, even though the code is longer and not as straightforward.

    • Thank you, I was not very familiar with awk prior to writing the article which I began last Wednesday evening. By Thursday morning I realized that I wasn’t even going to scratch the surface of potential in awk/gawk. I thought Perl was pretty much ubiquitous in the realm of system administration. I am not as familiar with Perl as bash or zsh scripting, and I don’t feel that I am much more than a power user with them. As for awk/gawk I think a more advanced article would be a good thing, but I would like a bit of time playing with it to explore potential before writing something of greater detail. I would like to see someone do a Perl intro/intermediate article, it is a language I have intended on getting more familiar with, just never seemed to do it.

      • Ivan

        Stephen, during normal work, I cannot live without awk! It is so powerful!
        Parsing files, filtering!
        For example, if you want to print field 2 and then field 1 of /etc/passwd (a stupid example), you can do:
        awk -F: '{print $2, $1}' /etc/passwd

        -F: means “use : as separator”
        Fast and simple!

        • johanh

          Be aware that using awk could be much slower and more consuming on resources than e.g. using

          cut -d: -f2,1 /etc/passwd

          So if you are using it for processing a lot of data, you should rather use “cut” in this example. This is because awk is a larger program and meant to be used for much more than parsing a simple field. (But I often use it myself in this way.)

          • Osvaldo Marques Jr

            Sorry, but the cut command puts the fields in the order of the file, not the order requested.

    • Sebastiaan Franken

      Just out of curiosity: what kind of astronomical workloads do you run on your machine(s) that Python is “too slow”?

  9. silber

    I like to look at awk scripts as programs. I tested on Ubuntu 16.04:
    command: which awk
    returns: /usr/bin/awk
    Now with text editor open bingo-num.awk file and insert at top of file:
    #!/usr/bin/awk -f

    Save and exit the file. Add execute permissions:
    chmod 700 bingo-num.awk

    And now awk program can be directly executed from shell:
    ./bingo-num.awk

  10. silber

    My first contact with awk was about 20 years ago, when I needed to print only some columns out of a file. For example, let’s say I need to print the first and third columns of the passwd file:
    awk -F: '{print $1, $3}' /etc/passwd

    Default column delimiter in awk is space/tab, but passwd uses “:” as column delimiters, so -F: defines the delimiters.

  11. silber

    Adding some header and footer in text file:
    gawk -f bingo-num.awk | gawk 'BEGIN {print "my header"} {print $0} END {print "my footer"}'

    It prints “my header” in first line. Then print all of the lines in file ($0 is for print all columns) and after the file it prints “my footer”.

  12. silver

    To ignore first line from the file:
    gawk -f bingo-num.awk | gawk 'NR > 1'
    or little bit longer syntax
    gawk -f bingo-num.awk | gawk 'NR > 1 {print}'
    or full syntax
    gawk -f bingo-num.awk | gawk 'NR > 1 {print $0}'

  13. Hi Stephen, thanks a lot for your blog post!

    You got a typo in your code examples:
    $ awk ‘/dnf isntall/’ ~/.bash_history
    => should read “install” instead of “isntall”

  14. This is a simple intro.
    The awk tool (known as a special-purpose programming language) is a simple tool with many possibilities into the area of text processing.
    I expected to see more examples with special issues…

    • Indeed, I barely scratched the surface of what you can do with awk/gawk. I would like to delve into it in more detail maybe for a future article. For that though, I would want to use it for a while to explore its potential.

  15. rapra

    I realized the awesome power of ‘awk’ when I had to perform numerical operations on some of the columns of the 32 column 2000 line data file.
    With ‘awk’, it was so easy, fast and elegant, and best of all, it is a one line command.

    • Stevko

      Yes, simple numerical stuff.
      Let’s say you have file like this (every line has person and how much of something they got):
      Alice 20
      Bob 5
      Alice 4
      Alice 6
      Eva 67
      Alice 9
      Bob 10

      To sum them all in awk you would use script.awk with:

      {sums[$1] += $2;}
      END {for (f in sums) {print f " has " sums[f];}}

      and run
      awk -f script.awk inputfile
      and get the output
      Eva has 67
      Bob has 15
      Alice has 39

      You can write the script into command line if you want (instead of separate file).

  16. Nice and beautiful article. Thank you for sharing with us.
