Command line quick tips: Locate and process files with find and xargs

Image by Ryan Lerch (CC BY-SA 4.0)

find is one of the more powerful and flexible command-line programs in the daily toolbox. It does what the name suggests: it finds files and directories that match the conditions you specify. And with arguments like -exec or -delete, you can have find take action on what it… finds.

In this installment of the Command Line Quick Tips series, you’ll get an introduction to the find command and learn how to use it to process files with built-in commands or the xargs command.

Finding files

At a minimum, find takes a path to find things in. For example, this command will find (and print) every file on the system:

find /

And since everything is a file, you will get a lot of output to sort through. This probably doesn’t help you locate what you’re looking for. You can change the path argument to narrow things down a bit, but it’s still not really any more helpful than using the ls command. So you need to think about what you’re trying to locate.
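Pointing find at a small directory instead of / makes its behavior easy to see. This sketch builds a throwaway tree with mktemp (all names are made up for illustration). It shows that find prints the starting point itself and then every file and directory beneath it, which comes to six lines here.

```shell
#!/bin/sh
# A throwaway tree; the directory and file names are hypothetical examples.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir -p "$dir/pics/2019"
touch "$dir/notes.txt" "$dir/pics/cat.jpg" "$dir/pics/2019/dog.JPG"

# With only a path, find prints the start point plus everything beneath it.
cd "$dir" || exit 1
all=$(find . | LC_ALL=C sort)
printf '%s\n' "$all"
```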

Perhaps you want to find all the JPEG files in your home directory. The -name argument allows you to restrict your results to files that match the given pattern.

find ~ -name '*jpg'
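The quotes around '*jpg' are doing real work. Without them, the shell tries to expand the glob before find ever runs. If the current directory contains a matching file, find receives that literal name instead of the pattern. If several files match, find typically rejects the extra words outright. Here is a minimal sketch with hypothetical file names:

```shell
#!/bin/sh
# Hypothetical layout: one JPEG in the current directory, two in sub/.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
mkdir sub
touch here.jpg sub/a.jpg sub/b.jpg

# Quoted: find itself receives the pattern *jpg and matches both files in sub/.
quoted=$(find sub -name '*jpg' | LC_ALL=C sort)

# Unquoted: the shell expands *jpg to "here.jpg" before find runs,
# so find only looks for files named exactly here.jpg under sub/.
unquoted=$(find sub -name *jpg)

printf 'quoted:\n%s\nunquoted: [%s]\n' "$quoted" "$unquoted"
```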

But wait! What if some of them have an uppercase extension? -iname is like -name, but it is case-insensitive:

find ~ -iname '*jpg'
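You can see the difference quickly with a few hypothetical file names that have mixed-case extensions:

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
# Hypothetical files with mixed-case extensions.
touch beach.jpg SUNSET.JPG Party.Jpg notes.txt

# -name is case-sensitive: only the all-lowercase extension matches.
by_name=$(find . -name '*jpg' | LC_ALL=C sort)
# -iname ignores case, so all three pictures match.
by_iname=$(find . -iname '*jpg' | LC_ALL=C sort)
printf 'name:\n%s\niname:\n%s\n' "$by_name" "$by_iname"
```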

Great! But the 8.3 name scheme is so 1985. Some of the pictures might have a .jpeg extension. Fortunately, we can combine patterns with an “or,” represented by -o. The parentheses are escaped so that the shell passes them through to find instead of trying to interpret them itself.

find ~ \( -iname '*jpeg' -o -iname '*jpg' \)
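This small sketch uses hypothetical file names. It confirms that the combined expression picks up both extensions in any case while skipping other files. It also shows that quoting the parentheses as '(' and ')' works just as well as backslashes, because either way find sees them rather than the shell.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
touch a.jpg b.JPEG c.png

# Backslash-escaped parentheses, as in the article.
escaped=$(find . \( -iname '*jpeg' -o -iname '*jpg' \) | LC_ALL=C sort)
# Quoting the parentheses is an equivalent way to hide them from the shell.
quoted=$(find . '(' -iname '*jpeg' -o -iname '*jpg' ')' | LC_ALL=C sort)
printf '%s\n' "$escaped"
```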

We’re getting closer. But what if you have some directories that end in jpg? (Why you named a directory bucketofjpg instead of pictures is beyond me.) We can modify our command with the -type argument to look only for files:

find ~ \( -iname '*jpeg' -o -iname '*jpg' \) -type f
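The parentheses matter for more than readability. find joins adjacent tests with an implied “and,” and that binds more tightly than -o. So if you move -type f to the front without grouping, the command means something different. This sketch uses hypothetical names and shows the directory slipping through:

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
mkdir bucketofjpg
touch photo.jpg scan.jpeg

# Grouped: -type f applies to both name tests, so only files come back.
grouped=$(find . \( -iname '*jpeg' -o -iname '*jpg' \) -type f | LC_ALL=C sort)
# Ungrouped: parsed as (-type f -a -iname '*jpeg') -o -iname '*jpg',
# so the directory bucketofjpg slips through the second branch.
ungrouped=$(find . -type f -iname '*jpeg' -o -iname '*jpg' | LC_ALL=C sort)
printf 'grouped:\n%s\nungrouped:\n%s\n' "$grouped" "$ungrouped"
```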

Or maybe you’d like to find those oddly named directories so you can rename them later:

find ~ \( -iname '*jpeg' -o -iname '*jpg' \) -type d
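Here is a sketch with a hypothetical tree. Only directories whose names end in jpg or jpeg (in any case) are listed. A plain directory like pics and the picture inside it are left out.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
# Hypothetical tree: two oddly named directories and a real picture.
mkdir -p bucketofjpg pics/old.JPEG
touch pics/photo.jpg

# -type d keeps only directories whose names match either pattern.
dirs=$(find . \( -iname '*jpeg' -o -iname '*jpg' \) -type d | LC_ALL=C sort)
printf '%s\n' "$dirs"
```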

It turns out you’ve been taking a lot of pictures lately, so narrow this down to files that have changed in the last week with -mtime (modification time). The -7 means files last modified less than 7 days (7 × 24 hours) ago.

find ~ \( -iname '*jpeg' -o -iname '*jpg' \) -type f -mtime -7
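To check what -mtime -7 keeps, you can backdate a file with touch -t, which takes a timestamp in [[CC]YY]MMDDhhmm form. In this sketch (hypothetical names), the freshly created file matches -mtime -7. The one dated 2000 matches -mtime +7 instead.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
touch new.jpg
# touch -t [[CC]YY]MMDDhhmm backdates a file; 2000-01-01 is far older than a week.
touch -t 200001010000 old.jpg

recent=$(find . \( -iname '*jpeg' -o -iname '*jpg' \) -type f -mtime -7)
# +7 selects the opposite end: files last modified more than seven whole days ago.
stale=$(find . \( -iname '*jpeg' -o -iname '*jpg' \) -type f -mtime +7)
printf 'recent: %s\nstale: %s\n' "$recent" "$stale"
```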

Taking action with xargs

The xargs command takes arguments from the standard input stream and executes a command based on them. Sticking with the example in the previous section, let’s say you want to copy all of the JPEG files in your home directory that have been modified in the last week to a thumb drive that you’ll attach to a digital photo display. Assume you already have the thumb drive mounted as /media/photo_display.

find ~ \( -iname '*jpeg' -o -iname '*jpg' \) -type f -mtime -7 -print0 | xargs -0 cp -t /media/photo_display
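You can try the pipeline above without a real thumb drive by pointing it at scratch directories. In this sketch, temporary directories stand in for ~ and /media/photo_display, which is an assumption for testing only. The file names deliberately include spaces and an apostrophe. Like the article's command, it relies on GNU cp for -t.

```shell
#!/bin/sh
# Scratch directories stand in for ~ and /media/photo_display (assumption).
src=$(mktemp -d)
dest=$(mktemp -d)
trap 'rm -rf "$src" "$dest"' EXIT
touch "$src/Jerry's party.jpg" "$src/beach day.JPEG" "$src/notes.txt"
touch -t 200001010000 "$src/scan from 2000.jpg"   # too old for -mtime -7

# The article's pipeline: NUL-terminated names, handed to cp -t in batches.
find "$src" \( -iname '*jpeg' -o -iname '*jpg' \) -type f -mtime -7 -print0 |
    xargs -0 cp -t "$dest"

ls "$dest"
```

Even when find matches nothing, GNU xargs still runs cp once with no file names, and cp then complains about a missing operand. Adding -r (--no-run-if-empty) to xargs avoids that.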

The find command is slightly modified from the previous version. The -print0 action changes how the output is written: instead of ending each file name with a newline, it ends it with a null character. The -0 (zero) option to xargs tells it to expect this. This matters because otherwise actions on file names that contain spaces, quotes, or other special characters may not work as expected. You should use these options whenever you’re taking action on files.
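The difference is easy to see by feeding printf output straight into xargs. In this sketch, the file names are hypothetical, and printf '[%s]\n' simply brackets each argument it receives. Newline-separated input gets split at the space, and xargs rejects it outright when it contains an unmatched apostrophe. NUL-separated input read with -0 arrives intact.

```shell
#!/bin/sh
# Newline-separated input: xargs splits on blanks, so one name becomes two arguments.
split=$(printf '%s\n' 'my photo.jpg' | xargs -n1 printf '[%s]\n')

# NUL-separated input with -0: every other byte is taken literally.
whole=$(printf '%s\0' 'my photo.jpg' "Jerry's Kids.jpg" | xargs -0 -n1 printf '[%s]\n')

# Without -0, a lone apostrophe makes xargs stop with an error.
if printf '%s\n' "Jerry's Kids.jpg" | xargs -n1 printf '[%s]\n' >/dev/null 2>&1; then
    apostrophe=accepted
else
    apostrophe=rejected
fi

printf '%s\n---\n%s\n---\n%s\n' "$split" "$whole" "$apostrophe"
```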

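The -t argument to cp is important because cp normally expects the destination to come last, but xargs appends the file names to the end of the command line. With -t, the destination is named up front instead. You can do this without xargs using find’s -exec action. With a large number of files, though, the xargs method will be faster than -exec ... \;, because xargs packs many file names into each cp invocation (often just one) instead of running cp once per file. find’s -exec ... {} + form batches file names the same way.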
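One way to see the batching is to swap cp for echo, which prints one line per invocation. In this sketch (hypothetical files), -exec ... \; runs echo three times, while -exec ... {} + and xargs each run it once. Applied to the article's example, the batched -exec form would be find ~ \( -iname '*jpeg' -o -iname '*jpg' \) -type f -mtime -7 -exec cp -t /media/photo_display {} +

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
touch "$dir/a.jpg" "$dir/b.jpg" "$dir/c.jpg"

# echo stands in for cp: each invocation prints exactly one line,
# so counting lines counts how many times the command ran.
per_file=$(find "$dir" -name '*.jpg' -exec echo {} \; | grep -c '')
# Batched forms use as few runs as the argument-length limit allows (one here).
batched=$(find "$dir" -name '*.jpg' -exec echo {} + | grep -c '')
via_xargs=$(find "$dir" -name '*.jpg' -print0 | xargs -0 echo | grep -c '')

printf 'per file: %s, batched: %s, xargs: %s\n' "$per_file" "$batched" "$via_xargs"
```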

Find out more

This post only scratches the surface of what find can do. find supports testing based on permissions, ownership, access time, and much more. It can even compare the files in the search path to other files. Combining tests with Boolean logic can give you incredible flexibility to find exactly the files you’re looking for. With built-in actions like -exec and -delete, or by piping to xargs, you can quickly process a large set of files.
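Here is a small taste, using hypothetical files with fixed timestamps. The sketch uses -newer to compare against a reference file. It then uses ! to negate a permission test, which lists the regular files the owner cannot execute.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
# Hypothetical files with fixed timestamps (touch -t CCYYMMDDhhmm).
touch -t 202001010000 old.jpg
touch -t 202101010000 reference
touch -t 202201010000 new.jpg
touch script.sh
chmod u+x script.sh

# -newer compares each file's modification time with a reference file.
newer=$(find . -type f -name '*.jpg' -newer reference)
# ! negates the test that follows: regular files the owner cannot execute.
not_exec=$(find . -type f ! -perm -u+x | LC_ALL=C sort)
printf 'newer: %s\nnot executable:\n%s\n' "$newer" "$not_exec"
```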

Portions of this article were previously published on Opensource.com. Photo by Warren Wong on Unsplash.


22 Comments

  1. Seirdy

    For a faster alternative, try fd.

    It’s listed as “fd-find” in Fedora’s repos.

  2. pamsu

    I think it would be good if articles like these had their own subsection.

  3. Stevko

    GNU find has two versions of exec. One of them ends with ; and one with +. The second one is what you may want instead of xargs.
    Also, since Unix (and Linux) decided to have almost no restriction on filenames, it means that filenames can contain newlines and this will break. Use -print0 in find and -0 in xargs.

    • Peder

      Yes, “time find . ( -iname '*jpg' -o -iname '*jpeg' ) -type f -mtime -7 -exec cp -t Pics -a {} +” works just as well as the xargs version.
      It also handles apostrophes and newlines, is less to type but just as fast and doesn’t make you have to remember any -print0 | -0 flags.

    • Peder

      The forum ate my backslashes!
      The -exec command should look like this, I hope: time find . \( -iname '*jpg' -o -iname '*jpeg' \) -type f -mtime -7 -exec cp -t Pics {} \+

      With a backslash before each parenthesis and one before the +

    • Elliott

      From the POSIX standard for xargs:
      “… if xargs is used to bundle the output of commands like find dir -print or ls into commands to be executed, unexpected results are likely if any filenames contain <blank>, <newline>, or quoting characters. This can be solved by using find to call a script that converts each file found into a quoted string that is then piped to xargs, but in most cases it is preferable just to have find do the argument aggregation itself by using -exec with a ‘+’ terminator instead of ‘;’.”

      Also, the xargs -0 option is not in the POSIX standard so it may not be available, depending on the environment.

  4. Pedro

    Simply amazing 🙂

  5. kilgoretrout

    One major limiting factor with xargs is that it will refuse to process files with a single quote like “Dec. ’97” or “Jerry’s Kids”. If it encounters such a file, it will spit out an “unmatched single quote” error message and stop in its tracks. I found this out the hard way and started using exec with the find command instead of xargs.

    • The -0 (zero) option will correct this. From the manual page:

      Input items are terminated by a null character instead of by whitespace, and the quotes and backslash are not special (every character is taken literally). Disables the end of file string, which is treated like any other argument. Useful when input items might contain white space, quote marks, or backslashes. The GNU find -print0 option produces input suitable for this mode.

      I’ll update the example to include this since, as you said, it is a lesson that can be learned painfully.

  6. ernesto

    kudos for demystifying two daunting but important commands for all shell newcomers.

    there are so many things you can do with ‘find’, perhaps another article showcasing these features?

    i do agree with another poster, all these command line articles need a section of their own for easier access/browsing in the future 😀

  7. Robin

    Interesting, I was looking for a way to convert all my .xls files to .ods a few days ago, and this will work just fine combined with the LibreOffice CLI.

  8. chris

    I much more prefer to use something of the form:

    ll *.jpg | gawk '{system("command_line_execution.sh " $9)}'

    it's simpler for me. I just never tried to use xargs or find, as you have to escape everything differently

    • Göran Uddeborg

      Gawk or shell scripts could be used in place of xargs, but you can’t replace all the testing features of find with ls. Try to find all JPEG files on any level below the current directory, and I think you will have problems. 🙂 When done, try to find only the new ones! 🙂

      • Chris

        That would be ls -Rl | grep ".jpg" ...
        Or you could use
        ls -Rl | gawk '$9 ~ /.jpg/ {system("bash_command.sh " $9)}'

        Either way I’m just not too hip to find and the exec option. I use find all the time, and you could actually use find, then pipe into gawk’s system function.

        • Göran Uddeborg

          ls -Rl would not give you the full path to the file. You would have to use a longer gawk script which also records the path when the pattern from ls lists it, and combine the two when you find a match. All of which find would give you for free.

      • Chris

        As far as finding new ones, I guess I’m lost on what you mean by new ones? Do you mean past a certain date? Again, I don’t have any problems with find other than the exec option. It has weird escaping, and from what I have used of it, it only allows for one argument. For instance, what if you wanted to mv all of those jpgs but change the name and have the new name be some derivative of the first name? It gets complicated, and gawk’s system allows you to use piped-in input with as many fields as you want and put them wherever you want in the CLI execution.

        • Göran Uddeborg

          With finding new ones, I did indeed mean files modified recently. That was an example used in the article.

          As for renaming in more complicated ways, I might also use a script, possibly combined with the various print options of find. But I would still call it directly via -exec and/or xargs, rather than going via system in gawk.

          • chris

            I think you fail to see the point of my original comment, and are trying to sell me (or someone else) on find. I use find and understand how to use it; I just feel that with a fundamental understanding of a few things, you can accomplish most tasks any way you would like.

            for instance, you can change permissions either way,

            ie: chmod 777 exec.sh
            or: chmod o+rwx,g+rwx,u+rwx exec.sh

            two ways to skin a cat, and for some, one way is easier or preferred.

            • Göran Uddeborg

              I was trying to show how your examples could not achieve much of what find does. When it could, it made things more complicated, in my opinion.

              But maybe I don’t get your point. Of course, everyone should do things the way he or she prefers.

              • This thread helps to illustrate there are many ways for doing things, and they will achieve some common and some divergent goals sometimes. It would be preferable to move on since the thread value is diminishing. Thanks! — ed.

  9. Göran Uddeborg

    The globbing stars were lost in the first example with disjunction (-o).


The opinions expressed on this website are those of each author, not of the author's employer or of Red Hat. Fedora Magazine aspires to publish all content under a Creative Commons license but may not be able to do so in all cases. You are responsible for ensuring that you have the necessary permission to reuse any work on this site. The Fedora logo is a trademark of Red Hat, Inc. Terms and Conditions