Saturday, May 19, 2012

Finding Files Born After a Given Date

A few weeks ago Friend TK came up with a question:

RC, how can I find all of the pictures I put on my computer since Christmas without seeing all of the pictures in the .cache directories and other junk places?

An interesting question, indeed. We want to search for files that are newer than a certain target date, and they'll probably have extensions like .jpg or .png, though there will be the occasional file with an extension .jpeg or even .JPEG. It obviously requires some form of the Unix find command, but it's not exactly obvious what that would be. TK and I hunted around on the web a bit and finally came up with this:


touch -d 20111225 tokenfile
find . -type f \( -iname "*jpg" -o -iname "*jpeg" \) -anewer tokenfile  -print | egrep -v ".cache/|.thumbnails|.kde/"

which creates a marker, tokenfile, that looks as if it was born last Christmas Day, and finds all files accessed more recently than that and ending in jpg or (the -o) jpeg without regard to case (that's the -iname). Note that -anewer compares access times; -newer would compare modification times instead. We then pipe the list through egrep, stripping out (-v) files whose names match directories we don't want to see (the | is the egrep equivalent of find's -o).
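
If you want to try the marker-file idea without touching your real files, here is a minimal sketch that runs in a throwaway directory. It assumes GNU touch and find. The sandbox layout and file names are made up for illustration, and it departs from the command above in three places: it uses -newer (modification time) rather than -anewer, grep -E in place of egrep, and a stricter pattern with the dots escaped.

```shell
#!/bin/sh
# Sketch only: the sandbox tree and file names are hypothetical.
sandbox=$(mktemp -d)     # playground that stands in for the real file tree
marker=$(mktemp)         # the marker lives outside the tree being searched

mkdir -p "$sandbox/Pictures" "$sandbox/.cache"
touch -d 20111201 "$sandbox/Pictures/old.jpg"    # older than Christmas
touch -d 20120115 "$sandbox/Pictures/new.JPEG"   # newer than Christmas
touch -d 20120115 "$sandbox/.cache/thumb.jpg"    # newer, but in a junk directory

# Make the marker look as if it was last modified on Christmas Day 2011.
touch -d 20111225 "$marker"

# Find newer JPEGs, then filter out the junk directories by path.
found=$(cd "$sandbox" &&
    find . -type f \( -iname "*jpg" -o -iname "*jpeg" \) -newer "$marker" -print |
    grep -E -v '/\.(cache|thumbnails|kde)/')
echo "$found"

# Leave no trace behind.
rm -rf "$sandbox" "$marker"
```

Keeping the marker in a temporary file at least means nothing is left lying around in the directory you are searching.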

This is not entirely elegant. I was pretty sure I could make this into a purely find one-liner, leaving no trace behind (such as that tokenfile we created up there). To do this I knew I'd have to delve deeper into findology than I had in my 30 or so years of Unix use. I found some of the clues at Linux.ie's finder-keepers page, and other hints elsewhere. Eventually this let me put together this one-line script:


find . -type d \( -iname .\[a-z\]\* -o -iname work \) -prune -o -newermt 20111225 \( -iname "*j*g" -o -iname "*png" \) -print

Let's look at this in some detail, since it contains some things that I hadn't known:

  • find, of course, is the Unix/Linux command for looking through your file system.
  • . is the current directory. Find uses it as the starting directory. find will look at every file in this directory, its sub-directories, and all their children and grandchildren, to the umpteenth generation. If you only wanted to search where you thought pictures might be, you could instead write
    find $HOME/Pictures
  • -type d says that the tests that follow apply to directories, rather than to regular files (which would be -type f).
  • The \( and \) group a list of tests into a single expression. The backslashes make sure that the parentheses are passed to find, and not gobbled up by the shell. Quoting them, as in '(' and ')', works just as well.
  • Now for the heart of the matter: -iname .\[a-z\]\* tells find to look for directory names that start with a . followed by a letter ([a-z]), and then by anything else (*). Again the backslashes are there to make sure the next characters are passed to the find command instead of being expanded by the shell. Note that this gets rid of every hidden directory in your file tree whose name starts with a dot and a letter.
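  • As before, -o is the or operator. The -iname work test identifies my work directory, which may have pictures in it but nothing I'm interested in at the present time. You can add further -o -iname tests as needed.
  • -prune tells find not to descend into the directories just named, so none of the files inside them are ever examined. This serves the purpose of the egrep -v command in our initial attempt.
  • The -o after -prune puzzled me at first. You'd think it would be some kind of and, but just dropping it doesn't work. The reason is that the whole expression reads "if this is one of the named directories, prune it; or else, go on to the tests that follow." Without the -o, find would insist that something be a pruned directory and also pass the later tests, which essentially nothing does.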
  • -newermt 20111225 is one of the newer options in find. This one says to look at all files modified (m) after a certain time (t). The time here is a date, written in the format yyyymmdd, in this case last Christmas. If we wanted files written after noon on Christmas, we'd use -newermt "20111225 1200".
  • Again we have a list of -inames, delineated by backslash-parentheses. These are file names you want to look for. Note that *j*g catches all files ending in jpg, jpeg, JPG, or JPEG. It also catches a file named .jynormouslyinteresting, or any other name that contains a j and ends in g, but you can't have everything.
  • Finally, -print lists all the files. Don't drop it here: when no action is given, find applies its default -print to the whole expression, and the pruned directories get listed along with the pictures.
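
To see how the pieces of the list above fit together, here is a small sketch that builds a throwaway tree and runs the pruning search on it twice, once with an explicit -print and once without. It assumes GNU find and touch (for -newermt and touch -d). The tree, the file names, and the search helper function are all hypothetical, and I've added -type f and tightened the name pattern to *.jp*g, which the original one-liner does not do.

```shell
#!/bin/sh
# Sketch only: the tree, the file names, and search() are made up.
tree=$(mktemp -d)
mkdir -p "$tree/Pictures" "$tree/.cache" "$tree/work"
touch -d 20111201 "$tree/Pictures/old.jpg"                           # too old
touch -d 20120115 "$tree/Pictures/new.jpeg" "$tree/Pictures/shot.PNG"
touch -d 20120115 "$tree/.cache/thumb.jpg" "$tree/work/chart.png"    # in pruned dirs

search() {
    # "$@" lets the caller append an action such as -print, or nothing at all.
    find . -type d \( -iname '.[a-z]*' -o -iname work \) -prune -o \
        -type f -newermt 20111225 \( -iname '*.jp*g' -o -iname '*.png' \) "$@"
}

# With an explicit -print only the pictures outside the pruned directories appear.
with_print=$(cd "$tree" && search -print | LC_ALL=C sort)

# Without it, the default -print covers the whole expression,
# so the pruned directories themselves are listed as well.
without_print=$(cd "$tree" && search | LC_ALL=C sort)

echo "with -print:";    echo "$with_print"
echo "without -print:"; echo "$without_print"

rm -rf "$tree"
```

The second listing should include ./.cache and ./work themselves, which is why the explicit -print matters whenever -prune is involved.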

And that's it. OK, not quite. What TK wanted was to copy all of the files he found to a new directory (we'll call it recent) so he could examine them in detail. To do that we use the -exec option of find:


find . -type d \( -iname .\[a-z\]\* -o -name work -o -name recent \) -prune -o -newermt 20111225 \( -iname "*j*g" -o -iname "*png" \) -exec cp -p {} ~/recent ';'

  • Note that we've added the recent directory to our list of avoided directories. Otherwise find will search recent as well, and cp will complain about copying files onto themselves.
  • -exec tells find to run the command that follows on each file it finds.
  • cp -p is the usual copy command, with -p saying to preserve the original timestamps on the copied files. If you wanted to save space, and weren't going to modify the pictures, you could link with either ln or ln -s instead.
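  • {} is replaced by the path of each file find turns up. That is, if there is a file Pictures/cutekids.jpg, find issues the command
    cp -p ./Pictures/cutekids.jpg ~/recent
  • ';' tells find where the command ends. It is quoted so that the shell hands it to find instead of treating it as its own command separator.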
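
Here is a hedged sketch of the copy step, again in a throwaway directory so nothing real gets copied. It assumes GNU find, touch, cp, and date (date -r FILE is a GNU extension). The fakehome directory stands in for your home directory, the file names are invented, and as before the -type f test and the tighter *.jp*g pattern are my additions.

```shell
#!/bin/sh
# Sketch only: fakehome and the file names are hypothetical.
fakehome=$(mktemp -d)
mkdir -p "$fakehome/Pictures" "$fakehome/recent"
touch -d 20120115 "$fakehome/Pictures/cutekids.jpg"   # newer than Christmas
touch -d 20111201 "$fakehome/Pictures/old.jpg"        # older, should be skipped

# Prune hidden directories and recent itself, then copy each match with cp -p.
(cd "$fakehome" &&
    find . -type d \( -iname '.[a-z]*' -o -name recent \) -prune -o \
        -type f -newermt 20111225 \( -iname '*.jp*g' -o -iname '*.png' \) \
        -exec cp -p {} "$fakehome/recent" ';')

copied=$(ls "$fakehome/recent")
# cp -p should have kept the original modification date on the copy.
stamp=$(date -r "$fakehome/recent/cutekids.jpg" +%Y%m%d)
echo "$copied $stamp"

rm -rf "$fakehome"
```

If the date printed for the copy is the one given to touch and not today's, -p did its job.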
