How do you check if an image file (tiff, psd, psb) is corrupted, other than looking at its thumbnail in Bridge, or opening it in Photoshop? With a small Python script and ImageMagick! Read along.
A client I work for as a retoucher had some problems with the so-called Data Migration (the dull, time-consuming, and error-prone process of transferring a lifetime backup from old, once very expensive external drives to a set of new, somehow still equally expensive external drives). You migrate data because the source has insufficient capacity, has become unstable or obsolete, or all of these combined; as a result of the process, he got some corrupted files here and there on the destination drives.
Problem is that we’re talking about several TB of data, mostly as .psb files (ranging from about 1 up to 20GB each), and it goes without saying that opening them all in Photoshop is not an option; nor can you trust Adobe Bridge thumbnails (provided that you’ve set the preferences to render previews for big files too), because checking them is a manual process anyway. Even if I’m paid by the hour, staring at thumbs is not my preferred way to go blind.
After some research, I’ve found no way (other than the one I’m about to describe) to check for psd/psb file corruption in an automated fashion. Which seems quite odd to me – if you have better, i.e. faster and/or simpler, solutions, please do suggest them to me in the comments below.
What you need
Python 3, and ImageMagick. Both will work either on Mac or Windows: I’ve no experience of them on the latter platform, so I will just assume that you will be successful in following the installation instructions provided on the official home pages. PC owners: read at least the part about ImageMagick. Mac users: read it all.
If you’re on a Mac like me, you already have Python installed. Chances are that it is version 2.7, or at any rate not version 3. Open the Terminal and type:
$ python --version
I still get the old 2.7 version reported, even after having installed Python 3.6 myself via Homebrew. I’m no Python expert, so it took me some Google time to understand that on a Mac the python command can keep pointing to the System’s old 2.7 version, even if you’ve freshly installed the new one.
Solutions involve manually changing symlinks (power users advise against it), or using one of the available packages to create isolated Python environments (e.g. virtualenv, pyenv, etc.; a list of them is found here). I couldn’t make either of them work in a reasonable amount of time, so I’ve resorted to simply using the python3 command, e.g.
$ python3 --version
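Since the python command may still launch the old interpreter, a script can refuse to run under it. This is only a minimal sketch, assuming you prefer a hard stop to a confusing error later on; running_python3 is a hypothetical helper name, not part of the script shown further down.

```python
import sys


def running_python3(version_info=sys.version_info):
    """Return True when the major version of the interpreter is 3 or newer."""
    return version_info[0] >= 3


# Stop early, with a readable message, if launched with the old interpreter.
if not running_python3():
    sys.exit("Please run this script with python3")
```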
ImageMagick is a multiplatform, open source command-line utility that performs a huge number of tasks on all kinds of image files. I’ve installed it via Homebrew, but it turns out that, at least on the Mac, it doesn’t come by default with the proper Delegates (aka Libraries) to deal with .psb files, which is what I needed the most.
Finding the proper way to add them proved almost impossible to me: while reading the source code documentation (the last thing I wanted to do was to compile it from the source), I’ve discovered that via Homebrew you can list all the possible installation options for a package:
$ brew options imagemagick
Compile with FFTW support
Build with fontconfig support
Build with ghostscript support
Compile with HDRI support
Build with liblqr support
Build with librsvg support
Build with libwmf support
Build with little-cms support
Build with little-cms2 support
Compile with OpenCL support
Build with openexr support
Build with openjpeg support
Compile with OpenMP support
Build with pango support
Compile with PerlMagick
Build with webp support
Build with x11 support
Disables depending on XML configuration files
Build without freetype support
Build without jpeg support
Build without libpng support
Build without libtiff support
disable build/install of Magick++
Disable support for dynamically loadable modules
Disable threads support
Install HEAD version
So, after a first installation (without psb support), with no clear hint about the proper option(s) to use in my case, and even less spare time to test, I’ve chained them all – at least the seemingly appropriate ones, with little worry about being redundant. In any case, no one was watching me, nor would anyone ever have known 🙂 The embarrassing line I’ve used is:
brew reinstall imagemagick --with-fftw --with-fontconfig --with-ghostscript --with-hdri --with-libde265 --with-liblqr --with-librsvg --with-libwmf --with-little-cms --with-little-cms2 --with-opencl --with-openexr --with-openjpeg --with-openmp --with-pango --with-perl --with-webp --with-x11
It worked, so I was a happy camper.
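To see whether an installation knows about a given format, one option is to ask ImageMagick for its format list (identify -list format) and look for the name in the first column. The sketch below assumes the ImageMagick 6 style identify executable is on the PATH (on version 7 it may have to be called through magick); format_listed and imagemagick_supports are hypothetical helper names. Being listed only means the coder is present, so it’s no guarantee that every file will be readable.

```python
import shutil
import subprocess


def format_listed(listing, fmt):
    """Return True if fmt is a format name in `identify -list format` output."""
    wanted = fmt.upper()
    for line in listing.splitlines():
        fields = line.split()
        # The first column is the format name; ImageMagick appends '*' to
        # formats with native blob support, so strip it before comparing.
        if fields and fields[0].rstrip("*").upper() == wanted:
            return True
    return False


def imagemagick_supports(fmt, executable="identify"):
    """Ask the installed ImageMagick whether it lists the given format."""
    if shutil.which(executable) is None:
        return False  # ImageMagick not found on the PATH
    result = subprocess.run([executable, "-list", "format"],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return format_listed(result.stdout.decode("utf-8", errors="replace"), fmt)


if __name__ == "__main__":
    print("PSB listed:", imagemagick_supports("PSB"))
```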
The Python Script
It is far from perfect, but it does the job – I’m sure that a proper Python developer can make it much better: it comes from surgical copy&paste from various Google Search results, plus some very light editing on my side.
import os
from subprocess import Popen, PIPE

folderToCheck = '/Volumes/16TB/whatever/path'
fileExtension = '.psb'

def checkImage(fn):
    # Run ImageMagick's identify on the file and capture its output
    proc = Popen(['identify', '-verbose', fn], stdout=PIPE, stderr=PIPE)
    out, err = proc.communicate()
    exitcode = proc.returncode
    return exitcode, out, err

# Walk the folder (nested ones included) and check every matching file
for directory, subdirectories, files in os.walk(folderToCheck):
    for file in files:
        if file.endswith(fileExtension):
            filePath = os.path.join(directory, file)
            code, output, error = checkImage(filePath)
            # A non-zero exit code, or anything on stderr, counts as an error
            if code != 0 or str(error, "utf-8") != "":
                print("ERROR " + filePath)
            else:
                print("OK " + filePath)

print("-------------- DONE --------------")
How it works
The core is the identify call (which comes from ImageMagick), run with the -verbose option. This is what performs the check: the rest is just looping through the filesystem, looking for the appropriate file extension, and logging a message.
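The check itself boils down to two signals: the exit code of the command, and whether it wrote anything to stderr. Here is the same idea isolated in a small function, as a sketch rather than a replacement for the script; looks_corrupted and its command parameter are hypothetical, the latter added only so that the logic can be tried without ImageMagick. Note that warnings printed on stderr are treated as errors too.

```python
import subprocess
import sys


def looks_corrupted(path, command=("identify", "-verbose")):
    """Return True if the command fails or complains on stderr for this file."""
    result = subprocess.run(list(command) + [path],
                            stdout=subprocess.DEVNULL,  # the report is not needed
                            stderr=subprocess.PIPE)
    # Either signal alone is enough to flag the file.
    return result.returncode != 0 or result.stderr.strip() != b""


if __name__ == "__main__":
    # Usage: python3 this_file.py image1.psb image2.psb ...
    for image_path in sys.argv[1:]:
        print(("ERROR " if looks_corrupted(image_path) else "OK ") + image_path)
```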
How to use it
Save this in a file with a .py extension, and then run it with the python3 command in a terminal, e.g.
$ python3 check.py
Before doing so, do change the content of the folderToCheck variable to an actual folder on your disk (with an absolute path), and the fileExtension too: I’ve used .psb, but you can change it to .jpg, etc. As a result you’ll get a log in the Terminal; I’ve used a nifty, cheap application called Code Runner for such tests, and this is the result:
As you see, I’m just logging OK/ERROR with the path, very basic. What to do with this newly acquired piece of knowledge is up to you.
Please note that:
- The script processes nested folders too.
- It is awfully slow and hungry: it eats CPU cycles and RAM. But it’s automatic, so heck!
- The file extension is case sensitive, so ".JPG" is different from ".jpg".
How to make it better
A few suggestions for the skilled Python developer (which I’m not, alas):
- Write on a log file instead of the console
- Keep track of the processing status and resume from there
- Display the advancement status (say, “image 34 of 320”)
- Make the file extension case insensitive.
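For what it’s worth, here is one way the four points above could be approached; take it as a rough sketch under stated assumptions, not as a finished tool. The function names, the tab-separated log format, and the check parameter (any function returning True for a healthy file, e.g. one wrapping the identify call) are all hypothetical. Resuming simply means skipping every path already found in the log file.

```python
import os
import tempfile


def find_files(folder, extension):
    """Collect paths under folder whose extension matches, ignoring case."""
    wanted = extension.lower()
    found = []
    for directory, _subdirectories, files in os.walk(folder):
        for name in files:
            if name.lower().endswith(wanted):
                found.append(os.path.join(directory, name))
    return sorted(found)


def load_done(log_path):
    """Read the paths logged by a previous run, so that they can be skipped."""
    if not os.path.exists(log_path):
        return set()
    with open(log_path, encoding="utf-8") as log:
        # Each log line is "STATUS<TAB>path"
        return {line.rstrip("\n").split("\t", 1)[1] for line in log if "\t" in line}


def run_checks(folder, extension, log_path, check):
    """Check every matching file once, logging the result and the progress."""
    paths = find_files(folder, extension)
    done = load_done(log_path)
    with open(log_path, "a", encoding="utf-8") as log:
        for number, path in enumerate(paths, start=1):
            if path in done:
                continue  # already processed in an earlier run
            status = "OK" if check(path) else "ERROR"
            print("image {} of {}: {} {}".format(number, len(paths), status, path))
            log.write("{}\t{}\n".format(status, path))
            log.flush()  # keep the log up to date in case the run is interrupted


if __name__ == "__main__":
    # Demo on throwaway files, with a stand-in check that flags empty files.
    with tempfile.TemporaryDirectory() as demo_folder:
        for name, data in [("a.psb", b"x"), ("b.PSB", b""), ("c.txt", b"x")]:
            with open(os.path.join(demo_folder, name), "wb") as demo_file:
                demo_file.write(data)
        demo_log = os.path.join(demo_folder, "check.log")
        run_checks(demo_folder, ".psb", demo_log, lambda p: os.path.getsize(p) > 0)
```

Writing one line per file and flushing right away is what makes the resume step possible: if the run is interrupted, the next one picks up from whatever is already in the log.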
If you know how to do any of this, please share your knowledge in the comments! Thank you!