This document provides an introduction to using ssdeep. The current version of this document can be found on the ssdeep web site at http://ssdeep.sourceforge.net/.
This guide starts with an explanation of the basic functions of ssdeep and then gives some examples of using fuzzy hashing in real-world situations.
Users running Microsoft Windows are strongly encouraged to download the precompiled binaries from http://ssdeep.sourceforge.net/. Please note that these binaries are created using a MinGW cross compiler. Compiling the programs directly on Windows is not supported.
If your operating system does not support the automatic installation methods described above, you will have to download the source code and compile the programs yourself. First, download the latest tarball of the program from http://ssdeep.sourceforge.net/. This file should be named something like ssdeep-2.4.tar.gz. Uncompress the file with the following command:
$ tar zxvf ssdeep-2.4.tar.gz
Change into the decompressed directory:
$ cd ssdeep-2.4
and configure the program:
$ ./configure
The configure script can accept lots of options. Run ./configure --help for the complete list. The most common option used is the prefix option, which installs the program in a location other than the default, /usr/local/bin. If you wanted to install the program elsewhere, for example, /tmp/ssdeep, you would run ./configure --prefix=/tmp/ssdeep instead.
You can now compile the program using the make command:
$ make
and install it:
$ make install
Note that you must be root on most operating systems to install the program to its default location, /usr/local/bin. The tool sudo may help:
$ sudo make install
C:\temp> ssdeep config.h INSTALL doc\README
ssdeep,1.0--blocksize:hash:hash,filename
96:KQhaGCVZGhr83h3bc0ok3892m12wzgnH5w2pw+sxNEI58:FIVkH4x73h39LH+2w+sxaD,"C:\temp\config.h"
96:MD9fHjsEuddrg31904l8bgx5ROg2MQZHZqpAlycowOsexbHDbk:MJwz/l2PqGqqbr2yk6pVgrwPV,"C:\temp\INSTALL"
96:EQOJvOl4ab3hhiNFXc4wwcweomr0cNJDBoqXjmAHKX8dEt001nfEhVIuX0dDcs:3mzpAsZpprbshfu3oujjdENdp21,"C:\temp\doc\README"
Notice how the above output shows the full path in the filename. You can have ssdeep print relative filenames instead of absolute ones. That is, it omits all path information except what was specified on the command line. To enable relative paths, use the -l flag. Repeating our first example with the -l flag:
C:\temp> ssdeep -l config.h INSTALL doc\README
ssdeep,1.0--blocksize:hash:hash,filename
96:KQhaGCVZGhr83h3bc0ok3892m12wzgnH5w2pw+sxNEI58:FIVkH4x73h39LH+2w+sxaD,"config.h"
96:MD9fHjsEuddrg31904l8bgx5ROg2MQZHZqpAlycowOsexbHDbk:MJwz/l2PqGqqbr2yk6pVgrwPV,"INSTALL"
96:EQOJvOl4ab3hhiNFXc4wwcweomr0cNJDBoqXjmAHKX8dEt001nfEhVIuX0dDcs:3mzpAsZpprbshfu3oujjdENdp21,"doc\README"
You can have ssdeep only print out the basename of each file it processes. That is, all directory information will be stripped off. To enable basename mode, use the -b flag:
C:\temp> ssdeep -b config.h INSTALL doc\README
ssdeep,1.0--blocksize:hash:hash,filename
96:KQhaGCVZGhr83h3bc0ok3892m12wzgnH5w2pw+sxNEI58:FIVkH4x73h39LH+2w+sxaD,"config.h"
96:MD9fHjsEuddrg31904l8bgx5ROg2MQZHZqpAlycowOsexbHDbk:MJwz/l2PqGqqbr2yk6pVgrwPV,"INSTALL"
96:EQOJvOl4ab3hhiNFXc4wwcweomr0cNJDBoqXjmAHKX8dEt001nfEhVIuX0dDcs:3mzpAsZpprbshfu3oujjdENdp21,"README"
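If you save a list of hashes and want to post-process it, the layout named in the header line is easy to pull apart. The following is a minimal sketch and not part of ssdeep itself: the function name list_signatures is made up for illustration, and it assumes the 1.0 layout shown above, where the first colon ends the block size and the first comma starts the quoted filename.

```shell
# Print "blocksize<TAB>filename" for every signature line read from stdin.
# Hypothetical helper; assumes the layout blocksize:hash:hash,"filename".
list_signatures() {
    awk '
        # Skip the header line and anything that does not start with a block size
        !/^[0-9]+:/ { next }
        {
            # The block size is everything before the first colon
            blocksize = substr($0, 1, index($0, ":") - 1)
            # The hashes use base64 characters, which never include a comma,
            # so the first comma ends the hashes and starts the filename
            name = substr($0, index($0, ",") + 1)
            # Drop the surrounding double quotes, if present
            gsub(/^"|"$/, "", name)
            print blocksize "\t" name
        }
    '
}
```

With the function loaded into a POSIX shell, one possible use is:
$ ssdeep -b config.h INSTALL > hashes.txt
$ list_signatures < hashes.txt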
C:\temp> ssdeep
ssdeep: No input files
Although some programs process standard input and thus allow you to pipe the output of other programs to them, ssdeep does not support this functionality. If an input file can't be found, an error message is normally printed. These, and all other error messages, can be suppressed by using the -s flag.
C:\temp> ssdeep doesnotexist.txt
ssdeep: C:\temp\doesnotexist.txt: No such file or directory
C:\temp> ssdeep -s doesnotexist.txt
C:\temp>
C:\temp> ssdeep *
ssdeep: C:\temp\backups Is a directory
ssdeep,1.0--blocksize:hash:hash,filename
96:KQhaGCVZGhr83h3bc0ok3892m12wzgnH5w2pw+sxNEI58:FIVkH4x73h39LH+2w+sxaD,"config.h"
ssdeep: C:\temp\www Is a directory
C:\temp> ssdeep -r *
ssdeep,1.0--blocksize:hash:hash,filename
768:McAQ8tPlH25e85Q2OiYpD08NvHmjJ97UfPMO47sekO:uN9M553OiiN/OJ9MM+e3,"C:\temp\backups\mystuff.zip"
384:bcEKuglk+GUYIk90a1lEF+Wfsy2solvW8mK1enQXP79:bmlFGUNk9L1roy4K1enQ,"C:\temp\backups\ssdeep.exe"
96:CFzROqsgconvv7uUo6jTcEGEvpVCN116S:CNVnqj8cMVCv16,"C:\temp\backups\foo.doc"
96:KQhaGCVZGhr83h3bc0ok3892m12wzgnH5w2pw+sxNEI58:FIVkH4x73h39LH+2w+sxaD,"config.h"
96:aN0jOc0WlWW+LWQnjv7ufGcE5ESr5YaZ6uicEDEO9VCN116Sb5EutkB:aSeoF+L/zqfGtfr5YiWcsVCv16W5htk,"C:\temp\www\index.html"
One of the more powerful features of ssdeep is the ability to match the hashes of input files against a list of known hashes. Because of the inexact nature of fuzzy hashing, a match reported by ssdeep does not by itself mean that the two files are related. You should examine every pair of matching files individually to see how well they correspond.
Here's a simple example of how ssdeep can match files that are not identical. We take an existing file, make a copy of it, and append a single character (plus the newline that echo adds) to the copy.
$ ls -l foo.txt
-rw-r--r-- 1 jessekor jessekor 240 Oct 25 08:01 foo.txt
$ cp foo.txt bar.txt
$ echo 1 >> bar.txt
A cryptographic hashing algorithm like MD5 can't be used to match these files; they have wildly different hashes.
$ md5deep foo.txt bar.txt
7b3e9e08ecc391f2da684dd784c5af7c /Users/jessekornblum/foo.txt
32436c952f0f4c53bea1dc955a081de4 /Users/jessekornblum/bar.txt
But fuzzy hashing can! We compute the fuzzy hash of one file and use the matching mode to match the other one.
$ ssdeep -b foo.txt > hashes.txt
$ ssdeep -bm hashes.txt bar.txt
bar.txt matches foo.txt (64)
The number at the end of the line is a match score, or a weighted measure of how similar these files are. The higher the number, the more similar the files.
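When a matching run produces many lines, you may want to look at the highest scores first. The following is a small sketch rather than an ssdeep feature: filter_matches is a hypothetical helper that reads saved matching-mode output and keeps only the lines whose score meets a minimum you choose. It assumes every match line ends with the score in parentheses, as in the example above.

```shell
# Keep only the match lines whose score is at least $1.
# Hypothetical helper; reads lines such as "bar.txt matches foo.txt (64)".
filter_matches() {
    awk -v min="$1" '
        # The score is the final whitespace-separated field, e.g. "(64)",
        # so filenames that contain spaces do not confuse the check
        $NF ~ /^\([0-9]+\)$/ {
            score = $NF
            gsub(/[()]/, "", score)
            if (score + 0 >= min + 0) print
        }
    '
}
```

For example, assuming the matches were saved to a file named matches.txt:
$ filter_matches 50 < matches.txt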
C:\> ssdeep -lr md5deep-1.12 > md5deep-hashes.txt
C:\> ssdeep -lrm md5deep-hashes.txt ssdeep-1.1
ssdeep-1.1\cycles.c matches md5deep-1.12\cycles.c (94)
ssdeep-1.1\dig.c matches md5deep-1.12\dig.c (35)
ssdeep-1.1\helpers.c matches md5deep-1.12\helpers.c (57)
Ta da! You can see that I reused code from the md5deep project when writing ssdeep.
Along with source code reuse, you can also use fuzzy hashing to find truncated files. Here's a sample using a fake filename. We'll compute the fuzzy hash for the file, make a copy that contains only the first 29% of the original, and then try to match the truncated version back to the original.
$ ls -lsh
-rwxr-xr-x 1 jvalenti users 699M Sep 29 2006 all-the-kings-men.avi
$ ssdeep -b all-the-kings-men.avi > sig.txt
$ cat sig.txt
ssdeep,1.0--blocksize:hash:hash,filename
12582912:fgQl/nUjQAbaBQvHf8yLr5CHJu3dyh+YJ27TuXyphJs3wHC6+rEfAV+wDrw6C/AT:fPl8cdAUyLr5CHJu3dyh8uzwHC6+reAS,"all-the-kings-men.avi"
$ dd if=all-the-kings-men.avi of=partial.avi bs=1m count=200
200+0 records in
200+0 records out
209715200 bytes transferred in 14.510224 secs (14452926 bytes/sec)
$ ls -lsh partial.avi
-rw-r--r-- 1 jvalenti users 200M Oct 6 06:40 partial.avi
$ ssdeep -bm sig.txt partial.avi
partial.avi matches all-the-kings-men.avi (57)
You can also compare many files without writing any hashes to disk, using two different methods. Let's say that we have a whole bunch of files in two or three directories and want to know which ones are similar to each other. We can use the -d mode to display these matches. This switch causes ssdeep to compute a fuzzy hash for each input file and compare it against all of the other input files.
In this example, we've gathered a whole bunch of Microsoft Word documents in the folders Incoming, Outgoing, and Trash. Rather than go through all of the documents, it would be nice to eliminate those that are substantially the same.
C:\temp> ssdeep -lrd Incoming Outgoing Trash
Incoming\Budget 2007.doc matches Outgoing\Corporate Espionage\Our Budget.doc (99)
Incoming\Salaries.doc matches Outgoing\Personnel Mayhem\Your Buddy Makes More Than You.doc (45)
Outgoing\Plan for Hostile Takeover.doc matches Trash\DO NOT DISTRIBUTE.doc (88)
Oh my!
The -p mode works similarly, but displays the results in a slightly nicer format. If two input files A and B match, the -d mode will only display "A matches B." The -p mode will display "A matches B," skip a line, and then display "B matches A." This greatly increases the length of the output, but can make files easier to find. Here's the above input again, this time using the -p flag.
C:\temp> ssdeep -lrp Incoming Outgoing Trash
Incoming\Budget 2007.doc matches Outgoing\Corporate Espionage\Our Budget.doc (99)

Incoming\Salaries.doc matches Outgoing\Personnel Mayhem\Your Buddy Makes More Than You.doc (45)

Outgoing\Corporate Espionage\Our Budget.doc matches Incoming\Budget 2007.doc (99)

Outgoing\Personnel Mayhem\Your Buddy Makes More Than You.doc matches Incoming\Salaries.doc (45)

Outgoing\Plan for Hostile Takeover.doc matches Trash\DO NOT DISTRIBUTE.doc (88)

Trash\DO NOT DISTRIBUTE.doc matches Outgoing\Plan for Hostile Takeover.doc (88)
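If you have saved -p output and later want each pair listed only once, the mirrored lines can be folded back together. This is a rough sketch, not something ssdeep provides: unique_pairs is a hypothetical helper that treats "A matches B" and "B matches A" as the same pair and keeps whichever line it sees first. It assumes no filename itself contains the text " matches ".

```shell
# Print each matching pair once, keeping the first line seen for that pair.
# Hypothetical helper; reads "A matches B (score)" lines from stdin.
unique_pairs() {
    awk '
        # Skip blank separator lines
        NF == 0 { next }
        {
            line = $0
            # Remove the trailing " (score)" so only the two names remain
            sub(/ \([0-9]+\)$/, "", line)
            # Split the names at the first " matches " (9 characters long)
            i = index(line, " matches ")
            if (i == 0) next
            a = substr(line, 1, i - 1)
            b = substr(line, i + 9)
            # Order the two names so both directions produce the same key
            key = (a < b) ? a SUBSEP b : b SUBSEP a
            if (!(key in seen)) { seen[key] = 1; print $0 }
        }
    '
}
```

For example, assuming the -p output was saved to a file named pretty.txt:
$ unique_pairs < pretty.txt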
After you've generated several files of fuzzy hashes you may wish to compare those signatures to each other. You can compare one or more files of signatures against each other using the -x flag.
$ ssdeep -r /etc > list1.txt
$ ssdeep -r /usr > list2.txt
$ ssdeep -lr ./known_malware > list3.txt
$ ssdeep -x list1.txt list2.txt list3.txt
list1:/etc/rcc.d/init.d matches list3:./known_malware/wlk_rootkit/dropper (86)
list3:./known_malware/wlk_rootkit/dropper matches list1:/etc/rcc.d/init.d (86)
The above method compares all of the signatures against each other. This can take some time, especially if the files are large. If you'd rather compare some unknown signatures against a set of known signatures, you can use the -k flag. Let's say you have some signatures for malicious programs, badfiles.txt and worsefiles.txt. You then compute the fuzzy hashes for programs on some workstations, which are saved to comp1.txt, comp2.txt, and comp3.txt. You can compare these unknowns to the knowns like this:
C:\> ssdeep -k badfiles.txt -k worsefiles.txt comp1.txt comp2.txt comp3.txt
comp1.txt:WINWORD2.EXE matches badfiles.txt:some_trojan.exe (84)
comp3.txt:ntoskrrnl.exe matches worsefiles.txt:delete_all_data.exe (77)
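When many workstations are checked at once, a quick tally of hits per signature file can show where to look first. Here is a minimal sketch under stated assumptions: hits_per_list is a hypothetical helper, not an ssdeep option, and it assumes each match line begins with the name of the unknown signature file followed by a colon, as in the sample output above.

```shell
# Count match lines per unknown signature file and print "name count".
# Hypothetical helper; reads saved -k output from stdin.
hits_per_list() {
    awk '
        / matches / {
            # The list name is everything before the first colon
            list = substr($0, 1, index($0, ":") - 1)
            count[list]++
        }
        # awk does not promise any order for array keys, so sort afterwards
        END { for (l in count) print l, count[l] }
    ' | sort
}
```

For example, assuming the -k output was saved to a file named hits.txt:
$ hits_per_list < hits.txt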