Getting Started with ssdeep

This document provides an introduction to using ssdeep and was last updated on The current version of this document can be found on the ssdeep web site at http://ssdeep.sourceforge.net/.

This guide starts with an explanation of the basic functions of ssdeep and then gives some examples of using fuzzy hashing in real world situations.


Installing ssdeep

Microsoft Windows

Users running Microsoft Windows are strongly encouraged to download the precompiled binaries from http://ssdeep.sourceforge.net/. Please note that these binaries are created using a MinGw cross compiler. Compiling the programs directly from Windows is not supported.

Automatic Installation

Before you try to install ssdeep manually, see if your operating system support s the programs via an automatic installation method. Some operating systems that provide this feature for ssdeep are:

Linux: Ubuntu, Debian

Manual installation

If your operating system does not support the automatic installation methods described above, you will have to download the source code and compile the programs yourself. First download the latest tarball of the program from http://ssdeep.sourceforge.net/. This file should be named something like ssdeep-2.2.tar.gz. Uncompress the file with the following command:

    $ tar zxvf ssdeep-2.4.tar.gz 

Change into the decompressed directory

    $ cd ssdeep-2.4 
and configure the program.
    $ ./configure 
The configure script can accept lots of options. Run ./configure --help for the complete list. The most common option used is the prefix option which installs the program in a location other than the default, /usr/local/bin. If you wanted to install the program elsewhere, for example, /tmp/ssdeep, you would run ./configure --prefix=/tmp/ssdeep instead.

You can now compile the program using the make command:

    $ make 
and install it:
    $ make install 
Note that you must be root on most operating systems to install the program to its default location, /usr/local/bin. The tool sudo may help:
    $ sudo make install 

Basic Operation

By default, ssdeep generates context triggered piecewise hashes, or fuzzy hashes, for each input file. The output is proceeded by a file header.
C:\temp> ssdeep config.h INSTALL doc\README
ssdeep,1.0--blocksize:hash:hash,filename
96:KQhaGCVZGhr83h3bc0ok3892m12wzgnH5w2pw+sxNEI58:FIVkH4x73h39LH+2w+sxaD,"C:\temp\config.h"
96:MD9fHjsEuddrg31904l8bgx5ROg2MQZHZqpAlycowOsexbHDbk:MJwz/l2PqGqqbr2yk6pVgrwPV,"C:\temp\INSTALL"
96:EQOJvOl4ab3hhiNFXc4wwcweomr0cNJDBoqXjmAHKX8dEt001nfEhVIuX0dDcs:3mzpAsZpprbshfu3oujjdENdp21,"C:\temp\doc\README"
Notice how the above output shows the full path in the filename. You can have ssdeep print relative filenames instead of absolute ones. That is, omit all of the path information except that specified on the command line. To enable relative paths, use the -l flag. Repeating our first example with the -l flag:
C:\temp> ssdeep -l config.h INSTALL doc\README
ssdeep,1.0--blocksize:hash:hash,filename
96:KQhaGCVZGhr83h3bc0ok3892m12wzgnH5w2pw+sxNEI58:FIVkH4x73h39LH+2w+sxaD,"config.h"
96:MD9fHjsEuddrg31904l8bgx5ROg2MQZHZqpAlycowOsexbHDbk:MJwz/l2PqGqqbr2yk6pVgrwPV,"INSTALL"
96:EQOJvOl4ab3hhiNFXc4wwcweomr0cNJDBoqXjmAHKX8dEt001nfEhVIuX0dDcs:3mzpAsZpprbshfu3oujjdENdp21,"doc\README"
You can have ssdeep only print out the basename of each file it processes. That is, all directory information will be stripped off. To enable basename mode, use the -b flag:
C:\temp> ssdeep -b config.h INSTALL \doc\README
ssdeep,1.0--blocksize:hash:hash,filename
96:KQhaGCVZGhr83h3bc0ok3892m12wzgnH5w2pw+sxNEI58:FIVkH4x73h39LH+2w+sxaD,"config.h"
96:MD9fHjsEuddrg31904l8bgx5ROg2MQZHZqpAlycowOsexbHDbk:MJwz/l2PqGqqbr2yk6pVgrwPV,"INSTALL"
96:EQOJvOl4ab3hhiNFXc4wwcweomr0cNJDBoqXjmAHKX8dEt001nfEhVIuX0dDcs:3mzpAsZpprbshfu3oujjdENdp21,"README"

Error messages

If no input files are specified, an error is displayed.
C:\temp> ssdeep 
ssdeep: No input files
Although some programs process standard input and thus allow you to pipe the output of other programs to them, ssdeep does not support this functionality. If an input file can't be found, an error message is normally printed. These, and all other error messages, can be surpressed by using the -s flag.
C:\temp> ssdeep doesnotexist.txt
ssdeep: C:\temp\doesnotexist.txt: No such file or directory
C:\temp> ssdeep -s doesnotexist.txt
C:\temp> 

Recursive Mode

Normally, attempting to process a directory will generate an error message. Under recursive mode, ssdeep will hash files in the current directory and file in subdirectories. Recursive mode is activated by using the -r flag.
C:\temp> ssdeep *
ssdeep: C:\temp\backups Is a directory
ssdeep,1.0--blocksize:hash:hash,filename
96:KQhaGCVZGhr83h3bc0ok3892m12wzgnH5w2pw+sxNEI58:FIVkH4x73h39LH+2w+sxaD,"config.h"
ssdeep: C:\temp\www Is a directory

C:\temp> ssdeep -r *
ssdeep,1.0--blocksize:hash:hash,filename
768:McAQ8tPlH25e85Q2OiYpD08NvHmjJ97UfPMO47sekO:uN9M553OiiN/OJ9MM+e3,"C:\temp\backups\mystuff.zip"
384:bcEKuglk+GUYIk90a1lEF+Wfsy2solvW8mK1enQXP79:bmlFGUNk9L1roy4K1enQ,"C:\temp\backups\ssdeep.exe"
96:CFzROqsgconvv7uUo6jTcEGEvpVCN116S:CNVnqj8cMVCv16,"C:\temp\backups\foo.doc"
96:KQhaGCVZGhr83h3bc0ok3892m12wzgnH5w2pw+sxNEI58:FIVkH4x73h39LH+2w+sxaD,"config.h"
96:aN0jOc0WlWW+LWQnjv7ufGcE5ESr5YaZ6uicEDEO9VCN116Sb5EutkB:aSeoF+L/zqfGtfr5YiWcsVCv16W5htk,"C:temp\www\index.html"

Matching mode

One of the more powerful features of ssdeep is the ability to match the hashes of input files against a list of known hashes. Because of inexact nature of fuzzy hashing, note that just because ssdeep indicates that two files match, it does not mean that those files are related. You should examine every pair of matching files individually to see how well they correspond.

Here's a simple example of how ssdeep can match files that are not identical. We take an existing file, make a copy of it, and append a single character to it.

$ ls -l foo.txt
-rw-r--r--   1 jessekor  jessekor  240 Oct 25 08:01 foo.txt

$ cp foo.txt bar.txt
$ echo 1 >> bar.txt

A cryptographic hashing algorithm like MD5 can't be used to match these files; they have wildly different hashes.

$ md5deep foo.txt bar.txt
7b3e9e08ecc391f2da684dd784c5af7c  /Users/jessekornblum/foo.txt
32436c952f0f4c53bea1dc955a081de4  /Users/jessekornblum/bar.txt

But fuzzy hashing can! We compute the fuzzy hash of one file and use the matching mode to match the other one.

$ ssdeep -b foo.txt > hashes.txt
$ ssdeep -bm hashes.txt bar.txt
bar.txt matches foo.txt (64)
The number at the end of the line is a match score, or a weighted measure of how similar these files are. The higher the number, the more similar the files.

Source Code Reuse

As a more practical example of ssdeep's matching functionality, you can use ssdeep's matching mode to help find source code reuse. Let's say we have two folders, ssdeep-1.1 and md5deep-1.12 that contain the source code for each of those tools. You can compare their contents by computing fuzzy hashes for one tree and then comparing them against the other:
C:\> ssdeep -lr md5deep-1.12 > md5deep-hashes.txt

C:\>ssdeep -lrm md5deep-hashes.txt ssdeep-1.1
ssdeep-1.1\cycles.c matches md5deep-1.12\cycles.c (94)
ssdeep-1.1\dig.c matches md5deep-1.12\dig.c (35)
ssdeep-1.1\helpers.c matches md5deep-1.12\helpers.c (57)
Ta da! You can see that I reused code from the md5deep project when writing ssdeep.

Truncated Files

Along with source code reuse, you can also use fuzzy hashing to find truncated files. Here's a sample using a fake filename. We'll compute the fuzzy hash for the file, make a copy that contains only the first 29% of the original, and then try to match the truncated version back to the original.

$ ls -lsh
-rwxr-xr-x   1 jvalenti  users  699M Sep 29  2006 all-the-kings-men.avi

$ ssdeep -b all-the-kings-men.avi > sig.txt

$ cat sig.txt 
ssdeep,1.0--blocksize:hash:hash,filename
12582912:fgQl/nUjQAbaBQvHf8yLr5CHJu3dyh+YJ27TuXyphJs3wHC6+rEfAV+wDrw6C/AT:fPl8cdAUyLr5CHJu3dyh8uzwHC6+reAS,"all-the-kings-men.avi"

$ dd if=all-the-kings-men.avi of=partial.avi bs=1m count=200 
200+0 records in
200+0 records out
209715200 bytes transferred in 14.510224 secs (14452926 bytes/sec)

$ ls -lsh partial.avi 
-rw-r--r--   1 jvalenti  users       200M Oct  6 06:40 partial.avi

$ ssdeep -bm sig.txt partial.avi 
partial.avi matches all-the-kings-men.avi (57)

Needles in a Haystack

You can also compare many without writing out any hashes to the disk using two different methods. Let's say that we have a whole bunch of files in two or three directories and want to know which ones are similar to each other. We can use the -d mode to display these matches. The switch causes ssdeep to compute a fuzzy hash for each input file and compare it against all of the other input files.

In this example, we've gathered a whole bunch of Microsoft Word documents in the folders Incoming, Outgoing, and Trash. Rather than go through all of the documents, it would be nice to eliminate those are substantially the same.

C:\temp> ssdeep -lrd Incoming Outgoing Trash
Incoming\Budget 2007.doc matches Outgoing\Corporate Espionage\Our Budget.doc (99)
Incoming\Salaries.doc matches Outgoing\Personnel Mayhem\Your Buddy Makes More Than You.doc (45)
Outgoing\Plan for Hostile Takeover.doc matches Trash\DO NOT DISTRIBUTE.doc (88)
Oh my!

The -p mode works similarly, but displays the results in a slightly nicer format. If there are two input files A and B that match, the -d mode will only display that "A matches B." The -p mode will display that "A matches B," skips a line, and then "B matches A." This greatly increases the length of the output, but can make files easier to find. Here's the above input again, this time using the -p flag.

C:\temp> ssdeep -lrp Incoming Outgoing Trash
Incoming\Budget 2007.doc matches Outgoing\Corporate Espionage\Our Budget.doc (99)

Incoming\Salaries.doc matches Outgoing\Personnel Mayhem\Your Buddy Makes More Than You.doc  (45)

Outgoing\Corporate Espionage\Our Budget.doc matches Incoming\Budget 2007.doc (99)

Outgoing\Personnel Mayhem\Your Buddy Makes More Than You.doc matches Incoming\Salaries.doc (45)

Outgoing\Plan for Hostile Takeover.doc matches Trash\DO NOT DISTRIBUTE.doc (88)

Trash\DO NOT DISTRIBUTE.doc matches Outgoing\Plan for Hostile Takeover.doc (88)

Comparing Files of Signatures

After you've generated several files of fuzzy hashes you may wish to compare those signatures to each other. You can compare one or more files of signatures against each other using the -x flag.

$ ssdeep -r /etc > list1.txt
$ ssdeep -r /usr > list2.txt
$ ssdeep -lr ./known_malware > list3.txt
$ ssdeep -x list1.txt list2.txt list3.txt

list1:/etc/rcc.d/init.d matches list3:./known_malware/wlk_rootkit/dropper (86)

list3:./known_malware/wlk_rootkit/dropper matches list1:/etc/rcc.d/init.d (86)

The above method compares all of the signatures against each other. This can take some time, especially if the files are large. If you'd rather compare some unknown signatures against a set of known signatures, you can use the -k flag. Let's say you have some signatures for malicious programs, badfiles.txt and worsefiles.txt. You then compute the fuzzy hashes for programs on some workstations, which are saved to comp1.txt, comp2.txt, and comp3.txt. You can compare these unknowns to the knows like this:

C:\> ssdeep -k badfiles.txt -k worsefiles.txt comp1.txt comp2.txt comp3.txt

comp1.txt:WINWORD2.EXE matches badfiles.txt:some_trojan.exe (84)

comp3:txt:ntoskrrnl.exe matches worsefiles.txt:delete_all_data.exe (77)

SourceForge.net Logo