
fuzzy.h File Reference

These functions allow a programmer to compute the fuzzy hashes (also called context-triggered piecewise hashes) of a buffer of text, the contents of a file on disk, and the contents of an open file handle. There is also a function to compute the similarity between any two fuzzy signatures.

#include <inttypes.h>


Defines

#define FUZZY_MAX_RESULT   (SPAMSUM_LENGTH + (SPAMSUM_LENGTH/2 + 20))
 The longest possible length for a fuzzy hash signature (without the filename).
#define SPAMSUM_LENGTH   64
 Length of an individual fuzzy hash signature component.
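
Expanding the macro gives 64 + (64/2 + 20) = 116. The sketch below repeats the two definitions from this page only so that it compiles on its own (real code would take them from fuzzy.h) and checks the arithmetic at compile time. The helper name `fuzzy_result_size` is hypothetical and not part of the API.

```c
#include <stddef.h>

/* Repeated from the definitions above so the sketch is self-contained;
 * real code would include fuzzy.h instead. */
#ifndef SPAMSUM_LENGTH
#define SPAMSUM_LENGTH 64
#endif
#ifndef FUZZY_MAX_RESULT
#define FUZZY_MAX_RESULT (SPAMSUM_LENGTH + (SPAMSUM_LENGTH/2 + 20))
#endif

/* Compile-time check of the arithmetic: 64 + (64/2 + 20) = 116 (C11). */
_Static_assert(FUZZY_MAX_RESULT == 116, "FUZZY_MAX_RESULT should expand to 116");

/* Hypothetical helper: minimum size of a buffer passed as `result`. */
static size_t fuzzy_result_size(void)
{
    return (size_t)FUZZY_MAX_RESULT;
}
```

A caller could then declare `char result[FUZZY_MAX_RESULT];` before calling any of the hashing functions.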

Functions

int fuzzy_hash_buf (const unsigned char *buf, uint32_t buf_len, char *result)
 Compute the fuzzy hash of a buffer.
int fuzzy_hash_file (FILE *handle, char *result)
 Compute the fuzzy hash of a file using an open handle.
int fuzzy_hash_filename (const char *filename, char *result)
 Compute the fuzzy hash of a file.
int fuzzy_compare (const char *sig1, const char *sig2)
 Compute the match score between two fuzzy hash signatures.

Detailed Description

These functions allow a programmer to compute the fuzzy hashes (also called context-triggered piecewise hashes) of a buffer of text, the contents of a file on disk, and the contents of an open file handle. There is also a function to compute the similarity between any two fuzzy signatures.


Function Documentation

int fuzzy_compare (const char *sig1, const char *sig2)

Computes the match score between two fuzzy hash signatures.

Returns:
Returns a value from zero to 100 indicating the match score of the two signatures. A match score of zero indicates the signatures did not match. When an error occurs, such as when one of the inputs is NULL, returns -1.
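
The three documented outcomes of fuzzy_compare (an error, no match, or a score up to 100) can be told apart with a small helper. This is a minimal sketch; the names `match_kind` and `classify_score` are hypothetical, and treating any value outside the documented range as an error is an assumption.

```c
/* Hypothetical labels for the three documented outcomes of fuzzy_compare(). */
enum match_kind { MATCH_ERROR, MATCH_NONE, MATCH_SOME };

/* Classify a score returned by fuzzy_compare():
 *   -1       error, for example a NULL input
 *    0       the signatures did not match
 *    1..100  some degree of similarity
 * Values outside the documented range are treated as errors (assumption). */
static enum match_kind classify_score(int score)
{
    if (score < 0 || score > 100)
        return MATCH_ERROR;
    return score == 0 ? MATCH_NONE : MATCH_SOME;
}
```

With the library linked, a call might look like `classify_score(fuzzy_compare(sig1, sig2))`.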

int fuzzy_hash_buf (const unsigned char *buf, uint32_t buf_len, char *result)

Compute the fuzzy hash of a buffer.

Computes the fuzzy hash of the first buf_len bytes of the buffer. It is the caller's responsibility to append the filename, if any, to result after computation.

Parameters:
buf The data to be fuzzy hashed.
buf_len The length, in bytes, of the data being hashed.
result Where the fuzzy hash of buf is stored. This variable must be allocated to hold at least FUZZY_MAX_RESULT bytes.
Returns:
Returns zero on success, non-zero on error.
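
Because buf_len is a uint32_t, a caller that holds its length as a size_t may want to reject values that do not fit before calling. The following is a minimal sketch under that assumption. The names `hash_buf_fn`, `hash_buf_checked`, and `stub_hash_buf` are hypothetical; the function pointer stands in for fuzzy_hash_buf so that the sketch compiles without the library, and the stub produces no real hash.

```c
#include <stddef.h>
#include <stdint.h>

/* Signature of fuzzy_hash_buf() as listed on this page, expressed as a
 * function pointer so the sketch does not need to link the library. */
typedef int (*hash_buf_fn)(const unsigned char *buf, uint32_t buf_len,
                           char *result);

/* Hypothetical wrapper: accept a size_t length, refuse anything that does
 * not fit in the uint32_t parameter, then forward to the hashing function.
 * Returns -1 for rejected arguments, otherwise whatever `hash` returns. */
static int hash_buf_checked(hash_buf_fn hash, const unsigned char *buf,
                            size_t len, char *result)
{
    if (hash == NULL || buf == NULL || result == NULL)
        return -1;
    if (len > UINT32_MAX)
        return -1;      /* the cast below would silently truncate */
    return hash(buf, (uint32_t)len, result);
}

/* Stand-in for fuzzy_hash_buf(), used only to exercise the wrapper:
 * it stores an empty string instead of a real fuzzy hash. */
static int stub_hash_buf(const unsigned char *buf, uint32_t buf_len,
                         char *result)
{
    (void)buf;
    (void)buf_len;
    result[0] = '\0';
    return 0;
}
```

In real use, fuzzy_hash_buf would be passed in place of the stub, with `result` sized to at least FUZZY_MAX_RESULT bytes.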

int fuzzy_hash_file (FILE *handle, char *result)

Compute the fuzzy hash of a file using an open handle.

Computes the fuzzy hash of the contents of the open file, starting at the beginning of the file. When finished, the file pointer is returned to its original position. If an error occurs, the file pointer's value is undefined. It is the caller's responsibility to append the filename to the result after computation.

Parameters:
handle Open handle to the file to be hashed
result Where the fuzzy hash of the file is stored. This variable must be allocated to hold at least FUZZY_MAX_RESULT bytes.
Returns:
Returns zero on success, non-zero on error.
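
Since the file position is undefined after an error, a caller that keeps using the handle may want to record the position first and seek back on failure. The following is a minimal sketch of that idea, not part of the API. The names `hash_file_fn`, `hash_file_keep_pos`, and `stub_failing_hash_file` are hypothetical; the function pointer stands in for fuzzy_hash_file, and the stub only simulates a failure.

```c
#include <stdio.h>

/* Signature of fuzzy_hash_file() as listed on this page, expressed as a
 * function pointer so the sketch does not need to link the library. */
typedef int (*hash_file_fn)(FILE *handle, char *result);

/* Hypothetical wrapper: remember the stream position, run the hash, and on
 * failure try to seek back, because the position is undefined after an
 * error. Returns 0 on success and non-zero otherwise. */
static int hash_file_keep_pos(hash_file_fn hash, FILE *handle, char *result)
{
    long pos;
    int rc;

    if (hash == NULL || handle == NULL || result == NULL)
        return -1;
    pos = ftell(handle);
    if (pos < 0)
        return -1;      /* stream is not seekable, or another error */
    rc = hash(handle, result);
    if (rc != 0)
        (void)fseek(handle, pos, SEEK_SET);   /* best-effort restore */
    return rc;
}

/* Stand-in for fuzzy_hash_file() that moves the file position and then
 * reports failure; used only to exercise the wrapper. */
static int stub_failing_hash_file(FILE *handle, char *result)
{
    (void)result;
    (void)fseek(handle, 0L, SEEK_END);
    return 1;
}
```

In real use, fuzzy_hash_file would be passed in place of the stub. Note that ftell reports the position as a long, so this approach is limited to offsets that fit in that type.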

int fuzzy_hash_filename (const char *filename, char *result)

Compute the fuzzy hash of a file.

Opens, reads, and hashes the contents of the file 'filename'. The result must be allocated to hold FUZZY_MAX_RESULT characters. It is the caller's responsibility to append the filename to the result after computation.

Parameters:
filename The file to be hashed
result Where the fuzzy hash of the file is stored. This variable must be allocated to hold at least FUZZY_MAX_RESULT bytes.
Returns:
Returns zero on success, non-zero on error.
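
The hashing functions leave it to the caller to append the filename. One way to do that is to format the signature and the filename into a separate, larger buffer, since a buffer of only FUZZY_MAX_RESULT bytes has no room reserved for a name. The following is a minimal sketch; the helper name `format_sig_line` is hypothetical, and the `signature,"filename"` layout is an assumption, as this page does not specify a format.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write  signature,"filename"  into dst.
 * The comma-and-quotes layout is an assumption; this page only states
 * that the caller appends the filename.
 * Returns 0 on success, -1 if an argument is NULL or dst is too small. */
static int format_sig_line(char *dst, size_t dst_size,
                           const char *signature, const char *filename)
{
    int n;

    if (dst == NULL || signature == NULL || filename == NULL)
        return -1;
    n = snprintf(dst, dst_size, "%s,\"%s\"", signature, filename);
    if (n < 0 || (size_t)n >= dst_size)
        return -1;      /* encoding error or truncated output */
    return 0;
}
```

A caller would pass the `result` buffer filled in by fuzzy_hash_filename as `signature`, and size `dst` to hold FUZZY_MAX_RESULT bytes plus the filename, the separator, the quotes, and the terminating NUL.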