Time to look over the edge of this cliff.
The paper I went to Monterey to present is this one:
"File Fragment Encoding Classification: An Empirical Approach"
It is several parts critique of how the digital forensics community
has failed to approach the fragment identification problem in
a reasonable manner, a few parts suggestions, and a study of
all things DEFLATE (.docx/.xlsx/.png/.zip, etc.). The presentation
should be a little easier to follow, though since the demo
was live it is not included.
I wrote a tool (by fuzzing and fixing an open-source PNG decoder)
to gather the statistics in the paper, and reworked it into a
classifier for the conference. Its purpose is to classify
compressed data the way we can already classify data in clear,
easy file formats.
Its name is zsniff and it is on github. The tool works by
brute-force searching for tiny DEFLATE headers and Huffman code
tables in the input stream. It is not fast, but it works as
advertised, and it's three-platform portable (as long as your Mac
has a reasonable compiler).
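To make "brute-force searching" concrete, here is a minimal sketch in C
of the general idea: walk the fragment one bit at a time, try to read a
dynamic-Huffman block header at that offset, and keep only offsets whose
fields and code-length table look sane. The field layout and the order[]
table come straight from RFC 1951; the function names and the particular
sanity checks are illustrative and not the actual zsniff code.

    /* Sketch: scan every bit offset for a plausible DEFLATE
     * dynamic-Huffman block header.  Illustrative only. */
    #include <stdio.h>
    #include <stdint.h>

    /* Read n bits starting at bit offset pos, LSB-first as DEFLATE does. */
    static unsigned read_bits(const uint8_t *buf, size_t len, size_t pos, int n)
    {
        unsigned v = 0;
        for (int i = 0; i < n; i++) {
            size_t p = pos + i;
            if (p >= len * 8) return 0;              /* ran off the end */
            v |= ((buf[p >> 3] >> (p & 7)) & 1u) << i;
        }
        return v;
    }

    /* Does a dynamic-Huffman block header plausibly start at this bit? */
    static int plausible_dynamic_block(const uint8_t *buf, size_t len, size_t bit)
    {
        /* order in which code-length code lengths are stored (RFC 1951) */
        static const int order[19] =
            {16,17,18,0,8,7,9,6,10,5,11,4,12,3,13,2,14,1,15};

        unsigned btype = read_bits(buf, len, bit + 1, 2);   /* skip BFINAL */
        if (btype != 2) return 0;                           /* 2 = dynamic Huffman */

        unsigned hlit  = read_bits(buf, len, bit + 3, 5);   /* literal/length codes - 257 */
        unsigned hdist = read_bits(buf, len, bit + 8, 5);   /* distance codes - 1 */
        unsigned hclen = read_bits(buf, len, bit + 13, 4);  /* code-length codes - 4 */
        if (hlit > 29 || hdist > 29) return 0;  /* more codes than zlib will accept */

        /* Read the code-length code lengths and reject over-subscribed
         * tables (Kraft inequality).  A real inflater is stricter and
         * also rejects incomplete sets. */
        unsigned lens[19] = {0};
        size_t p = bit + 17;
        for (unsigned i = 0; i < hclen + 4; i++, p += 3)
            lens[order[i]] = read_bits(buf, len, p, 3);

        unsigned long kraft = 0;            /* sum of 2^(7 - len) over used symbols */
        int nonzero = 0;
        for (int i = 0; i < 19; i++)
            if (lens[i]) { kraft += 1ul << (7 - lens[i]); nonzero++; }
        return nonzero >= 2 && kraft <= 128;    /* 128 == 2^7: code can exist */
    }

    int main(void)
    {
        uint8_t buf[4096];
        size_t len = fread(buf, 1, sizeof buf, stdin);
        for (size_t bit = 0; bit + 17 < len * 8; bit++)
            if (plausible_dynamic_block(buf, len, bit))
                printf("candidate dynamic DEFLATE block at bit %zu\n", bit);
        return 0;
    }

In practice you would still hand surviving candidates to a real inflater
to see whether they actually decode, which is what makes the brute force
slow rather than the header scan itself.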
It identifies compressed text (XML-ish or plain or spreadsheets)
about 99% of the time so far. We can also separate compressed
executables from PNG about 81% of the time. I've noticed that
we can fairly definitively say "Not DEFLATE" for high-entropy
data as well, but that isn't baked into the tool.
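For the curious, that check would just be ordinary byte-level Shannon
entropy. The sketch below shows the calculation; the 7.99 bits/byte
cutoff is an illustrative guess, not a measured threshold, and none of
this is in zsniff yet.

    /* Byte-entropy "not DEFLATE" screen -- not in zsniff, just the
     * standard Shannon-entropy calculation one could bolt on. */
    #include <math.h>
    #include <stddef.h>
    #include <stdint.h>

    double byte_entropy(const uint8_t *buf, size_t len)
    {
        size_t counts[256] = {0};
        for (size_t i = 0; i < len; i++)
            counts[buf[i]]++;

        double h = 0.0;
        for (int i = 0; i < 256; i++) {
            if (!counts[i]) continue;
            double p = (double)counts[i] / (double)len;
            h -= p * log2(p);       /* Shannon entropy, bits per byte */
        }
        return h;
    }

    /* e.g. if (byte_entropy(fragment, fraglen) > 7.99) the fragment is
     * more likely random or encrypted than DEFLATE output, which pays a
     * small Huffman overhead and lands a bit below 8 bits/byte. */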
Side note: I am @candicenonsense on github and in theory I am
going to have time to put more code up there. Really. Not
looking forward to the impending svn->git migration for sdhash.
candice at September 29, 2013 09:29 PM