
May 31, 2010

shannon entropy toy

This last semester I found myself with two final electives to get my concentration done: the Math version of Cryptography, and Forensics. Aside from the scary math of AES (the finite field GF(2^8) can go to hell) and the bizarre world of quantum cryptography (the intro to quantum mechanics just about killed the entire CS department contingent in the class), it was cool. AES was even cool, just painful.

How this applies to entropy: we got told "Go do something neat for 40% of your grade, either write a paper or write some code." So I decided to write a little piece of code to calculate Shannon entropy, figuring I could use it later to tell packed binaries from unpacked ones.

Code: shannon.c, under a "don't steal this for your homework" license.

I hesitate to put up a link to the web frontend, because it was hastily written, but I got extra credit for it, which was cool. I also had to make a PowerPoint presentation full of background and results.

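Values go from 0 to 8 in theory (bits per byte, using log base 2), but I've not seen more than 5.5. That ceiling is about ln 256 ≈ 5.545, which suggests the numbers below are in natural-log units, where 5.545 is the maximum. UPX-packed malware looks almost as high-entropy as a text file encrypted with AES. Hand-rolled packing is a little lower. The first two samples come from Forensics Puzzle Contest #5; the rest are from email.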

UPX packed:
skirt% ../shannon file.exe\[1\].octet-stream
filename: file.exe[1].octet-stream
bytes in file: 68097.000000
entropy: 5.3945613760

Same file, unpacked, second layer less obfuscated:
skirt% ../shannon file.exe.octet-stream
filename: file.exe.octet-stream
bytes in file: 82433.000000
entropy: 5.0533276137

Random file, AES-encrypted:
filename: test2.bin
bytes in file: 4100801.000000
entropy: 5.5451381195

Partly encrypted sample of what is likely a Bredolab trojan:
filename: officexp-KB910721-FullFile-ENU.exe
bytes in file: 23553.000000
entropy: 3.3439501448

Another variant of the same family is much higher:
filename: DHL_invoice _2345.exe
bytes in file: 74241.000000
entropy: 4.1100030304

Plain text is predictably around 3.0, which matches what I remember from information theory. Things that come in lower than plain text, in the 1.0 range, are IDA Pro database save files, blank goat files, and the binary for the old virus Murkry.390 (it was homework in reverse engineering class last year). Some of the malware I collect from email attachments doesn't actually appear to be completely encrypted, which is odd, but I haven't broken this down to work on individual sections, which is really the next step here.

candice at May 31, 2010 03:59 PM

