Monday, March 09, 2009

Reverse Engineering / Bug hunting trainings in Amsterdam

Hey all,

I haven't given a reverse engineering trainings class in Amsterdam for a few years, but this year is different :-) -- I will be at BH Amsterdam, and there are still seats open in the trainings class for April 14th and 15th.

What will be done in the course ? Well, for one thing, we'll go bug-hunting in some interesting piece of code. Furthermore, we'll talk quite a bit about C++ and it's effects in the binary. We'll do a fair bit of differential debugging, some more bug-hunting, and a lot of IDA automation. Questions like
  • given a C++ executable, how do I recover an inheritance diagram of the classes ?
  • given a big and ugly executable, how do I find the interesting places to focus on ?
  • how do I make sure IDAPython and NaviPython make my life easier ?
will be treated thoroughly.

So, if you still have some trainings/travel budget left in spite of the crisis, you can find more
details here.

Wednesday, March 04, 2009

Diffing x86 vs ARM code

I posted a while ago about the new DiffDeluxe comparison engine, and that we'd release it in Q1 2009. Well, we're almost there, the engine is now in beta. If you are a BinDiff user and wish to give the new engine a try, send mail to info@zynamics.com :-)

I mentioned in my last post on the topic that DiffDeluxe was designed to facilitate symbol porting, and to allow comparisons between executables that are "far away" from each other.

In the last post I wrote about Mozilla JS engine vs. Acrobat EScript.dll. Today I am going to try something slightly crazier: In order to evaluate how well these matching algorithms work, we will be diffing an executable that was compiled for ARM against a very similar executable compiled for x86.

My coworker Vincenzo is a big fan of all things OSX, and he brought up the idea of comparing x86 and ARM versions of the OSX dynamic loader -- namely the disassembly of dyld on the iphone against the disassembly of dyld on OSX.

Now, the first voices are going to yell: "You have names for all functions, BinDiffing is easy then!". Well, true, but we will run DiffDeluxe without taking the names into account, and then just using the names to validate the results.

The two executables have 704 (x86) and 618 (ARM) functions respectively. Without name
matching, we match 345 functions. Inspecting the symbols, we see that we have matched
160 of these functions in full accordance with the symbols. Let's have a look at some of the details:
Cute, eh ? Let's look at some more...
It is almost surprising how far one can get without actually looking at the instruction semantics.

If we take the names into account, matching functions becomes easy, but matching basic blocks properly ends up the difficulty. With name matching enabled, DiffDeluxe matches 3809 basic blocks, out of 7904 respective 5196.

So to summarize: The structural comparison is sufficiently strong to yield some useful results even accross two different CPUs. While there is still (a good amount) of room for improvement, I am quite happy with these results so far :-)

So, if you want to beta, and you already use BinDiff, drop us a line !