Windows users can install the Python binding of Capstone either from the Windows installer or from our PyPI package capstone-windows. Note that the package already includes the prebuilt libraries (for both Win32 and Win64 editions), so there is no need to install the core separately.
In essence, a disassembler is the exact opposite of an assembler. Where an assembler converts code written in an assembly language into binary machine code, a disassembler reverses the process and attempts to recreate the assembly code from the binary machine code.
Since most assembly languages have a one-to-one correspondence with the underlying machine instructions, the process of disassembly is relatively straightforward, and a basic disassembler can often be implemented simply by reading in bytes and performing a table lookup. Of course, disassembly has its own problems and pitfalls, and they are covered later in this chapter.
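The "read bytes, look them up in a table" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table below is a hand-picked handful of one-byte x86 opcodes, and the names OPCODES and disassemble are made up for this example. A real disassembler needs a far larger table plus handling for prefixes and operand encodings.

```python
# Toy opcode table (an assumption for illustration, not a full x86 table).
# Each entry maps an opcode byte to (mnemonic, number of immediate bytes).
OPCODES = {
    0x90: ("nop", 0),
    0xC3: ("ret", 0),
    0xCC: ("int3", 0),
    0x6A: ("push", 1),      # push imm8
    0xB8: ("mov eax,", 4),  # mov eax, imm32
}


def disassemble(code):
    """Return a list of (offset, text) pairs for the bytes in `code`."""
    out = []
    pos = 0
    while pos < len(code):
        opcode = code[pos]
        entry = OPCODES.get(opcode)
        # Unknown opcode or truncated operand: fall back to a raw data byte.
        if entry is None or pos + 1 + entry[1] > len(code):
            out.append((pos, "db 0x%02X" % opcode))
            pos += 1
            continue
        mnemonic, imm_len = entry
        if imm_len:
            # x86 immediates are stored little-endian.
            imm = int.from_bytes(code[pos + 1:pos + 1 + imm_len], "little")
            text = "%s 0x%X" % (mnemonic, imm)
        else:
            text = mnemonic
        out.append((pos, text))
        pos += 1 + imm_len
    return out


if __name__ == "__main__":
    sample = bytes([0xB8, 0x2A, 0x00, 0x00, 0x00, 0x6A, 0x01, 0x90, 0xC3])
    for offset, text in disassemble(sample):
        print("%04X  %s" % (offset, text))
```

Bytes that are not in the table are emitted as db directives rather than guessed at, which is a common fallback when a decoder cannot make sense of its input.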
Many disassemblers have the option to output assembly language instructions in Intel, AT&T, or (occasionally) HLA syntax. Examples in this book will use Intel and AT&T syntax interchangeably. We will typically not use HLA syntax for code examples, but that may change in the future.
Here we are going to list some commonly available disassembler tools. Notice that there are professional disassemblers (which cost money for a license) and there are freeware/shareware disassemblers. Each disassembler will have different features, so it is up to you as the reader to determine which tools you prefer to use.
Many of the Unix disassemblers, especially the open source ones, have been ported to other platforms, like Windows (mostly using MinGW or Cygwin). Some disassemblers, like otool (OS X), are specific to a single platform.
Since data and instructions are all stored in an executable as binary data, the obvious question arises: how can a disassembler tell code from data? Is any given byte a variable, or part of an instruction?
Many interactive disassemblers will give the user the option to render segments of a program as either code or data, but non-interactive disassemblers will make the separation automatically. Disassemblers will often provide the instruction and the corresponding hex data on the same line, shifting the burden of deciding what the bytes really are to the user. Some disassemblers (e.g. ciasdis) will allow you to specify rules about whether to disassemble as data or code, and to invent label names, based on the content of the object under scrutiny. Scripting your own "crawler" in this way is more efficient; for large programs, interactive disassembling may be impractical to the point of being infeasible.
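A minimal sketch of both ideas, assuming a toy three-entry opcode table and a hypothetical listing function (neither is taken from ciasdis or any other real tool): each output line shows the offset, the raw hex bytes and the decoded text, and the caller can pass simple rules marking offset ranges as data.

```python
# Toy opcode table (an assumption): opcode -> (mnemonic, immediate bytes).
OPCODES = {0x90: ("nop", 0), 0xC3: ("ret", 0), 0xB8: ("mov eax,", 4)}


def listing(code, data_ranges=()):
    """Return listing lines of the form 'offset  hex-bytes  text'.

    data_ranges holds half-open (start, end) offset pairs that the user
    has marked as data; bytes inside them are never decoded as code.
    """
    def is_data(pos):
        return any(start <= pos < end for start, end in data_ranges)

    lines = []
    pos = 0
    while pos < len(code):
        entry = None if is_data(pos) else OPCODES.get(code[pos])
        size = 1 + entry[1] if entry else 1
        # An instruction may not run past the end or into a data range.
        if entry and (pos + size > len(code)
                      or any(is_data(p) for p in range(pos, pos + size))):
            entry, size = None, 1
        raw = code[pos:pos + size]
        if entry:
            mnemonic, imm_len = entry
            text = mnemonic
            if imm_len:
                text += " 0x%X" % int.from_bytes(raw[1:], "little")
        else:
            text = "db 0x%02X" % raw[0]
        hex_bytes = " ".join("%02X" % b for b in raw)
        lines.append("%04X  %-14s %s" % (pos, hex_bytes, text))
        pos += size
    return lines


if __name__ == "__main__":
    blob = bytes([0x90, 0xC3, 0x90, 0xC3])
    print("\n".join(listing(blob)))            # everything decoded as code
    print("\n".join(listing(blob, [(2, 4)])))  # offsets 2..3 forced to data
```

Keeping the hex column next to the decoded text lets a reader spot cases where the automatic choice looks wrong and add a rule for that range.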
The general problem of separating code from data in arbitrary executable programs is equivalent to the halting problem. As a consequence, it is not possible to write a disassembler that will correctly separate code and data for all possible input programs. Reverse engineering is full of such theoretical limitations, but it is not alone in this: by Rice's theorem, every non-trivial question about a program's behavior is undecidable, so compilers and many other tools that deal with programs in any form run into such limits as well. In practice, a combination of interactive and automatic analysis, together with perseverance, can handle all but those programs specifically designed to thwart reverse engineering, for example by encrypting code and decrypting it just prior to use, or by moving code around in memory.
User-defined textual identifiers, such as variable names, label names, and macros, are removed by the assembly process. They may still be present in generated object files, for use by tools like debuggers and relocating linkers, but the direct connection is lost, and re-establishing that connection requires more than a mere disassembler. Small constants in particular may have more than one possible name. Operating system calls (like DLL functions in MS-Windows, or syscalls in Unices) may be reconstructed, as their names appear in a separate segment or are known beforehand. Many disassemblers allow the user to attach a name to a label or constant based on their understanding of the code. These identifiers, in addition to comments in the source file, help to make the code more readable to a human, and can also offer clues about the purpose of the code. Without these comments and identifiers, it is harder to understand the purpose of the code, and it can be difficult to determine the algorithm it uses. When you combine this problem with the possibility that the code you are trying to read may, in reality, be data (as outlined above), it can be even harder to determine what is going on. Another challenge is posed by modern optimising compilers: they inline small subroutines and then combine instructions across call and return boundaries. This loses valuable information about the way the program is structured.
Akin to disassemblers, decompilers take the process a step further and actually try to reproduce the code in a high-level language. Frequently, this high-level language is C, because C is simple and primitive enough to facilitate the decompilation process. Decompilation does have its drawbacks, because lots of data and readability constructs are lost during the original compilation process, and they cannot be reproduced. Since the science of decompilation is still young, and results are "good" but not "great", this page will limit itself to a listing of decompilers and a general (but brief) discussion of the possibilities of decompilation. Compared to a disassembler, a decompiler generates code that does not require the reader to be familiar with the processor at hand. The decompiled code may even compile for a different processor, or give a reasonable starting point for reproducing the program on a different processor.
From a human disassembler's point of view, this is a nightmare, even though it is straightforward to read in the original assembly source code. From the binary form there is no way to decide whether the db bytes should be interpreted as instructions or not. If they are decoded, they may appear to contain various jumps into the real executable code area, triggering analysis of code that should never be analysed and interfering with the analysis of the real code (e.g. disassembling the above code from 0000h or from 0001h won't give the same results at all).
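The dependence on the starting offset can be reproduced with a minimal sketch. The byte sequence and the toy opcode table below are hypothetical and chosen only for illustration (they are not the code discussed above); sweep is a made-up helper that decodes linearly from a given offset.

```python
# Toy opcode table (an assumption): opcode -> (mnemonic, immediate bytes).
OPCODES = {0x90: ("nop", 0), 0xC3: ("ret", 0), 0xB8: ("mov eax,", 4)}

# Hypothetical bytes: B8 swallows the next four bytes as its operand.
CODE = bytes([0xB8, 0x90, 0x90, 0x90, 0xC3, 0xC3])


def sweep(code, start):
    """Decode linearly from `start` and return the decoded text lines."""
    out = []
    pos = start
    while pos < len(code):
        entry = OPCODES.get(code[pos])
        # Unknown opcode or truncated operand: emit one raw data byte.
        if entry is None or pos + 1 + entry[1] > len(code):
            out.append("db 0x%02X" % code[pos])
            pos += 1
            continue
        mnemonic, imm_len = entry
        if imm_len:
            imm = int.from_bytes(code[pos + 1:pos + 1 + imm_len], "little")
            out.append("%s 0x%X" % (mnemonic, imm))
        else:
            out.append(mnemonic)
        pos += 1 + imm_len
    return out


if __name__ == "__main__":
    print(sweep(CODE, 0))  # one mov followed by a ret
    print(sweep(CODE, 1))  # three nops followed by two rets
```

Starting at offset 0, the first byte absorbs the following four as an immediate operand; starting at offset 1, the same four bytes decode as separate instructions. Neither reading can be ruled out from the bytes alone.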
Win32Program Disassembler is available for Windows 95 and prior versions, and it can be downloaded only in English. The current version of the program is 0.25, and the latest update happened on 6/14/2011.
It is a lightweight program that takes up less disk space than many programs in the Development software section. It is often downloaded in countries such as India, Romania, and Pakistan.
The 'result.c' file is created in the directory '../output'. These files include the executable, the required run-time libraries, and support files such as type definition files for many Windows and Linux APIs. Simply download and unzip the files in a directory of your choice. There is no installer, and nothing is changed in your registry. The program RecStudio4 can be run from the extracted bin directory.
Sep. 19, 2015: Updated disassembler to udis 1.7.2. More aggressive type detection. Better handling of partial registers (e.g. RAX -> EAX -> AX -> AL). Improved x86_64 register definitions. Added several APIs to support files. Improved navigation to prev/next function in the UI.
Jun. 2, 2014: Fixed crash with huge number of _t variables. Fixed type detection in truncating assignments. Fixed recognition of memcpy sequences in Windows.
Jan. 14, 2014: Added detection of unicode strings. Improved code structuring.
Nov. 20, 2012: Fixed definition of entry points in signature files. Recognition of PlayStation 1 binary files, and addition of PS1 system call signature files. Improved support for some MIPS instructions.
Oct. 20, 2012: Significantly reduced memory used and significantly increased decompilation speed. Major improvements and fixes in code structuring, and fixed several bugs.
Aug. 29, 2012: Fixed decoding of IMUL d,a,# instruction.
Aug. 25, 2012: Function detection from .dynsym and .got symbols. Added MIPSLE support package.
May 25, 2012: Better recognition of parameters from symbolic info. Added gcc C++ support file for C++ exceptions. Improved user interface. Added GUI version for Mac.
May 10, 2012: Fixed switch() detection. Fixed loading PowerPC MachO files. Significantly speeded up Linux version.
Apr. 24, 2012: Fixed memory leak and divide by zero during constant expr. evaluation. Added loading 32-bit x86 MachO binaries and 64-bit CLI on Windows.
Apr. 15, 2012: Show line number and name local variables from -g info if present in ELF. Show location of local variables.
Apr. 9, 2012: Improved code structuring and symbol detection in ELF.
Apr. 6, 2012: Fixed memory leak, endless loop, crashes.
Mar. 19, 2012: Added command line version for Linux.
From version 2.1, RecStudio uses the disassembler included in the Netwide Assembler package (version 0.98.39). The project to build the ia32 disassembler as a DLL is available here. No other portions of NASM are used in RecStudio. The Netwide Assembler can be downloaded from SourceForge.net. At this time the other disassemblers are still statically linked in the RecStudio executable, although eventually they'll be made available as shared libraries.
z80dasm is a disassembler for the Zilog Z80 microprocessor and compatibles. It can be used to reverse engineer programs and operating systems for 1980's microcomputers using this processor architecture (for example Sinclair ZX81, Spectrum and many others). It was developed to produce the Galaksija ROM disassembly.
The core of z80dasm is based largely on dz80 3.0, a Z80 disassembler written by Jan Panteltje. Compared to dz80, z80dasm fixes multiple bugs and adds several new features. It also has a more UNIX-like command line interface. For a detailed list of changes compared to dz80, see NEWS file included in the source.
A friend of mine downloaded some malware from Facebook, and I'm curious to see what it does without infecting myself. I know that you can't really decompile an .exe, but can I at least view it in Assembly or attach a debugger?