The PE (Portable Executable) format is a data structure that tells the Windows OS loader what it needs in order to manage the wrapped executable code.
This includes dynamic library references for linking, API export and import tables, resource management data, and TLS (thread-local storage) data. The data structures on disk are the same ones used in memory, so knowing how to find something in a PE file also helps when analyzing Windows malware samples.
DOS Header.
The DOS header occupies the first 64 bytes of the file. It is there so that DOS can recognize the file as a valid executable and run the DOS stub.
As we can see, the DOS header contains a list of fields. We will not discuss all of them, as that is beyond our scope; we will cover only the important ones, e_magic and e_lfanew.
e_magic: The magic number 4D 5A ("MZ") that marks the file as a DOS-compatible executable; every PE file must begin with it.
e_lfanew: Offset relative to the beginning of the file, used to find the PE header.
As shown in the above figure, the e_magic value is 4D 5A (MZ) and e_lfanew is 0x00000108 (the file offset of the PE header).
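The two fields can be read with a few lines of Python. This is a minimal sketch, assuming the file contents are already in memory as bytes; the function name read_dos_header and the synthetic sample buffer are illustrative and not part of any library.

```python
import struct

DOS_HEADER_SIZE = 64      # the DOS header occupies the first 64 bytes
E_LFANEW_OFFSET = 0x3C    # e_lfanew is stored in the last 4 bytes of the header


def read_dos_header(data: bytes) -> int:
    """Validate e_magic and return e_lfanew, the file offset of the PE header."""
    if len(data) < DOS_HEADER_SIZE:
        raise ValueError("file is too small to hold a DOS header")
    # e_magic is a little-endian WORD; 0x5A4D appears on disk as the bytes 4D 5A ("MZ").
    (e_magic,) = struct.unpack_from("<H", data, 0)
    if e_magic != 0x5A4D:
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", data, E_LFANEW_OFFSET)
    return e_lfanew


# Synthetic 64-byte header that mirrors the values quoted above (not a real file).
sample = bytearray(DOS_HEADER_SIZE)
sample[0:2] = b"MZ"
struct.pack_into("<I", sample, E_LFANEW_OFFSET, 0x108)
print(hex(read_dos_header(bytes(sample))))
```

For a real sample, the same function could be called on the bytes returned by reading the file in binary mode.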
DOS Stub.
A stub is a tiny program or piece of code that runs by default when execution of an application starts. Here, the stub prints the message “This program cannot be run in DOS mode” when the program is started under MS-DOS, which cannot execute Windows code.
PE File Header.
The PE header is located by looking at the e_lfanew field of the MS-DOS Header. The e_lfanew field gives the offset of the PE header location.
The main PE header is a structure of type IMAGE_NT_HEADERS and contains the PE signature, the IMAGE_FILE_HEADER, and the IMAGE_OPTIONAL_HEADER.
struct _IMAGE_FILE_HEADER {
    0x00 WORD Machine; // CPU platform for program execution: 0x0: any platform, 0x14C: Intel i386 and subsequent processors, 0x8664: x64
    0x02 WORD NumberOfSections; // The number of sections in the PE file
    0x04 DWORD TimeDateStamp; // Timestamp: the number of seconds between 1970/01/01 00:00:00 UTC and the time the linker generated this file
    0x08 DWORD PointerToSymbolTable; // The file offset of the COFF symbol table. This field is only useful for COFF debugging information
    0x0c DWORD NumberOfSymbols; // The number of symbols in the COFF symbol table. This value and the previous value are 0 in the release version of the program
    0x10 WORD SizeOfOptionalHeader; // IMAGE_OPTIONAL_HEADER structure size (bytes): 32-bit default E0H, 64-bit default F0H (can be modified)
    0x12 WORD Characteristics; // Describes file attributes, e.g.:
    // Single attribute (only 1 bit is set): #define IMAGE_FILE_DLL 0x2000 // File is a DLL.
    // Combined attributes (several bits are set, the bitwise OR of single attributes): 0x010F, a typical executable file
};
The standard PE header (IMAGE_FILE_HEADER) is the 20 bytes that follow the 4-byte PE signature and contains only the most basic information about the layout of the file.
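As an illustration of the layout above, the sketch below checks the PE signature at e_lfanew and unpacks the 20-byte standard header. The helper name parse_file_header, the derived keys IsDll and LinkTime, and the sample values (0x8664 for x64, three sections, a DLL) are assumptions made for the example.

```python
import struct
from datetime import datetime, timezone

# Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable,
# NumberOfSymbols, SizeOfOptionalHeader, Characteristics: 20 bytes in total.
FILE_HEADER = struct.Struct("<HHIIIHH")
FIELD_NAMES = (
    "Machine", "NumberOfSections", "TimeDateStamp", "PointerToSymbolTable",
    "NumberOfSymbols", "SizeOfOptionalHeader", "Characteristics",
)
IMAGE_FILE_DLL = 0x2000  # single-bit attribute: the file is a DLL


def parse_file_header(data: bytes, e_lfanew: int) -> dict:
    """Check the 4-byte PE signature, then decode the header that follows it."""
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    values = FILE_HEADER.unpack_from(data, e_lfanew + 4)
    header = dict(zip(FIELD_NAMES, values))
    # Characteristics is a bit field, so individual attributes are tested with AND.
    header["IsDll"] = bool(header["Characteristics"] & IMAGE_FILE_DLL)
    # TimeDateStamp counts seconds since 1970-01-01 00:00:00 UTC.
    header["LinkTime"] = datetime.fromtimestamp(header["TimeDateStamp"], tz=timezone.utc)
    return header


# Synthetic input: padding up to offset 0x108, the signature, then a made-up header.
sample = bytes(0x108) + b"PE\x00\x00" + FILE_HEADER.pack(0x8664, 3, 0, 0, 0, 0xF0, 0x2022)
info = parse_file_header(sample, 0x108)
print(hex(info["Machine"]), info["NumberOfSections"], info["IsDll"])
```

The timestamp is only as trustworthy as the tool that wrote it, so it is best treated as a hint rather than as evidence of when a sample was built.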
Optional PE header (_IMAGE_OPTIONAL_HEADER)
struct _IMAGE_OPTIONAL_HEADER { // 32-bit (PE32) layout
    0x00 WORD Magic; // Magic number, 0x0107: ROM image, 0x010B: 32-bit PE, 0x020B: 64-bit PE
    0x02 BYTE MajorLinkerVersion; // Linker major version number
    0x03 BYTE MinorLinkerVersion; // Linker minor version number
    0x04 DWORD SizeOfCode; // The total size of all code sections, a multiple of FileAlignment; largely informational
    0x08 DWORD SizeOfInitializedData; // The size of the initialized data, a multiple of FileAlignment; largely informational
    0x0c DWORD SizeOfUninitializedData; // The size of the uninitialized data, a multiple of FileAlignment; largely informational
    0x10 DWORD AddressOfEntryPoint; // The program entry point, which is an RVA (Relative Virtual Address) and usually falls in the .text section; this field applies to both DLLs and EXEs
    0x14 DWORD BaseOfCode; // RVA of the start of the code section (execution does not necessarily begin here)
    0x18 DWORD BaseOfData; // RVA of the start of the data section; PE32 only (in a 64-bit PE this field is absent and ImageBase is an 8-byte value at 0x18)
    0x1c DWORD ImageBase; // Preferred base address of the image in memory (default load address); the default for a 32-bit EXE is 0x00400000
    0x20 DWORD SectionAlignment; // Memory alignment: once mapped to memory, each section is guaranteed to start at a virtual address that is a multiple of this value
    0x24 DWORD FileAlignment; // File alignment: raw section data starts at a file offset that is a multiple of this value, typically 0x200 or 0x1000
    0x28 WORD MajorOperatingSystemVersion; // The required operating system major version number
    0x2a WORD MinorOperatingSystemVersion; // The required operating system minor version number
    0x2c WORD MajorImageVersion; // User-defined major version number, set with a linker option, e.g. LINK /VERSION:2.0 myobj.obj
    0x2e WORD MinorImageVersion; // User-defined minor version number, set with a linker option
    0x30 WORD MajorSubsystemVersion; // The required subsystem major version number, typical value 4.0 (Windows 4.0, that is, Windows 95)
    0x32 WORD MinorSubsystemVersion; // The required subsystem minor version number
    0x34 DWORD Win32VersionValue; // Reserved, always 0
    0x38 DWORD SizeOfImage; // The total size of the PE image in memory, a multiple of SectionAlignment
    0x3c DWORD SizeOfHeaders; // DOS header (64B) and DOS stub + PE signature (4B) + standard PE header (20B) + optional PE header + section table, rounded up to a multiple of FileAlignment
    0x40 DWORD CheckSum; // Image file checksum, used to detect whether the file has been modified
    0x44 WORD Subsystem; // Subsystem required to run the image, e.g. 2: Windows GUI, 3: Windows console
    0x46 WORD DllCharacteristics; // Flags such as 0x0040 (DYNAMIC_BASE, ASLR) and 0x0100 (NX_COMPAT, DEP)
    0x48 DWORD SizeOfStackReserve; // The size of the stack to reserve for the initial thread
    0x4c DWORD SizeOfStackCommit; // The size of the thread stack actually committed during initialization
    0x50 DWORD SizeOfHeapReserve; // The virtual memory size reserved for the default process heap
    0x54 DWORD SizeOfHeapCommit; // The size of the process heap actually committed during initialization
    0x58 DWORD LoaderFlags; // Reserved, always 0
    0x5c DWORD NumberOfRvaAndSizes; // Number of data directory entries: normally 0x00000010 (16)
    0x60 _IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES]; // #define IMAGE_NUMBEROF_DIRECTORY_ENTRIES 16
};
In the example, the first member (Magic, 2 bytes) holds the magic number 0x020B, which means that the file is a 64-bit PE.
The optional PE header follows the standard PE header, and its default size is E0H bytes for a 32-bit image and F0H bytes for a 64-bit image. The optional header contains most of the meaningful information about the executable image, such as the initial stack size, program entry point location, preferred base address, operating system version, and section alignment information.
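Because the Magic value decides how the rest of the optional header is laid out, a parser has to branch on it before reading ImageBase. The following is a rough sketch under that assumption; parse_optional_header and the sample buffer are hypothetical, and only a handful of fields are decoded.

```python
import struct

PE32_MAGIC = 0x010B       # 32-bit PE
PE32_PLUS_MAGIC = 0x020B  # 64-bit PE


def parse_optional_header(opt: bytes) -> dict:
    """Decode a few fields from raw optional-header bytes (starting at Magic)."""
    (magic,) = struct.unpack_from("<H", opt, 0)
    (entry_point,) = struct.unpack_from("<I", opt, 0x10)
    if magic == PE32_MAGIC:
        # 32-bit layout: BaseOfData sits at 0x18 and ImageBase is a DWORD at 0x1C.
        (image_base,) = struct.unpack_from("<I", opt, 0x1C)
        bits = 32
    elif magic == PE32_PLUS_MAGIC:
        # 64-bit layout: there is no BaseOfData and ImageBase is 8 bytes at 0x18.
        (image_base,) = struct.unpack_from("<Q", opt, 0x18)
        bits = 64
    else:
        raise ValueError(f"unsupported optional header magic {magic:#06x}")
    # Both layouts store the two alignment values at 0x20 and 0x24.
    section_alignment, file_alignment = struct.unpack_from("<II", opt, 0x20)
    return {
        "Bits": bits,
        "AddressOfEntryPoint": entry_point,
        "ImageBase": image_base,
        "SectionAlignment": section_alignment,
        "FileAlignment": file_alignment,
    }


# Synthetic 32-bit header (0xE0 bytes) filled with made-up values.
pe32 = bytearray(0xE0)
struct.pack_into("<H", pe32, 0x00, PE32_MAGIC)
struct.pack_into("<I", pe32, 0x10, 0x1234)
struct.pack_into("<I", pe32, 0x1C, 0x00400000)
struct.pack_into("<II", pe32, 0x20, 0x1000, 0x200)
print(parse_optional_header(bytes(pe32)))
```

The entry point is an RVA, so the address actually executed is ImageBase plus AddressOfEntryPoint when the image loads at its preferred base.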
Data Directories (_IMAGE_DATA_DIRECTORY)
It is the last member of the optional header. The data directory indicates where to find other important components of executable information in the file. It is really nothing more than an array of IMAGE_DATA_DIRECTORY structures located at the end of the optional header structure. The current PE file format defines 16 possible data directories, the last of which is reserved.
Each data directory entry specifies the size and relative virtual address of the directory. To locate a particular directory, you determine the relative address from the data directory array in the optional header.
Then use the virtual address to determine which section the directory is in. Once you determine which section contains the directory, the section header for that section is then used to find the exact file offset location of the data directory.
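The lookup described above can be sketched as a small function. The Section tuple, the rva_to_offset name, and the section values below are hypothetical and only illustrate the arithmetic: find the section whose virtual range contains the RVA, then add the distance from the section start to the section's raw file address.

```python
from typing import NamedTuple, Optional, Sequence


class Section(NamedTuple):
    name: str
    virtual_address: int  # RVA where the section starts in memory
    virtual_size: int     # size of the section in memory
    raw_address: int      # file offset of the section data
    raw_size: int         # size of the section data on disk


def rva_to_offset(rva: int, sections: Sequence[Section]) -> Optional[int]:
    """Translate an RVA into a file offset, or return None if it has no file backing."""
    for sec in sections:
        span = max(sec.virtual_size, sec.raw_size)
        if sec.virtual_address <= rva < sec.virtual_address + span:
            delta = rva - sec.virtual_address
            if delta >= sec.raw_size:
                return None  # inside the section in memory, but past its data on disk
            return sec.raw_address + delta
    return None  # the RVA does not fall inside any section


# Made-up section table and a made-up import directory RVA.
sections = [
    Section(".text", 0x1000, 0x1800, 0x0400, 0x1A00),
    Section(".rdata", 0x3000, 0x0900, 0x1E00, 0x0A00),
]
import_directory_rva = 0x3120
print(hex(rva_to_offset(import_directory_rva, sections)))
```

Here the RVA 0x3120 lies 0x120 bytes into .rdata, whose data starts at file offset 0x1E00, so the directory is found at 0x1F20.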
Section Header Table
The section header table is an array of IMAGE_SECTION_HEADER structures and contains information about the sections in the image of an executable file. The sections in the image are sorted by their RVAs rather than alphabetically.
The section header table contains the following important fields:
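Name: the section name, up to 8 bytes
Virtual Size (VirtualSize): size of the section once loaded into memory
Virtual Address (VirtualAddress): RVA at which the section starts in memory
Raw Size (SizeOfRawData): size of the section data on disk
Raw Address (PointerToRawData): file offset of the section data
Reloc Address (PointerToRelocations): file offset of the section's relocation entries
Linenumbers (PointerToLinenumbers): file offset of the section's COFF line-number entries
Relocations Number (NumberOfRelocations): number of relocation entries
Linenumbers Number (NumberOfLinenumbers): number of line-number entries
Characteristics: flags describing the section, such as readable, writable, or executable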
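Each entry in the table is a fixed 40-byte record, so the fields listed above can be unpacked in one call. The sketch below assumes the raw 40 bytes of one entry are already available; parse_section_header and the sample .text entry are illustrative only.

```python
import struct

# Name[8], VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData,
# PointerToRelocations, PointerToLinenumbers, NumberOfRelocations,
# NumberOfLinenumbers, Characteristics: 40 bytes per entry.
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")

IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000


def parse_section_header(raw: bytes) -> dict:
    """Decode one IMAGE_SECTION_HEADER entry into a dictionary."""
    (name, vsize, vaddr, raw_size, raw_addr,
     reloc_addr, lineno_addr, n_relocs, n_linenos, chars) = SECTION_HEADER.unpack(raw[:40])
    # Summarize the memory protection bits as a short string such as "rx".
    perms = "".join(
        letter
        for letter, mask in (("r", IMAGE_SCN_MEM_READ),
                             ("w", IMAGE_SCN_MEM_WRITE),
                             ("x", IMAGE_SCN_MEM_EXECUTE))
        if chars & mask
    )
    return {
        # The name is padded with NUL bytes when shorter than 8 characters.
        "Name": name.split(b"\x00", 1)[0].decode("ascii", errors="replace"),
        "VirtualSize": vsize,
        "VirtualAddress": vaddr,
        "RawSize": raw_size,
        "RawAddress": raw_addr,
        "Characteristics": chars,
        "Permissions": perms,
    }


# Made-up .text entry: contains code (0x20), readable and executable.
raw_entry = SECTION_HEADER.pack(b".text", 0x1800, 0x1000, 0x1A00, 0x400,
                                0, 0, 0, 0, 0x60000020)
print(parse_section_header(raw_entry))
```

When triaging a sample, a section that is both writable and executable is often worth a closer look, although that alone does not prove anything.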
Sections
PE section headers also specify the section name in a simple character array field called Name. Below are common section names found in an executable file:
.text: This is normally the first section and contains the executable code for the application. It usually also holds the application's entry point: the address of the first instruction that will be executed. An application can have more than one section with executable code.
.data: This section contains the application's initialized data, such as strings.
.rdata or .idata: Usually these section names are used for the sections where the import table is located. This is the table that lists the Windows APIs used by the application (along with the names of their associated DLLs). Using it, the Windows loader knows which APIs to look up, and in which system DLLs, in order to retrieve their addresses.
.reloc: Contains relocation information.
.rsrc: This is the common name for the resource-container section, which contains things like images used for the application’s UI.
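Since the entry point normally lies in .text, one quick check during analysis is to see which section actually contains it. The sketch below assumes the section table has already been reduced to (name, virtual address, virtual size) tuples; the function name and the values are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def section_of_entry_point(entry_rva: int,
                           sections: Sequence[Tuple[str, int, int]]) -> Optional[str]:
    """Return the name of the section whose virtual range contains entry_rva."""
    for name, virtual_address, virtual_size in sections:
        if virtual_address <= entry_rva < virtual_address + virtual_size:
            return name
    return None  # the entry point lies outside every section


# Made-up section table: (name, virtual address, virtual size).
sections = [
    (".text", 0x1000, 0x1800),
    (".rdata", 0x3000, 0x0900),
    (".data", 0x4000, 0x0200),
]
print(section_of_entry_point(0x1234, sections))
```

An entry point that resolves to a section other than the usual code section is not proof of anything by itself, but it is a common reason to inspect a sample more closely.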