pefile-1.2.10-139/0000755000076500000240000000000012252150355013261 5ustar erostaff00000000000000
pefile-1.2.10-139/CHANGES_up_to_1.2.60000644000076500000240000001460311171342517016214 0ustar erostaff00000000000000
Thu Aug 9 01:27:18 CEST 2007 ero@dkbza.org
  * Fixed two bugs parsing OffensiveComputing files

  layne@elsenot.com reported errors when doing a dump_info() on two files from OffensiveComputing with hashes:

  88ca1747f5242c852b26f73fd90dad6f
  eb274d67e89fad0fcc9dfad5aeb06dd360ff7f17

  The errors were easy to fix with some additional safety checks.

Wed Jul 25 17:43:11 CEST 2007 ero@dkbza.org
  * Included peutils in the installation

Wed Jul 25 17:42:01 CEST 2007 ero@dkbza.org
  * Various enhancements

  -peutils can now download signatures from a URL or process them directly from data
  -Added signature generation. Now signatures can be created for the sections in a pefile or at the entry point.

Wed Jul 25 17:32:53 CEST 2007 ero@dkbza.org
  * Small fixes

  -Fixed small bug retrieving section data
  -Sections now have an attribute named "entropy" which contains a float value between 0.0 and 8.0

Wed Jul 25 01:07:49 CEST 2007 ero@dkbza.org
  * Changes for pefile 1.2.6

  -Added extra warnings for suspicious values in the PE header
  -Fixed minor bugs
  -Merged Gera's enhancements
  -Added reporting of the section's entropy
  -Bumped version number

Wed Jul 25 01:01:01 CEST 2007 ero@dkbza.org
  * Added initial version of peutils

Thu May 17 19:30:41 CEST 2007 ero@dkbza.org
  * Added parsing error functionality and fixed some parsing issues

  -Version bumped to 1.2.5
  -Added parsing error reporting "get_warnings()" and "show_warnings()"
  -Fixed issues parsing version information that led to infinite loops

Tue Mar 13 11:51:30 CET 2007 ero@dkbza.org
  * Added length check to escape an infinite loop when parsing the version information

Wed Mar 7 21:04:01 CET 2007 ero@dkbza.org
  * Increased version number

Wed Mar 7 21:01:19 CET 2007 ero@dkbza.org
  * Fixed issue writing PE files

  Fixed issue with the "writer".
  A previous enhancement triggered a bug that generated wrong data when the PE file was written back to disk (only when the file had version information in the resources directory)

Wed Feb 28 00:35:52 CET 2007 ero@dkbza.org
  * Increased version number

Wed Feb 28 00:35:04 CET 2007 ero@dkbza.org
  * Fixed bug with large NumberOfRvaAndSizes values that could not be converted to int

Thu Feb 22 13:25:39 CET 2007 ero@dkbza.org
  * Parsing enhancements

  -pefile-1.2.2 can now correctly parse the files from the Tiny PE challenge (http://www.phreedom.org/solar/code/tinype/), which push the limits of valid parsing
  -Improved parsing of basic headers and data directories

Tue Feb 20 00:24:19 CET 2007 ero@dkbza.org
  * Small cosmetic changes

Tue Feb 13 19:22:06 CET 2007 ero@dkbza.org
  * New features and bugfixes for version 1.2.2

  -Added support for parsing the version information structures in the resources directory
  -Fixed bug in writing mode in Windows
  -The display of hexadecimal numbers is now explicit: all hex numbers have the prefix '0x'
  -Increased version number to 1.2.2

Sat Jan 20 20:56:21 CET 2007 ero@dkbza.org
  * Improved processing of unicode strings

Wed Nov 1 21:39:31 CET 2006 ero@dkbza.org
  * Updated README

Wed Nov 1 18:17:04 CET 2006 ero@dkbza.org
  * Multiple bugfixes and enhancements

  -Added more machine types
  -Added support for the PE32+ format
  -Fixed bugs handling unicode data when dumping the PE file information in text mode
  -Improved handling of malformed directories in the Optional Header. Large values of NumberOfRvaAndSizes should finally have no effect and the file should still be successfully parsed
  -Improved handling of section names
  -Improved handling of the resources directory hierarchy

Tue Oct 24 12:47:57 CEST 2006 ero@dkbza.org
  * Added notes about the writing support

Tue Oct 24 12:33:22 CEST 2006 ero@dkbza.org
  * Merged OC patch and other bugfixes

  -Merged all the changes made by the Offensive Computing people.
   Fixing miscellaneous bugs when parsing files bordering the malformed.
  -Fixed bug when appending unicode strings from the resources to the plain ascii ones generated when dumping the file information. As everything was converted to unicode in order to be appended, sometimes non-printable characters got converted to unicode that could not be mapped back later to a printable char, hence 'print' was failing. Now all the unicode strings are str()'d before they are appended to the rest of the output.

Sun Oct 22 15:39:33 CEST 2006 ero@dkbza.org
  * Added support for writing changes to the PE file

Tue May 30 03:16:50 CEST 2006 ero@dkbza.org
  * Updated copyright strings

Tue May 30 03:10:54 CEST 2006 ero@dkbza.org
  * Bumped version number to 1.1 and added a new method

  Added the method 'get_memory_mapped_image' to retrieve the PE image data laid out as it would exist once loaded in memory.

Wed May 17 14:14:57 CEST 2006 ero@dkbza.org
  * Fixed bug parsing sections

  Both Jarkko Turkulainen and Nicolas Falliere had noticed that there was a bug when parsing the section headers. pefile assumed that they always start right after the end of the Optional Header and used a default size for it. In fact, the correct way of reaching the section headers is to add SizeOfOptionalHeader to the start offset of the Optional Header. Jarkko's patch, besides fixing that issue, also checks earlier for a correct PE signature.

Mon Dec 26 23:40:51 CET 2005 ero@dkbza.org
  * Changed name from pype to pefile and bumped up version

Mon Dec 26 23:25:50 CET 2005 ero@dkbza.org
  * Added more fields to the setup file

Mon Dec 26 23:10:54 CET 2005 ero@dkbza.org
  * Multiple enhancements

  -Added support for delayed imports provided by Adam Morrison
  -Fixed a bug parsing corrupted imports table

Mon Nov 14 23:59:55 CET 2005 ero@dkbza.org
  * Improvements and bug fixes

  -Added handling of invalid timestamp values in some fields in the header
  -Fixed file opening mode for Windows, now files are always opened in binary mode.
  -Fixed loading of mischievous PE files where the sections are not properly aligned according to the FileAlignment field contents. (Jarkko bumped into a packer that used this trick.)
  -Fixed printing of unrecognized RESOURCE_TYPE IDs when dumping the resources directory.
  -Bumped version up to 0.9.1

Mon Nov 14 23:59:24 CET 2005 ero@dkbza.org
  * Small fixes

Mon Aug 29 10:16:42 CEST 2005 ero@dkbza.org
  * Fixed reading of PE files with non-standard number of directories

Tue Jul 19 00:24:10 CEST 2005 ero@dkbza.org
  * Initial import

pefile-1.2.10-139/COPYING0000644000076500000240000000263012252127534014320 0ustar erostaff00000000000000
Copyright (c) 2004-2013 Ero Carrera . All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

3. The name of the author may not be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

pefile-1.2.10-139/MANIFEST0000644000076500000240000000006510656455377014435 0ustar erostaff00000000000000
COPYING
CHANGES
README
pefile.py
peutils.py
setup.py
pefile-1.2.10-139/ordlookup/0000755000076500000240000000000012252150355015277 5ustar erostaff00000000000000
pefile-1.2.10-139/ordlookup/__init__.py0000644000076500000240000000122312247735317017421 0ustar erostaff00000000000000
import ws2_32
import oleaut32

'''
A small module for keeping a database of ordinal to symbol
mappings for DLLs which frequently get linked without symbolic
infoz.
'''

ords = {
    'ws2_32.dll':ws2_32.ord_names,
    'wsock32.dll':ws2_32.ord_names,
    'oleaut32.dll':oleaut32.ord_names,
}

def ordLookup(libname, ord, make_name=False):
    '''
    Lookup a name for the given ordinal if it's in our
    database.
    '''
    names = ords.get(libname.lower())
    if names == None:
        if make_name is True:
            return 'ord%d' % ord
        return None
    name = names.get(ord)
    if name == None:
        return 'ord%d' % ord
    return name
pefile-1.2.10-139/ordlookup/oleaut32.py0000644000076500000240000002354112243154250017312 0ustar erostaff00000000000000
ord_names = { 2:'SysAllocString', 3:'SysReAllocString', 4:'SysAllocStringLen', 5:'SysReAllocStringLen', 6:'SysFreeString', 7:'SysStringLen', 8:'VariantInit', 9:'VariantClear', 10:'VariantCopy', 11:'VariantCopyInd', 12:'VariantChangeType', 13:'VariantTimeToDosDateTime', 14:'DosDateTimeToVariantTime', 15:'SafeArrayCreate', 16:'SafeArrayDestroy', 17:'SafeArrayGetDim', 18:'SafeArrayGetElemsize', 19:'SafeArrayGetUBound', 20:'SafeArrayGetLBound', 21:'SafeArrayLock', 22:'SafeArrayUnlock', 23:'SafeArrayAccessData', 24:'SafeArrayUnaccessData', 25:'SafeArrayGetElement', 26:'SafeArrayPutElement', 27:'SafeArrayCopy', 28:'DispGetParam', 29:'DispGetIDsOfNames', 30:'DispInvoke', 31:'CreateDispTypeInfo', 32:'CreateStdDispatch', 33:'RegisterActiveObject', 34:'RevokeActiveObject', 35:'GetActiveObject', 36:'SafeArrayAllocDescriptor', 37:'SafeArrayAllocData', 38:'SafeArrayDestroyDescriptor', 39:'SafeArrayDestroyData', 40:'SafeArrayRedim', 41:'SafeArrayAllocDescriptorEx', 42:'SafeArrayCreateEx', 43:'SafeArrayCreateVectorEx', 44:'SafeArraySetRecordInfo', 45:'SafeArrayGetRecordInfo', 46:'VarParseNumFromStr', 47:'VarNumFromParseNum', 48:'VarI2FromUI1', 49:'VarI2FromI4', 50:'VarI2FromR4', 51:'VarI2FromR8', 52:'VarI2FromCy', 53:'VarI2FromDate', 54:'VarI2FromStr', 55:'VarI2FromDisp', 56:'VarI2FromBool', 57:'SafeArraySetIID', 58:'VarI4FromUI1', 59:'VarI4FromI2', 60:'VarI4FromR4', 61:'VarI4FromR8', 62:'VarI4FromCy', 63:'VarI4FromDate', 64:'VarI4FromStr', 65:'VarI4FromDisp', 66:'VarI4FromBool', 67:'SafeArrayGetIID', 68:'VarR4FromUI1', 69:'VarR4FromI2', 70:'VarR4FromI4', 71:'VarR4FromR8', 72:'VarR4FromCy', 73:'VarR4FromDate', 74:'VarR4FromStr', 75:'VarR4FromDisp', 76:'VarR4FromBool',
77:'SafeArrayGetVartype', 78:'VarR8FromUI1', 79:'VarR8FromI2', 80:'VarR8FromI4', 81:'VarR8FromR4', 82:'VarR8FromCy', 83:'VarR8FromDate', 84:'VarR8FromStr', 85:'VarR8FromDisp', 86:'VarR8FromBool', 87:'VarFormat', 88:'VarDateFromUI1', 89:'VarDateFromI2', 90:'VarDateFromI4', 91:'VarDateFromR4', 92:'VarDateFromR8', 93:'VarDateFromCy', 94:'VarDateFromStr', 95:'VarDateFromDisp', 96:'VarDateFromBool', 97:'VarFormatDateTime', 98:'VarCyFromUI1', 99:'VarCyFromI2', 100:'VarCyFromI4', 101:'VarCyFromR4', 102:'VarCyFromR8', 103:'VarCyFromDate', 104:'VarCyFromStr', 105:'VarCyFromDisp', 106:'VarCyFromBool', 107:'VarFormatNumber', 108:'VarBstrFromUI1', 109:'VarBstrFromI2', 110:'VarBstrFromI4', 111:'VarBstrFromR4', 112:'VarBstrFromR8', 113:'VarBstrFromCy', 114:'VarBstrFromDate', 115:'VarBstrFromDisp', 116:'VarBstrFromBool', 117:'VarFormatPercent', 118:'VarBoolFromUI1', 119:'VarBoolFromI2', 120:'VarBoolFromI4', 121:'VarBoolFromR4', 122:'VarBoolFromR8', 123:'VarBoolFromDate', 124:'VarBoolFromCy', 125:'VarBoolFromStr', 126:'VarBoolFromDisp', 127:'VarFormatCurrency', 128:'VarWeekdayName', 129:'VarMonthName', 130:'VarUI1FromI2', 131:'VarUI1FromI4', 132:'VarUI1FromR4', 133:'VarUI1FromR8', 134:'VarUI1FromCy', 135:'VarUI1FromDate', 136:'VarUI1FromStr', 137:'VarUI1FromDisp', 138:'VarUI1FromBool', 139:'VarFormatFromTokens', 140:'VarTokenizeFormatString', 141:'VarAdd', 142:'VarAnd', 143:'VarDiv', 144:'DllCanUnloadNow', 145:'DllGetClassObject', 146:'DispCallFunc', 147:'VariantChangeTypeEx', 148:'SafeArrayPtrOfIndex', 149:'SysStringByteLen', 150:'SysAllocStringByteLen', 151:'DllRegisterServer', 152:'VarEqv', 153:'VarIdiv', 154:'VarImp', 155:'VarMod', 156:'VarMul', 157:'VarOr', 158:'VarPow', 159:'VarSub', 160:'CreateTypeLib', 161:'LoadTypeLib', 162:'LoadRegTypeLib', 163:'RegisterTypeLib', 164:'QueryPathOfRegTypeLib', 165:'LHashValOfNameSys', 166:'LHashValOfNameSysA', 167:'VarXor', 168:'VarAbs', 169:'VarFix', 170:'OaBuildVersion', 171:'ClearCustData', 172:'VarInt', 173:'VarNeg', 174:'VarNot', 
175:'VarRound', 176:'VarCmp', 177:'VarDecAdd', 178:'VarDecDiv', 179:'VarDecMul', 180:'CreateTypeLib2', 181:'VarDecSub', 182:'VarDecAbs', 183:'LoadTypeLibEx', 184:'SystemTimeToVariantTime', 185:'VariantTimeToSystemTime', 186:'UnRegisterTypeLib', 187:'VarDecFix', 188:'VarDecInt', 189:'VarDecNeg', 190:'VarDecFromUI1', 191:'VarDecFromI2', 192:'VarDecFromI4', 193:'VarDecFromR4', 194:'VarDecFromR8', 195:'VarDecFromDate', 196:'VarDecFromCy', 197:'VarDecFromStr', 198:'VarDecFromDisp', 199:'VarDecFromBool', 200:'GetErrorInfo', 201:'SetErrorInfo', 202:'CreateErrorInfo', 203:'VarDecRound', 204:'VarDecCmp', 205:'VarI2FromI1', 206:'VarI2FromUI2', 207:'VarI2FromUI4', 208:'VarI2FromDec', 209:'VarI4FromI1', 210:'VarI4FromUI2', 211:'VarI4FromUI4', 212:'VarI4FromDec', 213:'VarR4FromI1', 214:'VarR4FromUI2', 215:'VarR4FromUI4', 216:'VarR4FromDec', 217:'VarR8FromI1', 218:'VarR8FromUI2', 219:'VarR8FromUI4', 220:'VarR8FromDec', 221:'VarDateFromI1', 222:'VarDateFromUI2', 223:'VarDateFromUI4', 224:'VarDateFromDec', 225:'VarCyFromI1', 226:'VarCyFromUI2', 227:'VarCyFromUI4', 228:'VarCyFromDec', 229:'VarBstrFromI1', 230:'VarBstrFromUI2', 231:'VarBstrFromUI4', 232:'VarBstrFromDec', 233:'VarBoolFromI1', 234:'VarBoolFromUI2', 235:'VarBoolFromUI4', 236:'VarBoolFromDec', 237:'VarUI1FromI1', 238:'VarUI1FromUI2', 239:'VarUI1FromUI4', 240:'VarUI1FromDec', 241:'VarDecFromI1', 242:'VarDecFromUI2', 243:'VarDecFromUI4', 244:'VarI1FromUI1', 245:'VarI1FromI2', 246:'VarI1FromI4', 247:'VarI1FromR4', 248:'VarI1FromR8', 249:'VarI1FromDate', 250:'VarI1FromCy', 251:'VarI1FromStr', 252:'VarI1FromDisp', 253:'VarI1FromBool', 254:'VarI1FromUI2', 255:'VarI1FromUI4', 256:'VarI1FromDec', 257:'VarUI2FromUI1', 258:'VarUI2FromI2', 259:'VarUI2FromI4', 260:'VarUI2FromR4', 261:'VarUI2FromR8', 262:'VarUI2FromDate', 263:'VarUI2FromCy', 264:'VarUI2FromStr', 265:'VarUI2FromDisp', 266:'VarUI2FromBool', 267:'VarUI2FromI1', 268:'VarUI2FromUI4', 269:'VarUI2FromDec', 270:'VarUI4FromUI1', 271:'VarUI4FromI2', 272:'VarUI4FromI4', 
273:'VarUI4FromR4', 274:'VarUI4FromR8', 275:'VarUI4FromDate', 276:'VarUI4FromCy', 277:'VarUI4FromStr', 278:'VarUI4FromDisp', 279:'VarUI4FromBool', 280:'VarUI4FromI1', 281:'VarUI4FromUI2', 282:'VarUI4FromDec', 283:'BSTR_UserSize', 284:'BSTR_UserMarshal', 285:'BSTR_UserUnmarshal', 286:'BSTR_UserFree', 287:'VARIANT_UserSize', 288:'VARIANT_UserMarshal', 289:'VARIANT_UserUnmarshal', 290:'VARIANT_UserFree', 291:'LPSAFEARRAY_UserSize', 292:'LPSAFEARRAY_UserMarshal', 293:'LPSAFEARRAY_UserUnmarshal', 294:'LPSAFEARRAY_UserFree', 295:'LPSAFEARRAY_Size', 296:'LPSAFEARRAY_Marshal', 297:'LPSAFEARRAY_Unmarshal', 298:'VarDecCmpR8', 299:'VarCyAdd', 300:'DllUnregisterServer', 301:'OACreateTypeLib2', 303:'VarCyMul', 304:'VarCyMulI4', 305:'VarCySub', 306:'VarCyAbs', 307:'VarCyFix', 308:'VarCyInt', 309:'VarCyNeg', 310:'VarCyRound', 311:'VarCyCmp', 312:'VarCyCmpR8', 313:'VarBstrCat', 314:'VarBstrCmp', 315:'VarR8Pow', 316:'VarR4CmpR8', 317:'VarR8Round', 318:'VarCat', 319:'VarDateFromUdateEx', 322:'GetRecordInfoFromGuids', 323:'GetRecordInfoFromTypeInfo', 325:'SetVarConversionLocaleSetting', 326:'GetVarConversionLocaleSetting', 327:'SetOaNoCache', 329:'VarCyMulI8', 330:'VarDateFromUdate', 331:'VarUdateFromDate', 332:'GetAltMonthNames', 333:'VarI8FromUI1', 334:'VarI8FromI2', 335:'VarI8FromR4', 336:'VarI8FromR8', 337:'VarI8FromCy', 338:'VarI8FromDate', 339:'VarI8FromStr', 340:'VarI8FromDisp', 341:'VarI8FromBool', 342:'VarI8FromI1', 343:'VarI8FromUI2', 344:'VarI8FromUI4', 345:'VarI8FromDec', 346:'VarI2FromI8', 347:'VarI2FromUI8', 348:'VarI4FromI8', 349:'VarI4FromUI8', 360:'VarR4FromI8', 361:'VarR4FromUI8', 362:'VarR8FromI8', 363:'VarR8FromUI8', 364:'VarDateFromI8', 365:'VarDateFromUI8', 366:'VarCyFromI8', 367:'VarCyFromUI8', 368:'VarBstrFromI8', 369:'VarBstrFromUI8', 370:'VarBoolFromI8', 371:'VarBoolFromUI8', 372:'VarUI1FromI8', 373:'VarUI1FromUI8', 374:'VarDecFromI8', 375:'VarDecFromUI8', 376:'VarI1FromI8', 377:'VarI1FromUI8', 378:'VarUI2FromI8', 379:'VarUI2FromUI8', 401:'OleLoadPictureEx', 
402:'OleLoadPictureFileEx', 411:'SafeArrayCreateVector', 412:'SafeArrayCopyData', 413:'VectorFromBstr', 414:'BstrFromVector', 415:'OleIconToCursor', 416:'OleCreatePropertyFrameIndirect', 417:'OleCreatePropertyFrame', 418:'OleLoadPicture', 419:'OleCreatePictureIndirect', 420:'OleCreateFontIndirect', 421:'OleTranslateColor', 422:'OleLoadPictureFile', 423:'OleSavePictureFile', 424:'OleLoadPicturePath', 425:'VarUI4FromI8', 426:'VarUI4FromUI8', 427:'VarI8FromUI8', 428:'VarUI8FromI8', 429:'VarUI8FromUI1', 430:'VarUI8FromI2', 431:'VarUI8FromR4', 432:'VarUI8FromR8', 433:'VarUI8FromCy', 434:'VarUI8FromDate', 435:'VarUI8FromStr', 436:'VarUI8FromDisp', 437:'VarUI8FromBool', 438:'VarUI8FromI1', 439:'VarUI8FromUI2', 440:'VarUI8FromUI4', 441:'VarUI8FromDec', 442:'RegisterTypeLibForUser', 443:'UnRegisterTypeLibForUser', } pefile-1.2.10-139/ordlookup/ws2_32.py0000644000076500000240000000573012232237145016676 0ustar erostaff00000000000000 ord_names = { 1:'accept', 2:'bind', 3:'closesocket', 4:'connect', 5:'getpeername', 6:'getsockname', 7:'getsockopt', 8:'htonl', 9:'htons', 10:'ioctlsocket', 11:'inet_addr', 12:'inet_ntoa', 13:'listen', 14:'ntohl', 15:'ntohs', 16:'recv', 17:'recvfrom', 18:'select', 19:'send', 20:'sendto', 21:'setsockopt', 22:'shutdown', 23:'socket', 24:'GetAddrInfoW', 25:'GetNameInfoW', 26:'WSApSetPostRoutine', 27:'FreeAddrInfoW', 28:'WPUCompleteOverlappedRequest', 29:'WSAAccept', 30:'WSAAddressToStringA', 31:'WSAAddressToStringW', 32:'WSACloseEvent', 33:'WSAConnect', 34:'WSACreateEvent', 35:'WSADuplicateSocketA', 36:'WSADuplicateSocketW', 37:'WSAEnumNameSpaceProvidersA', 38:'WSAEnumNameSpaceProvidersW', 39:'WSAEnumNetworkEvents', 40:'WSAEnumProtocolsA', 41:'WSAEnumProtocolsW', 42:'WSAEventSelect', 43:'WSAGetOverlappedResult', 44:'WSAGetQOSByName', 45:'WSAGetServiceClassInfoA', 46:'WSAGetServiceClassInfoW', 47:'WSAGetServiceClassNameByClassIdA', 48:'WSAGetServiceClassNameByClassIdW', 49:'WSAHtonl', 50:'WSAHtons', 51:'gethostbyaddr', 52:'gethostbyname', 
53:'getprotobyname', 54:'getprotobynumber', 55:'getservbyname', 56:'getservbyport', 57:'gethostname', 58:'WSAInstallServiceClassA', 59:'WSAInstallServiceClassW', 60:'WSAIoctl', 61:'WSAJoinLeaf', 62:'WSALookupServiceBeginA', 63:'WSALookupServiceBeginW', 64:'WSALookupServiceEnd', 65:'WSALookupServiceNextA', 66:'WSALookupServiceNextW', 67:'WSANSPIoctl', 68:'WSANtohl', 69:'WSANtohs', 70:'WSAProviderConfigChange', 71:'WSARecv', 72:'WSARecvDisconnect', 73:'WSARecvFrom', 74:'WSARemoveServiceClass', 75:'WSAResetEvent', 76:'WSASend', 77:'WSASendDisconnect', 78:'WSASendTo', 79:'WSASetEvent', 80:'WSASetServiceA', 81:'WSASetServiceW', 82:'WSASocketA', 83:'WSASocketW', 84:'WSAStringToAddressA', 85:'WSAStringToAddressW', 86:'WSAWaitForMultipleEvents', 87:'WSCDeinstallProvider', 88:'WSCEnableNSProvider', 89:'WSCEnumProtocols', 90:'WSCGetProviderPath', 91:'WSCInstallNameSpace', 92:'WSCInstallProvider', 93:'WSCUnInstallNameSpace', 94:'WSCUpdateProvider', 95:'WSCWriteNameSpaceOrder', 96:'WSCWriteProviderOrder', 97:'freeaddrinfo', 98:'getaddrinfo', 99:'getnameinfo', 101:'WSAAsyncSelect', 102:'WSAAsyncGetHostByAddr', 103:'WSAAsyncGetHostByName', 104:'WSAAsyncGetProtoByNumber', 105:'WSAAsyncGetProtoByName', 106:'WSAAsyncGetServByPort', 107:'WSAAsyncGetServByName', 108:'WSACancelAsyncRequest', 109:'WSASetBlockingHook', 110:'WSAUnhookBlockingHook', 111:'WSAGetLastError', 112:'WSASetLastError', 113:'WSACancelBlockingCall', 114:'WSAIsBlocking', 115:'WSAStartup', 116:'WSACleanup', 151:'__WSAFDIsSet', 500:'WEP', } pefile-1.2.10-139/pefile.egg-info/0000755000076500000240000000000012252150355016217 5ustar erostaff00000000000000pefile-1.2.10-139/pefile.egg-info/dependency_links.txt0000644000076500000240000000000112252150355022265 0ustar erostaff00000000000000 pefile-1.2.10-139/pefile.egg-info/PKG-INFO0000644000076500000240000000277612252150355017330 0ustar erostaff00000000000000Metadata-Version: 1.1 Name: pefile Version: 1.2.10-139 Summary: Python PE parsing module Home-page: 
http://code.google.com/p/pefile/ Author: Ero Carrera Author-email: ero.carrera@gmail.com License: UNKNOWN Download-URL: http://pefile.googlecode.com/files/pefile-1.2.10-139.tar.gz Description: pefile, Portable Executable reader module All the PE file basic structures are available with their default names as attributes of the instance returned. Processed elements such as the import table are made available with lowercase names, to differentiate them from the upper case basic structure names. pefile has been tested against the limits of valid PE headers, that is, malware. Lots of packed malware attempt to abuse the format way beyond its standard use. To the best of my knowledge most of the abuses are handled gracefully. Copyright (c) 2005-2013 Ero Carrera All rights reserved. For detailed copyright information see the file COPYING in the root of the distribution archive. Platform: any Classifier: Development Status :: 5 - Production/Stable Classifier: Intended Audience :: Developers Classifier: Intended Audience :: Science/Research Classifier: Natural Language :: English Classifier: Operating System :: OS Independent Classifier: Programming Language :: Python Classifier: Topic :: Software Development :: Libraries :: Python Modules pefile-1.2.10-139/pefile.egg-info/SOURCES.txt0000644000076500000240000000040112252150355020076 0ustar erostaff00000000000000CHANGES_up_to_1.2.6 COPYING MANIFEST README pefile.py peutils.py setup.py ordlookup/__init__.py ordlookup/oleaut32.py ordlookup/ws2_32.py pefile.egg-info/PKG-INFO pefile.egg-info/SOURCES.txt pefile.egg-info/dependency_links.txt pefile.egg-info/top_level.txtpefile-1.2.10-139/pefile.egg-info/top_level.txt0000644000076500000240000000003112252150355020743 0ustar erostaff00000000000000pefile ordlookup peutils pefile-1.2.10-139/pefile.py0000644000076500000240000062200012252130073015072 0ustar erostaff00000000000000# -*- coding: Latin-1 -*- """pefile, Portable Executable reader module All the PE file basic structures are 
available with their default names as attributes of the instance returned. Processed elements such as the import table are made available with lowercase names, to differentiate them from the upper case basic structure names. pefile has been tested against the limits of valid PE headers, that is, malware. Lots of packed malware attempt to abuse the format way beyond its standard use. To the best of my knowledge most of the abuses are handled gracefully. Copyright (c) 2005-2013 Ero Carrera All rights reserved. For detailed copyright information see the file COPYING in the root of the distribution archive. """ __revision__ = "$LastChangedRevision: 139 $" __author__ = 'Ero Carrera' __version__ = '1.2.10-%d' % int( __revision__[21:-2] ) __contact__ = 'ero.carrera@gmail.com' import os import struct import time import math import re import exceptions import string import array import mmap import ordlookup sha1, sha256, sha512, md5 = None, None, None, None try: import hashlib sha1 = hashlib.sha1 sha256 = hashlib.sha256 sha512 = hashlib.sha512 md5 = hashlib.md5 except ImportError: try: import sha sha1 = sha.new except ImportError: pass try: import md5 md5 = md5.new except ImportError: pass try: enumerate except NameError: def enumerate(iter): L = list(iter) return zip(range(0, len(L)), L) def is_bytearray_available(): if isinstance(__builtins__, dict): return ('bytearray' in __builtins__) return ('bytearray' in __builtins__.__dict__) fast_load = False # This will set a maximum length of a string to be retrieved from the file. # It's there to prevent loading massive amounts of data from memory mapped # files. Strings longer than 1MB should be rather rare. 
MAX_STRING_LENGTH = 0x100000 # 2^20 IMAGE_DOS_SIGNATURE = 0x5A4D IMAGE_DOSZM_SIGNATURE = 0x4D5A IMAGE_NE_SIGNATURE = 0x454E IMAGE_LE_SIGNATURE = 0x454C IMAGE_LX_SIGNATURE = 0x584C IMAGE_TE_SIGNATURE = 0x5A56 # Terse Executables have a 'VZ' signature IMAGE_NT_SIGNATURE = 0x00004550 IMAGE_NUMBEROF_DIRECTORY_ENTRIES= 16 IMAGE_ORDINAL_FLAG = 0x80000000L IMAGE_ORDINAL_FLAG64 = 0x8000000000000000L OPTIONAL_HEADER_MAGIC_PE = 0x10b OPTIONAL_HEADER_MAGIC_PE_PLUS = 0x20b directory_entry_types = [ ('IMAGE_DIRECTORY_ENTRY_EXPORT', 0), ('IMAGE_DIRECTORY_ENTRY_IMPORT', 1), ('IMAGE_DIRECTORY_ENTRY_RESOURCE', 2), ('IMAGE_DIRECTORY_ENTRY_EXCEPTION', 3), ('IMAGE_DIRECTORY_ENTRY_SECURITY', 4), ('IMAGE_DIRECTORY_ENTRY_BASERELOC', 5), ('IMAGE_DIRECTORY_ENTRY_DEBUG', 6), ('IMAGE_DIRECTORY_ENTRY_COPYRIGHT', 7), # Architecture on non-x86 platforms ('IMAGE_DIRECTORY_ENTRY_GLOBALPTR', 8), ('IMAGE_DIRECTORY_ENTRY_TLS', 9), ('IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG', 10), ('IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT', 11), ('IMAGE_DIRECTORY_ENTRY_IAT', 12), ('IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT', 13), ('IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR',14), ('IMAGE_DIRECTORY_ENTRY_RESERVED', 15) ] DIRECTORY_ENTRY = dict([(e[1], e[0]) for e in directory_entry_types]+directory_entry_types) image_characteristics = [ ('IMAGE_FILE_RELOCS_STRIPPED', 0x0001), ('IMAGE_FILE_EXECUTABLE_IMAGE', 0x0002), ('IMAGE_FILE_LINE_NUMS_STRIPPED', 0x0004), ('IMAGE_FILE_LOCAL_SYMS_STRIPPED', 0x0008), ('IMAGE_FILE_AGGRESIVE_WS_TRIM', 0x0010), ('IMAGE_FILE_LARGE_ADDRESS_AWARE', 0x0020), ('IMAGE_FILE_16BIT_MACHINE', 0x0040), ('IMAGE_FILE_BYTES_REVERSED_LO', 0x0080), ('IMAGE_FILE_32BIT_MACHINE', 0x0100), ('IMAGE_FILE_DEBUG_STRIPPED', 0x0200), ('IMAGE_FILE_REMOVABLE_RUN_FROM_SWAP', 0x0400), ('IMAGE_FILE_NET_RUN_FROM_SWAP', 0x0800), ('IMAGE_FILE_SYSTEM', 0x1000), ('IMAGE_FILE_DLL', 0x2000), ('IMAGE_FILE_UP_SYSTEM_ONLY', 0x4000), ('IMAGE_FILE_BYTES_REVERSED_HI', 0x8000) ] IMAGE_CHARACTERISTICS = dict([(e[1], e[0]) for e in 
image_characteristics]+image_characteristics) section_characteristics = [ ('IMAGE_SCN_TYPE_REG', 0x00000000), # reserved ('IMAGE_SCN_TYPE_DSECT', 0x00000001), # reserved ('IMAGE_SCN_TYPE_NOLOAD', 0x00000002), # reserved ('IMAGE_SCN_TYPE_GROUP', 0x00000004), # reserved ('IMAGE_SCN_TYPE_NO_PAD', 0x00000008), # reserved ('IMAGE_SCN_TYPE_COPY', 0x00000010), # reserved ('IMAGE_SCN_CNT_CODE', 0x00000020), ('IMAGE_SCN_CNT_INITIALIZED_DATA', 0x00000040), ('IMAGE_SCN_CNT_UNINITIALIZED_DATA', 0x00000080), ('IMAGE_SCN_LNK_OTHER', 0x00000100), ('IMAGE_SCN_LNK_INFO', 0x00000200), ('IMAGE_SCN_LNK_OVER', 0x00000400), # reserved ('IMAGE_SCN_LNK_REMOVE', 0x00000800), ('IMAGE_SCN_LNK_COMDAT', 0x00001000), ('IMAGE_SCN_MEM_PROTECTED', 0x00004000), # obsolete ('IMAGE_SCN_NO_DEFER_SPEC_EXC', 0x00004000), ('IMAGE_SCN_GPREL', 0x00008000), ('IMAGE_SCN_MEM_FARDATA', 0x00008000), ('IMAGE_SCN_MEM_SYSHEAP', 0x00010000), # obsolete ('IMAGE_SCN_MEM_PURGEABLE', 0x00020000), ('IMAGE_SCN_MEM_16BIT', 0x00020000), ('IMAGE_SCN_MEM_LOCKED', 0x00040000), ('IMAGE_SCN_MEM_PRELOAD', 0x00080000), ('IMAGE_SCN_ALIGN_1BYTES', 0x00100000), ('IMAGE_SCN_ALIGN_2BYTES', 0x00200000), ('IMAGE_SCN_ALIGN_4BYTES', 0x00300000), ('IMAGE_SCN_ALIGN_8BYTES', 0x00400000), ('IMAGE_SCN_ALIGN_16BYTES', 0x00500000), # default alignment ('IMAGE_SCN_ALIGN_32BYTES', 0x00600000), ('IMAGE_SCN_ALIGN_64BYTES', 0x00700000), ('IMAGE_SCN_ALIGN_128BYTES', 0x00800000), ('IMAGE_SCN_ALIGN_256BYTES', 0x00900000), ('IMAGE_SCN_ALIGN_512BYTES', 0x00A00000), ('IMAGE_SCN_ALIGN_1024BYTES', 0x00B00000), ('IMAGE_SCN_ALIGN_2048BYTES', 0x00C00000), ('IMAGE_SCN_ALIGN_4096BYTES', 0x00D00000), ('IMAGE_SCN_ALIGN_8192BYTES', 0x00E00000), ('IMAGE_SCN_ALIGN_MASK', 0x00F00000), ('IMAGE_SCN_LNK_NRELOC_OVFL', 0x01000000), ('IMAGE_SCN_MEM_DISCARDABLE', 0x02000000), ('IMAGE_SCN_MEM_NOT_CACHED', 0x04000000), ('IMAGE_SCN_MEM_NOT_PAGED', 0x08000000), ('IMAGE_SCN_MEM_SHARED', 0x10000000), ('IMAGE_SCN_MEM_EXECUTE', 0x20000000), ('IMAGE_SCN_MEM_READ', 0x40000000), 
('IMAGE_SCN_MEM_WRITE', 0x80000000L) ] SECTION_CHARACTERISTICS = dict([(e[1], e[0]) for e in section_characteristics]+section_characteristics) debug_types = [ ('IMAGE_DEBUG_TYPE_UNKNOWN', 0), ('IMAGE_DEBUG_TYPE_COFF', 1), ('IMAGE_DEBUG_TYPE_CODEVIEW', 2), ('IMAGE_DEBUG_TYPE_FPO', 3), ('IMAGE_DEBUG_TYPE_MISC', 4), ('IMAGE_DEBUG_TYPE_EXCEPTION', 5), ('IMAGE_DEBUG_TYPE_FIXUP', 6), ('IMAGE_DEBUG_TYPE_OMAP_TO_SRC', 7), ('IMAGE_DEBUG_TYPE_OMAP_FROM_SRC', 8), ('IMAGE_DEBUG_TYPE_BORLAND', 9), ('IMAGE_DEBUG_TYPE_RESERVED10', 10), ('IMAGE_DEBUG_TYPE_CLSID', 11) ] DEBUG_TYPE = dict([(e[1], e[0]) for e in debug_types]+debug_types) subsystem_types = [ ('IMAGE_SUBSYSTEM_UNKNOWN', 0), ('IMAGE_SUBSYSTEM_NATIVE', 1), ('IMAGE_SUBSYSTEM_WINDOWS_GUI', 2), ('IMAGE_SUBSYSTEM_WINDOWS_CUI', 3), ('IMAGE_SUBSYSTEM_OS2_CUI', 5), ('IMAGE_SUBSYSTEM_POSIX_CUI', 7), ('IMAGE_SUBSYSTEM_NATIVE_WINDOWS', 8), ('IMAGE_SUBSYSTEM_WINDOWS_CE_GUI', 9), ('IMAGE_SUBSYSTEM_EFI_APPLICATION', 10), ('IMAGE_SUBSYSTEM_EFI_BOOT_SERVICE_DRIVER', 11), ('IMAGE_SUBSYSTEM_EFI_RUNTIME_DRIVER', 12), ('IMAGE_SUBSYSTEM_EFI_ROM', 13), ('IMAGE_SUBSYSTEM_XBOX', 14), ('IMAGE_SUBSYSTEM_WINDOWS_BOOT_APPLICATION', 16)] SUBSYSTEM_TYPE = dict([(e[1], e[0]) for e in subsystem_types]+subsystem_types) machine_types = [ ('IMAGE_FILE_MACHINE_UNKNOWN', 0), ('IMAGE_FILE_MACHINE_I386', 0x014c), ('IMAGE_FILE_MACHINE_R3000', 0x0162), ('IMAGE_FILE_MACHINE_R4000', 0x0166), ('IMAGE_FILE_MACHINE_R10000', 0x0168), ('IMAGE_FILE_MACHINE_WCEMIPSV2',0x0169), ('IMAGE_FILE_MACHINE_ALPHA', 0x0184), ('IMAGE_FILE_MACHINE_SH3', 0x01a2), ('IMAGE_FILE_MACHINE_SH3DSP', 0x01a3), ('IMAGE_FILE_MACHINE_SH3E', 0x01a4), ('IMAGE_FILE_MACHINE_SH4', 0x01a6), ('IMAGE_FILE_MACHINE_SH5', 0x01a8), ('IMAGE_FILE_MACHINE_ARM', 0x01c0), ('IMAGE_FILE_MACHINE_THUMB', 0x01c2), ('IMAGE_FILE_MACHINE_ARMNT', 0x01c4), ('IMAGE_FILE_MACHINE_AM33', 0x01d3), ('IMAGE_FILE_MACHINE_POWERPC', 0x01f0), ('IMAGE_FILE_MACHINE_POWERPCFP',0x01f1), ('IMAGE_FILE_MACHINE_IA64', 0x0200), 
('IMAGE_FILE_MACHINE_MIPS16', 0x0266), ('IMAGE_FILE_MACHINE_ALPHA64', 0x0284), ('IMAGE_FILE_MACHINE_AXP64', 0x0284), # same ('IMAGE_FILE_MACHINE_MIPSFPU', 0x0366), ('IMAGE_FILE_MACHINE_MIPSFPU16',0x0466), ('IMAGE_FILE_MACHINE_TRICORE', 0x0520), ('IMAGE_FILE_MACHINE_CEF', 0x0cef), ('IMAGE_FILE_MACHINE_EBC', 0x0ebc), ('IMAGE_FILE_MACHINE_AMD64', 0x8664), ('IMAGE_FILE_MACHINE_M32R', 0x9041), ('IMAGE_FILE_MACHINE_CEE', 0xc0ee), ] MACHINE_TYPE = dict([(e[1], e[0]) for e in machine_types]+machine_types) relocation_types = [ ('IMAGE_REL_BASED_ABSOLUTE', 0), ('IMAGE_REL_BASED_HIGH', 1), ('IMAGE_REL_BASED_LOW', 2), ('IMAGE_REL_BASED_HIGHLOW', 3), ('IMAGE_REL_BASED_HIGHADJ', 4), ('IMAGE_REL_BASED_MIPS_JMPADDR', 5), ('IMAGE_REL_BASED_SECTION', 6), ('IMAGE_REL_BASED_REL', 7), ('IMAGE_REL_BASED_MIPS_JMPADDR16', 9), ('IMAGE_REL_BASED_IA64_IMM64', 9), ('IMAGE_REL_BASED_DIR64', 10), ('IMAGE_REL_BASED_HIGH3ADJ', 11) ] RELOCATION_TYPE = dict([(e[1], e[0]) for e in relocation_types]+relocation_types) dll_characteristics = [ ('IMAGE_LIBRARY_PROCESS_INIT', 0x0001), # reserved ('IMAGE_LIBRARY_PROCESS_TERM', 0x0002), # reserved ('IMAGE_LIBRARY_THREAD_INIT', 0x0004), # reserved ('IMAGE_LIBRARY_THREAD_TERM', 0x0008), # reserved ('IMAGE_DLLCHARACTERISTICS_HIGH_ENTROPY_VA', 0x0020), ('IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE', 0x0040), ('IMAGE_DLLCHARACTERISTICS_FORCE_INTEGRITY', 0x0080), ('IMAGE_DLLCHARACTERISTICS_NX_COMPAT', 0x0100), ('IMAGE_DLLCHARACTERISTICS_NO_ISOLATION', 0x0200), ('IMAGE_DLLCHARACTERISTICS_NO_SEH', 0x0400), ('IMAGE_DLLCHARACTERISTICS_NO_BIND', 0x0800), ('IMAGE_DLLCHARACTERISTICS_APPCONTAINER', 0x1000), ('IMAGE_DLLCHARACTERISTICS_WDM_DRIVER', 0x2000), ('IMAGE_DLLCHARACTERISTICS_GUARD_CF', 0x4000), ('IMAGE_DLLCHARACTERISTICS_TERMINAL_SERVER_AWARE', 0x8000) ] DLL_CHARACTERISTICS = dict([(e[1], e[0]) for e in dll_characteristics]+dll_characteristics) # Resource types resource_type = [ ('RT_CURSOR', 1), ('RT_BITMAP', 2), ('RT_ICON', 3), ('RT_MENU', 4), ('RT_DIALOG', 5), 
('RT_STRING', 6), ('RT_FONTDIR', 7), ('RT_FONT', 8), ('RT_ACCELERATOR', 9), ('RT_RCDATA', 10), ('RT_MESSAGETABLE', 11), ('RT_GROUP_CURSOR', 12), ('RT_GROUP_ICON', 14), ('RT_VERSION', 16), ('RT_DLGINCLUDE', 17), ('RT_PLUGPLAY', 19), ('RT_VXD', 20), ('RT_ANICURSOR', 21), ('RT_ANIICON', 22), ('RT_HTML', 23), ('RT_MANIFEST', 24) ] RESOURCE_TYPE = dict([(e[1], e[0]) for e in resource_type]+resource_type) # Language definitions lang = [ ('LANG_NEUTRAL', 0x00), ('LANG_INVARIANT', 0x7f), ('LANG_AFRIKAANS', 0x36), ('LANG_ALBANIAN', 0x1c), ('LANG_ARABIC', 0x01), ('LANG_ARMENIAN', 0x2b), ('LANG_ASSAMESE', 0x4d), ('LANG_AZERI', 0x2c), ('LANG_BASQUE', 0x2d), ('LANG_BELARUSIAN', 0x23), ('LANG_BENGALI', 0x45), ('LANG_BULGARIAN', 0x02), ('LANG_CATALAN', 0x03), ('LANG_CHINESE', 0x04), ('LANG_CROATIAN', 0x1a), ('LANG_CZECH', 0x05), ('LANG_DANISH', 0x06), ('LANG_DIVEHI', 0x65), ('LANG_DUTCH', 0x13), ('LANG_ENGLISH', 0x09), ('LANG_ESTONIAN', 0x25), ('LANG_FAEROESE', 0x38), ('LANG_FARSI', 0x29), ('LANG_FINNISH', 0x0b), ('LANG_FRENCH', 0x0c), ('LANG_GALICIAN', 0x56), ('LANG_GEORGIAN', 0x37), ('LANG_GERMAN', 0x07), ('LANG_GREEK', 0x08), ('LANG_GUJARATI', 0x47), ('LANG_HEBREW', 0x0d), ('LANG_HINDI', 0x39), ('LANG_HUNGARIAN', 0x0e), ('LANG_ICELANDIC', 0x0f), ('LANG_INDONESIAN', 0x21), ('LANG_ITALIAN', 0x10), ('LANG_JAPANESE', 0x11), ('LANG_KANNADA', 0x4b), ('LANG_KASHMIRI', 0x60), ('LANG_KAZAK', 0x3f), ('LANG_KONKANI', 0x57), ('LANG_KOREAN', 0x12), ('LANG_KYRGYZ', 0x40), ('LANG_LATVIAN', 0x26), ('LANG_LITHUANIAN', 0x27), ('LANG_MACEDONIAN', 0x2f), ('LANG_MALAY', 0x3e), ('LANG_MALAYALAM', 0x4c), ('LANG_MANIPURI', 0x58), ('LANG_MARATHI', 0x4e), ('LANG_MONGOLIAN', 0x50), ('LANG_NEPALI', 0x61), ('LANG_NORWEGIAN', 0x14), ('LANG_ORIYA', 0x48), ('LANG_POLISH', 0x15), ('LANG_PORTUGUESE', 0x16), ('LANG_PUNJABI', 0x46), ('LANG_ROMANIAN', 0x18), ('LANG_RUSSIAN', 0x19), ('LANG_SANSKRIT', 0x4f), ('LANG_SERBIAN', 0x1a), ('LANG_SINDHI', 0x59), ('LANG_SLOVAK', 0x1b), ('LANG_SLOVENIAN', 0x24), 
('LANG_SPANISH', 0x0a), ('LANG_SWAHILI', 0x41), ('LANG_SWEDISH', 0x1d), ('LANG_SYRIAC', 0x5a), ('LANG_TAMIL', 0x49), ('LANG_TATAR', 0x44), ('LANG_TELUGU', 0x4a), ('LANG_THAI', 0x1e), ('LANG_TURKISH', 0x1f), ('LANG_UKRAINIAN', 0x22), ('LANG_URDU', 0x20), ('LANG_UZBEK', 0x43), ('LANG_VIETNAMESE', 0x2a), ('LANG_GAELIC', 0x3c), ('LANG_MALTESE', 0x3a), ('LANG_MAORI', 0x28), ('LANG_RHAETO_ROMANCE',0x17), ('LANG_SAAMI', 0x3b), ('LANG_SORBIAN', 0x2e), ('LANG_SUTU', 0x30), ('LANG_TSONGA', 0x31), ('LANG_TSWANA', 0x32), ('LANG_VENDA', 0x33), ('LANG_XHOSA', 0x34), ('LANG_ZULU', 0x35), ('LANG_ESPERANTO', 0x8f), ('LANG_WALON', 0x90), ('LANG_CORNISH', 0x91), ('LANG_WELSH', 0x92), ('LANG_BRETON', 0x93) ] LANG = dict(lang+[(e[1], e[0]) for e in lang]) # Sublanguage definitions sublang = [ ('SUBLANG_NEUTRAL', 0x00), ('SUBLANG_DEFAULT', 0x01), ('SUBLANG_SYS_DEFAULT', 0x02), ('SUBLANG_ARABIC_SAUDI_ARABIA', 0x01), ('SUBLANG_ARABIC_IRAQ', 0x02), ('SUBLANG_ARABIC_EGYPT', 0x03), ('SUBLANG_ARABIC_LIBYA', 0x04), ('SUBLANG_ARABIC_ALGERIA', 0x05), ('SUBLANG_ARABIC_MOROCCO', 0x06), ('SUBLANG_ARABIC_TUNISIA', 0x07), ('SUBLANG_ARABIC_OMAN', 0x08), ('SUBLANG_ARABIC_YEMEN', 0x09), ('SUBLANG_ARABIC_SYRIA', 0x0a), ('SUBLANG_ARABIC_JORDAN', 0x0b), ('SUBLANG_ARABIC_LEBANON', 0x0c), ('SUBLANG_ARABIC_KUWAIT', 0x0d), ('SUBLANG_ARABIC_UAE', 0x0e), ('SUBLANG_ARABIC_BAHRAIN', 0x0f), ('SUBLANG_ARABIC_QATAR', 0x10), ('SUBLANG_AZERI_LATIN', 0x01), ('SUBLANG_AZERI_CYRILLIC', 0x02), ('SUBLANG_CHINESE_TRADITIONAL', 0x01), ('SUBLANG_CHINESE_SIMPLIFIED', 0x02), ('SUBLANG_CHINESE_HONGKONG', 0x03), ('SUBLANG_CHINESE_SINGAPORE', 0x04), ('SUBLANG_CHINESE_MACAU', 0x05), ('SUBLANG_DUTCH', 0x01), ('SUBLANG_DUTCH_BELGIAN', 0x02), ('SUBLANG_ENGLISH_US', 0x01), ('SUBLANG_ENGLISH_UK', 0x02), ('SUBLANG_ENGLISH_AUS', 0x03), ('SUBLANG_ENGLISH_CAN', 0x04), ('SUBLANG_ENGLISH_NZ', 0x05), ('SUBLANG_ENGLISH_EIRE', 0x06), ('SUBLANG_ENGLISH_SOUTH_AFRICA', 0x07), ('SUBLANG_ENGLISH_JAMAICA', 0x08), ('SUBLANG_ENGLISH_CARIBBEAN', 0x09), 
('SUBLANG_ENGLISH_BELIZE', 0x0a), ('SUBLANG_ENGLISH_TRINIDAD', 0x0b), ('SUBLANG_ENGLISH_ZIMBABWE', 0x0c), ('SUBLANG_ENGLISH_PHILIPPINES', 0x0d), ('SUBLANG_FRENCH', 0x01), ('SUBLANG_FRENCH_BELGIAN', 0x02), ('SUBLANG_FRENCH_CANADIAN', 0x03), ('SUBLANG_FRENCH_SWISS', 0x04), ('SUBLANG_FRENCH_LUXEMBOURG', 0x05), ('SUBLANG_FRENCH_MONACO', 0x06), ('SUBLANG_GERMAN', 0x01), ('SUBLANG_GERMAN_SWISS', 0x02), ('SUBLANG_GERMAN_AUSTRIAN', 0x03), ('SUBLANG_GERMAN_LUXEMBOURG', 0x04), ('SUBLANG_GERMAN_LIECHTENSTEIN', 0x05), ('SUBLANG_ITALIAN', 0x01), ('SUBLANG_ITALIAN_SWISS', 0x02), ('SUBLANG_KASHMIRI_SASIA', 0x02), ('SUBLANG_KASHMIRI_INDIA', 0x02), ('SUBLANG_KOREAN', 0x01), ('SUBLANG_LITHUANIAN', 0x01), ('SUBLANG_MALAY_MALAYSIA', 0x01), ('SUBLANG_MALAY_BRUNEI_DARUSSALAM', 0x02), ('SUBLANG_NEPALI_INDIA', 0x02), ('SUBLANG_NORWEGIAN_BOKMAL', 0x01), ('SUBLANG_NORWEGIAN_NYNORSK', 0x02), ('SUBLANG_PORTUGUESE', 0x02), ('SUBLANG_PORTUGUESE_BRAZILIAN', 0x01), ('SUBLANG_SERBIAN_LATIN', 0x02), ('SUBLANG_SERBIAN_CYRILLIC', 0x03), ('SUBLANG_SPANISH', 0x01), ('SUBLANG_SPANISH_MEXICAN', 0x02), ('SUBLANG_SPANISH_MODERN', 0x03), ('SUBLANG_SPANISH_GUATEMALA', 0x04), ('SUBLANG_SPANISH_COSTA_RICA', 0x05), ('SUBLANG_SPANISH_PANAMA', 0x06), ('SUBLANG_SPANISH_DOMINICAN_REPUBLIC', 0x07), ('SUBLANG_SPANISH_VENEZUELA', 0x08), ('SUBLANG_SPANISH_COLOMBIA', 0x09), ('SUBLANG_SPANISH_PERU', 0x0a), ('SUBLANG_SPANISH_ARGENTINA', 0x0b), ('SUBLANG_SPANISH_ECUADOR', 0x0c), ('SUBLANG_SPANISH_CHILE', 0x0d), ('SUBLANG_SPANISH_URUGUAY', 0x0e), ('SUBLANG_SPANISH_PARAGUAY', 0x0f), ('SUBLANG_SPANISH_BOLIVIA', 0x10), ('SUBLANG_SPANISH_EL_SALVADOR', 0x11), ('SUBLANG_SPANISH_HONDURAS', 0x12), ('SUBLANG_SPANISH_NICARAGUA', 0x13), ('SUBLANG_SPANISH_PUERTO_RICO', 0x14), ('SUBLANG_SWEDISH', 0x01), ('SUBLANG_SWEDISH_FINLAND', 0x02), ('SUBLANG_URDU_PAKISTAN', 0x01), ('SUBLANG_URDU_INDIA', 0x02), ('SUBLANG_UZBEK_LATIN', 0x01), ('SUBLANG_UZBEK_CYRILLIC', 0x02), ('SUBLANG_DUTCH_SURINAM', 0x03), ('SUBLANG_ROMANIAN', 0x01), 
    ('SUBLANG_ROMANIAN_MOLDAVIA', 0x02),
    ('SUBLANG_RUSSIAN', 0x01),
    ('SUBLANG_RUSSIAN_MOLDAVIA', 0x02),
    ('SUBLANG_CROATIAN', 0x01),
    ('SUBLANG_LITHUANIAN_CLASSIC', 0x02),
    ('SUBLANG_GAELIC', 0x01),
    ('SUBLANG_GAELIC_SCOTTISH', 0x02),
    ('SUBLANG_GAELIC_MANX', 0x03) ]

SUBLANG = dict(sublang+[(e[1], e[0]) for e in sublang])

# Initialize the dictionary with all the name->value pairs
SUBLANG = dict( sublang )
# Now add all the value->name information, handling duplicates appropriately
for sublang_name, sublang_value in sublang:
    if SUBLANG.has_key( sublang_value ):
        SUBLANG[ sublang_value ].append( sublang_name )
    else:
        SUBLANG[ sublang_value ] = [ sublang_name ]

# Resolve a sublang name given the main lang name
#
def get_sublang_name_for_lang( lang_value, sublang_value ):
    lang_name = LANG.get(lang_value, '*unknown*')
    for sublang_name in SUBLANG.get(sublang_value, list()):
        # if the main language is a substring of sublang's name, then
        # return that
        if lang_name in sublang_name:
            return sublang_name
    # otherwise return the first sublang name
    return SUBLANG.get(sublang_value, ['*unknown*'])[0]


# Ange Albertini's code to process resources' strings
#
def parse_strings(data, counter, l):
    i = 0
    error_count = 0
    while i < len(data):

        data_slice = data[i:i + 2]
        if len(data_slice) < 2:
            break

        len_ = struct.unpack("<h", data_slice)[0]
        i += 2

        if len_ != 0 and 0 <= len_*2 <= len(data):
            try:
                l[counter] = data[i: i + len_ * 2].decode('utf-16')
            except UnicodeDecodeError:
                error_count += 1
            if error_count >= 3:
                break
            i += len_ * 2
        counter += 1


def retrieve_flags(flag_dict, flag_filter):
    """Read the flags from a dictionary and return them in a usable form.

    Will return a list of (flag, value) for all flags in "flag_dict"
    matching the filter "flag_filter".
    """

    return [(f[0], f[1]) for f in flag_dict.items() if
            isinstance(f[0], str) and f[0].startswith(flag_filter)]


def set_flags(obj, flag_field, flags):
    """Will process the flags and set attributes in the object accordingly.

    The object "obj" will gain attributes named after the flags provided in
    "flags" and valued True/False, matching the results of applying each
    flag value from "flags" to flag_field.
""" for flag in flags: if flag[1] & flag_field: #setattr(obj, flag[0], True) obj.__dict__[flag[0]] = True else: #setattr(obj, flag[0], False) obj.__dict__[flag[0]] = False def power_of_two(val): return val != 0 and (val & (val-1)) == 0 FILE_ALIGNEMNT_HARDCODED_VALUE = 0x200 FileAlignment_Warning = False # We only want to print the warning once SectionAlignment_Warning = False # We only want to print the warning once class UnicodeStringWrapperPostProcessor: """This class attempts to help the process of identifying strings that might be plain Unicode or Pascal. A list of strings will be wrapped on it with the hope the overlappings will help make the decision about their type.""" def __init__(self, pe, rva_ptr): self.pe = pe self.rva_ptr = rva_ptr self.string = None def get_rva(self): """Get the RVA of the string.""" return self.rva_ptr def __str__(self): """Return the escaped ASCII representation of the string.""" def convert_char(char): if char in string.printable: return char else: return r'\x%02x' % ord(char) if self.string: return ''.join([convert_char(c) for c in self.string]) return '' def invalidate(self): """Make this instance None, to express it's no known string type.""" self = None def render_pascal_16(self): self.string = self.pe.get_string_u_at_rva( self.rva_ptr+2, max_length=self.get_pascal_16_length()) def ask_pascal_16(self, next_rva_ptr): """The next RVA is taken to be the one immediately following this one. Such RVA could indicate the natural end of the string and will be checked with the possible length contained in the first word. 
""" length = self.get_pascal_16_length() if length == (next_rva_ptr - (self.rva_ptr+2)) / 2: self.length = length return True return False def get_pascal_16_length(self): return self.__get_word_value_at_rva(self.rva_ptr) def __get_word_value_at_rva(self, rva): try: data = self.pe.get_data(self.rva_ptr, 2) except PEFormatError, e: return False if len(data)<2: return False return struct.unpack(' self.__format_length__: data = data[:self.__format_length__] # OC Patch: # Some malware have incorrect header lengths. # Fail gracefully if this occurs # Buggy malware: a29b0118af8b7408444df81701ad5a7f # elif len(data) < self.__format_length__: raise PEFormatError('Data length less than expected header length.') if data.count(chr(0)) == len(data): self.__all_zeroes__ = True self.__unpacked_data_elms__ = struct.unpack(self.__format__, data) for i in xrange(len(self.__unpacked_data_elms__)): for key in self.__keys__[i]: #self.values[key] = self.__unpacked_data_elms__[i] setattr(self, key, self.__unpacked_data_elms__[i]) def __pack__(self): new_values = [] for i in xrange(len(self.__unpacked_data_elms__)): for key in self.__keys__[i]: new_val = getattr(self, key) old_val = self.__unpacked_data_elms__[i] # In the case of Unions, when the first changed value # is picked the loop is exited if new_val != old_val: break new_values.append(new_val) return struct.pack(self.__format__, *new_values) def __str__(self): return '\n'.join( self.dump() ) def __repr__(self): return '' % (' '.join( [' '.join(s.split()) for s in self.dump()] )) def dump(self, indentation=0): """Returns a string representation of the structure.""" dump = [] dump.append('[%s]' % self.name) # Refer to the __set_format__ method for an explanation # of the following construct. 
        for keys in self.__keys__:
            for key in keys:

                val = getattr(self, key)
                if isinstance(val, int) or isinstance(val, long):
                    val_str = '0x%-8X' % (val)
                    if key == 'TimeDateStamp' or key == 'dwTimeStamp':
                        try:
                            val_str += ' [%s UTC]' % time.asctime(time.gmtime(val))
                        except exceptions.ValueError, e:
                            val_str += ' [INVALID TIME]'
                else:
                    val_str = ''.join(filter(lambda c:c != '\0', str(val)))

                dump.append('0x%-8X 0x%-3X %-30s %s' % (
                    self.__field_offsets__[key] + self.__file_offset__,
                    self.__field_offsets__[key], key+':', val_str))

        return dump

    def dump_dict(self):
        """Returns a dictionary representation of the structure."""

        dump_dict = dict()

        dump_dict['Structure'] = self.name

        # Refer to the __set_format__ method for an explanation
        # of the following construct.
        for keys in self.__keys__:
            for key in keys:

                val = getattr(self, key)
                if isinstance(val, int) or isinstance(val, long):
                    if key == 'TimeDateStamp' or key == 'dwTimeStamp':
                        try:
                            val = '0x%-8X [%s UTC]' % (val, time.asctime(time.gmtime(val)))
                        except exceptions.ValueError, e:
                            val = '0x%-8X [INVALID TIME]' % val
                else:
                    val = ''.join(filter(lambda c:c != '\0', str(val)))

                dump_dict[key] = {'FileOffset': self.__field_offsets__[key] + self.__file_offset__,
                                  'Offset': self.__field_offsets__[key],
                                  'Value': val}

        return dump_dict


class SectionStructure(Structure):
    """Convenience section handling class."""

    def __init__(self, *argl, **argd):
        if 'pe' in argd:
            self.pe = argd['pe']
            del argd['pe']

        Structure.__init__(self, *argl, **argd)

    def get_data(self, start=None, length=None):
        """Get data chunk from a section.

        Allows querying data from the section by passing the
        addresses where the PE file would be loaded by default.
        It is then possible to retrieve code and data by their real
        addresses, as they would be if loaded.
""" PointerToRawData_adj = self.pe.adjust_FileAlignment( self.PointerToRawData, self.pe.OPTIONAL_HEADER.FileAlignment ) VirtualAddress_adj = self.pe.adjust_SectionAlignment( self.VirtualAddress, self.pe.OPTIONAL_HEADER.SectionAlignment, self.pe.OPTIONAL_HEADER.FileAlignment ) if start is None: offset = PointerToRawData_adj else: offset = ( start - VirtualAddress_adj ) + PointerToRawData_adj if length is not None: end = offset + length else: end = offset + self.SizeOfRawData # PointerToRawData is not adjusted here as we might want to read any possible extra bytes # that might get cut off by aligning the start (and hence cutting something off the end) # if end > self.PointerToRawData + self.SizeOfRawData: end = self.PointerToRawData + self.SizeOfRawData return self.pe.__data__[offset:end] def __setattr__(self, name, val): if name == 'Characteristics': section_flags = retrieve_flags(SECTION_CHARACTERISTICS, 'IMAGE_SCN_') # Set the section's flags according the the Characteristics member set_flags(self, val, section_flags) elif 'IMAGE_SCN_' in name and hasattr(self, name): if val: self.__dict__['Characteristics'] |= SECTION_CHARACTERISTICS[name] else: self.__dict__['Characteristics'] ^= SECTION_CHARACTERISTICS[name] self.__dict__[name] = val def get_rva_from_offset(self, offset): return offset - self.pe.adjust_FileAlignment( self.PointerToRawData, self.pe.OPTIONAL_HEADER.FileAlignment ) + self.pe.adjust_SectionAlignment( self.VirtualAddress, self.pe.OPTIONAL_HEADER.SectionAlignment, self.pe.OPTIONAL_HEADER.FileAlignment ) def get_offset_from_rva(self, rva): return (rva - self.pe.adjust_SectionAlignment( self.VirtualAddress, self.pe.OPTIONAL_HEADER.SectionAlignment, self.pe.OPTIONAL_HEADER.FileAlignment ) ) + self.pe.adjust_FileAlignment( self.PointerToRawData, self.pe.OPTIONAL_HEADER.FileAlignment ) def contains_offset(self, offset): """Check whether the section contains the file offset provided.""" if self.PointerToRawData is None: # bss and other sections containing 
            # only uninitialized data must have 0
            # and do not take space in the file
            return False
        return ( self.pe.adjust_FileAlignment(
                     self.PointerToRawData,
                     self.pe.OPTIONAL_HEADER.FileAlignment ) <=
                 offset <
                 self.pe.adjust_FileAlignment(
                     self.PointerToRawData,
                     self.pe.OPTIONAL_HEADER.FileAlignment ) +
                 self.SizeOfRawData )

    def contains_rva(self, rva):
        """Check whether the section contains the address provided."""

        # Check if the SizeOfRawData is realistic. If it's bigger than the size of
        # the whole PE file minus the start address of the section, the file could
        # be truncated or SizeOfRawData could contain a misleading value.
        # In either of those cases we take the VirtualSize
        #
        if len(self.pe.__data__) - self.pe.adjust_FileAlignment(
                self.PointerToRawData,
                self.pe.OPTIONAL_HEADER.FileAlignment ) < self.SizeOfRawData:
            # PECOFF documentation v8 says:
            # VirtualSize: The total size of the section when loaded into memory.
            # If this value is greater than SizeOfRawData, the section is zero-padded.
            # This field is valid only for executable images and should be set to zero
            # for object files.
            #
            size = self.Misc_VirtualSize
        else:
            size = max(self.SizeOfRawData, self.Misc_VirtualSize)

        VirtualAddress_adj = self.pe.adjust_SectionAlignment(
            self.VirtualAddress,
            self.pe.OPTIONAL_HEADER.SectionAlignment,
            self.pe.OPTIONAL_HEADER.FileAlignment )

        # Check whether there's any section after the current one that starts before the
        # calculated end for the current one. If so, cut the current section's size
        # to fit in the range up to where the next section starts.
        if (self.next_section_virtual_address is not None and
            self.next_section_virtual_address > self.VirtualAddress and
            VirtualAddress_adj + size > self.next_section_virtual_address):
            size = self.next_section_virtual_address - VirtualAddress_adj

        return VirtualAddress_adj <= rva < VirtualAddress_adj + size

    def contains(self, rva):
        #print "DEPRECATION WARNING: you should use contains_rva() instead of contains()"
        return self.contains_rva(rva)

    def get_entropy(self):
        """Calculate and return the entropy for the section."""

        return self.entropy_H( self.get_data() )

    def get_hash_sha1(self):
        """Get the SHA-1 hex-digest of the section's data."""

        if sha1 is not None:
            return sha1( self.get_data() ).hexdigest()

    def get_hash_sha256(self):
        """Get the SHA-256 hex-digest of the section's data."""

        if sha256 is not None:
            return sha256( self.get_data() ).hexdigest()

    def get_hash_sha512(self):
        """Get the SHA-512 hex-digest of the section's data."""

        if sha512 is not None:
            return sha512( self.get_data() ).hexdigest()

    def get_hash_md5(self):
        """Get the MD5 hex-digest of the section's data."""

        if md5 is not None:
            return md5( self.get_data() ).hexdigest()

    def entropy_H(self, data):
        """Calculate the entropy of a chunk of data."""

        if len(data) == 0:
            return 0.0

        occurences = array.array('L', [0]*256)

        for x in data:
            occurences[ord(x)] += 1

        entropy = 0
        for x in occurences:
            if x:
                p_x = float(x) / len(data)
                entropy -= p_x*math.log(p_x, 2)

        return entropy


class DataContainer(object):
    """Generic data container."""

    def __init__(self, **args):
        bare_setattr = super(DataContainer, self).__setattr__
        for key, value in args.items():
            bare_setattr(key, value)


class ImportDescData(DataContainer):
    """Holds import descriptor information.

    dll:        name of the imported DLL
    imports:    list of imported symbols (ImportData instances)
    struct:     IMAGE_IMPORT_DESCRIPTOR structure
    """

class ImportData(DataContainer):
    """Holds imported symbol's information.

    ordinal:    Ordinal of the symbol
    name:       Name of the symbol
    bound:      If the symbol is bound, this contains
                the address.
    """

    def __setattr__(self, name, val):

        # If the instance doesn't yet have an ordinal attribute
        # it's not fully initialized so can't do any of the
        # following
        #
        if hasattr(self, 'ordinal') and hasattr(self, 'bound') and hasattr(self, 'name'):

            if name == 'ordinal':

                if self.pe.PE_TYPE == OPTIONAL_HEADER_MAGIC_PE:
                    ordinal_flag = IMAGE_ORDINAL_FLAG
                elif self.pe.PE_TYPE == OPTIONAL_HEADER_MAGIC_PE_PLUS:
                    ordinal_flag = IMAGE_ORDINAL_FLAG64

                # Set the ordinal and flag the entry as importing by ordinal
                self.struct_table.Ordinal = ordinal_flag | (val & 0xffff)
                self.struct_table.AddressOfData = self.struct_table.Ordinal
                self.struct_table.Function = self.struct_table.Ordinal
                self.struct_table.ForwarderString = self.struct_table.Ordinal
            elif name == 'bound':
                if self.struct_iat is not None:
                    self.struct_iat.AddressOfData = val
                    # Keep every alias of the thunk union in sync
                    self.struct_iat.Ordinal = self.struct_iat.AddressOfData
                    self.struct_iat.Function = self.struct_iat.AddressOfData
                    self.struct_iat.ForwarderString = self.struct_iat.AddressOfData
            elif name == 'address':
                self.struct_table.AddressOfData = val
                self.struct_table.Ordinal = self.struct_table.AddressOfData
                self.struct_table.Function = self.struct_table.AddressOfData
                self.struct_table.ForwarderString = self.struct_table.AddressOfData
            elif name == 'name':
                # Make sure we reset the entry in case the import had been set to import by ordinal
                if self.name_offset:

                    name_rva = self.pe.get_rva_from_offset( self.name_offset )
                    self.pe.set_dword_at_offset( self.ordinal_offset, (0<<31) | name_rva )

                    # Complain if the length of the new name is longer than the existing one
                    if len(val) > len(self.name):
                        #raise Exception('The import name provided is longer than the existing one.')
                        pass
                    self.pe.set_bytes_at_offset( self.name_offset, val )

        self.__dict__[name] = val


class ExportDirData(DataContainer):
    """Holds export directory information.

    struct:     IMAGE_EXPORT_DIRECTORY structure
    symbols:    list of exported symbols (ExportData instances)
    """

class ExportData(DataContainer):
    """Holds exported symbols' information.

    ordinal:    ordinal of the symbol
    address:    address of the symbol
    name:       name of the symbol (None if the symbol is
                exported by ordinal only)
    forwarder:  if the symbol is forwarded it will
                contain the name of the target symbol,
                None otherwise.
    """

    def __setattr__(self, name, val):

        # If the instance doesn't yet have an ordinal attribute
        # it's not fully initialized so can't do any of the
        # following
        #
        if hasattr(self, 'ordinal') and hasattr(self, 'address') and hasattr(self, 'forwarder') and hasattr(self, 'name'):

            if name == 'ordinal':
                self.pe.set_word_at_offset( self.ordinal_offset, val )
            elif name == 'address':
                self.pe.set_dword_at_offset( self.address_offset, val )
            elif name == 'name':
                # Complain if the length of the new name is longer than the existing one
                if len(val) > len(self.name):
                    #raise Exception('The export name provided is longer than the existing one.')
                    pass
                self.pe.set_bytes_at_offset( self.name_offset, val )
            elif name == 'forwarder':
                # Complain if the length of the new name is longer than the existing one
                if len(val) > len(self.forwarder):
                    #raise Exception('The forwarder name provided is longer than the existing one.')
                    pass
                self.pe.set_bytes_at_offset( self.forwarder_offset, val )

        self.__dict__[name] = val


class ResourceDirData(DataContainer):
    """Holds resource directory information.

    struct:     IMAGE_RESOURCE_DIRECTORY structure
    entries:    list of entries (ResourceDirEntryData instances)
    """

class ResourceDirEntryData(DataContainer):
    """Holds resource directory entry data.

    struct:     IMAGE_RESOURCE_DIRECTORY_ENTRY structure
    name:       If the resource is identified by name this
                attribute will contain the name string. None
                otherwise.
                If identified by id, the id is
                available at 'struct.Id'
    id:         the id, also in struct.Id
    directory:  If this entry has a lower level directory
                this attribute will point to the
                ResourceDirData instance representing it.
    data:       If this entry has no further lower directories
                and points to the actual resource data, this
                attribute will reference the corresponding
                ResourceDataEntryData instance.
    (Either the 'directory' or the 'data' attribute will exist,
    but not both.)
    """

class ResourceDataEntryData(DataContainer):
    """Holds resource data entry information.

    struct:     IMAGE_RESOURCE_DATA_ENTRY structure
    lang:       Primary language ID
    sublang:    Sublanguage ID
    """

class DebugData(DataContainer):
    """Holds debug information.

    struct:     IMAGE_DEBUG_DIRECTORY structure
    """

class BaseRelocationData(DataContainer):
    """Holds base relocation information.

    struct:     IMAGE_BASE_RELOCATION structure
    entries:    list of relocation data (RelocationData instances)
    """

class RelocationData(DataContainer):
    """Holds relocation information.

    type:       Type of relocation
                The type string can be obtained via
                RELOCATION_TYPE[type]
    rva:        RVA of the relocation
    """

    def __setattr__(self, name, val):

        # If the instance doesn't yet have a struct attribute
        # it's not fully initialized so can't do any of the
        # following
        #
        if hasattr(self, 'struct'):
            # Get the word containing the type and data
            #
            word = self.struct.Data

            if name == 'type':
                word = (val << 12) | (word & 0xfff)
            elif name == 'rva':
                offset = val-self.base_rva
                if offset < 0:
                    offset = 0
                word = ( word & 0xf000) | ( offset & 0xfff)

            # Store the modified data
            #
            self.struct.Data = word

        self.__dict__[name] = val


class TlsData(DataContainer):
    """Holds TLS information.

    struct:     IMAGE_TLS_DIRECTORY structure
    """

class BoundImportDescData(DataContainer):
    """Holds bound import descriptor data.

    This directory entry provides information on the
    DLLs this PE file has been bound to (if bound at all).
    The structure will contain the name and timestamp of the
    DLL at the time of binding so that the loader can know
    whether it differs from the one currently present in the
    system and must, therefore, re-bind the PE's imports.

    struct:     IMAGE_BOUND_IMPORT_DESCRIPTOR structure
    name:       DLL name
    entries:    list of entries (BoundImportRefData instances)
                the entries will exist if this DLL has forwarded
                symbols. If so, the destination DLL will have an
                entry in this list.
    """

class LoadConfigData(DataContainer):
    """Holds Load Config data.

    struct:     IMAGE_LOAD_CONFIG_DIRECTORY structure
    name:       dll name
    """

class BoundImportRefData(DataContainer):
    """Holds bound import forwarder reference data.

    Contains the same information as the bound descriptor but
    for forwarded DLLs, if any.

    struct:     IMAGE_BOUND_FORWARDER_REF structure
    name:       dll name
    """


# Valid FAT32 8.3 short filename characters according to:
#  http://en.wikipedia.org/wiki/8.3_filename
# This will help decide whether DLL ASCII names are likely
# to be valid or otherwise corrupt data
#
# The filename length is not checked because a DLL's filename
# can be longer than the 8.3 format allows
allowed_filename = string.lowercase + string.uppercase + string.digits + "!#$%&'()-@^_`{}~+,.;=[]" + ''.join( [chr(i) for i in range(128, 256)] )

def is_valid_dos_filename(s):
    if s is None or not isinstance(s, str):
        return False
    for c in s:
        if c not in allowed_filename:
            return False
    return True


# Check if an imported name uses the valid accepted characters expected in mangled
# function names. If the symbol's characters don't fall within this charset
# we will assume the name is invalid
#
allowed_function_name = string.lowercase + string.uppercase + string.digits + '_?@$()'

def is_valid_function_name(s):
    if s is None or not isinstance(s, str):
        return False
    for c in s:
        if c not in allowed_function_name:
            return False
    return True


class PE:
    """A Portable Executable representation.

    This class provides access to most of the information in a PE file.

    It expects to be supplied the name of the file to load or PE data
    to process, and an optional argument 'fast_load' (False by default)
    which controls whether to load all the directories' information,
    which can be quite time consuming.

    pe = pefile.PE('module.dll')
    pe = pefile.PE(name='module.dll')

    would load 'module.dll' and process it. If the data is already
    available in a buffer the same can be achieved with:

    pe = pefile.PE(data=module_dll_data)

    The default for "fast_load" can be set at module level, for
    instance with "pefile.fast_load = True". All subsequent instances
    will then skip loading the whole PE structure. The "full_load"
    method can be used to parse the missing data at a later stage.

    Basic headers information will be available in the attributes:

    DOS_HEADER
    NT_HEADERS
    FILE_HEADER
    OPTIONAL_HEADER

    All of them will contain among their attributes the members of the
    corresponding structures as defined in WINNT.H

    The raw data corresponding to the header (from the beginning of the
    file up to the start of the first section) will be available in the
    instance's attribute 'header' as a string.

    The sections will be available as a list in the 'sections' attribute.
    Each entry will contain as attributes all the structure's members.

    Directory entries will be available as attributes (if they exist):
    (no other entries are processed at this point)

    DIRECTORY_ENTRY_IMPORT (list of ImportDescData instances)
    DIRECTORY_ENTRY_EXPORT (ExportDirData instance)
    DIRECTORY_ENTRY_RESOURCE (ResourceDirData instance)
    DIRECTORY_ENTRY_DEBUG (list of DebugData instances)
    DIRECTORY_ENTRY_BASERELOC (list of BaseRelocationData instances)
    DIRECTORY_ENTRY_TLS
    DIRECTORY_ENTRY_BOUND_IMPORT (list of BoundImportDescData instances)

    The following dictionary attributes provide ways of mapping different
    constants.
They will accept the numeric value and return the string representation and the opposite, feed in the string and get the numeric constant: DIRECTORY_ENTRY IMAGE_CHARACTERISTICS SECTION_CHARACTERISTICS DEBUG_TYPE SUBSYSTEM_TYPE MACHINE_TYPE RELOCATION_TYPE RESOURCE_TYPE LANG SUBLANG """ # # Format specifications for PE structures. # __IMAGE_DOS_HEADER_format__ = ('IMAGE_DOS_HEADER', ('H,e_magic', 'H,e_cblp', 'H,e_cp', 'H,e_crlc', 'H,e_cparhdr', 'H,e_minalloc', 'H,e_maxalloc', 'H,e_ss', 'H,e_sp', 'H,e_csum', 'H,e_ip', 'H,e_cs', 'H,e_lfarlc', 'H,e_ovno', '8s,e_res', 'H,e_oemid', 'H,e_oeminfo', '20s,e_res2', 'I,e_lfanew')) __IMAGE_FILE_HEADER_format__ = ('IMAGE_FILE_HEADER', ('H,Machine', 'H,NumberOfSections', 'I,TimeDateStamp', 'I,PointerToSymbolTable', 'I,NumberOfSymbols', 'H,SizeOfOptionalHeader', 'H,Characteristics')) __IMAGE_DATA_DIRECTORY_format__ = ('IMAGE_DATA_DIRECTORY', ('I,VirtualAddress', 'I,Size')) __IMAGE_OPTIONAL_HEADER_format__ = ('IMAGE_OPTIONAL_HEADER', ('H,Magic', 'B,MajorLinkerVersion', 'B,MinorLinkerVersion', 'I,SizeOfCode', 'I,SizeOfInitializedData', 'I,SizeOfUninitializedData', 'I,AddressOfEntryPoint', 'I,BaseOfCode', 'I,BaseOfData', 'I,ImageBase', 'I,SectionAlignment', 'I,FileAlignment', 'H,MajorOperatingSystemVersion', 'H,MinorOperatingSystemVersion', 'H,MajorImageVersion', 'H,MinorImageVersion', 'H,MajorSubsystemVersion', 'H,MinorSubsystemVersion', 'I,Reserved1', 'I,SizeOfImage', 'I,SizeOfHeaders', 'I,CheckSum', 'H,Subsystem', 'H,DllCharacteristics', 'I,SizeOfStackReserve', 'I,SizeOfStackCommit', 'I,SizeOfHeapReserve', 'I,SizeOfHeapCommit', 'I,LoaderFlags', 'I,NumberOfRvaAndSizes' )) __IMAGE_OPTIONAL_HEADER64_format__ = ('IMAGE_OPTIONAL_HEADER64', ('H,Magic', 'B,MajorLinkerVersion', 'B,MinorLinkerVersion', 'I,SizeOfCode', 'I,SizeOfInitializedData', 'I,SizeOfUninitializedData', 'I,AddressOfEntryPoint', 'I,BaseOfCode', 'Q,ImageBase', 'I,SectionAlignment', 'I,FileAlignment', 'H,MajorOperatingSystemVersion', 'H,MinorOperatingSystemVersion', 
'H,MajorImageVersion', 'H,MinorImageVersion', 'H,MajorSubsystemVersion', 'H,MinorSubsystemVersion', 'I,Reserved1', 'I,SizeOfImage', 'I,SizeOfHeaders', 'I,CheckSum', 'H,Subsystem', 'H,DllCharacteristics', 'Q,SizeOfStackReserve', 'Q,SizeOfStackCommit', 'Q,SizeOfHeapReserve', 'Q,SizeOfHeapCommit', 'I,LoaderFlags', 'I,NumberOfRvaAndSizes' )) __IMAGE_NT_HEADERS_format__ = ('IMAGE_NT_HEADERS', ('I,Signature',)) __IMAGE_SECTION_HEADER_format__ = ('IMAGE_SECTION_HEADER', ('8s,Name', 'I,Misc,Misc_PhysicalAddress,Misc_VirtualSize', 'I,VirtualAddress', 'I,SizeOfRawData', 'I,PointerToRawData', 'I,PointerToRelocations', 'I,PointerToLinenumbers', 'H,NumberOfRelocations', 'H,NumberOfLinenumbers', 'I,Characteristics')) __IMAGE_DELAY_IMPORT_DESCRIPTOR_format__ = ('IMAGE_DELAY_IMPORT_DESCRIPTOR', ('I,grAttrs', 'I,szName', 'I,phmod', 'I,pIAT', 'I,pINT', 'I,pBoundIAT', 'I,pUnloadIAT', 'I,dwTimeStamp')) __IMAGE_IMPORT_DESCRIPTOR_format__ = ('IMAGE_IMPORT_DESCRIPTOR', ('I,OriginalFirstThunk,Characteristics', 'I,TimeDateStamp', 'I,ForwarderChain', 'I,Name', 'I,FirstThunk')) __IMAGE_EXPORT_DIRECTORY_format__ = ('IMAGE_EXPORT_DIRECTORY', ('I,Characteristics', 'I,TimeDateStamp', 'H,MajorVersion', 'H,MinorVersion', 'I,Name', 'I,Base', 'I,NumberOfFunctions', 'I,NumberOfNames', 'I,AddressOfFunctions', 'I,AddressOfNames', 'I,AddressOfNameOrdinals')) __IMAGE_RESOURCE_DIRECTORY_format__ = ('IMAGE_RESOURCE_DIRECTORY', ('I,Characteristics', 'I,TimeDateStamp', 'H,MajorVersion', 'H,MinorVersion', 'H,NumberOfNamedEntries', 'H,NumberOfIdEntries')) __IMAGE_RESOURCE_DIRECTORY_ENTRY_format__ = ('IMAGE_RESOURCE_DIRECTORY_ENTRY', ('I,Name', 'I,OffsetToData')) __IMAGE_RESOURCE_DATA_ENTRY_format__ = ('IMAGE_RESOURCE_DATA_ENTRY', ('I,OffsetToData', 'I,Size', 'I,CodePage', 'I,Reserved')) __VS_VERSIONINFO_format__ = ( 'VS_VERSIONINFO', ('H,Length', 'H,ValueLength', 'H,Type' )) __VS_FIXEDFILEINFO_format__ = ( 'VS_FIXEDFILEINFO', ('I,Signature', 'I,StrucVersion', 'I,FileVersionMS', 'I,FileVersionLS', 
'I,ProductVersionMS', 'I,ProductVersionLS', 'I,FileFlagsMask', 'I,FileFlags', 'I,FileOS', 'I,FileType', 'I,FileSubtype', 'I,FileDateMS', 'I,FileDateLS')) __StringFileInfo_format__ = ( 'StringFileInfo', ('H,Length', 'H,ValueLength', 'H,Type' )) __StringTable_format__ = ( 'StringTable', ('H,Length', 'H,ValueLength', 'H,Type' )) __String_format__ = ( 'String', ('H,Length', 'H,ValueLength', 'H,Type' )) __Var_format__ = ( 'Var', ('H,Length', 'H,ValueLength', 'H,Type' )) __IMAGE_THUNK_DATA_format__ = ('IMAGE_THUNK_DATA', ('I,ForwarderString,Function,Ordinal,AddressOfData',)) __IMAGE_THUNK_DATA64_format__ = ('IMAGE_THUNK_DATA', ('Q,ForwarderString,Function,Ordinal,AddressOfData',)) __IMAGE_DEBUG_DIRECTORY_format__ = ('IMAGE_DEBUG_DIRECTORY', ('I,Characteristics', 'I,TimeDateStamp', 'H,MajorVersion', 'H,MinorVersion', 'I,Type', 'I,SizeOfData', 'I,AddressOfRawData', 'I,PointerToRawData')) __IMAGE_BASE_RELOCATION_format__ = ('IMAGE_BASE_RELOCATION', ('I,VirtualAddress', 'I,SizeOfBlock') ) __IMAGE_BASE_RELOCATION_ENTRY_format__ = ('IMAGE_BASE_RELOCATION_ENTRY', ('H,Data',) ) __IMAGE_TLS_DIRECTORY_format__ = ('IMAGE_TLS_DIRECTORY', ('I,StartAddressOfRawData', 'I,EndAddressOfRawData', 'I,AddressOfIndex', 'I,AddressOfCallBacks', 'I,SizeOfZeroFill', 'I,Characteristics' ) ) __IMAGE_TLS_DIRECTORY64_format__ = ('IMAGE_TLS_DIRECTORY', ('Q,StartAddressOfRawData', 'Q,EndAddressOfRawData', 'Q,AddressOfIndex', 'Q,AddressOfCallBacks', 'I,SizeOfZeroFill', 'I,Characteristics' ) ) __IMAGE_LOAD_CONFIG_DIRECTORY_format__ = ('IMAGE_LOAD_CONFIG_DIRECTORY', ('I,Size', 'I,TimeDateStamp', 'H,MajorVersion', 'H,MinorVersion', 'I,GlobalFlagsClear', 'I,GlobalFlagsSet', 'I,CriticalSectionDefaultTimeout', 'I,DeCommitFreeBlockThreshold', 'I,DeCommitTotalFreeThreshold', 'I,LockPrefixTable', 'I,MaximumAllocationSize', 'I,VirtualMemoryThreshold', 'I,ProcessHeapFlags', 'I,ProcessAffinityMask', 'H,CSDVersion', 'H,Reserved1', 'I,EditList', 'I,SecurityCookie', 'I,SEHandlerTable', 'I,SEHandlerCount', 
'I,GuardCFCheckFunctionPointer', 'I,Reserved2', 'I,GuardCFFunctionTable', 'I,GuardCFFunctionCount', 'I,GuardFlags' ) ) __IMAGE_LOAD_CONFIG_DIRECTORY64_format__ = ('IMAGE_LOAD_CONFIG_DIRECTORY', ('I,Size', 'I,TimeDateStamp', 'H,MajorVersion', 'H,MinorVersion', 'I,GlobalFlagsClear', 'I,GlobalFlagsSet', 'I,CriticalSectionDefaultTimeout', 'Q,DeCommitFreeBlockThreshold', 'Q,DeCommitTotalFreeThreshold', 'Q,LockPrefixTable', 'Q,MaximumAllocationSize', 'Q,VirtualMemoryThreshold', 'Q,ProcessAffinityMask', 'I,ProcessHeapFlags', 'H,CSDVersion', 'H,Reserved1', 'Q,EditList', 'Q,SecurityCookie', 'Q,SEHandlerTable', 'Q,SEHandlerCount', 'Q,GuardCFCheckFunctionPointer', 'Q,Reserved2', 'Q,GuardCFFunctionTable', 'Q,GuardCFFunctionCount', 'I,GuardFlags' ) ) __IMAGE_BOUND_IMPORT_DESCRIPTOR_format__ = ('IMAGE_BOUND_IMPORT_DESCRIPTOR', ('I,TimeDateStamp', 'H,OffsetModuleName', 'H,NumberOfModuleForwarderRefs')) __IMAGE_BOUND_FORWARDER_REF_format__ = ('IMAGE_BOUND_FORWARDER_REF', ('I,TimeDateStamp', 'H,OffsetModuleName', 'H,Reserved') ) def __init__(self, name=None, data=None, fast_load=None): self.sections = [] self.__warnings = [] self.PE_TYPE = None if not name and not data: return # This list will keep track of all the structures created. # That will allow for an easy iteration through the list # in order to save the modifications made self.__structures__ = [] self.__from_file = None if not fast_load: fast_load = globals()['fast_load'] try: self.__parse__(name, data, fast_load) except: self.close() raise def close(self): if ( self.__from_file is True and hasattr(self, '__data__') and ((isinstance(mmap.mmap, type) and isinstance(self.__data__, mmap.mmap)) or 'mmap.mmap' in repr(type(self.__data__))) ): self.__data__.close() def __unpack_data__(self, format, data, file_offset): """Apply structure format to raw data. Returns an unpacked structure object if successful, None otherwise.
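As a rough illustration of what applying a format means (a sketch only, not the Structure class itself): each entry such as 'I,VirtualAddress' pairs a struct format code with one or more attribute names. The helper unpack_fields below is hypothetical and not part of this module, and little-endian packing without padding is an assumption.

```python
import struct


def unpack_fields(field_specs, raw, byte_order='<'):
    # Hypothetical helper: each spec is 'code,name[,alias...]', where the
    # first part is a struct format code and the rest are attribute names.
    codes = [spec.split(',')[0] for spec in field_specs]
    names = [spec.split(',')[1:] for spec in field_specs]
    fmt = byte_order + ''.join(codes)
    size = struct.calcsize(fmt)
    if len(raw) < size:
        # Not enough data to fill the structure.
        return None
    values = struct.unpack(fmt, raw[:size])
    fields = {}
    for aliases, value in zip(names, values):
        # Every alias of a field (a union in the C header) gets the same value.
        for alias in aliases:
            fields[alias] = value
    return fields


data_directory = ('I,VirtualAddress', 'I,Size')
parsed = unpack_fields(data_directory, struct.pack('<II', 0x2000, 0x50))
```

Returning None on short input mirrors the behaviour described above, where a failed unpack yields None instead of a structure.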
""" structure = Structure(format, file_offset=file_offset) try: structure.__unpack__(data) except PEFormatError, err: self.__warnings.append( 'Corrupt header "%s" at file offset %d. Exception: %s' % ( format[0], file_offset, str(err)) ) return None self.__structures__.append(structure) return structure def __parse__(self, fname, data, fast_load): """Parse a Portable Executable file. Loads a PE file, parsing all its structures and making them available through the instance's attributes. """ if fname: stat = os.stat(fname) if stat.st_size == 0: raise PEFormatError('The file is empty') try: fd = file(fname, 'rb') self.fileno = fd.fileno() if hasattr(mmap, 'MAP_PRIVATE'): # Unix self.__data__ = mmap.mmap(self.fileno, 0, mmap.MAP_PRIVATE) else: # Windows self.__data__ = mmap.mmap(self.fileno, 0, access=mmap.ACCESS_READ) self.__from_file = True finally: fd.close() elif data: self.__data__ = data self.__from_file = False dos_header_data = self.__data__[:64] if len(dos_header_data) != 64: raise PEFormatError('Unable to read the DOS Header, possibly a truncated file.') self.DOS_HEADER = self.__unpack_data__( self.__IMAGE_DOS_HEADER_format__, dos_header_data, file_offset=0) if self.DOS_HEADER.e_magic == IMAGE_DOSZM_SIGNATURE: raise PEFormatError('Probably a ZM Executable (not a PE file).') if not self.DOS_HEADER or self.DOS_HEADER.e_magic != IMAGE_DOS_SIGNATURE: raise PEFormatError('DOS Header magic not found.') # OC Patch: # Check for sane value in e_lfanew # if self.DOS_HEADER.e_lfanew > len(self.__data__): raise PEFormatError('Invalid e_lfanew value, probably not a PE file') nt_headers_offset = self.DOS_HEADER.e_lfanew self.NT_HEADERS = self.__unpack_data__( self.__IMAGE_NT_HEADERS_format__, self.__data__[nt_headers_offset:nt_headers_offset+8], file_offset = nt_headers_offset) # We better check the signature right here, before the file screws # around with sections: # OC Patch: # Some malware will cause the Signature value to not exist at all if not self.NT_HEADERS or not 
self.NT_HEADERS.Signature: raise PEFormatError('NT Headers not found.') if (0xFFFF & self.NT_HEADERS.Signature) == IMAGE_NE_SIGNATURE: raise PEFormatError('Invalid NT Headers signature. Probably a NE file') if (0xFFFF & self.NT_HEADERS.Signature) == IMAGE_LE_SIGNATURE: raise PEFormatError('Invalid NT Headers signature. Probably a LE file') if (0xFFFF & self.NT_HEADERS.Signature) == IMAGE_LX_SIGNATURE: raise PEFormatError('Invalid NT Headers signature. Probably a LX file') if (0xFFFF & self.NT_HEADERS.Signature) == IMAGE_TE_SIGNATURE: raise PEFormatError('Invalid NT Headers signature. Probably a TE file') if self.NT_HEADERS.Signature != IMAGE_NT_SIGNATURE: raise PEFormatError('Invalid NT Headers signature.') self.FILE_HEADER = self.__unpack_data__( self.__IMAGE_FILE_HEADER_format__, self.__data__[nt_headers_offset+4:nt_headers_offset+4+32], file_offset = nt_headers_offset+4) image_flags = retrieve_flags(IMAGE_CHARACTERISTICS, 'IMAGE_FILE_') if not self.FILE_HEADER: raise PEFormatError('File Header missing') # Set the image's flags according to the Characteristics member set_flags(self.FILE_HEADER, self.FILE_HEADER.Characteristics, image_flags) optional_header_offset = \ nt_headers_offset+4+self.FILE_HEADER.sizeof() # Note: location of sections can be controlled from PE header: sections_offset = optional_header_offset + self.FILE_HEADER.SizeOfOptionalHeader self.OPTIONAL_HEADER = self.__unpack_data__( self.__IMAGE_OPTIONAL_HEADER_format__, # Read at most 256 bytes to avoid creating a copy of too much data self.__data__[optional_header_offset:optional_header_offset+256], file_offset = optional_header_offset) # According to solardesigner's findings for his # Tiny PE project, the optional header does not # need fields beyond "Subsystem" in order to be # loadable by the Windows loader (given that zeros # are acceptable values and the header is loaded # in a zeroed memory page) # If trying to parse a full Optional Header fails # we try to parse it again with some 0 padding
# MINIMUM_VALID_OPTIONAL_HEADER_RAW_SIZE = 69 if ( self.OPTIONAL_HEADER is None and len(self.__data__[optional_header_offset:optional_header_offset+0x200]) >= MINIMUM_VALID_OPTIONAL_HEADER_RAW_SIZE ): # Add enough zeros to make up for the unused fields # padding_length = 128 # Create padding # padded_data = self.__data__[optional_header_offset:optional_header_offset+0x200] + ( '\0' * padding_length) self.OPTIONAL_HEADER = self.__unpack_data__( self.__IMAGE_OPTIONAL_HEADER_format__, padded_data, file_offset = optional_header_offset) # Check the Magic in the OPTIONAL_HEADER and set the PE file # type accordingly # if self.OPTIONAL_HEADER is not None: if self.OPTIONAL_HEADER.Magic == OPTIONAL_HEADER_MAGIC_PE: self.PE_TYPE = OPTIONAL_HEADER_MAGIC_PE elif self.OPTIONAL_HEADER.Magic == OPTIONAL_HEADER_MAGIC_PE_PLUS: self.PE_TYPE = OPTIONAL_HEADER_MAGIC_PE_PLUS self.OPTIONAL_HEADER = self.__unpack_data__( self.__IMAGE_OPTIONAL_HEADER64_format__, self.__data__[optional_header_offset:optional_header_offset+0x200], file_offset = optional_header_offset) # Again, as explained above, we try to parse # a reduced form of the Optional Header which # is still valid despite not including all # structure members # MINIMUM_VALID_OPTIONAL_HEADER_RAW_SIZE = 69+4 if ( self.OPTIONAL_HEADER is None and len(self.__data__[optional_header_offset:optional_header_offset+0x200]) >= MINIMUM_VALID_OPTIONAL_HEADER_RAW_SIZE ): padding_length = 128 padded_data = self.__data__[optional_header_offset:optional_header_offset+0x200] + ( '\0' * padding_length) self.OPTIONAL_HEADER = self.__unpack_data__( self.__IMAGE_OPTIONAL_HEADER64_format__, padded_data, file_offset = optional_header_offset) if not self.FILE_HEADER: raise PEFormatError('File Header missing') # OC Patch: # Die gracefully if there is no OPTIONAL_HEADER field # 975440f5ad5e2e4a92c4d9a5f22f75c1 if self.PE_TYPE is None or self.OPTIONAL_HEADER is None: raise PEFormatError("No Optional Header found, invalid PE32 or PE32+ file") 
dll_characteristics_flags = retrieve_flags(DLL_CHARACTERISTICS, 'IMAGE_DLLCHARACTERISTICS_') # Set the Dll Characteristics flags according to the DllCharacteristics member set_flags( self.OPTIONAL_HEADER, self.OPTIONAL_HEADER.DllCharacteristics, dll_characteristics_flags) self.OPTIONAL_HEADER.DATA_DIRECTORY = [] #offset = (optional_header_offset + self.FILE_HEADER.SizeOfOptionalHeader) offset = (optional_header_offset + self.OPTIONAL_HEADER.sizeof()) self.NT_HEADERS.FILE_HEADER = self.FILE_HEADER self.NT_HEADERS.OPTIONAL_HEADER = self.OPTIONAL_HEADER # Windows 8 specific check # if self.OPTIONAL_HEADER.AddressOfEntryPoint < self.OPTIONAL_HEADER.SizeOfHeaders: self.__warnings.append( 'AddressOfEntryPoint is smaller than SizeOfHeaders: this file cannot run under Windows 8' ) # The NumberOfRvaAndSizes is sanitized to stay within # reasonable limits so that it can be cast to an int # if self.OPTIONAL_HEADER.NumberOfRvaAndSizes > 0x10: self.__warnings.append( 'Suspicious NumberOfRvaAndSizes in the Optional Header. 
' + 'Normal values are never larger than 0x10, the value is: 0x%x' % self.OPTIONAL_HEADER.NumberOfRvaAndSizes ) MAX_ASSUMED_VALID_NUMBER_OF_RVA_AND_SIZES = 0x100 for i in xrange(int(0x7fffffffL & self.OPTIONAL_HEADER.NumberOfRvaAndSizes)): if len(self.__data__) - offset == 0: break if len(self.__data__) - offset < 8: data = self.__data__[offset:] + '\0'*8 else: data = self.__data__[offset:offset+MAX_ASSUMED_VALID_NUMBER_OF_RVA_AND_SIZES] dir_entry = self.__unpack_data__( self.__IMAGE_DATA_DIRECTORY_format__, data, file_offset = offset) if dir_entry is None: break # Would fail if missing an entry # 1d4937b2fa4d84ad1bce0309857e70ca offending sample try: dir_entry.name = DIRECTORY_ENTRY[i] except (KeyError, AttributeError): break offset += dir_entry.sizeof() self.OPTIONAL_HEADER.DATA_DIRECTORY.append(dir_entry) # If the offset goes outside the optional header, # the loop is broken, regardless of how many directories # NumberOfRvaAndSizes says there are # # We assume a normally sized optional header, hence that we do # a sizeof() instead of reading SizeOfOptionalHeader. # Then we add a default number of directories times their size, # if we go beyond that, we assume the number of directories # is wrong and stop processing if offset >= (optional_header_offset + self.OPTIONAL_HEADER.sizeof() + 8*16) : break offset = self.parse_sections(sections_offset) # OC Patch: # There could be a problem if there are no raw data sections # greater than 0 # fc91013eb72529da005110a3403541b6 example # Should this throw an exception in the minimum header offset # can't be found? 
# rawDataPointers = [ self.adjust_FileAlignment( s.PointerToRawData, self.OPTIONAL_HEADER.FileAlignment ) for s in self.sections if s.PointerToRawData>0 ] if len(rawDataPointers) > 0: lowest_section_offset = min(rawDataPointers) else: lowest_section_offset = None if not lowest_section_offset or lowest_section_offset < offset: self.header = self.__data__[:offset] else: self.header = self.__data__[:lowest_section_offset] # Check whether the entry point lies within a section # if self.get_section_by_rva(self.OPTIONAL_HEADER.AddressOfEntryPoint) is not None: # Check whether the entry point lies within the file # ep_offset = self.get_offset_from_rva(self.OPTIONAL_HEADER.AddressOfEntryPoint) if ep_offset > len(self.__data__): self.__warnings.append( 'Possibly corrupt file. AddressOfEntryPoint lies outside the file. ' + 'AddressOfEntryPoint: 0x%x' % self.OPTIONAL_HEADER.AddressOfEntryPoint ) else: self.__warnings.append( 'AddressOfEntryPoint lies outside the sections\' boundaries. ' + 'AddressOfEntryPoint: 0x%x' % self.OPTIONAL_HEADER.AddressOfEntryPoint ) if not fast_load: self.parse_data_directories() class RichHeader: pass rich_header = self.parse_rich_header() if rich_header: self.RICH_HEADER = RichHeader() self.RICH_HEADER.checksum = rich_header.get('checksum', None) self.RICH_HEADER.values = rich_header.get('values', None) else: self.RICH_HEADER = None def parse_rich_header(self): """Parses the rich header see http://www.ntcore.com/files/richsign.htm for more information Structure: 00 DanS ^ checksum, checksum, checksum, checksum 10 Symbol RVA ^ checksum, Symbol size ^ checksum... ... XX Rich, checksum, 0, 0,... 
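The layout above can be exercised on synthetic data. The following is a minimal sketch, assuming the block starts exactly at the DanS dword; decode_rich_header and build_sample are hypothetical helpers written for this illustration and are not part of this module.

```python
import struct

DANS = 0x536E6144  # 'DanS' read as a little-endian dword
RICH = 0x68636952  # 'Rich' read as a little-endian dword


def decode_rich_header(raw):
    # Interpret the block as a sequence of little-endian dwords.
    dwords = struct.unpack('<%dI' % (len(raw) // 4), raw[:len(raw) // 4 * 4])
    if len(dwords) < 4:
        return None
    checksum = dwords[1]
    # 'DanS' is stored XORed with the checksum, followed by three more
    # copies of the checksum.
    if (dwords[0] ^ checksum != DANS or
            dwords[2] != checksum or dwords[3] != checksum):
        return None
    values = []
    body = dwords[4:]
    for i in range(len(body) // 2):
        first, second = body[2 * i], body[2 * i + 1]
        if first == RICH:
            # The footer is stored in the clear and followed by the checksum.
            if second != checksum:
                return None
            break
        # Header values come in pairs, each XORed with the checksum.
        values.extend([first ^ checksum, second ^ checksum])
    return {'checksum': checksum, 'values': values}


def build_sample(checksum, pairs):
    # Hypothetical helper: builds a synthetic block for demonstration only.
    dwords = [DANS ^ checksum, checksum, checksum, checksum]
    for a, b in pairs:
        dwords += [a ^ checksum, b ^ checksum]
    dwords += [RICH, checksum]
    return struct.pack('<%dI' % len(dwords), *dwords)


sample = build_sample(0x11223344, [(0x00010002, 3), (0x00040005, 6)])
decoded = decode_rich_header(sample)
```

Unlike the method documented here, this sketch returns None when the footer checksum does not match instead of recording a warning; that is a simplification for the example.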
""" # Rich Header constants # DANS = 0x536E6144 # 'DanS' as dword RICH = 0x68636952 # 'Rich' as dword # Read a block of data try: rich_data = self.get_data(0x80, 0x80) if len(rich_data) != 0x80: return None data = list(struct.unpack("<32I", rich_data)) except PEFormatError: return None # the checksum should be present 3 times after the DanS signature # checksum = data[1] if (data[0] ^ checksum != DANS or data[2] != checksum or data[3] != checksum): return None result = {"checksum": checksum} headervalues = [] result ["values"] = headervalues data = data[4:] for i in xrange(len(data) / 2): # Stop until the Rich footer signature is found # if data[2 * i] == RICH: # it should be followed by the checksum # if data[2 * i + 1] != checksum: self.__warnings.append('Rich Header corrupted') break # header values come by pairs # headervalues += [data[2 * i] ^ checksum, data[2 * i + 1] ^ checksum] return result def get_warnings(self): """Return the list of warnings. Non-critical problems found when parsing the PE file are appended to a list of warnings. This method returns the full list. """ return self.__warnings def show_warnings(self): """Print the list of warnings. Non-critical problems found when parsing the PE file are appended to a list of warnings. This method prints the full list to standard output. """ for warning in self.__warnings: print '>', warning def full_load(self): """Process the data directories. This method will load the data directories which might not have been loaded if the "fast_load" option was used. """ self.parse_data_directories() def write(self, filename=None): """Write the PE file. This function will process all headers and components of the PE file and include all changes made (by just assigning to attributes in the PE objects) and write the changes back to a file whose name is provided as an argument. The filename is optional, if not provided the data will be returned as a 'str' object. 
""" if is_bytearray_available(): # Making a list of a byte file is incredibly inefficient and will # cause pefile to take far more RAM than it should. Use bytearrays # instead. file_data = bytearray(self.__data__) else: file_data = list(self.__data__) for structure in self.__structures__: if is_bytearray_available(): struct_data = bytearray(structure.__pack__()) else: struct_data = list(structure.__pack__()) offset = structure.get_file_offset() file_data[offset:offset+len(struct_data)] = struct_data if hasattr(self, 'VS_VERSIONINFO'): if hasattr(self, 'FileInfo'): for entry in self.FileInfo: if hasattr(entry, 'StringTable'): for st_entry in entry.StringTable: for key, entry in st_entry.entries.items(): offsets = st_entry.entries_offsets[key] lengths = st_entry.entries_lengths[key] if is_bytearray_available(): if len( entry ) > lengths[1]: l = bytearray() for idx, c in enumerate(entry): if ord(c) > 256: l.extend( [ ord(c) & 0xff, chr( (ord(c) & 0xff00) >> 8) ] ) else: l.extend( [ ord(c), '\0' ] ) file_data[offsets[1]:offsets[1]+lengths[1]*2 ] = l else: l = bytearray() for idx, c in enumerate(entry): if ord(c) > 256: l.extend( [ chr(ord(c) & 0xff), chr( (ord(c) & 0xff00) >>8) ] ) else: l.extend( [ ord(c), '\0'] ) file_data[offsets[1]:offsets[1]+len(entry)*2 ] = l remainder = lengths[1] - len(entry) if remainder: start = offsets[1] + len(entry)*2 end = offsets[1] + lengths[1]*2 file_data[start:end] = ['\0'] * remainder*2 else: if len( entry ) > lengths[1]: l = list() for idx, c in enumerate(entry): if ord(c) > 256: l.extend( [ chr(ord(c) & 0xff), chr( (ord(c) & 0xff00) >>8) ] ) else: l.extend( [chr( ord(c) ), '\0'] ) file_data[ offsets[1] : offsets[1] + lengths[1]*2 ] = l else: l = list() for idx, c in enumerate(entry): if ord(c) > 256: l.extend( [chr(ord(c) & 0xff), chr( (ord(c) & 0xff00) >>8) ] ) else: l.extend( [chr(ord(c)), '\0'] ) file_data[offsets[1]:offsets[1]+len(entry)*2] = l remainder = lengths[1] - len(entry) start = offsets[1] + len(entry)*2 end = 
offsets[1] + lengths[1]*2 file_data[start:end] = [u'\0'] * remainder*2 if is_bytearray_available(): new_file_data = ''.join( chr(c) for c in file_data ) else: new_file_data = ''.join( [ chr(ord(c)) for c in file_data] ) if filename: f = file(filename, 'wb+') f.write(new_file_data) f.close() else: return new_file_data def parse_sections(self, offset): """Fetch the PE file sections. The sections will be readily available in the "sections" attribute. Its attributes will contain all the section information plus "data" a buffer containing the section's data. The "Characteristics" member will be processed and attributes representing the section characteristics (with the 'IMAGE_SCN_' string trimmed from the constant's names) will be added to the section instance. Refer to the SectionStructure class for additional info. """ self.sections = [] MAX_SIMULTANEOUS_ERRORS = 3 for i in xrange(self.FILE_HEADER.NumberOfSections): simultaneous_errors = 0 section = SectionStructure( self.__IMAGE_SECTION_HEADER_format__, pe=self ) if not section: break section_offset = offset + section.sizeof() * i section.set_file_offset(section_offset) section_data = self.__data__[section_offset : section_offset + section.sizeof()] # Check if the section is all nulls and stop if so. if section_data.count('\0') == section.sizeof(): self.__warnings.append( ('Invalid section %d. ' % i) + 'Contents are null-bytes.') break section.__unpack__(section_data) self.__structures__.append(section) if section.SizeOfRawData+section.PointerToRawData > len(self.__data__): simultaneous_errors += 1 self.__warnings.append( ('Error parsing section %d. ' % i) + 'SizeOfRawData is larger than file.') if self.adjust_FileAlignment( section.PointerToRawData, self.OPTIONAL_HEADER.FileAlignment ) > len(self.__data__): simultaneous_errors += 1 self.__warnings.append( ('Error parsing section %d. 
' % i) + 'PointerToRawData points beyond the end of the file.') if section.Misc_VirtualSize > 0x10000000: simultaneous_errors += 1 self.__warnings.append( ('Suspicious value found parsing section %d. ' % i) + 'VirtualSize is extremely large > 256MiB.') if self.adjust_SectionAlignment( section.VirtualAddress, self.OPTIONAL_HEADER.SectionAlignment, self.OPTIONAL_HEADER.FileAlignment ) > 0x10000000: simultaneous_errors += 1 self.__warnings.append( ('Suspicious value found parsing section %d. ' % i) + 'VirtualAddress is beyond 0x10000000.') if ( self.OPTIONAL_HEADER.FileAlignment != 0 and ( section.PointerToRawData % self.OPTIONAL_HEADER.FileAlignment) != 0): simultaneous_errors += 1 self.__warnings.append( ('Error parsing section %d. ' % i) + 'PointerToRawData should normally be ' + 'a multiple of FileAlignment, this might imply the file ' + 'is trying to confuse tools which parse this incorrectly.') if simultaneous_errors >= MAX_SIMULTANEOUS_ERRORS: self.__warnings.append('Too many warnings parsing section. Aborting.') break section_flags = retrieve_flags(SECTION_CHARACTERISTICS, 'IMAGE_SCN_') # Set the section's flags according to the Characteristics member set_flags(section, section.Characteristics, section_flags) if ( section.__dict__.get('IMAGE_SCN_MEM_WRITE', False) and section.__dict__.get('IMAGE_SCN_MEM_EXECUTE', False) ): if section.Name == 'PAGE' and self.is_driver(): # Drivers can have a PAGE section with those flags set without # implying that it is malicious pass else: self.__warnings.append( ('Suspicious flags set for section %d. ' % i) + 'Both IMAGE_SCN_MEM_WRITE and IMAGE_SCN_MEM_EXECUTE are set. ' + 'This might indicate a packed executable.') self.sections.append(section) # Sort the sections by their VirtualAddress and add a field to each of them # with the VirtualAddress of the next section. This allows checking # for potentially overlapping sections in badly constructed PEs.
self.sections.sort(cmp=lambda a,b: cmp(a.VirtualAddress, b.VirtualAddress)) for idx, section in enumerate(self.sections): if idx == len(self.sections)-1: section.next_section_virtual_address = None else: section.next_section_virtual_address = self.sections[idx+1].VirtualAddress if self.FILE_HEADER.NumberOfSections > 0 and self.sections: return offset + self.sections[0].sizeof()*self.FILE_HEADER.NumberOfSections else: return offset def parse_data_directories(self, directories=None): """Parse and process the PE file's data directories. If the optional argument 'directories' is given, only the directories at the specified indexes will be parsed. Such functionality allows parsing of areas of interest without the burden of having to parse all others. The directories can then be specified as: For export / import only: directories = [ 0, 1 ] or (more verbosely): directories = [ DIRECTORY_ENTRY['IMAGE_DIRECTORY_ENTRY_IMPORT'], DIRECTORY_ENTRY['IMAGE_DIRECTORY_ENTRY_EXPORT'] ] If 'directories' is a list, the ones that are processed will be removed, leaving only the ones that are not present in the image. 
""" directory_parsing = ( ('IMAGE_DIRECTORY_ENTRY_IMPORT', self.parse_import_directory), ('IMAGE_DIRECTORY_ENTRY_EXPORT', self.parse_export_directory), ('IMAGE_DIRECTORY_ENTRY_RESOURCE', self.parse_resources_directory), ('IMAGE_DIRECTORY_ENTRY_DEBUG', self.parse_debug_directory), ('IMAGE_DIRECTORY_ENTRY_BASERELOC', self.parse_relocations_directory), ('IMAGE_DIRECTORY_ENTRY_TLS', self.parse_directory_tls), ('IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG', self.parse_directory_load_config), ('IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT', self.parse_delay_import_directory), ('IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT', self.parse_directory_bound_imports) ) if directories is not None: if not isinstance(directories, (tuple, list)): directories = [directories] for entry in directory_parsing: # OC Patch: # try: directory_index = DIRECTORY_ENTRY[entry[0]] dir_entry = self.OPTIONAL_HEADER.DATA_DIRECTORY[directory_index] except IndexError: break # Only process all the directories if no individual ones have # been chosen # if directories is None or directory_index in directories: if dir_entry.VirtualAddress: value = entry[1](dir_entry.VirtualAddress, dir_entry.Size) if value: setattr(self, entry[0][6:], value) if (directories is not None) and isinstance(directories, list) and (entry[0] in directories): directories.remove(directory_index) def parse_directory_bound_imports(self, rva, size): """""" bnd_descr = Structure(self.__IMAGE_BOUND_IMPORT_DESCRIPTOR_format__) bnd_descr_size = bnd_descr.sizeof() start = rva bound_imports = [] while True: bnd_descr = self.__unpack_data__( self.__IMAGE_BOUND_IMPORT_DESCRIPTOR_format__, self.__data__[rva:rva+bnd_descr_size], file_offset = rva) if bnd_descr is None: # If can't parse directory then silently return. 
# This directory does not necessarily have to be valid to # still have a valid PE file self.__warnings.append( 'The Bound Imports directory exists but can\'t be parsed.') return if bnd_descr.all_zeroes(): break rva += bnd_descr.sizeof() section = self.get_section_by_offset(rva) file_offset = self.get_offset_from_rva(rva) if section is None: safety_boundary = len(self.__data__) - file_offset sections_after_offset = [section.PointerToRawData for section in self.sections if section.PointerToRawData > file_offset] if sections_after_offset: # Find the first section starting at a later offset than that specified by 'rva' first_section_after_offset = min(sections_after_offset) section = self.get_section_by_offset(first_section_after_offset) if section is not None: safety_boundary = section.PointerToRawData - file_offset else: safety_boundary = section.PointerToRawData + len(section.get_data()) - file_offset if not section: self.__warnings.append( 'RVA of IMAGE_BOUND_IMPORT_DESCRIPTOR points to an invalid address: %x' % rva) return forwarder_refs = [] # 8 is the size of __IMAGE_BOUND_IMPORT_DESCRIPTOR_format__ for idx in xrange( min( bnd_descr.NumberOfModuleForwarderRefs, safety_boundary/8) ): # Both structures IMAGE_BOUND_IMPORT_DESCRIPTOR and # IMAGE_BOUND_FORWARDER_REF have the same size. bnd_frwd_ref = self.__unpack_data__( self.__IMAGE_BOUND_FORWARDER_REF_format__, self.__data__[rva:rva+bnd_descr_size], file_offset = rva) # OC Patch: if not bnd_frwd_ref: raise PEFormatError( "IMAGE_BOUND_FORWARDER_REF cannot be read") rva += bnd_frwd_ref.sizeof() offset = start+bnd_frwd_ref.OffsetModuleName name_str = self.get_string_from_data( 0, self.__data__[offset : offset + MAX_STRING_LENGTH]) # OffsetModuleName points to a DLL name. These shouldn't be too long. # Anything longer than a safety length of 256 will be taken to indicate # a corrupt entry and abort the processing of these entries. # Names shorter than 4 characters will be taken as invalid as well.
if name_str: invalid_chars = [c for c in name_str if c not in string.printable] if len(name_str) > 256 or len(name_str) < 4 or invalid_chars: break forwarder_refs.append(BoundImportRefData( struct = bnd_frwd_ref, name = name_str)) offset = start+bnd_descr.OffsetModuleName name_str = self.get_string_from_data( 0, self.__data__[offset : offset + MAX_STRING_LENGTH]) if name_str: invalid_chars = [c for c in name_str if c not in string.printable] if len(name_str) > 256 or len(name_str) < 4 or invalid_chars: break if not name_str: break bound_imports.append( BoundImportDescData( struct = bnd_descr, name = name_str, entries = forwarder_refs)) return bound_imports def parse_directory_tls(self, rva, size): """""" if self.PE_TYPE == OPTIONAL_HEADER_MAGIC_PE: format = self.__IMAGE_TLS_DIRECTORY_format__ elif self.PE_TYPE == OPTIONAL_HEADER_MAGIC_PE_PLUS: format = self.__IMAGE_TLS_DIRECTORY64_format__ try: tls_struct = self.__unpack_data__( format, self.get_data( rva, Structure(format).sizeof() ), file_offset = self.get_offset_from_rva(rva)) except PEFormatError: self.__warnings.append( 'Invalid TLS information. Can\'t read ' + 'data at RVA: 0x%x' % rva) tls_struct = None if not tls_struct: return None return TlsData( struct = tls_struct ) def parse_directory_load_config(self, rva, size): """""" if self.PE_TYPE == OPTIONAL_HEADER_MAGIC_PE: format = self.__IMAGE_LOAD_CONFIG_DIRECTORY_format__ elif self.PE_TYPE == OPTIONAL_HEADER_MAGIC_PE_PLUS: format = self.__IMAGE_LOAD_CONFIG_DIRECTORY64_format__ try: load_config_struct = self.__unpack_data__( format, self.get_data( rva, Structure(format).sizeof() ), file_offset = self.get_offset_from_rva(rva)) except PEFormatError: self.__warnings.append( 'Invalid LOAD_CONFIG information. 
Can\'t read ' + 'data at RVA: 0x%x' % rva) load_config_struct = None if not load_config_struct: return None return LoadConfigData( struct = load_config_struct ) def parse_relocations_directory(self, rva, size): """""" rlc_size = Structure(self.__IMAGE_BASE_RELOCATION_format__).sizeof() end = rva+size relocations = [] while rva < end: # OC Patch: # Malware that has bad RVA entries will cause an error. # Just continue on after an exception # try: rlc = self.__unpack_data__( self.__IMAGE_BASE_RELOCATION_format__, self.get_data(rva, rlc_size), file_offset = self.get_offset_from_rva(rva) ) except PEFormatError: self.__warnings.append( 'Invalid relocation information. Can\'t read ' + 'data at RVA: 0x%x' % rva) rlc = None if not rlc: break # rlc.VirtualAddress must lie within the Image if rlc.VirtualAddress > self.OPTIONAL_HEADER.SizeOfImage: self.__warnings.append( 'Invalid relocation information. VirtualAddress outside' + ' of Image: 0x%x' % rlc.VirtualAddress) break # rlc.SizeOfBlock must be less than or equal to the size of the image # (It's a rather loose sanity test) if rlc.SizeOfBlock > self.OPTIONAL_HEADER.SizeOfImage: self.__warnings.append( 'Invalid relocation information. 
SizeOfBlock too large' + ': %d' % rlc.SizeOfBlock) break reloc_entries = self.parse_relocations( rva+rlc_size, rlc.VirtualAddress, rlc.SizeOfBlock-rlc_size ) relocations.append( BaseRelocationData( struct = rlc, entries = reloc_entries)) if not rlc.SizeOfBlock: break rva += rlc.SizeOfBlock return relocations def parse_relocations(self, data_rva, rva, size): """""" data = self.get_data(data_rva, size) file_offset = self.get_offset_from_rva(data_rva) entries = [] offsets_and_type = [] for idx in xrange( len(data) / 2 ): entry = self.__unpack_data__( self.__IMAGE_BASE_RELOCATION_ENTRY_format__, data[idx*2:(idx+1)*2], file_offset = file_offset ) if not entry: break word = entry.Data reloc_type = (word>>12) reloc_offset = (word & 0x0fff) if (reloc_offset, reloc_type) in offsets_and_type: self.__warnings.append( 'Overlapping offsets in relocation data ' + 'at RVA: 0x%x' % (reloc_offset+rva)) break if len(offsets_and_type) >= 1000: offsets_and_type.pop() offsets_and_type.insert(0, (reloc_offset, reloc_type)) entries.append( RelocationData( struct = entry, type = reloc_type, base_rva = rva, rva = reloc_offset+rva)) file_offset += entry.sizeof() return entries def parse_debug_directory(self, rva, size): """""" dbg_size = Structure(self.__IMAGE_DEBUG_DIRECTORY_format__).sizeof() debug = [] for idx in xrange(size/dbg_size): try: data = self.get_data(rva+dbg_size*idx, dbg_size) except PEFormatError, e: self.__warnings.append( 'Invalid debug information. Can\'t read ' + 'data at RVA: 0x%x' % rva) return None dbg = self.__unpack_data__( self.__IMAGE_DEBUG_DIRECTORY_format__, data, file_offset = self.get_offset_from_rva(rva+dbg_size*idx)) if not dbg: return None debug.append( DebugData( struct = dbg)) return debug def parse_resources_directory(self, rva, size=0, base_rva = None, level = 0, dirs=None): """Parse the resources directory. Given the RVA of the resources directory, it will process all its entries.
The root will have the corresponding member of its structure, IMAGE_RESOURCE_DIRECTORY plus 'entries', a list of all the entries in the directory. Those entries will have, correspondingly, all the structure's members (IMAGE_RESOURCE_DIRECTORY_ENTRY) and an additional one, "directory", pointing to the IMAGE_RESOURCE_DIRECTORY structure representing upper layers of the tree. This one will also have an 'entries' attribute, pointing to the 3rd, and last, level. Another directory with more entries. Those last entries will have a new attribute (both 'leaf' or 'data_entry' can be used to access it). This structure finally points to the resource data. All the members of this structure, IMAGE_RESOURCE_DATA_ENTRY, are available as its attributes. """ # OC Patch: if dirs is None: dirs = [rva] if base_rva is None: base_rva = rva resources_section = self.get_section_by_rva(rva) try: # If the RVA is invalid all would blow up. Some EXEs seem to be # specially nasty and have an invalid RVA. data = self.get_data(rva, Structure(self.__IMAGE_RESOURCE_DIRECTORY_format__).sizeof() ) except PEFormatError, e: self.__warnings.append( 'Invalid resources directory. Can\'t read ' + 'directory data at RVA: 0x%x' % rva) return None # Get the resource directory structure, that is, the header # of the table preceding the actual entries # resource_dir = self.__unpack_data__( self.__IMAGE_RESOURCE_DIRECTORY_format__, data, file_offset = self.get_offset_from_rva(rva) ) if resource_dir is None: # If can't parse resources directory then silently return. # This directory does not necessarily have to be valid to # still have a valid PE file self.__warnings.append( 'Invalid resources directory. 
Can\'t parse ' + 'directory data at RVA: 0x%x' % rva) return None dir_entries = [] # Advance the RVA to the position immediately following the directory # table header and pointing to the first entry in the table # rva += resource_dir.sizeof() number_of_entries = ( resource_dir.NumberOfNamedEntries + resource_dir.NumberOfIdEntries ) # Set a hard limit on the maximum reasonable number of entries MAX_ALLOWED_ENTRIES = 4096 if number_of_entries > MAX_ALLOWED_ENTRIES: self.__warnings.append( 'Error parsing the resources directory. ' 'The directory contains %d entries (>%s)' % (number_of_entries, MAX_ALLOWED_ENTRIES) ) return None strings_to_postprocess = list() # Keep track of the last name's start and end offsets in order # to be able to detect overlapping entries that might suggest # an invalid or corrupt directory. last_name_begin_end = None for idx in xrange(number_of_entries): res = self.parse_resource_entry(rva) if res is None: self.__warnings.append( 'Error parsing the resources directory, ' 'Entry %d is invalid, RVA = 0x%x. ' % (idx, rva) ) break entry_name = None entry_id = None name_is_string = (res.Name & 0x80000000L) >> 31 if not name_is_string: entry_id = res.Name else: ustr_offset = base_rva+res.NameOffset try: entry_name = UnicodeStringWrapperPostProcessor(self, ustr_offset) # If the last entry's offset points before the current's but its end # is past the current's beginning, assume the overlap indicates a # corrupt name. if last_name_begin_end and (last_name_begin_end[0] < ustr_offset and last_name_begin_end[1] >= ustr_offset): # Remove the previous overlapping entry as it's likely to be already corrupt data. strings_to_postprocess.pop() self.__warnings.append( 'Error parsing the resources directory, ' 'attempting to read entry name.
' 'Entry names overlap 0x%x' % (ustr_offset) ) break last_name_begin_end = (ustr_offset, ustr_offset+entry_name.get_pascal_16_length()) strings_to_postprocess.append(entry_name) except PEFormatError, excp: self.__warnings.append( 'Error parsing the resources directory, ' 'attempting to read entry name. ' 'Can\'t read unicode string at offset 0x%x' % (ustr_offset) ) if res.DataIsDirectory: # OC Patch: # # One trick malware can do is to recursively reference # the next directory. This causes hilarity to ensue when # trying to parse everything correctly. # If the original RVA given to this function is equal to # the next one to parse, we assume that it's a trick. # Instead of raising a PEFormatError this would skip some # reasonable data so we just break. # # 9ee4d0a0caf095314fd7041a3e4404dc is the offending sample if (base_rva + res.OffsetToDirectory) in dirs: break else: entry_directory = self.parse_resources_directory( base_rva+res.OffsetToDirectory, size-(rva-base_rva), # size base_rva=base_rva, level = level+1, dirs=dirs + [base_rva + res.OffsetToDirectory]) if not entry_directory: break # Ange Albertini's code to process resources' strings # strings = None if entry_id == RESOURCE_TYPE['RT_STRING']: strings = dict() for resource_id in entry_directory.entries: if hasattr(resource_id, 'directory'): resource_strings = dict() for resource_lang in resource_id.directory.entries: if (resource_lang is None or not hasattr(resource_lang, 'data') or resource_lang.data.struct.Size is None or resource_id.id is None): continue string_entry_rva = resource_lang.data.struct.OffsetToData string_entry_size = resource_lang.data.struct.Size string_entry_id = resource_id.id string_entry_data = self.get_data(string_entry_rva, string_entry_size) parse_strings( string_entry_data, (int(string_entry_id) - 1) * 16, resource_strings ) strings.update(resource_strings) resource_id.directory.strings = resource_strings dir_entries.append( ResourceDirEntryData( struct = res, name = entry_name, id 
= entry_id, directory = entry_directory)) else: struct = self.parse_resource_data_entry( base_rva + res.OffsetToDirectory) if struct: entry_data = ResourceDataEntryData( struct = struct, lang = res.Name & 0x3ff, sublang = res.Name >> 10 ) dir_entries.append( ResourceDirEntryData( struct = res, name = entry_name, id = entry_id, data = entry_data)) else: break # Check if this entry contains version information # if level == 0 and res.Id == RESOURCE_TYPE['RT_VERSION']: if len(dir_entries)>0: last_entry = dir_entries[-1] rt_version_struct = None try: rt_version_struct = last_entry.directory.entries[0].directory.entries[0].data.struct except: # Maybe a malformed directory structure...? # Lets ignore it pass if rt_version_struct is not None: self.parse_version_information(rt_version_struct) rva += res.sizeof() string_rvas = [s.get_rva() for s in strings_to_postprocess] string_rvas.sort() for idx, s in enumerate(strings_to_postprocess): s.render_pascal_16() resource_directory_data = ResourceDirData( struct = resource_dir, entries = dir_entries) return resource_directory_data def parse_resource_data_entry(self, rva): """Parse a data entry from the resources directory.""" try: # If the RVA is invalid all would blow up. Some EXEs seem to be # specially nasty and have an invalid RVA. 
data = self.get_data(rva, Structure(self.__IMAGE_RESOURCE_DATA_ENTRY_format__).sizeof() ) except PEFormatError, excp: self.__warnings.append( 'Error parsing a resource directory data entry, ' + 'the RVA is invalid: 0x%x' % ( rva ) ) return None data_entry = self.__unpack_data__( self.__IMAGE_RESOURCE_DATA_ENTRY_format__, data, file_offset = self.get_offset_from_rva(rva) ) return data_entry def parse_resource_entry(self, rva): """Parse a directory entry from the resources directory.""" try: data = self.get_data( rva, Structure(self.__IMAGE_RESOURCE_DIRECTORY_ENTRY_format__).sizeof() ) except PEFormatError, excp: # A warning will be added by the caller if this method returns None return None resource = self.__unpack_data__( self.__IMAGE_RESOURCE_DIRECTORY_ENTRY_format__, data, file_offset = self.get_offset_from_rva(rva) ) if resource is None: return None #resource.NameIsString = (resource.Name & 0x80000000L) >> 31 resource.NameOffset = resource.Name & 0x7FFFFFFFL resource.__pad = resource.Name & 0xFFFF0000L resource.Id = resource.Name & 0x0000FFFFL resource.DataIsDirectory = (resource.OffsetToData & 0x80000000L) >> 31 resource.OffsetToDirectory = resource.OffsetToData & 0x7FFFFFFFL return resource def parse_version_information(self, version_struct): """Parse version information structure. The data will be made available in three attributes of the PE object. VS_VERSIONINFO will contain the first three fields of the main structure: 'Length', 'ValueLength', and 'Type' VS_FIXEDFILEINFO will hold the rest of the fields, accessible as sub-attributes: 'Signature', 'StrucVersion', 'FileVersionMS', 'FileVersionLS', 'ProductVersionMS', 'ProductVersionLS', 'FileFlagsMask', 'FileFlags', 'FileOS', 'FileType', 'FileSubtype', 'FileDateMS', 'FileDateLS' FileInfo is a list of all StringFileInfo and VarFileInfo structures. StringFileInfo structures will have a list as an attribute named 'StringTable' containing all the StringTable structures.
Each of those structures contains a dictionary 'entries' with all the key / value version information string pairs. VarFileInfo structures will have a list as an attribute named 'Var' containing all Var structures. Each Var structure will have a dictionary as an attribute named 'entry' which will contain the name and value of the Var. """ # Retrieve the data for the version info resource # start_offset = self.get_offset_from_rva( version_struct.OffsetToData ) raw_data = self.__data__[ start_offset : start_offset+version_struct.Size ] # Map the main structure and the subsequent string # versioninfo_struct = self.__unpack_data__( self.__VS_VERSIONINFO_format__, raw_data, file_offset = start_offset ) if versioninfo_struct is None: return ustr_offset = version_struct.OffsetToData + versioninfo_struct.sizeof() try: versioninfo_string = self.get_string_u_at_rva( ustr_offset ) except PEFormatError, excp: self.__warnings.append( 'Error parsing the version information, ' + 'attempting to read VS_VERSION_INFO string. 
Can\'t ' + 'read unicode string at offset 0x%x' % ( ustr_offset ) ) versioninfo_string = None # If the structure does not contain the expected name, it's assumed to be invalid # if versioninfo_string != u'VS_VERSION_INFO': self.__warnings.append('Invalid VS_VERSION_INFO block') return # Set the PE object's VS_VERSIONINFO to this one # self.VS_VERSIONINFO = versioninfo_struct # Set the Key attribute to point to the unicode string identifying the structure # self.VS_VERSIONINFO.Key = versioninfo_string # Process the fixed version information, get the offset and structure # fixedfileinfo_offset = self.dword_align( versioninfo_struct.sizeof() + 2 * (len(versioninfo_string) + 1), version_struct.OffsetToData) fixedfileinfo_struct = self.__unpack_data__( self.__VS_FIXEDFILEINFO_format__, raw_data[fixedfileinfo_offset:], file_offset = start_offset+fixedfileinfo_offset ) if not fixedfileinfo_struct: return # Set the PE object's VS_FIXEDFILEINFO to this one # self.VS_FIXEDFILEINFO = fixedfileinfo_struct # Start parsing all the StringFileInfo and VarFileInfo structures # # Get the first one # stringfileinfo_offset = self.dword_align( fixedfileinfo_offset + fixedfileinfo_struct.sizeof(), version_struct.OffsetToData) original_stringfileinfo_offset = stringfileinfo_offset # Set the PE object's attribute that will contain them all. # self.FileInfo = list() while True: # Process the StringFileInfo/VarFileInfo structure # stringfileinfo_struct = self.__unpack_data__( self.__StringFileInfo_format__, raw_data[stringfileinfo_offset:], file_offset = start_offset+stringfileinfo_offset ) if stringfileinfo_struct is None: self.__warnings.append( 'Error parsing StringFileInfo/VarFileInfo struct' ) return None # Get the subsequent string defining the structure.
# ustr_offset = ( version_struct.OffsetToData + stringfileinfo_offset + versioninfo_struct.sizeof() ) try: stringfileinfo_string = self.get_string_u_at_rva( ustr_offset ) except PEFormatError, excp: self.__warnings.append( 'Error parsing the version information, ' + 'attempting to read StringFileInfo string. Can\'t ' + 'read unicode string at offset 0x%x' % ( ustr_offset ) ) break # Set such string as the Key attribute # stringfileinfo_struct.Key = stringfileinfo_string # Append the structure to the PE object's list # self.FileInfo.append(stringfileinfo_struct) # Parse a StringFileInfo entry # if stringfileinfo_string and stringfileinfo_string.startswith(u'StringFileInfo'): if stringfileinfo_struct.Type in (0,1) and stringfileinfo_struct.ValueLength == 0: stringtable_offset = self.dword_align( stringfileinfo_offset + stringfileinfo_struct.sizeof() + 2*(len(stringfileinfo_string)+1), version_struct.OffsetToData) stringfileinfo_struct.StringTable = list() # Process the String Table entries # while True: stringtable_struct = self.__unpack_data__( self.__StringTable_format__, raw_data[stringtable_offset:], file_offset = start_offset+stringtable_offset ) if not stringtable_struct: break ustr_offset = ( version_struct.OffsetToData + stringtable_offset + stringtable_struct.sizeof() ) try: stringtable_string = self.get_string_u_at_rva( ustr_offset ) except PEFormatError, excp: self.__warnings.append( 'Error parsing the version information, ' + 'attempting to read StringTable string. 
Can\'t ' + 'read unicode string at offset 0x%x' % ( ustr_offset ) ) break stringtable_struct.LangID = stringtable_string stringtable_struct.entries = dict() stringtable_struct.entries_offsets = dict() stringtable_struct.entries_lengths = dict() stringfileinfo_struct.StringTable.append(stringtable_struct) entry_offset = self.dword_align( stringtable_offset + stringtable_struct.sizeof() + 2*(len(stringtable_string)+1), version_struct.OffsetToData) # Process all entries in the string table # while entry_offset < stringtable_offset + stringtable_struct.Length: string_struct = self.__unpack_data__( self.__String_format__, raw_data[entry_offset:], file_offset = start_offset+entry_offset ) if not string_struct: break ustr_offset = ( version_struct.OffsetToData + entry_offset + string_struct.sizeof() ) try: key = self.get_string_u_at_rva( ustr_offset ) key_offset = self.get_offset_from_rva( ustr_offset ) except PEFormatError, excp: self.__warnings.append( 'Error parsing the version information, ' + 'attempting to read StringTable Key string. Can\'t ' + 'read unicode string at offset 0x%x' % ( ustr_offset ) ) break value_offset = self.dword_align( 2*(len(key)+1) + entry_offset + string_struct.sizeof(), version_struct.OffsetToData) ustr_offset = version_struct.OffsetToData + value_offset try: value = self.get_string_u_at_rva( ustr_offset, max_length = string_struct.ValueLength ) value_offset = self.get_offset_from_rva( ustr_offset ) except PEFormatError, excp: self.__warnings.append( 'Error parsing the version information, ' + 'attempting to read StringTable Value string. 
' + 'Can\'t read unicode string at offset 0x%x' % ( ustr_offset ) ) break if string_struct.Length == 0: entry_offset = stringtable_offset + stringtable_struct.Length else: entry_offset = self.dword_align( string_struct.Length+entry_offset, version_struct.OffsetToData) key_as_char = [] for c in key: if ord(c) >= 0x80: key_as_char.append('\\x%02x' % ord(c)) else: key_as_char.append(c) key_as_char = ''.join(key_as_char) stringtable_struct.entries[key] = value stringtable_struct.entries_offsets[key] = (key_offset, value_offset) stringtable_struct.entries_lengths[key] = (len(key), len(value)) new_stringtable_offset = self.dword_align( stringtable_struct.Length + stringtable_offset, version_struct.OffsetToData) # check if the entry is crafted in a way that would lead to an infinite # loop and break if so # if new_stringtable_offset == stringtable_offset: break stringtable_offset = new_stringtable_offset if stringtable_offset >= stringfileinfo_struct.Length: break # Parse a VarFileInfo entry # elif stringfileinfo_string and stringfileinfo_string.startswith( u'VarFileInfo' ): varfileinfo_struct = stringfileinfo_struct varfileinfo_struct.name = 'VarFileInfo' if varfileinfo_struct.Type in (0, 1) and varfileinfo_struct.ValueLength == 0: var_offset = self.dword_align( stringfileinfo_offset + varfileinfo_struct.sizeof() + 2*(len(stringfileinfo_string)+1), version_struct.OffsetToData) varfileinfo_struct.Var = list() # Process all entries # while True: var_struct = self.__unpack_data__( self.__Var_format__, raw_data[var_offset:], file_offset = start_offset+var_offset ) if not var_struct: break ustr_offset = ( version_struct.OffsetToData + var_offset + var_struct.sizeof() ) try: var_string = self.get_string_u_at_rva( ustr_offset ) except PEFormatError, excp: self.__warnings.append( 'Error parsing the version information, ' + 'attempting to read VarFileInfo Var string. 
' + 'Can\'t read unicode string at offset 0x%x' % (ustr_offset)) break if var_string is None: break varfileinfo_struct.Var.append(var_struct) varword_offset = self.dword_align( 2*(len(var_string)+1) + var_offset + var_struct.sizeof(), version_struct.OffsetToData) orig_varword_offset = varword_offset while varword_offset < orig_varword_offset + var_struct.ValueLength: word1 = self.get_word_from_data( raw_data[varword_offset:varword_offset+2], 0) word2 = self.get_word_from_data( raw_data[varword_offset+2:varword_offset+4], 0) varword_offset += 4 if isinstance(word1, (int, long)) and isinstance(word2, (int, long)): var_struct.entry = {var_string: '0x%04x 0x%04x' % (word1, word2)} var_offset = self.dword_align( var_offset+var_struct.Length, version_struct.OffsetToData) if var_offset <= var_offset+var_struct.Length: break # Increment and align the offset # stringfileinfo_offset = self.dword_align( stringfileinfo_struct.Length+stringfileinfo_offset, version_struct.OffsetToData) # Check if all the StringFileInfo and VarFileInfo items have been processed # if stringfileinfo_struct.Length == 0 or stringfileinfo_offset >= versioninfo_struct.Length: break def parse_export_directory(self, rva, size): """Parse the export directory. Given the RVA of the export directory, it will process all its entries. The exports will be made available as a list of ExportData instances in the 'IMAGE_DIRECTORY_ENTRY_EXPORT' PE attribute. 
""" try: export_dir = self.__unpack_data__( self.__IMAGE_EXPORT_DIRECTORY_format__, self.get_data( rva, Structure(self.__IMAGE_EXPORT_DIRECTORY_format__).sizeof() ), file_offset = self.get_offset_from_rva(rva) ) except PEFormatError: self.__warnings.append( 'Error parsing export directory at RVA: 0x%x' % ( rva ) ) return if not export_dir: return # We keep track of the bytes left in the file and use it to set a upper # bound in the number of items that can be read from the different # arrays # def length_until_eof(rva): return len(self.__data__) - self.get_offset_from_rva(rva) try: address_of_names = self.get_data( export_dir.AddressOfNames, min( length_until_eof(export_dir.AddressOfNames), export_dir.NumberOfNames*4)) address_of_name_ordinals = self.get_data( export_dir.AddressOfNameOrdinals, min( length_until_eof(export_dir.AddressOfNameOrdinals), export_dir.NumberOfNames*4) ) address_of_functions = self.get_data( export_dir.AddressOfFunctions, min( length_until_eof(export_dir.AddressOfFunctions), export_dir.NumberOfFunctions*4) ) except PEFormatError: self.__warnings.append( 'Error parsing export directory at RVA: 0x%x' % ( rva ) ) return exports = [] max_failed_entries_before_giving_up = 10 section = self.get_section_by_rva(export_dir.AddressOfNames) if not section: self.__warnings.append( 'RVA AddressOfNames in the export directory points to an invalid address: %x' % export_dir.AddressOfNames) return else: safety_boundary = section.VirtualAddress + len(section.get_data()) - export_dir.AddressOfNames for i in xrange( min( export_dir.NumberOfNames, safety_boundary/4) ): symbol_name_address = self.get_dword_from_data(address_of_names, i) if symbol_name_address is None: max_failed_entries_before_giving_up -= 1 if max_failed_entries_before_giving_up <= 0: break symbol_name = self.get_string_at_rva( symbol_name_address ) if not is_valid_function_name(symbol_name): break try: symbol_name_offset = self.get_offset_from_rva( symbol_name_address ) except PEFormatError: 
max_failed_entries_before_giving_up -= 1 if max_failed_entries_before_giving_up <= 0: break continue symbol_ordinal = self.get_word_from_data( address_of_name_ordinals, i) if symbol_ordinal is not None and symbol_ordinal*4 < len(address_of_functions): symbol_address = self.get_dword_from_data( address_of_functions, symbol_ordinal) else: # Corrupt? a bad pointer... we assume it's all # useless, no exports return None if symbol_address is None or symbol_address == 0: continue # If the function's RVA points within the export directory # it will point to a string with the forwarded symbol's string # instead of pointing to the function start address. if symbol_address >= rva and symbol_address < rva+size: forwarder_str = self.get_string_at_rva(symbol_address) try: forwarder_offset = self.get_offset_from_rva( symbol_address ) except PEFormatError: continue else: forwarder_str = None forwarder_offset = None exports.append( ExportData( pe = self, ordinal = export_dir.Base+symbol_ordinal, ordinal_offset = self.get_offset_from_rva( export_dir.AddressOfNameOrdinals + 2*i ), address = symbol_address, address_offset = self.get_offset_from_rva( export_dir.AddressOfFunctions + 4*symbol_ordinal ), name = symbol_name, name_offset = symbol_name_offset, forwarder = forwarder_str, forwarder_offset = forwarder_offset )) ordinals = [exp.ordinal for exp in exports] max_failed_entries_before_giving_up = 10 section = self.get_section_by_rva(export_dir.AddressOfFunctions) if not section: self.__warnings.append( 'RVA AddressOfFunctions in the export directory points to an invalid address: %x' % export_dir.AddressOfFunctions) return else: safety_boundary = section.VirtualAddress + len(section.get_data()) - export_dir.AddressOfFunctions for idx in xrange( min(export_dir.NumberOfFunctions, safety_boundary/4) ): if not idx+export_dir.Base in ordinals: try: symbol_address =
self.get_dword_from_data( address_of_functions, idx) except PEFormatError: symbol_address = None if symbol_address is None: max_failed_entries_before_giving_up -= 1 if max_failed_entries_before_giving_up <= 0: break if symbol_address == 0: continue # # Checking for forwarder again. # if symbol_address >= rva and symbol_address < rva+size: forwarder_str = self.get_string_at_rva(symbol_address) else: forwarder_str = None exports.append( ExportData( ordinal = export_dir.Base+idx, address = symbol_address, name = None, forwarder = forwarder_str)) return ExportDirData( struct = export_dir, symbols = exports) def dword_align(self, offset, base): return ((offset+base+3) & 0xfffffffcL) - (base & 0xfffffffcL) def parse_delay_import_directory(self, rva, size): """Walk and parse the delay import directory.""" import_descs = [] while True: try: # If the RVA is invalid all would blow up. Some PEs seem to be # specially nasty and have an invalid RVA. data = self.get_data( rva, Structure(self.__IMAGE_DELAY_IMPORT_DESCRIPTOR_format__).sizeof() ) except PEFormatError, e: self.__warnings.append( 'Error parsing the Delay import directory at RVA: 0x%x' % ( rva ) ) break file_offset = self.get_offset_from_rva(rva) import_desc = self.__unpack_data__( self.__IMAGE_DELAY_IMPORT_DESCRIPTOR_format__, data, file_offset = file_offset ) # If the structure is all zeros, we reached the end of the list if not import_desc or import_desc.all_zeroes(): break rva += import_desc.sizeof() # If the array of thunk's is somewhere earlier than the import # descriptor we can set a maximum length for the array. 
Otherwise # just set a maximum length of the size of the file max_len = len(self.__data__) - file_offset if rva > import_desc.pINT or rva > import_desc.pIAT: max_len = max(rva-import_desc.pINT, rva-import_desc.pIAT) try: import_data = self.parse_imports( import_desc.pINT, import_desc.pIAT, None, max_length = max_len) except PEFormatError, e: self.__warnings.append( 'Error parsing the Delay import directory. ' + 'Invalid import data at RVA: 0x%x (%s)' % ( rva, e.value) ) break if not import_data: continue dll = self.get_string_at_rva(import_desc.szName) if not is_valid_dos_filename(dll): dll = '*invalid*' if dll: for symbol in import_data: if symbol.name is None: funcname = ordlookup.ordLookup(dll.lower(), symbol.ordinal) if funcname: symbol.name = funcname import_descs.append( ImportDescData( struct = import_desc, imports = import_data, dll = dll)) return import_descs def get_imphash(self): impstrs = [] exts = ['ocx', 'sys', 'dll'] if not hasattr(self, "DIRECTORY_ENTRY_IMPORT"): return "" for entry in self.DIRECTORY_ENTRY_IMPORT: libname = entry.dll.lower() parts = libname.rsplit('.', 1) if len(parts) > 1 and parts[1] in exts: libname = parts[0] for imp in entry.imports: funcname = None if not imp.name: funcname = ordlookup.ordLookup(entry.dll.lower(), imp.ordinal, make_name=True) if not funcname: raise Exception("Unable to look up ordinal %s:%04x" % (entry.dll, imp.ordinal)) else: funcname = imp.name if not funcname: continue impstrs.append('%s.%s' % (libname.lower(),funcname.lower())) return hashlib.md5( ','.join( impstrs ) ).hexdigest() def parse_import_directory(self, rva, size): """Walk and parse the import directory.""" import_descs = [] while True: try: # If the RVA is invalid all would blow up. Some EXEs seem to be # specially nasty and have an invalid RVA. 
data = self.get_data(rva, Structure(self.__IMAGE_IMPORT_DESCRIPTOR_format__).sizeof() ) except PEFormatError, e: self.__warnings.append( 'Error parsing the import directory at RVA: 0x%x' % ( rva ) ) break file_offset = self.get_offset_from_rva(rva) import_desc = self.__unpack_data__( self.__IMAGE_IMPORT_DESCRIPTOR_format__, data, file_offset = file_offset ) # If the structure is all zeros, we reached the end of the list if not import_desc or import_desc.all_zeroes(): break rva += import_desc.sizeof() # If the array of thunk's is somewhere earlier than the import # descriptor we can set a maximum length for the array. Otherwise # just set a maximum length of the size of the file max_len = len(self.__data__) - file_offset if rva > import_desc.OriginalFirstThunk or rva > import_desc.FirstThunk: max_len = max(rva-import_desc.OriginalFirstThunk, rva-import_desc.FirstThunk) try: import_data = self.parse_imports( import_desc.OriginalFirstThunk, import_desc.FirstThunk, import_desc.ForwarderChain, max_length = max_len) except PEFormatError, e: self.__warnings.append( 'Error parsing the import directory. 
' + 'Invalid Import data at RVA: 0x%x (%s)' % ( rva, e.value ) ) break if not import_data: continue dll = self.get_string_at_rva(import_desc.Name) if not is_valid_dos_filename(dll): dll = '*invalid*' if dll: for symbol in import_data: if symbol.name is None: funcname = ordlookup.ordLookup(dll.lower(), symbol.ordinal) if funcname: symbol.name = funcname import_descs.append( ImportDescData( struct = import_desc, imports = import_data, dll = dll)) suspicious_imports = set([ 'LoadLibrary', 'GetProcAddress' ]) suspicious_imports_count = 0 total_symbols = 0 for imp_dll in import_descs: for symbol in imp_dll.imports: for suspicious_symbol in suspicious_imports: if symbol and symbol.name and symbol.name.startswith( suspicious_symbol ): suspicious_imports_count += 1 break total_symbols += 1 if suspicious_imports_count == len(suspicious_imports) and total_symbols < 20: self.__warnings.append( 'Imported symbols contain entries typical of packed executables.' ) return import_descs def parse_imports(self, original_first_thunk, first_thunk, forwarder_chain, max_length=None): """Parse the imported symbols. It will fill a list, which will be available as the dictionary attribute "imports". Its keys will be the DLL names and the values all the symbols imported from that object. """ imported_symbols = [] # The following has been commented out as a PE does not # need to have the import data necessarily within # a section, it can keep it in gaps between sections # or overlapping other data. # #imports_section = self.get_section_by_rva(first_thunk) #if not imports_section: # raise PEFormatError, 'Invalid/corrupt imports.' # Import Lookup Table. Contains ordinals or pointers to strings. ilt = self.get_import_table(original_first_thunk, max_length) # Import Address Table. May have identical content to ILT if the # PE file is not bound. Will contain the address of the # imported symbols once the binary is loaded or if it is already # bound.
iat = self.get_import_table(first_thunk, max_length) # OC Patch: # Would crash if IAT or ILT had None type if (not iat or len(iat)==0) and (not ilt or len(ilt)==0): raise PEFormatError( 'Invalid Import Table information. ' + 'Both ILT and IAT appear to be broken.') table = None if ilt: table = ilt elif iat: table = iat else: return None imp_offset = 4 address_mask = 0x7fffffff if self.PE_TYPE == OPTIONAL_HEADER_MAGIC_PE: ordinal_flag = IMAGE_ORDINAL_FLAG elif self.PE_TYPE == OPTIONAL_HEADER_MAGIC_PE_PLUS: ordinal_flag = IMAGE_ORDINAL_FLAG64 imp_offset = 8 address_mask = 0x7fffffffffffffffL num_invalid = 0 for idx in xrange(len(table)): imp_ord = None imp_hint = None imp_name = None name_offset = None hint_name_table_rva = None if table[idx].AddressOfData: # If imported by ordinal, we will append the ordinal number # if table[idx].AddressOfData & ordinal_flag: import_by_ordinal = True imp_ord = table[idx].AddressOfData & 0xffff imp_name = None name_offset = None else: import_by_ordinal = False try: hint_name_table_rva = table[idx].AddressOfData & address_mask data = self.get_data(hint_name_table_rva, 2) # Get the Hint imp_hint = self.get_word_from_data(data, 0) imp_name = self.get_string_at_rva(table[idx].AddressOfData+2) if not is_valid_function_name(imp_name): imp_name = '*invalid*' name_offset = self.get_offset_from_rva(table[idx].AddressOfData+2) except PEFormatError, e: pass # by nriva: we want the ThunkRVA and ThunkOffset thunk_offset = table[idx].get_file_offset() thunk_rva = self.get_rva_from_offset(thunk_offset) imp_address = first_thunk + self.OPTIONAL_HEADER.ImageBase + idx * imp_offset struct_iat = None try: if iat and ilt and ilt[idx].AddressOfData != iat[idx].AddressOfData: imp_bound = iat[idx].AddressOfData struct_iat = iat[idx] else: imp_bound = None except IndexError: imp_bound = None # The file with hashes: # # MD5: bfe97192e8107d52dd7b4010d12b2924 # SHA256: 3d22f8b001423cb460811ab4f4789f277b35838d45c62ec0454c877e7c82c7f5 # # has an invalid table 
built in a way that it's parseable but contains invalid # entries that lead pefile to take extremely long amounts of time to # parse. It also leads to extreme memory consumption. # To prevent similar cases, if invalid entries are found in the middle of a # table the parsing will be aborted # if imp_ord is None and imp_name is None: raise PEFormatError('Invalid entries, aborting parsing.') # Some PEs appear to interleave valid and invalid imports. Instead of # aborting the parsing altogether we will simply skip the invalid entries. # Although if we see 1000 invalid entries and no legit ones, we abort. if imp_name == '*invalid*': if num_invalid > 1000 and num_invalid == idx: raise PEFormatError('Too many invalid names, aborting parsing.') num_invalid += 1 continue if imp_name != '' and (imp_ord or imp_name): imported_symbols.append( ImportData( pe = self, struct_table = table[idx], struct_iat = struct_iat, # for bound imports if any import_by_ordinal = import_by_ordinal, ordinal = imp_ord, ordinal_offset = table[idx].get_file_offset(), hint = imp_hint, name = imp_name, name_offset = name_offset, bound = imp_bound, address = imp_address, hint_name_table_rva = hint_name_table_rva, thunk_offset = thunk_offset, thunk_rva = thunk_rva )) return imported_symbols def get_import_table(self, rva, max_length=None): table = [] # We need the ordinal flag for a simple heuristic # we're implementing within the loop # if self.PE_TYPE == OPTIONAL_HEADER_MAGIC_PE: ordinal_flag = IMAGE_ORDINAL_FLAG format = self.__IMAGE_THUNK_DATA_format__ elif self.PE_TYPE == OPTIONAL_HEADER_MAGIC_PE_PLUS: ordinal_flag = IMAGE_ORDINAL_FLAG64 format = self.__IMAGE_THUNK_DATA64_format__ MAX_ADDRESS_SPREAD = 128*2**20 # 128 MB MAX_REPEATED_ADDRESSES = 15 repeated_address = 0 addresses_of_data_set_64 = set() addresses_of_data_set_32 = set() start_rva = rva while True and rva: if max_length is not None and rva >= start_rva+max_length: self.__warnings.append( 'Error parsing the import table.
Entries go beyond bounds.') break # if we see the same entry too many times we assume it could be # a table containing bogus data (with malicious intent or otherwise) if repeated_address >= MAX_REPEATED_ADDRESSES: return [] # if the addresses point somewhere but the difference between the highest # and lowest address is larger than MAX_ADDRESS_SPREAD we assume a bogus # table as the addresses should be contained within a module if (addresses_of_data_set_32 and max(addresses_of_data_set_32) - min(addresses_of_data_set_32) > MAX_ADDRESS_SPREAD ): return [] if (addresses_of_data_set_64 and max(addresses_of_data_set_64) - min(addresses_of_data_set_64) > MAX_ADDRESS_SPREAD ): return [] try: data = self.get_data(rva, Structure(format).sizeof()) except PEFormatError, e: self.__warnings.append( 'Error parsing the import table. ' + 'Invalid data at RVA: 0x%x' % rva) return None thunk_data = self.__unpack_data__( format, data, file_offset=self.get_offset_from_rva(rva) ) # Check if the AddressOfData lies within the range of RVAs that is # being scanned, abort if that is the case, as it is very unlikely # to be legitimate data. # Seen in PE with SHA256: # 5945bb6f0ac879ddf61b1c284f3b8d20c06b228e75ae4f571fa87f5b9512902c if thunk_data and thunk_data.AddressOfData >= start_rva and thunk_data.AddressOfData <= rva: self.__warnings.append( 'Error parsing the import table. ' + 'AddressOfData overlaps with THUNK_DATA for ' + 'THUNK at RVA 0x%x' % ( rva ) ) break if thunk_data and thunk_data.AddressOfData: # If the entry looks like it could be an ordinal... if thunk_data.AddressOfData & ordinal_flag: # but its value is beyond 2^16, we will assume it's # corrupted and ignore it altogether if thunk_data.AddressOfData & 0x7fffffff > 0xffff: return [] # and if it looks like it should be an RVA else: # keep track of the RVAs seen and store them to study their # properties.
When certain non-standard features are detected # the parsing will be aborted if (thunk_data.AddressOfData in addresses_of_data_set_32 or thunk_data.AddressOfData in addresses_of_data_set_64): repeated_address += 1 if thunk_data.AddressOfData >= 2**32: addresses_of_data_set_64.add(thunk_data.AddressOfData) else: addresses_of_data_set_32.add(thunk_data.AddressOfData) if not thunk_data or thunk_data.all_zeroes(): break rva += thunk_data.sizeof() table.append(thunk_data) return table def get_memory_mapped_image(self, max_virtual_address=0x10000000, ImageBase=None): """Returns the data corresponding to the memory layout of the PE file. The data includes the PE header and the sections loaded at offsets corresponding to their relative virtual addresses. (the VirtualAddress section header member). Any offset in this data corresponds to the absolute memory address ImageBase+offset. The optional argument 'max_virtual_address' provides with means of limiting which sections are processed. Any section with their VirtualAddress beyond this value will be skipped. Normally, sections with values beyond this range are just there to confuse tools. It's a common trick to see in packed executables. If the 'ImageBase' optional argument is supplied, the file's relocations will be applied to the image by calling the 'relocate_image()' method. Beware that the relocation information is applied permanently. """ # Rebase if requested # if ImageBase is not None: # Keep a copy of the image's data before modifying it by rebasing it # original_data = self.__data__ self.relocate_image(ImageBase) # Collect all sections in one code block #mapped_data = self.header mapped_data = '' + self.__data__[:] for section in self.sections: # Miscellaneous integrity tests. # Some packer will set these to bogus values to # make tools go nuts. 
# if section.Misc_VirtualSize == 0 or section.SizeOfRawData == 0: continue if section.SizeOfRawData > len(self.__data__): continue if self.adjust_FileAlignment( section.PointerToRawData, self.OPTIONAL_HEADER.FileAlignment ) > len(self.__data__): continue VirtualAddress_adj = self.adjust_SectionAlignment( section.VirtualAddress, self.OPTIONAL_HEADER.SectionAlignment, self.OPTIONAL_HEADER.FileAlignment ) if VirtualAddress_adj >= max_virtual_address: continue padding_length = VirtualAddress_adj - len(mapped_data) if padding_length>0: mapped_data += '\0'*padding_length elif padding_length<0: mapped_data = mapped_data[:padding_length] mapped_data += section.get_data() # If the image was rebased, restore it to its original form # if ImageBase is not None: self.__data__ = original_data return mapped_data def get_resources_strings(self): """Returns a list of all the strings found withing the resources (if any). This method will scan all entries in the resources directory of the PE, if there is one, and will return a list() with the strings. An empty list will be returned otherwise. """ resources_strings = list() if hasattr(self, 'DIRECTORY_ENTRY_RESOURCE'): for resource_type in self.DIRECTORY_ENTRY_RESOURCE.entries: if hasattr(resource_type, 'directory'): for resource_id in resource_type.directory.entries: if hasattr(resource_id, 'directory'): if hasattr(resource_id.directory, 'strings') and resource_id.directory.strings: for res_string in resource_id.directory.strings.values(): resources_strings.append( res_string ) return resources_strings def get_data(self, rva=0, length=None): """Get data regardless of the section where it lies on. Given a RVA and the size of the chunk to retrieve, this method will find the section where the data lies and return the data. 
""" s = self.get_section_by_rva(rva) if length: end = rva + length else: end = None if not s: if rva < len(self.header): return self.header[rva:end] # Before we give up we check whether the file might # contain the data anyway. There are cases of PE files # without sections that rely on windows loading the first # 8291 bytes into memory and assume the data will be # there # A functional file with these characteristics is: # MD5: 0008892cdfbc3bda5ce047c565e52295 # SHA-1: c7116b9ff950f86af256defb95b5d4859d4752a9 # if rva < len(self.__data__): return self.__data__[rva:end] raise PEFormatError, 'data at RVA can\'t be fetched. Corrupt header?' return s.get_data(rva, length) def get_rva_from_offset(self, offset): """Get the RVA corresponding to this file offset. """ s = self.get_section_by_offset(offset) if not s: if self.sections: lowest_rva = min( [ self.adjust_SectionAlignment( s.VirtualAddress, self.OPTIONAL_HEADER.SectionAlignment, self.OPTIONAL_HEADER.FileAlignment ) for s in self.sections] ) if offset < lowest_rva: # We will assume that the offset lies within the headers, or # at least points before where the earliest section starts # and we will simply return the offset as the RVA # # The case illustrating this behavior can be found at: # http://corkami.blogspot.com/2010/01/hey-hey-hey-whats-in-your-head.html # where the import table is not contained by any section # hence the RVA needs to be resolved to a raw offset return offset else: return offset #raise PEFormatError("specified offset (0x%x) doesn't belong to any section." % offset) return s.get_rva_from_offset(offset) def get_offset_from_rva(self, rva): """Get the file offset corresponding to this RVA. Given a RVA , this method will find the section where the data lies and return the offset within the file. """ s = self.get_section_by_rva(rva) if not s: # If not found within a section assume it might # point to overlay data or otherwise data present # but not contained in any section. 
In those # cases the RVA should equal the offset if rva len(data): return None return struct.unpack(' len(self.__data__): return None return self.get_dword_from_data(self.__data__[offset:offset+4], 0) def set_dword_at_rva(self, rva, dword): """Set the double word value at the file offset corresponding to the given RVA.""" return self.set_bytes_at_rva(rva, self.get_data_from_dword(dword)) def set_dword_at_offset(self, offset, dword): """Set the double word value at the given file offset.""" return self.set_bytes_at_offset(offset, self.get_data_from_dword(dword)) ## # Word get / set ## def get_data_from_word(self, word): """Return a two byte string representing the word value. (little endian).""" return struct.pack(' len(data): return None return struct.unpack(' len(self.__data__): return None return self.get_word_from_data(self.__data__[offset:offset+2], 0) def set_word_at_rva(self, rva, word): """Set the word value at the file offset corresponding to the given RVA.""" return self.set_bytes_at_rva(rva, self.get_data_from_word(word)) def set_word_at_offset(self, offset, word): """Set the word value at the given file offset.""" return self.set_bytes_at_offset(offset, self.get_data_from_word(word)) ## # Quad-Word get / set ## def get_data_from_qword(self, word): """Return a eight byte string representing the quad-word value. 
(little endian).""" return struct.pack(' len(data): return None return struct.unpack(' len(self.__data__): return None return self.get_qword_from_data(self.__data__[offset:offset+8], 0) def set_qword_at_rva(self, rva, qword): """Set the quad-word value at the file offset corresponding to the given RVA.""" return self.set_bytes_at_rva(rva, self.get_data_from_qword(qword)) def set_qword_at_offset(self, offset, qword): """Set the quad-word value at the given file offset.""" return self.set_bytes_at_offset(offset, self.get_data_from_qword(qword)) ## # Set bytes ## def set_bytes_at_rva(self, rva, data): """Overwrite, with the given string, the bytes at the file offset corresponding to the given RVA. Return True if successful, False otherwise. It can fail if the offset is outside the file's boundaries. """ if not isinstance(data, str): raise TypeError('data should be of type: str') offset = self.get_physical_by_rva(rva) if not offset: return False return self.set_bytes_at_offset(offset, data) def set_bytes_at_offset(self, offset, data): """Overwrite the bytes at the given file offset with the given string. Return True if successful, False otherwise. It can fail if the offset is outside the file's boundaries. 
""" if not isinstance(data, str): raise TypeError('data should be of type: str') if offset >= 0 and offset < len(self.__data__): self.__data__ = ( self.__data__[:offset] + data + self.__data__[offset+len(data):] ) else: return False return True def merge_modified_section_data(self): """Update the PE image content with any individual section data that has been modified.""" for section in self.sections: section_data_start = self.adjust_FileAlignment( section.PointerToRawData, self.OPTIONAL_HEADER.FileAlignment ) section_data_end = section_data_start+section.SizeOfRawData if section_data_start < len(self.__data__) and section_data_end < len(self.__data__): self.__data__ = self.__data__[:section_data_start] + section.get_data() + self.__data__[section_data_end:] def relocate_image(self, new_ImageBase): """Apply the relocation information to the image using the provided new image base. This method will apply the relocation information to the image. Given the new base, all the relocations will be processed and both the raw data and the section's data will be fixed accordingly. The resulting image can be retrieved as well through the method: get_memory_mapped_image() In order to get something that would more closely match what could be found in memory once the Windows loader finished its work. 
""" relocation_difference = new_ImageBase - self.OPTIONAL_HEADER.ImageBase for reloc in self.DIRECTORY_ENTRY_BASERELOC: virtual_address = reloc.struct.VirtualAddress size_of_block = reloc.struct.SizeOfBlock # We iterate with an index because if the relocation is of type # IMAGE_REL_BASED_HIGHADJ we need to also process the next entry # at once and skip it for the next iteration # entry_idx = 0 while entry_idx>16)&0xffff ) elif entry.type == RELOCATION_TYPE['IMAGE_REL_BASED_LOW']: # Fix the low 16-bits of a relocation # # Add low 16 bits of relocation_difference to the 16-bit value # at RVA=entry.rva self.set_word_at_rva( entry.rva, ( self.get_word_at_rva(entry.rva) + relocation_difference)&0xffff) elif entry.type == RELOCATION_TYPE['IMAGE_REL_BASED_HIGHLOW']: # Handle all high and low parts of a 32-bit relocation # # Add relocation_difference to the value at RVA=entry.rva self.set_dword_at_rva( entry.rva, self.get_dword_at_rva(entry.rva)+relocation_difference) elif entry.type == RELOCATION_TYPE['IMAGE_REL_BASED_HIGHADJ']: # Fix the high 16-bits of a relocation and adjust # # Add high 16-bits of relocation_difference to the 32-bit value # composed from the (16-bit value at RVA=entry.rva)<<16 plus # the 16-bit value at the next relocation entry. # # If the next entry is beyond the array's limits, # abort... 
the table is corrupt # if entry_idx == len(reloc.entries): break next_entry = reloc.entries[entry_idx] entry_idx += 1 self.set_word_at_rva( entry.rva, ((self.get_word_at_rva(entry.rva)<<16) + next_entry.rva + relocation_difference & 0xffff0000) >> 16 ) elif entry.type == RELOCATION_TYPE['IMAGE_REL_BASED_DIR64']: # Apply the difference to the 64-bit value at the offset # RVA=entry.rva self.set_qword_at_rva( entry.rva, self.get_qword_at_rva(entry.rva) + relocation_difference) def verify_checksum(self): return self.OPTIONAL_HEADER.CheckSum == self.generate_checksum() def generate_checksum(self): # This will make sure that the data representing the PE image # is updated with any changes that might have been made by # assigning values to header fields as those are not automatically # updated upon assignment. # self.__data__ = self.write() # Get the offset to the CheckSum field in the OptionalHeader # checksum_offset = self.OPTIONAL_HEADER.__file_offset__ + 0x40 # 64 checksum = 0 # Verify the data is dword-aligned. Add padding if needed # remainder = len(self.__data__) % 4 data_len = len(self.__data__) + ((4-remainder) * ( remainder != 0 )) for i in xrange( data_len / 4 ): # Skip the checksum field if i == checksum_offset / 4: continue if i+1 == (data_len / 4) and remainder: dword = struct.unpack('I', self.__data__[i*4:]+ ('\0' * (4-remainder)) )[0] else: dword = struct.unpack('I', self.__data__[ i*4 : i*4+4 ])[0] # Optimized the calculation (thanks to Emmanuel Bourg for pointing it out!) checksum += dword if checksum > 2**32: checksum = (checksum & 0xffffffff) + (checksum >> 32) checksum = (checksum & 0xffff) + (checksum >> 16) checksum = (checksum) + (checksum >> 16) checksum = checksum & 0xffff # The length is the one of the original data, not the padded one # return checksum + len(self.__data__) def is_exe(self): """Check whether the file is a standard executable. 
This will return true only if the file has the IMAGE_FILE_EXECUTABLE_IMAGE flag set and the IMAGE_FILE_DLL not set and the file does not appear to be a driver either. """ EXE_flag = IMAGE_CHARACTERISTICS['IMAGE_FILE_EXECUTABLE_IMAGE'] if (not self.is_dll()) and (not self.is_driver()) and ( EXE_flag & self.FILE_HEADER.Characteristics) == EXE_flag: return True return False def is_dll(self): """Check whether the file is a standard DLL. This will return true only if the image has the IMAGE_FILE_DLL flag set. """ DLL_flag = IMAGE_CHARACTERISTICS['IMAGE_FILE_DLL'] if ( DLL_flag & self.FILE_HEADER.Characteristics) == DLL_flag: return True return False def is_driver(self): """Check whether the file is a Windows driver. This will return true only if there are reliable indicators of the image being a driver. """ # Checking that the ImageBase field of the OptionalHeader is above or # equal to 0x80000000 (that is, whether it lies in the upper 2GB of # the address space, normally belonging to the kernel) is not a # reliable enough indicator. For instance, PEs that play the invalid # ImageBase trick to get relocated could be incorrectly assumed to be # drivers. # This is not reliable either... 
# # if any( (section.Characteristics & SECTION_CHARACTERISTICS['IMAGE_SCN_MEM_NOT_PAGED']) for section in self.sections ): # return True if hasattr(self, 'DIRECTORY_ENTRY_IMPORT'): # If it imports from "ntoskrnl.exe" or other kernel components it should be a driver # if set( ('ntoskrnl.exe', 'hal.dll', 'ndis.sys', 'bootvid.dll', 'kdcom.dll' ) ).intersection( [ imp.dll.lower() for imp in self.DIRECTORY_ENTRY_IMPORT ] ): return True return False def get_overlay_data_start_offset(self): """Get the offset of data appended to the file and not contained within the area described in the headers.""" highest_PointerToRawData = 0 highest_SizeOfRawData = 0 for section in self.sections: # If a section seems to fall outside the boundaries of the file we assume it's either # because of intentionally misleading values or because the file is truncated # In either case we skip it if section.PointerToRawData + section.SizeOfRawData > len(self.__data__): continue if section.PointerToRawData + section.SizeOfRawData > highest_PointerToRawData + highest_SizeOfRawData: highest_PointerToRawData = section.PointerToRawData highest_SizeOfRawData = section.SizeOfRawData if len(self.__data__) > highest_PointerToRawData + highest_SizeOfRawData: return highest_PointerToRawData + highest_SizeOfRawData return None def get_overlay(self): """Get the data appended to the file and not contained within the area described in the headers.""" overlay_data_offset = self.get_overlay_data_start_offset() if overlay_data_offset is not None: return self.__data__[ overlay_data_offset : ] return None def trim(self): """Return the just data defined by the PE headers, removing any overlayed data.""" overlay_data_offset = self.get_overlay_data_start_offset() if overlay_data_offset is not None: return self.__data__[ : overlay_data_offset ] return self.__data__[:] # According to http://corkami.blogspot.com/2010/01/parce-que-la-planche-aura-brule.html # if PointerToRawData is less that 0x200 it's rounded to zero. 
Loading the test file # in a debugger it's easy to verify that the PointerToRawData value of 1 is rounded # to zero. Hence we reproduce the behavior # # According to the document: # [ Microsoft Portable Executable and Common Object File Format Specification ] # "The alignment factor (in bytes) that is used to align the raw data of sections in # the image file. The value should be a power of 2 between 512 and 64 K, inclusive. # The default is 512. If the SectionAlignment is less than the architecture’s page # size, then FileAlignment must match SectionAlignment." # # The following is a hard-coded constant if the Windows loader def adjust_FileAlignment( self, val, file_alignment ): global FileAlignment_Warning if file_alignment > FILE_ALIGNEMNT_HARDCODED_VALUE: # If it's not a power of two, report it: if not power_of_two(file_alignment) and FileAlignment_Warning is False: self.__warnings.append( 'If FileAlignment > 0x200 it should be a power of 2. Value: %x' % ( file_alignment) ) FileAlignment_Warning = True if file_alignment < FILE_ALIGNEMNT_HARDCODED_VALUE: return val return (val / 0x200) * 0x200 # According to the document: # [ Microsoft Portable Executable and Common Object File Format Specification ] # "The alignment (in bytes) of sections when they are loaded into memory. It must be # greater than or equal to FileAlignment. The default is the page size for the # architecture." 
# def adjust_SectionAlignment( self, val, section_alignment, file_alignment ): global SectionAlignment_Warning if file_alignment < FILE_ALIGNEMNT_HARDCODED_VALUE: if file_alignment != section_alignment and SectionAlignment_Warning is False: self.__warnings.append( 'If FileAlignment(%x) < 0x200 it should equal SectionAlignment(%x)' % ( file_alignment, section_alignment) ) SectionAlignment_Warning = True if section_alignment < 0x1000: # page size section_alignment = file_alignment # 0x200 is the minimum valid FileAlignment according to the documentation # although ntoskrnl.exe has an alignment of 0x80 in some Windows versions # #elif section_alignment < 0x80: # section_alignment = 0x80 if section_alignment and val % section_alignment: return section_alignment * ( val / section_alignment ) return val pefile-1.2.10-139/peutils.py0000644000076500000240000004305512252127730015330 0ustar erostaff00000000000000# -*- coding: Latin-1 -*- """peutils, Portable Executable utilities module Copyright (c) 2005-2013 Ero Carrera All rights reserved. For detailed copyright information see the file COPYING in the root of the distribution archive. """ import os import re import string import urllib import pefile __author__ = 'Ero Carrera' __version__ = pefile.__version__ __contact__ = 'ero.carrera@gmail.com' class SignatureDatabase: """This class loads and keeps a parsed PEiD signature database. Usage: sig_db = SignatureDatabase('/path/to/signature/file') and/or sig_db = SignatureDatabase() sig_db.load('/path/to/signature/file') Signature databases can be combined by performing multiple loads. The filename parameter can be a URL too. In that case the signature database will be downloaded from that location. 
""" def __init__(self, filename=None, data=None): # RegExp to match a signature block # self.parse_sig = re.compile( '\[(.*?)\]\s+?signature\s*=\s*(.*?)(\s+\?\?)*\s*ep_only\s*=\s*(\w+)(?:\s*section_start_only\s*=\s*(\w+)|)', re.S) # Signature information # # Signatures are stored as trees using dictionaries # The keys are the byte values while the values for # each key are either: # # - Other dictionaries of the same form for further # bytes in the signature # # - A dictionary with a string as a key (packer name) # and None as value to indicate a full signature # self.signature_tree_eponly_true = dict () self.signature_count_eponly_true = 0 self.signature_tree_eponly_false = dict () self.signature_count_eponly_false = 0 self.signature_tree_section_start = dict () self.signature_count_section_start = 0 # The depth (length) of the longest signature # self.max_depth = 0 self.__load(filename=filename, data=data) def generate_section_signatures(self, pe, name, sig_length=512): """Generates signatures for all the sections in a PE file. If the section contains any data a signature will be created for it. The signature name will be a combination of the parameter 'name' and the section number and its name. """ section_signatures = list() for idx, section in enumerate(pe.sections): if section.SizeOfRawData < sig_length: continue #offset = pe.get_offset_from_rva(section.VirtualAddress) offset = section.PointerToRawData sig_name = '%s Section(%d/%d,%s)' % ( name, idx + 1, len(pe.sections), ''.join([c for c in section.Name if c in string.printable])) section_signatures.append( self.__generate_signature( pe, offset, sig_name, ep_only=False, section_start_only=True, sig_length=sig_length) ) return '\n'.join(section_signatures)+'\n' def generate_ep_signature(self, pe, name, sig_length=512): """Generate signatures for the entry point of a PE file. Creates a signature whose name will be the parameter 'name' and the section number and its name. 
""" offset = pe.get_offset_from_rva(pe.OPTIONAL_HEADER.AddressOfEntryPoint) return self.__generate_signature( pe, offset, name, ep_only=True, sig_length=sig_length) def __generate_signature(self, pe, offset, name, ep_only=False, section_start_only=False, sig_length=512): data = pe.__data__[offset:offset+sig_length] signature_bytes = ' '.join(['%02x' % ord(c) for c in data]) if ep_only == True: ep_only = 'true' else: ep_only = 'false' if section_start_only == True: section_start_only = 'true' else: section_start_only = 'false' signature = '[%s]\nsignature = %s\nep_only = %s\nsection_start_only = %s\n' % ( name, signature_bytes, ep_only, section_start_only) return signature def match(self, pe, ep_only=True, section_start_only=False): """Matches and returns the exact match(es). If ep_only is True the result will be a string with the packer name. Otherwise it will be a list of the form (file_ofsset, packer_name). Specifying where in the file the signature was found. """ matches = self.__match(pe, ep_only, section_start_only) # The last match (the most precise) from the # list of matches (if any) is returned # if matches: if ep_only == False: # Get the most exact match for each list of matches # at a given offset # return [(match[0], match[1][-1]) for match in matches] return matches[1][-1] return None def match_all(self, pe, ep_only=True, section_start_only=False): """Matches and returns all the likely matches.""" matches = self.__match(pe, ep_only, section_start_only) if matches: if ep_only == False: # Get the most exact match for each list of matches # at a given offset # return matches return matches[1] return None def __match(self, pe, ep_only, section_start_only): # Load the corresponding set of signatures # Either the one for ep_only equal to True or # to False # if section_start_only is True: # Fetch the data of the executable as it'd # look once loaded in memory # try : data = pe.__data__ except Exception, excp : raise # Load the corresponding tree of 
signatures # signatures = self.signature_tree_section_start # Set the starting address to start scanning from # scan_addresses = [section.PointerToRawData for section in pe.sections] elif ep_only is True: # Fetch the data of the executable as it'd # look once loaded in memory # try : data = pe.get_memory_mapped_image() except Exception, excp : raise # Load the corresponding tree of signatures # signatures = self.signature_tree_eponly_true # Fetch the entry point of the PE file and the data # at the entry point # ep = pe.OPTIONAL_HEADER.AddressOfEntryPoint # Set the starting address to start scanning from # scan_addresses = [ep] else: data = pe.__data__ signatures = self.signature_tree_eponly_false scan_addresses = xrange( len(data) ) # For each start address, check if any signature matches # matches = [] for idx in scan_addresses: result = self.__match_signature_tree( signatures, data[idx:idx+self.max_depth]) if result: matches.append( (idx, result) ) # Return only the matched items found at the entry point if # ep_only is True (matches will have only one element in that # case) # if ep_only is True: if matches: return matches[0] return matches def match_data(self, code_data, ep_only=True, section_start_only=False): data = code_data scan_addresses = [ 0 ] # Load the corresponding set of signatures # Either the one for ep_only equal to True or # to False # if section_start_only is True: # Load the corresponding tree of signatures # signatures = self.signature_tree_section_start # Set the starting address to start scanning from # elif ep_only is True: # Load the corresponding tree of signatures # signatures = self.signature_tree_eponly_true # For each start address, check if any signature matches # matches = [] for idx in scan_addresses: result = self.__match_signature_tree( signatures, data[idx:idx+self.max_depth]) if result: matches.append( (idx, result) ) # Return only the matched items found at the entry point if # ep_only is True (matches will have only one 
element in that # case) # if ep_only is True: if matches: return matches[0] return matches def __match_signature_tree(self, signature_tree, data, depth = 0): """Recursive function to find matches along the signature tree. signature_tree is the part of the tree left to walk data is the data being checked against the signature tree depth keeps track of how far we have gone down the tree """ matched_names = list () match = signature_tree # Walk the bytes in the data and match them # against the signature # for idx, byte in enumerate ( [ord (b) for b in data] ): # If the tree is exhausted... # if match is None : break # Get the next byte in the tree # match_next = match.get(byte, None) # If None is among the values for the key # it means that a signature in the database # ends here and that there's an exact match. # if None in match.values(): # idx represent how deep we are in the tree # #names = [idx+depth] names = list() # For each of the item pairs we check # if it has an element other than None, # if not then we have an exact signature # for item in match.items(): if item[1] is None : names.append (item[0]) matched_names.append(names) # If a wildcard is found keep scanning the signature # ignoring the byte. # if match.has_key ('??') : match_tree_alternate = match.get ('??', None) data_remaining = data[idx + 1 :] if data_remaining: matched_names.extend( self.__match_signature_tree( match_tree_alternate, data_remaining, idx+depth+1)) match = match_next # If we have any more packer name in the end of the signature tree # add them to the matches # if match is not None and None in match.values(): #names = [idx + depth + 1] names = list() for item in match.items() : if item[1] is None: names.append(item[0]) matched_names.append(names) return matched_names def load(self , filename=None, data=None): """Load a PEiD signature file. Invoking this method on different files combines the signatures. 
""" self.__load(filename=filename, data=data) def __load(self, filename=None, data=None): if filename is not None: # If the path does not exist, attempt to open a URL # if not os.path.exists(filename): try: sig_f = urllib.urlopen(filename) sig_data = sig_f.read() sig_f.close() except IOError: # Let this be raised back to the user... raise else: # Get the data for a file # try: sig_f = file( filename, 'rt' ) sig_data = sig_f.read() sig_f.close() except IOError: # Let this be raised back to the user... raise else: sig_data = data # If the file/URL could not be read or no "raw" data # was provided there's nothing else to do # if not sig_data: return # Helper function to parse the signature bytes # def to_byte(value) : if value == '??' or value == '?0' : return value return int (value, 16) # Parse all the signatures in the file # matches = self.parse_sig.findall(sig_data) # For each signature, get the details and load it into the # signature tree # for packer_name, signature, superfluous_wildcards, ep_only, section_start_only in matches: ep_only = ep_only.strip().lower() signature = signature.replace('\\n', '').strip() signature_bytes = [to_byte(b) for b in signature.split()] if ep_only == 'true': ep_only = True else: ep_only = False if section_start_only == 'true': section_start_only = True else: section_start_only = False depth = 0 if section_start_only is True: tree = self.signature_tree_section_start self.signature_count_section_start += 1 else: if ep_only is True : tree = self.signature_tree_eponly_true self.signature_count_eponly_true += 1 else : tree = self.signature_tree_eponly_false self.signature_count_eponly_false += 1 for idx, byte in enumerate (signature_bytes) : if idx+1 == len(signature_bytes): tree[byte] = tree.get( byte, dict() ) tree[byte][packer_name] = None else : tree[byte] = tree.get ( byte, dict() ) tree = tree[byte] depth += 1 if depth > self.max_depth: self.max_depth = depth def is_valid( pe ): """""" pass def is_suspicious( pe ): """ unusual 
locations of import tables non recognized section names presence of long ASCII strings """ relocations_overlap_entry_point = False sequential_relocs = 0 # If relocation data is found and the entries go over the entry point, and also are very # continuous or point outside section's boundaries => it might imply that an obfuscation # trick is being used or the relocations are corrupt (maybe intentionally) # if hasattr(pe, 'DIRECTORY_ENTRY_BASERELOC'): for base_reloc in pe.DIRECTORY_ENTRY_BASERELOC: last_reloc_rva = None for reloc in base_reloc.entries: if reloc.rva <= pe.OPTIONAL_HEADER.AddressOfEntryPoint <= reloc.rva + 4: relocations_overlap_entry_point = True if last_reloc_rva is not None and last_reloc_rva <= reloc.rva <= last_reloc_rva + 4: sequential_relocs += 1 last_reloc_rva = reloc.rva # If import tables or strings exist (are pointed to) to within the header or in the area # between the PE header and the first section that's supicious # # IMPLEMENT warnings_while_parsing = False # If we have warnings, that's suspicious, some of those will be because of out-of-ordinary # values are found in the PE header fields # Things that are reported in warnings: # (parsing problems, special section characteristics i.e. W & X, uncommon values of fields, # unusual entrypoint, suspicious imports) # warnings = pe.get_warnings() if warnings: warnings_while_parsing # If there are few or none (should come with a standard "density" of strings/kilobytes of data) longer (>8) # ascii sequences that might indicate packed data, (this is similar to the entropy test in some ways but # might help to discard cases of legitimate installer or compressed data) # If compressed data (high entropy) and is_driver => uuuuhhh, nasty pass def is_probably_packed( pe ): """Returns True is there is a high likelihood that a file is packed or contains compressed data. 
    The sections of the PE file will be analyzed. If enough sections
    look like they contain compressed data and that data makes up more
    than 20% of the total file size, the function will return True.
    """

    # Calculate the length of the data up to the end of the last section in the
    # file. Overlay data won't be taken into account
    #
    total_pe_data_length = len( pe.trim() )

    has_significant_amount_of_compressed_data = False

    # If some of the sections have high entropy and they make up more than 20% of the file's size
    # it's assumed that it could be an installer or a packed file

    total_compressed_data = 0
    for section in pe.sections:
        s_entropy = section.get_entropy()
        s_length = len( section.get_data() )
        # The value of 7.4 is empirical, based on looking at a few files packed
        # by different packers
        if s_entropy > 7.4:
            total_compressed_data += s_length

    if ((1.0 * total_compressed_data)/total_pe_data_length) > .2:
        has_significant_amount_of_compressed_data = True

    return has_significant_amount_of_compressed_data

pefile-1.2.10-139/PKG-INFO

Metadata-Version: 1.1
Name: pefile
Version: 1.2.10-139
Summary: Python PE parsing module
Home-page: http://code.google.com/p/pefile/
Author: Ero Carrera
Author-email: ero.carrera@gmail.com
License: UNKNOWN
Download-URL: http://pefile.googlecode.com/files/pefile-1.2.10-139.tar.gz
Description: pefile, Portable Executable reader module
        
        All the PE file basic structures are available with their default names
        as attributes of the instance returned.
        
        Processed elements such as the import table are made available with lowercase
        names, to differentiate them from the upper case basic structure names.
        
        pefile has been tested against the limits of valid PE headers, that is, malware.
        Lots of packed malware attempts to abuse the format way beyond its standard use.
        To the best of my knowledge most of the abuses are handled gracefully.
        Copyright (c) 2005-2013 Ero Carrera
        
        All rights reserved.
        
        For detailed copyright information see the file COPYING in
        the root of the distribution archive.
Platform: any
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Topic :: Software Development :: Libraries :: Python Modules

pefile-1.2.10-139/README

--------------------------------------------------------------------
           pefile - Portable Executable reader module
--------------------------------------------------------------------

INTRODUCTION

pefile gives any Python script access to all (or most) of the contents of a
given PE file.

The structures defined in the Windows header files are accessible as
attributes of the PE instance and have the same names as defined there. (The
main structures have the standard capitalized names and are attributes of the
PE instance. Their members are attributes of those structures.)

Other attributes and data, which require further processing but are very
useful, are available as lowercase attributes. Among those are the imported
and exported symbols and the sections, with direct access to their data (if
any) and convenient methods to retrieve data based on the address as if the
file were loaded, instead of needing to work out the offsets into the file.

WRITING SUPPORT

Starting with pefile 1.2 it's possible to write back any changes made to the PE
file. One has to be careful with this functionality, as it is not very
intelligent about reconstructing the PE file. That is, it will not handle
displacing structures if that is needed because a new section has been added.
The rule of thumb is: if there's room for an additional header/structure to fit,
then there'll be no problem and pefile will write it. All other modifications,
i.e. changing individual values in header/structure members, should work well.

One possibly useful application of this is correcting the malformed headers
that some malware uses to make certain analysis tools malfunction.

AVAILABILITY

The latest versions are available at: http://dkbza.org/pefile.html

INSTALLATION/USAGE

Just importing it should suffice. The module should be endianness independent
and it's known to work on OS X, Windows, and Linux.

TODO

There might be some obscure info which is not readily accessible; this may
be due to my ignorance or laziness. Patches or suggestions are, as usual,
welcome.

Things known to be missing so far:

    -Reading and processing the exceptions directory entry.
     (Architecture dependent info)

BUGS

Given the amount of information embedded in the PE file format, it is difficult
to thoroughly test all the data retrieved. I did my best to verify the accuracy
of all the parsing. Most of the basic data has been tested by using this module,
so no outrageously obvious problems should exist.
Any feedback on inconsistent or faulty behavior is welcome.

-------------------------------------------------------------------------
Copyright (c) 2005-2013 Ero Carrera. All rights reserved.
-------------------------------------------------------------------------

pefile-1.2.10-139/setup.cfg
[egg_info]
tag_build =
tag_date = 0
tag_svn_revision = 0

pefile-1.2.10-139/setup.py
#!/usr/bin/env python

# Prefer setuptools and fall back to distutils when it is not installed
try:
    from setuptools import setup
except ImportError:
    from distutils.core import setup

import pefile
import os

# Keep OS X from adding extended-attribute (._*) files to the source archive
os.environ['COPY_EXTENDED_ATTRIBUTES_DISABLE'] = 'true'
os.environ['COPYFILE_DISABLE'] = 'true'

setup(name = 'pefile',
    version = pefile.__version__,
    description = 'Python PE parsing module',
    author = pefile.__author__,
    author_email = pefile.__contact__,
    url = 'http://code.google.com/p/pefile/',
    download_url = 'http://pefile.googlecode.com/files/pefile-%s.tar.gz' % pefile.__version__,
    platforms = ['any'],
    classifiers = ['Development Status :: 5 - Production/Stable',
        'Intended Audience :: Developers',
        'Intended Audience :: Science/Research',
        'Natural Language :: English',
        'Operating System :: OS Independent',
        'Programming Language :: Python',
        'Topic :: Software Development :: Libraries :: Python Modules'],
    long_description = "\n".join(pefile.__doc__.split('\n')),
    py_modules = ['pefile', 'peutils'],
    packages = ['ordlookup'] )