hachoir-core-1.3.3/0000755000175000017500000000000011342040416013076 5ustar haypohaypohachoir-core-1.3.3/MANIFEST.in0000644000175000017500000000030511330150140014623 0ustar haypohaypoinclude AUTHORS include ChangeLog include COPYING include doc/adding_fields.graphviz include doc/*.css include doc/*.dia include doc/*.rst include doc/*.xmi include MANIFEST.in include test_doc.py hachoir-core-1.3.3/hachoir_core.egg-info/0000755000175000017500000000000011342040416017215 5ustar haypohaypohachoir-core-1.3.3/hachoir_core.egg-info/SOURCES.txt0000644000175000017500000000336011342040414021101 0ustar haypohaypoAUTHORS COPYING ChangeLog MANIFEST.in README setup.py test_doc.py doc/adding_fields.graphviz doc/default.css doc/hachoir-api.rst doc/internals.rst doc/rest.css doc/uml.xmi doc/uml_field.dia doc/uml_field_set.dia hachoir_core/__init__.py hachoir_core/benchmark.py hachoir_core/bits.py hachoir_core/cmd_line.py hachoir_core/compatibility.py hachoir_core/config.py hachoir_core/dict.py hachoir_core/endian.py hachoir_core/error.py hachoir_core/event_handler.py hachoir_core/i18n.py hachoir_core/iso639.py hachoir_core/language.py hachoir_core/log.py hachoir_core/memory.py hachoir_core/profiler.py hachoir_core/text_handler.py hachoir_core/timeout.py hachoir_core/tools.py hachoir_core/version.py hachoir_core.egg-info/PKG-INFO hachoir_core.egg-info/SOURCES.txt hachoir_core.egg-info/dependency_links.txt hachoir_core.egg-info/top_level.txt hachoir_core.egg-info/zip-safe hachoir_core/field/__init__.py hachoir_core/field/basic_field_set.py hachoir_core/field/bit_field.py hachoir_core/field/byte_field.py hachoir_core/field/character.py hachoir_core/field/enum.py hachoir_core/field/fake_array.py hachoir_core/field/field.py hachoir_core/field/field_set.py hachoir_core/field/float.py hachoir_core/field/generic_field_set.py hachoir_core/field/helper.py hachoir_core/field/integer.py hachoir_core/field/link.py hachoir_core/field/new_seekable_field_set.py hachoir_core/field/padding.py 
hachoir_core/field/parser.py hachoir_core/field/seekable_field_set.py hachoir_core/field/static_field_set.py hachoir_core/field/string_field.py hachoir_core/field/sub_file.py hachoir_core/field/timestamp.py hachoir_core/field/vector.py hachoir_core/stream/__init__.py hachoir_core/stream/input.py hachoir_core/stream/input_helper.py hachoir_core/stream/output.py hachoir_core/stream/stream.pyhachoir-core-1.3.3/hachoir_core.egg-info/zip-safe0000644000175000017500000000000111342040414020643 0ustar haypohaypo hachoir-core-1.3.3/hachoir_core.egg-info/PKG-INFO0000644000175000017500000001457711342040414020326 0ustar haypohaypoMetadata-Version: 1.0 Name: hachoir-core Version: 1.3.3 Summary: Core of Hachoir framework: parse and edit binary files Home-page: http://bitbucket.org/haypo/hachoir/wiki/hachoir-core Author: Julien Muchembled, Victor Stinner Author-email: UNKNOWN License: GNU GPL v2 Download-URL: http://bitbucket.org/haypo/hachoir/wiki/hachoir-core Description: Hachoir project =============== Hachoir is a Python library used to represent a binary file as a tree of Python objects. Each object has a type, a value, an address, etc. The goal is to be able to know the meaning of each bit in a file. Why use slow Python code instead of fast hardcoded C code? Hachoir has many interesting features: * Autofix: Hachoir is able to open invalid / truncated files * Lazy: Opening a file is very fast since no information is read from the file; data are read and/or computed when the user asks for it * Types: Hachoir has many predefined field types (integer, bit, string, etc.) and supports strings with a charset (ISO-8859-1, UTF-8, UTF-16, ...) * Addresses and sizes are stored in bits, so flags are stored as classic fields * Endian: You only have to set the endian once, and then numbers are converted to the right endianness * Editor: Using the Hachoir representation of data, you can edit, insert and remove data, and then save the result in a new file. 
Website: http://bitbucket.org/haypo/hachoir/wiki/hachoir-core Installation ============ For the installation, use setup.py or see: http://bitbucket.org/haypo/hachoir/wiki/Install hachoir-core 1.3.3 (2010-02-26) =============================== * Add writelines() method to UnicodeStdout hachoir-core 1.3.2 (2010-01-28) =============================== * MANIFEST.in also includes the documentation hachoir-core 1.3.1 (2010-01-21) =============================== * Create MANIFEST.in to include ChangeLog and other files for setup.py hachoir-core 1.3 (2010-01-20) ============================= * Add more charsets to GenericString: CP874, WINDOWS-1250, WINDOWS-1251, WINDOWS-1254, WINDOWS-1255, WINDOWS-1256, WINDOWS-1257, WINDOWS-1258, ISO-8859-16 * Fix initLocale(): return the charset even if config.unicode_stdout is False * initLocale() leaves sys.stdout and sys.stderr unchanged if the readline module is loaded: Hachoir can now be used correctly with ipython * HachoirError: replace "message" attribute by "text" to fix Python 2.6 compatibility (the message attribute is deprecated) * StaticFieldSet: fix Python 2.6 warning, object.__new__() takes only one argument (the class). 
* Fix GenericFieldSet.readMoreFields() result: don't count the number of added fields in a loop, use the number of fields before/after the operation using len() * GenericFieldSet.__iter__() supports iterable result for _fixFeedError() and _stopFeeding() * New seekable field set implementation in hachoir_core.field.new_seekable_field_set hachoir-core 1.2.1 (2008-10) ============================ * Create configuration option "unicode_stdout" which avoids replacing stdout and stderr with objects supporting unicode strings * Create TimedeltaWin64 file type * Support WINDOWS-1252 and WINDOWS-1253 charsets for GenericString * guessBytesCharset() now supports ISO-8859-7 (greek) * durationWin64() is now deprecated, use TimedeltaWin64 instead hachoir-core 1.2 (2008-09) ========================== * Create Field.getFieldType(): describes a field type and gives some useful information (e.g. the charset for a string) * Create TimestampUnix64 * GenericString: only guess the charset once; if the charset attribute is not set, guess it when the user asks for it. hachoir-core 1.1 (2008-04-01) ============================= Main change: string values are always encoded as Unicode. Details: * Create guessBytesCharset() and guessStreamCharset() * GenericString.createValue() is now always Unicode: if charset is not specified, try to guess it. 
Otherwise, use the default charset (ISO-8859-1) * RawBits: add createRawDisplay() to avoid slowdowns on huge fields * Fix SeekableFieldSet.current_size (use offset and not current_max_size) * GenericString: fix UTF-16-LE string with missing nul byte * Add __nonzero__() method to GenericTimestamp * All stream errors now inherit from StreamError (instead of HachoirError), and create an OutputStreamError * humanDatetime(): strip microseconds by default (add optional argument to keep them) hachoir-core 1.0 (2007-07-10) ============================= Version 1.0.1 changelog: * Rename parser.tags to parser.PARSER_TAGS to be compatible with future hachoir-parser 1.0 Visible changes: * New field type: TimestampUUID60 * SeekableFieldSet: fix __getitem__() method and implement __iter__() and __len__() methods, so it can now be used in hachoir-wx * String value is always Unicode, even on conversion error: use * OutputStream: add readBytes() method * Create Language class using ISO-639-2 * Add hachoir_core.profiler module to run a profiler on a function * Add hachoir_core.timeout module to call a function with a timeout Minor changes: * Fix many spelling mistakes * Dict: use iteritems() instead of items() for faster operations on huge dictionaries Platform: UNKNOWN Classifier: Intended Audience :: Developers Classifier: Development Status :: 5 - Production/Stable Classifier: Environment :: Plugins Classifier: Topic :: Software Development :: Libraries :: Python Modules Classifier: Topic :: Utilities Classifier: License :: OSI Approved :: GNU General Public License (GPL) Classifier: Operating System :: OS Independent Classifier: Natural Language :: English Classifier: Programming Language :: Python hachoir-core-1.3.3/hachoir_core.egg-info/dependency_links.txt0000644000175000017500000000000111342040414023261 0ustar haypohaypo hachoir-core-1.3.3/hachoir_core.egg-info/top_level.txt0000644000175000017500000000001511342040414021741 0ustar haypohaypohachoir_core 
hachoir-core-1.3.3/doc/0000755000175000017500000000000011342040416013643 5ustar haypohaypohachoir-core-1.3.3/doc/hachoir-api.rst0000644000175000017500000002527511251277274016601 0ustar haypohaypoHachoir is the French name for a mincer: a tool used by butchers to cut meat. Hachoir is also a tool written for hackers to cut a file or any binary stream. A file is split into a tree of fields where the smallest field can be a bit. There are various field types: integer, string, bits, padding, sub file, etc. This document is a presentation of the Hachoir API. It tries to show most of the interesting parts of this tool, but is not exhaustive. Ok, let's start! hachoir.stream: Stream manipulation =================================== To split data, the first thing we need is data :-) So this section presents the "hachoir.stream" API. In most cases we work on files using the FileInputStream function. This function takes one argument: a Unicode filename. But for practical reasons we will use the StringInputStream function in this documentation. >>> data = "point\0\3\0\2\0" >>> from hachoir_core.stream import StringInputStream, LITTLE_ENDIAN >>> stream = StringInputStream(data) >>> stream.source '' >>> len(data), stream.size (10, 80) >>> data[1:6], stream.readBytes(8, 5) ('oint\x00', 'oint\x00') >>> data[6:8], stream.readBits(6*8, 16, LITTLE_ENDIAN) ('\x03\x00', 3) >>> data[8:10], stream.readBits(8*8, 16, LITTLE_ENDIAN) ('\x02\x00', 2) The first big difference between a string and a Hachoir stream is that sizes and addresses are written in bits and not bytes. The difference is a factor of eight; that's why we write "6*8" to address byte offset 6, for example. You don't need to know anything else to use Hachoir, so let's play with fields! hachoir.field: Field manipulation ================================= Basic parser ------------ We will parse the data used in the last section. >>> from hachoir_core.field import Parser, CString, UInt16 >>> class Point(Parser): ... endian = LITTLE_ENDIAN ... 
def createFields(self): ... yield CString(self, "name", "Point name") ... yield UInt16(self, "x", "X coordinate") ... yield UInt16(self, "y", "Y coordinate") ... >>> point = Point(stream) >>> for field in point: ... print "%s) %s=%s" % (field.address, field.name, field.display) ... 0) name="point" 48) x=3 64) y=2 `point` is the root of our field tree. This tree is really simple: it just has one level and three fields: name, x and y. Hachoir stores a lot of information in each field. In this example we just show the address, name and display attributes. But a field has more attributes: >>> x = point["x"] >>> "%s = %s" % (x.path, x.value) '/x = 3' >>> x.parent == point True >>> x.description 'X coordinate' >>> x.index 1 >>> x.address, x.absolute_address (48, 48) The index is the position of the field in its parent's field list: '1' means that it's the second field, since the index starts at zero. Parser with sub-field sets -------------------------- After learning the basic API, let's see a more complex parser: a parser with sub-field sets. >>> from hachoir_core.field import FieldSet, UInt8, Character, String >>> class Entry(FieldSet): ... def createFields(self): ... yield Character(self, "letter") ... yield UInt8(self, "code") ... >>> class MyFormat(Parser): ... endian = LITTLE_ENDIAN ... def createFields(self): ... yield String(self, "signature", 3, charset="ASCII") ... yield UInt8(self, "count") ... for index in xrange(self["count"].value): ... yield Entry(self, "point[]") ... >>> data = "MYF\3a\0b\2c\0" >>> stream = StringInputStream(data) >>> root = MyFormat(stream) This example presents many interesting features of Hachoir. First of all, you can see that you can have two or more levels of fields. Here we have a tree with two levels: >>> def displayTree(parent): ... for field in parent: ... print field.path ... if field.is_field_set: displayTree(field) ... 
>>> displayTree(root) /signature /count /point[0] /point[0]/letter /point[0]/code /point[1] /point[1]/letter /point[1]/code /point[2] /point[2]/letter /point[2]/code A field set is also a field, so it has the same attributes as any other field (name, address, size, path, etc.) but has some new attributes like stream or root. Lazy feature ------------ Hachoir is written in Python so it should be slow and eat a lot of CPU and memory, and it does. But in most cases, you don't need to explore an entire field set and read all values; you just need to read the values of some specific fields. Hachoir is really lazy: no field is parsed before you ask for it, no value is read from the stream before you read a value, etc. To inspect this behaviour, you can watch "current_length" (number of read fields) and "current_size" (current size in bits of a field set): >>> root = MyFormat(stream) # Rebuild our parser >>> print (root.current_length, root.current_size) (0, 0) >>> print root["signature"].display "MYF" >>> print (root.current_length, root.current_size, root["signature"].size) (1, 24, 24) Just after its creation, a parser is empty (0 fields). When we read the first field, the parser size becomes the size of that first field. Some operations require reading more fields: >>> print root["point[0]/letter"].display 'a' >>> print (root.current_length, root.current_size) (3, 48) Reading point[0] requires reading the field "count", so root now contains three fields. List of field types =================== Number: * Bit: one bit (True/False) ; * Bits: unsigned number with a size in bits ; * Bytes: vector of known bytes (e.g. file signature) ; * UInt8, UInt16, UInt24, UInt32, UInt64: unsigned number (size: 8, 16, ... bits) ; * Int8, Int16, Int24, Int32, Int64: signed number (size: 8, 16, ... 
bits) ; * Float32, Float64, Float80: IEEE 754 floating point number (32, 64, 80 bits) ; Text: * Character: 8-bit ASCII character ; * String: fixed length string ; * CString: string ending with a nul byte ("\\0") ; * UnixLine: string ending with a new line character ("\\n") ; * PascalString8, PascalString16 and PascalString32: string prefixed with its length as an unsigned 8 / 16 / 32 bit integer (uses the parent endian) ; Timestamp (date and time): * TimestampUnix32, TimestampUnix64: 32/64-bit UNIX, number of seconds since January 1st, 1970 ; * TimestampMac32: 32-bit Mac, number of seconds since January 1st, 1904 ; * TimestampWin64: 64-bit Windows, number of 1/10 microseconds since January 1st, 1600 ; * DateTimeMSDOS32 and TimeDateMSDOS32: 32-bit MS-DOS structure, since January 1st, 1980. Timedelta (duration): * TimedeltaWin64: 64-bit Windows, number of 1/10 microseconds Padding and raw bytes: * PaddingBits/PaddingBytes: padding with a size in bits/bytes ; * NullBits/NullBytes: null padding with a size in bits/bytes ; * RawBits/RawBytes: unknown content with a size in bits/bytes ; * SubFile: a file contained in the stream. To create your own type, you can use: * GenericInteger: integer ; * GenericString: string ; * FieldSet: set of other fields ; * Parser: the main class to parse a stream. Field class =========== Read only attributes: * name (str): name of the field, unique in the parent field set * address (long): address in bits relative to the parent address * absolute_address (long): address in bits relative to the input stream * parent (GenericFieldSet): parent field (is a field set) * is_field_set (bool) <~~~ can be replaced: the field contains other fields? 
* index (int): index of the field in the parent field set (first index is 0) Read only and lazy attributes: * size (long), cached: size of the field in bits * description (str|unicode), cached: informal description * display (unicode): value with a human representation, as a unicode string * raw_display (unicode): value with a raw representation, as a unicode string * path (str): slash-separated concatenation of all field names from the root field Methods that can be replaced: * createDescription(): create the value of the 'description' attribute * createValue(): create the value of the 'value' attribute * createDisplay(): create the value of the 'display' attribute * _createInputStream(): create an InputStream containing the field content Aliases (methods): * __str__() <=> read the display attribute * __unicode__() <=> read the display attribute * __getitem__(key): alias to getField(key, False) Other methods: * static_size: helper to compute the field size. If the value is an integer, the type has a constant size. If it's a function, the size depends on the arguments. * hasValue(): check if the field has a value or not (default: self.value is not None) * getField(key, const=True): get the field with the specified key; if const is True, the field set will not be changed * __contains__(key) * getSubIStream(): return a tagged InputStream containing the field content * setSubIStream(): helper to replace _createInputStream (the old one is passed to the new one to allow chaining) Field set class =============== Read only attributes: * endian: value is BIG_ENDIAN or LITTLE_ENDIAN, the way the bits are written in the input stream <~~ can be replaced * stream (InputStream): input stream * root (FieldSet): root of all fields * eof (bool): End Of File: are we at the end of the input stream? * done (bool): is the parser done or not? 
Read only and lazy attributes: * current_size (long): current size in bits * current_length (long): current number of children Methods: * connectEvent(event, handler, local=True): connect a handler to an event * raiseEvent(event, \*args): raise an event * reset(): clear all caches, but keep the size if we know it * setUniqueFieldName(): for a field with a name ending in "[]", replaces "[]" with a unique identifier, e.g. "item[]" => "item[0]" * seekBit(address, ...): create a field to seek to the specified address, or return None if we are already there * seekByte(address, ...): create a field to seek to the specified address, or return None if we are already there * replaceField(name, fields): replace a field with one or more fields <~~~ I don't like this method :-( * getFieldByAddress(address, feed=True): get the field at the specified address * writeFieldsIn(old, address, new): helper for replaceField() <~~~ can be an helper? * getFieldType(): get the field type as a short string. The type may contain extra information like the string charset. Lazy methods: * array(): create a FakeArray to easily get a field by its index (see the FakeArray API for more details) * __len__(): number of children in the field set * readFirstFields(number): read the first 'number' fields, returns the number of new fields * readMoreFields(number): read 'number' more fields, returns the number of new fields * __iter__(): iterate over children * createFields(): main function of the parser, creates the fields. Don't call this function directly. 
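The bit-addressed reads shown in the stream section can be reproduced without Hachoir, using only the standard struct module. The following standalone sketch is not Hachoir code (the helper names read_bytes and read_u16_le are invented for illustration) and uses Python 3 syntax, unlike the Python 2 doctests above. It parses the same "point\0\3\0\2\0" layout, converting Hachoir-style bit addresses to byte offsets:

```python
import struct

def read_bytes(data, bit_address, nbytes):
    # Hachoir addresses are in bits; convert to a byte offset.
    assert bit_address % 8 == 0
    start = bit_address // 8
    return data[start:start + nbytes]

def read_u16_le(data, bit_address):
    # Little-endian 16-bit unsigned integer, like UInt16 with LITTLE_ENDIAN.
    return struct.unpack("<H", read_bytes(data, bit_address, 2))[0]

data = b"point\x00\x03\x00\x02\x00"
name = read_bytes(data, 0, data.index(b"\x00"))  # CString value, without the nul byte
x = read_u16_le(data, 6 * 8)                     # same "6*8" bit address as the doctest
y = read_u16_le(data, 8 * 8)
print(name, x, y)                                # b'point' 3 2
```

This is exactly what the Point parser above does behind the scenes; the stream layer hides the bit-to-byte bookkeeping so field code only ever deals in bit addresses.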
hachoir-core-1.3.3/doc/uml_field_set.dia0000644000175000017500000014210511251277274017155 0ustar haypohaypo #A4# #GenericFieldSet# ## ## #endian# #ENDIAN# ## ## #fields# #OrderedDict|IndexedDict# ## ## #stream# #InputStream# ## ## #root# #FieldSet|None# ## ## #__getitem__# ## #Field# ## #name# #str# ## ## #__contains__# ## ## ## #name# #str# ## ## #__iter__# ## #generator# ## #__len__# ## #long# ## #connectEvent# ## ## ## #event_name# #str# ## ## #handler# #function# ## ## #local# #bool# #True# ## #raiseEvent# ## ## ## #event_name# #args# #list# ## #createFields# ## ## ## #Parser# ## ## #tags# #dict# ## ## #validate# ## #bool|str# ## #FieldSet# ## ## ## ## ## ## ## ## #Field# ## ## #name# #str# ## ## #size# #int# ## ## #parent# #FieldSet# ## ## #value# #(any)# ## ## #address# #long# ## ## #absolute_address# #long# ## ## #display# #str# ## ## #description# #str# ## ## #path# #str# ## ## #__str__# ## #str# ## #__contains__# ## #bool# ## #name# #str# ## ## hachoir-core-1.3.3/doc/uml_field.dia0000644000175000017500000043714511251277274016315 0ustar haypohaypo #A4# #Field# ## ## #name# #str# ## ## #size# #int# ## ## #parent# #FieldSet# ## ## #value# #(any)# ## ## #address# #long# ## ## #absolute_address# #long# ## ## #display# #str# ## ## #description# #str# ## ## #path# #str# ## ## #__str__# ## #str# ## #__contains__# ## #bool# ## #name# #str# ## ## #Bits# ## ## #value# #long# ## ## #endian# #ENDIAN# ## ## ## ## #Bit# ## ## #size# #1 bit# ## ## #value# #bool# ## ## ## ## #RawBytes# ## ## #endian# #ENDIAN# ## ## #value# #str# ## ## ## ## #Character# ## ## #size# #8 bits# ## ## #value# #chr# ## ## ## ## ## ## #Enum# ## ## #display# #str# ## ## #__init__# ## ## ## #field# #Field# ## ## #enum# #dict# ## ## #GenericString# ## ## #charset# #str# ## ## #length# #long# ## ## #format# #FORMAT# ## ## #value# #str|unicode# ## ## #String# ## ## ## ## ## ## #CString# ## ## #UnixLine# ## ## #PascalString8# ## ## #PascalString16# ## ## #PascalString32# ## ## ## ## ## ## ## ## ## ## 
## ## #GenericInteger# ## ## #value# #long# ## ## #signed# #bool# ## ## #text_handler# #function# ## ## #UIntXX# ## ## #size# #XX bits# ## ## #signed# #False# ## ## #IntXX# ## ## #size# #XX bits# ## ## #signed# #True# ## ## ## ## ## ## #GenericFloat# ## ## #value# #float# ## ## #format# #str# ## ## #real_format# #str# ## ## #Float32# ## ## #size# #32 bits# ## ## #Float64# ## ## #size# #64 bits# ## ## ## ## ## ## ## ## #PaddingBits# ## ## #pattern# #int# ## ## ## ## #PaddingBytes# ## ## #pattern# #str# ## ## ## ## hachoir-core-1.3.3/doc/rest.css0000644000175000017500000000375311251277274015357 0ustar haypohaypo/* :Author: Fred L. Drake, Jr. :Author: Olivier Grisel :date: $Date: 2004/03/31 22:31:05 $ :version: $Revision: 1.7 $ */ @import url(default.css); body { margin: 0; padding: 0; background-color: #333; margin: 2% 5% 2% 5%; } div.document { font-size: 13px; padding: 3em 5em 3em 5em; background-color: white; border: 1px outset black; /* background-image: url(strands.png); background-position: 2em 0; background-repeat: repeat-y; */ } h1.title { font-size: 170%; } div.section { margin: 0px 0px 1.5em 0px; } div.section h1 { padding: 0.3em; margin-top: 1em; background-color: #333; border: 1px solid; border-color: #666 #222 #222 #666; } div.section h1 a { color: white; } h1 { font-family: sans-serif; font-size: 135%; margin-bottom: 4ex; } h2 { font-family: sans-serif; font-size: 120%; } h3 { font-family: sans-serif; font-size: 105%; } h4 { font-family: sans-serif; font-size: 100%; } h5 { font-family: sans-serif; font-size: 100%; } h6 { font-family: sans-serif; font-style: italic; font-size: 100%; } a, a:visited { color: #333; text-decoration: none; font-weight: bold; } a:hover { text-decoration: underline; } a:active { color: #666; } hr { width: 75%; } p { text-align: justify; } .literal .pre { background-color: white; font-family: lucidatypewriter, "lucida typewriter", sans-serif; } .literal-block { border: thin solid rgb(180,180,180); font-family: lucidatypewriter, 
"lucida typewriter", monospace; font-size: 90%; color: #111; line-height: 140%; background-color: #eee; padding: 0.5em; margin-bottom: 1.5em; } table.table { margin-left: 2em; margin-right: 2em; } table.table thead { background-color: rgb(230,230,230); } dt { font-weight: bold; } /* docutils uses the "option" class with both "col" and "span" elements, so we have to be explicit here */ .option-list span.option { font-weight: bold; } .option-list kbd { font-family: inherit; } hachoir-core-1.3.3/doc/default.css0000644000175000017500000000751511251277274016026 0ustar haypohaypo/* :Author: David Goodger :Contact: goodger@users.sourceforge.net :date: $Date: 2003/08/28 22:51:30 $ :version: $Revision: 1.1 $ :copyright: This stylesheet has been placed in the public domain. Default cascading style sheet for the HTML output of Docutils. */ .first { margin-top: 0 } .last { margin-bottom: 0 } a.toc-backref { text-decoration: none ; color: black } dd { margin-bottom: 0.5em } div.abstract { margin: 2em 5em } div.abstract p.topic-title { font-weight: bold ; text-align: center } div.attention, div.caution, div.danger, div.error, div.hint, div.important, div.note, div.tip, div.warning, div.admonition { margin: 2em ; border: medium outset ; padding: 1em } div.attention p.admonition-title, div.caution p.admonition-title, div.danger p.admonition-title, div.error p.admonition-title, div.warning p.admonition-title { color: red ; font-weight: bold ; font-family: sans-serif } div.hint p.admonition-title, div.important p.admonition-title, div.note p.admonition-title, div.tip p.admonition-title, div.admonition p.admonition-title { font-weight: bold ; font-family: sans-serif } div.dedication { margin: 2em 5em ; text-align: center ; font-style: italic } div.dedication p.topic-title { font-weight: bold ; font-style: normal } div.figure { margin-left: 2em } div.footer, div.header { font-size: smaller } div.sidebar { margin-left: 1em ; border: medium outset ; padding: 0em 1em ; background-color: 
#ffffee ; width: 40% ; float: right ; clear: right } div.sidebar p.rubric { font-family: sans-serif ; font-size: medium } div.system-messages { margin: 5em } div.system-messages h1 { color: red } div.system-message { border: medium outset ; padding: 1em } div.system-message p.system-message-title { color: red ; font-weight: bold } div.topic { margin: 2em } h1.title { text-align: center } h2.subtitle { text-align: center } hr { width: 75% } ol.simple, ul.simple { margin-bottom: 1em } ol.arabic { list-style: decimal } ol.loweralpha { list-style: lower-alpha } ol.upperalpha { list-style: upper-alpha } ol.lowerroman { list-style: lower-roman } ol.upperroman { list-style: upper-roman } p.attribution { text-align: right ; margin-left: 50% } p.caption { font-style: italic } p.credits { font-style: italic ; font-size: smaller } p.label { white-space: nowrap } p.rubric { font-weight: bold ; font-size: larger ; color: darkred ; text-align: center } p.sidebar-title { font-family: sans-serif ; font-weight: bold ; font-size: larger } p.sidebar-subtitle { font-family: sans-serif ; font-weight: bold } p.topic-title { font-weight: bold } pre.address { margin-bottom: 0 ; margin-top: 0 ; font-family: serif ; font-size: 100% } pre.line-block { font-family: serif ; font-size: 100% } pre.literal-block, pre.doctest-block { margin-left: 2em ; margin-right: 2em ; background-color: #eeeeee } span.classifier { font-family: sans-serif ; font-style: oblique } span.classifier-delimiter { font-family: sans-serif ; font-weight: bold } span.interpreted { font-family: sans-serif } span.option { white-space: nowrap } span.option-argument { font-style: italic } span.pre { white-space: pre } span.problematic { color: red } table { margin-top: 0.5em ; margin-bottom: 0.5em } table.citation { border-left: solid thin gray ; padding-left: 0.5ex } table.docinfo { margin: 2em 4em } table.footnote { border-left: solid thin black ; padding-left: 0.5ex } td, th { padding-left: 0.5em ; padding-right: 0.5em ; 
vertical-align: top } th.docinfo-name, th.field-name { font-weight: bold ; text-align: left ; white-space: nowrap } h1 tt, h2 tt, h3 tt, h4 tt, h5 tt, h6 tt { font-size: 100% } tt { background-color: #eeeeee } ul.auto-toc { list-style-type: none } hachoir-core-1.3.3/doc/uml.xmi0000644000175000017500000112376011251277274015206 0ustar haypohaypo umbrello uml modeller http://uml.sf.net 1.5.1 UnicodeUTF8
hachoir-core-1.3.3/doc/internals.rst0000644000175000017500000001223511251277274016404 0ustar haypohaypoHachoir internals ================= When is a field really created? ------------------------------- A field is created when someone asks to access it, or when a field located after it is accessed (all earlier fields have to be created first). So if you use a field in your field set constructor, one or more fields will be created. Example: >>> from hachoir_core.field import Parser, Int8 >>> from hachoir_core.endian import BIG_ENDIAN >>> class Point(Parser): ... endian = BIG_ENDIAN ... def __init__(self, stream): ... Parser.__init__(self, stream) ... if self["color"].value == -1: ... self._description += " (no color)" ... ... def createFields(self): ... yield Int8(self, "color", "Point color (-1 for none)") ... yield Int8(self, "use_3d", "Does it use Z axis?") ... yield Int8(self, "x", "X axis value") ... yield Int8(self, "y", "Y axis value") ... if self["use_3d"].value == 1: ... yield Int8(self, "z", "Z axis value") ... In the constructor, the field "color" is accessed, so the field list will contain one field (color): >>> from hachoir_core.stream import StringInputStream >>> stream = StringInputStream("\x2A\x00\x04\x05") >>> p = Point(stream) >>> p.current_length 1 If you access another field, the field list will grow until the requested field is reached: >>> x = p["x"].value >>> p.current_length 3 Some field set methods which create new fields: * ``__getitem__()``: feed the field list until the requested field is reached (or raise a MissingField exception) ; * ``__len__()``: create all fields ; * ``__iter__()``: begin by iterating over existing fields, then iterate over new fields until all fields are created ; * ``__contains__()``: feed the field list until the requested field is reached; may create all fields if the field doesn't exist ; * ``readFirstFields()`` ; * ``readMoreFields()``. When is the size really computed? 
--------------------------------- The size attribute also interacts with field list creation, but its mechanism is a little bit more complex. By default, the whole field list has to be built before the size value can be read. But you can specify the field list size: * if the field list is not dynamic (e.g. doesn't depend on a flag), use the class attribute ``static_size`` ; * otherwise you can set the _size instance attribute in the constructor. Two examples: >>> from hachoir_core.field import Parser, UInt32 >>> class FourBytes(Parser): ... endian = BIG_ENDIAN ... static_size = 32 ... def createFields(self): ... yield UInt32(self, "four") ... >>> class DynamicSize(Parser): ... endian = BIG_ENDIAN ... def __init__(self, stream, nb_items): ... Parser.__init__(self, stream) ... assert 0 < nb_items ... self.nb_items = nb_items ... self._size = nb_items * 32 # 32 is the size of one item ... ... def createFields(self): ... for index in range(self.nb_items): ... yield UInt32(self, "item[]") ... >>> a = FourBytes(stream) >>> b = DynamicSize(stream, 1) >>> a.size, b.size (32, 32) >>> # Check that nothing is really read from stream ... a.current_length, b.current_length (0, 0) When is the value of a field read? ---------------------------------- When a field is created, the value of the field doesn't exist yet (it equals None). The value is really read when you access it through the 'value' or 'display' field attributes. The value is then stored in the field. Details about field name ------------------------ The name of a field has to be unique in a field set because it is used as a key in the field list. The 'name' argument of the Field constructor can be set in the constructor, but should not (and cannot) be changed after that. For arrays, you can use the 'magic' suffix « [] » (e.g. "item[]") which will be replaced by « [index] » where index is a counter starting at zero. Endian ------ The "endian" is the way in which ''bytes'' are stored. 
There are two important orders: * « Big endian », in which the *most* significant byte is written first (PowerPC / Motorola CPUs). It's also the network byte order ; * « Little endian », in which the *least* significant byte is written first (Intel x86 CPUs). The number 0x1020 is stored "0x10 0x20" in big endian and "0x20 0x10" in little endian. The endian is global to a FieldSet and is a class attribute. Allowed values: * BIG_ENDIAN ; * NETWORK_ENDIAN (alias of BIG_ENDIAN) ; * LITTLE_ENDIAN. Example of setting the endian: >>> from hachoir_core.endian import LITTLE_ENDIAN >>> class UseLittleEndian(Parser): ... endian = LITTLE_ENDIAN ... For sub-field sets, if the endian is not specified, the parent's endian will be used. Explore a field set using its path ----------------------------------- Fields are stored in a tree. To explore the tree you have different tools: * the *root* attribute of a field, which goes to the tree root ; * the *parent* attribute, which goes to the field's parent (None for the tree root) ; * and you can specify a path as the *__getitem__()* argument. There are different valid syntaxes for a path: * path to a child of the current node: ``field["content"]`` ; * path to a child of the parent: ``field["../brother"]`` ; * path from the root: ``field["/header/key"]``. 
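The lazy creation mechanism described in the first section boils down to driving a createFields() generator on demand: a field set pulls fields from the generator only until the requested one exists. The following toy model (Python 3 syntax; the class body is invented for illustration and is far simpler than the real GenericFieldSet) reproduces the current_length behaviour shown in the doctests:

```python
class LazyFieldSet:
    """Toy model of Hachoir's lazy field creation: fields are pulled
    from the createFields() generator only when they are needed."""

    def __init__(self):
        self._fields = {}                  # name -> value, in creation order
        self._gen = self.createFields()

    def createFields(self):
        # Stand-in for a real parser: yield (name, value) pairs.
        yield "color", -1
        yield "use_3d", 0
        yield "x", 4
        yield "y", 5

    @property
    def current_length(self):
        return len(self._fields)

    def __getitem__(self, name):
        # Feed the generator until the requested field exists;
        # every field *before* it gets created as a side effect.
        while name not in self._fields:
            try:
                key, value = next(self._gen)
            except StopIteration:
                raise KeyError(name)
            self._fields[key] = value
        return self._fields[name]

fs = LazyFieldSet()
print(fs.current_length)   # 0: nothing parsed yet
print(fs["x"])             # 4
print(fs.current_length)   # 3: color, use_3d and x were created
```

Requesting "x" forces the creation of "color" and "use_3d" first, exactly like the Point doctest where p.current_length jumps from 1 to 3.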
hachoir-core-1.3.3/doc/adding_fields.graphviz0000644000175000017500000000130211251277274020204 0ustar haypohaypodigraph GenericFieldSet { rankdir=LR; { node [shape=box]; StopIteration [ color=green ]; FieldError [ color=red, rank=max ]; ParserError -> FieldError; }; _addField -> { _setUniqFieldName _fixFieldSize }; readFirstFields -> readMoreFields; { _stopFeeding _fixFeedError } -> _fixLastField -> _deleteField; { __iter__ _feedAll readMoreFields _feedUntil } -> _addField; { __iter__ _feedAll readMoreFields _feedUntil } -> _stopFeeding [ color=green ]; { __iter__ _feedAll readMoreFields _feedUntil } -> { _fixFeedError FieldError } [ color=red ]; { _addField _stopFeeding } -> ParserError; _fixFieldSize -> StopIteration; { __len__ _getSize } -> _feedAll; _getField -> _feedUntil; } hachoir-core-1.3.3/test_doc.py0000755000175000017500000000265711251277274015306 0ustar haypohaypo#!/usr/bin/env python2.4 import doctest import sys import os import hachoir_core.i18n # import it because it does change the locale from locale import setlocale, LC_ALL def testDoc(filename, name=None): print "--- %s: Run tests" % filename failure, nb_test = doctest.testfile( filename, optionflags=doctest.ELLIPSIS, name=name) if failure: sys.exit(1) print "--- %s: End of tests" % filename def importModule(name): mod = __import__(name) components = name.split('.') for comp in components[1:]: mod = getattr(mod, comp) return mod def testModule(name): print "--- Test module %s" % name module = importModule(name) failure, nb_test = doctest.testmod(module) if failure: sys.exit(1) print "--- End of test" def main(): setlocale(LC_ALL, "C") hachoir_dir = os.path.dirname(__file__) sys.path.append(hachoir_dir) # Configure Hachoir for tests import hachoir_core.config as config config.use_i18n = False # Test documentation in doc/*.rst files testDoc('doc/hachoir-api.rst') testDoc('doc/internals.rst') # Test documentation of some functions/classes testModule("hachoir_core.bits") 
testModule("hachoir_core.compatibility") testModule("hachoir_core.dict") testModule("hachoir_core.i18n") testModule("hachoir_core.text_handler") testModule("hachoir_core.tools") if __name__ == "__main__": main() hachoir-core-1.3.3/setup.py0000755000175000017500000000504411342040300014606 0ustar haypohaypo#!/usr/bin/env python # Procedure to release a new version: # - edit hachoir_core/version.py: VERSION = "XXX" # - edit ChangeLog (set release date) # - run: ./test_doc.py # - run: hg commit # - run: hg tag hachoir-core-XXX # - run: hg push # - run: python2.5 ./setup.py --setuptools register sdist bdist_egg upload # - run: python2.4 ./setup.py --setuptools bdist_egg upload # - run: python2.6 ./setup.py --setuptools bdist_egg upload # - check: http://pypi.python.org/pypi/hachoir-core # - update the website # * http://bitbucket.org/haypo/hachoir/wiki/Install/source # * http://bitbucket.org/haypo/hachoir/wiki/Home # - edit hachoir_core/version.py: set version to N+1 # - edit ChangeLog: add a new "hachoir-core N+1" section with text XXX # Constants AUTHORS = 'Julien Muchembled, Victor Stinner' DESCRIPTION = "Core of Hachoir framework: parse and edit binary files" CLASSIFIERS = [ 'Intended Audience :: Developers', 'Development Status :: 5 - Production/Stable', 'Environment :: Plugins', 'Topic :: Software Development :: Libraries :: Python Modules', 'Topic :: Utilities', 'License :: OSI Approved :: GNU General Public License (GPL)', 'Operating System :: OS Independent', 'Natural Language :: English', 'Programming Language :: Python'] import sys from os.path import join as path_join from imp import load_source def main(): # Check Python version! if sys.hexversion < 0x2040000: print "Sorry, you need Python 2.4 or greater to run (install) Hachoir!" 
sys.exit(1) if "--setuptools" in sys.argv: sys.argv.remove("--setuptools") from setuptools import setup use_setuptools = True else: from distutils.core import setup use_setuptools = False # Set some variables hachoir_core = load_source("version", path_join("hachoir_core", "version.py")) long_description = open('README').read() + open('ChangeLog').read() install_options = { "name": hachoir_core.PACKAGE, "version": hachoir_core.VERSION, "url": hachoir_core.WEBSITE, "download_url": hachoir_core.WEBSITE, "license": hachoir_core.LICENSE, "author": AUTHORS, "description": DESCRIPTION, "classifiers": CLASSIFIERS, "packages": ['hachoir_core', 'hachoir_core.field', 'hachoir_core.stream'], "long_description": long_description, } if use_setuptools: install_options["zip_safe"] = True # Call main() setup function setup(**install_options) if __name__ == "__main__": main() hachoir-core-1.3.3/setup.cfg0000644000175000017500000000007311342040416014717 0ustar haypohaypo[egg_info] tag_build = tag_date = 0 tag_svn_revision = 0 hachoir-core-1.3.3/hachoir_core/0000755000175000017500000000000011342040416015523 5ustar haypohaypohachoir-core-1.3.3/hachoir_core/cmd_line.py0000644000175000017500000000266011251277274017670 0ustar haypohaypofrom optparse import OptionGroup from hachoir_core.log import log from hachoir_core.i18n import _, getTerminalCharset from hachoir_core.tools import makePrintable import hachoir_core.config as config def getHachoirOptions(parser): """ Create an option group (type optparse.OptionGroup) of Hachoir library options. 
""" def setLogFilename(*args): log.setFilename(args[2]) common = OptionGroup(parser, _("Hachoir library"), \ "Configure Hachoir library") common.add_option("--verbose", help=_("Verbose mode"), default=False, action="store_true") common.add_option("--log", help=_("Write log in a file"), type="string", action="callback", callback=setLogFilename) common.add_option("--quiet", help=_("Quiet mode (don't display warning)"), default=False, action="store_true") common.add_option("--debug", help=_("Debug mode"), default=False, action="store_true") return common def configureHachoir(option): # Configure Hachoir using "option" (value from optparse) if option.quiet: config.quiet = True if option.verbose: config.verbose = True if option.debug: config.debug = True def unicodeFilename(filename, charset=None): if not charset: charset = getTerminalCharset() try: return unicode(filename, charset) except UnicodeDecodeError: return makePrintable(filename, charset, to_unicode=True) hachoir-core-1.3.3/hachoir_core/field/0000755000175000017500000000000011342040416016606 5ustar haypohaypohachoir-core-1.3.3/hachoir_core/field/generic_field_set.py0000644000175000017500000004534711251277274022644 0ustar haypohaypofrom hachoir_core.field import (MissingField, BasicFieldSet, Field, ParserError, createRawField, createNullField, createPaddingField, FakeArray) from hachoir_core.dict import Dict, UniqKeyError from hachoir_core.error import HACHOIR_ERRORS from hachoir_core.tools import lowerBound import hachoir_core.config as config class GenericFieldSet(BasicFieldSet): """ Ordered list of fields. Use operator [] to access fields using their name (field names are unique in a field set, but not in the whole document). Class attributes: - endian: Bytes order (L{BIG_ENDIAN} or L{LITTLE_ENDIAN}). Optional if the field set has a parent ; - static_size: (optional) Size of FieldSet in bits. This attribute should be used in parser of constant size. 
Instance attributes/methods: - _fields: Ordered dictionnary of all fields, may be incomplete because feeded when a field is requested ; - stream: Input stream used to feed fields' value - root: The root of all field sets ; - __len__(): Number of fields, may need to create field set ; - __getitem__(): Get an field by it's name or it's path. And attributes inherited from Field class: - parent: Parent field (may be None if it's the root) ; - name: Field name (unique in parent field set) ; - value: The field set ; - address: Field address (in bits) relative to parent ; - description: A string describing the content (can be None) ; - size: Size of field set in bits, may need to create field set. Event handling: - "connectEvent": Connect an handler to an event ; - "raiseEvent": Raise an event. To implement a new field set, you need to: - create a class which inherite from FieldSet ; - write createFields() method using lines like: yield Class(self, "name", ...) ; - and maybe set endian and static_size class attributes. """ _current_size = 0 def __init__(self, parent, name, stream, description=None, size=None): """ Constructor @param parent: Parent field set, None for root parser @param name: Name of the field, have to be unique in parent. If it ends with "[]", end will be replaced with "[new_id]" (eg. "raw[]" becomes "raw[0]", next will be "raw[1]", and then "raw[2]", etc.) @type name: str @param stream: Input stream from which data are read @type stream: L{InputStream} @param description: Optional string description @type description: str|None @param size: Size in bits. If it's None, size will be computed. 
You can also set size with class attribute static_size """ BasicFieldSet.__init__(self, parent, name, stream, description, size) self._fields = Dict() self._field_generator = self.createFields() self._array_cache = {} self.__is_feeding = False def array(self, key): try: return self._array_cache[key] except KeyError: array = FakeArray(self, key) self._array_cache[key] = array return self._array_cache[key] def reset(self): """ Reset a field set: * clear fields ; * restart field generator ; * set current size to zero ; * clear field array count. But keep: name, value, description and size. """ BasicFieldSet.reset(self) self._fields = Dict() self._field_generator = self.createFields() self._current_size = 0 self._array_cache = {} def __str__(self): return '<%s path=%s, current_size=%s, current length=%s>' % \ (self.__class__.__name__, self.path, self._current_size, len(self._fields)) def __len__(self): """ Returns number of fields, may need to create all fields if it's not done yet. """ if self._field_generator is not None: self._feedAll() return len(self._fields) def _getCurrentLength(self): return len(self._fields) current_length = property(_getCurrentLength) def _getSize(self): if self._size is None: self._feedAll() return self._size size = property(_getSize, doc="Size in bits, may create all fields to get size") def _getCurrentSize(self): assert not(self.done) return self._current_size current_size = property(_getCurrentSize) eof = property(lambda self: self._checkSize(self._current_size + 1, True) < 0) def _checkSize(self, size, strict): field = self while field._size is None: if not field._parent: assert self.stream.size is None if not strict: return None if self.stream.sizeGe(size): return 0 break size += field._address field = field._parent return field._size - size autofix = property(lambda self: self.root.autofix) def _addField(self, field): """ Add a field to the field set: * add it into _fields * update _current_size May raise a StopIteration() on error """ 
if not issubclass(field.__class__, Field): raise ParserError("Field type (%s) is not a subclass of 'Field'!" % field.__class__.__name__) assert isinstance(field._name, str) if field._name.endswith("[]"): self.setUniqueFieldName(field) if config.debug: self.info("[+] DBG: _addField(%s)" % field.name) # required for the msoffice parser if field._address != self._current_size: self.warning("Fix address of %s to %s (was %s)" % (field.path, self._current_size, field._address)) field._address = self._current_size ask_stop = False # Compute field size and check that there is enough place for it self.__is_feeding = True try: field_size = field.size except HACHOIR_ERRORS, err: if field.is_field_set and field.current_length and field.eof: self.warning("Error when getting size of '%s': %s" % (field.name, err)) field._stopFeeding() ask_stop = True else: self.warning("Error when getting size of '%s': delete it" % field.name) self.__is_feeding = False raise self.__is_feeding = False # No more place? dsize = self._checkSize(field._address + field.size, False) if (dsize is not None and dsize < 0) or (field.is_field_set and field.size <= 0): if self.autofix and self._current_size: self._fixFieldSize(field, field.size + dsize) else: raise ParserError("Field %s is too large!" 
% field.path) self._current_size += field.size try: self._fields.append(field._name, field) except UniqKeyError, err: self.warning("Duplicate field name " + unicode(err)) field._name += "[]" self.setUniqueFieldName(field) self._fields.append(field._name, field) if ask_stop: raise StopIteration() def _fixFieldSize(self, field, new_size): if new_size > 0: if field.is_field_set and 0 < field.size: field._truncate(new_size) return # Don't add the field <=> delete item if self._size is None: self._size = self._current_size + new_size self.warning("[Autofix] Delete '%s' (too large)" % field.path) raise StopIteration() def _getField(self, name, const): field = Field._getField(self, name, const) if field is None: if name in self._fields: field = self._fields[name] elif self._field_generator is not None and not const: field = self._feedUntil(name) return field def getField(self, key, const=True): if isinstance(key, (int, long)): if key < 0: raise KeyError("Key must be positive!") if not const: self.readFirstFields(key+1) if len(self._fields.values) <= key: raise MissingField(self, key) return self._fields.values[key] return Field.getField(self, key, const) def _truncate(self, size): assert size > 0 if size < self._current_size: self._size = size while True: field = self._fields.values[-1] if field._address < size: break del self._fields[-1] self._current_size = field._address size -= field._address if size < field._size: if field.is_field_set: field._truncate(size) else: del self._fields[-1] field = createRawField(self, size, "raw[]") self._fields.append(field._name, field) self._current_size = self._size else: assert size < self._size or self._size is None self._size = size if self._size == self._current_size: self._field_generator = None def _deleteField(self, index): field = self._fields.values[index] size = field.size self._current_size -= size del self._fields[index] return field def _fixLastField(self): """ Try to fix last field when we know current field set size. 
Returns new added field if any, or None. """ assert self._size is not None # Stop parser message = ["stop parser"] self._field_generator = None # If last field is too big, delete it while self._size < self._current_size: field = self._deleteField(len(self._fields)-1) message.append("delete field %s" % field.path) assert self._current_size <= self._size # If field size current is smaller: add a raw field size = self._size - self._current_size if size: field = createRawField(self, size, "raw[]") message.append("add padding") self._current_size += field.size self._fields.append(field._name, field) else: field = None message = ", ".join(message) self.warning("[Autofix] Fix parser error: " + message) assert self._current_size == self._size return field def _stopFeeding(self): new_field = None if self._size is None: if self._parent: self._size = self._current_size elif self._size != self._current_size: if self.autofix: new_field = self._fixLastField() else: raise ParserError("Invalid parser \"%s\" size!" % self.path) self._field_generator = None return new_field def _fixFeedError(self, exception): """ Try to fix a feeding error. Returns False if error can't be fixed, otherwise returns new field if any, or None. """ if self._size is None or not self.autofix: return False self.warning(unicode(exception)) return self._fixLastField() def _feedUntil(self, field_name): """ Return the field if it was found, None else """ if self.__is_feeding \ or (self._field_generator and self._field_generator.gi_running): self.warning("Unable to get %s (and generator is already running)" % field_name) return None try: while True: field = self._field_generator.next() self._addField(field) if field.name == field_name: return field except HACHOIR_ERRORS, err: if self._fixFeedError(err) is False: raise except StopIteration: self._stopFeeding() return None def readMoreFields(self, number): """ Read more number fields, or do nothing if parsing is done. Returns number of new added fields. 
""" if self._field_generator is None: return 0 oldlen = len(self._fields) try: for index in xrange(number): self._addField( self._field_generator.next() ) except HACHOIR_ERRORS, err: if self._fixFeedError(err) is False: raise except StopIteration: self._stopFeeding() return len(self._fields) - oldlen def _feedAll(self): if self._field_generator is None: return try: while True: field = self._field_generator.next() self._addField(field) except HACHOIR_ERRORS, err: if self._fixFeedError(err) is False: raise except StopIteration: self._stopFeeding() def __iter__(self): """ Create a generator to iterate on each field, may create new fields when needed """ try: done = 0 while True: if done == len(self._fields): if self._field_generator is None: break self._addField( self._field_generator.next() ) for field in self._fields.values[done:]: yield field done += 1 except HACHOIR_ERRORS, err: field = self._fixFeedError(err) if isinstance(field, Field): yield field elif hasattr(field, '__iter__'): for f in field: yield f elif field is False: raise except StopIteration: field = self._stopFeeding() if isinstance(field, Field): yield field elif hasattr(field, '__iter__'): for f in field: yield f def _isDone(self): return (self._field_generator is None) done = property(_isDone, doc="Boolean to know if parsing is done or not") # # FieldSet_SeekUtility # def seekBit(self, address, name="padding[]", description=None, relative=True, null=False): """ Create a field to seek to specified address, or None if it's not needed. May raise an (ParserError) exception if address is invalid. 
""" if relative: nbits = address - self._current_size else: nbits = address - (self.absolute_address + self._current_size) if nbits < 0: raise ParserError("Seek error, unable to go back!") if 0 < nbits: if null: return createNullField(self, nbits, name, description) else: return createPaddingField(self, nbits, name, description) else: return None def seekByte(self, address, name="padding[]", description=None, relative=True, null=False): """ Same as seekBit(), but with address in byte. """ return self.seekBit(address * 8, name, description, relative, null=null) # # RandomAccessFieldSet # def replaceField(self, name, new_fields): # TODO: Check in self and not self.field # Problem is that "generator is already executing" if name not in self._fields: raise ParserError("Unable to replace %s: field doesn't exist!" % name) assert 1 <= len(new_fields) old_field = self[name] total_size = sum( (field.size for field in new_fields) ) if old_field.size != total_size: raise ParserError("Unable to replace %s: " "new field(s) hasn't same size (%u bits instead of %u bits)!" % (name, total_size, old_field.size)) field = new_fields[0] if field._name.endswith("[]"): self.setUniqueFieldName(field) field._address = old_field.address if field.name != name and field.name in self._fields: raise ParserError( "Unable to replace %s: name \"%s\" is already used!" % (name, field.name)) self._fields.replace(name, field.name, field) self.raiseEvent("field-replaced", old_field, field) if 1 < len(new_fields): index = self._fields.index(new_fields[0].name)+1 address = field.address + field.size for field in new_fields[1:]: if field._name.endswith("[]"): self.setUniqueFieldName(field) field._address = address if field.name in self._fields: raise ParserError( "Unable to replace %s: name \"%s\" is already used!" 
% (name, field.name)) self._fields.insert(index, field.name, field) self.raiseEvent("field-inserted", index, field) index += 1 address += field.size def getFieldByAddress(self, address, feed=True): """ Only search in existing fields """ if feed and self._field_generator is not None: self._feedAll() if address < self._current_size: i = lowerBound(self._fields.values, lambda x: x.address + x.size <= address) if i is not None: return self._fields.values[i] return None def writeFieldsIn(self, old_field, address, new_fields): """ Can only write in existing fields (address < self._current_size) """ # Check size total_size = sum( field.size for field in new_fields ) if old_field.size < total_size: raise ParserError( \ "Unable to write fields at address %s " \ "(too big)!" % (address)) # Need padding before? replace = [] size = address - old_field.address assert 0 <= size if 0 < size: padding = createPaddingField(self, size) padding._address = old_field.address replace.append(padding) # Set fields address for field in new_fields: field._address = address address += field.size replace.append(field) # Need padding after? size = (old_field.address + old_field.size) - address assert 0 <= size if 0 < size: padding = createPaddingField(self, size) padding._address = address replace.append(padding) self.replaceField(old_field.name, replace) def nextFieldAddress(self): return self._current_size def getFieldIndex(self, field): return self._fields.index(field._name) hachoir-core-1.3.3/hachoir_core/field/integer.py0000644000175000017500000000346411251277274020641 0ustar haypohaypo""" Integer field classes: - UInt8, UInt16, UInt24, UInt32, UInt64: unsigned integer of 8, 16, 32, 64 bits ; - Int8, Int16, Int24, Int32, Int64: signed integer of 8, 16, 32, 64 bits. """ from hachoir_core.field import Bits, FieldError class GenericInteger(Bits): """ Generic integer class used to generate other classes. 
""" def __init__(self, parent, name, signed, size, description=None): if not (8 <= size <= 256): raise FieldError("Invalid integer size (%s): have to be in 8..256" % size) Bits.__init__(self, parent, name, size, description) self.signed = signed def createValue(self): return self._parent.stream.readInteger( self.absolute_address, self.signed, self._size, self._parent.endian) def integerFactory(name, is_signed, size, doc): class Integer(GenericInteger): __doc__ = doc static_size = size def __init__(self, parent, name, description=None): GenericInteger.__init__(self, parent, name, is_signed, size, description) cls = Integer cls.__name__ = name return cls UInt8 = integerFactory("UInt8", False, 8, "Unsigned integer of 8 bits") UInt16 = integerFactory("UInt16", False, 16, "Unsigned integer of 16 bits") UInt24 = integerFactory("UInt24", False, 24, "Unsigned integer of 24 bits") UInt32 = integerFactory("UInt32", False, 32, "Unsigned integer of 32 bits") UInt64 = integerFactory("UInt64", False, 64, "Unsigned integer of 64 bits") Int8 = integerFactory("Int8", True, 8, "Signed integer of 8 bits") Int16 = integerFactory("Int16", True, 16, "Signed integer of 16 bits") Int24 = integerFactory("Int24", True, 24, "Signed integer of 24 bits") Int32 = integerFactory("Int32", True, 32, "Signed integer of 32 bits") Int64 = integerFactory("Int64", True, 64, "Signed integer of 64 bits") hachoir-core-1.3.3/hachoir_core/field/fake_array.py0000644000175000017500000000436611251277274021312 0ustar haypohaypoimport itertools from hachoir_core.field import MissingField class FakeArray: """ Simulate an array for GenericFieldSet.array(): fielset.array("item")[0] is equivalent to fielset.array("item[0]"). It's possible to iterate over the items using:: for element in fieldset.array("item"): ... And to get array size using len(fieldset.array("item")). 
""" def __init__(self, fieldset, name): pos = name.rfind("/") if pos != -1: self.fieldset = fieldset[name[:pos]] self.name = name[pos+1:] else: self.fieldset = fieldset self.name = name self._format = "%s[%%u]" % self.name self._cache = {} self._known_size = False self._max_index = -1 def __nonzero__(self): "Is the array empty or not?" if self._cache: return True else: return (0 in self) def __len__(self): "Number of fields in the array" total = self._max_index+1 if not self._known_size: for index in itertools.count(total): try: field = self[index] total += 1 except MissingField: break return total def __contains__(self, index): try: field = self[index] return True except MissingField: return False def __getitem__(self, index): """ Get a field of the array. Returns a field, or raise MissingField exception if the field doesn't exist. """ try: value = self._cache[index] except KeyError: try: value = self.fieldset[self._format % index] except MissingField: self._known_size = True raise self._cache[index] = value self._max_index = max(index, self._max_index) return value def __iter__(self): """ Iterate in the fields in their index order: field[0], field[1], ... """ for index in itertools.count(0): try: yield self[index] except MissingField: raise StopIteration() hachoir-core-1.3.3/hachoir_core/field/basic_field_set.py0000644000175000017500000001121211251277274022271 0ustar haypohaypofrom hachoir_core.field import Field, FieldError from hachoir_core.stream import InputStream from hachoir_core.endian import BIG_ENDIAN, LITTLE_ENDIAN from hachoir_core.event_handler import EventHandler class ParserError(FieldError): """ Error raised by a field set. @see: L{FieldError} """ pass class MatchError(FieldError): """ Error raised by a field set when the stream content doesn't match to file format. 
@see: L{FieldError} """ pass class BasicFieldSet(Field): _event_handler = None is_field_set = True endian = None def __init__(self, parent, name, stream, description, size): # Sanity checks (preconditions) assert not parent or issubclass(parent.__class__, BasicFieldSet) assert issubclass(stream.__class__, InputStream) # Set field set size if size is None and self.static_size: assert isinstance(self.static_size, (int, long)) size = self.static_size # Set Field attributes self._parent = parent self._name = name self._size = size self._description = description self.stream = stream self._field_array_count = {} # Set endian if not self.endian: assert parent and parent.endian self.endian = parent.endian if parent: # This field set is one of the root leafs self._address = parent.nextFieldAddress() self.root = parent.root assert id(self.stream) == id(parent.stream) else: # This field set is the root self._address = 0 self.root = self self._global_event_handler = None # Sanity checks (post-conditions) assert self.endian in (BIG_ENDIAN, LITTLE_ENDIAN) if (self._size is not None) and (self._size <= 0): raise ParserError("Invalid parser '%s' size: %s" % (self.path, self._size)) def reset(self): self._field_array_count = {} def createValue(self): return None def connectEvent(self, event_name, handler, local=True): assert event_name in ( # Callback prototype: def f(field) # Called when new value is already set "field-value-changed", # Callback prototype: def f(field) # Called when field size is already set "field-resized", # A new field has been inserted in the field set # Callback prototype: def f(index, new_field) "field-inserted", # Callback prototype: def f(old_field, new_field) # Called when new field is already in field set "field-replaced", # Callback prototype: def f(field, new_value) # Called to ask to set new value "set-field-value" ), "Event name %r is invalid" % event_name if local: if self._event_handler is None: self._event_handler = EventHandler() 
self._event_handler.connect(event_name, handler) else: if self.root._global_event_handler is None: self.root._global_event_handler = EventHandler() self.root._global_event_handler.connect(event_name, handler) def raiseEvent(self, event_name, *args): # Transfer event to local listeners if self._event_handler is not None: self._event_handler.raiseEvent(event_name, *args) # Transfer event to global listeners if self.root._global_event_handler is not None: self.root._global_event_handler.raiseEvent(event_name, *args) def setUniqueFieldName(self, field): key = field._name[:-2] try: self._field_array_count[key] += 1 except KeyError: self._field_array_count[key] = 0 field._name = key + "[%u]" % self._field_array_count[key] def readFirstFields(self, number): """ Read first number fields if they are not read yet. Returns number of new added fields. """ number = number - self.current_length if 0 < number: return self.readMoreFields(number) else: return 0 def createFields(self): raise NotImplementedError() def __iter__(self): raise NotImplementedError() def __len__(self): raise NotImplementedError() def getField(self, key, const=True): raise NotImplementedError() def nextFieldAddress(self): raise NotImplementedError() def getFieldIndex(self, field): raise NotImplementedError() def readMoreFields(self, number): raise NotImplementedError() hachoir-core-1.3.3/hachoir_core/field/string_field.py0000644000175000017500000003475511251277274021664 0ustar haypohaypo""" String field classes: - String: Fixed length string (no prefix/no suffix) ; - CString: String which ends with nul byte ("\0") ; - UnixLine: Unix line of text, string which ends with "\n" ; - PascalString8, PascalString16, PascalString32: String prefixed with length written in a 8, 16, 32-bit integer (use parent endian). 
Constructor has optional arguments: - strip: value can be a string or True ; - charset: if set, convert string to unicode using this charset (in "replace" mode which replace all buggy characters with "."). Note: For PascalStringXX, prefixed value is the number of bytes and not of characters! """ from hachoir_core.field import FieldError, Bytes from hachoir_core.endian import LITTLE_ENDIAN, BIG_ENDIAN from hachoir_core.tools import alignValue, makePrintable from hachoir_core.i18n import guessBytesCharset, _ from hachoir_core import config from codecs import BOM_UTF16_LE, BOM_UTF16_BE, BOM_UTF32_LE, BOM_UTF32_BE # Default charset used to convert byte string to Unicode # This charset is used if no charset is specified or on conversion error FALLBACK_CHARSET = "ISO-8859-1" class GenericString(Bytes): """ Generic string class. charset have to be in CHARSET_8BIT or in UTF_CHARSET. """ VALID_FORMATS = ("C", "UnixLine", "fixed", "Pascal8", "Pascal16", "Pascal32") # 8-bit charsets CHARSET_8BIT = set(( "ASCII", # ANSI X3.4-1968 "MacRoman", "CP037", # EBCDIC 037 "CP874", # Thai "WINDOWS-1250", # Central Europe "WINDOWS-1251", # Cyrillic "WINDOWS-1252", # Latin I "WINDOWS-1253", # Greek "WINDOWS-1254", # Turkish "WINDOWS-1255", # Hebrew "WINDOWS-1256", # Arabic "WINDOWS-1257", # Baltic "WINDOWS-1258", # Vietnam "ISO-8859-1", # Latin-1 "ISO-8859-2", # Latin-2 "ISO-8859-3", # Latin-3 "ISO-8859-4", # Latin-4 "ISO-8859-5", "ISO-8859-6", "ISO-8859-7", "ISO-8859-8", "ISO-8859-9", # Latin-5 "ISO-8859-10", # Latin-6 "ISO-8859-11", # Thai "ISO-8859-13", # Latin-7 "ISO-8859-14", # Latin-8 "ISO-8859-15", # Latin-9 or ("Latin-0") "ISO-8859-16", # Latin-10 )) # UTF-xx charset familly UTF_CHARSET = { "UTF-8": (8, None), "UTF-16-LE": (16, LITTLE_ENDIAN), "UTF-32LE": (32, LITTLE_ENDIAN), "UTF-16-BE": (16, BIG_ENDIAN), "UTF-32BE": (32, BIG_ENDIAN), "UTF-16": (16, "BOM"), "UTF-32": (32, "BOM"), } # UTF-xx BOM => charset with endian UTF_BOM = { 16: {BOM_UTF16_LE: "UTF-16-LE", BOM_UTF16_BE: 
"UTF-16-BE"}, 32: {BOM_UTF32_LE: "UTF-32LE", BOM_UTF32_BE: "UTF-32BE"}, } # Suffix format: value is suffix (string) SUFFIX_FORMAT = { "C": { 8: {LITTLE_ENDIAN: "\0", BIG_ENDIAN: "\0"}, 16: {LITTLE_ENDIAN: "\0\0", BIG_ENDIAN: "\0\0"}, 32: {LITTLE_ENDIAN: "\0\0\0\0", BIG_ENDIAN: "\0\0\0\0"}, }, "UnixLine": { 8: {LITTLE_ENDIAN: "\n", BIG_ENDIAN: "\n"}, 16: {LITTLE_ENDIAN: "\n\0", BIG_ENDIAN: "\0\n"}, 32: {LITTLE_ENDIAN: "\n\0\0\0", BIG_ENDIAN: "\0\0\0\n"}, }, } # Pascal format: value is the size of the prefix in bits PASCAL_FORMATS = { "Pascal8": 1, "Pascal16": 2, "Pascal32": 4 } # Raw value: with prefix and suffix, not stripped, # and not converted to Unicode _raw_value = None def __init__(self, parent, name, format, description=None, strip=None, charset=None, nbytes=None, truncate=None): Bytes.__init__(self, parent, name, 1, description) # Is format valid? assert format in self.VALID_FORMATS # Store options self._format = format self._strip = strip self._truncate = truncate # Check charset and compute character size in bytes # (or None when it's not possible to guess character size) if not charset or charset in self.CHARSET_8BIT: self._character_size = 1 # one byte per character elif charset in self.UTF_CHARSET: self._character_size = None else: raise FieldError("Invalid charset for %s: \"%s\"" % (self.path, charset)) self._charset = charset # It is a fixed string? if nbytes is not None: assert self._format == "fixed" # Arbitrary limits, just to catch some bugs... 
            if not (1 <= nbytes <= 0xffff):
                raise FieldError("Invalid string size for %s: %s"
                    % (self.path, nbytes))
            self._content_size = nbytes   # content length in bytes
            self._size = nbytes * 8
            self._content_offset = 0
        else:
            # Format with a suffix: find the end of the string
            if self._format in self.SUFFIX_FORMAT:
                self._content_offset = 0

                # Choose the suffix
                suffix = self.suffix_str

                # Find the suffix
                length = self._parent.stream.searchBytesLength(
                    suffix, False, self.absolute_address)
                if length is None:
                    raise FieldError("Unable to find end of string %s (format %s)!"
                        % (self.path, self._format))
                if 1 < len(suffix):
                    # Fix length for little endian bug with UTF-xx charset:
                    #    u"abc" -> "a\0b\0c\0\0\0" (UTF-16-LE)
                    #    search returns length=5, whereas the real length is 6
                    length = alignValue(length, len(suffix))

                # Compute sizes
                self._content_size = length   # in bytes
                self._size = (length + len(suffix)) * 8

            # Format with a prefix: read the prefixed length in bytes
            else:
                assert self._format in self.PASCAL_FORMATS

                # Get the prefix size
                prefix_size = self.PASCAL_FORMATS[self._format]
                self._content_offset = prefix_size

                # Read the prefix and compute sizes
                value = self._parent.stream.readBits(
                    self.absolute_address, prefix_size*8, self._parent.endian)
                self._content_size = value   # in bytes
                self._size = (prefix_size + value) * 8

        # For UTF-16 and UTF-32, choose the right charset using the BOM
        if self._charset in self.UTF_CHARSET:
            # Does the charset require a BOM?
            bomsize, endian = self.UTF_CHARSET[self._charset]
            if endian == "BOM":
                # Read the BOM value
                nbytes = bomsize // 8
                bom = self._parent.stream.readBytes(self.absolute_address, nbytes)

                # Choose the right charset using the BOM
                bom_endian = self.UTF_BOM[bomsize]
                if bom not in bom_endian:
                    raise FieldError("String %s has invalid BOM (%s)!"
                        % (self.path, repr(bom)))
                self._charset = bom_endian[bom]
                self._content_size -= nbytes
                self._content_offset += nbytes

        # Compute the length in characters if possible
        if self._character_size:
            self._length = self._content_size // self._character_size
        else:
            self._length = None

    @staticmethod
    def staticSuffixStr(format, charset, endian):
        if format not in GenericString.SUFFIX_FORMAT:
            return ''
        suffix = GenericString.SUFFIX_FORMAT[format]
        if charset in GenericString.UTF_CHARSET:
            suffix_size = GenericString.UTF_CHARSET[charset][0]
            suffix = suffix[suffix_size]
        else:
            suffix = suffix[8]
        return suffix[endian]

    def _getSuffixStr(self):
        return self.staticSuffixStr(
            self._format, self._charset, self._parent.endian)
    suffix_str = property(_getSuffixStr)

    def _convertText(self, text):
        if not self._charset:
            # The charset is still unknown: guess it
            self._charset = guessBytesCharset(text, default=FALLBACK_CHARSET)

        # Try to convert to Unicode
        try:
            return unicode(text, self._charset, "strict")
        except UnicodeDecodeError, err:
            pass

        #--- Conversion error ---

        # Fix a truncated UTF-16 string like 'B\0e' (3 bytes)
        # => add the missing nul byte: 'B\0e\0' (4 bytes)
        if err.reason == "truncated data" \
        and err.end == len(text) \
        and self._charset == "UTF-16-LE":
            try:
                text = unicode(text+"\0", self._charset, "strict")
                self.warning("Fix truncated %s string: add missing nul byte" % self._charset)
                return text
            except UnicodeDecodeError, err:
                pass

        # On error, use FALLBACK_CHARSET
        self.warning(u"Unable to convert string to Unicode: %s" % err)
        return unicode(text, FALLBACK_CHARSET, "strict")

    def _guessCharset(self):
        addr = self.absolute_address + self._content_offset * 8
        bytes = self._parent.stream.readBytes(addr, self._content_size)
        return guessBytesCharset(bytes, default=FALLBACK_CHARSET)

    def createValue(self, human=True):
        # Compute the data address (in bits) and size (in bytes)
        if human:
            addr = self.absolute_address + self._content_offset * 8
            size = self._content_size
        else:
            addr = self.absolute_address
            size =
                   self._size // 8
        if size == 0:
            # Empty string
            return u""

        # Read bytes in data stream
        text = self._parent.stream.readBytes(addr, size)

        # Don't transform data?
        if not human:
            return text

        # Convert text to Unicode
        text = self._convertText(text)

        # Truncate
        if self._truncate:
            pos = text.find(self._truncate)
            if 0 <= pos:
                text = text[:pos]

        # Strip string if needed
        if self._strip:
            if isinstance(self._strip, (str, unicode)):
                text = text.strip(self._strip)
            else:
                text = text.strip()
        assert isinstance(text, unicode)
        return text

    def createDisplay(self, human=True):
        if not human:
            if self._raw_value is None:
                self._raw_value = GenericString.createValue(self, False)
            value = makePrintable(self._raw_value, "ASCII", to_unicode=True)
        elif self._charset:
            value = makePrintable(self.value, "ISO-8859-1", to_unicode=True)
        else:
            value = self.value
        if config.max_string_length < len(value):
            # Truncate string if needed
            value = "%s(...)" % value[:config.max_string_length]
        if not self._charset or not human:
            return makePrintable(value, "ASCII", quote='"', to_unicode=True)
        else:
            if value:
                return '"%s"' % value.replace('"', '\\"')
            else:
                return _("(empty)")

    def createRawDisplay(self):
        return GenericString.createDisplay(self, human=False)

    def _getLength(self):
        if self._length is None:
            self._length = len(self.value)
        return self._length
    length = property(_getLength, doc="String length in characters")

    def _getFormat(self):
        return self._format
    format = property(_getFormat, doc="String format (eg. 'C')")

    def _getCharset(self):
        if not self._charset:
            self._charset = self._guessCharset()
        return self._charset
    charset = property(_getCharset, doc="String charset (eg.
'ISO-8859-1')")

    def _getContentSize(self):
        return self._content_size
    content_size = property(_getContentSize, doc="Content size in bytes")

    def _getContentOffset(self):
        return self._content_offset
    content_offset = property(_getContentOffset, doc="Content offset in bytes")

    def getFieldType(self):
        info = self.charset
        if self._strip:
            if isinstance(self._strip, (str, unicode)):
                info += ",strip=%s" % makePrintable(self._strip, "ASCII", quote="'")
            else:
                info += ",strip=True"
        return "%s<%s>" % (Bytes.getFieldType(self), info)

def stringFactory(name, format, doc):
    class NewString(GenericString):
        __doc__ = doc

        def __init__(self, parent, name, description=None,
        strip=None, charset=None, truncate=None):
            GenericString.__init__(self, parent, name, format, description,
                strip=strip, charset=charset, truncate=truncate)
    cls = NewString
    cls.__name__ = name
    return cls

# String which ends with a nul byte ("\0")
CString = stringFactory("CString", "C",
    r"""C string: string ending with a nul byte.
See GenericString to get more information.""")

# Unix line of text: string which ends with "\n" (ASCII 0x0A)
UnixLine = stringFactory("UnixLine", "UnixLine",
    r"""Unix line: string ending with "\n" (ASCII code 10).
See GenericString to get more information.""")

# String prefixed with its length written in an 8-bit integer
PascalString8 = stringFactory("PascalString8", "Pascal8",
    r"""Pascal string: string prefixed with an 8-bit integer containing its
length (endian depends on parent endian).
See GenericString to get more information.""")

# String prefixed with its length written in a 16-bit integer (uses parent endian)
PascalString16 = stringFactory("PascalString16", "Pascal16",
    r"""Pascal string: string prefixed with a 16-bit integer containing its
length (endian depends on parent endian).
See GenericString to get more information.""")

# String prefixed with its length written in a 32-bit integer (uses parent endian)
PascalString32 = stringFactory("PascalString32", "Pascal32",
    r"""Pascal string: string prefixed with a 32-bit integer containing its
length (endian depends on parent endian).
See GenericString to get more information.""")

class String(GenericString):
    """
    String with a fixed size (size in bytes).
    See GenericString to get more information.
    """
    static_size = staticmethod(lambda *args, **kw: args[1]*8)

    def __init__(self, parent, name, nbytes, description=None,
    strip=None, charset=None, truncate=None):
        GenericString.__init__(self, parent, name, "fixed", description,
            strip=strip, charset=charset, nbytes=nbytes, truncate=truncate)
String.__name__ = "FixedString"

hachoir-core-1.3.3/hachoir_core/field/character.py0000644000175000017500000000134711251277274021136 0ustar haypohaypo"""
Character field class: an 8-bit character
"""

from hachoir_core.field import Bits
from hachoir_core.endian import BIG_ENDIAN
from hachoir_core.tools import makePrintable

class Character(Bits):
    """
    An 8-bit character using the ASCII charset for the display attribute.
    """
    static_size = 8

    def __init__(self, parent, name, description=None):
        Bits.__init__(self, parent, name, 8, description=description)

    def createValue(self):
        return chr(self._parent.stream.readBits(
            self.absolute_address, 8, BIG_ENDIAN))

    def createRawDisplay(self):
        return unicode(Bits.createValue(self))

    def createDisplay(self):
        return makePrintable(self.value, "ASCII", quote="'", to_unicode=True)

hachoir-core-1.3.3/hachoir_core/field/helper.py0000644000175000017500000000355111251277274020450 0ustar haypohaypofrom hachoir_core.field import (FieldError,
    RawBits, RawBytes,
    PaddingBits, PaddingBytes,
    NullBits, NullBytes,
    GenericString, GenericInteger)
from hachoir_core.stream import FileOutputStream

def createRawField(parent, size, name="raw[]", description=None):
    if size <= 0:
        raise FieldError("Unable to create raw field of %s bits" % size)
    if (size % 8) == 0:
        return RawBytes(parent, name, size/8, description)
    else:
        return RawBits(parent, name, size, description)

def createPaddingField(parent, nbits, name="padding[]", description=None):
    if nbits <= 0:
        raise FieldError("Unable to create padding of %s bits" % nbits)
    if (nbits % 8) == 0:
        return PaddingBytes(parent, name, nbits/8, description)
    else:
        return PaddingBits(parent, name, nbits, description)

def createNullField(parent, nbits, name="padding[]", description=None):
    if nbits <= 0:
        raise FieldError("Unable to create null padding of %s bits" % nbits)
    if (nbits % 8) == 0:
        return NullBytes(parent, name, nbits/8, description)
    else:
        return NullBits(parent, name, nbits, description)

def isString(field):
    return issubclass(field.__class__, GenericString)

def isInteger(field):
    return issubclass(field.__class__, GenericInteger)

def writeIntoFile(fieldset, filename):
    output = FileOutputStream(filename)
    fieldset.writeInto(output)

def createOrphanField(fieldset, address, field_cls, *args, **kw):
    """
    Create an orphan field at the specified address:
       field_cls(fieldset, *args, **kw)

    The field uses the fieldset properties but it isn't
    added to the field set.
    """
    save_size = fieldset._current_size
    try:
        fieldset._current_size = address
        field = field_cls(fieldset, *args, **kw)
    finally:
        fieldset._current_size = save_size
    return field

hachoir-core-1.3.3/hachoir_core/field/enum.py0000644000175000017500000000144111251277274020141 0ustar haypohaypodef Enum(field, enum, key_func=None):
    """
    Enum is an adapter to another field: it will just change its display
    attribute. It uses a dictionary to associate a value to another.

    key_func is an optional function with prototype "def func(key)->key"
    which is called to transform the key.
    """
    display = field.createDisplay
    if key_func:
        def createDisplay():
            try:
                key = key_func(field.value)
                return enum[key]
            except LookupError:
                return display()
    else:
        def createDisplay():
            try:
                return enum[field.value]
            except LookupError:
                return display()
    field.createDisplay = createDisplay
    field.getEnum = lambda: enum
    return field

hachoir-core-1.3.3/hachoir_core/field/float.py0000644000175000017500000000672311251277274020306 0ustar haypohaypofrom hachoir_core.field import Bit, Bits, FieldSet
from hachoir_core.endian import BIG_ENDIAN, LITTLE_ENDIAN
import struct

# Make sure that we use the right struct types
assert struct.calcsize("f") == 4
assert struct.calcsize("d") == 8
assert struct.unpack("d", "\xc0\0\0\0\0\0\0\0")[0] == -2.0

class FloatMantissa(Bits):
    def createValue(self):
        value = Bits.createValue(self)
        return 1 + float(value) / (2 ** self.size)

    def createRawDisplay(self):
        return unicode(Bits.createValue(self))

class FloatExponent(Bits):
    def __init__(self, parent, name, size):
        Bits.__init__(self, parent, name, size)
        self.bias = 2 ** (size-1) - 1

    def createValue(self):
        return Bits.createValue(self) - self.bias

    def createRawDisplay(self):
        return unicode(self.value + self.bias)

def floatFactory(name, format, mantissa_bits, exponent_bits, doc):
    size = 1 + mantissa_bits + exponent_bits

    class Float(FieldSet):
        static_size = size
        __doc__ = doc

        def __init__(self, parent, name, description=None):
            assert parent.endian in (BIG_ENDIAN, LITTLE_ENDIAN)
            FieldSet.__init__(self, parent, name, description, size)
            if format:
                if self._parent.endian == BIG_ENDIAN:
                    self.struct_format = ">"+format
                else:
                    self.struct_format = "<"+format
            else:
                self.struct_format = None

        def createValue(self):
            """
            Create the float value: use struct.unpack() when it's possible
            (32- and 64-bit floats), or compute it with:
               mantissa * (2.0 ** exponent)

            This computation may raise an OverflowError.
            """
            if self.struct_format:
                raw = self._parent.stream.readBytes(
                    self.absolute_address, self._size//8)
                try:
                    return struct.unpack(self.struct_format, raw)[0]
                except struct.error, err:
                    raise ValueError("[%s] conversion error: %s"
                        % (self.__class__.__name__, err))
            else:
                try:
                    value = self["mantissa"].value * (2.0 ** float(self["exponent"].value))
                    if self["negative"].value:
                        return -(value)
                    else:
                        return value
                except OverflowError:
                    raise ValueError("[%s] floating point overflow"
                        % self.__class__.__name__)

        def createFields(self):
            yield Bit(self, "negative")
            yield FloatExponent(self, "exponent", exponent_bits)
            if 64 <= mantissa_bits:
                yield Bit(self, "one")
                yield FloatMantissa(self, "mantissa", mantissa_bits-1)
            else:
                yield FloatMantissa(self, "mantissa", mantissa_bits)

    cls = Float
    cls.__name__ = name
    return cls

# 32-bit float (standard: IEEE 754/854)
Float32 = floatFactory("Float32", "f", 23, 8,
    "Floating point number: format IEEE 754 in 32 bit")

# 64-bit float (standard: IEEE 754/854)
Float64 = floatFactory("Float64", "d", 52, 11,
    "Floating point number: format IEEE 754 in 64 bit")

# 80-bit float (standard: IEEE 754/854)
Float80 = floatFactory("Float80", None, 64, 15,
    "Floating point number: format IEEE 754 in 80 bit")

hachoir-core-1.3.3/hachoir_core/field/static_field_set.py0000644000175000017500000000350111272654472022503 0ustar haypohaypofrom hachoir_core.field import FieldSet, ParserError

class StaticFieldSet(FieldSet):
    """
    Static field set: the format class attribute is a tuple of all fields,
    in a syntax like:
       format = (
(TYPE1, ARG1, ARG2, ...), (TYPE2, ARG1, ARG2, ..., {KEY1=VALUE1, ...}), ... ) Types with dynamic size are forbidden, eg. CString, PascalString8, etc. """ format = None # You have to redefine this class variable _class = None def __new__(cls, *args, **kw): assert cls.format is not None, "Class attribute 'format' is not set" if cls._class is not cls.__name__: cls._class = cls.__name__ cls.static_size = cls._computeStaticSize() return object.__new__(cls) @staticmethod def _computeItemSize(item): item_class = item[0] if item_class.static_size is None: raise ParserError("Unable to get static size of field type: %s" % item_class.__name__) if callable(item_class.static_size): if isinstance(item[-1], dict): return item_class.static_size(*item[1:-1], **item[-1]) else: return item_class.static_size(*item[1:]) else: assert isinstance(item_class.static_size, (int, long)) return item_class.static_size def createFields(self): for item in self.format: if isinstance(item[-1], dict): yield item[0](self, *item[1:-1], **item[-1]) else: yield item[0](self, *item[1:]) @classmethod def _computeStaticSize(cls, *args): return sum(cls._computeItemSize(item) for item in cls.format) # Initial value of static_size, it changes when first instance # is created (see __new__) static_size = _computeStaticSize hachoir-core-1.3.3/hachoir_core/field/field.py0000644000175000017500000002056411251277274020267 0ustar haypohaypo""" Parent of all (field) classes in Hachoir: Field. """ from hachoir_core.compatibility import reversed from hachoir_core.stream import InputFieldStream from hachoir_core.error import HachoirError, HACHOIR_ERRORS from hachoir_core.log import Logger from hachoir_core.i18n import _ from hachoir_core.tools import makePrintable from weakref import ref as weakref_ref class FieldError(HachoirError): """ Error raised by a L{Field}. 
    @see: L{HachoirError}
    """
    pass

def joinPath(path, name):
    if path != "/":
        return "/".join((path, name))
    else:
        return "/%s" % name

class MissingField(KeyError, FieldError):
    def __init__(self, field, key):
        KeyError.__init__(self)
        self.field = field
        self.key = key

    def __str__(self):
        return 'Can\'t get field "%s" from %s' % (self.key, self.field.path)

    def __unicode__(self):
        return u'Can\'t get field "%s" from %s' % (self.key, self.field.path)

class Field(Logger):
    # static_size can have three different values: None (no static size), an
    # integer (number of bits), or a function which returns an integer.
    #
    # This function receives exactly the same arguments as the constructor,
    # except the first one (parent). Example of function:
    #    static_size = staticmethod(lambda *args, **kw: args[1])
    static_size = None

    # Indicate if this field contains other fields (is a field set) or not
    is_field_set = False

    def __init__(self, parent, name, size=None, description=None):
        """
        Set default class attributes, and set the right address if no
        address is given.

        @param parent: Parent field of this field
        @type parent: L{Field}|None
        @param name: Name of the field; it has to be unique in its parent. If
            it ends with "[]", the end will be replaced with "[new_id]" (eg.
            "raw[]" becomes "raw[0]", the next will be "raw[1]", then
            "raw[2]", etc.)
@type name: str @param size: Size of the field in bit (can be None, so it will be computed later) @type size: int|None @param address: Address in bit relative to the parent absolute address @type address: int|None @param description: Optional string description @type description: str|None """ assert issubclass(parent.__class__, Field) assert (size is None) or (0 <= size) self._parent = parent self._name = name self._address = parent.nextFieldAddress() self._size = size self._description = description def _logger(self): return self.path def createDescription(self): return "" def _getDescription(self): if self._description is None: try: self._description = self.createDescription() if isinstance(self._description, str): self._description = makePrintable( self._description, "ISO-8859-1", to_unicode=True) except HACHOIR_ERRORS, err: self.error("Error getting description: " + unicode(err)) self._description = "" return self._description description = property(_getDescription, doc="Description of the field (string)") def __str__(self): return self.display def __unicode__(self): return self.display def __repr__(self): return "<%s path=%r, address=%s, size=%s>" % ( self.__class__.__name__, self.path, self._address, self._size) def hasValue(self): return self._getValue() is not None def createValue(self): raise NotImplementedError() def _getValue(self): try: value = self.createValue() except HACHOIR_ERRORS, err: self.error(_("Unable to create value: %s") % unicode(err)) value = None self._getValue = lambda: value return value value = property(lambda self: self._getValue(), doc="Value of field") def _getParent(self): return self._parent parent = property(_getParent, doc="Parent of this field") def createDisplay(self): return unicode(self.value) def _getDisplay(self): if not hasattr(self, "_Field__display"): try: self.__display = self.createDisplay() except HACHOIR_ERRORS, err: self.error("Unable to create display: %s" % err) self.__display = u"" return self.__display display 
= property(lambda self: self._getDisplay(), doc="Short (unicode) string which represents field content") def createRawDisplay(self): value = self.value if isinstance(value, str): return makePrintable(value, "ASCII", to_unicode=True) else: return unicode(value) def _getRawDisplay(self): if not hasattr(self, "_Field__raw_display"): try: self.__raw_display = self.createRawDisplay() except HACHOIR_ERRORS, err: self.error("Unable to create raw display: %s" % err) self.__raw_display = u"" return self.__raw_display raw_display = property(lambda self: self._getRawDisplay(), doc="(Unicode) string which represents raw field content") def _getName(self): return self._name name = property(_getName, doc="Field name (unique in its parent field set list)") def _getIndex(self): if not self._parent: return None return self._parent.getFieldIndex(self) index = property(_getIndex) def _getPath(self): if not self._parent: return '/' names = [] field = self while field: names.append(field._name) field = field._parent names[-1] = '' return '/'.join(reversed(names)) path = property(_getPath, doc="Full path of the field starting at root field") def _getAddress(self): return self._address address = property(_getAddress, doc="Relative address in bit to parent address") def _getAbsoluteAddress(self): address = self._address current = self._parent while current: address += current._address current = current._parent return address absolute_address = property(_getAbsoluteAddress, doc="Absolute address (from stream beginning) in bit") def _getSize(self): return self._size size = property(_getSize, doc="Content size in bit") def _getField(self, name, const): if name.strip("."): return None field = self for index in xrange(1, len(name)): field = field._parent if field is None: break return field def getField(self, key, const=True): if key: if key[0] == "/": if self._parent: current = self._parent.root else: current = self if len(key) == 1: return current key = key[1:] else: current = self for part 
in key.split("/"): field = current._getField(part, const) if field is None: raise MissingField(current, part) current = field return current raise KeyError("Key must not be an empty string!") def __getitem__(self, key): return self.getField(key, False) def __contains__(self, key): try: return self.getField(key, False) is not None except FieldError: return False def _createInputStream(self, **args): assert self._parent return InputFieldStream(self, **args) def getSubIStream(self): if hasattr(self, "_sub_istream"): stream = self._sub_istream() else: stream = None if stream is None: stream = self._createInputStream() self._sub_istream = weakref_ref(stream) return stream def setSubIStream(self, createInputStream): cis = self._createInputStream self._createInputStream = lambda **args: createInputStream(cis, **args) def __nonzero__(self): """ Method called by code like "if field: (...)". Always returns True """ return True def getFieldType(self): return self.__class__.__name__ hachoir-core-1.3.3/hachoir_core/field/seekable_field_set.py0000644000175000017500000001374211325704631022767 0ustar haypohaypofrom hachoir_core.field import Field, BasicFieldSet, FakeArray, MissingField, ParserError from hachoir_core.tools import makeUnicode from hachoir_core.error import HACHOIR_ERRORS from itertools import repeat import hachoir_core.config as config class RootSeekableFieldSet(BasicFieldSet): def __init__(self, parent, name, stream, description, size): BasicFieldSet.__init__(self, parent, name, stream, description, size) self._generator = self.createFields() self._offset = 0 self._current_size = 0 if size: self._current_max_size = size else: self._current_max_size = 0 self._field_dict = {} self._field_array = [] def _feedOne(self): assert self._generator field = self._generator.next() self._addField(field) return field def array(self, key): return FakeArray(self, key) def getFieldByAddress(self, address, feed=True): for field in self._field_array: if field.address <= address < 
field.address + field.size: return field for field in self._readFields(): if field.address <= address < field.address + field.size: return field return None def _stopFeed(self): self._size = self._current_max_size self._generator = None done = property(lambda self: not bool(self._generator)) def _getSize(self): if self._size is None: self._feedAll() return self._size size = property(_getSize) def _getField(self, key, const): field = Field._getField(self, key, const) if field is not None: return field if key in self._field_dict: return self._field_dict[key] if self._generator and not const: try: while True: field = self._feedOne() if field.name == key: return field except StopIteration: self._stopFeed() except HACHOIR_ERRORS, err: self.error("Error: %s" % makeUnicode(err)) self._stopFeed() return None def getField(self, key, const=True): if isinstance(key, (int, long)): if key < 0: raise KeyError("Key must be positive!") if not const: self.readFirstFields(key+1) if len(self._field_array) <= key: raise MissingField(self, key) return self._field_array[key] return Field.getField(self, key, const) def _addField(self, field): if field._name.endswith("[]"): self.setUniqueFieldName(field) if config.debug: self.info("[+] DBG: _addField(%s)" % field.name) if field._address != self._offset: self.warning("Set field %s address to %s (was %s)" % ( field.path, self._offset//8, field._address//8)) field._address = self._offset assert field.name not in self._field_dict self._checkFieldSize(field) self._field_dict[field.name] = field self._field_array.append(field) self._current_size += field.size self._offset += field.size self._current_max_size = max(self._current_max_size, field.address + field.size) def _checkAddress(self, address): if self._size is not None: max_addr = self._size else: # FIXME: Use parent size max_addr = self.stream.size return address < max_addr def _checkFieldSize(self, field): size = field.size addr = field.address if not self._checkAddress(addr+size-1): 
raise ParserError("Unable to add %s: field is too large" % field.name) def seekBit(self, address, relative=True): if not relative: address -= self.absolute_address if address < 0: raise ParserError("Seek below field set start (%s.%s)" % divmod(address, 8)) if not self._checkAddress(address): raise ParserError("Seek above field set end (%s.%s)" % divmod(address, 8)) self._offset = address return None def seekByte(self, address, relative=True): return self.seekBit(address*8, relative) def readMoreFields(self, number): return self._readMoreFields(xrange(number)) def _feedAll(self): return self._readMoreFields(repeat(1)) def _readFields(self): while True: added = self._readMoreFields(xrange(1)) if not added: break yield self._field_array[-1] def _readMoreFields(self, index_generator): added = 0 if self._generator: try: for index in index_generator: self._feedOne() added += 1 except StopIteration: self._stopFeed() except HACHOIR_ERRORS, err: self.error("Error: %s" % makeUnicode(err)) self._stopFeed() return added current_length = property(lambda self: len(self._field_array)) current_size = property(lambda self: self._offset) def __iter__(self): for field in self._field_array: yield field if self._generator: try: while True: yield self._feedOne() except StopIteration: self._stopFeed() raise StopIteration def __len__(self): if self._generator: self._feedAll() return len(self._field_array) def nextFieldAddress(self): return self._offset def getFieldIndex(self, field): return self._field_array.index(field) class SeekableFieldSet(RootSeekableFieldSet): def __init__(self, parent, name, description=None, size=None): assert issubclass(parent.__class__, BasicFieldSet) RootSeekableFieldSet.__init__(self, parent, name, parent.stream, description, size) hachoir-core-1.3.3/hachoir_core/field/__init__.py0000644000175000017500000000435011251277274020736 0ustar haypohaypo# Field classes from hachoir_core.field.field import Field, FieldError, MissingField, joinPath from 
hachoir_core.field.bit_field import Bit, Bits, RawBits from hachoir_core.field.byte_field import Bytes, RawBytes from hachoir_core.field.sub_file import SubFile, CompressedField from hachoir_core.field.character import Character from hachoir_core.field.integer import ( Int8, Int16, Int24, Int32, Int64, UInt8, UInt16, UInt24, UInt32, UInt64, GenericInteger) from hachoir_core.field.enum import Enum from hachoir_core.field.string_field import (GenericString, String, CString, UnixLine, PascalString8, PascalString16, PascalString32) from hachoir_core.field.padding import (PaddingBits, PaddingBytes, NullBits, NullBytes) # Functions from hachoir_core.field.helper import (isString, isInteger, createPaddingField, createNullField, createRawField, writeIntoFile, createOrphanField) # FieldSet classes from hachoir_core.field.fake_array import FakeArray from hachoir_core.field.basic_field_set import (BasicFieldSet, ParserError, MatchError) from hachoir_core.field.generic_field_set import GenericFieldSet from hachoir_core.field.seekable_field_set import SeekableFieldSet, RootSeekableFieldSet from hachoir_core.field.field_set import FieldSet from hachoir_core.field.static_field_set import StaticFieldSet from hachoir_core.field.parser import Parser from hachoir_core.field.vector import GenericVector, UserVector # Complex types from hachoir_core.field.float import Float32, Float64, Float80 from hachoir_core.field.timestamp import (GenericTimestamp, TimestampUnix32, TimestampUnix64, TimestampMac32, TimestampUUID60, TimestampWin64, DateTimeMSDOS32, TimeDateMSDOS32, TimedeltaWin64) # Special Field classes from hachoir_core.field.link import Link, Fragment available_types = ( Bit, Bits, RawBits, Bytes, RawBytes, SubFile, Character, Int8, Int16, Int24, Int32, Int64, UInt8, UInt16, UInt24, UInt32, UInt64, String, CString, UnixLine, PascalString8, PascalString16, PascalString32, Float32, Float64, PaddingBits, PaddingBytes, NullBits, NullBytes, TimestampUnix32, TimestampMac32, 
    TimestampWin64, DateTimeMSDOS32, TimeDateMSDOS32,
#    GenericInteger, GenericString,
)

hachoir-core-1.3.3/hachoir_core/field/parser.py0000644000175000017500000000262011251277274020471 0ustar haypohaypofrom hachoir_core.endian import BIG_ENDIAN, LITTLE_ENDIAN
from hachoir_core.field import GenericFieldSet
from hachoir_core.log import Logger
import hachoir_core.config as config

class Parser(GenericFieldSet):
    """
    A parser is the root of all other fields. It creates the first level of
    fields and has special attributes and methods:
    - endian: Byte order (L{BIG_ENDIAN} or L{LITTLE_ENDIAN}) of input data ;
    - stream: Data input stream (set in L{__init__()}) ;
    - size: Field set size will be the size of the input stream.
    """

    def __init__(self, stream, description=None):
        """
        Parser constructor

        @param stream: Data input stream (see L{InputStream})
        @param description: (optional) String description
        """
        # Check arguments
        assert hasattr(self, "endian") \
            and self.endian in (BIG_ENDIAN, LITTLE_ENDIAN)

        # Call parent constructor
        GenericFieldSet.__init__(self, None, "root", stream,
            description, stream.askSize(self))

    def _logger(self):
        return Logger._logger(self)

    def _setSize(self, size):
        self._truncate(size)
        self.raiseEvent("field-resized", self)
    size = property(lambda self: self._size, doc="Size in bits")
    path = property(lambda self: "/")
    # dummy definition to prevent hachoir-core from depending on hachoir-parser
    autofix = property(lambda self: config.autofix)

hachoir-core-1.3.3/hachoir_core/field/bit_field.py0000644000175000017500000000336111251277274021121 0ustar haypohaypo"""
Bit sized classes:
- Bit: a single bit, value is False or True ;
- Bits: an integer with a size in bits ;
- RawBits: unknown content with a size in bits.
"""

from hachoir_core.field import Field
from hachoir_core.i18n import _
from hachoir_core import config

class RawBits(Field):
    """
    Unknown content with a size in bits.
""" static_size = staticmethod(lambda *args, **kw: args[1]) def __init__(self, parent, name, size, description=None): """ Constructor: see L{Field.__init__} for parameter description """ Field.__init__(self, parent, name, size, description) def hasValue(self): return True def createValue(self): return self._parent.stream.readBits( self.absolute_address, self._size, self._parent.endian) def createDisplay(self): if self._size < config.max_bit_length: return unicode(self.value) else: return _("<%s size=%u>" % (self.__class__.__name__, self._size)) createRawDisplay = createDisplay class Bits(RawBits): """ Positive integer with a size in bits @see: L{Bit} @see: L{RawBits} """ pass class Bit(RawBits): """ Single bit: value can be False or True, and size is exactly one bit. @see: L{Bits} """ static_size = 1 def __init__(self, parent, name, description=None): """ Constructor: see L{Field.__init__} for parameter description """ RawBits.__init__(self, parent, name, 1, description=description) def createValue(self): return 1 == self._parent.stream.readBits( self.absolute_address, 1, self._parent.endian) def createRawDisplay(self): return unicode(int(self.value)) hachoir-core-1.3.3/hachoir_core/field/vector.py0000644000175000017500000000251411251277274020501 0ustar haypohaypofrom hachoir_core.field import Field, FieldSet, ParserError class GenericVector(FieldSet): def __init__(self, parent, name, nb_items, item_class, item_name="item", description=None): # Sanity checks assert issubclass(item_class, Field) assert isinstance(item_class.static_size, (int, long)) if not(0 < nb_items): raise ParserError('Unable to create empty vector "%s" in %s' \ % (name, parent.path)) size = nb_items * item_class.static_size self.__nb_items = nb_items self._item_class = item_class self._item_name = item_name FieldSet.__init__(self, parent, name, description, size=size) def __len__(self): return self.__nb_items def createFields(self): name = self._item_name + "[]" parser = self._item_class for 
index in xrange(len(self)): yield parser(self, name) class UserVector(GenericVector): """ To implement: - item_name: name of a field without [] (eg. "color" becomes "color[0]"), default value is "item" - item_class: class of an item """ item_class = None item_name = "item" def __init__(self, parent, name, nb_items, description=None): GenericVector.__init__(self, parent, name, nb_items, self.item_class, self.item_name, description) hachoir-core-1.3.3/hachoir_core/field/timestamp.py0000644000175000017500000000556511251277274021203 0ustar haypohaypofrom hachoir_core.tools import (humanDatetime, humanDuration, timestampUNIX, timestampMac32, timestampUUID60, timestampWin64, durationWin64) from hachoir_core.field import Bits, FieldSet from datetime import datetime class GenericTimestamp(Bits): def __init__(self, parent, name, size, description=None): Bits.__init__(self, parent, name, size, description) def createDisplay(self): return humanDatetime(self.value) def createRawDisplay(self): value = Bits.createValue(self) return unicode(value) def __nonzero__(self): return Bits.createValue(self) != 0 def timestampFactory(cls_name, handler, size): class Timestamp(GenericTimestamp): def __init__(self, parent, name, description=None): GenericTimestamp.__init__(self, parent, name, size, description) def createValue(self): value = Bits.createValue(self) return handler(value) cls = Timestamp cls.__name__ = cls_name return cls TimestampUnix32 = timestampFactory("TimestampUnix32", timestampUNIX, 32) TimestampUnix64 = timestampFactory("TimestampUnix64", timestampUNIX, 64) TimestampMac32 = timestampFactory("TimestampMac32", timestampMac32, 32) TimestampUUID60 = timestampFactory("TimestampUUID60", timestampUUID60, 60) TimestampWin64 = timestampFactory("TimestampWin64", timestampWin64, 64) class TimeDateMSDOS32(FieldSet): """ 32-bit MS-DOS timestamp (16-bit time, 16-bit date) """ static_size = 32 def createFields(self): # TODO: Create type "MSDOS_Second" : value*2 yield Bits(self, 
"second", 5, "Second/2") yield Bits(self, "minute", 6) yield Bits(self, "hour", 5) yield Bits(self, "day", 5) yield Bits(self, "month", 4) # TODO: Create type "MSDOS_Year" : value+1980 yield Bits(self, "year", 7, "Number of year after 1980") def createValue(self): return datetime( 1980+self["year"].value, self["month"].value, self["day"].value, self["hour"].value, self["minute"].value, 2*self["second"].value) def createDisplay(self): return humanDatetime(self.value) class DateTimeMSDOS32(TimeDateMSDOS32): """ 32-bit MS-DOS timestamp (16-bit date, 16-bit time) """ def createFields(self): yield Bits(self, "day", 5) yield Bits(self, "month", 4) yield Bits(self, "year", 7, "Number of year after 1980") yield Bits(self, "second", 5, "Second/2") yield Bits(self, "minute", 6) yield Bits(self, "hour", 5) class TimedeltaWin64(GenericTimestamp): def __init__(self, parent, name, description=None): GenericTimestamp.__init__(self, parent, name, 64, description) def createDisplay(self): return humanDuration(self.value) def createValue(self): value = Bits.createValue(self) return durationWin64(value) hachoir-core-1.3.3/hachoir_core/field/padding.py0000644000175000017500000001057711251277274020615 0ustar haypohaypofrom hachoir_core.field import Bits, Bytes from hachoir_core.tools import makePrintable, humanFilesize from hachoir_core import config class PaddingBits(Bits): """ Padding bits used, for example, to align address (of next field). See also NullBits and PaddingBytes types. Arguments: * nbits: Size of the field in bits Optional arguments: * pattern (int): Content pattern, eg. 
0 if all bits are set to 0 """ static_size = staticmethod(lambda *args, **kw: args[1]) MAX_SIZE = 128 def __init__(self, parent, name, nbits, description="Padding", pattern=None): Bits.__init__(self, parent, name, nbits, description) self.pattern = pattern self._display_pattern = self.checkPattern() def checkPattern(self): if not(config.check_padding_pattern): return False if self.pattern != 0: return False if self.MAX_SIZE < self._size: value = self._parent.stream.readBits( self.absolute_address, self.MAX_SIZE, self._parent.endian) else: value = self.value if value != 0: self.warning("padding contents doesn't look normal (invalid pattern)") return False if self.MAX_SIZE < self._size: self.info("only check first %u bits" % self.MAX_SIZE) return True def createDisplay(self): if self._display_pattern: return u"<padding pattern=%u>" % self.pattern else: return Bits.createDisplay(self) class PaddingBytes(Bytes): """ Padding bytes used, for example, to align address (of next field). See also NullBytes and PaddingBits types. Arguments: * nbytes: Size of the field in bytes Optional arguments: * pattern (str): Content pattern, eg. 
"\0" for nul bytes """ static_size = staticmethod(lambda *args, **kw: args[1]*8) MAX_SIZE = 4096 def __init__(self, parent, name, nbytes, description="Padding", pattern=None): """ pattern is None or repeated string """ assert (pattern is None) or (isinstance(pattern, str)) Bytes.__init__(self, parent, name, nbytes, description) self.pattern = pattern self._display_pattern = self.checkPattern() def checkPattern(self): if not(config.check_padding_pattern): return False if self.pattern is None: return False if self.MAX_SIZE < self._size/8: self.info("only check first %s of padding" % humanFilesize(self.MAX_SIZE)) content = self._parent.stream.readBytes( self.absolute_address, self.MAX_SIZE) else: content = self.value index = 0 pattern_len = len(self.pattern) while index < len(content): if content[index:index+pattern_len] != self.pattern: self.warning( "padding contents doesn't look normal" " (invalid pattern at byte %u)!" % index) return False index += pattern_len return True def createDisplay(self): if self._display_pattern: return u"<padding pattern=%s>" % makePrintable(self.pattern, "ASCII", quote="'") else: return Bytes.createDisplay(self) def createRawDisplay(self): return Bytes.createDisplay(self) class NullBits(PaddingBits): """ Null padding bits used, for example, to align address (of next field). See also PaddingBits and NullBytes types. Arguments: * nbits: Size of the field in bits """ def __init__(self, parent, name, nbits, description=None): PaddingBits.__init__(self, parent, name, nbits, description, pattern=0) def createDisplay(self): if self._display_pattern: return "<null>" else: return Bits.createDisplay(self) class NullBytes(PaddingBytes): """ Null padding bytes used, for example, to align address (of next field). See also PaddingBytes and NullBits types. 
Arguments: * nbytes: Size of the field in bytes """ def __init__(self, parent, name, nbytes, description=None): PaddingBytes.__init__(self, parent, name, nbytes, description, pattern="\0") def createDisplay(self): if self._display_pattern: return "<null>" else: return Bytes.createDisplay(self) hachoir-core-1.3.3/hachoir_core/field/link.py0000644000175000017500000000615011251277274020134 0ustar haypohaypofrom hachoir_core.field import Field, FieldSet, ParserError, Bytes, MissingField from hachoir_core.stream import FragmentedStream class Link(Field): def __init__(self, parent, name, *args, **kw): Field.__init__(self, parent, name, 0, *args, **kw) def hasValue(self): return True def createValue(self): return self._parent[self.display] def createDisplay(self): value = self.value if value is None: return "<%s>" % MissingField.__name__ return value.path def _getField(self, name, const): target = self.value assert self != target return target._getField(name, const) class Fragments: def __init__(self, first): self.first = first def __iter__(self): fragment = self.first while fragment is not None: data = fragment.getData() yield data and data.size fragment = fragment.next class Fragment(FieldSet): _first = None def __init__(self, *args, **kw): FieldSet.__init__(self, *args, **kw) self._field_generator = self._createFields(self._field_generator) if self.__class__.createFields == Fragment.createFields: self._getData = lambda: self def getData(self): try: return self._getData() except MissingField, e: self.error(str(e)) return None def setLinks(self, first, next=None): self._first = first or self self._next = next self._feedLinks = lambda: self return self def _feedLinks(self): while self._first is None and self.readMoreFields(1): pass if self._first is None: raise ParserError("first is None") return self first = property(lambda self: self._feedLinks()._first) def _getNext(self): next = self._feedLinks()._next if callable(next): self._next = next = next() return next next = 
property(_getNext) def _createInputStream(self, **args): first = self.first if first is self and hasattr(first, "_getData"): return FragmentedStream(first, packets=Fragments(first), **args) return FieldSet._createInputStream(self, **args) def _createFields(self, field_generator): if self._first is None: for field in field_generator: if self._first is not None: break yield field else: raise ParserError("Fragment.setLinks not called") else: field = None if self._first is not self: link = Link(self, "first", None) link._getValue = lambda: self._first yield link if self._next: link = Link(self, "next", None) link.createValue = self._getNext yield link if field: yield field for field in field_generator: yield field def createFields(self): if self._size is None: self._size = self._getSize() yield Bytes(self, "data", self._size/8) hachoir-core-1.3.3/hachoir_core/field/new_seekable_field_set.py0000644000175000017500000000574711325444756023657 0ustar haypohaypofrom hachoir_core.field import BasicFieldSet, GenericFieldSet, ParserError, createRawField from hachoir_core.error import HACHOIR_ERRORS # getgaps(int, int, [listof (int, int)]) -> generator of (int, int) # Gets all the gaps not covered by a block in `blocks` from `start` for `length` units. 
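The two comment lines above describe the gap-finding helper defined next. As a standalone illustration (a Python 3 sketch using the hypothetical name `find_gaps`, not the library's own Python 2 code), the same behaviour can be written as:

```python
# Standalone sketch of the gap-finding helper: given a span
# [start, start+length) and a list of (offset, size) blocks, yield every
# (offset, size) region not covered by any block.

def find_gaps(start, length, blocks):
    """Yield (offset, size) pairs for the uncovered regions."""
    end = start + length
    # Sort a copy so the caller's list is left untouched, as the
    # original comment ("avoid mutating the original") requires.
    for block_start, block_size in sorted(blocks):
        if block_start > start:
            yield (start, block_start - start)
        start = max(start, block_start + block_size)
    if start < end:
        yield (start, end - start)


if __name__ == "__main__":
    blocks = [(15, 3), (6, 2), (6, 2), (1, 2), (2, 3), (11, 2), (9, 5)]
    print(list(find_gaps(0, 20, blocks)))  # [(0, 1), (5, 1), (8, 1), (14, 1), (18, 2)]
```

The output matches the doctest of `getgaps` below; overlapping and duplicate blocks are handled because `start` only ever moves forward.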
def getgaps(start, length, blocks): ''' Example: >>> list(getgaps(0, 20, [(15,3), (6,2), (6,2), (1,2), (2,3), (11,2), (9,5)])) [(0, 1), (5, 1), (8, 1), (14, 1), (18, 2)] ''' # done this way to avoid mutating the original blocks = sorted(blocks, key=lambda b: b[0]) end = start+length for s, l in blocks: if s > start: yield (start, s-start) start = s if s+l > start: start = s+l if start < end: yield (start, end-start) class NewRootSeekableFieldSet(GenericFieldSet): def seekBit(self, address, relative=True): if not relative: address -= self.absolute_address if address < 0: raise ParserError("Seek below field set start (%s.%s)" % divmod(address, 8)) self._current_size = address return None def seekByte(self, address, relative=True): return self.seekBit(address*8, relative) def _fixLastField(self): """ Try to fix last field when we know current field set size. Returns new added field if any, or None. """ assert self._size is not None # Stop parser message = ["stop parser"] self._field_generator = None # If last field is too big, delete it while self._size < self._current_size: field = self._deleteField(len(self._fields)-1) message.append("delete field %s" % field.path) assert self._current_size <= self._size blocks = [(x.absolute_address, x.size) for x in self._fields] fields = [] for start, length in getgaps(self.absolute_address, self._size, blocks): self.seekBit(start, relative=False) field = createRawField(self, length, "unparsed[]") self.setUniqueFieldName(field) self._fields.append(field.name, field) fields.append(field) message.append("found unparsed segment: start %s, length %s" % (start, length)) self.seekBit(self._size, relative=False) message = ", ".join(message) if fields: self.warning("[Autofix] Fix parser error: " + message) return fields def _stopFeeding(self): new_field = None if self._size is None: if self._parent: self._size = self._current_size new_field = self._fixLastField() self._field_generator = None return new_field class 
NewSeekableFieldSet(NewRootSeekableFieldSet): def __init__(self, parent, name, description=None, size=None): assert issubclass(parent.__class__, BasicFieldSet) NewRootSeekableFieldSet.__init__(self, parent, name, parent.stream, description, size) hachoir-core-1.3.3/hachoir_core/field/byte_field.py0000644000175000017500000000421611251277274021306 0ustar haypohaypo""" Very basic field: raw content with a size in byte. Use this class for unknown content. """ from hachoir_core.field import Field, FieldError from hachoir_core.tools import makePrintable from hachoir_core.bits import str2hex from hachoir_core import config MAX_LENGTH = (2**64) class RawBytes(Field): """ Byte vector of unknown content @see: L{Bytes} """ static_size = staticmethod(lambda *args, **kw: args[1]*8) def __init__(self, parent, name, length, description="Raw data"): assert issubclass(parent.__class__, Field) if not(0 < length <= MAX_LENGTH): raise FieldError("Invalid RawBytes length (%s)!" % length) Field.__init__(self, parent, name, length*8, description) self._display = None def _createDisplay(self, human): max_bytes = config.max_byte_length if type(self._getValue) is type(lambda: None): display = self.value[:max_bytes] else: if self._display is None: address = self.absolute_address length = min(self._size / 8, max_bytes) self._display = self._parent.stream.readBytes(address, length) display = self._display truncated = (8 * len(display) < self._size) if human: if truncated: display += "(...)" return makePrintable(display, "latin-1", quote='"', to_unicode=True) else: display = str2hex(display, format=r"\x%02x") if truncated: return '"%s(...)"' % display else: return '"%s"' % display def createDisplay(self): return self._createDisplay(True) def createRawDisplay(self): return self._createDisplay(False) def hasValue(self): return True def createValue(self): assert (self._size % 8) == 0 if self._display: self._display = None return self._parent.stream.readBytes( self.absolute_address, self._size / 8) 
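`RawBytes` above caps its display at `config.max_byte_length` bytes and appends "(...)" when the data is truncated. A minimal standalone sketch of that truncation logic (Python 3, hypothetical helper name; the default of 14 bytes mirrors `max_byte_length` in `config.py`):

```python
# Sketch of the RawBytes raw-display logic: render at most `max_bytes`
# bytes as \xNN escapes, quoting the result and marking truncation.

def raw_bytes_display(data, max_bytes=14):
    """Render bytes as an escaped hex string, truncated like RawBytes."""
    shown = data[:max_bytes]
    display = "".join("\\x%02x" % byte for byte in shown)
    if len(shown) < len(data):
        return '"%s(...)"' % display
    return '"%s"' % display


if __name__ == "__main__":
    print(raw_bytes_display(b"\x89PNG\r\n"))   # short data: shown in full
    print(raw_bytes_display(bytes(range(32)))) # long data: ends with (...)
```

Like the original, only the bytes actually displayed need to be read, which is what keeps the field lazy on large inputs.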
class Bytes(RawBytes): """ Byte vector: can be used for magic number or GUID/UUID for example. @see: L{RawBytes} """ pass hachoir-core-1.3.3/hachoir_core/field/field_set.py0000644000175000017500000000044011251277274021131 0ustar haypohaypofrom hachoir_core.field import BasicFieldSet, GenericFieldSet class FieldSet(GenericFieldSet): def __init__(self, parent, name, *args, **kw): assert issubclass(parent.__class__, BasicFieldSet) GenericFieldSet.__init__(self, parent, name, parent.stream, *args, **kw) hachoir-core-1.3.3/hachoir_core/field/sub_file.py0000644000175000017500000000513111251277274020765 0ustar haypohaypofrom hachoir_core.field import Bytes from hachoir_core.tools import makePrintable, humanFilesize from hachoir_core.stream import InputIOStream class SubFile(Bytes): """ File stored in another file """ def __init__(self, parent, name, length, description=None, parser=None, filename=None, mime_type=None, parser_class=None): if filename: if not isinstance(filename, unicode): filename = makePrintable(filename, "ISO-8859-1") if not description: description = 'File "%s" (%s)' % (filename, humanFilesize(length)) Bytes.__init__(self, parent, name, length, description) def createInputStream(cis, **args): tags = args.setdefault("tags",[]) if parser_class: tags.append(( "class", parser_class )) if parser is not None: tags.append(( "id", parser.PARSER_TAGS["id"] )) if mime_type: tags.append(( "mime", mime_type )) if filename: tags.append(( "filename", filename )) return cis(**args) self.setSubIStream(createInputStream) class CompressedStream: offset = 0 def __init__(self, stream, decompressor): self.stream = stream self.decompressor = decompressor(stream) self._buffer = '' def read(self, size): d = self._buffer data = [ d[:size] ] size -= len(d) if size > 0: d = self.decompressor(size) data.append(d[:size]) size -= len(d) while size > 0: n = 4096 if self.stream.size: n = min(self.stream.size - self.offset, n) if not n: break d = self.stream.read(self.offset, n)[1] 
self.offset += 8 * len(d) d = self.decompressor(size, d) data.append(d[:size]) size -= len(d) self._buffer = d[size+len(d):] return ''.join(data) def CompressedField(field, decompressor): def createInputStream(cis, source=None, **args): if field._parent: stream = cis(source=source) args.setdefault("tags", []).extend(stream.tags) else: stream = field.stream input = CompressedStream(stream, decompressor) if source is None: source = "Compressed source: '%s' (offset=%s)" % (stream.source, field.absolute_address) return InputIOStream(input, source=source, **args) field.setSubIStream(createInputStream) return field hachoir-core-1.3.3/hachoir_core/config.py0000644000175000017500000000173011251277274017360 0ustar haypohaypo""" Configuration of Hachoir """ import os # UI: display options max_string_length = 40 # Max. length in characters of GenericString.display max_byte_length = 14 # Max. length in bytes of RawBytes.display max_bit_length = 256 # Max. length in bits of RawBits.display unicode_stdout = True # Replace stdout and stderr with Unicode compatible objects # Disable it for readline or ipython # Global options debug = False # Display many informations usefull to debug verbose = False # Display more informations quiet = False # Don't display warnings # Use internationalization and localization (gettext)? if os.name == "nt": # TODO: Remove this hack and make i18n works on Windows :-) use_i18n = False else: use_i18n = True # Parser global options autofix = True # Enable Autofix? see hachoir_core.field.GenericFieldSet check_padding_pattern = True # Check padding fields pattern? hachoir-core-1.3.3/hachoir_core/dict.py0000644000175000017500000001240011251277274017032 0ustar haypohaypo""" Dictionnary classes which store values order. """ from hachoir_core.error import HachoirError from hachoir_core.i18n import _ class UniqKeyError(HachoirError): """ Error raised when a value is set whereas the key already exist in a dictionnary. 
""" pass class Dict(object): """ This class works like classic Python dict() but has an important method: __iter__() which allow to iterate into the dictionnary _values_ (and not keys like Python's dict does). """ def __init__(self, values=None): self._index = {} # key => index self._key_list = [] # index => key self._value_list = [] # index => value if values: for key, value in values: self.append(key,value) def _getValues(self): return self._value_list values = property(_getValues) def index(self, key): """ Search a value by its key and returns its index Returns None if the key doesn't exist. >>> d=Dict( (("two", "deux"), ("one", "un")) ) >>> d.index("two") 0 >>> d.index("one") 1 >>> d.index("three") is None True """ return self._index.get(key) def __getitem__(self, key): """ Get item with specified key. To get a value by it's index, use mydict.values[index] >>> d=Dict( (("two", "deux"), ("one", "un")) ) >>> d["one"] 'un' """ return self._value_list[self._index[key]] def __setitem__(self, key, value): self._value_list[self._index[key]] = value def append(self, key, value): """ Append new value """ if key in self._index: raise UniqKeyError(_("Key '%s' already exists") % key) self._index[key] = len(self._value_list) self._key_list.append(key) self._value_list.append(value) def __len__(self): return len(self._value_list) def __contains__(self, key): return key in self._index def __iter__(self): return iter(self._value_list) def iteritems(self): """ Create a generator to iterate on: (key, value). >>> d=Dict( (("two", "deux"), ("one", "un")) ) >>> for key, value in d.iteritems(): ... print "%r: %r" % (key, value) ... 
'two': 'deux' 'one': 'un' """ for index in xrange(len(self)): yield (self._key_list[index], self._value_list[index]) def itervalues(self): """ Create an iterator on values """ return iter(self._value_list) def iterkeys(self): """ Create an iterator on keys """ return iter(self._key_list) def replace(self, oldkey, newkey, new_value): """ Replace an existing value with another one >>> d=Dict( (("two", "deux"), ("one", "un")) ) >>> d.replace("one", "three", 3) >>> d {'two': 'deux', 'three': 3} You can also use the classic form: >>> d['three'] = 4 >>> d {'two': 'deux', 'three': 4} """ index = self._index[oldkey] self._value_list[index] = new_value if oldkey != newkey: del self._index[oldkey] self._index[newkey] = index self._key_list[index] = newkey def __delitem__(self, index): """ Delete item at position index. May raise IndexError. >>> d=Dict( ((6, 'six'), (9, 'neuf'), (4, 'quatre')) ) >>> del d[1] >>> d {6: 'six', 4: 'quatre'} """ if index < 0: index += len(self._value_list) if not (0 <= index < len(self._value_list)): raise IndexError(_("list assignment index out of range (%s/%s)") % (index, len(self._value_list))) del self._value_list[index] del self._key_list[index] # First loop which may alter self._index for key, item_index in self._index.iteritems(): if item_index == index: del self._index[key] break # Second loop update indexes for key, item_index in self._index.iteritems(): if index < item_index: self._index[key] -= 1 def insert(self, index, key, value): """ Insert an item at specified position index. 
>>> d=Dict( ((6, 'six'), (9, 'neuf'), (4, 'quatre')) ) >>> d.insert(1, '40', 'quarante') >>> d {6: 'six', '40': 'quarante', 9: 'neuf', 4: 'quatre'} """ if key in self: raise UniqKeyError(_("Insert error: key '%s' ready exists") % key) _index = index if index < 0: index += len(self._value_list) if not(0 <= index <= len(self._value_list)): raise IndexError(_("Insert error: index '%s' is invalid") % _index) for item_key, item_index in self._index.iteritems(): if item_index >= index: self._index[item_key] += 1 self._index[key] = index self._key_list.insert(index, key) self._value_list.insert(index, value) def __repr__(self): items = ( "%r: %r" % (key, value) for key, value in self.iteritems() ) return "{%s}" % ", ".join(items) hachoir-core-1.3.3/hachoir_core/event_handler.py0000644000175000017500000000127711251277274020737 0ustar haypohaypoclass EventHandler(object): """ Class to connect events to event handlers. """ def __init__(self): self.handlers = {} def connect(self, event_name, handler): """ Connect an event handler to an event. Append it to handlers list. """ try: self.handlers[event_name].append(handler) except KeyError: self.handlers[event_name] = [handler] def raiseEvent(self, event_name, *args): """ Raiser an event: call each handler for this event_name. """ if event_name not in self.handlers: return for handler in self.handlers[event_name]: handler(*args) hachoir-core-1.3.3/hachoir_core/text_handler.py0000644000175000017500000000355711325704642020601 0ustar haypohaypo""" Utilities used to convert a field to human classic reprentation of data. 
""" from hachoir_core.tools import ( humanDuration, humanFilesize, alignValue, durationWin64 as doDurationWin64, deprecated) from types import FunctionType, MethodType from hachoir_core.field import Field def textHandler(field, handler): assert isinstance(handler, (FunctionType, MethodType)) assert issubclass(field.__class__, Field) field.createDisplay = lambda: handler(field) return field def displayHandler(field, handler): assert isinstance(handler, (FunctionType, MethodType)) assert issubclass(field.__class__, Field) field.createDisplay = lambda: handler(field.value) return field @deprecated("Use TimedeltaWin64 field type") def durationWin64(field): """ Convert Windows 64-bit duration to string. The timestamp format is a 64-bit number: number of 100ns. See also timestampWin64(). >>> durationWin64(type("", (), dict(value=2146280000, size=64))) u'3 min 34 sec 628 ms' >>> durationWin64(type("", (), dict(value=(1 << 64)-1, size=64))) u'58494 years 88 days 5 hours' """ assert hasattr(field, "value") and hasattr(field, "size") assert field.size == 64 delta = doDurationWin64(field.value) return humanDuration(delta) def filesizeHandler(field): """ Format field value using humanFilesize() """ return displayHandler(field, humanFilesize) def hexadecimal(field): """ Convert an integer to hexadecimal in lower case. Returns unicode string. >>> hexadecimal(type("", (), dict(value=412, size=16))) u'0x019c' >>> hexadecimal(type("", (), dict(value=0, size=32))) u'0x00000000' """ assert hasattr(field, "value") and hasattr(field, "size") size = field.size padding = alignValue(size, 4) // 4 pattern = u"0x%%0%ux" % padding return pattern % field.value hachoir-core-1.3.3/hachoir_core/error.py0000644000175000017500000000247611325704660017250 0ustar haypohaypo""" Functions to display an error (error, warning or information) message. 
""" from hachoir_core.log import log from hachoir_core.tools import makePrintable import sys, traceback def getBacktrace(empty="Empty backtrace."): """ Try to get backtrace as string. Returns "Error while trying to get backtrace" on failure. """ try: info = sys.exc_info() trace = traceback.format_exception(*info) sys.exc_clear() if trace[0] != "None\n": return "".join(trace) except: # No i18n here (imagine if i18n function calls error...) return "Error while trying to get backtrace" return empty class HachoirError(Exception): """ Parent of all errors in Hachoir library """ def __init__(self, message): message_bytes = makePrintable(message, "ASCII") Exception.__init__(self, message_bytes) self.text = message def __unicode__(self): return self.text # Error classes which may be raised by Hachoir core # FIXME: Add EnvironmentError (IOError or OSError) and AssertionError? # FIXME: Remove ArithmeticError and RuntimeError? HACHOIR_ERRORS = (HachoirError, LookupError, NameError, AttributeError, TypeError, ValueError, ArithmeticError, RuntimeError) info = log.info warning = log.warning error = log.error hachoir-core-1.3.3/hachoir_core/tools.py0000644000175000017500000004154711251277274017265 0ustar haypohaypo# -*- coding: utf-8 -*- """ Various utilities. """ from hachoir_core.i18n import _, ngettext import re import stat from datetime import datetime, timedelta, MAXYEAR from warnings import warn def deprecated(comment=None): """ This is a decorator which can be used to mark functions as deprecated. It will result in a warning being emmitted when the function is used. Examples: :: @deprecated def oldfunc(): ... @deprecated("use newfunc()!") def oldfunc2(): ... 
Code from: http://code.activestate.com/recipes/391367/ """ def _deprecated(func): def newFunc(*args, **kwargs): message = "Call to deprecated function %s" % func.__name__ if comment: message += ": " + comment warn(message, category=DeprecationWarning, stacklevel=2) return func(*args, **kwargs) newFunc.__name__ = func.__name__ newFunc.__doc__ = func.__doc__ newFunc.__dict__.update(func.__dict__) return newFunc return _deprecated def paddingSize(value, align): """ Compute size of a padding field. >>> paddingSize(31, 4) 1 >>> paddingSize(32, 4) 0 >>> paddingSize(33, 4) 3 Note: (value + paddingSize(value, align)) == alignValue(value, align) """ if value % align != 0: return align - (value % align) else: return 0 def alignValue(value, align): """ Align a value to next 'align' multiple. >>> alignValue(31, 4) 32 >>> alignValue(32, 4) 32 >>> alignValue(33, 4) 36 Note: alignValue(value, align) == (value + paddingSize(value, align)) """ if value % align != 0: return value + align - (value % align) else: return value def timedelta2seconds(delta): """ Convert a datetime.timedelta() objet to a number of second (floatting point number). >>> timedelta2seconds(timedelta(seconds=2, microseconds=40000)) 2.04 >>> timedelta2seconds(timedelta(minutes=1, milliseconds=250)) 60.25 """ return delta.microseconds / 1000000.0 \ + delta.seconds + delta.days * 60*60*24 def humanDurationNanosec(nsec): """ Convert a duration in nanosecond to human natural representation. Returns an unicode string. >>> humanDurationNanosec(60417893) u'60.42 ms' """ # Nano second if nsec < 1000: return u"%u nsec" % nsec # Micro seconds usec, nsec = divmod(nsec, 1000) if usec < 1000: return u"%.2f usec" % (usec+float(nsec)/1000) # Milli seconds msec, usec = divmod(usec, 1000) if msec < 1000: return u"%.2f ms" % (msec + float(usec)/1000) return humanDuration(msec) def humanDuration(delta): """ Convert a duration in millisecond to human natural representation. Returns an unicode string. 
>>> humanDuration(0) u'0 ms' >>> humanDuration(213) u'213 ms' >>> humanDuration(4213) u'4 sec 213 ms' >>> humanDuration(6402309) u'1 hour 46 min 42 sec' """ if not isinstance(delta, timedelta): delta = timedelta(microseconds=delta*1000) # Milliseconds text = [] if 1000 <= delta.microseconds: text.append(u"%u ms" % (delta.microseconds//1000)) # Seconds minutes, seconds = divmod(delta.seconds, 60) hours, minutes = divmod(minutes, 60) if seconds: text.append(u"%u sec" % seconds) if minutes: text.append(u"%u min" % minutes) if hours: text.append(ngettext("%u hour", "%u hours", hours) % hours) # Days years, days = divmod(delta.days, 365) if days: text.append(ngettext("%u day", "%u days", days) % days) if years: text.append(ngettext("%u year", "%u years", years) % years) if 3 < len(text): text = text[-3:] elif not text: return u"0 ms" return u" ".join(reversed(text)) def humanFilesize(size): """ Convert a file size in byte to human natural representation. It uses the values: 1 KB is 1024 bytes, 1 MB is 1024 KB, etc. The result is an unicode string. >>> humanFilesize(1) u'1 byte' >>> humanFilesize(790) u'790 bytes' >>> humanFilesize(256960) u'250.9 KB' """ if size < 10000: return ngettext("%u byte", "%u bytes", size) % size units = [_("KB"), _("MB"), _("GB"), _("TB")] size = float(size) divisor = 1024 for unit in units: size = size / divisor if size < divisor: return "%.1f %s" % (size, unit) return "%u %s" % (size, unit) def humanBitSize(size): """ Convert a size in bit to human classic representation. It uses the values: 1 Kbit is 1000 bits, 1 Mbit is 1000 Kbit, etc. The result is an unicode string. 
>>> humanBitSize(1) u'1 bit' >>> humanBitSize(790) u'790 bits' >>> humanBitSize(256960) u'257.0 Kbit' """ divisor = 1000 if size < divisor: return ngettext("%u bit", "%u bits", size) % size units = [u"Kbit", u"Mbit", u"Gbit", u"Tbit"] size = float(size) for unit in units: size = size / divisor if size < divisor: return "%.1f %s" % (size, unit) return u"%u %s" % (size, unit) def humanBitRate(size): """ Convert a bit rate to human classic representation. It uses humanBitSize() to convert size into human reprensation. The result is an unicode string. >>> humanBitRate(790) u'790 bits/sec' >>> humanBitRate(256960) u'257.0 Kbit/sec' """ return "".join((humanBitSize(size), "/sec")) def humanFrequency(hertz): """ Convert a frequency in hertz to human classic representation. It uses the values: 1 KHz is 1000 Hz, 1 MHz is 1000 KMhz, etc. The result is an unicode string. >>> humanFrequency(790) u'790 Hz' >>> humanFrequency(629469) u'629.5 kHz' """ divisor = 1000 if hertz < divisor: return u"%u Hz" % hertz units = [u"kHz", u"MHz", u"GHz", u"THz"] hertz = float(hertz) for unit in units: hertz = hertz / divisor if hertz < divisor: return u"%.1f %s" % (hertz, unit) return u"%s %s" % (hertz, unit) regex_control_code = re.compile(r"([\x00-\x1f\x7f])") controlchars = tuple({ # Don't use "\0", because "\0"+"0"+"1" = "\001" = "\1" (1 character) # Same rease to not use octal syntax ("\1") ord("\n"): r"\n", ord("\r"): r"\r", ord("\t"): r"\t", ord("\a"): r"\a", ord("\b"): r"\b", }.get(code, '\\x%02x' % code) for code in xrange(128) ) def makePrintable(data, charset, quote=None, to_unicode=False, smart=True): r""" Prepare a string to make it printable in the specified charset. It escapes control characters. Characters with code bigger than 127 are escaped if data type is 'str' or if charset is "ASCII". 
Examples with Unicode: >>> aged = unicode("âgé", "UTF-8") >>> repr(aged) # text type is 'unicode' "u'\\xe2g\\xe9'" >>> makePrintable("abc\0", "UTF-8") 'abc\\0' >>> makePrintable(aged, "latin1") '\xe2g\xe9' >>> makePrintable(aged, "latin1", quote='"') '"\xe2g\xe9"' Examples with string encoded in latin1: >>> aged_latin = unicode("âgé", "UTF-8").encode("latin1") >>> repr(aged_latin) # text type is 'str' "'\\xe2g\\xe9'" >>> makePrintable(aged_latin, "latin1") '\\xe2g\\xe9' >>> makePrintable("", "latin1") '' >>> makePrintable("a", "latin1", quote='"') '"a"' >>> makePrintable("", "latin1", quote='"') '(empty)' >>> makePrintable("abc", "latin1", quote="'") "'abc'" Control codes: >>> makePrintable("\0\x03\x0a\x10 \x7f", "latin1") '\\0\\3\\n\\x10 \\x7f' Quote character may also be escaped (only ' and "): >>> print makePrintable("a\"b", "latin-1", quote='"') "a\"b" >>> print makePrintable("a\"b", "latin-1", quote="'") 'a"b' >>> print makePrintable("a'b", "latin-1", quote="'") 'a\'b' """ if data: if not isinstance(data, unicode): data = unicode(data, "ISO-8859-1") charset = "ASCII" data = regex_control_code.sub( lambda regs: controlchars[ord(regs.group(1))], data) if quote: if quote in "\"'": data = data.replace(quote, '\\' + quote) data = ''.join((quote, data, quote)) elif quote: data = "(empty)" data = data.encode(charset, "backslashreplace") if smart: # Replace \x00\x01 by \0\1 data = re.sub(r"\\x0([0-7])(?=[^0-7]|$)", r"\\\1", data) if to_unicode: data = unicode(data, charset) return data def makeUnicode(text): r""" Convert text to printable Unicode string. 
For byte string (type 'str'), use charset ISO-8859-1
    for the conversion to Unicode.

    >>> makeUnicode(u'abc\0d')
    u'abc\\0d'
    >>> makeUnicode('a\xe9')
    u'a\xe9'
    """
    if isinstance(text, str):
        text = unicode(text, "ISO-8859-1")
    elif not isinstance(text, unicode):
        text = unicode(text)
    text = regex_control_code.sub(
        lambda regs: controlchars[ord(regs.group(1))], text)
    text = re.sub(r"\\x0([0-7])(?=[^0-7]|$)", r"\\\1", text)
    return text

def binarySearch(seq, cmp_func):
    """
    Search a value in a sequence using binary search. Returns the index of
    the value, or None if the value doesn't exist.

    'seq' has to be sorted in ascending order according to the comparison
    function;

    'cmp_func', prototype func(x), is the compare function:
    - Return a strictly positive value if we have to search forward;
    - Return a strictly negative value if we have to search backward;
    - Otherwise (zero) we got the value.

    >>> # Search number 5 (search forward)
    ... binarySearch([0, 4, 5, 10], lambda x: 5-x)
    2
    >>> # Backward search
    ... binarySearch([10, 5, 4, 0], lambda x: x-5)
    1
    """
    lower = 0
    upper = len(seq)
    while lower < upper:
        index = (lower + upper) >> 1
        diff = cmp_func(seq[index])
        if diff < 0:
            upper = index
        elif diff > 0:
            lower = index + 1
        else:
            return index
    return None

def lowerBound(seq, cmp_func):
    f = 0
    l = len(seq)
    while l > 0:
        h = l >> 1
        m = f + h
        if cmp_func(seq[m]):
            f = m
            f += 1
            l -= h + 1
        else:
            l = h
    return f

def humanUnixAttributes(mode):
    """
    Convert Unix file attributes (or "file mode") to a unicode string.
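binarySearch() above drives the bisection with a three-way compare callback rather than a key value. A Python 3 sketch of the same contract (the name `binary_search` is mine):

```python
def binary_search(seq, cmp_func):
    """Return an index i such that cmp_func(seq[i]) == 0, or None.

    seq must be ordered consistently with cmp_func: the callback returns a
    positive value to search forward, a negative one to search backward,
    and zero on a hit.
    """
    lower, upper = 0, len(seq)
    while lower < upper:
        index = (lower + upper) // 2
        diff = cmp_func(seq[index])
        if diff < 0:
            upper = index          # target lies before seq[index]
        elif diff > 0:
            lower = index + 1      # target lies after seq[index]
        else:
            return index
    return None
```

Passing `lambda x: x - 5` instead of `lambda x: 5 - x` searches a descending sequence, as in the second doctest.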
Original source code:
    http://cvs.savannah.gnu.org/viewcvs/coreutils/lib/filemode.c?root=coreutils

    >>> humanUnixAttributes(0644)
    u'-rw-r--r-- (644)'
    >>> humanUnixAttributes(02755)
    u'-rwxr-sr-x (2755)'
    """

    def ftypelet(mode):
        if stat.S_ISREG (mode) or not stat.S_IFMT(mode):
            return '-'
        if stat.S_ISBLK (mode): return 'b'
        if stat.S_ISCHR (mode): return 'c'
        if stat.S_ISDIR (mode): return 'd'
        if stat.S_ISFIFO(mode): return 'p'
        if stat.S_ISLNK (mode): return 'l'
        if stat.S_ISSOCK(mode): return 's'
        return '?'

    chars = [ ftypelet(mode), 'r', 'w', 'x', 'r', 'w', 'x', 'r', 'w', 'x' ]
    for i in xrange(1, 10):
        if not mode & 1 << 9 - i:
            chars[i] = '-'
    if mode & stat.S_ISUID:
        if chars[3] != 'x':
            chars[3] = 'S'
        else:
            chars[3] = 's'
    if mode & stat.S_ISGID:
        if chars[6] != 'x':
            chars[6] = 'S'
        else:
            chars[6] = 's'
    if mode & stat.S_ISVTX:
        if chars[9] != 'x':
            chars[9] = 'T'
        else:
            chars[9] = 't'
    return u"%s (%o)" % (''.join(chars), mode)

def createDict(data, index):
    """
    Create a new dictionary from a dictionary of key => values:
    just keep value number 'index' from all values.

    >>> data={10: ("dix", 100, "a"), 20: ("vingt", 200, "b")}
    >>> createDict(data, 0)
    {10: 'dix', 20: 'vingt'}
    >>> createDict(data, 2)
    {10: 'a', 20: 'b'}
    """
    return dict( (key,values[index]) for key, values in data.iteritems() )

# Start of UNIX timestamp (Epoch): 1st January 1970 at 00:00
UNIX_TIMESTAMP_T0 = datetime(1970, 1, 1)

def timestampUNIX(value):
    """
    Convert a UNIX (32-bit) timestamp to a datetime object. The timestamp
    value is the number of seconds since the 1st January 1970 at 00:00.
    Maximum value is 2147483647: 19 January 2038 at 03:14:07.

    May raise ValueError for an invalid value: the value has to be in
    0..2147483647.
>>> timestampUNIX(0)
    datetime.datetime(1970, 1, 1, 0, 0)
    >>> timestampUNIX(1154175644)
    datetime.datetime(2006, 7, 29, 12, 20, 44)
    >>> timestampUNIX(1154175644.37)
    datetime.datetime(2006, 7, 29, 12, 20, 44, 370000)
    >>> timestampUNIX(2147483647)
    datetime.datetime(2038, 1, 19, 3, 14, 7)
    """
    if not isinstance(value, (float, int, long)):
        raise TypeError("timestampUNIX(): an integer or float is required")
    if not(0 <= value <= 2147483647):
        raise ValueError("timestampUNIX(): value has to be in 0..2147483647")
    return UNIX_TIMESTAMP_T0 + timedelta(seconds=value)

# Start of Macintosh timestamp: 1st January 1904 at 00:00
MAC_TIMESTAMP_T0 = datetime(1904, 1, 1)

def timestampMac32(value):
    """
    Convert a Mac (32-bit) timestamp to a datetime object. The value is the
    number of seconds since the 1st January 1904 (to 2040).

    >>> timestampMac32(0)
    datetime.datetime(1904, 1, 1, 0, 0)
    >>> timestampMac32(2843043290)
    datetime.datetime(1994, 2, 2, 14, 14, 50)
    """
    if not isinstance(value, (float, int, long)):
        raise TypeError("an integer or float is required")
    if not(0 <= value <= 4294967295):
        return _("invalid Mac timestamp (%s)") % value
    return MAC_TIMESTAMP_T0 + timedelta(seconds=value)

def durationWin64(value):
    """
    Convert a Windows 64-bit duration to a timedelta object. The duration
    format is a 64-bit number: the number of 100 ns intervals.
    See also timestampWin64().

    >>> str(durationWin64(1072580000))
    '0:01:47.258000'
    >>> str(durationWin64(2146280000))
    '0:03:34.628000'
    """
    if not isinstance(value, (float, int, long)):
        raise TypeError("an integer or float is required")
    if value < 0:
        raise ValueError("value has to be a positive or null integer")
    return timedelta(microseconds=value/10)

# Start of 64-bit Windows timestamp: 1st January 1601 at 00:00
WIN64_TIMESTAMP_T0 = datetime(1601, 1, 1, 0, 0, 0)

def timestampWin64(value):
    """
    Convert a Windows 64-bit timestamp to a datetime object. The timestamp
    format is a 64-bit number which represents the number of 100 ns
    intervals since the 1st January 1601 at 00:00.
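All these timestamp converters follow the same pattern: a fixed epoch plus a timedelta. A Python 3 sketch covering the UNIX and Mac epochs (the function names are mine, not hachoir API):

```python
from datetime import datetime, timedelta

UNIX_T0 = datetime(1970, 1, 1)  # UNIX Epoch
MAC_T0 = datetime(1904, 1, 1)   # Classic Mac OS epoch

def unix_timestamp(seconds):
    """Seconds since 1970-01-01 00:00 -> datetime."""
    return UNIX_T0 + timedelta(seconds=seconds)

def mac32_timestamp(seconds):
    """Seconds since 1904-01-01 00:00 -> datetime."""
    return MAC_T0 + timedelta(seconds=seconds)
```

Range checks (0..2147483647, 0..4294967295) are the only thing the library adds on top of this arithmetic.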
See also durationWin64(). Maximum date is 28 May 60056.

    >>> timestampWin64(0)
    datetime.datetime(1601, 1, 1, 0, 0)
    >>> timestampWin64(127840491566710000)
    datetime.datetime(2006, 2, 10, 12, 45, 56, 671000)
    """
    try:
        return WIN64_TIMESTAMP_T0 + durationWin64(value)
    except OverflowError:
        raise ValueError(_("date newer than year %s (value=%s)") % (MAXYEAR, value))

# Start of 60-bit UUID timestamp: 15 October 1582 at 00:00
UUID60_TIMESTAMP_T0 = datetime(1582, 10, 15, 0, 0, 0)

def timestampUUID60(value):
    """
    Convert a UUID 60-bit timestamp to a datetime object. The timestamp
    format is a 60-bit number which represents the number of 100 ns
    intervals since the 15 October 1582 at 00:00.

    >>> timestampUUID60(0)
    datetime.datetime(1582, 10, 15, 0, 0)
    >>> timestampUUID60(130435676263032368)
    datetime.datetime(1996, 2, 14, 5, 13, 46, 303236)
    """
    if not isinstance(value, (float, int, long)):
        raise TypeError("an integer or float is required")
    if value < 0:
        raise ValueError("value has to be a positive or null integer")
    try:
        return UUID60_TIMESTAMP_T0 + timedelta(microseconds=value/10)
    except OverflowError:
        raise ValueError(_("timestampUUID60() overflow (value=%s)") % value)

def humanDatetime(value, strip_microsecond=True):
    """
    Convert a timestamp to a Unicode string: use ISO format with a space
    separator.

    >>> humanDatetime( datetime(2006, 7, 29, 12, 20, 44) )
    u'2006-07-29 12:20:44'
    >>> humanDatetime( datetime(2003, 6, 30, 16, 0, 5, 370000) )
    u'2003-06-30 16:00:05'
    >>> humanDatetime( datetime(2003, 6, 30, 16, 0, 5, 370000), False )
    u'2003-06-30 16:00:05.370000'
    """
    text = unicode(value.isoformat())
    text = text.replace('T', ' ')
    if strip_microsecond and "." in text:
        text = text.split(".")[0]
    return text

NEWLINES_REGEX = re.compile("\n+")

def normalizeNewline(text):
    r"""
    Replace Windows and Mac newlines with Unix newlines.
    Replace multiple consecutive newlines with one newline.
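timestampWin64() divides the 100 ns count by 10 to obtain microseconds; in Python 3, a float division would lose precision for large values, so a sketch is safer with integer division (the name `win64_timestamp` is mine):

```python
from datetime import datetime, timedelta

WIN64_T0 = datetime(1601, 1, 1)  # Windows FILETIME epoch

def win64_timestamp(value):
    """Convert a count of 100 ns intervals since 1601-01-01 to a datetime.

    Integer division (100 ns -> microseconds) avoids the float rounding
    that value / 10 would introduce for large timestamps.
    """
    return WIN64_T0 + timedelta(microseconds=value // 10)
```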
>>> normalizeNewline('a\r\nb')
    'a\nb'
    >>> normalizeNewline('a\r\rb')
    'a\nb'
    >>> normalizeNewline('a\n\nb')
    'a\nb'
    """
    text = text.replace("\r\n", "\n")
    text = text.replace("\r", "\n")
    return NEWLINES_REGEX.sub("\n", text)
hachoir-core-1.3.3/hachoir_core/compatibility.py0000644000175000017500000001053111251277274020763 0ustar haypohaypo"""
Compatibility constants and functions. This module works on Python 1.5 to 2.5.

This module provides:
- True and False constants;
- any() and all() functions;
- has_yield and has_slice values;
- isinstance() with Python 2.3 behaviour;
- reversed() and sorted() functions.

True and False constants
========================

Truth constants: True is yes (one) and False is no (zero).

>>> int(True), int(False)    # int value
(1, 0)
>>> int(False | True)        # "or" binary operator
1
>>> int(True & False)        # "and" binary operator
0
>>> int(not(True) == False)  # "not" operator
1

Warning: on Python older than 2.3, True and False are aliases to the
numbers 1 and 0. So "print True" displays 1 and not True.

any() function
==============

any() returns True if at least one item is True, or False otherwise.

>>> any([False, True])
True
>>> any([True, True])
True
>>> any([False, False])
False

all() function
==============

all() returns True if all items are True, or False otherwise.
This function just applies the binary and operator (&) to all values.

>>> all([True, True])
True
>>> all([False, True])
False
>>> all([False, False])
False

has_yield boolean
=================

has_yield: boolean which indicates if the interpreter supports the yield
keyword. The yield keyword is available since Python 2.0.

has_slice boolean
=================

has_slice: boolean which indicates if the interpreter supports slices with
a step argument or not. slice with step is available since Python 2.3.

reversed() and sorted() functions
=================================

The reversed() and sorted() functions have been introduced in Python 2.4.
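normalizeNewline() above ports to Python 3 almost verbatim (the name `normalize_newline` is mine):

```python
import re

_NEWLINES = re.compile("\n+")

def normalize_newline(text):
    r"""Map \r\n and bare \r to \n, then squeeze newline runs to one."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return _NEWLINES.sub("\n", text)
```

Replacing "\r\n" first matters: doing the bare "\r" substitution first would turn each "\r\n" into "\n\n", which the final squeeze hides but only by accident.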
They should return an iterator, but in this module they may return a plain list.

>>> data = list("cab")
>>> list(sorted(data))
['a', 'b', 'c']
>>> list(reversed("abc"))
['c', 'b', 'a']
"""

import copy
import operator

# --- True and False constants from Python 2.0 ---
# --- Warning: for Python < 2.3, they are aliases for 1 and 0 ---
try:
    True = True
    False = False
except NameError:
    True = 1
    False = 0

# --- any() from Python 2.5 ---
try:
    from __builtin__ import any
except ImportError:
    def any(items):
        for item in items:
            if item:
                return True
        return False

# --- all() from Python 2.5 ---
try:
    from __builtin__ import all
except ImportError:
    def all(items):
        return reduce(operator.__and__, items)

# --- test if interpreter supports yield keyword ---
try:
    eval(compile("""
from __future__ import generators

def gen():
    yield 1
    yield 2

if list(gen()) != [1, 2]:
    raise KeyError("42")
""", "", "exec"))
except (KeyError, SyntaxError):
    has_yield = False
else:
    has_yield = True

# --- test if interpreter supports slices (with step argument) ---
try:
    has_slice = eval('"abc"[::-1] == "cba"')
except (TypeError, SyntaxError):
    has_slice = False

# --- isinstance with Python 2.3 behaviour (arg 2 is a type) ---
try:
    if isinstance(1, int):
        from __builtin__ import isinstance
except TypeError:
    print "Redef isinstance"
    def isinstance20(a, typea):
        if type(typea) != type(type):
            raise TypeError("TypeError: isinstance() arg 2 must be a class, type, or tuple of classes and types")
        # Compare the type of the value (fix: the original compared typea to itself)
        return type(a) == typea
    isinstance = isinstance20

# --- reversed() from Python 2.4 ---
try:
    from __builtin__ import reversed
except ImportError:
#    if hasYield() == "ok":
#        code = """
#def reversed(data):
#    for index in xrange(len(data)-1, -1, -1):
#        yield data[index];
#reversed"""
#        reversed = eval(compile(code, "", "exec"))
    if has_slice:
        def reversed(data):
            if not isinstance(data, list):
                data = list(data)
            return data[::-1]
    else:
        def reversed(data):
            if not isinstance(data, list):
                data = list(data)
            reversed_data = []
            for index in xrange(len(data)-1, -1, -1):
                reversed_data.append(data[index])
            return reversed_data

# --- sorted() from Python 2.4 ---
try:
    from __builtin__ import sorted
except ImportError:
    def sorted(data):
        sorted_data = copy.copy(data)
        sorted_data.sort()
        return sorted_data  # fix: the original returned the function 'sorted' itself

__all__ = ("True", "False",
    "any", "all", "has_yield", "has_slice",
    "isinstance", "reversed", "sorted")
hachoir-core-1.3.3/hachoir_core/endian.py0000644000175000017500000000035311251277274017351 0ustar haypohaypo"""
Constant values about endian.
"""

from hachoir_core.i18n import _

BIG_ENDIAN = "ABCD"
LITTLE_ENDIAN = "DCBA"
NETWORK_ENDIAN = BIG_ENDIAN

endian_name = {
    BIG_ENDIAN: _("Big endian"),
    LITTLE_ENDIAN: _("Little endian"),
}
hachoir-core-1.3.3/hachoir_core/timeout.py0000644000175000017500000000412311251277274017600 0ustar haypohaypo"""
limitedTime(): set a timeout in seconds when calling a function,
raise a Timeout error if the time limit is exceeded.
"""

from math import ceil

IMPLEMENTATION = None

class Timeout(RuntimeError):
    """
    Timeout error, inherits from RuntimeError
    """
    pass

def signalHandler(signum, frame):
    """
    Signal handler to catch timeout signal: raise Timeout exception.
    """
    raise Timeout("Timeout exceeded!")

def limitedTime(second, func, *args, **kw):
    """
    Call func(*args, **kw) with a timeout of second seconds.
""" return func(*args, **kw) def fixTimeout(second): """ Fix timeout value: convert to integer with a minimum of 1 second """ if isinstance(second, float): second = int(ceil(second)) assert isinstance(second, (int, long)) return max(second, 1) if not IMPLEMENTATION: try: from signal import signal, alarm, SIGALRM # signal.alarm() implementation def limitedTime(second, func, *args, **kw): second = fixTimeout(second) old_alarm = signal(SIGALRM, signalHandler) try: alarm(second) return func(*args, **kw) finally: alarm(0) signal(SIGALRM, old_alarm) IMPLEMENTATION = "signal.alarm()" except ImportError: pass if not IMPLEMENTATION: try: from signal import signal, SIGXCPU from resource import getrlimit, setrlimit, RLIMIT_CPU # resource.setrlimit(RLIMIT_CPU) implementation # "Bug": timeout is 'CPU' time so sleep() are not part of the timeout def limitedTime(second, func, *args, **kw): second = fixTimeout(second) old_alarm = signal(SIGXCPU, signalHandler) current = getrlimit(RLIMIT_CPU) try: setrlimit(RLIMIT_CPU, (second, current[1])) return func(*args, **kw) finally: setrlimit(RLIMIT_CPU, current) signal(SIGXCPU, old_alarm) IMPLEMENTATION = "resource.setrlimit(RLIMIT_CPU)" except ImportError: pass hachoir-core-1.3.3/hachoir_core/stream/0000755000175000017500000000000011342040416017016 5ustar haypohaypohachoir-core-1.3.3/hachoir_core/stream/input_helper.py0000644000175000017500000000300211251277274022076 0ustar haypohaypofrom hachoir_core.i18n import getTerminalCharset, guessBytesCharset, _ from hachoir_core.stream import InputIOStream, InputSubStream, InputStreamError def FileInputStream(filename, real_filename=None, **args): """ Create an input stream of a file. filename must be unicode. real_filename is an optional argument used to specify the real filename, its type can be 'str' or 'unicode'. Use real_filename when you are not able to convert filename to real unicode string (ie. you have to use unicode(name, 'replace') or unicode(name, 'ignore')). 
""" assert isinstance(filename, unicode) if not real_filename: real_filename = filename try: inputio = open(real_filename, 'rb') except IOError, err: charset = getTerminalCharset() errmsg = unicode(str(err), charset) raise InputStreamError(_("Unable to open file %s: %s") % (filename, errmsg)) source = "file:" + filename offset = args.pop("offset", 0) size = args.pop("size", None) if offset or size: if size: size = 8 * size stream = InputIOStream(inputio, source=source, **args) return InputSubStream(stream, 8 * offset, size, **args) else: args.setdefault("tags",[]).append(("filename", filename)) return InputIOStream(inputio, source=source, **args) def guessStreamCharset(stream, address, size, default=None): size = min(size, 1024*8) bytes = stream.readBytes(address, size//8) return guessBytesCharset(bytes, default) hachoir-core-1.3.3/hachoir_core/stream/stream.py0000644000175000017500000000013011251277274020672 0ustar haypohaypofrom hachoir_core.error import HachoirError class StreamError(HachoirError): pass hachoir-core-1.3.3/hachoir_core/stream/__init__.py0000644000175000017500000000100111251277274021134 0ustar haypohaypofrom hachoir_core.endian import BIG_ENDIAN, LITTLE_ENDIAN from hachoir_core.stream.stream import StreamError from hachoir_core.stream.input import ( InputStreamError, InputStream, InputIOStream, StringInputStream, InputSubStream, InputFieldStream, FragmentedStream, ConcatStream) from hachoir_core.stream.input_helper import FileInputStream, guessStreamCharset from hachoir_core.stream.output import (OutputStreamError, FileOutputStream, StringOutputStream, OutputStream) hachoir-core-1.3.3/hachoir_core/stream/output.py0000644000175000017500000001252311251277274020750 0ustar haypohaypofrom cStringIO import StringIO from hachoir_core.endian import BIG_ENDIAN from hachoir_core.bits import long2raw from hachoir_core.stream import StreamError from errno import EBADF MAX_READ_NBYTES = 2 ** 16 class OutputStreamError(StreamError): pass class 
OutputStream(object): def __init__(self, output, filename=None): self._output = output self._filename = filename self._bit_pos = 0 self._byte = 0 def _getFilename(self): return self._filename filename = property(_getFilename) def writeBit(self, state, endian): if self._bit_pos == 7: self._bit_pos = 0 if state: if endian is BIG_ENDIAN: self._byte |= 1 else: self._byte |= 128 self._output.write(chr(self._byte)) self._byte = 0 else: if state: if endian is BIG_ENDIAN: self._byte |= (1 << self._bit_pos) else: self._byte |= (1 << (7-self._bit_pos)) self._bit_pos += 1 def writeBits(self, count, value, endian): assert 0 <= value < 2**count # Feed bits to align to byte address if self._bit_pos != 0: n = 8 - self._bit_pos if n <= count: count -= n if endian is BIG_ENDIAN: self._byte |= (value >> count) value &= ((1 << count) - 1) else: self._byte |= (value & ((1 << n)-1)) << self._bit_pos value >>= n self._output.write(chr(self._byte)) self._bit_pos = 0 self._byte = 0 else: if endian is BIG_ENDIAN: self._byte |= (value << (8-self._bit_pos-count)) else: self._byte |= (value << self._bit_pos) self._bit_pos += count return # Write byte per byte while 8 <= count: count -= 8 if endian is BIG_ENDIAN: byte = (value >> count) value &= ((1 << count) - 1) else: byte = (value & 0xFF) value >>= 8 self._output.write(chr(byte)) # Keep last bits assert 0 <= count < 8 self._bit_pos = count if 0 < count: assert 0 <= value < 2**count if endian is BIG_ENDIAN: self._byte = value << (8-count) else: self._byte = value else: assert value == 0 self._byte = 0 def writeInteger(self, value, signed, size_byte, endian): if signed: value += 1 << (size_byte*8 - 1) raw = long2raw(value, endian, size_byte) self.writeBytes(raw) def copyBitsFrom(self, input, address, nb_bits, endian): if (nb_bits % 8) == 0: self.copyBytesFrom(input, address, nb_bits/8) else: # Arbitrary limit (because we should use a buffer, like copyBytesFrom(), # but with endianess problem assert nb_bits <= 128 data = 
input.readBits(address, nb_bits, endian) self.writeBits(nb_bits, data, endian) def copyBytesFrom(self, input, address, nb_bytes): if (address % 8): raise OutputStreamError("Unable to copy bytes with address with bit granularity") buffer_size = 1 << 12 # 8192 (8 KB) while 0 < nb_bytes: # Compute buffer size if nb_bytes < buffer_size: buffer_size = nb_bytes # Read data = input.readBytes(address, buffer_size) # Write self.writeBytes(data) # Move address address += buffer_size*8 nb_bytes -= buffer_size def writeBytes(self, bytes): if self._bit_pos != 0: raise NotImplementedError() self._output.write(bytes) def readBytes(self, address, nbytes): """ Read bytes from the stream at specified address (in bits). Address have to be a multiple of 8. nbytes have to in 1..MAX_READ_NBYTES (64 KB). This method is only supported for StringOuputStream (not on FileOutputStream). Return read bytes as byte string. """ assert (address % 8) == 0 assert (1 <= nbytes <= MAX_READ_NBYTES) self._output.flush() oldpos = self._output.tell() try: self._output.seek(0) try: return self._output.read(nbytes) except IOError, err: if err[0] == EBADF: raise OutputStreamError("Stream doesn't support read() operation") finally: self._output.seek(oldpos) def StringOutputStream(): """ Create an output stream into a string. """ data = StringIO() return OutputStream(data) def FileOutputStream(filename, real_filename=None): """ Create an output stream into file with given name. Filename have to be unicode, whereas (optional) real_filename can be str. 
""" assert isinstance(filename, unicode) if not real_filename: real_filename = filename output = open(real_filename, 'wb') return OutputStream(output, filename=filename) hachoir-core-1.3.3/hachoir_core/stream/input.py0000644000175000017500000004644111251277274020555 0ustar haypohaypofrom hachoir_core.endian import BIG_ENDIAN, LITTLE_ENDIAN from hachoir_core.error import info from hachoir_core.log import Logger from hachoir_core.bits import str2long from hachoir_core.i18n import getTerminalCharset from hachoir_core.tools import lowerBound from hachoir_core.i18n import _ from os import dup, fdopen from errno import ESPIPE from weakref import ref as weakref_ref from hachoir_core.stream import StreamError class InputStreamError(StreamError): pass class ReadStreamError(InputStreamError): def __init__(self, size, address, got=None): self.size = size self.address = address self.got = got if self.got is not None: msg = _("Can't read %u bits at address %u (got %u bits)") % (self.size, self.address, self.got) else: msg = _("Can't read %u bits at address %u") % (self.size, self.address) InputStreamError.__init__(self, msg) class NullStreamError(InputStreamError): def __init__(self, source): self.source = source msg = _("Input size is nul (source='%s')!") % self.source InputStreamError.__init__(self, msg) class FileFromInputStream: _offset = 0 _from_end = False def __init__(self, stream): self.stream = stream self._setSize(stream.askSize(self)) def _setSize(self, size): if size is None: self._size = size elif size % 8: raise InputStreamError("Invalid size") else: self._size = size // 8 def tell(self): if self._from_end: while self._size is None: self.stream._feed(max(self.stream._current_size << 1, 1 << 16)) self._from_end = False self._offset += self._size return self._offset def seek(self, pos, whence=0): if whence == 0: self._from_end = False self._offset = pos elif whence == 1: self._offset += pos elif whence == 2: self._from_end = True self._offset = pos else: raise 
ValueError("seek() second argument must be 0, 1 or 2") def read(self, size=None): def read(address, size): shift, data, missing = self.stream.read(8 * address, 8 * size) if shift: raise InputStreamError("TODO: handle non-byte-aligned data") return data if self._size or size is not None and not self._from_end: # We don't want self.tell() to read anything # and the size must be known if we read until the end. pos = self.tell() if size is None or None < self._size < pos + size: size = self._size - pos if size <= 0: return '' data = read(pos, size) self._offset += len(data) return data elif self._from_end: # TODO: not tested max_size = - self._offset if size is None or max_size < size: size = max_size if size <= 0: return '' data = '', '' self._offset = max(0, self.stream._current_size // 8 + self._offset) self._from_end = False bs = max(max_size, 1 << 16) while True: d = read(self._offset, bs) data = data[1], d self._offset += len(d) if self._size: bs = self._size - self._offset if not bs: data = data[0] + data[1] d = len(data) - max_size return data[d:d+size] else: # TODO: not tested data = [ ] size = 1 << 16 while True: d = read(self._offset, size) data.append(d) self._offset += len(d) if self._size: size = self._size - self._offset if not size: return ''.join(data) class InputStream(Logger): _set_size = None _current_size = 0 def __init__(self, source=None, size=None, packets=None, **args): self.source = source self._size = size # in bits if size == 0: raise NullStreamError(source) self.tags = tuple(args.get("tags", tuple())) self.packets = packets def askSize(self, client): if self._size != self._current_size: if self._set_size is None: self._set_size = [] self._set_size.append(weakref_ref(client)) return self._size def _setSize(self, size=None): assert self._size is None or self._current_size <= self._size if self._size != self._current_size: self._size = self._current_size if not self._size: raise NullStreamError(self.source) if self._set_size: for client in 
self._set_size: client = client() if client: client._setSize(self._size) del self._set_size size = property(lambda self: self._size, doc="Size of the stream in bits") checked = property(lambda self: self._size == self._current_size) def sizeGe(self, size, const=False): return self._current_size >= size or \ not (None < self._size < size or const or self._feed(size)) def _feed(self, size): return self.read(size-1,1)[2] def read(self, address, size): """ Read 'size' bits at position 'address' (in bits) from the beginning of the stream. """ raise NotImplementedError def readBits(self, address, nbits, endian): assert endian in (BIG_ENDIAN, LITTLE_ENDIAN) shift, data, missing = self.read(address, nbits) if missing: raise ReadStreamError(nbits, address) value = str2long(data, endian) if endian is BIG_ENDIAN: value >>= len(data) * 8 - shift - nbits else: value >>= shift return value & (1 << nbits) - 1 def readInteger(self, address, signed, nbits, endian): """ Read an integer number """ value = self.readBits(address, nbits, endian) # Signe number. Example with nbits=8: # if 128 <= value: value -= 256 if signed and (1 << (nbits-1)) <= value: value -= (1 << nbits) return value def readBytes(self, address, nb_bytes): shift, data, missing = self.read(address, 8 * nb_bytes) if shift: raise InputStreamError("TODO: handle non-byte-aligned data") if missing: raise ReadStreamError(8 * nb_bytes, address) return data def searchBytesLength(self, needle, include_needle, start_address=0, end_address=None): """ If include_needle is True, add its length to the result. Returns None is needle can't be found. """ pos = self.searchBytes(needle, start_address, end_address) if pos is None: return None length = (pos - start_address) // 8 if include_needle: length += len(needle) return length def searchBytes(self, needle, start_address=0, end_address=None): """ Search some bytes in [start_address;end_address[. Addresses must be aligned to byte. 
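readInteger() above sign-extends by subtracting 1 << nbits when the top bit is set, i.e. two's complement reinterpretation. In isolation (Python 3; the name `to_signed` is mine):

```python
def to_signed(value, nbits):
    """Reinterpret an nbits-wide unsigned value as two's complement."""
    if value >= 1 << (nbits - 1):  # top bit set -> negative number
        value -= 1 << nbits
    return value
```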
Returns the address of the bytes if found, None else. """ if start_address % 8: raise InputStreamError("Unable to search bytes with address with bit granularity") length = len(needle) size = max(3 * length, 4096) buffer = '' if self._size and (end_address is None or self._size < end_address): end_address = self._size while True: if end_address is not None: todo = (end_address - start_address) >> 3 if todo < size: if todo <= 0: return None size = todo data = self.readBytes(start_address, size) if end_address is None and self._size: end_address = self._size size = (end_address - start_address) >> 3 assert size > 0 data = data[:size] start_address += 8 * size buffer = buffer[len(buffer) - length + 1:] + data found = buffer.find(needle) if found >= 0: return start_address + (found - len(buffer)) * 8 def file(self): return FileFromInputStream(self) class InputPipe(object): """ InputPipe makes input streams seekable by caching a certain amount of data. The memory usage may be unlimited in worst cases. A function (set_size) is called when the size of the stream is known. InputPipe sees the input stream as an array of blocks of size = (2 ^ self.buffer_size) and self.buffers maps to this array. It also maintains a circular ordered list of non-discarded blocks, sorted by access time. Each element of self.buffers is an array of 3 elements: * self.buffers[i][0] is the data. len(self.buffers[i][0]) == 1 << self.buffer_size (except at the end: the length may be smaller) * self.buffers[i][1] is the index of a more recently used block * self.buffers[i][2] is the opposite of self.buffers[1], in order to have a double-linked list. For any discarded block, self.buffers[i] = None self.last is the index of the most recently accessed block. self.first is the first (= smallest index) non-discarded block. How InputPipe discards blocks: * Just before returning from the read method. * Only if there are more than self.buffer_nb_min blocks in memory. 
* While self.buffers[self.first] is that least recently used block. Property: There is no hole in self.buffers, except at the beginning. """ buffer_nb_min = 256 buffer_size = 16 last = None size = None def __init__(self, input, set_size=None): self._input = input self.first = self.address = 0 self.buffers = [] self.set_size = set_size current_size = property(lambda self: len(self.buffers) << self.buffer_size) def _append(self, data): if self.last is None: self.last = next = prev = 0 else: prev = self.last last = self.buffers[prev] next = last[1] self.last = self.buffers[next][2] = last[1] = len(self.buffers) self.buffers.append([ data, next, prev ]) def _get(self, index): if index >= len(self.buffers): return '' buf = self.buffers[index] if buf is None: raise InputStreamError(_("Error: Buffers too small. Can't seek backward.")) if self.last != index: next = buf[1] prev = buf[2] self.buffers[next][2] = prev self.buffers[prev][1] = next first = self.buffers[self.last][1] buf[1] = first buf[2] = self.last self.buffers[first][2] = index self.buffers[self.last][1] = index self.last = index return buf[0] def _flush(self): lim = len(self.buffers) - self.buffer_nb_min while self.first < lim: buf = self.buffers[self.first] if buf[2] != self.last: break info("Discarding buffer %u." 
% self.first) self.buffers[self.last][1] = buf[1] self.buffers[buf[1]][2] = self.last self.buffers[self.first] = None self.first += 1 def seek(self, address): assert 0 <= address self.address = address def read(self, size): end = self.address + size for i in xrange(len(self.buffers), (end >> self.buffer_size) + 1): data = self._input.read(1 << self.buffer_size) if len(data) < 1 << self.buffer_size: self.size = (len(self.buffers) << self.buffer_size) + len(data) if self.set_size: self.set_size(self.size) if data: self._append(data) break self._append(data) block, offset = divmod(self.address, 1 << self.buffer_size) data = ''.join(self._get(index) for index in xrange(block, (end - 1 >> self.buffer_size) + 1) )[offset:offset+size] self._flush() self.address += len(data) return data class InputIOStream(InputStream): def __init__(self, input, size=None, **args): if not hasattr(input, "seek"): if size is None: input = InputPipe(input, self._setSize) else: input = InputPipe(input) elif size is None: try: input.seek(0, 2) size = input.tell() * 8 except IOError, err: if err.errno == ESPIPE: input = InputPipe(input, self._setSize) else: charset = getTerminalCharset() errmsg = unicode(str(err), charset) source = args.get("source", "<%s>" % input) raise InputStreamError(_("Unable to get size of %s: %s") % (source, errmsg)) self._input = input InputStream.__init__(self, size=size, **args) def __current_size(self): if self._size: return self._size if self._input.size: return 8 * self._input.size return 8 * self._input.current_size _current_size = property(__current_size) def read(self, address, size): assert size > 0 _size = self._size address, shift = divmod(address, 8) self._input.seek(address) size = (size + shift + 7) >> 3 data = self._input.read(size) got = len(data) missing = size != got if missing and _size == self._size: raise ReadStreamError(8 * size, 8 * address, 8 * got) return shift, data, missing def file(self): if hasattr(self._input, "fileno"): new_fd =
dup(self._input.fileno()) new_file = fdopen(new_fd, "r") new_file.seek(0) return new_file return InputStream.file(self) class StringInputStream(InputStream): def __init__(self, data, source="<string>", **args): self.data = data InputStream.__init__(self, source=source, size=8*len(data), **args) self._current_size = self._size def read(self, address, size): address, shift = divmod(address, 8) size = (size + shift + 7) >> 3 data = self.data[address:address+size] got = len(data) if got != size: raise ReadStreamError(8 * size, 8 * address, 8 * got) return shift, data, False class InputSubStream(InputStream): def __init__(self, stream, offset, size=None, source=None, **args): if offset is None: offset = 0 if size is None and stream.size is not None: size = stream.size - offset if None < size <= 0: raise ValueError("InputSubStream: offset is outside input stream") self.stream = stream self._offset = offset if source is None: source = "<substream input=%s offset=%s size=%s>" % (stream.source, offset, size) InputStream.__init__(self, source=source, size=size, **args) self.stream.askSize(self) _current_size = property(lambda self: min(self._size, max(0, self.stream._current_size - self._offset))) def read(self, address, size): return self.stream.read(self._offset + address, size) def InputFieldStream(field, **args): if not field.parent: return field.stream stream = field.parent.stream args["size"] = field.size args.setdefault("source", stream.source + field.path) return InputSubStream(stream, field.absolute_address, **args) class FragmentedStream(InputStream): def __init__(self, field, **args): self.stream = field.parent.stream data = field.getData() self.fragments = [ (0, data.absolute_address, data.size) ] self.next = field.next args.setdefault("source", "%s%s" % (self.stream.source, field.path)) InputStream.__init__(self, **args) if not self.next: self._current_size = data.size self._setSize() def _feed(self, end): if self._current_size < end: if self.checked: raise ReadStreamError(end - self._size, self._size) a,
fa, fs = self.fragments[-1] while self.stream.sizeGe(fa + min(fs, end - a)): a += fs f = self.next if a >= end: self._current_size = end if a == end and not f: self._setSize() return False if f: self.next = f.next f = f.getData() if not f: self._current_size = a self._setSize() return True fa = f.absolute_address fs = f.size self.fragments += [ (a, fa, fs) ] self._current_size = a + max(0, self.stream.size - fa) self._setSize() return True return False def read(self, address, size): assert size > 0 missing = self._feed(address + size) if missing: size = self._size - address if size <= 0: return 0, '', True d = [] i = lowerBound(self.fragments, lambda x: x[0] <= address) a, fa, fs = self.fragments[i-1] a -= address fa -= a fs += a s = None while True: n = min(fs, size) u, v, w = self.stream.read(fa, n) assert not w if s is None: s = u else: assert not u d += [ v ] size -= n if not size: return s, ''.join(d), missing a, fa, fs = self.fragments[i] i += 1 class ConcatStream(InputStream): # TODO: concatene any number of any type of stream def __init__(self, streams, **args): if len(streams) > 2 or not streams[0].checked: raise NotImplementedError self.__size0 = streams[0].size size1 = streams[1].askSize(self) if size1 is not None: args["size"] = self.__size0 + size1 self.__streams = streams InputStream.__init__(self, **args) _current_size = property(lambda self: self.__size0 + self.__streams[1]._current_size) def read(self, address, size): _size = self._size s = self.__size0 - address shift, data, missing = None, '', False if s > 0: s = min(size, s) shift, data, w = self.__streams[0].read(address, s) assert not w a, s = 0, size - s else: a, s = -s, size if s: u, v, missing = self.__streams[1].read(a, s) if missing and _size == self._size: raise ReadStreamError(s, a) if shift is None: shift = u else: assert not u data += v return shift, data, missing hachoir-core-1.3.3/hachoir_core/language.py0000644000175000017500000000111211251277274017670 0ustar haypohaypofrom 
hachoir_core.iso639 import ISO639_2

class Language:
    def __init__(self, code):
        code = str(code)
        if code not in ISO639_2:
            raise ValueError("Invalid language code: %r" % code)
        self.code = code

    def __cmp__(self, other):
        if other.__class__ != Language:
            return 1
        return cmp(self.code, other.code)

    def __unicode__(self):
        return ISO639_2[self.code]

    def __str__(self):
        return self.__unicode__()

    def __repr__(self):
        return "<Language %s, code=%r>" % (unicode(self), self.code)

#--- hachoir-core-1.3.3/hachoir_core/profiler.py ---
from hotshot import Profile
from hotshot.stats import load as loadStats
from os import unlink

def runProfiler(func, args=tuple(), kw={}, verbose=True, nb_func=25,
        sort_by=('cumulative', 'calls')):
    profile_filename = "/tmp/profiler"
    prof = Profile(profile_filename)
    try:
        if verbose:
            print "[+] Run profiler"
        result = prof.runcall(func, *args, **kw)
        prof.close()
        if verbose:
            print "[+] Stop profiler"
            print "[+] Process data..."
        stat = loadStats(profile_filename)
        if verbose:
            print "[+] Strip..."
        stat.strip_dirs()
        if verbose:
            print "[+] Sort data..."
        stat.sort_stats(*sort_by)
        if verbose:
            print
            print "[+] Display statistics"
            print
        stat.print_stats(nb_func)
        return result
    finally:
        unlink(profile_filename)

#--- hachoir-core-1.3.3/hachoir_core/__init__.py ---
from hachoir_core.version import VERSION as __version__, PACKAGE, WEBSITE, LICENSE

#--- hachoir-core-1.3.3/hachoir_core/memory.py ---
import gc

#---- Default implementation when resource is missing ----------------------
PAGE_SIZE = 4096

def getMemoryLimit():
    """
    Get current memory limit in bytes.
    Return None on error.
    """
    return None

def setMemoryLimit(max_mem):
    """
    Set memory limit in bytes.
    Use value 'None' to disable memory limit.

    Return True if limit is set, False on error.
    """
    return False

def getMemorySize():
    """
    Read current process memory size: size of available virtual memory.
    This value is NOT the real memory usage.

    This function only works on Linux (uses the /proc/self/statm file).
    """
    try:
        statm = open('/proc/self/statm').readline().split()
    except IOError:
        return None
    return int(statm[0]) * PAGE_SIZE

def clearCaches():
    """
    Try to clear all caches: call gc.collect() (Python garbage collector).
    """
    gc.collect()
    #import re; re.purge()

try:
    #---- 'resource' implementation ---------------------------------------------
    from resource import getpagesize, getrlimit, setrlimit, RLIMIT_AS
    PAGE_SIZE = getpagesize()

    def getMemoryLimit():
        try:
            limit = getrlimit(RLIMIT_AS)[0]
            if 0 < limit:
                limit *= PAGE_SIZE
            return limit
        except ValueError:
            return None

    def setMemoryLimit(max_mem):
        if max_mem is None:
            max_mem = -1
        try:
            setrlimit(RLIMIT_AS, (max_mem, -1))
            return True
        except ValueError:
            return False
except ImportError:
    pass

def limitedMemory(limit, func, *args, **kw):
    """
    Limit memory growth when calling func(*args, **kw):
    restrict memory growth to 'limit' bytes.

    Use try/except MemoryError to catch the error.
    """
    # First step: clear caches to gain memory
    clearCaches()

    # Get total program size
    max_rss = getMemorySize()
    if max_rss is not None:
        # Get old limit and then set our new memory limit
        old_limit = getMemoryLimit()
        limit = max_rss + limit
        limited = setMemoryLimit(limit)
    else:
        limited = False

    try:
        # Call function
        return func(*args, **kw)
    finally:
        # and unset our memory limit
        if limited:
            setMemoryLimit(old_limit)

        # After calling the function: clear all caches
        clearCaches()

#--- hachoir-core-1.3.3/hachoir_core/bits.py ---
"""
Utilities to convert integers and binary strings to binary (number),
binary string, number, hexadecimal, etc.
""" from hachoir_core.endian import BIG_ENDIAN, LITTLE_ENDIAN from hachoir_core.compatibility import reversed from itertools import chain, repeat from struct import calcsize, unpack, error as struct_error def swap16(value): """ Swap byte between big and little endian of a 16 bits integer. >>> "%x" % swap16(0x1234) '3412' """ return (value & 0xFF) << 8 | (value >> 8) def swap32(value): """ Swap byte between big and little endian of a 32 bits integer. >>> "%x" % swap32(0x12345678) '78563412' """ value = long(value) return ((value & 0x000000FFL) << 24) \ | ((value & 0x0000FF00L) << 8) \ | ((value & 0x00FF0000L) >> 8) \ | ((value & 0xFF000000L) >> 24) def bin2long(text, endian): """ Convert binary number written in a string into an integer. Skip characters differents than "0" and "1". >>> bin2long("110", BIG_ENDIAN) 6 >>> bin2long("110", LITTLE_ENDIAN) 3 >>> bin2long("11 00", LITTLE_ENDIAN) 3 """ assert endian in (LITTLE_ENDIAN, BIG_ENDIAN) bits = [ (ord(character)-ord("0")) \ for character in text if character in "01" ] assert len(bits) != 0 if endian is not BIG_ENDIAN: bits = reversed(bits) value = 0 for bit in bits: value *= 2 value += bit return value def str2hex(value, prefix="", glue=u"", format="%02X"): r""" Convert binary string in hexadecimal (base 16). >>> str2hex("ABC") u'414243' >>> str2hex("\xF0\xAF", glue=" ") u'F0 AF' >>> str2hex("ABC", prefix="0x") u'0x414243' >>> str2hex("ABC", format=r"\x%02X") u'\\x41\\x42\\x43' """ if isinstance(glue, str): glue = unicode(glue) if 0 < len(prefix): text = [prefix] else: text = [] for character in value: text.append(format % ord(character)) return glue.join(text) def countBits(value): """ Count number of bits needed to store a (positive) integer number. 
>>> countBits(0) 1 >>> countBits(1000) 10 >>> countBits(44100) 16 >>> countBits(18446744073709551615) 64 """ assert 0 <= value count = 1 bits = 1 while (1 << bits) <= value: count += bits value >>= bits bits <<= 1 while 2 <= value: if bits != 1: bits >>= 1 else: bits -= 1 while (1 << bits) <= value: count += bits value >>= bits return count def byte2bin(number, classic_mode=True): """ Convert a byte (integer in 0..255 range) to a binary string. If classic_mode is true (default value), reverse bits. >>> byte2bin(10) '00001010' >>> byte2bin(10, False) '01010000' """ text = "" for i in range(0, 8): if classic_mode: mask = 1 << (7-i) else: mask = 1 << i if (number & mask) == mask: text += "1" else: text += "0" return text def long2raw(value, endian, size=None): r""" Convert a number (positive and not nul) to a raw string. If size is given, add nul bytes to fill to size bytes. >>> long2raw(0x1219, BIG_ENDIAN) '\x12\x19' >>> long2raw(0x1219, BIG_ENDIAN, 4) # 32 bits '\x00\x00\x12\x19' >>> long2raw(0x1219, LITTLE_ENDIAN, 4) # 32 bits '\x19\x12\x00\x00' """ assert (not size and 0 < value) or (0 <= value) assert endian in (LITTLE_ENDIAN, BIG_ENDIAN) text = [] while (value != 0 or text == ""): byte = value % 256 text.append( chr(byte) ) value >>= 8 if size: need = max(size - len(text), 0) else: need = 0 if need: if endian is BIG_ENDIAN: text = chain(repeat("\0", need), reversed(text)) else: text = chain(text, repeat("\0", need)) else: if endian is BIG_ENDIAN: text = reversed(text) return "".join(text) def long2bin(size, value, endian, classic_mode=False): """ Convert a number into bits (in a string): - size: size in bits of the number - value: positive (or nul) number - endian: BIG_ENDIAN (most important bit first) or LITTLE_ENDIAN (least important bit first) - classic_mode (default: False): reverse each packet of 8 bits >>> long2bin(16, 1+4 + (1+8)*256, BIG_ENDIAN) '10100000 10010000' >>> long2bin(16, 1+4 + (1+8)*256, BIG_ENDIAN, True) '00000101 00001001' >>> long2bin(16, 
1+4 + (1+8)*256, LITTLE_ENDIAN) '00001001 00000101' >>> long2bin(16, 1+4 + (1+8)*256, LITTLE_ENDIAN, True) '10010000 10100000' """ text = "" assert endian in (LITTLE_ENDIAN, BIG_ENDIAN) assert 0 <= value for index in xrange(size): if (value & 1) == 1: text += "1" else: text += "0" value >>= 1 if endian is LITTLE_ENDIAN: text = text[::-1] result = "" while len(text) != 0: if len(result) != 0: result += " " if classic_mode: result += text[7::-1] else: result += text[:8] text = text[8:] return result def str2bin(value, classic_mode=True): r""" Convert binary string to binary numbers. If classic_mode is true (default value), reverse bits. >>> str2bin("\x03\xFF") '00000011 11111111' >>> str2bin("\x03\xFF", False) '11000000 11111111' """ text = "" for character in value: if text != "": text += " " byte = ord(character) text += byte2bin(byte, classic_mode) return text def _createStructFormat(): """ Create a dictionnary (endian, size_byte) => struct format used by str2long() to convert raw data to positive integer. """ format = { BIG_ENDIAN: {}, LITTLE_ENDIAN: {}, } for struct_format in "BHILQ": try: size = calcsize(struct_format) format[BIG_ENDIAN][size] = '>%s' % struct_format format[LITTLE_ENDIAN][size] = '<%s' % struct_format except struct_error: pass return format _struct_format = _createStructFormat() def str2long(data, endian): r""" Convert a raw data (type 'str') into a long integer. 
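str2long() builds a table mapping (endianness, byte size) to a struct format once, so common sizes (1/2/4/8 bytes) are converted with a single unpack() call, falling back to shift-and-add for odd sizes. The same idea can be sketched in Python 3, where iterating over bytes yields integers; the BIG_ENDIAN/LITTLE_ENDIAN sentinels below are local stand-ins for the constants from hachoir_core.endian (an assumption — only their identity matters here):

```python
from struct import calcsize, unpack, error as struct_error

# Stand-ins for hachoir_core.endian's constants (any two distinct values work)
BIG_ENDIAN, LITTLE_ENDIAN = "big", "little"

def _create_struct_format():
    # Map (endian, byte size) -> struct format, e.g. (BIG_ENDIAN, 4) -> '>I'
    fmt = {BIG_ENDIAN: {}, LITTLE_ENDIAN: {}}
    for code in "BHILQ":
        try:
            size = calcsize(code)
        except struct_error:
            continue
        fmt[BIG_ENDIAN][size] = '>' + code
        fmt[LITTLE_ENDIAN][size] = '<' + code
    return fmt

_STRUCT_FORMAT = _create_struct_format()

def str2long(data, endian):
    """Convert raw bytes into an unsigned integer."""
    assert 1 <= len(data) <= 32   # arbitrary limit: 256 bits
    try:
        # Fast path: one struct.unpack() call for common sizes
        return unpack(_STRUCT_FORMAT[endian][len(data)], data)[0]
    except KeyError:
        pass
    # Slow path: accumulate bytes, least significant byte first
    if endian is BIG_ENDIAN:
        data = reversed(data)
    value = shift = 0
    for byte in data:             # iterating bytes gives ints in Python 3
        value += byte << shift
        shift += 8
    return value
```

In new Python 3 code both paths can be replaced by `int.from_bytes(data, 'big')`; the table trick mainly mattered for Python 2 performance.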
>>> chr(str2long('*', BIG_ENDIAN)) '*' >>> str2long("\x00\x01\x02\x03", BIG_ENDIAN) == 0x10203 True >>> str2long("\x2a\x10", LITTLE_ENDIAN) == 0x102a True >>> str2long("\xff\x14\x2a\x10", BIG_ENDIAN) == 0xff142a10 True >>> str2long("\x00\x01\x02\x03", LITTLE_ENDIAN) == 0x3020100 True >>> str2long("\xff\x14\x2a\x10\xab\x00\xd9\x0e", BIG_ENDIAN) == 0xff142a10ab00d90e True >>> str2long("\xff\xff\xff\xff\xff\xff\xff\xff", BIG_ENDIAN) == (2**64-1) True """ assert 1 <= len(data) <= 32 # arbitrary limit: 256 bits try: return unpack(_struct_format[endian][len(data)], data)[0] except KeyError: pass assert endian in (BIG_ENDIAN, LITTLE_ENDIAN) shift = 0 value = 0 if endian is BIG_ENDIAN: data = reversed(data) for character in data: byte = ord(character) value += (byte << shift) shift += 8 return value hachoir-core-1.3.3/hachoir_core/version.py0000644000175000017500000000020411330151751017560 0ustar haypohaypoPACKAGE = "hachoir-core" VERSION = "1.3.3" WEBSITE = 'http://bitbucket.org/haypo/hachoir/wiki/hachoir-core' LICENSE = 'GNU GPL v2' hachoir-core-1.3.3/hachoir_core/i18n.py0000644000175000017500000001414111342011260016650 0ustar haypohaypo# -*- coding: UTF-8 -*- """ Functions to manage internationalisation (i18n): - initLocale(): setup locales and install Unicode compatible stdout and stderr ; - getTerminalCharset(): guess terminal charset ; - gettext(text) translate a string to current language. The function always returns Unicode string. You can also use the alias: _() ; - ngettext(singular, plural, count): translate a sentence with singular and plural form. The function always returns Unicode string. WARNING: Loading this module indirectly calls initLocale() which sets locale LC_ALL to ''. This is needed to get user preferred locale settings. 
""" import hachoir_core.config as config import hachoir_core import locale from os import path import sys from codecs import BOM_UTF8, BOM_UTF16_LE, BOM_UTF16_BE def _getTerminalCharset(): """ Function used by getTerminalCharset() to get terminal charset. @see getTerminalCharset() """ # (1) Try locale.getpreferredencoding() try: charset = locale.getpreferredencoding() if charset: return charset except (locale.Error, AttributeError): pass # (2) Try locale.nl_langinfo(CODESET) try: charset = locale.nl_langinfo(locale.CODESET) if charset: return charset except (locale.Error, AttributeError): pass # (3) Try sys.stdout.encoding if hasattr(sys.stdout, "encoding") and sys.stdout.encoding: return sys.stdout.encoding # (4) Otherwise, returns "ASCII" return "ASCII" def getTerminalCharset(): """ Guess terminal charset using differents tests: 1. Try locale.getpreferredencoding() 2. Try locale.nl_langinfo(CODESET) 3. Try sys.stdout.encoding 4. Otherwise, returns "ASCII" WARNING: Call initLocale() before calling this function. 
""" try: return getTerminalCharset.value except AttributeError: getTerminalCharset.value = _getTerminalCharset() return getTerminalCharset.value class UnicodeStdout(object): def __init__(self, old_device, charset): self.device = old_device self.charset = charset def flush(self): self.device.flush() def write(self, text): if isinstance(text, unicode): text = text.encode(self.charset, 'replace') self.device.write(text) def writelines(self, lines): for text in lines: self.write(text) def initLocale(): # Only initialize locale once if initLocale.is_done: return getTerminalCharset() initLocale.is_done = True # Setup locales try: locale.setlocale(locale.LC_ALL, "") except (locale.Error, IOError): pass # Get the terminal charset charset = getTerminalCharset() # UnicodeStdout conflicts with the readline module if config.unicode_stdout and ('readline' not in sys.modules): # Replace stdout and stderr by unicode objet supporting unicode string sys.stdout = UnicodeStdout(sys.stdout, charset) sys.stderr = UnicodeStdout(sys.stderr, charset) return charset initLocale.is_done = False def _dummy_gettext(text): return unicode(text) def _dummy_ngettext(singular, plural, count): if 1 < abs(count) or not count: return unicode(plural) else: return unicode(singular) def _initGettext(): charset = initLocale() # Try to load gettext module if config.use_i18n: try: import gettext ok = True except ImportError: ok = False else: ok = False # gettext is not available or not needed: use dummy gettext functions if not ok: return (_dummy_gettext, _dummy_ngettext) # Gettext variables package = hachoir_core.PACKAGE locale_dir = path.join(path.dirname(__file__), "..", "locale") # Initialize gettext module gettext.bindtextdomain(package, locale_dir) gettext.textdomain(package) translate = gettext.gettext ngettext = gettext.ngettext # TODO: translate_unicode lambda function really sucks! 
# => find native function to do that unicode_gettext = lambda text: \ unicode(translate(text), charset) unicode_ngettext = lambda singular, plural, count: \ unicode(ngettext(singular, plural, count), charset) return (unicode_gettext, unicode_ngettext) UTF_BOMS = ( (BOM_UTF8, "UTF-8"), (BOM_UTF16_LE, "UTF-16-LE"), (BOM_UTF16_BE, "UTF-16-BE"), ) # Set of valid characters for specific charset CHARSET_CHARACTERS = ( # U+00E0: LATIN SMALL LETTER A WITH GRAVE (set(u"©®éêè\xE0ç".encode("ISO-8859-1")), "ISO-8859-1"), (set(u"©®éêè\xE0ç€".encode("ISO-8859-15")), "ISO-8859-15"), (set(u"©®".encode("MacRoman")), "MacRoman"), (set(u"εδηιθκμοΡσςυΈί".encode("ISO-8859-7")), "ISO-8859-7"), ) def guessBytesCharset(bytes, default=None): r""" >>> guessBytesCharset("abc") 'ASCII' >>> guessBytesCharset("\xEF\xBB\xBFabc") 'UTF-8' >>> guessBytesCharset("abc\xC3\xA9") 'UTF-8' >>> guessBytesCharset("File written by Adobe Photoshop\xA8 4.0\0") 'MacRoman' >>> guessBytesCharset("\xE9l\xE9phant") 'ISO-8859-1' >>> guessBytesCharset("100 \xA4") 'ISO-8859-15' >>> guessBytesCharset('Word \xb8\xea\xe4\xef\xf3\xe7 - Microsoft Outlook 97 - \xd1\xf5\xe8\xec\xdf\xf3\xe5\xe9\xf2 e-mail') 'ISO-8859-7' """ # Check for UTF BOM for bom_bytes, charset in UTF_BOMS: if bytes.startswith(bom_bytes): return charset # Pure ASCII? try: text = unicode(bytes, 'ASCII', 'strict') return 'ASCII' except UnicodeDecodeError: pass # Valid UTF-8? 
try: text = unicode(bytes, 'UTF-8', 'strict') return 'UTF-8' except UnicodeDecodeError: pass # Create a set of non-ASCII characters non_ascii_set = set( byte for byte in bytes if ord(byte) >= 128 ) for characters, charset in CHARSET_CHARACTERS: if characters.issuperset(non_ascii_set): return charset return default # Initialize _(), gettext() and ngettext() functions gettext, ngettext = _initGettext() _ = gettext hachoir-core-1.3.3/hachoir_core/log.py0000644000175000017500000001021311251277274016670 0ustar haypohaypoimport os, sys, time import hachoir_core.config as config from hachoir_core.i18n import _ class Log: LOG_INFO = 0 LOG_WARN = 1 LOG_ERROR = 2 level_name = { LOG_WARN: "[warn]", LOG_ERROR: "[err!]", LOG_INFO: "[info]" } def __init__(self): self.__buffer = {} self.__file = None self.use_print = True self.use_buffer = False self.on_new_message = None # Prototype: def func(level, prefix, text, context) def shutdown(self): if self.__file: self._writeIntoFile(_("Stop Hachoir")) def setFilename(self, filename, append=True): """ Use a file to store all messages. The UTF-8 encoding will be used. Write an informative message if the file can't be created. 
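guessBytesCharset() layers cheap tests in order: a BOM prefix, a strict ASCII decode, a strict UTF-8 decode, and only then the per-charset character-set heuristics. A Python 3 sketch of the first three layers (bytes objects instead of str; the CHARSET_CHARACTERS fallback is omitted, so this helper — a renamed illustration, not the library function — just returns the default):

```python
from codecs import BOM_UTF8, BOM_UTF16_LE, BOM_UTF16_BE

UTF_BOMS = (
    (BOM_UTF8, "UTF-8"),
    (BOM_UTF16_LE, "UTF-16-LE"),
    (BOM_UTF16_BE, "UTF-16-BE"),
)

def guess_bytes_charset(data, default=None):
    # (1) A BOM prefix identifies the charset unambiguously
    for bom, charset in UTF_BOMS:
        if data.startswith(bom):
            return charset
    # (2) Pure ASCII, then (3) valid UTF-8: try a strict decode
    for charset in ("ASCII", "UTF-8"):
        try:
            data.decode(charset)
            return charset
        except UnicodeDecodeError:
            pass
    # The real function falls back to per-charset character-set
    # heuristics (ISO-8859-1/15, MacRoman, ISO-8859-7) here.
    return default
```

The ordering matters: a UTF-8 BOM would also decode as valid UTF-8, but checking the BOM first lets the caller distinguish "declared UTF-8" from "happens to be valid UTF-8".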
@param filename: C{L{string}} """ # Look if file already exists or not filename = os.path.expanduser(filename) filename = os.path.realpath(filename) append = os.access(filename, os.F_OK) # Create log file (or open it in append mode, if it already exists) try: import codecs if append: self.__file = codecs.open(filename, "a", "utf-8") else: self.__file = codecs.open(filename, "w", "utf-8") self._writeIntoFile(_("Starting Hachoir")) except IOError, err: if err.errno == 2: self.__file = None self.info(_("[Log] setFilename(%s) fails: no such file") % filename) else: raise def _writeIntoFile(self, message): timestamp = time.strftime("%Y-%m-%d %H:%M:%S") self.__file.write(u"%s - %s\n" % (timestamp, message)) self.__file.flush() def newMessage(self, level, text, ctxt=None): """ Write a new message : append it in the buffer, display it to the screen (if needed), and write it in the log file (if needed). @param level: Message level. @type level: C{int} @param text: Message content. @type text: C{str} @param ctxt: The caller instance. """ if level < self.LOG_ERROR and config.quiet or \ level <= self.LOG_INFO and not config.verbose: return if config.debug: from hachoir_core.error import getBacktrace backtrace = getBacktrace(None) if backtrace: text += "\n\n" + backtrace _text = text if hasattr(ctxt, "_logger"): _ctxt = ctxt._logger() if _ctxt is not None: text = "[%s] %s" % (_ctxt, text) # Add message to log buffer if self.use_buffer: if not self.__buffer.has_key(level): self.__buffer[level] = [text] else: self.__buffer[level].append(text) # Add prefix prefix = self.level_name.get(level, "[info]") # Display on stdout (if used) if self.use_print: sys.stdout.flush() sys.stderr.write("%s %s\n" % (prefix, text)) sys.stderr.flush() # Write into outfile (if used) if self.__file: self._writeIntoFile("%s %s" % (prefix, text)) # Use callback (if used) if self.on_new_message: self.on_new_message (level, prefix, _text, ctxt) def info(self, text): """ New informative message. 
@type text: C{str} """ self.newMessage(Log.LOG_INFO, text) def warning(self, text): """ New warning message. @type text: C{str} """ self.newMessage(Log.LOG_WARN, text) def error(self, text): """ New error message. @type text: C{str} """ self.newMessage(Log.LOG_ERROR, text) log = Log() class Logger(object): def _logger(self): return "<%s>" % self.__class__.__name__ def info(self, text): log.newMessage(Log.LOG_INFO, text, self) def warning(self, text): log.newMessage(Log.LOG_WARN, text, self) def error(self, text): log.newMessage(Log.LOG_ERROR, text, self) hachoir-core-1.3.3/hachoir_core/benchmark.py0000644000175000017500000001520011251277274020042 0ustar haypohaypofrom hachoir_core.tools import humanDurationNanosec from hachoir_core.i18n import _ from math import floor from time import time class BenchmarkError(Exception): """ Error during benchmark, use str(err) to format it as string. """ def __init__(self, message): Exception.__init__(self, "Benchmark internal error: %s" % message) class BenchmarkStat: """ Benchmark statistics. This class automatically computes minimum value, maximum value and sum of all values. Methods: - append(value): append a value - getMin(): minimum value - getMax(): maximum value - getSum(): sum of all values - __len__(): get number of elements - __nonzero__(): isn't empty? """ def __init__(self): self._values = [] def append(self, value): self._values.append(value) try: self._min = min(self._min, value) self._max = max(self._max, value) self._sum += value except AttributeError: self._min = value self._max = value self._sum = value def __len__(self): return len(self._values) def __nonzero__(self): return bool(self._values) def getMin(self): return self._min def getMax(self): return self._max def getSum(self): return self._sum class Benchmark: def __init__(self, max_time=5.0, min_count=5, max_count=None, progress_time=1.0): """ Constructor: - max_time: Maximum wanted duration of the whole benchmark (default: 5 seconds, minimum: 1 second). 
- min_count: Minimum number of function calls to get good statistics (defaut: 5, minimum: 1). - progress_time: Time between each "progress" message (default: 1 second, minimum: 250 ms). - max_count: Maximum number of function calls (default: no limit). - verbose: Is verbose? (default: False) - disable_gc: Disable garbage collector? (default: False) """ self.max_time = max(max_time, 1.0) self.min_count = max(min_count, 1) self.max_count = max_count self.progress_time = max(progress_time, 0.25) self.verbose = False self.disable_gc = False def formatTime(self, value): """ Format a time delta to string: use humanDurationNanosec() """ return humanDurationNanosec(value * 1000000000) def displayStat(self, stat): """ Display statistics to stdout: - best time (minimum) - average time (arithmetic average) - worst time (maximum) - total time (sum) Use arithmetic avertage instead of geometric average because geometric fails if any value is zero (returns zero) and also because floating point multiplication lose precision with many values. """ average = stat.getSum() / len(stat) values = (stat.getMin(), average, stat.getMax(), stat.getSum()) values = tuple(self.formatTime(value) for value in values) print _("Benchmark: best=%s average=%s worst=%s total=%s") \ % values def _runOnce(self, func, args, kw): before = time() func(*args, **kw) after = time() return after - before def _run(self, func, args, kw): """ Call func(*args, **kw) as many times as needed to get good statistics. Algorithm: - call the function once - compute needed number of calls - and then call function N times To compute number of calls, parameters are: - time of first function call - minimum number of calls (min_count attribute) - maximum test time (max_time attribute) Notice: The function will approximate number of calls. 
""" # First call of the benchmark stat = BenchmarkStat() diff = self._runOnce(func, args, kw) best = diff stat.append(diff) total_time = diff # Compute needed number of calls count = int(floor(self.max_time / diff)) count = max(count, self.min_count) if self.max_count: count = min(count, self.max_count) # Not other call? Just exit if count == 1: return stat estimate = diff * count if self.verbose: print _("Run benchmark: %s calls (estimate: %s)") \ % (count, self.formatTime(estimate)) display_progress = self.verbose and (1.0 <= estimate) total_count = 1 while total_count < count: # Run benchmark and display each result if display_progress: print _("Result %s/%s: %s (best: %s)") % \ (total_count, count, self.formatTime(diff), self.formatTime(best)) part = count - total_count # Will takes more than one second? average = total_time / total_count if self.progress_time < part * average: part = max( int(self.progress_time / average), 1) for index in xrange(part): diff = self._runOnce(func, args, kw) stat.append(diff) total_time += diff best = min(diff, best) total_count += part if display_progress: print _("Result %s/%s: %s (best: %s)") % \ (count, count, self.formatTime(diff), self.formatTime(best)) return stat def validateStat(self, stat): """ Check statistics and raise a BenchmarkError if they are invalid. Example of tests: reject empty stat, reject stat with only nul values. """ if not stat: raise BenchmarkError("empty statistics") if not stat.getSum(): raise BenchmarkError("nul statistics") def run(self, func, *args, **kw): """ Run function func(*args, **kw), validate statistics, and display the result on stdout. Disable garbage collector if asked too. 
""" # Disable garbarge collector is needed and if it does exist # (Jython 2.2 don't have it for example) if self.disable_gc: try: import gc except ImportError: self.disable_gc = False if self.disable_gc: gc_enabled = gc.isenabled() gc.disable() else: gc_enabled = False # Run the benchmark stat = self._run(func, args, kw) if gc_enabled: gc.enable() # Validate and display stats self.validateStat(stat) self.displayStat(stat) hachoir-core-1.3.3/hachoir_core/iso639.py0000644000175000017500000004416011251277274017153 0ustar haypohaypo# -*- coding: utf-8 -*- """ ISO639-2 standart: the module only contains the dictionary ISO639_2 which maps a language code in three letters (eg. "fre") to a language name in english (eg. "French"). """ # ISO-639, the list comes from: # http://www.loc.gov/standards/iso639-2/php/English_list.php _ISO639 = ( (u"Abkhazian", "abk", "ab"), (u"Achinese", "ace", None), (u"Acoli", "ach", None), (u"Adangme", "ada", None), (u"Adygei", "ady", None), (u"Adyghe", "ady", None), (u"Afar", "aar", "aa"), (u"Afrihili", "afh", None), (u"Afrikaans", "afr", "af"), (u"Afro-Asiatic (Other)", "afa", None), (u"Ainu", "ain", None), (u"Akan", "aka", "ak"), (u"Akkadian", "akk", None), (u"Albanian", "alb/sqi", "sq"), (u"Alemani", "gsw", None), (u"Aleut", "ale", None), (u"Algonquian languages", "alg", None), (u"Altaic (Other)", "tut", None), (u"Amharic", "amh", "am"), (u"Angika", "anp", None), (u"Apache languages", "apa", None), (u"Arabic", "ara", "ar"), (u"Aragonese", "arg", "an"), (u"Aramaic", "arc", None), (u"Arapaho", "arp", None), (u"Araucanian", "arn", None), (u"Arawak", "arw", None), (u"Armenian", "arm/hye", "hy"), (u"Aromanian", "rup", None), (u"Artificial (Other)", "art", None), (u"Arumanian", "rup", None), (u"Assamese", "asm", "as"), (u"Asturian", "ast", None), (u"Athapascan languages", "ath", None), (u"Australian languages", "aus", None), (u"Austronesian (Other)", "map", None), (u"Avaric", "ava", "av"), (u"Avestan", "ave", "ae"), (u"Awadhi", "awa", None), 
(u"Aymara", "aym", "ay"), (u"Azerbaijani", "aze", "az"), (u"Bable", "ast", None), (u"Balinese", "ban", None), (u"Baltic (Other)", "bat", None), (u"Baluchi", "bal", None), (u"Bambara", "bam", "bm"), (u"Bamileke languages", "bai", None), (u"Banda", "bad", None), (u"Bantu (Other)", "bnt", None), (u"Basa", "bas", None), (u"Bashkir", "bak", "ba"), (u"Basque", "baq/eus", "eu"), (u"Batak (Indonesia)", "btk", None), (u"Beja", "bej", None), (u"Belarusian", "bel", "be"), (u"Bemba", "bem", None), (u"Bengali", "ben", "bn"), (u"Berber (Other)", "ber", None), (u"Bhojpuri", "bho", None), (u"Bihari", "bih", "bh"), (u"Bikol", "bik", None), (u"Bilin", "byn", None), (u"Bini", "bin", None), (u"Bislama", "bis", "bi"), (u"Blin", "byn", None), (u"Bokmål, Norwegian", "nob", "nb"), (u"Bosnian", "bos", "bs"), (u"Braj", "bra", None), (u"Breton", "bre", "br"), (u"Buginese", "bug", None), (u"Bulgarian", "bul", "bg"), (u"Buriat", "bua", None), (u"Burmese", "bur/mya", "my"), (u"Caddo", "cad", None), (u"Carib", "car", None), (u"Castilian", "spa", "es"), (u"Catalan", "cat", "ca"), (u"Caucasian (Other)", "cau", None), (u"Cebuano", "ceb", None), (u"Celtic (Other)", "cel", None), (u"Central American Indian (Other)", "cai", None), (u"Chagatai", "chg", None), (u"Chamic languages", "cmc", None), (u"Chamorro", "cha", "ch"), (u"Chechen", "che", "ce"), (u"Cherokee", "chr", None), (u"Chewa", "nya", "ny"), (u"Cheyenne", "chy", None), (u"Chibcha", "chb", None), (u"Chichewa", "nya", "ny"), (u"Chinese", "chi/zho", "zh"), (u"Chinook jargon", "chn", None), (u"Chipewyan", "chp", None), (u"Choctaw", "cho", None), (u"Chuang", "zha", "za"), (u"Church Slavic", "chu", "cu"), (u"Church Slavonic", "chu", "cu"), (u"Chuukese", "chk", None), (u"Chuvash", "chv", "cv"), (u"Classical Nepal Bhasa", "nwc", None), (u"Classical Newari", "nwc", None), (u"Coptic", "cop", None), (u"Cornish", "cor", "kw"), (u"Corsican", "cos", "co"), (u"Cree", "cre", "cr"), (u"Creek", "mus", None), (u"Creoles and pidgins (Other)", "crp", None), 
(u"Creoles and pidgins, English based (Other)", "cpe", None), (u"Creoles and pidgins, French-based (Other)", "cpf", None), (u"Creoles and pidgins, Portuguese-based (Other)", "cpp", None), (u"Crimean Tatar", "crh", None), (u"Crimean Turkish", "crh", None), (u"Croatian", "scr/hrv", "hr"), (u"Cushitic (Other)", "cus", None), (u"Czech", "cze/ces", "cs"), (u"Dakota", "dak", None), (u"Danish", "dan", "da"), (u"Dargwa", "dar", None), (u"Dayak", "day", None), (u"Delaware", "del", None), (u"Dhivehi", "div", "dv"), (u"Dimili", "zza", None), (u"Dimli", "zza", None), (u"Dinka", "din", None), (u"Divehi", "div", "dv"), (u"Dogri", "doi", None), (u"Dogrib", "dgr", None), (u"Dravidian (Other)", "dra", None), (u"Duala", "dua", None), (u"Dutch", "dut/nld", "nl"), (u"Dutch, Middle (ca.1050-1350)", "dum", None), (u"Dyula", "dyu", None), (u"Dzongkha", "dzo", "dz"), (u"Eastern Frisian", "frs", None), (u"Efik", "efi", None), (u"Egyptian (Ancient)", "egy", None), (u"Ekajuk", "eka", None), (u"Elamite", "elx", None), (u"English", "eng", "en"), (u"English, Middle (1100-1500)", "enm", None), (u"English, Old (ca.450-1100)", "ang", None), (u"Erzya", "myv", None), (u"Esperanto", "epo", "eo"), (u"Estonian", "est", "et"), (u"Ewe", "ewe", "ee"), (u"Ewondo", "ewo", None), (u"Fang", "fan", None), (u"Fanti", "fat", None), (u"Faroese", "fao", "fo"), (u"Fijian", "fij", "fj"), (u"Filipino", "fil", None), (u"Finnish", "fin", "fi"), (u"Finno-Ugrian (Other)", "fiu", None), (u"Flemish", "dut/nld", "nl"), (u"Fon", "fon", None), (u"French", "fre/fra", "fr"), (u"French, Middle (ca.1400-1600)", "frm", None), (u"French, Old (842-ca.1400)", "fro", None), (u"Friulian", "fur", None), (u"Fulah", "ful", "ff"), (u"Ga", "gaa", None), (u"Gaelic", "gla", "gd"), (u"Galician", "glg", "gl"), (u"Ganda", "lug", "lg"), (u"Gayo", "gay", None), (u"Gbaya", "gba", None), (u"Geez", "gez", None), (u"Georgian", "geo/kat", "ka"), (u"German", "ger/deu", "de"), (u"German, Low", "nds", None), (u"German, Middle High (ca.1050-1500)", "gmh", 
None), (u"German, Old High (ca.750-1050)", "goh", None), (u"Germanic (Other)", "gem", None), (u"Gikuyu", "kik", "ki"), (u"Gilbertese", "gil", None), (u"Gondi", "gon", None), (u"Gorontalo", "gor", None), (u"Gothic", "got", None), (u"Grebo", "grb", None), (u"Greek, Ancient (to 1453)", "grc", None), (u"Greek, Modern (1453-)", "gre/ell", "el"), (u"Greenlandic", "kal", "kl"), (u"Guarani", "grn", "gn"), (u"Gujarati", "guj", "gu"), (u"Gwich´in", "gwi", None), (u"Haida", "hai", None), (u"Haitian", "hat", "ht"), (u"Haitian Creole", "hat", "ht"), (u"Hausa", "hau", "ha"), (u"Hawaiian", "haw", None), (u"Hebrew", "heb", "he"), (u"Herero", "her", "hz"), (u"Hiligaynon", "hil", None), (u"Himachali", "him", None), (u"Hindi", "hin", "hi"), (u"Hiri Motu", "hmo", "ho"), (u"Hittite", "hit", None), (u"Hmong", "hmn", None), (u"Hungarian", "hun", "hu"), (u"Hupa", "hup", None), (u"Iban", "iba", None), (u"Icelandic", "ice/isl", "is"), (u"Ido", "ido", "io"), (u"Igbo", "ibo", "ig"), (u"Ijo", "ijo", None), (u"Iloko", "ilo", None), (u"Inari Sami", "smn", None), (u"Indic (Other)", "inc", None), (u"Indo-European (Other)", "ine", None), (u"Indonesian", "ind", "id"), (u"Ingush", "inh", None), (u"Interlingua", "ina", "ia"), (u"Interlingue", "ile", "ie"), (u"Inuktitut", "iku", "iu"), (u"Inupiaq", "ipk", "ik"), (u"Iranian (Other)", "ira", None), (u"Irish", "gle", "ga"), (u"Irish, Middle (900-1200)", "mga", None), (u"Irish, Old (to 900)", "sga", None), (u"Iroquoian languages", "iro", None), (u"Italian", "ita", "it"), (u"Japanese", "jpn", "ja"), (u"Javanese", "jav", "jv"), (u"Judeo-Arabic", "jrb", None), (u"Judeo-Persian", "jpr", None), (u"Kabardian", "kbd", None), (u"Kabyle", "kab", None), (u"Kachin", "kac", None), (u"Kalaallisut", "kal", "kl"), (u"Kalmyk", "xal", None), (u"Kamba", "kam", None), (u"Kannada", "kan", "kn"), (u"Kanuri", "kau", "kr"), (u"Kara-Kalpak", "kaa", None), (u"Karachay-Balkar", "krc", None), (u"Karelian", "krl", None), (u"Karen", "kar", None), (u"Kashmiri", "kas", "ks"), 
(u"Kashubian", "csb", None), (u"Kawi", "kaw", None), (u"Kazakh", "kaz", "kk"), (u"Khasi", "kha", None), (u"Khmer", "khm", "km"), (u"Khoisan (Other)", "khi", None), (u"Khotanese", "kho", None), (u"Kikuyu", "kik", "ki"), (u"Kimbundu", "kmb", None), (u"Kinyarwanda", "kin", "rw"), (u"Kirdki", "zza", None), (u"Kirghiz", "kir", "ky"), (u"Kirmanjki", "zza", None), (u"Klingon", "tlh", None), (u"Komi", "kom", "kv"), (u"Kongo", "kon", "kg"), (u"Konkani", "kok", None), (u"Korean", "kor", "ko"), (u"Kosraean", "kos", None), (u"Kpelle", "kpe", None), (u"Kru", "kro", None), (u"Kuanyama", "kua", "kj"), (u"Kumyk", "kum", None), (u"Kurdish", "kur", "ku"), (u"Kurukh", "kru", None), (u"Kutenai", "kut", None), (u"Kwanyama", "kua", "kj"), (u"Ladino", "lad", None), (u"Lahnda", "lah", None), (u"Lamba", "lam", None), (u"Lao", "lao", "lo"), (u"Latin", "lat", "la"), (u"Latvian", "lav", "lv"), (u"Letzeburgesch", "ltz", "lb"), (u"Lezghian", "lez", None), (u"Limburgan", "lim", "li"), (u"Limburger", "lim", "li"), (u"Limburgish", "lim", "li"), (u"Lingala", "lin", "ln"), (u"Lithuanian", "lit", "lt"), (u"Lojban", "jbo", None), (u"Low German", "nds", None), (u"Low Saxon", "nds", None), (u"Lower Sorbian", "dsb", None), (u"Lozi", "loz", None), (u"Luba-Katanga", "lub", "lu"), (u"Luba-Lulua", "lua", None), (u"Luiseno", "lui", None), (u"Lule Sami", "smj", None), (u"Lunda", "lun", None), (u"Luo (Kenya and Tanzania)", "luo", None), (u"Lushai", "lus", None), (u"Luxembourgish", "ltz", "lb"), (u"Macedo-Romanian", "rup", None), (u"Macedonian", "mac/mkd", "mk"), (u"Madurese", "mad", None), (u"Magahi", "mag", None), (u"Maithili", "mai", None), (u"Makasar", "mak", None), (u"Malagasy", "mlg", "mg"), (u"Malay", "may/msa", "ms"), (u"Malayalam", "mal", "ml"), (u"Maldivian", "div", "dv"), (u"Maltese", "mlt", "mt"), (u"Manchu", "mnc", None), (u"Mandar", "mdr", None), (u"Mandingo", "man", None), (u"Manipuri", "mni", None), (u"Manobo languages", "mno", None), (u"Manx", "glv", "gv"), (u"Maori", "mao/mri", "mi"), 
(u"Marathi", "mar", "mr"), (u"Mari", "chm", None), (u"Marshallese", "mah", "mh"), (u"Marwari", "mwr", None), (u"Masai", "mas", None), (u"Mayan languages", "myn", None), (u"Mende", "men", None), (u"Mi'kmaq", "mic", None), (u"Micmac", "mic", None), (u"Minangkabau", "min", None), (u"Mirandese", "mwl", None), (u"Miscellaneous languages", "mis", None), (u"Mohawk", "moh", None), (u"Moksha", "mdf", None), (u"Moldavian", "mol", "mo"), (u"Mon-Khmer (Other)", "mkh", None), (u"Mongo", "lol", None), (u"Mongolian", "mon", "mn"), (u"Mossi", "mos", None), (u"Multiple languages", "mul", None), (u"Munda languages", "mun", None), (u"N'Ko", "nqo", None), (u"Nahuatl", "nah", None), (u"Nauru", "nau", "na"), (u"Navaho", "nav", "nv"), (u"Navajo", "nav", "nv"), (u"Ndebele, North", "nde", "nd"), (u"Ndebele, South", "nbl", "nr"), (u"Ndonga", "ndo", "ng"), (u"Neapolitan", "nap", None), (u"Nepal Bhasa", "new", None), (u"Nepali", "nep", "ne"), (u"Newari", "new", None), (u"Nias", "nia", None), (u"Niger-Kordofanian (Other)", "nic", None), (u"Nilo-Saharan (Other)", "ssa", None), (u"Niuean", "niu", None), (u"No linguistic content", "zxx", None), (u"Nogai", "nog", None), (u"Norse, Old", "non", None), (u"North American Indian", "nai", None), (u"North Ndebele", "nde", "nd"), (u"Northern Frisian", "frr", None), (u"Northern Sami", "sme", "se"), (u"Northern Sotho", "nso", None), (u"Norwegian", "nor", "no"), (u"Norwegian Bokmål", "nob", "nb"), (u"Norwegian Nynorsk", "nno", "nn"), (u"Nubian languages", "nub", None), (u"Nyamwezi", "nym", None), (u"Nyanja", "nya", "ny"), (u"Nyankole", "nyn", None), (u"Nynorsk, Norwegian", "nno", "nn"), (u"Nyoro", "nyo", None), (u"Nzima", "nzi", None), (u"Occitan (post 1500)", "oci", "oc"), (u"Oirat", "xal", None), (u"Ojibwa", "oji", "oj"), (u"Old Bulgarian", "chu", "cu"), (u"Old Church Slavonic", "chu", "cu"), (u"Old Newari", "nwc", None), (u"Old Slavonic", "chu", "cu"), (u"Oriya", "ori", "or"), (u"Oromo", "orm", "om"), (u"Osage", "osa", None), (u"Ossetian", "oss", "os"), 
(u"Ossetic", "oss", "os"), (u"Otomian languages", "oto", None), (u"Pahlavi", "pal", None), (u"Palauan", "pau", None), (u"Pali", "pli", "pi"), (u"Pampanga", "pam", None), (u"Pangasinan", "pag", None), (u"Panjabi", "pan", "pa"), (u"Papiamento", "pap", None), (u"Papuan (Other)", "paa", None), (u"Pedi", "nso", None), (u"Persian", "per/fas", "fa"), (u"Persian, Old (ca.600-400 B.C.)", "peo", None), (u"Philippine (Other)", "phi", None), (u"Phoenician", "phn", None), (u"Pilipino", "fil", None), (u"Pohnpeian", "pon", None), (u"Polish", "pol", "pl"), (u"Portuguese", "por", "pt"), (u"Prakrit languages", "pra", None), (u"Provençal", "oci", "oc"), (u"Provençal, Old (to 1500)", "pro", None), (u"Punjabi", "pan", "pa"), (u"Pushto", "pus", "ps"), (u"Quechua", "que", "qu"), (u"Raeto-Romance", "roh", "rm"), (u"Rajasthani", "raj", None), (u"Rapanui", "rap", None), (u"Rarotongan", "rar", None), (u"Reserved for local use", "qaa/qtz", None), (u"Romance (Other)", "roa", None), (u"Romanian", "rum/ron", "ro"), (u"Romany", "rom", None), (u"Rundi", "run", "rn"), (u"Russian", "rus", "ru"), (u"Salishan languages", "sal", None), (u"Samaritan Aramaic", "sam", None), (u"Sami languages (Other)", "smi", None), (u"Samoan", "smo", "sm"), (u"Sandawe", "sad", None), (u"Sango", "sag", "sg"), (u"Sanskrit", "san", "sa"), (u"Santali", "sat", None), (u"Sardinian", "srd", "sc"), (u"Sasak", "sas", None), (u"Saxon, Low", "nds", None), (u"Scots", "sco", None), (u"Scottish Gaelic", "gla", "gd"), (u"Selkup", "sel", None), (u"Semitic (Other)", "sem", None), (u"Sepedi", "nso", None), (u"Serbian", "scc/srp", "sr"), (u"Serer", "srr", None), (u"Shan", "shn", None), (u"Shona", "sna", "sn"), (u"Sichuan Yi", "iii", "ii"), (u"Sicilian", "scn", None), (u"Sidamo", "sid", None), (u"Sign Languages", "sgn", None), (u"Siksika", "bla", None), (u"Sindhi", "snd", "sd"), (u"Sinhala", "sin", "si"), (u"Sinhalese", "sin", "si"), (u"Sino-Tibetan (Other)", "sit", None), (u"Siouan languages", "sio", None), (u"Skolt Sami", "sms", None), 
(u"Slave (Athapascan)", "den", None), (u"Slavic (Other)", "sla", None), (u"Slovak", "slo/slk", "sk"), (u"Slovenian", "slv", "sl"), (u"Sogdian", "sog", None), (u"Somali", "som", "so"), (u"Songhai", "son", None), (u"Soninke", "snk", None), (u"Sorbian languages", "wen", None), (u"Sotho, Northern", "nso", None), (u"Sotho, Southern", "sot", "st"), (u"South American Indian (Other)", "sai", None), (u"South Ndebele", "nbl", "nr"), (u"Southern Altai", "alt", None), (u"Southern Sami", "sma", None), (u"Spanish", "spa", "es"), (u"Sranan Togo", "srn", None), (u"Sukuma", "suk", None), (u"Sumerian", "sux", None), (u"Sundanese", "sun", "su"), (u"Susu", "sus", None), (u"Swahili", "swa", "sw"), (u"Swati", "ssw", "ss"), (u"Swedish", "swe", "sv"), (u"Swiss German", "gsw", None), (u"Syriac", "syr", None), (u"Tagalog", "tgl", "tl"), (u"Tahitian", "tah", "ty"), (u"Tai (Other)", "tai", None), (u"Tajik", "tgk", "tg"), (u"Tamashek", "tmh", None), (u"Tamil", "tam", "ta"), (u"Tatar", "tat", "tt"), (u"Telugu", "tel", "te"), (u"Tereno", "ter", None), (u"Tetum", "tet", None), (u"Thai", "tha", "th"), (u"Tibetan", "tib/bod", "bo"), (u"Tigre", "tig", None), (u"Tigrinya", "tir", "ti"), (u"Timne", "tem", None), (u"Tiv", "tiv", None), (u"tlhIngan-Hol", "tlh", None), (u"Tlingit", "tli", None), (u"Tok Pisin", "tpi", None), (u"Tokelau", "tkl", None), (u"Tonga (Nyasa)", "tog", None), (u"Tonga (Tonga Islands)", "ton", "to"), (u"Tsimshian", "tsi", None), (u"Tsonga", "tso", "ts"), (u"Tswana", "tsn", "tn"), (u"Tumbuka", "tum", None), (u"Tupi languages", "tup", None), (u"Turkish", "tur", "tr"), (u"Turkish, Ottoman (1500-1928)", "ota", None), (u"Turkmen", "tuk", "tk"), (u"Tuvalu", "tvl", None), (u"Tuvinian", "tyv", None), (u"Twi", "twi", "tw"), (u"Udmurt", "udm", None), (u"Ugaritic", "uga", None), (u"Uighur", "uig", "ug"), (u"Ukrainian", "ukr", "uk"), (u"Umbundu", "umb", None), (u"Undetermined", "und", None), (u"Upper Sorbian", "hsb", None), (u"Urdu", "urd", "ur"), (u"Uyghur", "uig", "ug"), (u"Uzbek", "uzb", 
"uz"), (u"Vai", "vai", None), (u"Valencian", "cat", "ca"), (u"Venda", "ven", "ve"), (u"Vietnamese", "vie", "vi"), (u"Volapük", "vol", "vo"), (u"Votic", "vot", None), (u"Wakashan languages", "wak", None), (u"Walamo", "wal", None), (u"Walloon", "wln", "wa"), (u"Waray", "war", None), (u"Washo", "was", None), (u"Welsh", "wel/cym", "cy"), (u"Western Frisian", "fry", "fy"), (u"Wolof", "wol", "wo"), (u"Xhosa", "xho", "xh"), (u"Yakut", "sah", None), (u"Yao", "yao", None), (u"Yapese", "yap", None), (u"Yiddish", "yid", "yi"), (u"Yoruba", "yor", "yo"), (u"Yupik languages", "ypk", None), (u"Zande", "znd", None), (u"Zapotec", "zap", None), (u"Zaza", "zza", None), (u"Zazaki", "zza", None), (u"Zenaga", "zen", None), (u"Zhuang", "zha", "za"), (u"Zulu", "zul", "zu"), (u"Zuni", "zun", None), ) # Bibliographic ISO-639-2 form (eg. "fre" => "French") ISO639_2 = {} for line in _ISO639: for key in line[1].split("/"): ISO639_2[key] = line[0] del _ISO639 hachoir-core-1.3.3/README0000644000175000017500000000220411251300147013753 0ustar haypohaypoHachoir project =============== Hachoir is a Python library used to represent of a binary file as a tree of Python objects. Each object has a type, a value, an address, etc. The goal is to be able to know the meaning of each bit in a file. Why using slow Python code instead of fast hardcoded C code? Hachoir has many interesting features: * Autofix: Hachoir is able to open invalid / truncated files * Lazy: Open a file is very fast since no information is read from file, data are read and/or computed when the user ask for it * Types: Hachoir has many predefined field types (integer, bit, string, etc.) and supports string with charset (ISO-8859-1, UTF-8, UTF-16, ...) * Addresses and sizes are stored in bit, so flags are stored as classic fields * Endian: You have to set endian once, and then number are converted in the right endian * Editor: Using Hachoir representation of data, you can edit, insert, remove data and then save in a new file. 
Website: http://bitbucket.org/haypo/hachoir/wiki/hachoir-core

Installation
============

To install, use setup.py or see:
http://bitbucket.org/haypo/hachoir/wiki/Install

hachoir-core-1.3.3/ChangeLog0000644000175000017500000000716011342040274014656 0ustar haypohaypo
hachoir-core 1.3.3 (2010-02-26)
===============================

* Add a writelines() method to UnicodeStdout

hachoir-core 1.3.2 (2010-01-28)
===============================

* MANIFEST.in now also includes the documentation

hachoir-core 1.3.1 (2010-01-21)
===============================

* Create MANIFEST.in to include the ChangeLog and other files for setup.py

hachoir-core 1.3 (2010-01-20)
=============================

* Add more charsets to GenericString: CP874, WINDOWS-1250, WINDOWS-1251,
  WINDOWS-1254, WINDOWS-1255, WINDOWS-1256, WINDOWS-1257, WINDOWS-1258,
  ISO-8859-16
* Fix initLocale(): return the charset even if config.unicode_stdout is False
* initLocale() leaves sys.stdout and sys.stderr unchanged if the readline
  module is loaded: Hachoir can now be used correctly with ipython
* HachoirError: replace the "message" attribute by "text" to fix Python 2.6
  compatibility (the message attribute is deprecated)
* StaticFieldSet: fix a Python 2.6 warning, object.__new__() takes only one
  argument (the class).
* Fix the GenericFieldSet.readMoreFields() result: don't count the number of
  added fields in a loop; use len() to compare the number of fields
  before/after the operation
* GenericFieldSet.__iter__() supports an iterable result for _fixFeedError()
  and _stopFeeding()
* New seekable field set implementation in
  hachoir_core.field.new_seekable_field_set

hachoir-core 1.2.1 (2008-10)
============================

* Create the configuration option "unicode_stdout", which avoids replacing
  stdout and stderr with objects supporting unicode strings
* Create the TimedeltaWin64 field type
* Support the WINDOWS-1252 and WINDOWS-1253 charsets for GenericString
* guessBytesCharset() now supports ISO-8859-7 (Greek)
* durationWin64() is now deprecated; use TimedeltaWin64 instead

hachoir-core 1.2 (2008-09)
==========================

* Create Field.getFieldType(): describes a field type and gives some useful
  information (eg. the charset for a string)
* Create TimestampUnix64
* GenericString: only guess the charset once; if the charset attribute is not
  set, guess it when the user asks for it.

hachoir-core 1.1 (2008-04-01)
=============================

Main change: string values are always encoded as Unicode. Details:

* Create guessBytesCharset() and guessStreamCharset()
* GenericString.createValue() is now always Unicode: if the charset is not
  specified, try to guess it.
  Otherwise, use the default charset (ISO-8859-1)
* RawBits: add createRawDisplay() to avoid a slowdown on huge fields
* Fix SeekableFieldSet.current_size (use offset and not current_max_size)
* GenericString: fix UTF-16-LE strings with a missing nul byte
* Add a __nonzero__() method to GenericTimestamp
* All stream errors now inherit from StreamError (instead of HachoirError),
  and create an OutputStreamError
* humanDatetime(): strip microseconds by default (add an optional argument
  to keep them)

hachoir-core 1.0 (2007-07-10)
=============================

Version 1.0.1 changelog:

* Rename parser.tags to parser.PARSER_TAGS to be compatible with the future
  hachoir-parser 1.0

Visible changes:

* New field type: TimestampUUID60
* SeekableFieldSet: fix the __getitem__() method and implement the
  __iter__() and __len__() methods, so it can now be used in hachoir-wx
* String value is always Unicode, even on conversion error: use
* OutputStream: add a readBytes() method
* Create the Language class using ISO-639-2
* Add the hachoir_core.profiler module to run a profiler on a function
* Add the hachoir_core.timeout module to call a function with a timeout

Minor changes:

* Fix many spelling mistakes
* Dict: use iteritems() instead of items() for faster operations on huge
  dictionaries

hachoir-core-1.3.3/AUTHORS0000644000175000017500000000131711251277274014165 0ustar haypohaypo
* Team

Julien Muchembled
Victor Stinner aka haypo
Robert Xiao aka nneonneo - improve SeekableFieldSet

* Packagers

Arnaud Pithon aka bildoon - ArchLinux package (v0.5.2 and svn)
Emmanuel GARETTE aka GnunuX - ArchLinux package (v0.5.2)
Michael Scherer aka misc - Mandriva package (v0.5.2)
Michel Casabona aka plumbear - Debian package (v0.5.2)
Richard DEMONGEOT - Debian package (v0.5.2)
Thomas de Grenier de Latour TGL - Gentoo ebuild

hachoir-core-1.3.3/COPYING0000644000175000017500000004313311251277274014152 0ustar haypohaypo
                    GNU GENERAL PUBLIC LICENSE
                       Version 2, June 1991

 Copyright (C) 1989, 1991 Free Software Foundation, Inc.
51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Preamble The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Library General Public License instead.) You can apply it to your programs, too. When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. 
Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all. The precise terms and conditions for copying, distribution and modification follow. GNU GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you". Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does. 1. 
You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) These requirements apply to the modified work as a whole. 
If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program. In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following: a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, c) Accompany it with the information you received as to the offer to distribute corresponding source code. 
(This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. 4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. 
Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it. 6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License. 7. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. 
Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 8. If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 9. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation. 10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. 
Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. 
Copyright (C) This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA Also add information on how to contact you by electronic and paper mail. If the program is interactive, make it output a short notice like this when it starts in an interactive mode: Gnomovision version 69, Copyright (C) year name of author Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items--whatever suits your program. You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here is a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the program `Gnomovision' (which makes passes at compilers) written by James Hacker. , 1 April 1989 Ty Coon, President of Vice This General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. 
If this is what you want to do, use the GNU Library General Public License instead of this License. hachoir-core-1.3.3/PKG-INFO0000644000175000017500000001457711342040416014211 0ustar haypohaypoMetadata-Version: 1.0 Name: hachoir-core Version: 1.3.3 Summary: Core of Hachoir framework: parse and edit binary files Home-page: http://bitbucket.org/haypo/hachoir/wiki/hachoir-core Author: Julien Muchembled, Victor Stinner Author-email: UNKNOWN License: GNU GPL v2 Download-URL: http://bitbucket.org/haypo/hachoir/wiki/hachoir-core Description: Hachoir project =============== Hachoir is a Python library used to represent of a binary file as a tree of Python objects. Each object has a type, a value, an address, etc. The goal is to be able to know the meaning of each bit in a file. Why using slow Python code instead of fast hardcoded C code? Hachoir has many interesting features: * Autofix: Hachoir is able to open invalid / truncated files * Lazy: Open a file is very fast since no information is read from file, data are read and/or computed when the user ask for it * Types: Hachoir has many predefined field types (integer, bit, string, etc.) and supports string with charset (ISO-8859-1, UTF-8, UTF-16, ...) * Addresses and sizes are stored in bit, so flags are stored as classic fields * Endian: You have to set endian once, and then number are converted in the right endian * Editor: Using Hachoir representation of data, you can edit, insert, remove data and then save in a new file. 
Website: http://bitbucket.org/haypo/hachoir/wiki/hachoir-core

Installation
============

For the installation, use setup.py or see:
http://bitbucket.org/haypo/hachoir/wiki/Install

hachoir-core 1.3.3 (2010-02-26)
===============================

* Add writelines() method to UnicodeStdout

hachoir-core 1.3.2 (2010-01-28)
===============================

* MANIFEST.in also includes the documentation

hachoir-core 1.3.1 (2010-01-21)
===============================

* Create MANIFEST.in to include ChangeLog and other files for setup.py

hachoir-core 1.3 (2010-01-20)
=============================

* Add more charsets to GenericString: CP874, WINDOWS-1250, WINDOWS-1251, WINDOWS-1254, WINDOWS-1255, WINDOWS-1256, WINDOWS-1257, WINDOWS-1258, ISO-8859-16
* Fix initLocale(): return the charset even if config.unicode_stdout is False
* initLocale() leaves sys.stdout and sys.stderr unchanged if the readline module is loaded: Hachoir can now be used correctly with IPython
* HachoirError: replace the "message" attribute by "text" to fix Python 2.6 compatibility (the message attribute is deprecated)
* StaticFieldSet: fix Python 2.6 warning, object.__new__() takes only one argument (the class).
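Several changelog entries concern charset handling: 1.3 adds charsets to GenericString, and later entries describe the guessBytesCharset() helper. Trial decoding against a preference-ordered candidate list is one common way to implement such guessing; the sketch below is a hypothetical illustration (the function name and candidate list are assumptions, not hachoir's actual implementation):

```python
def guess_bytes_charset(data, candidates=("ASCII", "UTF-8", "ISO-8859-1")):
    """Return the first candidate charset that decodes data without
    error, or None if none of them do."""
    for charset in candidates:
        try:
            data.decode(charset)
            return charset
        except UnicodeDecodeError:
            continue
    return None

print(guess_bytes_charset(b"hello"))    # -> ASCII
print(guess_bytes_charset(b"caf\xe9"))  # -> ISO-8859-1 (0xe9 alone is invalid UTF-8)
```

Since ISO-8859-1 assigns a character to every byte value, it acts as a catch-all last resort, so the order of the candidates matters.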
* Fix GenericFieldSet.readMoreFields() result: don't count the number of added fields in a loop; use len() to get the number of fields before and after the operation
* GenericFieldSet.__iter__() supports an iterable result for _fixFeedError() and _stopFeeding()
* New seekable field set implementation in hachoir_core.field.new_seekable_field_set

hachoir-core 1.2.1 (2008-10)
============================

* Create the configuration option "unicode_stdout", which avoids replacing stdout and stderr with objects supporting Unicode strings
* Create the TimedeltaWin64 field type
* Support WINDOWS-1252 and WINDOWS-1253 charsets for GenericString
* guessBytesCharset() now supports ISO-8859-7 (Greek)
* durationWin64() is now deprecated; use TimedeltaWin64 instead

hachoir-core 1.2 (2008-09)
==========================

* Create Field.getFieldType(): describes a field type and gives some useful information (e.g. the charset for a string)
* Create TimestampUnix64
* GenericString: only guess the charset once; if the charset attribute is not set, guess it when the user asks for it.

hachoir-core 1.1 (2008-04-01)
=============================

Main change: string values are always encoded as Unicode.

Details:

* Create guessBytesCharset() and guessStreamCharset()
* GenericString.createValue() is now always Unicode: if the charset is not specified, try to guess it.
  Otherwise, use the default charset (ISO-8859-1)
* RawBits: add createRawDisplay() to avoid slowdowns on huge fields
* Fix SeekableFieldSet.current_size (use offset and not current_max_size)
* GenericString: fix UTF-16-LE strings with a missing nul byte
* Add __nonzero__() method to GenericTimestamp
* All stream errors now inherit from StreamError (instead of HachoirError), and create an OutputStreamError
* humanDatetime(): strip microseconds by default (add an optional argument to keep them)

hachoir-core 1.0 (2007-07-10)
=============================

Version 1.0.1 changelog:

* Rename parser.tags to parser.PARSER_TAGS to be compatible with the future hachoir-parser 1.0

Visible changes:

* New field type: TimestampUUID60
* SeekableFieldSet: fix the __getitem__() method and implement the __iter__() and __len__() methods, so it can now be used in hachoir-wx
* String value is always Unicode, even on conversion error: use
* OutputStream: add readBytes() method
* Create Language class using ISO-639-2
* Add hachoir_core.profiler module to run a profiler on a function
* Add hachoir_core.timeout module to call a function with a timeout

Minor changes:

* Fix many spelling mistakes
* Dict: use iteritems() instead of items() for faster operations on huge dictionaries

Platform: UNKNOWN
Classifier: Intended Audience :: Developers
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Plugins
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Utilities
Classifier: License :: OSI Approved :: GNU General Public License (GPL)
Classifier: Operating System :: OS Independent
Classifier: Natural Language :: English
Classifier: Programming Language :: Python