byteplay-0.2/
byteplay-0.2/byteplay.egg-info/
byteplay-0.2/byteplay.egg-info/zip-safe
byteplay-0.2/byteplay.egg-info/SOURCES.txt

byteplay.py
setup.py
unittests.py
byteplay.egg-info/PKG-INFO
byteplay.egg-info/SOURCES.txt
byteplay.egg-info/dependency_links.txt
byteplay.egg-info/top_level.txt
byteplay.egg-info/zip-safe
examples/make_constants.py
examples/make_constants_orig.py

byteplay-0.2/byteplay.egg-info/dependency_links.txt
byteplay-0.2/byteplay.egg-info/PKG-INFO

Metadata-Version: 1.0
Name: byteplay
Version: 0.2
Summary: bytecode manipulation library
Home-page: http://code.google.com/p/byteplay
Author: Noam Raph
Author-email: noamraph@gmail.com
License: LGPL
Download-URL: http://code.google.com/p/byteplay/downloads/list
Description: byteplay lets you convert Python code objects into equivalent
objects which are easy to play with, and lets you convert those objects
back into living Python code objects. It's useful for applying crazy
transformations on Python functions, and is also useful in learning
Python byte code intricacies. It currently works with Python 2.4 and up.

byteplay Module Documentation
=============================

About byteplay
--------------

byteplay is a module which lets you easily play with Python bytecode. I
wrote it because I needed to manipulate Python bytecode, but didn't find
any suitable tool. Michael Hudson's bytecodehacks
(http://bytecodehacks.sourceforge.net/) could have worked fine for me,
but it only works with Python 1.5.2. I also looked at Phillip J.
Eby's peak.util.assembler (http://pypi.python.org/pypi/BytecodeAssembler),
but it's intended for creating new code objects from scratch, not for
manipulating existing code objects.

So I wrote byteplay. The basic idea is simple: define a new type, named
Code, which is equivalent to Python code objects, but, unlike Python code
objects, is easy to play with. "Equivalent" means that every Python code
object can be converted to a Code object and vice-versa, without losing
any important information on the way. "Easy to play with" means... well,
exactly that. The representation should be as simple as possible, letting
the infrastructure sort out technical details which do not affect the
final behaviour.

If you are interested in changing the behaviour of functions, or in
assembling functions on your own, you may find byteplay useful. You may
also find it useful if you are interested in how Python's bytecode
actually works - byteplay lets you easily play with existing bytecode and
see what happens, which is a great way to learn. You are also welcome to
check byteplay's (pure Python) code, to see how it manipulates real
bytecode.

byteplay can be downloaded from
http://byteplay.googlecode.com/svn/trunk/byteplay.py . See
http://code.google.com/p/byteplay/ for a bit more administrative info.

Feel free to improve this document - that's why it's on the wiki! Also,
if you find it useful, please drop me an email at noamraph at gmail dot
com - it would be nice knowing that what I did was useful to someone...

A Quick Example
---------------

Let's start with a quick example, to give you a taste of what byteplay
does. Let's define this stupid function::

    >>> def f(a, b):
    ...     print (a, b)
    ...
    >>> f(3, 5)
    (3, 5)

Now, let's use byteplay to see what its bytecode is actually doing::

    >>> from byteplay import *
    >>> from pprint import pprint
    >>> c = Code.from_code(f.func_code)
    >>> pprint(c.code)
    [(SetLineno, 2),
     (LOAD_FAST, 'a'),
     (LOAD_FAST, 'b'),
     (BUILD_TUPLE, 2),
     (PRINT_ITEM, None),
     (PRINT_NEWLINE, None),
     (LOAD_CONST, None),
     (RETURN_VALUE, None)]

I hope that this is pretty clear if you are a bit familiar with bytecode.
The Code object contains a list of all operations, which are pairs of
(opcode, arg). Not all opcodes have an argument, so they have None as
their argument. You can see that no external tables are used: in the raw
bytecode, the argument of many opcodes is an index to a table - for
example, the argument of the LOAD_CONST opcode is an index to the
co_consts table, which contains the actual constants. Here, the argument
is the constant itself.

Also note the SetLineno "opcode". It is not a real opcode, but it is used
to declare where a line in the original source code begins. Besides
another special opcode defined by byteplay, which we will see later, all
other opcodes are the real opcodes used by the Python interpreter.

By the way, if you want to see the code list in a form which is easier to
read, you can simply print it, like this::

    >>> print c.code

      2           1 LOAD_FAST            a
                  2 LOAD_FAST            b
                  3 BUILD_TUPLE          2
                  4 PRINT_ITEM
                  5 PRINT_NEWLINE
                  6 LOAD_CONST           None
                  7 RETURN_VALUE

This is especially useful if the code contains jumps. See the description
of the printcodelist function for another example.

Ok, now let's play! Say we want to change the function, to print its
arguments in reverse order. To do this, we will add a ROT_TWO opcode
after the two arguments have been loaded onto the stack. See how simple
it is::

    >>> c.code[3:3] = [(ROT_TWO, None)]
    >>> f.func_code = c.to_code()
    >>> f(3, 5)
    (5, 3)

Opcodes
-------

We have seen that the code list contains opcode constants such as
LOAD_FAST. These are instances of the Opcode class.
The Opcode class is a subclass of int, which overrides the ``__repr__``
method to return the string representation of an opcode. This means that
instead of using a constant such as LOAD_FAST, a numerical constant such
as 124 can be used. Opcode instances are, of course, much easier to
understand. The byteplay module creates Opcode instances for all the
interpreter opcodes. They can be found in the ``opcodes`` set, and also
in the module's global namespace, so you can write
``from byteplay import *`` and use the opcode constants immediately.

byteplay doesn't include a constant for the EXTENDED_ARG opcode, as it is
not used by byteplay's representation.

Module Contents
---------------

These are byteplay's public attributes, which are imported when
``from byteplay import *`` is done.

``POP_TOP``, ``ROT_TWO``, etc.
  All bytecode constants are imported by their names.

``opcodes``
  A set of all Opcode instances.

``opmap``
  A mapping from an opcode name to an Opcode instance.

``opname``
  A mapping from an opcode number (and an Opcode instance) to its name.

``cmp_op``
  A list of strings which represent comparison operators. In raw
  bytecode, the argument of the COMPARE_OP opcode is an index to this
  list. In the code list, it is the string representing the comparison.

The following are sets of opcodes, which list opcodes according to their
behaviour.

``hasarg``
  This set contains all opcodes which have an argument (these are the
  opcodes which are >= HAVE_ARGUMENT).

``hasname``
  This set contains all opcodes whose argument is an index to the
  co_names list.

``hasjrel``
  This set contains all opcodes whose argument is a relative jump, that
  is, an offset by which to advance the byte code instruction pointer.

``hasjabs``
  This set contains all opcodes whose argument is an absolute jump, that
  is, an address to which the instruction pointer should jump.

``hasjump``
  This set contains all opcodes whose argument is a jump. It is simply
  the union of ``hasjrel`` and ``hasjabs``.
  In byteplay, relative and absolute jumps behave in the same way, so
  this set is convenient.

``haslocal``
  This set contains all opcodes which operate on local variables.

``hascompare``
  This set contains all opcodes whose argument is a comparison operator -
  that is, only the COMPARE_OP opcode.

``hasfree``
  This set contains all opcodes which operate on the cell and free
  variable storage. These are variables which are also used by an
  enclosing or an enclosed function.

``hascode``
  This set contains all opcodes which expect a code object to be at the
  top of the stack. In the bytecode the Python compiler generates, they
  are always preceded by a LOAD_CONST opcode, which loads the code
  object.

``hasflow``
  This set contains all opcodes which have a special flow behaviour. All
  other opcodes always continue to the next opcode after they finish,
  unless an exception was raised.

The following are the types of the first elements of the opcode list
tuples.

``Opcode``
  The type of all opcode constants.

``SetLineno``
  This singleton is used like the "real" opcode constants, but only
  declares where the bytecode for a specific line in the source code
  starts.

``Label``
  This is the type of label objects. This class does nothing - it is
  used as a way to refer to a place in the code list.

Here come some additional functions.

``isopcode(obj)``
  Use this function to check whether the first element of an operation
  pair is a real opcode. This simply returns
  ``obj is not SetLineno and not isinstance(obj, Label)``.

``getse(op[, arg])``
  This function gets the stack effect of an opcode, as a (pop, push)
  tuple. The stack effect is the number of items popped from the stack,
  and the number of items pushed instead of them. If an item is only
  inspected, it is considered as though it was popped and pushed again.
  This function is meaningful only for opcodes not in hasflow - for other
  opcodes, ValueError will be raised.

  For some opcodes the argument is needed in order to calculate the stack
  effect.
  In that case, if arg isn't given, ValueError will be raised.

``printcodelist(code, to=sys.stdout)``
  This function gets a code list and prints it in a way easier to read.
  For example, let's define a simple function::

    >>> def f(a):
    ...     if a < 3:
    ...         b = a
    ...
    >>> c = Code.from_code(f.func_code)

  This is the code list itself::

    >>> pprint(c.code)
    [(SetLineno, 2),
     (LOAD_FAST, 'a'),
     (LOAD_CONST, 3),
     (COMPARE_OP, '<'),
     (JUMP_IF_FALSE, ),
     (POP_TOP, None),
     (SetLineno, 3),
     (LOAD_FAST, 'a'),
     (STORE_FAST, 'b'),
     (JUMP_FORWARD, ),
     (, None),
     (POP_TOP, None),
     (, None),
     (LOAD_CONST, None),
     (RETURN_VALUE, None)]

  And this is the nicer representation::

    >>> printcodelist(c.code)

      2           1 LOAD_FAST            a
                  2 LOAD_CONST           3
                  3 COMPARE_OP           <
                  4 JUMP_IF_FALSE        to 11
                  5 POP_TOP

      3           7 LOAD_FAST            a
                  8 STORE_FAST           b
                  9 JUMP_FORWARD         to 13
            >>   11 POP_TOP
            >>   13 LOAD_CONST           None
                 14 RETURN_VALUE

  As you can see, all opcodes are marked by their index in the list, and
  jumps show the index of the target opcode.

For your convenience, another class was defined:

``CodeList``
  This class is a list subclass, which only overrides the __str__ method
  to use ``printcodelist``. If the code list is an instance of CodeList,
  you don't have to type ``printcodelist(c.code)`` in order to see the
  nice representation - just type ``print c.code``. Code instances
  created from raw Python code objects already have that feature!

And, last but not least - the Code class itself!

The Code Class
--------------

Constructor
~~~~~~~~~~~

::

  Code(code, freevars, args, varargs, varkwargs, newlocals,
       name, filename, firstlineno, docstring) -> new Code object

This constructs a new Code object. The arguments are simply values for
the Code object data attributes - see below.

Data Attributes
~~~~~~~~~~~~~~~

We'll start with the data attributes - those are read/write, and
distinguish one code instance from another.
First come the attributes which affect the operation of the interpreter
when it executes the code, and then come attributes which give extra
information, useful for debugging and introspection.

``code``
  This is the main part which describes what a Code object does. It is a
  list of pairs ``(opcode, arg)``. ``arg`` is the opcode argument, if it
  has one, or None if it doesn't. ``opcode`` can be of 3 types:

  * Regular opcodes. These are the opcodes which really define an
    operation of the interpreter. They can be regular ints, or Opcode
    instances. The meaning of the argument changes according to the
    opcode:

    - Opcodes not in ``hasarg`` don't have an argument. None should be
      used as the second item of the tuple.
    - The argument of opcodes in ``hasconst`` is the actual constant.
    - The argument of opcodes in ``hasname`` is the name, as a string.
    - The argument of opcodes in ``hasjump`` is a Label instance, which
      should point to a specific location in the code list.
    - The argument of opcodes in ``haslocal`` is the local variable name,
      as a string.
    - The argument of opcodes in ``hascompare`` is the string
      representing the comparison operator.
    - The argument of opcodes in ``hasfree`` is the name of the cell or
      free variable, as a string.
    - The argument of the remaining opcodes is the numerical argument
      found in raw bytecode. Its meaning is opcode specific.

  * ``SetLineno``. This is a singleton, which means that a line in the
    source code begins. Its argument is the line number.

  * labels. These are instances of the ``Label`` class. The label class
    does nothing - it is just used as a way to specify a place in the
    code list. Labels can be put in the code list and cause no action by
    themselves. They are used as the argument of opcodes which may cause
    a jump to a specific location in the code.

``freevars``
  This is a list of strings - the names of variables defined in outer
  functions and used in this function or in functions defined inside it.
  The order of this list is important, since those variables are passed
  to the function as a sequence whose order should match the order of the
  ``freevars`` attribute.

  A few words about closures in Python may be in order. In Python,
  functions defined inside other functions can use variables defined in
  an outer function. We know each running function has a place to store
  local variables. But how can functions refer to variables defined in an
  outer scope?

  The solution is this: for every variable which is used in more than one
  scope, a new ``cell`` object is created. This object does one simple
  thing: it refers to one other object - the value of its variable. When
  the variable gets a new value, the cell object is updated too. A
  reference to the cell object is passed to any function which uses that
  variable. When an inner function is interested in the value of a
  variable of an outer scope, it uses the value referred to by the cell
  object passed to it.

  An example might help understand this. Let's take a look at the
  bytecode of a simple example::

    >>> def f():
    ...     a = 3
    ...     b = 5
    ...     def g():
    ...         return a + b
    ...
    >>> from byteplay import *
    >>> c = Code.from_code(f.func_code)
    >>> print c.code

      2           1 LOAD_CONST           3
                  2 STORE_DEREF          a

      3           4 LOAD_CONST           5
                  5 STORE_DEREF          b

      4           7 LOAD_CLOSURE         a
                  8 LOAD_CLOSURE         b
                  9 BUILD_TUPLE          2
                 10 LOAD_CONST
                 11 MAKE_CLOSURE         0
                 12 STORE_FAST           g
                 13 LOAD_CONST           None
                 14 RETURN_VALUE

    >>> c.code[10][1].freevars
    ('a', 'b')
    >>> print c.code[10][1].code

      5           1 LOAD_DEREF           a
                  2 LOAD_DEREF           b
                  3 BINARY_ADD
                  4 RETURN_VALUE

  We can see that LOAD_DEREF and STORE_DEREF opcodes are used to get and
  set the value of cell objects. There is no inherent difference between
  cell objects created by an outer function and cell objects used in an
  inner function. What makes the difference is whether a variable name
  was listed in the ``freevars`` attribute of the Code object - if it was
  not listed there, a new cell is created, and if it was listed there,
  the cell created by an outer function is used.
  We can also see how a function gets the cell objects it needs from its
  outer functions. The inner function is created with the MAKE_CLOSURE
  opcode, which pops two objects from the stack: first, the code object
  used to create the function, and second, a tuple with the cell objects
  used by the code (the tuple is created by the LOAD_CLOSURE opcodes,
  which push a cell object onto the stack, and of course the BUILD_TUPLE
  opcode.) We can see that the order of the cells in the tuple matches
  the order of the names in the ``freevars`` list - that's how the inner
  function knows that ``(LOAD_DEREF, 'a')`` means "load the value of the
  first cell in the tuple".

``args``
  The list of argument names of a function. For example::

    >>> def f(a, b, *args, **kwargs):
    ...     pass
    ...
    >>> Code.from_code(f.func_code).args
    ('a', 'b', 'args', 'kwargs')

``varargs``
  A boolean: Does the function get a variable number of positional
  arguments? In other words: does it have a ``*args`` argument? If
  ``varargs`` is True, the argument which gets those extra positional
  arguments will be the last argument or the one before the last,
  depending on whether ``varkwargs`` is True.

``varkwargs``
  A boolean: Does the function get a variable number of keyword
  arguments? In other words: does it have a ``**kwargs`` argument? If
  ``varkwargs`` is True, the argument which gets the extra keyword
  arguments will be the last argument.

``newlocals``
  A boolean: Should a new local namespace be created for this code? This
  is True for functions and False for modules and exec code.

Now come attributes with additional information about the code:

``name``
  A string: The name of the code, which is usually the name of the
  function created from it.

``filename``
  A string: The name of the source file from which the bytecode was
  compiled.

``firstlineno``
  An int: The number of the first line in the source file from which the
  bytecode was compiled.

``docstring``
  A string: The docstring for functions created from this code.
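The cell mechanism described under ``freevars`` above can also be observed without byteplay, through attributes the interpreter itself exposes. The following is only a small sketch, not part of byteplay: it assumes an interpreter where functions have ``__code__`` and ``__closure__`` attributes (Python 2.6 and later, to the best of my knowledge; older versions spell them ``func_code`` and ``func_closure``), and the names ``outer`` and ``inner`` are made up for the illustration.

```python
# Illustration only: inspect the cells of a closure using standard
# function attributes. Nothing here is byteplay API.

def outer():
    a = 3
    b = 5
    def inner():
        return a + b
    return inner

inner = outer()

# Names the inner code expects to receive as cells from an outer scope.
free_names = inner.__code__.co_freevars

# The cells actually passed to the function, in the same order as
# co_freevars; each cell refers to the current value of its variable.
cells = inner.__closure__
cell_values = dict(zip(free_names, [cell.cell_contents for cell in cells]))

# In the outer function the same variables are listed as cell variables,
# because the outer function is the one that creates the cells.
outer_cell_names = outer.__code__.co_cellvars
```

Here ``cell_values`` maps ``'a'`` to 3 and ``'b'`` to 5, which is the same pairing of names and cells that the ``freevars`` attribute of a Code object relies on.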
Methods
~~~~~~~

These are the Code class methods.

``Code.from_code(code) -> new Code object``
  This is a static method, which creates a new Code object from a raw
  Python code object. It is equivalent to the raw code object, that is,
  the resulting Code object can be converted to a new raw Python code
  object, which will have exactly the same behaviour as the original
  object.

``code.to_code() -> new code object``
  This method converts a Code object into an equivalent raw Python code
  object, which can be executed just like any other code object.

``code1.__eq__(code2) -> bool``
  Different Code objects can be meaningfully tested for equality. This
  tests that all attributes have the same value. For the code attribute,
  labels are compared to see if they form the same flow graph.

Stack-depth Calculation
-----------------------

What was described above is enough for using byteplay. However, if you
encounter an "Inconsistent code" exception when you try to assemble your
code and wonder what it means, or if you just want to learn more about
Python's stack behaviour, this section is for you.

Note: This section isn't as clear as it could have been, to say the
least. If you would like to improve it, feel free to do so - that's what
wikis are for, aren't they?

When assembling code objects, the code's maximum stack usage is needed.
This is simply the maximum number of items expected on the frame's stack.
If the actual number of items on the stack exceeds this, Python may well
fail with a segmentation fault. The question is then, how to calculate
the maximum stack usage of a given code?

There's most likely no general solution for this problem. However, code
generated by Python's compiler has a nice property which makes it
relatively easy to calculate the maximum stack usage. The property is
that if we take a bytecode "line", and check the stack state whenever we
reach that line, we will find the stack state when we reach that line is
always the same, no matter how we got to that line.
We'll call such code "regular".

Now, this requires clarification: what is the "stack state" which is
always the same, exactly? Obviously, the stack doesn't always contain the
same objects when we reach a line. For now, we can assume that it simply
means the number of items on the stack.

This helps us a lot. If we know that every line can have exactly one
stack state, and we know how every opcode changes the stack state, we can
trace stack states along all possible code paths, and find the stack
state of every reachable line. Then we can simply check which state had
the largest number of stack items, and that's the maximum stack usage of
the code. What will happen with code not generated by Python's compiler,
if it doesn't fulfill the requirement that every line should have one
state? When tracing the stack state for every line, we will find a line,
which can be reached from several places, whose stack state changes
according to the address from which we jumped to that line. In that case,
an "Inconsistent code" exception will be raised.

Ok, so what really is what we called the "stack state"? If every opcode
pushed and popped a constant number of elements, the stack state could
have been the number of items on the stack. However, life isn't that
simple. In real life, there are *blocks*. Blocks allow us to break from a
loop, regardless of exactly how many items we have on the stack. How?
Simple. Before the loop starts, the SETUP_LOOP opcode is executed. This
opcode records in a block the number of operands (items) currently on the
stack, and also a position in the code. When the POP_BLOCK opcode is
executed, the stack is restored to the recorded state by popping extra
items, and the corresponding block is discarded. But if the BREAK_LOOP
opcode is executed instead of POP_BLOCK, one more thing happens. The
execution jumps to the position specified by the SETUP_LOOP opcode.

Fortunately, we can still live with that.
Instead of defining the stack state as a single number - the total number
of elements in the stack, we will define the stack state as a sequence of
numbers - the number of elements in the stack per each block. So, for
example, if the state was (3, 5), after a BINARY_ADD operation the state
will be (3, 4), because the operation pops two elements and pushes one
element. If the state was (3, 5), after a block-pushing operation such as
SETUP_LOOP the state will be (3, 5, 0), because a new block, without
elements yet, was pushed.

Another complication: the SETUP_FINALLY opcode specifies an address to
jump to if an exception is raised or a BREAK_LOOP operation was executed.
This address can also be reached by normal flow. However, the stack state
at that address will be different, depending on what actually happened -
if an exception was raised, 3 elements will be pushed, if BREAK_LOOP was
executed, 2 elements will be pushed, and if nothing happened, 1 element
will be pushed by a LOAD_CONST operation. This seemingly inconsistent
state always ends with an END_FINALLY opcode. The END_FINALLY opcode pops
1, 2 or 3 elements according to what it finds on the stack, so we return
to a "consistent" state. How can we deal with that complexity?

The solution is pretty simple. We will treat the SETUP_FINALLY opcode as
if it pushes 1 element to its target - this makes it consistent with the
1 element which is pushed if the target is reached by normal flow.
However, we will calculate the stack state as if at the target line there
was an opcode which pushed 2 elements to the stack. This is done so that
the maximum stack size calculation will be correct. Those 2 extra
elements will be popped by the END_FINALLY opcode, which will be treated
as though it always pops 3 elements. That's all! Just be aware of that
when you are playing with SETUP_FINALLY and END_FINALLY opcodes...
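The tracing idea described in this section can be sketched in a few lines. What follows is a deliberately simplified illustration and not byteplay's own implementation: it uses a made-up instruction set (the ``EFFECTS`` table and the ``'LABEL'`` pseudo-opcode are assumptions for this example), it treats the stack state as a single number, and it ignores blocks and the SETUP_FINALLY special case entirely.

```python
# Toy sketch of stack-depth tracing for "regular" code.
# Hypothetical instruction set: name -> (items popped, items pushed).
EFFECTS = {
    'LOAD': (0, 1),
    'ADD': (2, 1),
    'POP': (1, 0),
    'JUMP': (0, 0),            # unconditional jump to a label
    'JUMP_IF_FALSE': (0, 0),   # inspects the top item, may jump
    'RETURN': (1, 0),
}

def max_stack_depth(code):
    """Trace every path through code and return the largest stack depth.

    code is a list of (op, arg) pairs; ('LABEL', name) marks a jump target.
    Raises ValueError if a position is reached with two different depths.
    """
    label_pos = dict((arg, pos) for pos, (op, arg) in enumerate(code)
                     if op == 'LABEL')
    depth_at = {}              # position -> stack depth on entry
    todo = [(0, 0)]            # (position, depth on entry) still to trace
    max_depth = 0
    while todo:
        pos, depth = todo.pop()
        if pos >= len(code):
            raise ValueError("Execution can run past the end of the code")
        if pos in depth_at:
            if depth_at[pos] != depth:
                raise ValueError("Inconsistent code at position %d" % pos)
            continue           # already traced with the same state
        depth_at[pos] = depth
        op, arg = code[pos]
        if op == 'LABEL':
            todo.append((pos + 1, depth))
            continue
        pop, push = EFFECTS[op]
        if depth < pop:
            raise ValueError("Stack underflow at position %d" % pos)
        depth = depth - pop + push
        max_depth = max(max_depth, depth)
        if op == 'RETURN':
            continue           # this path ends here
        if op in ('JUMP', 'JUMP_IF_FALSE'):
            todo.append((label_pos[arg], depth))
        if op != 'JUMP':
            todo.append((pos + 1, depth))
    return max_depth

# Both branches reach 'end' with one item on the stack: regular code.
regular = [
    ('LOAD', 'a'), ('LOAD', 'b'), ('ADD', None),
    ('JUMP_IF_FALSE', 'else'),
    ('POP', None), ('LOAD', 1), ('JUMP', 'end'),
    ('LABEL', 'else'), ('POP', None), ('LOAD', 2),
    ('LABEL', 'end'), ('RETURN', None),
]

# 'skip' is reached with one item by the jump and two by fall-through.
irregular = [
    ('LOAD', 1), ('JUMP_IF_FALSE', 'skip'), ('LOAD', 2),
    ('LABEL', 'skip'), ('RETURN', None),
]
```

Calling ``max_stack_depth(regular)`` gives 2, while ``max_stack_depth(irregular)`` raises ``ValueError``, mirroring the "Inconsistent code" situation described above. Extending the state from a single number to a tuple with one count per block would be the next step towards the scheme this section describes.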
Platform: UNKNOWN

byteplay-0.2/byteplay.egg-info/top_level.txt

byteplay

byteplay-0.2/byteplay.py

# byteplay - Python bytecode assembler/disassembler.
# Copyright (C) 2010 Noam Raphael
# Homepage: http://code.google.com/p/byteplay
#
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
# License as published by the Free Software Foundation; either
# version 2.1 of the License, or (at your option) any later version.
#
# This library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with this library; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

# Many thanks to Greg X for adding support for Python 2.6 and 2.7!

__version__ = '0.2'

__all__ = ['opmap', 'opname', 'opcodes',
           'cmp_op', 'hasarg', 'hasname', 'hasjrel', 'hasjabs',
           'hasjump', 'haslocal', 'hascompare', 'hasfree', 'hascode',
           'hasflow', 'getse',
           'Opcode', 'SetLineno', 'Label', 'isopcode', 'Code',
           'CodeList', 'printcodelist']

import opcode
from dis import findlabels
import types
from array import array
import operator
import itertools
import sys
import warnings
from cStringIO import StringIO

######################################################################
# Define opcodes and information about them

python_version = '.'.join(str(x) for x in sys.version_info[:2])
if python_version not in ('2.4', '2.5', '2.6', '2.7'):
    warnings.warn("byteplay doesn't support Python version "+python_version)

class Opcode(int):
    """An int which represents an opcode - has a nicer repr."""
    def __repr__(self):
        return opname[self]
    __str__ = __repr__

class CodeList(list):
    """A list for storing opcode tuples - has a nicer __str__."""
    def __str__(self):
        f = StringIO()
        printcodelist(self, f)
        return f.getvalue()

opmap = dict((name.replace('+', '_'), Opcode(code))
             for name, code in opcode.opmap.iteritems()
             if name != 'EXTENDED_ARG')
opname = dict((code, name) for name, code in opmap.iteritems())
opcodes = set(opname)

def globalize_opcodes():
    for name, code in opmap.iteritems():
        globals()[name] = code
        __all__.append(name)
globalize_opcodes()

cmp_op = opcode.cmp_op

hasarg = set(x for x in opcodes if x >= opcode.HAVE_ARGUMENT)
hasconst = set(Opcode(x) for x in opcode.hasconst)
hasname = set(Opcode(x) for x in opcode.hasname)
hasjrel = set(Opcode(x) for x in opcode.hasjrel)
hasjabs = set(Opcode(x) for x in opcode.hasjabs)
hasjump = hasjrel.union(hasjabs)
haslocal = set(Opcode(x) for x in opcode.haslocal)
hascompare = set(Opcode(x) for x in opcode.hascompare)
hasfree = set(Opcode(x) for x in opcode.hasfree)
hascode = set([MAKE_FUNCTION, MAKE_CLOSURE])

class _se:
    """Quick way of defining static stack effects of opcodes"""
    # Taken from assembler.py by Phillip J. Eby
    NOP       = 0,0

    POP_TOP   = 1,0
    ROT_TWO   = 2,2
    ROT_THREE = 3,3
    ROT_FOUR  = 4,4
    DUP_TOP   = 1,2

    UNARY_POSITIVE = UNARY_NEGATIVE = UNARY_NOT = UNARY_CONVERT = \
        UNARY_INVERT = GET_ITER = LOAD_ATTR = 1,1

    IMPORT_FROM = 1,2

    BINARY_POWER = BINARY_MULTIPLY = BINARY_DIVIDE = BINARY_FLOOR_DIVIDE = \
        BINARY_TRUE_DIVIDE = BINARY_MODULO = BINARY_ADD = BINARY_SUBTRACT = \
        BINARY_SUBSCR = BINARY_LSHIFT = BINARY_RSHIFT = BINARY_AND = \
        BINARY_XOR = BINARY_OR = COMPARE_OP = 2,1

    INPLACE_POWER = INPLACE_MULTIPLY = INPLACE_DIVIDE = \
        INPLACE_FLOOR_DIVIDE = INPLACE_TRUE_DIVIDE = INPLACE_MODULO = \
        INPLACE_ADD = INPLACE_SUBTRACT = INPLACE_LSHIFT = INPLACE_RSHIFT = \
        INPLACE_AND = INPLACE_XOR = INPLACE_OR = 2,1

    SLICE_0, SLICE_1, SLICE_2, SLICE_3 = \
        (1,1),(2,1),(2,1),(3,1)
    STORE_SLICE_0, STORE_SLICE_1, STORE_SLICE_2, STORE_SLICE_3 = \
        (2,0),(3,0),(3,0),(4,0)
    DELETE_SLICE_0, DELETE_SLICE_1, DELETE_SLICE_2, DELETE_SLICE_3 = \
        (1,0),(2,0),(2,0),(3,0)

    STORE_SUBSCR = 3,0
    DELETE_SUBSCR = STORE_ATTR = 2,0
    DELETE_ATTR = STORE_DEREF = 1,0
    PRINT_NEWLINE = 0,0
    PRINT_EXPR = PRINT_ITEM = PRINT_NEWLINE_TO = IMPORT_STAR = 1,0
    STORE_NAME = STORE_GLOBAL = STORE_FAST = 1,0
    PRINT_ITEM_TO = 2,0

    LOAD_LOCALS = LOAD_CONST = LOAD_NAME = LOAD_GLOBAL = LOAD_FAST = \
        LOAD_CLOSURE = LOAD_DEREF = BUILD_MAP = 0,1

    DELETE_FAST = DELETE_GLOBAL = DELETE_NAME = 0,0

    EXEC_STMT = 3,0
    BUILD_CLASS = 3,1

    STORE_MAP = MAP_ADD = 2,0
    SET_ADD = 1,0

    if   python_version == '2.4':
        YIELD_VALUE = 1,0
        IMPORT_NAME = 1,1
        LIST_APPEND = 2,0
    elif python_version == '2.5':
        YIELD_VALUE = 1,1
        IMPORT_NAME = 2,1
        LIST_APPEND = 2,0
    elif python_version == '2.6':
        YIELD_VALUE = 1,1
        IMPORT_NAME = 2,1
        LIST_APPEND = 2,0
    elif python_version == '2.7':
        YIELD_VALUE = 1,1
        IMPORT_NAME = 2,1
        LIST_APPEND = 1,0


_se = dict((op, getattr(_se, opname[op]))
           for op in opcodes
           if hasattr(_se, opname[op]))

hasflow = opcodes - set(_se) - \
          set([CALL_FUNCTION, CALL_FUNCTION_VAR, CALL_FUNCTION_KW,
               CALL_FUNCTION_VAR_KW, BUILD_TUPLE, BUILD_LIST,
               UNPACK_SEQUENCE, BUILD_SLICE,
               DUP_TOPX, RAISE_VARARGS, MAKE_FUNCTION, MAKE_CLOSURE])
if python_version == '2.7':
    hasflow = hasflow - set([BUILD_SET])

def getse(op, arg=None):
    """Get the stack effect of an opcode, as a (pop, push) tuple.

    If an arg is needed and is not given, a ValueError is raised.
    If op isn't a simple opcode, that is, the flow doesn't always continue
    to the next opcode, a ValueError is raised.
    """
    try:
        return _se[op]
    except KeyError:
        # Continue to opcodes with an effect that depends on arg
        pass

    if arg is None:
        raise ValueError, "Opcode stack behaviour depends on arg"

    def get_func_tup(arg, nextra):
        if arg > 0xFFFF:
            raise ValueError, "Can only split a two-byte argument"
        return (nextra + 1 + (arg & 0xFF) + 2*((arg >> 8) & 0xFF),
                1)

    if op == CALL_FUNCTION:
        return get_func_tup(arg, 0)
    elif op == CALL_FUNCTION_VAR:
        return get_func_tup(arg, 1)
    elif op == CALL_FUNCTION_KW:
        return get_func_tup(arg, 1)
    elif op == CALL_FUNCTION_VAR_KW:
        return get_func_tup(arg, 2)

    elif op == BUILD_TUPLE:
        return arg, 1
    elif op == BUILD_LIST:
        return arg, 1
    elif python_version == '2.7' and op == BUILD_SET:
        return arg, 1
    elif op == UNPACK_SEQUENCE:
        return 1, arg
    elif op == BUILD_SLICE:
        return arg, 1
    elif op == DUP_TOPX:
        return arg, arg*2
    elif op == RAISE_VARARGS:
        return 1+arg, 1
    elif op == MAKE_FUNCTION:
        return 1+arg, 1
    elif op == MAKE_CLOSURE:
        if python_version == '2.4':
            raise ValueError, "The stack effect of MAKE_CLOSURE depends on TOS"
        else:
            return 2+arg, 1
    else:
        raise ValueError, "The opcode %r isn't recognized or has a special "\
              "flow control" % op

class SetLinenoType(object):
    def __repr__(self):
        return 'SetLineno'
SetLineno = SetLinenoType()

class Label(object):
    pass

def isopcode(obj):
    """Return whether obj is an opcode - not SetLineno or Label"""
    return obj is not SetLineno and not isinstance(obj, Label)

# Flags from code.h
CO_OPTIMIZED              = 0x0001      # use LOAD/STORE_FAST instead of _NAME
CO_NEWLOCALS              = 0x0002      # only cleared for module/exec code
CO_VARARGS                = 0x0004
CO_VARKEYWORDS            = 0x0008
CO_NESTED                 = 0x0010      # ???
CO_GENERATOR              = 0x0020
CO_NOFREE                 = 0x0040      # set if no free or cell vars
CO_GENERATOR_ALLOWED      = 0x1000      # unused
# The future flags are only used on code generation, so we can ignore them.
# (It does cause some warnings, though.)
CO_FUTURE_DIVISION        = 0x2000
CO_FUTURE_ABSOLUTE_IMPORT = 0x4000
CO_FUTURE_WITH_STATEMENT  = 0x8000


######################################################################
# Define the Code class

class Code(object):
    """An object which holds all the information which a Python code object
    holds, but in an easy-to-play-with representation.

    The attributes are:

    Affecting action
    ----------------
    code - list of 2-tuples: the code
    freevars - list of strings: the free vars of the code (those are names
               of variables created in outer functions and used in the function)
    args - list of strings: the arguments of the code
    varargs - boolean: Does args end with a '*args' argument
    varkwargs - boolean: Does args end with a '**kwargs' argument
    newlocals - boolean: Should a new local namespace be created.
                (True in functions, False for module and exec code)

    Not affecting action
    --------------------
    name - string: the name of the code (co_name)
    filename - string: the file name of the code (co_filename)
    firstlineno - int: the first line number (co_firstlineno)
    docstring - string or None: the docstring (the first item of co_consts,
                if it's str or unicode)

    code is a list of 2-tuples. The first item is an opcode, or SetLineno, or a
    Label instance. The second item is the argument, if applicable, or None.
    code can be a CodeList instance, which will produce nicer output when
    being printed.
    """
    def __init__(self, code, freevars, args, varargs, varkwargs, newlocals,
                 name, filename, firstlineno, docstring):
        self.code = code
        self.freevars = freevars
        self.args = args
        self.varargs = varargs
        self.varkwargs = varkwargs
        self.newlocals = newlocals
        self.name = name
        self.filename = filename
        self.firstlineno = firstlineno
        self.docstring = docstring

    @staticmethod
    def _findlinestarts(code):
        """Find the offsets in a byte code which are start of lines in the source.

        Generate pairs (offset, lineno) as described in Python/compile.c.

        This is a modified version of dis.findlinestarts, which allows multiple
        "line starts" with the same line number.
        """
        byte_increments = [ord(c) for c in code.co_lnotab[0::2]]
        line_increments = [ord(c) for c in code.co_lnotab[1::2]]

        lineno = code.co_firstlineno
        addr = 0
        for byte_incr, line_incr in zip(byte_increments, line_increments):
            if byte_incr:
                yield (addr, lineno)
                addr += byte_incr
            lineno += line_incr
        yield (addr, lineno)

    @classmethod
    def from_code(cls, co):
        """Disassemble a Python code object into a Code object."""
        co_code = co.co_code
        labels = dict((addr, Label()) for addr in findlabels(co_code))
        linestarts = dict(cls._findlinestarts(co))
        cellfree = co.co_cellvars + co.co_freevars

        code = CodeList()
        n = len(co_code)
        i = 0
        extended_arg = 0
        while i < n:
            op = Opcode(ord(co_code[i]))
            if i in labels:
                code.append((labels[i], None))
            if i in linestarts:
                code.append((SetLineno, linestarts[i]))
            i += 1
            if op in hascode:
                lastop, lastarg = code[-1]
                if lastop != LOAD_CONST:
                    raise ValueError, \
                          "%s should be preceded by LOAD_CONST code" % op
                code[-1] = (LOAD_CONST, Code.from_code(lastarg))
            if op not in hasarg:
                code.append((op, None))
            else:
                arg = ord(co_code[i]) + ord(co_code[i+1])*256 + extended_arg
                extended_arg = 0
                i += 2
                if op == opcode.EXTENDED_ARG:
                    extended_arg = arg << 16
                elif op in hasconst:
                    code.append((op, co.co_consts[arg]))
                elif op in hasname:
                    code.append((op, co.co_names[arg]))
                elif op in hasjabs:
                    code.append((op, labels[arg]))
                elif op in hasjrel:
                    code.append((op, labels[i + arg]))
                elif op in haslocal:
                    code.append((op, co.co_varnames[arg]))
                elif op in hascompare:
                    code.append((op, cmp_op[arg]))
                elif op in hasfree:
                    code.append((op, cellfree[arg]))
                else:
                    code.append((op, arg))

        varargs = bool(co.co_flags & CO_VARARGS)
        varkwargs = bool(co.co_flags & CO_VARKEYWORDS)
        newlocals = bool(co.co_flags & CO_NEWLOCALS)
        args = co.co_varnames[:co.co_argcount + varargs + varkwargs]
        if co.co_consts and isinstance(co.co_consts[0], basestring):
            docstring = co.co_consts[0]
        else:
            docstring = None
        return cls(code = code,
                   freevars = co.co_freevars,
                   args = args,
                   varargs = varargs,
                   varkwargs = varkwargs,
                   newlocals = newlocals,
                   name = co.co_name,
                   filename = co.co_filename,
                   firstlineno = co.co_firstlineno,
                   docstring = docstring,
                   )

    def __eq__(self, other):
        if (self.freevars != other.freevars or
            self.args != other.args or
            self.varargs != other.varargs or
            self.varkwargs != other.varkwargs or
            self.newlocals != other.newlocals or
            self.name != other.name or
            self.filename != other.filename or
            self.firstlineno != other.firstlineno or
            self.docstring != other.docstring or
            len(self.code) != len(other.code)
            ):
            return False

        # Compare code. This isn't trivial because labels should be matching,
        # not equal.
labelmapping = {} for (op1, arg1), (op2, arg2) in itertools.izip(self.code, other.code): if isinstance(op1, Label): if labelmapping.setdefault(op1, op2) is not op2: return False else: if op1 != op2: return False if op1 in hasjump: if labelmapping.setdefault(arg1, arg2) is not arg2: return False elif op1 in hasarg: if arg1 != arg2: return False return True def _compute_flags(self): opcodes = set(op for op, arg in self.code if isopcode(op)) optimized = (STORE_NAME not in opcodes and LOAD_NAME not in opcodes and DELETE_NAME not in opcodes) generator = (YIELD_VALUE in opcodes) nofree = not (opcodes.intersection(hasfree)) flags = 0 if optimized: flags |= CO_OPTIMIZED if self.newlocals: flags |= CO_NEWLOCALS if self.varargs: flags |= CO_VARARGS if self.varkwargs: flags |= CO_VARKEYWORDS if generator: flags |= CO_GENERATOR if nofree: flags |= CO_NOFREE return flags def _compute_stacksize(self): """Get a code list, compute its maximal stack usage.""" # This is done by scanning the code, and computing for each opcode # the stack state at the opcode. code = self.code # A mapping from labels to their positions in the code list label_pos = dict((op, pos) for pos, (op, arg) in enumerate(code) if isinstance(op, Label)) # sf_targets are the targets of SETUP_FINALLY opcodes. They are recorded # because they have special stack behaviour. If an exception was raised # in the block pushed by a SETUP_FINALLY opcode, the block is popped # and 3 objects are pushed. On return or continue, the block is popped # and 2 objects are pushed. If nothing happened, the block is popped by # a POP_BLOCK opcode and 1 object is pushed by a (LOAD_CONST, None) # operation. # # Our solution is to record the stack state of SETUP_FINALLY targets # as having 3 objects pushed, which is the maximum. However, to make # stack recording consistent, the get_next_stacks function will always # yield the stack state of the target as if 1 object was pushed, but # this will be corrected in the actual stack recording. 
sf_targets = set(label_pos[arg] for op, arg in code if op == SETUP_FINALLY) # What we compute - for each opcode, its stack state, as an n-tuple. # n is the number of blocks pushed. For each block, we record the number # of objects pushed. stacks = [None] * len(code) def get_next_stacks(pos, curstack): """Get a code position and the stack state before the operation was done, and yield pairs (pos, curstack) for the next positions to be explored - those are the positions to which you can get from the given (pos, curstack). If the given position was already explored, nothing will be yielded. """ op, arg = code[pos] if isinstance(op, Label): # We should check if we already reached a node only if it is # a label. if pos in sf_targets: curstack = curstack[:-1] + (curstack[-1] + 2,) if stacks[pos] is None: stacks[pos] = curstack else: if stacks[pos] != curstack: raise ValueError, "Inconsistent code" return def newstack(n): # Return a new stack, modified by adding n elements to the last # block if curstack[-1] + n < 0: raise ValueError, "Popped a non-existing element" return curstack[:-1] + (curstack[-1]+n,) if not isopcode(op): # label or SetLineno - just continue to next line yield pos+1, curstack elif op in (STOP_CODE, RETURN_VALUE, RAISE_VARARGS): # No place in particular to continue to pass elif op == MAKE_CLOSURE and python_version == '2.4': # This is only relevant in Python 2.4 - in Python 2.5 the stack # effect of MAKE_CLOSURE can be calculated from the arg. # In Python 2.4, it depends on the number of freevars of TOS, # which should be a code object. 
if pos == 0: raise ValueError, \ "MAKE_CLOSURE can't be the first opcode" lastop, lastarg = code[pos-1] if lastop != LOAD_CONST: raise ValueError, \ "MAKE_CLOSURE should come after a LOAD_CONST op" try: nextrapops = len(lastarg.freevars) except AttributeError: try: nextrapops = len(lastarg.co_freevars) except AttributeError: raise ValueError, \ "MAKE_CLOSURE preceding const should "\ "be a code or a Code object" yield pos+1, newstack(-arg-nextrapops) elif op not in hasflow: # Simple change of stack pop, push = getse(op, arg) yield pos+1, newstack(push - pop) elif op in (JUMP_FORWARD, JUMP_ABSOLUTE): # One possibility for a jump yield label_pos[arg], curstack elif python_version < '2.7' and op in (JUMP_IF_FALSE, JUMP_IF_TRUE): # Two possibilities for a jump yield label_pos[arg], curstack yield pos+1, curstack elif python_version >= '2.7' and op in (POP_JUMP_IF_FALSE, POP_JUMP_IF_TRUE): # Two possibilities for a jump yield label_pos[arg], newstack(-1) yield pos+1, newstack(-1) elif python_version >= '2.7' and op in (JUMP_IF_TRUE_OR_POP, JUMP_IF_FALSE_OR_POP): # Two possibilities for a jump yield label_pos[arg], curstack yield pos+1, newstack(-1) elif op == FOR_ITER: # FOR_ITER pushes next(TOS) on success, and pops TOS and jumps # on failure yield label_pos[arg], newstack(-1) yield pos+1, newstack(1) elif op == BREAK_LOOP: # BREAK_LOOP jumps to a place specified on block creation, so # it is ignored here pass elif op == CONTINUE_LOOP: # CONTINUE_LOOP jumps to the beginning of a loop which should # already have been discovered, but we verify anyway. # It pops a block. if python_version == '2.6': pos, stack = label_pos[arg], curstack[:-1] if stacks[pos] != stack: #this could be a loop with a 'with' inside yield pos, stack[:-1] + (stack[-1]-1,) else: yield pos, stack else: yield label_pos[arg], curstack[:-1] elif op == SETUP_LOOP: # We continue with a new block. # On break, we jump to the label and return to current stack # state. 
yield label_pos[arg], curstack yield pos+1, curstack + (0,) elif op == SETUP_EXCEPT: # We continue with a new block. # On exception, we jump to the label with 3 extra objects on # stack yield label_pos[arg], newstack(3) yield pos+1, curstack + (0,) elif op == SETUP_FINALLY: # We continue with a new block. # On exception, we jump to the label with 3 extra objects on # stack, but to keep stack recording consistent, we behave as # if we add only 1 object. Extra 2 will be added to the actual # recording. yield label_pos[arg], newstack(1) yield pos+1, curstack + (0,) elif python_version == '2.7' and op == SETUP_WITH: yield label_pos[arg], curstack yield pos+1, newstack(-1) + (1,) elif op == POP_BLOCK: # Just pop the block yield pos+1, curstack[:-1] elif op == END_FINALLY: # Since stack recording of SETUP_FINALLY targets is of 3 pushed # objects (as when an exception is raised), we pop 3 objects. yield pos+1, newstack(-3) elif op == WITH_CLEANUP: # Since WITH_CLEANUP is always found after SETUP_FINALLY # targets, and the stack recording is that of a raised # exception, we can simply pop 1 object and let END_FINALLY # pop the remaining 3. if python_version == '2.7': yield pos+1, newstack(2) else: yield pos+1, newstack(-1) else: assert False, "Unhandled opcode: %r" % op # Now comes the calculation: open_positions holds positions which are # yet to be explored. In each step we take one open position, and # explore it by adding the positions to which you can get from it, to # open_positions. On the way, we update maxsize. 
# open_positions is a list of tuples: (pos, stack state) maxsize = 0 open_positions = [(0, (0,))] while open_positions: pos, curstack = open_positions.pop() maxsize = max(maxsize, sum(curstack)) open_positions.extend(get_next_stacks(pos, curstack)) return maxsize def to_code(self): """Assemble a Python code object from a Code object.""" co_argcount = len(self.args) - self.varargs - self.varkwargs co_stacksize = self._compute_stacksize() co_flags = self._compute_flags() co_consts = [self.docstring] co_names = [] co_varnames = list(self.args) co_freevars = tuple(self.freevars) # We find all cellvars beforehand, for two reasons: # 1. We need the number of them to construct the numeric argument # for ops in "hasfree". # 2. We need to put arguments which are cell vars in the beginning # of co_cellvars cellvars = set(arg for op, arg in self.code if isopcode(op) and op in hasfree and arg not in co_freevars) co_cellvars = [x for x in self.args if x in cellvars] def index(seq, item, eq=operator.eq, can_append=True): """Find the index of item in a sequence and return it. If it is not found in the sequence, and can_append is True, it is appended to the sequence. eq is the equality operator to use. 
""" for i, x in enumerate(seq): if eq(x, item): return i else: if can_append: seq.append(item) return len(seq) - 1 else: raise IndexError, "Item not found" # List of tuples (pos, label) to be filled later jumps = [] # A mapping from a label to its position label_pos = {} # Last SetLineno lastlineno = self.firstlineno lastlinepos = 0 co_code = array('B') co_lnotab = array('B') for i, (op, arg) in enumerate(self.code): if isinstance(op, Label): label_pos[op] = len(co_code) elif op is SetLineno: incr_lineno = arg - lastlineno incr_pos = len(co_code) - lastlinepos lastlineno = arg lastlinepos = len(co_code) if incr_lineno == 0 and incr_pos == 0: co_lnotab.append(0) co_lnotab.append(0) else: while incr_pos > 255: co_lnotab.append(255) co_lnotab.append(0) incr_pos -= 255 while incr_lineno > 255: co_lnotab.append(incr_pos) co_lnotab.append(255) incr_pos = 0 incr_lineno -= 255 if incr_pos or incr_lineno: co_lnotab.append(incr_pos) co_lnotab.append(incr_lineno) elif op == opcode.EXTENDED_ARG: raise ValueError, "EXTENDED_ARG not supported in Code objects" elif not op in hasarg: co_code.append(op) else: if op in hasconst: if isinstance(arg, Code) and i < len(self.code)-1 and \ self.code[i+1][0] in hascode: arg = arg.to_code() arg = index(co_consts, arg, operator.is_) elif op in hasname: arg = index(co_names, arg) elif op in hasjump: # arg will be filled later jumps.append((len(co_code), arg)) arg = 0 elif op in haslocal: arg = index(co_varnames, arg) elif op in hascompare: arg = index(cmp_op, arg, can_append=False) elif op in hasfree: try: arg = index(co_freevars, arg, can_append=False) \ + len(cellvars) except IndexError: arg = index(co_cellvars, arg) else: # arg is ok pass if arg > 0xFFFF: co_code.append(opcode.EXTENDED_ARG) co_code.append((arg >> 16) & 0xFF) co_code.append((arg >> 24) & 0xFF) co_code.append(op) co_code.append(arg & 0xFF) co_code.append((arg >> 8) & 0xFF) for pos, label in jumps: jump = label_pos[label] if co_code[pos] in hasjrel: jump -= pos+3 if jump > 
0xFFFF: raise NotImplementedError, "Extended jumps not implemented" co_code[pos+1] = jump & 0xFF co_code[pos+2] = (jump >> 8) & 0xFF co_code = co_code.tostring() co_lnotab = co_lnotab.tostring() co_consts = tuple(co_consts) co_names = tuple(co_names) co_varnames = tuple(co_varnames) co_nlocals = len(co_varnames) co_cellvars = tuple(co_cellvars) return types.CodeType(co_argcount, co_nlocals, co_stacksize, co_flags, co_code, co_consts, co_names, co_varnames, self.filename, self.name, self.firstlineno, co_lnotab, co_freevars, co_cellvars) def printcodelist(codelist, to=sys.stdout): """Get a code list. Print it nicely.""" labeldict = {} pendinglabels = [] for i, (op, arg) in enumerate(codelist): if isinstance(op, Label): pendinglabels.append(op) elif op is SetLineno: pass else: while pendinglabels: labeldict[pendinglabels.pop()] = i lineno = None islabel = False for i, (op, arg) in enumerate(codelist): if op is SetLineno: lineno = arg print >> to continue if isinstance(op, Label): islabel = True continue if lineno is None: linenostr = '' else: linenostr = str(lineno) lineno = None if islabel: islabelstr = '>>' islabel = False else: islabelstr = '' if op in hasconst: argstr = repr(arg) elif op in hasjump: try: argstr = 'to ' + str(labeldict[arg]) except KeyError: argstr = repr(arg) elif op in hasarg: argstr = str(arg) else: argstr = '' print >> to, '%3s %2s %4d %-20s %s' % ( linenostr, islabelstr, i, op, argstr) def recompile(filename): """Create a .pyc by disassembling the file and assembling it again, printing a message that the reassembled file was loaded.""" # Most of the code here based on the compile.py module. 
import os import imp import marshal import struct f = open(filename, 'U') try: timestamp = long(os.fstat(f.fileno()).st_mtime) except AttributeError: timestamp = long(os.stat(filename).st_mtime) codestring = f.read() f.close() if codestring and codestring[-1] != '\n': codestring = codestring + '\n' try: codeobject = compile(codestring, filename, 'exec') except SyntaxError: print >> sys.stderr, "Skipping %s - syntax error." % filename return cod = Code.from_code(codeobject) message = "reassembled %r imported.\n" % filename cod.code[:0] = [ # __import__('sys').stderr.write(message) (LOAD_GLOBAL, '__import__'), (LOAD_CONST, 'sys'), (CALL_FUNCTION, 1), (LOAD_ATTR, 'stderr'), (LOAD_ATTR, 'write'), (LOAD_CONST, message), (CALL_FUNCTION, 1), (POP_TOP, None), ] codeobject2 = cod.to_code() fc = open(filename+'c', 'wb') fc.write('\0\0\0\0') fc.write(struct.pack('<l', timestamp)) marshal.dump(codeobject2, fc) fc.flush() fc.seek(0, 0) fc.write(imp.get_magic()) fc.close() def recompile_all(path): """recursively recompile all .py files in the directory""" import os if os.path.isdir(path): for root, dirs, files in os.walk(path): for name in files: if name.endswith('.py'): filename = os.path.abspath(os.path.join(root, name)) print >> sys.stderr, filename recompile(filename) else: filename = os.path.abspath(path) recompile(filename) def main(): import os if len(sys.argv) != 2 or not os.path.exists(sys.argv[1]): print """\ Usage: %s dir Search recursively for *.py in the given directory, disassemble and assemble them, adding a note when each file is imported. Use it to test byteplay like this: > byteplay.py Lib > make test Some FutureWarnings may be raised, but that's expected. Tip: before doing this, check to see which tests fail even without reassembling them... 
""" % sys.argv[0] sys.exit(1) recompile_all(sys.argv[1]) if __name__ == '__main__': main() byteplay-0.2/unittests.py0000644000175000017500000000027711445502373016112 0ustar iloweilowe00000000000000def test_python26_build_map(): import sys if '.'.join(str(x) for x in sys.version_info[:2]) == '2.6': from byteplay import Code Code.from_code((lambda: {'a': 1}).func_code).to_code() byteplay-0.2/setup.py0000644000175000017500000005723511445502373015216 0ustar iloweilowe00000000000000#!/usr/bin/python from setuptools import setup, find_packages from byteplay import __version__ as lib_version setup( name = 'byteplay', author='Noam Raph', author_email='noamraph@gmail.com', url='http://code.google.com/p/byteplay', download_url='http://code.google.com/p/byteplay/downloads/list', version = lib_version, py_modules = ['byteplay'], zip_safe = True, license='LGPL', description='bytecode manipulation library', long_description = """byteplay lets you convert Python code objects into equivalent objects which are easy to play with, and lets you convert those objects back into living Python code objects. It's useful for applying crazy transformations on Python functions, and is also useful in learning Python byte code intricacies. It currently works with Python 2.4 and up. byteplay Module Documentation ============================= About byteplay -------------- byteplay is a module which lets you easily play with Python bytecode. I wrote it because I needed to manipulate Python bytecode, but didn't find any suitable tool. Michael Hudson's bytecodehacks (http://bytecodehacks.sourceforge.net/) could have worked fine for me, but it only works with Python 1.5.2. I also looked at Phillip J. Eby's peak.util.assembler (http://pypi.python.org/pypi/BytecodeAssembler), but it's intended at creating new code objects from scratch, not for manipulating existing code objects. So I wrote byteplay. 
The basic idea is simple: define a new type, named Code, which is equivalent to Python code objects, but, unlike Python code objects, is easy to play with. "Equivalent" means that every Python code object can be converted to a Code object and vice-versa, without losing any important information on the way. "Easy to play with" means... well, exactly that. The representation should be as simple as possible, letting the infrastructure sort out technical details which do not affect the final behaviour. If you are interested in changing the behaviour of functions, or in assembling functions on your own, you may find byteplay useful. You may also find it useful if you are interested in how Python's bytecode actually works - byteplay lets you easily play with existing bytecode and see what happens, which is a great way to learn. You are also welcome to check byteplay's (pure Python) code, to see how it manipulates real bytecode. byteplay can be downloaded from http://byteplay.googlecode.com/svn/trunk/byteplay.py . See http://code.google.com/p/byteplay/ for a bit more administrative info. Feel free to improve this document - that's why it's on the wiki! Also, if you find it useful, please drop me an email at noamraph at gmail dot com - it would be nice knowing that what I did was useful to someone... A Quick Example --------------- Let's start from a quick example, to give you a taste of what byteplay does. Let's define this stupid function:: >>> def f(a, b): ... print (a, b) ... >>> f(3, 5) (3, 5) Now, let's use byteplay to see what its bytecode is actually doing:: >>> from byteplay import * >>> from pprint import pprint >>> c = Code.from_code(f.func_code) >>> pprint(c.code) [(SetLineno, 2), (LOAD_FAST, 'a'), (LOAD_FAST, 'b'), (BUILD_TUPLE, 2), (PRINT_ITEM, None), (PRINT_NEWLINE, None), (LOAD_CONST, None), (RETURN_VALUE, None)] I hope that this is pretty clear if you are a bit familiar with bytecode. 
The Code object contains a list of all operations, which are pairs of (opcode, arg). Not all opcodes have an argument, so they have None as their argument. You can see that no external tables are used: in the raw bytecode, the argument of many opcodes is an index to a table - for example, the argument of the LOAD_CONST opcode is an index to the co_consts table, which contains the actual constants. Here, the argument is the constant itself. Also note the SetLineno "opcode". It is not a real opcode, but it is used to declare where a line in the original source code begins. Besides another special opcode defined by byteplay, which we will see later, all other opcodes are the real opcodes used by the Python interpreter. By the way, if you want to see the code list in a form which is easier to read, you can simply print it, like this:: >>> print c.code 2 1 LOAD_FAST a 2 LOAD_FAST b 3 BUILD_TUPLE 2 4 PRINT_ITEM 5 PRINT_NEWLINE 6 LOAD_CONST None 7 RETURN_VALUE This is especially useful if the code contains jumps. See the description of the printcodelist function for another example. Ok, now let's play! Say we want to change the function, to print its arguments in reverse order. To do this, we will add a ROT_TWO opcode after the two arguments were loaded to the stack. See how simple it is:: >>> c.code[3:3] = [(ROT_TWO, None)] >>> f.func_code = c.to_code() >>> f(3, 5) (5, 3) Opcodes ------- We have seen that the code list contains opcode constants such as LOAD_FAST. These are instances of the Opcode class. The Opcode class is a subclass of int, which overrides the ``__repr__`` method to return the string representation of an opcode. This means that instead of using a constant such as LOAD_FAST, a numerical constant such as 124 can be used. Opcode instances are, of course, much easier to understand. The byteplay module creates Opcode instances for all the interpreter opcodes. 
They can be found in the ``opcodes`` set, and also in the module's global namespace, so you can write ``from byteplay import *`` and use the opcode constants immediately. byteplay doesn't include a constant for the EXTENDED_ARG opcode, as it is not used by byteplay's representation. Module Contents --------------- These are byteplay's public attributes, which are imported when ``from byteplay import *`` is done. ``POP_TOP``, ``ROT_TWO``, etc. All bytecode constants are imported by their names. ``opcodes`` A set of all Opcode instances. ``opmap`` A mapping from an opcode name to an Opcode instance. ``opname`` A mapping from an opcode number (and an Opcode instance) to its name. ``cmp_op`` A list of strings which represent comparison operators. In raw bytecode, the argument of the COMPARE_OP opcode is an index to this list. In the code list, it is the string representing the comparison. The following are sets of opcodes, which list opcodes according to their behaviour. ``hasarg`` This set contains all opcodes which have an argument (these are the opcodes which are >= HAVE_ARGUMENT). ``hasname`` This set contains all opcodes whose argument is an index to the co_names list. ``hasjrel`` This set contains all opcodes whose argument is a relative jump, that is, an offset by which to advance the byte code instruction pointer. ``hasjabs`` This set contains all opcodes whose argument is an absolute jump, that is, an address to which the instruction pointer should jump. ``hasjump`` This set contains all opcodes whose argument is a jump. It is simply the union of ``hasjrel`` and ``hasjabs``. In byteplay, relative and absolute jumps behave in the same way, so this set is convenient. ``haslocal`` This set contains all opcodes which operate on local variables. ``hascompare`` This set contains all opcodes whose argument is a comparison operator - that is, only the COMPARE_OP opcode. ``hasfree`` This set contains all opcodes which operate on the cell and free variable storage. 
These are variables which are also used by an enclosing or an enclosed function. ``hascode`` This set contains all opcodes which expect a code object to be at the top of the stack. In the bytecode the Python compiler generates, they are always preceded by a LOAD_CONST opcode, which loads the code object. ``hasflow`` This set contains all opcodes which have a special flow behaviour. All other opcodes always continue to the next opcode after they finish, unless an exception was raised. The following are the types of the first elements of the opcode list tuples. ``Opcode`` The type of all opcode constants. ``SetLineno`` This singleton is used like the "real" opcode constants, but only declares where the bytecode for a specific line in the source code begins. ``Label`` This is the type of label objects. This class does nothing - it is used as a way to refer to a place in the code list. Here come some additional functions. ``isopcode(obj)`` Use this function to check whether the first element of an operation pair is a real opcode. This simply returns ``obj is not SetLineno and not isinstance(obj, Label)``. ``getse(op[, arg])`` This function gets the stack effect of an opcode, as a (pop, push) tuple. The stack effect is the number of items popped from the stack, and the number of items pushed instead of them. If an item is only inspected, it is considered as though it was popped and pushed again. This function is meaningful only for opcodes not in hasflow - for other opcodes, ValueError will be raised. For some opcodes the argument is needed in order to calculate the stack effect. In that case, if arg isn't given, ValueError will be raised. ``printcodelist(code, to=sys.stdout)`` This function gets a code list and prints it in a form that is easier to read. For example, let's define a simple function:: >>> def f(a): ... if a < 3: ... b = a ... 
>>> c = Code.from_code(f.func_code) This is the code list itself:: >>> pprint(c.code) [(SetLineno, 2), (LOAD_FAST, 'a'), (LOAD_CONST, 3), (COMPARE_OP, '<'), (JUMP_IF_FALSE, <byteplay.Label object at 0x...>), (POP_TOP, None), (SetLineno, 3), (LOAD_FAST, 'a'), (STORE_FAST, 'b'), (JUMP_FORWARD, <byteplay.Label object at 0x...>), (<byteplay.Label object at 0x...>, None), (POP_TOP, None), (<byteplay.Label object at 0x...>, None), (LOAD_CONST, None), (RETURN_VALUE, None)] And this is the nicer representation:: >>> printcodelist(c.code) 2 1 LOAD_FAST a 2 LOAD_CONST 3 3 COMPARE_OP < 4 JUMP_IF_FALSE to 11 5 POP_TOP 3 7 LOAD_FAST a 8 STORE_FAST b 9 JUMP_FORWARD to 13 >> 11 POP_TOP >> 13 LOAD_CONST None 14 RETURN_VALUE As you can see, all opcodes are marked by their index in the list, and jumps show the index of the target opcode. For your convenience, another class was defined: ``CodeList`` This class is a list subclass, which only overrides the __str__ method to use ``printcodelist``. If the code list is an instance of CodeList, you don't have to type ``printcodelist(c.code)`` in order to see the nice representation - just type ``print c.code``. Code instances created from raw Python code objects already have that feature! And, last but not least - the Code class itself! The Code Class -------------- Constructor ~~~~~~~~~~~ :: Code(code, freevars, args, varargs, varkwargs, newlocals, name, filename, firstlineno, docstring) -> new Code object This constructs a new Code object. The arguments are simply values for the Code object data attributes - see below. Data Attributes ~~~~~~~~~~~~~~~ We'll start with the data attributes - those are read/write, and distinguish one code instance from another. First come the attributes which affect the operation of the interpreter when it executes the code, and then come attributes which give extra information, useful for debugging and introspection. ``code`` This is the main part which describes what a Code object does. It is a list of pairs ``(opcode, arg)``. ``arg`` is the opcode argument, if it has one, or None if it doesn't. 
``opcode`` can be of 3 types: * Regular opcodes. These are the opcodes which really define an operation of the interpreter. They can be regular ints, or Opcode instances. The meaning of the argument changes according to the opcode: - Opcodes not in ``hasarg`` don't have an argument. None should be used as the second item of the tuple. - The argument of opcodes in ``hasconst`` is the actual constant. - The argument of opcodes in ``hasname`` is the name, as a string. - The argument of opcodes in ``hasjump`` is a Label instance, which should point to a specific location in the code list. - The argument of opcodes in ``haslocal`` is the local variable name, as a string. - The argument of opcodes in ``hascompare`` is the string representing the comparison operator. - The argument of opcodes in ``hasfree`` is the name of the cell or free variable, as a string. - The argument of the remaining opcodes is the numerical argument found in raw bytecode. Its meaning is opcode specific. * ``SetLineno``. This is a singleton, which means that a line in the source code begins. Its argument is the line number. * labels. These are instances of the ``Label`` class. The label class does nothing - it is just used as a way to specify a place in the code list. Labels can be put in the code list and cause no action by themselves. They are used as the argument of opcodes which may cause a jump to a specific location in the code. ``freevars`` This is a list of strings - the names of variables defined in outer functions and used in this function or in functions defined inside it. The order of this list is important, since those variables are passed to the function as a sequence whose order should match the order of the ``freevars`` attribute. A few words about closures in Python may be in place. In Python, functions defined inside other functions can use variables defined in an outer function. We know each running function has a place to store local variables. 
But how can functions refer to variables defined in an outer scope? The solution is this: for every variable which is used in more than one scope, a new ``cell`` object is created. This object does one simple thing: it refers to one other object - the value of its variable. When the variable gets a new value, the cell object is updated too. A reference to the cell object is passed to any function which uses that variable. When an inner function is interested in the value of a variable of an outer scope, it uses the value referred to by the cell object passed to it. An example might help in understanding this. Let's take a look at the bytecode of a simple example:: >>> def f(): ... a = 3 ... b = 5 ... def g(): ... return a + b ... >>> from byteplay import * >>> c = Code.from_code(f.func_code) >>> print c.code 2 1 LOAD_CONST 3 2 STORE_DEREF a 3 4 LOAD_CONST 5 5 STORE_DEREF b 4 7 LOAD_CLOSURE a 8 LOAD_CLOSURE b 9 BUILD_TUPLE 2 10 LOAD_CONST <byteplay.Code object at 0x...> 11 MAKE_CLOSURE 0 12 STORE_FAST g 13 LOAD_CONST None 14 RETURN_VALUE >>> c.code[10][1].freevars ('a', 'b') >>> print c.code[10][1].code 5 1 LOAD_DEREF a 2 LOAD_DEREF b 3 BINARY_ADD 4 RETURN_VALUE We can see that LOAD_DEREF and STORE_DEREF opcodes are used to get and set the value of cell objects. There is no inherent difference between cell objects created by an outer function and cell objects used in an inner function. What makes the difference is whether a variable name was listed in the ``freevars`` attribute of the Code object - if it was not listed there, a new cell is created, and if it was listed there, the cell created by an outer function is used. We can also see how a function gets the cell objects it needs from its outer functions. The inner function is created with the MAKE_CLOSURE opcode, which pops two objects from the stack: first, the code object used to create the function. 
Second, a tuple with the cell objects used by the code (the tuple is created by the LOAD_CLOSURE opcodes, which push a cell object onto the stack, and of course the BUILD_TUPLE opcode.) We can see that the order of the cells in the tuple matches the order of the names in the ``freevars`` list - that's how the inner function knows that ``(LOAD_DEREF, 'a')`` means "load the value of the first cell in the tuple". ``args`` The list of argument names of a function. For example:: >>> def f(a, b, *args, **kwargs): ... pass ... >>> Code.from_code(f.func_code).args ('a', 'b', 'args', 'kwargs') ``varargs`` A boolean: Does the function get a variable number of positional arguments? In other words: does it have a ``*args`` argument? If ``varargs`` is True, the argument which gets the extra positional arguments will be the last argument or the one before the last, depending on whether ``varkwargs`` is True. ``varkwargs`` A boolean: Does the function get a variable number of keyword arguments? In other words: does it have a ``**kwargs`` argument? If ``varkwargs`` is True, the argument which gets the extra keyword arguments will be the last argument. ``newlocals`` A boolean: Should a new local namespace be created for this code? This is True for functions and False for modules and exec code. Now come attributes with additional information about the code: ``name`` A string: The name of the code, which is usually the name of the function created from it. ``filename`` A string: The name of the source file from which the bytecode was compiled. ``firstlineno`` An int: The number of the first line in the source file from which the bytecode was compiled. ``docstring`` A string: The docstring for functions created from this code. Methods ~~~~~~~ These are the Code class methods. ``Code.from_code(code) -> new Code object`` This is a class method, which creates a new Code object from a raw Python code object. 
It is equivalent to the raw code object, that is, the resulting Code object can be converted to a new raw Python code object, which will have exactly the same behaviour as the original object. ``code.to_code() -> new code object`` This method converts a Code object into an equivalent raw Python code object, which can be executed just like any other code object. ``code1.__eq__(code2) -> bool`` Different Code objects can be meaningfully tested for equality. This tests that all attributes have the same value. For the code attribute, labels are compared to see if they form the same flow graph. Stack-depth Calculation ----------------------- What was described above is enough for using byteplay. However, if you encounter an "Inconsistent code" exception when you try to assemble your code and wonder what it means, or if you just want to learn more about Python's stack behaviour, this section is for you. Note: This section isn't as clear as it could be, to say the least. If you'd like to improve it, feel free to do so - that's what wikis are for, aren't they? When assembling code objects, the code's maximum stack usage is needed. This is simply the maximum number of items expected on the frame's stack. If the actual number of items on the stack exceeds this, Python may well fail with a segmentation fault. The question is then, how do we calculate the maximum stack usage of a given code object? There's most likely no general solution for this problem. However, code generated by Python's compiler has a nice property which makes it relatively easy to calculate the maximum stack usage. The property is that if we take a bytecode "line", and check the stack state whenever we reach that line, we will find that the stack state is always the same, no matter how we got to that line. We'll call such code "regular". Now, this requires clarification: what is the "stack state" which is always the same, exactly? 
Obviously, the stack doesn't always contain the same objects when we reach a line. For now, we can assume that it simply means the number of items on the stack.

This helps us a lot. If we know that every line can have exactly one stack state, and we know how every opcode changes the stack state, we can trace stack states along all possible code paths, and find the stack state of every reachable line. Then we can simply check which state had the largest number of stack items, and that's the maximum stack usage of the code.

What happens with code not generated by Python's compiler, if it doesn't fulfil the requirement that every line should have one state? When tracing the stack state for every line, we will find a line which can be reached from several places, and whose stack state depends on the address from which we jumped to it. In that case, an "Inconsistent code" exception will be raised.

OK, so what exactly is this "stack state"? If every opcode pushed and popped a constant number of elements, the stack state could have been the number of items on the stack. However, life isn't that simple. In real life, there are *blocks*. Blocks allow us to break from a loop, regardless of exactly how many items we have on the stack. How? Simple. Before the loop starts, the SETUP_LOOP opcode is executed. This opcode records in a block the number of items currently on the stack, and also a position in the code. When the POP_BLOCK opcode is executed, the stack is restored to the recorded state by popping extra items, and the corresponding block is discarded. But if the BREAK_LOOP opcode is executed instead of POP_BLOCK, one more thing happens: the execution jumps to the position specified by the SETUP_LOOP opcode.

Fortunately, we can still live with that. Instead of defining the stack state as a single number - the total number of elements on the stack - we will define the stack state as a sequence of numbers - the number of elements on the stack for each block. So, for example, if the state was (3, 5), after a BINARY_ADD operation the state will be (3, 4), because the operation pops two elements and pushes one element. If the state was (3, 5), after a block-opening operation such as SETUP_LOOP the state will be (3, 5, 0), because a new block, without elements yet, was pushed.

Another complication: the SETUP_FINALLY opcode specifies an address to jump to if an exception is raised or a BREAK_LOOP operation is executed. This address can also be reached by normal flow. However, the stack state at that address will differ depending on what actually happened - if an exception was raised, 3 elements will be pushed; if BREAK_LOOP was executed, 2 elements will be pushed; and if nothing happened, 1 element will be pushed by a LOAD_CONST operation. This seemingly inconsistent state always ends with an END_FINALLY opcode. The END_FINALLY opcode pops 1, 2 or 3 elements according to what it finds on the stack, so we return to a "consistent" state. How can we deal with that complexity?

The solution is pretty simple. We will treat the SETUP_FINALLY opcode as if it pushes 1 element to its target - this makes it consistent with the 1 element which is pushed if the target is reached by normal flow. However, we will calculate the stack state as if at the target line there was an opcode which pushed 2 elements to the stack. This is done so that the maximum stack size calculation will be correct. Those 2 extra elements will be popped by the END_FINALLY opcode, which will be treated as though it always pops 3 elements. That's all! Just be aware of that when you are playing with SETUP_FINALLY and END_FINALLY opcodes...
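The tracing idea described in this section can be illustrated with a small standalone sketch. It is not byteplay's implementation: the names ``step``, ``trace``, ``EFFECTS`` and ``InconsistentCode`` are made up for this example, the opcodes are plain strings, and the toy ``JUMP``, ``JUMP_IF``, ``LABEL`` and ``RETURN`` operations only loosely mimic real bytecode. BREAK_LOOP and the SETUP_FINALLY accounting are left out. It assumes every path through the toy code ends with ``RETURN``.

```python
# Toy model of the stack-state tracing described above. The opcode names are
# plain strings and the effect table is hand-written for this sketch; none of
# this is byteplay's actual code.

class InconsistentCode(Exception):
    pass

# (pop, push) effects of the few ordinary opcodes the sketch knows about.
EFFECTS = {
    "LOAD_CONST": (0, 1),
    "LOAD_FAST": (0, 1),
    "BINARY_ADD": (2, 1),
    "POP_TOP": (1, 0),
}

def step(state, op):
    # Return the state (items per block, outermost first) after one opcode.
    if op == "SETUP_LOOP":          # opens a new, still empty block
        return state + (0,)
    if op == "POP_BLOCK":           # drops the innermost block and its items
        if len(state) < 2:
            raise InconsistentCode("POP_BLOCK without an open block")
        return state[:-1]
    pop, push = EFFECTS[op]
    if state[-1] < pop:
        raise InconsistentCode("%s pops more than the block holds" % op)
    return state[:-1] + (state[-1] - pop + push,)

def trace(code):
    # Follow every path through `code`, a list of (op, arg) pairs, and
    # return the largest total number of stack items seen on arrival.
    labels = dict((arg, i) for i, (op, arg) in enumerate(code) if op == "LABEL")
    seen = {}                       # position -> state on arrival
    todo = [(0, (0,))]
    while todo:
        pos, state = todo.pop()
        if pos in seen:
            if seen[pos] != state:
                raise InconsistentCode(
                    "position %d reached with %r and %r" % (pos, seen[pos], state))
            continue
        seen[pos] = state
        op, arg = code[pos]
        if op == "LABEL":
            todo.append((pos + 1, state))
        elif op == "JUMP":
            todo.append((labels[arg], state))
        elif op == "JUMP_IF":       # inspects the top item, may or may not jump
            todo.append((labels[arg], state))
            todo.append((pos + 1, state))
        elif op == "RETURN":
            step(state, "POP_TOP")  # a return value must be on the stack
        else:
            todo.append((pos + 1, step(state, op)))
    return max(sum(state) for state in seen.values())

# Both branches arrive at "end" with one item on the stack: regular code.
regular = [
    ("LOAD_FAST", "a"), ("LOAD_CONST", 3), ("BINARY_ADD", None),
    ("JUMP_IF", "else"),
    ("POP_TOP", None), ("LOAD_CONST", 1), ("JUMP", "end"),
    ("LABEL", "else"), ("POP_TOP", None), ("LOAD_CONST", 2),
    ("LABEL", "end"), ("RETURN", None),
]

# "skip" is reached with one item by the jump and two by falling through.
irregular = [
    ("LOAD_CONST", 1), ("JUMP_IF", "skip"), ("LOAD_CONST", 2),
    ("LABEL", "skip"), ("RETURN", None),
]
```

Calling ``trace(regular)`` gives the maximum depth of the toy code, while ``trace(irregular)`` raises ``InconsistentCode``, in the same spirit as the "Inconsistent code" exception mentioned above. ``step((3, 5), "BINARY_ADD")`` and ``step((3, 5), "SETUP_LOOP")`` reproduce the two state transitions used as examples in the text.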
""" ) byteplay-0.2/setup.cfg0000644000175000017500000000007311445502375015313 0ustar iloweilowe00000000000000[egg_info] tag_build = tag_date = 0 tag_svn_revision = 0 byteplay-0.2/examples/0000755000175000017500000000000011445502375015310 5ustar iloweilowe00000000000000byteplay-0.2/examples/make_constants.py0000644000175000017500000001171511445502373020676 0ustar iloweilowe00000000000000# Decorator for BindingConstants at compile time # Based on a recipe by Raymond Hettinger, from Python Cookbook: # http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/277940 # # Modified by Noam Raphael to demonstrate using the byteplay module # (http://code.google.com/p/byteplay) from byteplay import * def _make_constants(f, builtin_only=False, stoplist=[], verbose=False): try: co = f.func_code except AttributeError: return f # Jython doesn't have a func_code attribute. co = Code.from_code(co) import __builtin__ env = vars(__builtin__).copy() if builtin_only: stoplist = dict.fromkeys(stoplist) stoplist.update(f.func_globals) else: env.update(f.func_globals) # First pass converts global lookups into constants for i, (op, arg) in enumerate(co.code): if op == LOAD_GLOBAL: name = arg if name in env and name not in stoplist: value = env[name] co.code[i] = (LOAD_CONST, value) if verbose: print name, '-->', value # Second pass folds tuples of constants and constant attribute lookups newcode = [] constcount = 0 NONE = [] # An object that won't appear anywhere else for op, arg in co.code: newconst = NONE if op == LOAD_CONST: constcount += 1 elif op == LOAD_ATTR: if constcount >= 1: lastop, lastarg = newcode.pop() constcount -= 1 newconst = getattr(lastarg, arg) elif op == BUILD_TUPLE: if constcount >= arg: newconst = tuple(x[1] for x in newcode[-1:-1-arg:-1]) del newcode[-arg:] constcount -= arg else: constcount = 0 if newconst is not NONE: newcode.append((LOAD_CONST, newconst)) constcount += 1 if verbose: print "new folded constant:", newconst else: newcode.append((op, arg)) co.code = 
newcode return type(f)(co.to_code(), f.func_globals, f.func_name, f.func_defaults, f.func_closure) _make_constants = _make_constants(_make_constants) # optimize thyself! def bind_all(mc, builtin_only=False, stoplist=[], verbose=False): """Recursively apply constant binding to functions in a module or class. Use as the last line of the module (after everything is defined, but before test code). In modules that need modifiable globals, set builtin_only to True. """ try: d = vars(mc) except TypeError: return for k, v in d.items(): if type(v) is FunctionType: newv = _make_constants(v, builtin_only, stoplist, verbose) setattr(mc, k, newv) elif type(v) in (type, ClassType): bind_all(v, builtin_only, stoplist, verbose) @_make_constants def make_constants(builtin_only=False, stoplist=[], verbose=False): """ Return a decorator for optimizing global references. Replaces global references with their currently defined values. If not defined, the dynamic (runtime) global lookup is left undisturbed. If builtin_only is True, then only builtins are optimized. Variable names in the stoplist are also left undisturbed. Also, folds constant attr lookups and tuples of constants. If verbose is True, prints each substitution as is occurs """ if type(builtin_only) == type(make_constants): raise ValueError("The bind_constants decorator must have arguments.") return lambda f: _make_constants(f, builtin_only, stoplist, verbose) ## --------- Example call ----------------------------------------- import random @make_constants(verbose=True) def sample(population, k): "Choose k unique random elements from a population sequence." 
    if not isinstance(population, (list, tuple, str)):
        raise TypeError('Cannot handle type', type(population))
    n = len(population)
    if not 0 <= k <= n:
        raise ValueError, "sample larger than population"
    result = [None] * k
    pool = list(population)
    for i in xrange(k):         # invariant: non-selected at [0,n-i)
        j = int(random.random() * (n-i))
        result[i] = pool[j]
        pool[j] = pool[n-i-1]   # move non-selected item into vacancy
    return result

""" Output from the example call:

isinstance --> <built-in function isinstance>
list --> <type 'list'>
tuple --> <type 'tuple'>
str --> <type 'str'>
TypeError --> exceptions.TypeError
type --> <type 'type'>
len --> <built-in function len>
ValueError --> exceptions.ValueError
list --> <type 'list'>
xrange --> <type 'xrange'>
int --> <type 'int'>
random -->
new folded constant: (<type 'list'>, <type 'tuple'>, <type 'str'>)
new folded constant:
"""

byteplay-0.2/examples/make_constants_orig.py
# Decorator for BindingConstants at compile time
# A recipe by Raymond Hettinger, from Python Cookbook:
# http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/277940

from opcode import opmap, HAVE_ARGUMENT, EXTENDED_ARG
from types import FunctionType, ClassType   # used by bind_all below
globals().update(opmap)

def _make_constants(f, builtin_only=False, stoplist=[], verbose=False):
    try:
        co = f.func_code
    except AttributeError:
        return f        # Jython doesn't have a func_code attribute.
    newcode = map(ord, co.co_code)
    newconsts = list(co.co_consts)
    names = co.co_names
    codelen = len(newcode)

    import __builtin__
    env = vars(__builtin__).copy()
    if builtin_only:
        stoplist = dict.fromkeys(stoplist)
        stoplist.update(f.func_globals)
    else:
        env.update(f.func_globals)

    # First pass converts global lookups into constants
    i = 0
    while i < codelen:
        opcode = newcode[i]
        if opcode in (EXTENDED_ARG, STORE_GLOBAL):
            return f    # for simplicity, only optimize common cases
        if opcode == LOAD_GLOBAL:
            oparg = newcode[i+1] + (newcode[i+2] << 8)
            name = co.co_names[oparg]
            if name in env and name not in stoplist:
                value = env[name]
                for pos, v in enumerate(newconsts):
                    if v is value:
                        break
                else:
                    pos = len(newconsts)
                    newconsts.append(value)
                newcode[i] = LOAD_CONST
                newcode[i+1] = pos & 0xFF
                newcode[i+2] = pos >> 8
                if verbose:
                    print name, '-->', value
        i += 1
        if opcode >= HAVE_ARGUMENT:
            i += 2

    # Second pass folds tuples of constants and constant attribute lookups
    i = 0
    while i < codelen:

        newtuple = []
        while newcode[i] == LOAD_CONST:
            oparg = newcode[i+1] + (newcode[i+2] << 8)
            newtuple.append(newconsts[oparg])
            i += 3

        opcode = newcode[i]
        if not newtuple:
            i += 1
            if opcode >= HAVE_ARGUMENT:
                i += 2
            continue

        if opcode == LOAD_ATTR:
            obj = newtuple[-1]
            oparg = newcode[i+1] + (newcode[i+2] << 8)
            name = names[oparg]
            try:
                value = getattr(obj, name)
            except AttributeError:
                continue
            deletions = 1

        elif opcode == BUILD_TUPLE:
            oparg = newcode[i+1] + (newcode[i+2] << 8)
            if oparg != len(newtuple):
                continue
            deletions = len(newtuple)
            value = tuple(newtuple)

        else:
            continue

        reljump = deletions * 3
        newcode[i-reljump] = JUMP_FORWARD
        newcode[i-reljump+1] = (reljump-3) & 0xFF
        newcode[i-reljump+2] = (reljump-3) >> 8

        n = len(newconsts)
        newconsts.append(value)
        newcode[i] = LOAD_CONST
        newcode[i+1] = n & 0xFF
        newcode[i+2] = n >> 8
        i += 3
        if verbose:
            print "new folded constant:", value

    codestr = ''.join(map(chr, newcode))
    codeobj = type(co)(co.co_argcount, co.co_nlocals, co.co_stacksize,
                       co.co_flags, codestr, tuple(newconsts), co.co_names,
                       co.co_varnames, co.co_filename, co.co_name,
                       co.co_firstlineno, co.co_lnotab, co.co_freevars,
                       co.co_cellvars)
    return type(f)(codeobj, f.func_globals, f.func_name, f.func_defaults,
                   f.func_closure)

_make_constants = _make_constants(_make_constants) # optimize thyself!

def bind_all(mc, builtin_only=False, stoplist=[], verbose=False):
    """Recursively apply constant binding to functions in a module or class.

    Use as the last line of the module (after everything is defined, but
    before test code). In modules that need modifiable globals, set
    builtin_only to True.

    """
    try:
        d = vars(mc)
    except TypeError:
        return
    for k, v in d.items():
        if type(v) is FunctionType:
            newv = _make_constants(v, builtin_only, stoplist, verbose)
            setattr(mc, k, newv)
        elif type(v) in (type, ClassType):
            bind_all(v, builtin_only, stoplist, verbose)

@_make_constants
def make_constants(builtin_only=False, stoplist=[], verbose=False):
    """ Return a decorator for optimizing global references.

    Replaces global references with their currently defined values.
    If not defined, the dynamic (runtime) global lookup is left undisturbed.
    If builtin_only is True, then only builtins are optimized.
    Variable names in the stoplist are also left undisturbed.
    Also, folds constant attr lookups and tuples of constants.
    If verbose is True, prints each substitution as it occurs

    """
    if type(builtin_only) == type(make_constants):
        raise ValueError("The make_constants decorator must have arguments.")
    return lambda f: _make_constants(f, builtin_only, stoplist, verbose)

## --------- Example call -----------------------------------------
import random

@make_constants(verbose=True)
def sample(population, k):
    "Choose k unique random elements from a population sequence."
    if not isinstance(population, (list, tuple, str)):
        raise TypeError('Cannot handle type', type(population))
    n = len(population)
    if not 0 <= k <= n:
        raise ValueError, "sample larger than population"
    result = [None] * k
    pool = list(population)
    for i in xrange(k):         # invariant: non-selected at [0,n-i)
        j = int(random.random() * (n-i))
        result[i] = pool[j]
        pool[j] = pool[n-i-1]   # move non-selected item into vacancy
    return result

""" Output from the example call:

isinstance --> <built-in function isinstance>
list --> <type 'list'>
tuple --> <type 'tuple'>
str --> <type 'str'>
TypeError --> exceptions.TypeError
type --> <type 'type'>
len --> <built-in function len>
ValueError --> exceptions.ValueError
list --> <type 'list'>
xrange --> <type 'xrange'>
int --> <type 'int'>
random -->
new folded constant: (<type 'list'>, <type 'tuple'>, <type 'str'>)
new folded constant:
"""