meliae-0.4.0/CHANGES.txt0000644000000000000000000001416211605576210014525 0ustar rootroot00000000000000############## Meliae Changes ############## .. contents:: List of Releases :depth: 1 Meliae 0.4 ########## :0.4: 2011-07-08 * We now only compile against Cython. I've finally hit some issues that I don't want to work around. Namely sizeof(Class) doesn't work under even pyrex 0.9.9. (John Arbash Meinel) * Duplicate parent entries are filtered out. (eg the intern dict refers to the same string 2x, you'll see 1 parent ref in the child, not 2.) (John Arbash Meinel) * We now default to limiting the maximum length of the parents list (default 100). I had some dumps where a single object was referenced 50k times. Anything over about 10 is at the point where you won't really walk them. This can be disabled with ``load(max_parents=-1)``. The main win is lowering memory consumption. A 50k parent list takes 200kB by itself (on 32-bit). (John Arbash Meinel) * Fix a PyInt memory leak. We were calling __sizeof__ which returns an PyInt and not DECREFing it. (John Arbash Meinel) * Initial support for overriding ``__sizeof__`` definitions. It turns out that object has a basic definition, so all new style classes inherit it. We now provide ``meliae.scanner.add_special_size``, which takes a type name and some callbacks to determine the size of an object. This lets you register them without importing the module. * Small updates to fix the test suite under python2.7 and 64-bit architectures. PyString changed size in 2.7, and we weren't accounting for alignment effects on 64-bit machines. (John Arbash Meinel, Jelmer Vernooij, #739310) * ``run_tests.py`` exits nonzero if a test fails. (Jelmer Vernooij, #740109) * Add a MANIFEST.in file, to allow ``python setup.py sdist`` to properly see the .c and .h files it needs to include. (Chris Adams, #735284) Meliae 0.3 ########## :0.3: 2010-08-02 The main update is the ability to do more queries on a subset of the object graph. ``om.summarize(starting_at, excluding=[address])`` lets you find out what is more directly "owned" by a given object. * Add ``__sizeof__`` members to a lot of the core classes (IntSet, etc.) (John Arbash Meinel) * ``ObjectManager.compute_total_size()`` now only computes the size of a single object, rather than all objects. All objects took too long to be useful anyway, better to make it easier to use the useful api. (John Arbash Meinel) * ``obj.iter_recursive_refs()`` can now be used to find all objects referenced from this object (including obj). It can also take an iterable of object addresses to exclude. Which makes it easy to ask, "What objects are accessible from X that aren't accessible from Y?" (John Arbash Meinel) * ``ObjectManager.summarize()`` can now take an object and an exclusion list, and summarize the referenced objects. This can be quite useful when you want to look at just a subset of the graph. The syntax is ``ObjectManager.summarize(obj, [not_address1, not_address2])``. (John Arbash Meinel) * ``obj.all()`` and ``obj.compute_total_size()`` helpers. These let you get the set of referenced objects matching the type (like ``om.get_all()``). But they *also* allow you to pass an exclusion list, so you can only get things reachable from here and not reachable from there. (John Arbash Meinel) Meliae 0.2.1 ############ :0.2.1: 2010-07-20 * When dumping a ``PyFrame`` look at the function object for ``co_name``, so you don't have to wander through the references to get their yourself. (Andrew Bennetts) * Fixes for the simple regex parser (w/o simplejson). However, simplejson is still recommended, because it is both faster and more accurate (decodes unicode escapes, etc). (John Arbash Meinel) * ``loader.load()`` now defaults to computing parents and collapsing instance dicts. It does mean that loading will be a bit slower, and consume more memory, but it is almost always what you need to do first anyway. (John Arbash Meinel) Meliae 0.2.1rc1 ############### :0.2.1rc1: 2010-06-30 * Avoid calling ``PyType_Type.tp_traverse`` when the argument is not a heap-class. There is an assert that gets tripped if you are running a debug build (or something to do with how Fedora builds its python). (John Arbash Meinel, #586122) * Flush the file handle before exiting. We were doing it at the Python layer, but that might not translate into the ``FILE*`` object. (John Arbash Meinel, #428165) * Handle some issues when using Pyrex 0.9.8.4. It was treating ```` as casting the object pointer, not as a Python level "extract an integer". However, assignment to an ``cdef unsigned long`` does the right thing. (John Arbash Meinel) * Tweak some memory performance issues (Gary Poster, #581918) Meliae 0.2.0 ############ :0.2.0: 2010-01-08 A fairly major reworking of the internals, provides significant memory savings and easier navigation of the object graph. New Features ************ * The loader internals have been rewritten significantly. Instead of storing a bunch of objects in a regular python dict, they are now stored in a custom Pyrex collection, and proxy objects are created on demand. This means significantly improved memory usage. (roughly 2:1). (John Arbash Meinel) * The internals change also changes the interface a bit. When viewing an object, a shorter display is given. One can use ``obj[offset]`` to get the object at that reference, rather than the reference itself. (so fewer indirections through the ObjManager). (John Arbash Meinel) * Now supports the ``__sizeof__`` interface introduced in python 2.6. This allows a class (especially extension classes) to inform Meliae/Python how many bytes it is consuming in memory. (John Arbash Meinel) * Memory objects now use a ``.children`` and ``.parents`` notation, rather than ``.ref_list`` and ``.referrers``. For one, this makes it clearer as 'referred to' and 'referred from' is a bit tricky. We also now have ``.c`` and ``.p`` which return a list of Memory objects, rather than just a list of addresses. This makes it very quick to move around. Especially with ``.refs_as_dict`` and other such niceties. (John Arbash Meinel) Bug Fixes ********* * Some .. vim: tw=71 ft=rst meliae-0.4.0/COPYING.txt0000644000000000000000000010451311605576210014565 0ustar rootroot00000000000000 GNU GENERAL PUBLIC LICENSE Version 3, 29 June 2007 Copyright (C) 2007 Free Software Foundation, Inc. Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Preamble The GNU General Public License is a free, copyleft license for software and other kinds of works. The licenses for most software and other practical works are designed to take away your freedom to share and change the works. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change all versions of a program--to make sure it remains free software for all its users. We, the Free Software Foundation, use the GNU General Public License for most of our software; it applies also to any other work released this way by its authors. You can apply it to your programs, too. When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for them if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs, and that you know you can do these things. To protect your rights, we need to prevent others from denying you these rights or asking you to surrender the rights. Therefore, you have certain responsibilities if you distribute copies of the software, or if you modify it: responsibilities to respect the freedom of others. For example, if you distribute copies of such a program, whether gratis or for a fee, you must pass on to the recipients the same freedoms that you received. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. Developers that use the GNU GPL protect your rights with two steps: (1) assert copyright on the software, and (2) offer you this License giving you legal permission to copy, distribute and/or modify it. For the developers' and authors' protection, the GPL clearly explains that there is no warranty for this free software. For both users' and authors' sake, the GPL requires that modified versions be marked as changed, so that their problems will not be attributed erroneously to authors of previous versions. Some devices are designed to deny users access to install or run modified versions of the software inside them, although the manufacturer can do so. This is fundamentally incompatible with the aim of protecting users' freedom to change the software. The systematic pattern of such abuse occurs in the area of products for individuals to use, which is precisely where it is most unacceptable. Therefore, we have designed this version of the GPL to prohibit the practice for those products. If such problems arise substantially in other domains, we stand ready to extend this provision to those domains in future versions of the GPL, as needed to protect the freedom of users. Finally, every program is threatened constantly by software patents. States should not allow patents to restrict development and use of software on general-purpose computers, but in those that do, we wish to avoid the special danger that patents applied to a free program could make it effectively proprietary. To prevent this, the GPL assures that patents cannot be used to render the program non-free. The precise terms and conditions for copying, distribution and modification follow. TERMS AND CONDITIONS 0. Definitions. "This License" refers to version 3 of the GNU General Public License. "Copyright" also means copyright-like laws that apply to other kinds of works, such as semiconductor masks. "The Program" refers to any copyrightable work licensed under this License. Each licensee is addressed as "you". "Licensees" and "recipients" may be individuals or organizations. To "modify" a work means to copy from or adapt all or part of the work in a fashion requiring copyright permission, other than the making of an exact copy. The resulting work is called a "modified version" of the earlier work or a work "based on" the earlier work. A "covered work" means either the unmodified Program or a work based on the Program. To "propagate" a work means to do anything with it that, without permission, would make you directly or secondarily liable for infringement under applicable copyright law, except executing it on a computer or modifying a private copy. Propagation includes copying, distribution (with or without modification), making available to the public, and in some countries other activities as well. To "convey" a work means any kind of propagation that enables other parties to make or receive copies. Mere interaction with a user through a computer network, with no transfer of a copy, is not conveying. An interactive user interface displays "Appropriate Legal Notices" to the extent that it includes a convenient and prominently visible feature that (1) displays an appropriate copyright notice, and (2) tells the user that there is no warranty for the work (except to the extent that warranties are provided), that licensees may convey the work under this License, and how to view a copy of this License. If the interface presents a list of user commands or options, such as a menu, a prominent item in the list meets this criterion. 1. Source Code. The "source code" for a work means the preferred form of the work for making modifications to it. "Object code" means any non-source form of a work. A "Standard Interface" means an interface that either is an official standard defined by a recognized standards body, or, in the case of interfaces specified for a particular programming language, one that is widely used among developers working in that language. The "System Libraries" of an executable work include anything, other than the work as a whole, that (a) is included in the normal form of packaging a Major Component, but which is not part of that Major Component, and (b) serves only to enable use of the work with that Major Component, or to implement a Standard Interface for which an implementation is available to the public in source code form. A "Major Component", in this context, means a major essential component (kernel, window system, and so on) of the specific operating system (if any) on which the executable work runs, or a compiler used to produce the work, or an object code interpreter used to run it. The "Corresponding Source" for a work in object code form means all the source code needed to generate, install, and (for an executable work) run the object code and to modify the work, including scripts to control those activities. However, it does not include the work's System Libraries, or general-purpose tools or generally available free programs which are used unmodified in performing those activities but which are not part of the work. For example, Corresponding Source includes interface definition files associated with source files for the work, and the source code for shared libraries and dynamically linked subprograms that the work is specifically designed to require, such as by intimate data communication or control flow between those subprograms and other parts of the work. The Corresponding Source need not include anything that users can regenerate automatically from other parts of the Corresponding Source. The Corresponding Source for a work in source code form is that same work. 2. Basic Permissions. All rights granted under this License are granted for the term of copyright on the Program, and are irrevocable provided the stated conditions are met. This License explicitly affirms your unlimited permission to run the unmodified Program. The output from running a covered work is covered by this License only if the output, given its content, constitutes a covered work. This License acknowledges your rights of fair use or other equivalent, as provided by copyright law. You may make, run and propagate covered works that you do not convey, without conditions so long as your license otherwise remains in force. You may convey covered works to others for the sole purpose of having them make modifications exclusively for you, or provide you with facilities for running those works, provided that you comply with the terms of this License in conveying all material for which you do not control copyright. Those thus making or running the covered works for you must do so exclusively on your behalf, under your direction and control, on terms that prohibit them from making any copies of your copyrighted material outside their relationship with you. Conveying under any other circumstances is permitted solely under the conditions stated below. Sublicensing is not allowed; section 10 makes it unnecessary. 3. Protecting Users' Legal Rights From Anti-Circumvention Law. No covered work shall be deemed part of an effective technological measure under any applicable law fulfilling obligations under article 11 of the WIPO copyright treaty adopted on 20 December 1996, or similar laws prohibiting or restricting circumvention of such measures. When you convey a covered work, you waive any legal power to forbid circumvention of technological measures to the extent such circumvention is effected by exercising rights under this License with respect to the covered work, and you disclaim any intention to limit operation or modification of the work as a means of enforcing, against the work's users, your or third parties' legal rights to forbid circumvention of technological measures. 4. Conveying Verbatim Copies. You may convey verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice; keep intact all notices stating that this License and any non-permissive terms added in accord with section 7 apply to the code; keep intact all notices of the absence of any warranty; and give all recipients a copy of this License along with the Program. You may charge any price or no price for each copy that you convey, and you may offer support or warranty protection for a fee. 5. Conveying Modified Source Versions. You may convey a work based on the Program, or the modifications to produce it from the Program, in the form of source code under the terms of section 4, provided that you also meet all of these conditions: a) The work must carry prominent notices stating that you modified it, and giving a relevant date. b) The work must carry prominent notices stating that it is released under this License and any conditions added under section 7. This requirement modifies the requirement in section 4 to "keep intact all notices". c) You must license the entire work, as a whole, under this License to anyone who comes into possession of a copy. This License will therefore apply, along with any applicable section 7 additional terms, to the whole of the work, and all its parts, regardless of how they are packaged. This License gives no permission to license the work in any other way, but it does not invalidate such permission if you have separately received it. d) If the work has interactive user interfaces, each must display Appropriate Legal Notices; however, if the Program has interactive interfaces that do not display Appropriate Legal Notices, your work need not make them do so. A compilation of a covered work with other separate and independent works, which are not by their nature extensions of the covered work, and which are not combined with it such as to form a larger program, in or on a volume of a storage or distribution medium, is called an "aggregate" if the compilation and its resulting copyright are not used to limit the access or legal rights of the compilation's users beyond what the individual works permit. Inclusion of a covered work in an aggregate does not cause this License to apply to the other parts of the aggregate. 6. Conveying Non-Source Forms. You may convey a covered work in object code form under the terms of sections 4 and 5, provided that you also convey the machine-readable Corresponding Source under the terms of this License, in one of these ways: a) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by the Corresponding Source fixed on a durable physical medium customarily used for software interchange. b) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by a written offer, valid for at least three years and valid for as long as you offer spare parts or customer support for that product model, to give anyone who possesses the object code either (1) a copy of the Corresponding Source for all the software in the product that is covered by this License, on a durable physical medium customarily used for software interchange, for a price no more than your reasonable cost of physically performing this conveying of source, or (2) access to copy the Corresponding Source from a network server at no charge. c) Convey individual copies of the object code with a copy of the written offer to provide the Corresponding Source. This alternative is allowed only occasionally and noncommercially, and only if you received the object code with such an offer, in accord with subsection 6b. d) Convey the object code by offering access from a designated place (gratis or for a charge), and offer equivalent access to the Corresponding Source in the same way through the same place at no further charge. You need not require recipients to copy the Corresponding Source along with the object code. If the place to copy the object code is a network server, the Corresponding Source may be on a different server (operated by you or a third party) that supports equivalent copying facilities, provided you maintain clear directions next to the object code saying where to find the Corresponding Source. Regardless of what server hosts the Corresponding Source, you remain obligated to ensure that it is available for as long as needed to satisfy these requirements. e) Convey the object code using peer-to-peer transmission, provided you inform other peers where the object code and Corresponding Source of the work are being offered to the general public at no charge under subsection 6d. A separable portion of the object code, whose source code is excluded from the Corresponding Source as a System Library, need not be included in conveying the object code work. A "User Product" is either (1) a "consumer product", which means any tangible personal property which is normally used for personal, family, or household purposes, or (2) anything designed or sold for incorporation into a dwelling. In determining whether a product is a consumer product, doubtful cases shall be resolved in favor of coverage. For a particular product received by a particular user, "normally used" refers to a typical or common use of that class of product, regardless of the status of the particular user or of the way in which the particular user actually uses, or expects or is expected to use, the product. A product is a consumer product regardless of whether the product has substantial commercial, industrial or non-consumer uses, unless such uses represent the only significant mode of use of the product. "Installation Information" for a User Product means any methods, procedures, authorization keys, or other information required to install and execute modified versions of a covered work in that User Product from a modified version of its Corresponding Source. The information must suffice to ensure that the continued functioning of the modified object code is in no case prevented or interfered with solely because modification has been made. If you convey an object code work under this section in, or with, or specifically for use in, a User Product, and the conveying occurs as part of a transaction in which the right of possession and use of the User Product is transferred to the recipient in perpetuity or for a fixed term (regardless of how the transaction is characterized), the Corresponding Source conveyed under this section must be accompanied by the Installation Information. But this requirement does not apply if neither you nor any third party retains the ability to install modified object code on the User Product (for example, the work has been installed in ROM). The requirement to provide Installation Information does not include a requirement to continue to provide support service, warranty, or updates for a work that has been modified or installed by the recipient, or for the User Product in which it has been modified or installed. Access to a network may be denied when the modification itself materially and adversely affects the operation of the network or violates the rules and protocols for communication across the network. Corresponding Source conveyed, and Installation Information provided, in accord with this section must be in a format that is publicly documented (and with an implementation available to the public in source code form), and must require no special password or key for unpacking, reading or copying. 7. Additional Terms. "Additional permissions" are terms that supplement the terms of this License by making exceptions from one or more of its conditions. Additional permissions that are applicable to the entire Program shall be treated as though they were included in this License, to the extent that they are valid under applicable law. If additional permissions apply only to part of the Program, that part may be used separately under those permissions, but the entire Program remains governed by this License without regard to the additional permissions. When you convey a copy of a covered work, you may at your option remove any additional permissions from that copy, or from any part of it. (Additional permissions may be written to require their own removal in certain cases when you modify the work.) You may place additional permissions on material, added by you to a covered work, for which you have or can give appropriate copyright permission. Notwithstanding any other provision of this License, for material you add to a covered work, you may (if authorized by the copyright holders of that material) supplement the terms of this License with terms: a) Disclaiming warranty or limiting liability differently from the terms of sections 15 and 16 of this License; or b) Requiring preservation of specified reasonable legal notices or author attributions in that material or in the Appropriate Legal Notices displayed by works containing it; or c) Prohibiting misrepresentation of the origin of that material, or requiring that modified versions of such material be marked in reasonable ways as different from the original version; or d) Limiting the use for publicity purposes of names of licensors or authors of the material; or e) Declining to grant rights under trademark law for use of some trade names, trademarks, or service marks; or f) Requiring indemnification of licensors and authors of that material by anyone who conveys the material (or modified versions of it) with contractual assumptions of liability to the recipient, for any liability that these contractual assumptions directly impose on those licensors and authors. All other non-permissive additional terms are considered "further restrictions" within the meaning of section 10. If the Program as you received it, or any part of it, contains a notice stating that it is governed by this License along with a term that is a further restriction, you may remove that term. If a license document contains a further restriction but permits relicensing or conveying under this License, you may add to a covered work material governed by the terms of that license document, provided that the further restriction does not survive such relicensing or conveying. If you add terms to a covered work in accord with this section, you must place, in the relevant source files, a statement of the additional terms that apply to those files, or a notice indicating where to find the applicable terms. Additional terms, permissive or non-permissive, may be stated in the form of a separately written license, or stated as exceptions; the above requirements apply either way. 8. Termination. You may not propagate or modify a covered work except as expressly provided under this License. Any attempt otherwise to propagate or modify it is void, and will automatically terminate your rights under this License (including any patent licenses granted under the third paragraph of section 11). However, if you cease all violation of this License, then your license from a particular copyright holder is reinstated (a) provisionally, unless and until the copyright holder explicitly and finally terminates your license, and (b) permanently, if the copyright holder fails to notify you of the violation by some reasonable means prior to 60 days after the cessation. Moreover, your license from a particular copyright holder is reinstated permanently if the copyright holder notifies you of the violation by some reasonable means, this is the first time you have received notice of violation of this License (for any work) from that copyright holder, and you cure the violation prior to 30 days after your receipt of the notice. Termination of your rights under this section does not terminate the licenses of parties who have received copies or rights from you under this License. If your rights have been terminated and not permanently reinstated, you do not qualify to receive new licenses for the same material under section 10. 9. Acceptance Not Required for Having Copies. You are not required to accept this License in order to receive or run a copy of the Program. Ancillary propagation of a covered work occurring solely as a consequence of using peer-to-peer transmission to receive a copy likewise does not require acceptance. However, nothing other than this License grants you permission to propagate or modify any covered work. These actions infringe copyright if you do not accept this License. Therefore, by modifying or propagating a covered work, you indicate your acceptance of this License to do so. 10. Automatic Licensing of Downstream Recipients. Each time you convey a covered work, the recipient automatically receives a license from the original licensors, to run, modify and propagate that work, subject to this License. You are not responsible for enforcing compliance by third parties with this License. An "entity transaction" is a transaction transferring control of an organization, or substantially all assets of one, or subdividing an organization, or merging organizations. If propagation of a covered work results from an entity transaction, each party to that transaction who receives a copy of the work also receives whatever licenses to the work the party's predecessor in interest had or could give under the previous paragraph, plus a right to possession of the Corresponding Source of the work from the predecessor in interest, if the predecessor has it or can get it with reasonable efforts. You may not impose any further restrictions on the exercise of the rights granted or affirmed under this License. For example, you may not impose a license fee, royalty, or other charge for exercise of rights granted under this License, and you may not initiate litigation (including a cross-claim or counterclaim in a lawsuit) alleging that any patent claim is infringed by making, using, selling, offering for sale, or importing the Program or any portion of it. 11. Patents. A "contributor" is a copyright holder who authorizes use under this License of the Program or a work on which the Program is based. The work thus licensed is called the contributor's "contributor version". A contributor's "essential patent claims" are all patent claims owned or controlled by the contributor, whether already acquired or hereafter acquired, that would be infringed by some manner, permitted by this License, of making, using, or selling its contributor version, but do not include claims that would be infringed only as a consequence of further modification of the contributor version. For purposes of this definition, "control" includes the right to grant patent sublicenses in a manner consistent with the requirements of this License. Each contributor grants you a non-exclusive, worldwide, royalty-free patent license under the contributor's essential patent claims, to make, use, sell, offer for sale, import and otherwise run, modify and propagate the contents of its contributor version. In the following three paragraphs, a "patent license" is any express agreement or commitment, however denominated, not to enforce a patent (such as an express permission to practice a patent or covenant not to sue for patent infringement). To "grant" such a patent license to a party means to make such an agreement or commitment not to enforce a patent against the party. If you convey a covered work, knowingly relying on a patent license, and the Corresponding Source of the work is not available for anyone to copy, free of charge and under the terms of this License, through a publicly available network server or other readily accessible means, then you must either (1) cause the Corresponding Source to be so available, or (2) arrange to deprive yourself of the benefit of the patent license for this particular work, or (3) arrange, in a manner consistent with the requirements of this License, to extend the patent license to downstream recipients. "Knowingly relying" means you have actual knowledge that, but for the patent license, your conveying the covered work in a country, or your recipient's use of the covered work in a country, would infringe one or more identifiable patents in that country that you have reason to believe are valid. If, pursuant to or in connection with a single transaction or arrangement, you convey, or propagate by procuring conveyance of, a covered work, and grant a patent license to some of the parties receiving the covered work authorizing them to use, propagate, modify or convey a specific copy of the covered work, then the patent license you grant is automatically extended to all recipients of the covered work and works based on it. A patent license is "discriminatory" if it does not include within the scope of its coverage, prohibits the exercise of, or is conditioned on the non-exercise of one or more of the rights that are specifically granted under this License. You may not convey a covered work if you are a party to an arrangement with a third party that is in the business of distributing software, under which you make payment to the third party based on the extent of your activity of conveying the work, and under which the third party grants, to any of the parties who would receive the covered work from you, a discriminatory patent license (a) in connection with copies of the covered work conveyed by you (or copies made from those copies), or (b) primarily for and in connection with specific products or compilations that contain the covered work, unless you entered into that arrangement, or that patent license was granted, prior to 28 March 2007. Nothing in this License shall be construed as excluding or limiting any implied license or other defenses to infringement that may otherwise be available to you under applicable patent law. 12. No Surrender of Others' Freedom. If conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot convey a covered work so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not convey it at all. For example, if you agree to terms that obligate you to collect a royalty for further conveying from those to whom you convey the Program, the only way you could satisfy both those terms and this License would be to refrain entirely from conveying the Program. 13. Use with the GNU Affero General Public License. Notwithstanding any other provision of this License, you have permission to link or combine any covered work with a work licensed under version 3 of the GNU Affero General Public License into a single combined work, and to convey the resulting work. The terms of this License will continue to apply to the part which is the covered work, but the special requirements of the GNU Affero General Public License, section 13, concerning interaction through a network will apply to the combination as such. 14. Revised Versions of this License. The Free Software Foundation may publish revised and/or new versions of the GNU General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies that a certain numbered version of the GNU General Public License "or any later version" applies to it, you have the option of following the terms and conditions either of that numbered version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of the GNU General Public License, you may choose any version ever published by the Free Software Foundation. If the Program specifies that a proxy can decide which future versions of the GNU General Public License can be used, that proxy's public statement of acceptance of a version permanently authorizes you to choose that version for the Program. Later license versions may give you additional or different permissions. However, no additional obligations are imposed on any author or copyright holder as a result of your choosing to follow a later version. 15. Disclaimer of Warranty. THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 16. Limitation of Liability. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. 17. Interpretation of Sections 15 and 16. If the disclaimer of warranty and limitation of liability provided above cannot be given local legal effect according to their terms, reviewing courts shall apply local law that most closely approximates an absolute waiver of all civil liability in connection with the Program, unless a warranty or assumption of liability accompanies a copy of the Program in return for a fee. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively state the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. Copyright (C) This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . Also add information on how to contact you by electronic and paper mail. If the program does terminal interaction, make it output a short notice like this when it starts in an interactive mode: Copyright (C) This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, your program's commands might be different; for a GUI interface, you would use an "about box". You should also get your employer (if you work as a programmer) or school, if any, to sign a "copyright disclaimer" for the program, if necessary. For more information on this, and how to apply and follow the GNU GPL, see . The GNU General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Lesser General Public License instead of this License. But first, please read . meliae-0.4.0/MANIFEST.in0000644000000000000000000000007211605576210014445 0ustar rootroot00000000000000recursive-include meliae *.c recursive-include meliae *.h meliae-0.4.0/Makefile0000644000000000000000000000017511605576210014353 0ustar rootroot00000000000000all: build_inplace check: all python run_tests.py build_inplace: python setup.py build_ext -i .PHONY: all build_inplace meliae-0.4.0/README.txt0000644000000000000000000000027211605576210014407 0ustar rootroot00000000000000====== Meliae ====== Meant to be a simple program for dumping python memory usage information to a disk format, which can then be parsed into useful things like graph representations. meliae-0.4.0/TODO.txt0000644000000000000000000000226511605576210014223 0ustar rootroot00000000000000============ Things to do ============ A fairly random collection of things to work on next... 1) Generating a Calltree output. I haven't yet understood the calltree syntax, nor how I want to exactly match things. Certainly you don't have FILE/LINE to put into the output. Also, look at generating `runsnakerun`_ output. .. _runsnakerun: http://www.vrplumber.com/programming/runsnakerun/ 2) Other analysis tools, like walking the ref graph. I'm thinking something similar to PDB, which could let you walk up-and-down the reference graph, to let you figure out why that one string is being cached, by going through the 10 layers of references. At the moment, you can do this using '*' in Vim, which is at least a start, and one reason to use a text-compatible dump format. 3) Dump differencing utilities. This probably will make it a bit easier to see where memory is increasing, rather than just where it is at right now. 4) Full cross-platform and version compatibility testing. I'd like to support python2.4+, 32/64-bit, Win/Linux/Mac. I've tested a couple variants, but I don't have all of them to make sure it works everywhere. .. vim: ft=rst meliae-0.4.0/meliae/0000755000000000000000000000000011605576210014144 5ustar rootroot00000000000000meliae-0.4.0/remove_expensive_references.py0000755000000000000000000000320511605576210021051 0ustar rootroot00000000000000#!/usr/bin/env python # Copyright (C) 2009 Canonical Ltd # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License version 3 as # published by the Free Software Foundation. # # This program is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . """Remove expensive references. This script takes 1 or two filenames and filters the first into either std out, or the second filename. """ import os import re import sys import time from meliae import files, loader def main(args): import optparse p = optparse.OptionParser( '%prog INFILE [OUTFILE]') opts, args = p.parse_args(args) if len(args) > 2: sys.stderr.write('We only support 2 filenames, not %d\n' % (len(args),)) return -1 if len(args) < 1: sys.stderr.write("Must supply INFILE\n") return -1 def source(): infile, cleanup = files.open_file(args[0]) for obj in loader.iter_objs(infile): yield obj cleanup() if len(args) == 1: outfile = sys.stdout else: outfile = open(args[1], 'wb') for _, obj in loader.remove_expensive_references(source, show_progress=True): outfile.write(obj.to_json() + '\n') outfile.flush() if __name__ == '__main__': sys.exit(main(sys.argv[1:])) meliae-0.4.0/run_tests.py0000644000000000000000000000201011605576210015301 0ustar rootroot00000000000000#!/usr/bin/env python # Copyright (C) 2009 Canonical Ltd # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License version 3 as # published by the Free Software Foundation. # # This program is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . """Run the Meliae test suite.""" import sys def main(args): import optparse p = optparse.OptionParser() p.add_option('--verbose', '-v', action='store_true', help='run verbose tests') (opts, args) = p.parse_args(args) import meliae.tests return not meliae.tests.run_suite(verbose=opts.verbose) if __name__ == '__main__': sys.exit(main(sys.argv[1:])) meliae-0.4.0/setup.py0000644000000000000000000000610011605576210014417 0ustar rootroot00000000000000#!/usr/bin/env python # Copyright (C) 2009 Canonical Ltd # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License version 3 as # published by the Free Software Foundation. # # This program is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . def config(): import meliae ext = [] kwargs = { "name": "meliae", "version": meliae.__version__, "description": "Python Memory Usage Analyzer", "author": "John Arbash Meinel", "author_email": "john.meinel@canonical.com", "url": "https://launchpad.net/meliae", "license": "GNU GPL v3", "download_url": "https://launchpad.net/meliae/+download", "packages": ["meliae"], "scripts": ["strip_duplicates.py"], "ext_modules": ext, "classifiers": [ 'Development Status :: 4 - Beta', 'Environment :: Console', 'Intended Audience :: Developers', 'License :: OSI Approved :: GNU General Public License (GPL)', 'Operating System :: OS Independent', 'Operating System :: Microsoft :: Windows', 'Operating System :: POSIX', 'Programming Language :: Python', 'Programming Language :: Cython', 'Topic :: Software Development :: Debuggers', ], "long_description": """\ This project is similar to heapy (in the 'guppy' project), in its attempt to understand how memory has been allocated. Currently, its main difference is that it splits the task of computing summary statistics, etc of memory consumption from the actual scanning of memory consumption. It does this, because I often want to figure out what is going on in my process, while my process is consuming huge amounts of memory (1GB, etc). It also allows dramatically simplifying the scanner, as I don't allocate python objects while trying to analyze python object memory consumption. It will likely grow to include a GUI for browsing the reference graph. For now it is mostly used in the python interpreter. The name is simply a fun word (means Ash-wood Nymph). """ } from distutils.core import setup, Extension try: from Cython.Distutils import build_ext except ImportError: print "We require Cython to be installed." return kwargs["cmdclass"] = {"build_ext": build_ext} ext.append(Extension("meliae._scanner", ["meliae/_scanner.pyx", "meliae/_scanner_core.c"])) ext.append(Extension("meliae._loader", ["meliae/_loader.pyx"])) ext.append(Extension("meliae._intset", ["meliae/_intset.pyx"])) setup(**kwargs) if __name__ == "__main__": config() meliae-0.4.0/strip_duplicates.py0000755000000000000000000000613011605576210016643 0ustar rootroot00000000000000#!/usr/bin/env python # Copyright (C) 2009 Canonical Ltd # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License version 3 as # published by the Free Software Foundation. # # This program is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . """Remove duplicated object information. To be memory efficient, 'scanner.dump_gc_objects()' does not track what objects have already been dumped. This means that it occasionally dumps the same object multiple times. This script just takes 2 files, and filters the incoming one into purely unique lines in the outgoing one. """ import os import re import sys import time from meliae import files def strip_duplicate(infile, outfile, insize=None): from meliae import _intset seen = _intset.IntSet() address_re = re.compile( r'{"address": (?P
\d+)' ) if insize is not None: in_mb = ' / %5.1f MiB' % (insize / 1024. / 1024,) else: in_mb = '' bytes_read = 0 last_bytes = 0 lines_out = 0 tstart = time.time() tlast = time.time() for line_num, line in enumerate(infile): bytes_read += len(line) m = address_re.match(line) if m: address = int(m.group('address')) if address not in seen: outfile.write(line) lines_out += 1 seen.add(address) tnow = time.time() if tnow - tlast > 0.2: tlast = tnow mb_read = bytes_read / 1024. / 1024 tdelta = tnow - tstart sys.stderr.write( 'loading... line %d, %d out, %5.1f%s read in %.1fs\r' % (line_num, lines_out, mb_read, in_mb, tdelta)) def main(args): import optparse p = optparse.OptionParser( '%prog [INFILE [OUTFILE]]') opts, args = p.parse_args(args) if len(args) > 2: sys.stderr.write('We only support 2 filenames, not %d\n' % (len(args),)) return -1 cleanups = [] try: if len(args) == 0: infile = sys.stdin insize = None outfile = sys.stdout else: infile, cleanup = files.open_file(args[0]) if cleanup is not None: cleanups.append(cleanup) if isinstance(infile, file): # pipes are files, but 0 isn't useful. insize = os.fstat(infile.fileno()).st_size or None else: insize = None if len(args) == 1: outfile = sys.stdout else: outfile = open(args[1], 'wb') strip_duplicate(infile, outfile, insize) finally: for cleanup in cleanups: cleanup() if __name__ == '__main__': sys.exit(main(sys.argv[1:])) meliae-0.4.0/track_memory.py0000644000000000000000000000460611605576210015764 0ustar rootroot00000000000000# Copyright (C) 2009, 2010 Canonical Ltd # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License version 3 as # published by the Free Software Foundation. # # This program is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . """A program to spawn a subprocess and track how memory is consumed.""" import subprocess import time def spawn_and_track(opts, args): from meliae import perf_counter timer = perf_counter.perf_counter.get_timer() start = timer() print 'spawning: %s' % (args,) # We have to use shell=False, otherwise we end up tracking the 'cmd.exe' or # 'sh' process, rather than the actual process we care about. p = subprocess.Popen(args, shell=False) mb = 1.0/(1024.0*1024) last = start last_print = 0 mem_secs = 0 while p.poll() is None: now = timer() cur_mem, peak_mem = perf_counter.perf_counter.get_memory(p) if cur_mem is None or peak_mem is None: p.wait() break mem_secs += cur_mem * (now - last) last = now time.sleep(opts.sleep_time) if now - last_print > 3: print '%8.3fs %6.1fMB %6.1fMB %8.1fMB*s \r' % ( now - start, cur_mem*mb, peak_mem*mb, mem_secs*mb), last_print = now print '%8.3fs %6.1fMB %6.1fMB %8.1fMB*s ' % (now - start, cur_mem*mb, peak_mem*mb, mem_secs*mb) def main(args): import optparse p = optparse.OptionParser('%prog [local opts] command [opts]') p.add_option('--trace-file', type=str, default=None, help='Save the memory usage information to this file') p.add_option('--sleep-time', type=float, default=0.1, help='Check the status after this many seconds.') # All options after the first 'command' are passed to the command, so don't # try to process them ourselves. p.disable_interspersed_args() opts, args = p.parse_args(args) spawn_and_track(opts, args) if __name__ == '__main__': import sys sys.exit(main(sys.argv[1:])) meliae-0.4.0/meliae/__init__.py0000644000000000000000000000140711605576210016257 0ustar rootroot00000000000000# Copyright (C) 2009, 2010 Canonical Ltd # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License version 3 as # published by the Free Software Foundation. # # This program is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . """A simple way to dump memory consumption of a running python program.""" version_info = (0, 4, 0, 'final', 0) __version__ = '.'.join(map(str, version_info)) meliae-0.4.0/meliae/_intset.pyx0000644000000000000000000002352411605576210016361 0ustar rootroot00000000000000# Copyright (C) 2009, 2010 Canonical Ltd # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License version 3 as # published by the Free Software Foundation. # # This program is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . """A structure for handling a set of pure integers. (Such as a set of python object ids.) """ cdef extern from *: ctypedef unsigned long size_t void *malloc(size_t) void *realloc(void *, size_t) void free(void *) void memset(void *, int, size_t) ctypedef Py_ssize_t int_type cdef int_type _singleton1, _singleton2 # _singleton1 is the 'no value present' value # _singleton2 is the 'value deleted' value, which has us keep searching # collisions after the fact _singleton1 = 0; _singleton2 = -1; cdef class IntSet: """Keep a set of integer objects. This is slightly more efficient than a PySet because we don't allow arbitrary types. """ cdef Py_ssize_t _count cdef Py_ssize_t _mask cdef int_type *_array cdef readonly int _has_singleton def __init__(self, values=None): self._count = 0 self._mask = 0 self._array = NULL # This is a separate bit mask of singletons that we have seen # There are 2, the one which indicates no value, and the one that # indicates 'dummy', aka removed value self._has_singleton = 0 if values: for value in values: self._add(value) def __dealloc__(self): if self._array != NULL: free(self._array) def __len__(self): return self._count def __sizeof__(self): # PyType *, RefCount, + known attributes my_size = sizeof(IntSet) if self._array != NULL: my_size += (sizeof(int_type) * (self._mask + 1)) return my_size def _peek_array(self): cdef Py_ssize_t i, size if self._array == NULL: return None result = [] size = self._mask + 1 for i from 0 <= i < size: result.append(self._array[i]) return result cdef int_type *_lookup(self, int_type c_val) except NULL: """Taken from the set() algorithm.""" cdef size_t offset, perturb cdef int_type *entry, *freeslot if self._array == NULL: raise RuntimeError('cannot _lookup without _array allocated.') offset = c_val & self._mask entry = self._array + offset if entry[0] == c_val or entry[0] == _singleton1: return entry if entry[0] == _singleton2: freeslot = entry else: freeslot = NULL perturb = c_val while True: offset = (offset << 2) + offset + perturb + 1 entry = self._array + (offset & self._mask) if (entry[0] == _singleton1): # We converged on an empty slot, without finding entry == c_val # If we previously found a freeslot, return it, else return # this entry if freeslot == NULL: return entry else: return freeslot elif (entry[0] == c_val): # Exact match return entry elif (entry[0] == _singleton2 and freeslot == NULL): # We found the first match that was a 'dummy' entry freeslot = entry perturb = perturb >> 5 # PERTURB_SHIFT def __contains__(self, val): cdef int_type i_val i_val = val return self._contains(i_val) cdef object _contains(self, int_type c_val): cdef int_type *entry if c_val == _singleton1: if self._has_singleton & 0x01: return True else: return False elif c_val == _singleton2: if self._has_singleton & 0x02: return True else: return False if self._array == NULL: return False entry = self._lookup(c_val) if entry[0] == c_val: return True return False cdef int _grow(self) except -1: cdef int i cdef Py_ssize_t old_mask, old_size, new_size, old_count cdef int_type *old_array, val old_mask = self._mask old_size = old_mask + 1 old_array = self._array old_count = self._count # Current size * 2 if old_array == NULL: # Nothing currently allocated self._mask = 255 self._array = malloc(sizeof(int_type) * 256) memset(self._array, _singleton1, sizeof(int_type) * 256) return 0 new_size = old_size * 2 # Replace 'in place', grow to a new array, and add items back in # Note that if it weren't for collisions, we could actually 'realloc()' # and insert backwards. Since expanding mask means something will only # fit in its old place, or the 2<<1 greater. self._array = malloc(sizeof(int_type) * new_size) memset(self._array, _singleton1, sizeof(int_type) * new_size) self._mask = new_size - 1 self._count = 0 if self._has_singleton & 0x01: self._count = self._count + 1 if self._has_singleton & 0x02: self._count = self._count + 1 for i from 0 <= i < old_size: val = old_array[i] if val != _singleton1 and val != _singleton2: self._add(val) if self._count != old_count: raise RuntimeError('After resizing array from %d => %d' ' the count of items changed %d => %d' % (old_size, new_size, old_count, self._count)) free(old_array) cdef int _add(self, int_type c_val) except -1: cdef int_type *entry if c_val == _singleton1: if self._has_singleton & 0x01: return 0 self._has_singleton = self._has_singleton | 0x01 self._count = self._count + 1 return 1 elif c_val == _singleton2: if self._has_singleton & 0x02: # Already had it, no-op return 0 self._has_singleton = self._has_singleton | 0x02 self._count = self._count + 1 return 1 if self._array == NULL or self._count * 4 > self._mask: self._grow() entry = self._lookup(c_val) if entry[0] == c_val: # We already had it, no-op return 0 if entry[0] == _singleton1 or entry[0] == _singleton2: # No value stored at this location entry[0] = c_val self._count = self._count + 1 return 1 raise RuntimeError("Calling self._lookup(%x) returned %x. However" " that is not the value or one of the singletons." % (c_val, entry[0])) def add(self, val): """Add a new entry to the set.""" self._add(val) cdef class IDSet(IntSet): """Track a set of object ids (addresses). This only differs from IntSet in how the integers are hashed. Object addresses tend to be aligned on 16-byte boundaries (occasionally 8-byte, and even more rarely on 4-byte), as such the standard hash lookup has more collisions than desired. Also, addresses are considered to be unsigned longs by python, but Py_ssize_t is a signed long. Just treating it normally causes us to get a value overflow on 32-bits if the highest bit is set. """ def add(self, val): cdef unsigned long ul_val ul_val = val self._add((ul_val)) def __contains__(self, val): cdef unsigned long ul_val ul_val = val return self._contains((ul_val)) # TODO: Consider that the code would probably be simpler if we just # bit-shifted before passing the value to self._add and self._contains, # rather than re-implementing _lookup here. cdef int_type *_lookup(self, int_type c_val) except NULL: """Taken from the set() algorithm.""" cdef size_t offset, perturb cdef int_type *entry, *freeslot cdef int_type internal_val if self._array == NULL: raise RuntimeError('cannot _lookup without _array allocated.') # For addresses, we shift the last 4 bits into the beginning of the # value internal_val = ((c_val & 0xf) << (sizeof(int_type)*8 - 4)) internal_val = internal_val | (c_val >> 4) offset = internal_val & self._mask entry = self._array + offset if entry[0] == c_val or entry[0] == _singleton1: return entry if entry[0] == _singleton2: freeslot = entry else: freeslot = NULL perturb = c_val while True: offset = (offset << 2) + offset + perturb + 1 entry = self._array + (offset & self._mask) if (entry[0] == _singleton1): # We converged on an empty slot, without finding entry == c_val # If we previously found a freeslot, return it, else return # this entry if freeslot == NULL: return entry else: return freeslot elif (entry[0] == c_val): # Exact match return entry elif (entry[0] == _singleton2 and freeslot == NULL): # We found the first match that was a 'dummy' entry freeslot = entry perturb = perturb >> 5 # PERTURB_SHIFT meliae-0.4.0/meliae/_loader.pyx0000644000000000000000000012030211605576210016311 0ustar rootroot00000000000000# Copyright (C) 2009, 2010 Canonical Ltd # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License version 3 as # published by the Free Software Foundation. # # This program is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . """Routines and objects for loading dump files.""" cdef extern from "Python.h": ctypedef unsigned long size_t ctypedef struct PyObject: pass PyObject *Py_None void *PyMem_Malloc(size_t) void PyMem_Free(void *) ctypedef int (*visitproc)(PyObject *, void *) ctypedef int (*traverseproc)(PyObject *, visitproc, void *) ctypedef struct PyTypeObject: # hashfunc tp_hash # richcmpfunc tp_richcompare traverseproc tp_traverse long PyObject_Hash(PyObject *) except -1 object PyList_New(Py_ssize_t) void PyList_SET_ITEM(object, Py_ssize_t, object) PyObject *PyDict_GetItem(object d, object key) PyObject *PyDict_GetItem_ptr "PyDict_GetItem" (object d, PyObject *key) int PyDict_SetItem(object d, object key, object val) except -1 int PyDict_SetItem_ptr "PyDict_SetItem" (object d, PyObject *key, PyObject *val) except -1 void Py_INCREF(PyObject*) void Py_XDECREF(PyObject*) void Py_DECREF(PyObject*) object PyTuple_New(Py_ssize_t) object PyTuple_SET_ITEM(object, Py_ssize_t, object) int PyObject_RichCompareBool(PyObject *, PyObject *, int) except -1 int Py_EQ void memset(void *, int, size_t) # void fprintf(void *, char *, ...) # void *stderr import gc from meliae import warn ctypedef struct RefList: long size PyObject *refs[0] cdef Py_ssize_t sizeof_RefList(RefList *val): """Determine how many bytes for this ref list. val() can be NULL""" if val == NULL: return 0 return sizeof(long) + (sizeof(PyObject *) * val.size) cdef object _set_default(object d, object val): """Either return the value in the dict, or return 'val'. This is the same as Dict.setdefault, but without the attribute lookups. (It may be slightly worse because it does 2 dict lookups?) """ cdef PyObject *tmp # TODO: Note that using _lookup directly would remove the need to ever do a # double-lookup to set the value tmp = PyDict_GetItem(d, val) if tmp == NULL: PyDict_SetItem(d, val, val) else: val = tmp return val cdef int _set_default_ptr(object d, PyObject **val) except -1: """Similar to _set_default, only it sets the val in place""" cdef PyObject *tmp tmp = PyDict_GetItem_ptr(d, val[0]) if tmp == NULL: # val is unchanged, so we don't change the refcounts PyDict_SetItem_ptr(d, val[0], val[0]) return 0 else: # We will be pointing val to something new, so fix up the refcounts Py_INCREF(tmp) Py_DECREF(val[0]) val[0] = tmp return 1 cdef int _free_ref_list(RefList *ref_list) except -1: """Decref and free the list.""" cdef long i if ref_list == NULL: return 0 for i from 0 <= i < ref_list.size: if ref_list.refs[i] == NULL: raise RuntimeError('Somehow we got a NULL reference.') Py_DECREF(ref_list.refs[i]) PyMem_Free(ref_list) return 1 cdef object _ref_list_to_list(RefList *ref_list): """Convert the notation of [len, items, ...] into [items]. :param ref_list: A pointer to NULL, or to a list of longs. The list should start with the count of items """ cdef long i # TODO: Always return a tuple, we already know the width, and this prevents # double malloc(). However, this probably isn't a critical code path if ref_list == NULL: return () refs = [] refs_append = refs.append for i from 0 <= i < ref_list.size: refs_append((ref_list.refs[i])) return refs cdef RefList *_list_to_ref_list(object refs) except? NULL: cdef long i, num_refs cdef RefList *ref_list num_refs = len(refs) if num_refs == 0: return NULL ref_list = PyMem_Malloc(sizeof(RefList) + sizeof(PyObject*)*num_refs) ref_list.size = num_refs i = 0 for ref in refs: ref_list.refs[i] = ref Py_INCREF(ref_list.refs[i]) i = i + 1 return ref_list cdef object _format_list(RefList *ref_list): cdef long i, max_refs if ref_list == NULL: return '' max_refs = ref_list.size if max_refs > 10: max_refs = 10 ref_str = ['['] for i from 0 <= i < max_refs: if i == 0: ref_str.append('%d' % (ref_list.refs[i])) else: ref_str.append(', %d' % (ref_list.refs[i])) if ref_list.size > 10: ref_str.append(', ...]') else: ref_str.append(']') return ''.join(ref_str) cdef struct _MemObject: # """The raw C structure, used to minimize memory allocation size.""" PyObject *address PyObject *type_str # Consider making this unsigned long long size RefList *child_list # Removed for now, since it hasn't proven useful # int length PyObject *value # TODO: I don't think I've found an object that has both a value and a # name. As such, I should probably remove the redundancy, as it saves # a pointer # PyObject *name RefList *parent_list unsigned long total_size # This is an uncounted ref to a _MemObjectProxy. _MemObjectProxy also has a # reference to this object, so when it disappears it can set the reference # to NULL. PyObject *proxy cdef _MemObject *_new_mem_object(address, type_str, size, children, value, name, parent_list, total_size) except NULL: cdef _MemObject *new_entry cdef PyObject *addr new_entry = <_MemObject *>PyMem_Malloc(sizeof(_MemObject)) if new_entry == NULL: raise MemoryError('Failed to allocate %d bytes' % (sizeof(_MemObject),)) memset(new_entry, 0, sizeof(_MemObject)) addr = address Py_INCREF(addr) new_entry.address = addr new_entry.type_str = type_str Py_INCREF(new_entry.type_str) new_entry.size = size new_entry.child_list = _list_to_ref_list(children) # TODO: Was found wanting and removed # if length is None: # new_entry.length = -1 # else: # new_entry.length = length if value is not None and name is not None: raise RuntimeError("We currently only support one of value or name" " per object.") if value is not None: new_entry.value = value else: new_entry.value = name Py_INCREF(new_entry.value) new_entry.parent_list = _list_to_ref_list(parent_list) new_entry.total_size = total_size return new_entry cdef int _free_mem_object(_MemObject *cur) except -1: if cur == NULL: # Already cleared return 0 if cur == _dummy: return 0 if cur.address == NULL: raise RuntimeError('clering something that doesn\'t have address') Py_XDECREF(cur.address) cur.address = NULL Py_XDECREF(cur.type_str) cur.type_str = NULL _free_ref_list(cur.child_list) cur.child_list = NULL Py_XDECREF(cur.value) cur.value = NULL # Py_XDECREF(cur.name) # cur.name = NULL _free_ref_list(cur.parent_list) cur.parent_list = NULL cur.proxy = NULL PyMem_Free(cur) return 1 cdef _MemObject *_dummy _dummy = <_MemObject*>(-1) cdef class MemObjectCollection cdef class _MemObjectProxy def _MemObjectProxy_from_args(address, type_str, size, children=(), length=0, value=None, name=None, parent_list=(), total_size=0): """Create a standalone _MemObjectProxy instance. Note that things like '__getitem__' won't work, as they query the collection for the actual data. """ cdef _MemObject *new_entry cdef _MemObjectProxy proxy new_entry = _new_mem_object(address, type_str, size, children, value, name, parent_list, total_size) proxy = _MemObjectProxy(None) proxy._obj = new_entry proxy._managed_obj = new_entry new_entry.proxy = proxy return proxy cdef class _MemObjectProxy: """The standard interface for understanding memory consumption. MemObjectCollection stores the data as a fairly efficient table, without the overhead of having a regular python object for every data point. However, the rest of python code needs to interact with a real python object, so we generate these on-the-fly. Most attributes are properties, which thunk over to the actual data table entry. :ivar address: The address in memory of the original object. This is used as the 'handle' to this object. :ivar type_str: The type of this object :ivar size: The number of bytes consumed for just this object. So for a dict, this would be the basic_size + the size of the allocated array to store the reference pointers :ivar children: A list of items referenced from this object :ivar num_refs: Count of references, you can also use len() :ivar value: A PyObject representing the Value for this object. (For strings, it is the first 100 bytes, it may be None if we have no value, or it may be an integer, etc.) This is also where the 'name' is stored for objects like 'module'. :ivar """ cdef MemObjectCollection collection # This must be set immediately after construction, before accessing any # member vars cdef _MemObject *_obj # If not NULL, this will be freed when this object is deallocated cdef _MemObject *_managed_obj def __init__(self, collection): self.collection = collection self._obj = NULL self._managed_obj = NULL def __dealloc__(self): if self._obj != NULL: if self._obj.proxy == self: # This object is going away, remove the reference self._obj.proxy = NULL # else: # fprintf(stderr, "obj at address %x referenced" # " a proxy that was not self\n", # self._obj.address) if self._managed_obj != NULL: _free_mem_object(self._managed_obj) self._managed_obj = NULL def __sizeof__(self): my_size = sizeof(_MemObjectProxy) if self._managed_obj != NULL: my_size += sizeof(_MemObject) # XXX: Note that to get the memory dump correct for all this stuff, # We need to walk all the RefList objects and get their size, and # tp_traverse should be walking to all of those referenced # integers, etc. return my_size property address: """The identifier for the tracked object.""" def __get__(self): return (self._obj.address) property type_str: """The type of this object.""" def __get__(self): return (self._obj.type_str) def __set__(self, value): cdef PyObject *ptr ptr = value Py_INCREF(ptr) Py_DECREF(self._obj.type_str) self._obj.type_str = ptr property size: """The number of bytes allocated for this object.""" def __get__(self): return self._obj.size def __set__(self, value): self._obj.size = value # property name: # """Name associated with this object.""" # def __get__(self): # return self._obj.name property value: """Value for this object (for strings and ints)""" def __get__(self): return self._obj.value def __set__(self, value): cdef PyObject *new_val new_val = value # INCREF first, just in case value is self._obj.value Py_INCREF(new_val) Py_DECREF(self._obj.value) self._obj.value = new_val property total_size: """Mean to hold the size of this plus size of all referenced objects.""" def __get__(self): return self._obj.total_size def __set__(self, value): self._obj.total_size = value def __len__(self): if self._obj.child_list == NULL: return 0 return self._obj.child_list.size property num_refs: def __get__(self): warn.deprecated('Attribute .num_refs deprecated.' ' Use len() instead.') return self.__len__() def _intern_from_cache(self, cache): cdef long i _set_default_ptr(cache, &self._obj.address) _set_default_ptr(cache, &self._obj.type_str) if self._obj.child_list != NULL: for i from 0 <= i < self._obj.child_list.size: _set_default_ptr(cache, &self._obj.child_list.refs[i]) if self._obj.parent_list != NULL: for i from 0 <= i < self._obj.parent_list.size: _set_default_ptr(cache, &self._obj.parent_list.refs[i]) property children: """The list of objects referenced by this object.""" def __get__(self): return _ref_list_to_list(self._obj.child_list) def __set__(self, value): _free_ref_list(self._obj.child_list) self._obj.child_list = _list_to_ref_list(value) property ref_list: """The list of objects referenced by this object. Deprecated, use .children instead. """ def __get__(self): warn.deprecated('Attribute .ref_list deprecated.' ' Use .children instead.') return self.children def __set__(self, val): warn.deprecated('Attribute .ref_list deprecated.' ' Use .children instead.') self.children = val property referrers: """Objects which refer to this object. Deprecated, use .parents instead. """ def __get__(self): warn.deprecated('Attribute .referrers deprecated.' ' Use .parents instead.') return self.parents def __set__(self, value): warn.deprecated('Attribute .referrers deprecated.' ' Use .parents instead.') self.parents = value property parents: """The list of objects that reference this object. Original set to None, can be computed on demand. """ def __get__(self): return _ref_list_to_list(self._obj.parent_list) def __set__(self, value): _free_ref_list(self._obj.parent_list) self._obj.parent_list = _list_to_ref_list(value) property num_referrers: """The length of the parents list.""" def __get__(self): warn.deprecated('Attribute .num_referrers deprecated.' ' Use .num_parents instead.') if self._obj.parent_list == NULL: return 0 return self._obj.parent_list.size property num_parents: """The length of the parents list.""" def __get__(self): if self._obj.parent_list == NULL: return 0 return self._obj.parent_list.size def __getitem__(self, offset): cdef long off if self._obj.child_list == NULL: raise IndexError('%s has no references' % (self,)) off = offset if off >= self._obj.child_list.size: raise IndexError('%s has only %d (not %d) references' % (self, self._obj.child_list.size, offset)) address = self._obj.child_list.refs[off] try: return self.collection[address] except KeyError: # TODO: What to do if the object isn't present? I think returning a # 'no-such-object' proxy would be nicer than returning nothing raise property c: """The list of children objects as objects (not references).""" def __get__(self): cdef long pos result = [] if self._obj.child_list == NULL: return result for pos from 0 <= pos < self._obj.child_list.size: address = self._obj.child_list.refs[pos] obj = self.collection[address] result.append(obj) return result property p: """The list of parent objects as objects (not references).""" def __get__(self): cdef long pos result = [] if self._obj.parent_list == NULL: return result for pos from 0 <= pos < self._obj.parent_list.size: address = self._obj.parent_list.refs[pos] try: obj = self.collection[address] except KeyError: # We should probably create an "unknown object" type raise result.append(obj) return result def __repr__(self): if self._obj.child_list == NULL: refs = '' else: refs = ' %drefs' % (self._obj.child_list.size,) if self._obj.parent_list == NULL: parent_str = '' else: parent_str = ' %dpar' % (self._obj.parent_list.size,) if self._obj.value == NULL or self._obj.value == Py_None: val = '' else: val = ' %r' % (self._obj.value,) if self._obj.total_size == 0: total_size_str = '' else: total_size = float(self._obj.total_size) order = 'B' if total_size > 800.0: total_size = total_size / 1024.0 order = 'K' if total_size > 800.0: total_size = total_size / 1024.0 order = 'M' if total_size > 800.0: total_size = total_size / 1024.0 order = 'G' total_size_str = ' %.1f%stot' % (total_size, order) return '%s(%d %dB%s%s%s%s)' % ( self.type_str, self.address, self.size, refs, parent_str, val, total_size_str) def to_json(self): """Convert this back into json.""" refs = [] for ref in sorted(self.children): refs.append(str(ref)) # Note: We've lost the info about whether this was a value or a name # We've also lost the 'length' field. if self.value is not None: if self.type_str == 'int': value = '"value": %s, ' % self.value else: # TODO: This isn't perfect, as it doesn't do proper json # escaping value = '"value": "%s", ' % self.value else: value = '' return '{"address": %d, "type": "%s", "size": %d, %s"refs": [%s]}' % ( self.address, self.type_str, self.size, value, ', '.join(refs)) def refs_as_dict(self): """Expand the ref list considering it to be a 'dict' structure. Often we have dicts that point to simple strings and ints, etc. This tries to expand that as much as possible. """ as_dict = {} children = self.children if len(children) % 2 == 1 and self.type_str not in ('dict', 'module'): # Instance dicts end with a 'type' reference, but only do that if # we actually have an odd number children = children[:-1] for idx in xrange(0, len(children), 2): key = self.collection[children[idx]] val = self.collection[children[idx+1]] if key.value is not None: key = key.value # TODO: We should consider recursing if val is a 'known' type, such # a tuple/dict/etc if val.type_str == 'bool': val = (val.value == 'True') elif val.type_str in ('int', 'long', 'str', 'unicode', 'float', ) and val.value is not None: val = val.value elif val.type_str == 'NoneType': val = None as_dict[key] = val return as_dict def iter_recursive_refs(self, excluding=None): """Find all objects referenced from this one (including self). Self will always be the first object returned, in case you want to exclude it (though it can be excluded in the excluding list). This is done because it is cumbersome to add it back in, but easy to exclude. :param excluding: This can be any iterable of addresses. We will not walk to anything in this list (including self). :return: Iterator over all objects that can be reached. """ cdef _MOPReferencedIterator iterator iterator = _MOPReferencedIterator(self, excluding) return iterator def compute_total_size(self, excluding=None): """Compute the number of bytes of this and all referenced objects. This layers on top of iter_recursive_refs to give just the interesting bits. :return: total_size, this will also be set on self. """ cdef _MOPReferencedIterator iterator cdef _MemObjectProxy item cdef unsigned long total_size total_size = 0 iterator = self.iter_recursive_refs(excluding=excluding) for item in iterator: total_size += item._obj.size self._obj.total_size = total_size return self.total_size def all(self, type_str, excluding=None): """Retrieve a list of all the referenced items matching type_str. :param type_str: Children must match this type string. :param excluding: See iter_recursive_refs :return: A list of all entries, sorted with the largest entries first. """ cdef list all all = [] for item in self.iter_recursive_refs(excluding=excluding): if item.type_str == type_str: all.append(item) all.sort(key=_all_sort_key, reverse=True) return all def _all_sort_key(proxy_obj): return (proxy_obj.size, len(proxy_obj), proxy_obj.num_parents) cdef class MemObjectCollection: """Track a bunch of _MemObject instances.""" cdef readonly int _table_mask # N slots = table_mask + 1 cdef readonly int _active # How many slots have real data cdef readonly int _filled # How many slots have real or dummy cdef _MemObject** _table # _MemObjects are stored inline def __init__(self): self._table_mask = 1024 - 1 self._table = <_MemObject**>PyMem_Malloc(sizeof(_MemObject*)*1024) memset(self._table, 0, sizeof(_MemObject*)*1024) def __len__(self): return self._active def __sizeof__(self): cdef int i cdef _MemObject *cur cdef long my_size my_size = (sizeof(MemObjectCollection) + (sizeof(_MemObject**) * (self._table_mask + 1)) + (sizeof(_MemObject) * self._active)) for i from 0 <= i <= self._table_mask: cur = self._table[i] if cur != NULL and cur != _dummy: my_size += (sizeof_RefList(cur.child_list) + sizeof_RefList(cur.parent_list)) return my_size cdef _MemObject** _lookup(self, address) except NULL: cdef long the_hash cdef size_t i, n_lookup cdef long mask cdef _MemObject **table, **slot, **free_slot cdef PyObject *py_addr py_addr = address the_hash = PyObject_Hash(py_addr) i = the_hash mask = self._table_mask table = self._table free_slot = NULL for n_lookup from 0 <= n_lookup <= mask: # Don't loop forever slot = &table[i & mask] if slot[0] == NULL: # Found a blank spot if free_slot != NULL: # Did we find an earlier _dummy entry? return free_slot else: return slot elif slot[0] == _dummy: if free_slot == NULL: free_slot = slot elif slot[0].address == py_addr: # Found an exact pointer to the key return slot elif slot[0].address == NULL: raise RuntimeError('Found a non-empty slot with null address') elif PyObject_RichCompareBool(slot[0].address, py_addr, Py_EQ): # Both py_key and cur belong in this slot, return it return slot i = i + 1 + n_lookup raise RuntimeError('we failed to find an open slot after %d lookups' % (n_lookup)) cdef int _clear_slot(self, _MemObject **slot) except -1: _free_mem_object(slot[0]) slot[0] = NULL return 1 def _test_lookup(self, address): cdef _MemObject **slot slot = self._lookup(address) return (slot - self._table) def __contains__(self, address): cdef _MemObject **slot slot = self._lookup(address) if slot[0] == NULL or slot[0] == _dummy: return False return True cdef _MemObjectProxy _proxy_for(self, address, _MemObject *val): cdef _MemObjectProxy proxy if val.proxy == NULL: proxy = _MemObjectProxy(self) proxy._obj = val val.proxy = proxy else: proxy = val.proxy return proxy def __getitem__(self, at): cdef _MemObject **slot cdef _MemObjectProxy proxy if isinstance(at, _MemObjectProxy): address = at.address proxy = at else: address = at proxy = None slot = self._lookup(address) if slot[0] == NULL or slot[0] == _dummy: raise KeyError('address %s not present' % (at,)) if proxy is None: proxy = self._proxy_for(address, slot[0]) else: assert proxy._obj == slot[0] return proxy def get(self, at, default=None): try: return self[at] except KeyError: return default def __delitem__(self, at): cdef _MemObject **slot cdef _MemObjectProxy proxy if isinstance(at, _MemObjectProxy): address = at.address else: address = at slot = self._lookup(address) if slot[0] == NULL or slot[0] == _dummy: raise KeyError('address %s not present' % (at,)) if slot[0].proxy != NULL: # Have the proxy take over the memory lifetime. At the same time, # we break the reference cycle, so that the proxy will get cleaned # up properly proxy = slot[0].proxy proxy._managed_obj = proxy._obj else: # Without a proxy, we just nuke the object self._clear_slot(slot) slot[0] = _dummy self._active -= 1 # TODO: Shrink #def __setitem__(self, address, value): # """moc[address] = value""" # pass cdef int _insert_clean(self, _MemObject *entry) except -1: """Copy _MemObject into the table. We know that this _MemObject is unique, and we know that self._table contains no _dummy entries. So we can do the lookup cheaply, without any equality checks, etc. """ cdef long the_hash cdef size_t i, n_lookup, mask cdef _MemObject **slot assert entry != NULL and entry.address != NULL mask = self._table_mask the_hash = PyObject_Hash(entry.address) i = the_hash for n_lookup from 0 <= n_lookup < mask: slot = &self._table[i & mask] if slot[0] == NULL: slot[0] = entry self._filled += 1 self._active += 1 return 1 i = i + 1 + n_lookup raise RuntimeError('could not find a free slot after %d lookups' % (n_lookup,)) cdef int _resize(self, int min_active) except -1: """Resize the internal table. We will be big enough to hold at least 'min_active' entries. We will create a copy of all data, leaving out dummy entries. :return: The new table size. """ cdef int new_size, remaining cdef size_t n_bytes cdef _MemObject **old_table, **old_slot, **new_table new_size = 1024 while new_size <= min_active and new_size > 0: new_size <<= 1 if new_size <= 0: raise MemoryError('table size too large for %d entries' % (min_active,)) n_bytes = sizeof(_MemObject*)*new_size new_table = <_MemObject**>PyMem_Malloc(n_bytes) if new_table == NULL: raise MemoryError('Failed to allocate %d bytes' % (n_bytes,)) memset(new_table, 0, n_bytes) old_slot = old_table = self._table self._table = new_table self._table_mask = new_size - 1 remaining = self._active self._filled = 0 self._active = 0 while remaining > 0: if old_slot[0] == NULL: pass # empty elif old_slot[0] == _dummy: pass # dummy else: remaining -= 1 self._insert_clean(old_slot[0]) old_slot += 1 # Moving everything over is refcount neutral, so we just free the old # table PyMem_Free(old_table) return new_size def add(self, address, type_str, size, children=(), length=0, value=None, name=None, parent_list=(), total_size=0): """Add a new MemObject to this collection.""" cdef _MemObject **slot, *new_entry cdef _MemObjectProxy proxy slot = self._lookup(address) if slot[0] != NULL and slot[0] != _dummy: # We are overwriting an existing entry, for now, fail # Probably all we have to do is clear the slot first, then continue assert False, "We don't support overwrite yet." # TODO: These are fairy small and more subject to churn, maybe we # should be using PyObj_Malloc instead... new_entry = _new_mem_object(address, type_str, size, children, value, name, parent_list, total_size) if slot[0] == NULL: self._filled += 1 self._active += 1 slot[0] = new_entry if self._filled * 3 > (self._table_mask + 1) * 2: # We need to grow self._resize(self._active * 2) proxy = self._proxy_for(address, new_entry) return proxy def __dealloc__(self): cdef long i for i from 0 <= i < self._table_mask: self._clear_slot(self._table + i) PyMem_Free(self._table) self._table = NULL def __iter__(self): return self.iterkeys() def iterkeys(self): return iter(self.keys()) def keys(self): cdef long i cdef _MemObject *cur cdef _MemObjectProxy proxy # TODO: Pre-allocate the full size list values = [] for i from 0 <= i < self._table_mask: cur = self._table[i] if cur == NULL or cur == _dummy: continue else: address = cur.address values.append(address) return values def iteritems(self): return self.items() def items(self): """Iterate over (key, value) tuples.""" cdef long i, out_idx cdef _MemObject *cur cdef _MemObjectProxy proxy enabled = gc.isenabled() if enabled: # We are going to be creating a lot of objects here, but not with # cycles, so we disable gc temporarily # With an object list of ~3M items, this drops the .items() time # from 25s down to 1.3s gc.disable() try: values = PyList_New(self._active) out_idx = 0 for i from 0 <= i < self._table_mask: cur = self._table[i] if cur == NULL or cur == _dummy: continue else: address = cur.address proxy = self._proxy_for(address, cur) item = (address, proxy) # SET_ITEM steals a reference Py_INCREF(item) PyList_SET_ITEM(values, out_idx, item) out_idx += 1 finally: if enabled: gc.enable() return values def itervalues(self): """Return an iterable of values stored in this map.""" return _MOCValueIterator(self) def values(self): # This returns a list, but that is 'close enough' for what we need cdef long i cdef _MemObject *cur cdef _MemObjectProxy proxy values = [] for i from 0 <= i < self._table_mask: cur = self._table[i] if cur == NULL or cur == _dummy: continue else: proxy = self._proxy_for(cur.address, cur) values.append(proxy) return values cdef class _MOCValueIterator: """A simple iterator over the values in a MOC.""" cdef MemObjectCollection collection cdef int initial_active cdef int table_pos def __init__(self, collection): self.collection = collection self.initial_active = self.collection._active self.table_pos = 0 def __iter__(self): return self def __next__(self): cdef _MemObject *cur if self.collection._active != self.initial_active: raise RuntimeError('MemObjectCollection changed size during' ' iteration') while (self.table_pos <= self.collection._table_mask): cur = self.collection._table[self.table_pos] if cur != NULL and cur != _dummy: break self.table_pos += 1 if self.table_pos > self.collection._table_mask: raise StopIteration() # This entry is 'consumed', go on to the next self.table_pos += 1 if cur == NULL or cur == _dummy: raise RuntimeError('didn\'t run off the end, but got null/dummy' ' %d, %d %d' % (cur, self.table_pos, self.collection._table_mask)) return self.collection._proxy_for(cur.address, cur) cdef class _MOPReferencedIterator: """Iterate over all the children referenced from this object.""" cdef MemObjectCollection collection cdef object seen_addresses cdef list pending_addresses cdef int pending_offset def __init__(self, proxy, excluding=None): cdef _MemObjectProxy c_proxy from meliae import _intset c_proxy = proxy self.collection = c_proxy.collection if excluding is not None: self.seen_addresses = _intset.IDSet(excluding) else: self.seen_addresses = _intset.IDSet() self.pending_addresses = [c_proxy.address] self.pending_offset = 0 def __iter__(self): return self def __next__(self): while self.pending_offset >= 0: next_address = self.pending_addresses[self.pending_offset] self.pending_offset -= 1 # Avoid letting the pending addresses queue remain at its largest # size forever. If it has more that 50% waste, and it is 'big', # shrink it. Leave it a little room to grow, though. if (self.pending_offset > 50 and len(self.pending_addresses) > 2*self.pending_offset): self.pending_addresses = self.pending_addresses[ :self.pending_offset+10] if next_address in self.seen_addresses: continue self.seen_addresses.add(next_address) next_proxy = self.collection.get(next_address) if next_proxy is None: continue # Queue up the children of this object for c in next_proxy.children: if c in self.seen_addresses: continue self.pending_offset += 1 if self.pending_offset >= len(self.pending_addresses): self.pending_addresses.append(c) else: self.pending_addresses[self.pending_offset] = c return next_proxy # if we got this far, then we don't have anything left: raise StopIteration() cdef int RefList_traverse(RefList *self, visitproc visit, void *arg): """Equivalent of tp_traverse for a RefList. RefList isn't a fully fledged python object/type, but since it can hold counted references to python objects, we want the containing objects to be able to tp_traverse to them. """ cdef int ret cdef long i ret = 0 if self == NULL: return ret for i from 0 <= i < self.size: ret = visit(self.refs[i], arg) if ret: return ret return ret cdef int _MemObject_traverse(_MemObject *self, visitproc visit, void *arg): """Equivalent idea of tp_traverse for _MemObject. _MemObject isn't a fully fledged python object, but it can refer to python objects. So we create a traverse function for it, so that gc and Meliae itself can find the referenced objects. """ cdef int ret ret = 0 if self == NULL: return ret if ret == 0 and self.address != NULL: ret = visit(self.address, arg) if ret == 0 and self.type_str != NULL: ret = visit(self.type_str, arg) if ret == 0 and self.value != NULL: ret = visit(self.value, arg) if ret == 0: # RefList_traverse handles the NULL case ret = RefList_traverse(self.child_list, visit, arg) if ret == 0: ret = RefList_traverse(self.parent_list, visit, arg) # Note: we *don't* incref the proxy because we know it links back to us. So # we don't tp_traverse to it, because we don't want gc thinking it # has enough references to destroy the object. return ret cdef int _MemObjectProxy_traverse(_MemObjectProxy self, visitproc visit, void *arg) except -1: """Implement a correct tp_traverse because we use hidden members. Cython/Pyrex implement a tp_traverse, but it only handles 'object' members. We use some private pointers to manage things, so we need a custom tp_traverse to let everyone know about it. """ # The specific detail is that _MemObjectProxy can sometimes control the # reference to a _MemObject (if it was removed from a collection while the # proxy was still alive). And that _MemObject structure can hold onto # references to other real python objects. cdef int ret ret = 0 ret = visit(self.collection, arg) if ret == 0 and self._managed_obj != NULL: ret = _MemObject_traverse(self._managed_obj, visit, arg) return ret (_MemObjectProxy).tp_traverse = _MemObjectProxy_traverse cdef int MemObjectCollection_traverse(MemObjectCollection self, visitproc visit, void *arg) except -1: """Implement a correct tp_traverse because we use hidden members. Cython/Pyrex implement a tp_traverse, but it only handles 'object' members. We use some private pointers to manage things, so we need a custom tp_traverse to let everyone know about it. """ cdef int ret cdef int i cdef _MemObject *cur ret = 0 for i from 0 <= i <= self._table_mask: cur = self._table[i] if cur != NULL and cur != _dummy: ret = _MemObject_traverse(cur, visit, arg) if ret: break return ret (MemObjectCollection).tp_traverse = MemObjectCollection_traverse meliae-0.4.0/meliae/_scanner.pyx0000644000000000000000000002163311605576210016503 0ustar rootroot00000000000000# Copyright (C) 2009, 2010 Canonical Ltd # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License version 3 as # published by the Free Software Foundation. # # This program is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . """The core routines for scanning python references and dumping memory info.""" cdef extern from "stdio.h": ctypedef long size_t ctypedef struct FILE: pass FILE *stderr size_t fwrite(void *, size_t, size_t, FILE *) size_t fprintf(FILE *, char *, ...) void fflush(FILE *) cdef extern from "Python.h": FILE *PyFile_AsFile(object) int Py_UNICODE_SIZE ctypedef struct PyGC_Head: pass object PyString_FromStringAndSize(char *, Py_ssize_t) cdef extern from "_scanner_core.h": Py_ssize_t _size_of(object c_obj) ctypedef char* const_pchar "const char*" ctypedef void (*write_callback)(void *callee_data, const_pchar bytes, size_t len) void _clear_last_dumped() void _dump_object_info(write_callback write, void *callee_data, object c_obj, object nodump, int recurse) object _get_referents(object c_obj) object _get_special_case_dict() _word_size = sizeof(Py_ssize_t) _gc_head_size = sizeof(PyGC_Head) _unicode_size = Py_UNICODE_SIZE def size_of(obj): """Compute the size of the object. This is the actual malloc() size for this object, so for dicts it is the size of the dict object, plus the size of the array of references, but *not* the size of each individual referenced object. :param obj: The object to measure :return: An integer of the number of bytes used by this object. """ return _size_of(obj) cdef void _file_io_callback(void *callee_data, char *bytes, size_t len): cdef FILE *file_cb file_cb = callee_data fwrite(bytes, 1, len, file_cb) cdef void _callable_callback(void *callee_data, char *bytes, size_t len): callable = callee_data s = PyString_FromStringAndSize(bytes, len) callable(s) def dump_object_info(object out, object obj, object nodump=None, int recurse_depth=1): """Dump the object information to the given output. :param out: Either a File object, or a callable. If a File object, we will write bytes to the underlying FILE* Otherwise, we will call(str) with bytes as we build up the state of the object. Note that a single call will not be a complete description, but potentially a single character of the final formatted string. :param obj: The object to inspect :param nodump: If supplied, this is a set() of objects that we want to exclude from the dump file. :param recurse_depth: 0 to only dump the supplied object 1 to dump the object and immediate neighbors that would not otherwise be referenced (such as strings). 2 dump everything we find and continue recursing """ cdef FILE *fp_out fp_out = PyFile_AsFile(out) if fp_out != NULL: _dump_object_info(_file_io_callback, fp_out, obj, nodump, recurse_depth) fflush(fp_out) else: _dump_object_info(_callable_callback, out, obj, nodump, recurse_depth) _clear_last_dumped() def get_referents(object obj): """Similar to gc.get_referents() The main different is that gc.get_referents() only includes items that are in the garbage collector. However, we want anything referred to by tp_traverse. """ return _get_referents(obj) def add_special_size(object tp_name, object size_of_32, object size_of_64): """Special case a given object size. This is only meant to be used for objects we don't already handle or which don't implement __sizeof__ (those are checked before this check happens). This is meant for things like zlib.Compress which allocates a lot of internal buffers, which are not easily accessible (but can be approximated). The gc header should not be included in this size, it will be added at runtime. Setting the value to None will remove the value. (We only distinguish size_of_32 from size_of_64 for the implementer's benefit, since sizeof() is not generally accessible from Python.) :param tp_name: The type string we care about (such as 'zlib.Compress'). This will be matched against object->type->tp_name. :param size_of_32: Called when _word_size == 32-bits :param size_of_64: Called when _word_size == 64-bits :return: None """ special_dict = _get_special_case_dict() if _word_size == 4: sz = size_of_32 elif _word_size == 8: sz = size_of_64 else: raise RuntimeError('Unknown word size: %s' % (_word_size,)) if sz is None: if tp_name in special_dict: del special_dict[tp_name] else: special_dict[tp_name] = sz def _zlib_size_of_32(zlib_obj): """Return a __sizeof__ for a zlib object.""" cdef Py_ssize_t size t = type(zlib_obj) name = t.__name__ # Size of the zlib 'compobject', (PyObject_HEAD + z_stream, + misc) size = 56 if name.endswith('Decompress'): # _get_referents doesn't track into these attributes, so we just # attribute the size to the object itself. size += _size_of(zlib_obj.unused_data) size += _size_of(zlib_obj.unconsumed_tail) # sizeof(inflate_state) size += 7116 # sizeof(buffers allocated for inflate) # (1 << state->wbits) # However, we don't have access to wbits, so we assume the default (and # largest) of 15 wbits size += (1 << 15) # Empirically 42kB / object during decompression, and this gives 39998 elif name.endswith('Compress'): # compress objects have a reference to unused_data, etc, but it always # points to the empty string. # sizeof(deflate_state) size += 5828 # We don't have access to the stream C attributes, so we assume the # standard values and go with it # Pos == unsigned short # Byte == unsigned char # w_size = 1 << s->w_bits, default 15 => (1<<15) # memLevel default is 8 (maybe 9?) # s->w_size * 2*sizeof(Byte) = (1<<15) * 2 * 1 = 65536 size += 65536 # s_>w_size * sizeof(Pos) = (1<<15) * 2 = 65536 size += 65536 # s->hash_size * sizeof(Pos) = (1 << (8+7)) * 2 = 65536 size += 65536 # s->lit_bufsize = 1 << (8 + 6) = (1 << 14) = 16384 # s->pending_buf = lit_bufsize * (sizeof(ush)+2) = 4*16384 = 65536 size += 65536 # empirically, I got ~96378 bytes/object after allocating a lot of them # After sending a bunch of compression data to all of them, I got # ~270127 bytes/object. (according to WorkingMem) # This gives 268028, which is pretty close else: return -1 # We assume that everything is at least aligned to word boundary if size % _word_size != 0: size += _word_size - (size % _word_size) return size def _zlib_size_of_64(zlib_obj): """Return a __sizeof__ for a zlib object.""" t = type(zlib_obj) name = t.__name__ # Size of the zlib 'compobject', (PyObject_HEAD + z_stream, + misc) # All the 64-bit numbers here are 'made up' size = (56 * 2) if name.endswith('Decompress'): size += _size_of(zlib_obj.unused_data) size += _size_of(zlib_obj.unconsumed_tail) # sizeof(inflate_state) size += (7116 * 2) # sizeof(buffers allocated for inflate) # (1 << state->wbits) # However, we don't have access to wbits, so we assume the default (and # largest) of 15 wbits size += (1 << 15) elif name.endswith('Compress'): # sizeof(deflate_state) size += (5828 * 2) # We don't have access to the stream C attributes, so we assume the # standard values and go with it # s->w_size * 2*sizeof(Byte) = (1<<15) * 2 * 1 = 65536 size += 65536 # s_>w_size * sizeof(Pos) = (1<<15) * 2 = 65536 size += 65536 # s->hash_size * sizeof(Pos) = (1 << (8+7)) * 2 = 65536 size += 65536 # s->lit_bufsize = 1 << (8 + 6) = (1 << 14) = 16384 # s->pending_buf = lit_bufsize * (sizeof(ush)+2) = 4*16384 = 65536 size += 65536 else: return -1 if size % _word_size != 0: size += _word_size - (size % _word_size) return size add_special_size('zlib.Compress', _zlib_size_of_32, _zlib_size_of_64) add_special_size('zlib.Decompress', _zlib_size_of_32, _zlib_size_of_64) meliae-0.4.0/meliae/_scanner_core.c0000644000000000000000000004364711605576210017126 0ustar rootroot00000000000000/* Copyright (C) 2009, 2010 Canonical Ltd * * This program is free software: you can redistribute it and/or modify * it under the terms of the GNU General Public License version 3 as * published by the Free Software Foundation. * * This program is distributed in the hope that it will be useful, but * WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program. If not, see . */ /* The core of parsing is split into a pure C module, so that we can guarantee * that we won't be creating objects in the internal loops. */ #include "_scanner_core.h" #ifndef Py_TYPE # define Py_TYPE(o) ((o)->ob_type) #endif // %zd is the gcc convention for defining that we are formatting a size_t // object, windows seems to prefer %ld, though perhaps we need to first check // sizeof(size_t) ? #ifdef _WIN32 # if defined(_M_X64) || defined(__amd64__) # define SSIZET_FMT "%ld" # else # define SSIZET_FMT "%d" # endif # define snprintf _snprintf #else # define SSIZET_FMT "%zd" #endif #if defined(__GNUC__) # define inline __inline__ #elif defined(_MSC_VER) # define inline __inline #else # define inline #endif struct ref_info { write_callback write; void *data; int first; PyObject *nodump; }; void _dump_object_to_ref_info(struct ref_info *info, PyObject *c_obj, int recurse); #ifdef __GNUC__ static void _write_to_ref_info(struct ref_info *info, const char *fmt_string, ...) __attribute__((format(printf, 2, 3))); #else static void _write_to_ref_info(struct ref_info *info, const char *fmt_string, ...); #endif static PyObject * _get_specials(); /* The address of the last thing we dumped. Stuff like dumping the string * interned dictionary will dump the same string 2x in a row. This helps * prevent that. */ static PyObject *_last_dumped = NULL; static PyObject *_special_case_dict = NULL; void _clear_last_dumped() { _last_dumped = NULL; } static Py_ssize_t _basic_object_size(PyObject *c_obj) { Py_ssize_t size; size = c_obj->ob_type->tp_basicsize; if (PyType_HasFeature(c_obj->ob_type, Py_TPFLAGS_HAVE_GC)) { size += sizeof(PyGC_Head); } return size; } static Py_ssize_t _var_object_size(PyVarObject *c_obj) { Py_ssize_t num_entries; num_entries = PyObject_Size((PyObject *)c_obj); if (num_entries < 0) { /* This object doesn't support len() */ num_entries = 0; PyErr_Clear(); } return _basic_object_size((PyObject *)c_obj) + num_entries * c_obj->ob_type->tp_itemsize; } static Py_ssize_t _object_to_size_with_gc(PyObject *size_obj, PyObject *c_obj) { Py_ssize_t size = -1; size = PyInt_AsSsize_t(size_obj); if (size == -1) { // Probably an error occurred, we don't know for sure, but we might as // well just claim that we don't know the size. We *could* check // PyErr_Occurred(), but if we are just clearing it anyway... PyErr_Clear(); return -1; } // There is one trick left. Namely, __sizeof__ doesn't include the // GC overhead, so let's add that back in if (PyType_HasFeature(Py_TYPE(c_obj), Py_TPFLAGS_HAVE_GC)) { size += sizeof(PyGC_Head); } return size; } static Py_ssize_t _size_of_from__sizeof__(PyObject *c_obj) { PyObject *size_obj = NULL; Py_ssize_t size = -1; if (PyType_CheckExact(c_obj)) { // Types themselves may have a __sizeof__ attribute, but it is the // unbound method, which takes an instance return -1; } size_obj = PyObject_CallMethod(c_obj, "__sizeof__", NULL); if (size_obj == NULL) { // Not sure what happened, but this won't work, it could be a simple // attribute error, or it could be something else. PyErr_Clear(); return -1; } size = _object_to_size_with_gc(size_obj, c_obj); Py_DECREF(size_obj); return size; } static Py_ssize_t _size_of_list(PyListObject *c_obj) { Py_ssize_t size; size = _basic_object_size((PyObject *)c_obj); size += sizeof(PyObject*) * c_obj->allocated; return size; } static Py_ssize_t _size_of_set(PySetObject *c_obj) { Py_ssize_t size; size = _basic_object_size((PyObject *)c_obj); if (c_obj->table != c_obj->smalltable) { size += sizeof(setentry) * (c_obj->mask + 1); } return size; } static Py_ssize_t _size_of_dict(PyDictObject *c_obj) { Py_ssize_t size; size = _basic_object_size((PyObject *)c_obj); if (c_obj->ma_table != c_obj->ma_smalltable) { size += sizeof(PyDictEntry) * (c_obj->ma_mask + 1); } return size; } static Py_ssize_t _size_of_unicode(PyUnicodeObject *c_obj) { Py_ssize_t size; size = _basic_object_size((PyObject *)c_obj); size += Py_UNICODE_SIZE * c_obj->length; return size; } static Py_ssize_t _size_of_from_specials(PyObject *c_obj) { PyObject *special_dict; PyObject *special_size_of; PyObject *val; Py_ssize_t size; special_dict = _get_specials(); if (special_dict == NULL) { PyErr_Clear(); // Not sure what happened, but don't propogate it return -1; } special_size_of = PyDict_GetItemString(special_dict, Py_TYPE(c_obj)->tp_name); if (special_size_of == NULL) { // if special_size_of is NULL, an exception is *not* set return -1; } // special_size_of is a *borrowed referenced* val = PyObject_CallFunction(special_size_of, "O", c_obj); if (val == NULL) { return -1; } size = _object_to_size_with_gc(val, c_obj); Py_DECREF(val); return size; } static Py_ssize_t _size_of_from_var_or_basic_size(PyObject *c_obj) { /* There are a bunch of types that we know we can check directly, without * having to go through the __sizeof__ abstraction. This allows us to avoid * the extra intermediate allocations. It is also our final fallback * method. */ if (c_obj->ob_type->tp_itemsize != 0) { // Variable length object with inline storage // total size is tp_itemsize * ob_size return _var_object_size((PyVarObject *)c_obj); } return _basic_object_size(c_obj); } Py_ssize_t _size_of(PyObject *c_obj) { Py_ssize_t size; if PyList_Check(c_obj) { return _size_of_list((PyListObject *)c_obj); } else if PyAnySet_Check(c_obj) { return _size_of_set((PySetObject *)c_obj); } else if PyDict_Check(c_obj) { return _size_of_dict((PyDictObject *)c_obj); } else if PyUnicode_Check(c_obj) { return _size_of_unicode((PyUnicodeObject *)c_obj); } else if (PyTuple_CheckExact(c_obj) || PyString_CheckExact(c_obj) || PyInt_CheckExact(c_obj) || PyBool_Check(c_obj) || c_obj == Py_None || PyModule_CheckExact(c_obj)) { // All of these implement __sizeof__, but we don't need to use it return _size_of_from_var_or_basic_size(c_obj); } // object implements __sizeof__ so we have to specials first size = _size_of_from_specials(c_obj); if (size != -1) { return size; } size = _size_of_from__sizeof__(c_obj); if (size != -1) { return size; } return _size_of_from_var_or_basic_size(c_obj); } static void _write_to_ref_info(struct ref_info *info, const char *fmt_string, ...) { char temp_buf[1024] = {0}; va_list args; size_t n_bytes; va_start(args, fmt_string); n_bytes = vsnprintf(temp_buf, 1024, fmt_string, args); va_end(args); info->write(info->data, temp_buf, n_bytes); } static inline void _write_static_to_info(struct ref_info *info, const char data[]) { /* These are static strings, do we need to do strlen() each time? */ info->write(info->data, data, strlen(data)); } int _dump_reference(PyObject *c_obj, void* val) { struct ref_info *info; size_t n_bytes; char buf[24] = {0}; /* it seems that 64-bit long fits in 20 decimals */ info = (struct ref_info*)val; /* TODO: This is casting a pointer into an unsigned long, which we assume * is 'long enough'. We probably should really be using uintptr_t or * something like that. */ if (info->first) { info->first = 0; n_bytes = snprintf(buf, 24, "%lu", (unsigned long)c_obj); } else { n_bytes = snprintf(buf, 24, ", %lu", (unsigned long)c_obj); } info->write(info->data, buf, n_bytes); return 0; } int _dump_child(PyObject *c_obj, void *val) { struct ref_info *info; info = (struct ref_info *)val; // The caller has asked us to dump self, but no recursive children _dump_object_to_ref_info(info, c_obj, 0); return 0; } int _dump_if_no_traverse(PyObject *c_obj, void *val) { struct ref_info *info; info = (struct ref_info *)val; /* Objects without traverse are simple things without refs, and built-in * types have a traverse, but they won't be part of gc.get_objects(). */ if (Py_TYPE(c_obj)->tp_traverse == NULL || (PyType_Check(c_obj) && !PyType_HasFeature((PyTypeObject*)c_obj, Py_TPFLAGS_HEAPTYPE))) { _dump_object_to_ref_info(info, c_obj, 0); } else if (!PyType_HasFeature(Py_TYPE(c_obj), Py_TPFLAGS_HAVE_GC)) { /* This object is not considered part of the garbage collector, even * if it does [not] have a tp_traverse function. */ _dump_object_to_ref_info(info, c_obj, 1); } return 0; } static inline void _dump_json_c_string(struct ref_info *info, const char *buf, Py_ssize_t len) { Py_ssize_t i; char c, *ptr, *end; char out_buf[1024] = {0}; // Never try to dump more than 100 chars if (len == -1) { len = strlen(buf); } if (len > 100) { len = 100; } ptr = out_buf; end = out_buf + 1024; *ptr++ = '"'; for (i = 0; i < len; ++i) { c = buf[i]; if (c <= 0x1f || c > 0x7e) { // use the unicode escape sequence ptr += snprintf(ptr, end-ptr, "\\u00%02x", ((unsigned short)c & 0xFF)); } else if (c == '\\' || c == '/' || c == '"') { *ptr++ = '\\'; *ptr++ = c; } else { *ptr++ = c; } } *ptr++ = '"'; if (ptr >= end) { /* Abort somehow */ } info->write(info->data, out_buf, ptr-out_buf); } void _dump_string(struct ref_info *info, PyObject *c_obj) { Py_ssize_t str_size; char *str_buf; str_buf = PyString_AS_STRING(c_obj); str_size = PyString_GET_SIZE(c_obj); _dump_json_c_string(info, str_buf, str_size); } void _dump_unicode(struct ref_info *info, PyObject *c_obj) { // TODO: consider writing to a small memory buffer, before writing to disk Py_ssize_t uni_size; Py_UNICODE *uni_buf, c; Py_ssize_t i; char out_buf[1024] = {0}, *ptr, *end; uni_buf = PyUnicode_AS_UNICODE(c_obj); uni_size = PyUnicode_GET_SIZE(c_obj); // Never try to dump more than this many chars if (uni_size > 100) { uni_size = 100; } ptr = out_buf; end = out_buf + 1024; *ptr++ = '"'; for (i = 0; i < uni_size; ++i) { c = uni_buf[i]; if (c <= 0x1f || c > 0x7e) { ptr += snprintf(ptr, end-ptr, "\\u%04x", ((unsigned short)c & 0xFFFF)); } else if (c == '\\' || c == '/' || c == '"') { *ptr++ = '\\'; *ptr++ = (char)c; } else { *ptr++ = (char)c; } } *ptr++ = '"'; if (ptr >= end) { /* We should fail here */ } info->write(info->data, out_buf, ptr-out_buf); } void _dump_object_info(write_callback write, void *callee_data, PyObject *c_obj, PyObject *nodump, int recurse) { struct ref_info info; info.write = write; info.data = callee_data; info.first = 1; info.nodump = nodump; if (nodump != NULL) { Py_INCREF(nodump); } _dump_object_to_ref_info(&info, c_obj, recurse); if (info.nodump != NULL) { Py_DECREF(nodump); } } void _dump_object_to_ref_info(struct ref_info *info, PyObject *c_obj, int recurse) { Py_ssize_t size; int retval; int do_traverse; if (info->nodump != NULL && info->nodump != Py_None && PyAnySet_Check(info->nodump)) { if (c_obj == info->nodump) { /* Don't dump the 'nodump' set. */ return; } /* note this isn't exactly what we want. It checks for equality, not * the exact object. However, for what it is used for, it is often * 'close enough'. */ retval = PySet_Contains(info->nodump, c_obj); if (retval == 1) { /* This object is part of the no-dump set, don't dump the object */ return; } else if (retval == -1) { /* An error was raised, but we don't care, ignore it */ PyErr_Clear(); } } if (c_obj == _last_dumped) { /* We just dumped this object, no need to do it again. */ return; } _last_dumped = c_obj; size = _size_of(c_obj); _write_to_ref_info(info, "{\"address\": %lu, \"type\": ", (unsigned long)c_obj); _dump_json_c_string(info, c_obj->ob_type->tp_name, -1); _write_to_ref_info(info, ", \"size\": " SSIZET_FMT, _size_of(c_obj)); // HANDLE __name__ if (PyModule_Check(c_obj)) { _write_static_to_info(info, ", \"name\": "); _dump_json_c_string(info, PyModule_GetName(c_obj), -1); } else if (PyFunction_Check(c_obj)) { _write_static_to_info(info, ", \"name\": "); _dump_string(info, ((PyFunctionObject *)c_obj)->func_name); } else if (PyType_Check(c_obj)) { _write_static_to_info(info, ", \"name\": "); _dump_json_c_string(info, ((PyTypeObject *)c_obj)->tp_name, -1); } else if (PyClass_Check(c_obj)) { /* Old style class */ _write_static_to_info(info, ", \"name\": "); _dump_string(info, ((PyClassObject *)c_obj)->cl_name); } if (PyString_Check(c_obj)) { _write_to_ref_info(info, ", \"len\": " SSIZET_FMT, PyString_GET_SIZE(c_obj)); _write_static_to_info(info, ", \"value\": "); _dump_string(info, c_obj); } else if (PyUnicode_Check(c_obj)) { _write_to_ref_info(info, ", \"len\": " SSIZET_FMT, PyUnicode_GET_SIZE(c_obj)); _write_static_to_info(info, ", \"value\": "); _dump_unicode(info, c_obj); } else if (PyBool_Check(c_obj)) { if (c_obj == Py_True) { _write_static_to_info(info, ", \"value\": \"True\""); } else if (c_obj == Py_False) { _write_static_to_info(info, ", \"value\": \"False\""); } else { _write_to_ref_info(info, ", \"value\": %ld", PyInt_AS_LONG(c_obj)); } } else if (PyInt_CheckExact(c_obj)) { _write_to_ref_info(info, ", \"value\": %ld", PyInt_AS_LONG(c_obj)); } else if (PyTuple_Check(c_obj)) { _write_to_ref_info(info, ", \"len\": " SSIZET_FMT, PyTuple_GET_SIZE(c_obj)); } else if (PyList_Check(c_obj)) { _write_to_ref_info(info, ", \"len\": " SSIZET_FMT, PyList_GET_SIZE(c_obj)); } else if (PyAnySet_Check(c_obj)) { _write_to_ref_info(info, ", \"len\": " SSIZET_FMT, PySet_GET_SIZE(c_obj)); } else if (PyDict_Check(c_obj)) { _write_to_ref_info(info, ", \"len\": " SSIZET_FMT, PyDict_Size(c_obj)); } else if (PyFrame_Check(c_obj)) { PyCodeObject *co = ((PyFrameObject*)c_obj)->f_code; if (co) { _write_static_to_info(info, ", \"value\": "); _dump_string(info, co->co_name); } } _write_static_to_info(info, ", \"refs\": ["); do_traverse = 1; if (Py_TYPE(c_obj)->tp_traverse == NULL || (Py_TYPE(c_obj)->tp_traverse == PyType_Type.tp_traverse && !PyType_HasFeature((PyTypeObject*)c_obj, Py_TPFLAGS_HEAPTYPE))) { /* Obviously we don't traverse if there is no traverse function. But * also, if this is a 'Type' (class definition), then * PyTypeObject.tp_traverse has an assertion about whether this type is * a HEAPTYPE. In debug builds, this can trip and cause failures, even * though it doesn't seem to hurt anything. * See: https://bugs.launchpad.net/bugs/586122 */ do_traverse = 0; } if (do_traverse) { info->first = 1; Py_TYPE(c_obj)->tp_traverse(c_obj, _dump_reference, info); } _write_static_to_info(info, "]}\n"); if (do_traverse && recurse != 0) { if (recurse == 2) { /* Always dump one layer deeper */ Py_TYPE(c_obj)->tp_traverse(c_obj, _dump_child, info); } else if (recurse == 1) { /* strings and such aren't in gc.get_objects, so we need to dump * them when they are referenced. */ Py_TYPE(c_obj)->tp_traverse(c_obj, _dump_if_no_traverse, info); } } } static int _append_object(PyObject *visiting, void* data) { PyObject *lst; lst = (PyObject *)data; if (lst == NULL) { return -1; } if (PyList_Append(data, visiting) == -1) { return -1; } return 0; } /** * Return a PyList of all objects referenced via tp_traverse. */ PyObject * _get_referents(PyObject *c_obj) { PyObject *lst; lst = PyList_New(0); if (lst == NULL) { return NULL; } if (Py_TYPE(c_obj)->tp_traverse != NULL && (Py_TYPE(c_obj)->tp_traverse != PyType_Type.tp_traverse || PyType_HasFeature((PyTypeObject *)c_obj, Py_TPFLAGS_HEAPTYPE))) { Py_TYPE(c_obj)->tp_traverse(c_obj, _append_object, lst); } return lst; } static PyObject * _get_specials() { if (_special_case_dict == NULL) { _special_case_dict = PyDict_New(); } return _special_case_dict; } PyObject * _get_special_case_dict() { PyObject *ret; ret = _get_specials(); Py_XINCREF(ret); return ret; } meliae-0.4.0/meliae/_scanner_core.h0000644000000000000000000000434211605576210017120 0ustar rootroot00000000000000/* Copyright (C) 2009, 2010 Canonical Ltd * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA */ /* The core of parsing is split into a pure C module, so that we can guarantee * that we won't be creating objects in the internal loops. */ #ifndef _SCANNER_CORE_H_ #define _SCANNER_CORE_H_ #include #include #include #include /** * Compute the size of the data directly addressed by this object. * * For example, for a list, this is the number of bytes allocated plus the * number of bytes for the basic list object. Note that lists over-allocate, so * this is not strictly sizeof(pointer) * num_items. */ extern Py_ssize_t _size_of(PyObject *c_obj); /** * This callback will be used to dump more info to the user. */ typedef void (*write_callback)(void *data, const char *bytes, size_t len); /** * Write the information about this object to the file. */ extern void _dump_object_info(write_callback write, void *callee_data, PyObject *c_obj, PyObject *nodump, int recurse); /** * Clear out what the last object we dumped was. */ extern void _clear_last_dumped(); /** * Return a PyList of all objects referenced via tp_traverse. */ extern PyObject *_get_referents(PyObject *c_obj); /** * Return a (mutable) dict of known special cases. * * These are objects whose size is not reported properly, but which we have * figured out via trial-and-error. * The key is tp_name strings, the value is a PyInt of the appropriate size. */ extern PyObject *_get_special_case_dict(); #endif // _SCANNER_CORE_H_ meliae-0.4.0/meliae/files.py0000644000000000000000000001004611605576210015621 0ustar rootroot00000000000000# Copyright (C) 2009, 2010 Canonical Ltd # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License version 3 as # published by the Free Software Foundation. # # This program is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . """Work with files on disk.""" import errno import gzip try: import multiprocessing except ImportError: multiprocessing = None import subprocess import sys def open_file(filename): """Open a file which might be a regular file or a gzip. :return: An iterator of lines, and a cleanup function. """ source = open(filename, 'rb') gzip_source = gzip.GzipFile(mode='rb', fileobj=source) try: line = gzip_source.readline() except KeyboardInterrupt: raise except: # probably not a gzip file source.seek(0) return source, None else: # We don't need these anymore, so close them out in case the rest of # the code raises an exception. gzip_source.close() source.close() # a gzip file # preference - a gzip subprocess if sys.platform == 'win32': close_fds = False # not supported else: close_fds = True try: process = subprocess.Popen(['gzip', '-d', '-c', filename], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE, close_fds=close_fds) except OSError, e: if e.errno == errno.ENOENT: # failing that, use another python process return _open_mprocess(filename) # make reading from stdin, or writting errors cause immediate aborts process.stdin.close() process.stderr.close() terminate = getattr(process, 'terminate', None) # terminate is a py2.6 thing if terminate is not None: def terminate_or_pass(): # It seems that on windows, sometimes terminate() can raise # WindowsError: [Error 5] Access is denied # My guess is that the process has actually completed, and is # no longer running. try: return terminate() except OSError, e: sys.stderr.write('Ignoring failure to terminate process:' ' %s\n' % (e,)) # We *could* check if process.poll() returns that the # process has already exited, etc. return process.stdout, terminate_or_pass else: # We would like to use process.wait() but that can cause a deadlock # if the child is still writing. # The other alternative is process.communicate, but we closed # stderr, and communicate wants to read from it. (We get: # ValueError: I/O operation on closed file # if we try it here. Also, for large files, this may be many GB # worth of data. # So for now, live with the deadlock... return process.stdout, process.wait def _stream_file(filename, child): gzip_source = gzip.GzipFile(filename, 'rb') for line in gzip_source: child.send(line) child.send(None) def _open_mprocess(filename): if multiprocessing is None: # can't multiprocess, use inprocess gzip. return gzip.GzipFile(filename, mode='rb'), None parent, child = multiprocessing.Pipe(False) process = multiprocessing.Process(target=_stream_file, args=(filename, child)) process.start() def iter_pipe(): while True: line = parent.recv() if line is None: break yield line return iter_pipe(), process.join meliae-0.4.0/meliae/loader.py0000644000000000000000000006772711605576210016007 0ustar rootroot00000000000000# Copyright (C) 2009, 2010 Canonical Ltd # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License version 3 as # published by the Free Software Foundation. # # This program is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . """Take a given dump file, and bring the data back. Currently requires simplejson to parse. """ import gc import math import os import re import sys import time try: import simplejson except ImportError: simplejson = None from meliae import ( files, _intset, _loader, warn, ) timer = time.time if sys.platform == 'win32': timer = time.clock # This is the minimal regex that is guaranteed to match. In testing, it is # faster than simplejson without extensions, though slower than simplejson w/ # extensions. _object_re = re.compile( r'\{"address": (?P
\d+)' r', "type": "(?P[^"]*)"' r', "size": (?P\d+)' r'(, "name": "(?P.*)")?' r'(, "len": (?P\d+))?' r'(, "value": "?(?P.*)"?)?' r', "refs": \[(?P[^]]*)\]' r'\}') _refs_re = re.compile( r'(?P\d+)' ) def _from_json(cls, line, temp_cache=None): val = simplejson.loads(line) # simplejson likes to turn everything into unicode strings, but we know # everything is just a plain 'str', and we can save some bytes if we # cast it back obj = cls(address=val['address'], type_str=str(val['type']), size=val['size'], children=val['refs'], length=val.get('len', None), value=val.get('value', None), name=val.get('name', None)) if (obj.type_str == 'str'): if type(obj.value) is unicode: obj.value = obj.value.encode('latin-1') if temp_cache is not None: obj._intern_from_cache(temp_cache) return obj def _from_line(cls, line, temp_cache=None): m = _object_re.match(line) if not m: raise RuntimeError('Failed to parse line: %r' % (line,)) (address, type_str, size, name, length, value, refs) = m.group('address', 'type', 'size', 'name', 'len', 'value', 'refs') assert '\\' not in type_str if name is not None: assert '\\' not in name if length is not None: length = int(length) refs = [int(val) for val in _refs_re.findall(refs)] if value is not None: try: value = int(value) except ValueError: pass obj = cls(address=int(address), type_str=type_str, size=int(size), children=refs, length=length, value=value, name=name) if (obj.type_str == 'str'): if type(obj.value) is unicode: obj.value = obj.value.encode('latin-1') if temp_cache is not None: obj._intern_from_cache(temp_cache) return obj class _TypeSummary(object): """Information about a given type.""" def __init__(self, type_str): self.type_str = type_str self.count = 0 self.total_size = 0 self.sq_sum = 0 # used for stddev computation self.max_size = 0 self.max_address = None def __repr__(self): if self.count == 0: avg = 0 stddev = 0 else: avg = self.total_size / float(self.count) exp_x2 = self.sq_sum / float(self.count) stddev = math.sqrt(exp_x2 - avg*avg) return '%s: %d, %d bytes, %.3f avg bytes, %.3f std dev, %d max @ %d' % ( self.type_str, self.count, self.total_size, avg, stddev, self.max_size, self.max_address) def _add(self, memobj): self.count += 1 self.total_size += memobj.size self.sq_sum += (memobj.size * memobj.size) if memobj.size > self.max_size: self.max_size = memobj.size self.max_address = memobj.address class _ObjSummary(object): """Tracks the summary stats about objects listed.""" def __init__(self): self.type_summaries = {} self.total_count = 0 self.total_size = 0 self.summaries = None def _add(self, memobj): try: type_summary = self.type_summaries[memobj.type_str] except KeyError: type_summary = _TypeSummary(memobj.type_str) self.type_summaries[memobj.type_str] = type_summary type_summary._add(memobj) self.total_count += 1 self.total_size += memobj.size def __repr__(self): if self.summaries is None: self.by_size() out = [ 'Total %d objects, %d types, Total size = %.1fMiB (%d bytes)' % (self.total_count, len(self.summaries), self.total_size / 1024. / 1024, self.total_size), ' Index Count % Size % Cum Max Kind' ] cumulative = 0 for i in xrange(min(20, len(self.summaries))): summary = self.summaries[i] cumulative += summary.total_size out.append( '%6d%8d%4d%10d%4d%4d%8d %s' % (i, summary.count, summary.count * 100.0 / self.total_count, summary.total_size, summary.total_size * 100.0 / self.total_size, cumulative * 100.0 / self.total_size, summary.max_size, summary.type_str)) return '\n'.join(out) def by_size(self): summaries = sorted(self.type_summaries.itervalues(), key=lambda x: (x.total_size, x.count), reverse=True) self.summaries = summaries def by_count(self): summaries = sorted(self.type_summaries.itervalues(), key=lambda x: (x.count, x.total_size), reverse=True) self.summaries = summaries class ObjManager(object): """Manage the collection of MemObjects. This is the interface for doing queries, etc. """ def __init__(self, objs, show_progress=True, max_parents=None): """Create a new ObjManager :param show_progress: If True, as content is loading, write progress information to stderr. :param max_parents: When running compute_parents(), cap the maximum parents tracked to a fixed number, since knowing there are 50k references is only informative, you won't actually track into them. If 0 we will not compute parents, if < 0 we will show all parents. """ self.objs = objs self.show_progress = show_progress self.max_parents = max_parents if self.max_parents is None: self.max_parents = 100 def __getitem__(self, address): return self.objs[address] def compute_referrers(self): """Deprecated, use compute_parents instead.""" warn.deprecated('.compute_referrers is deprecated.' ' Use .compute_parents instead.') return self.compute_parents() def compute_parents(self): """For each object, figure out who is referencing it.""" if self.max_parents == 0: return parents = {} get_refs = parents.get total = len(self.objs) tlast = timer()-20 enabled = gc.isenabled() if enabled: # We create a *lot* of temporary objects here, which are all # cleaned up perfectly by refcounting, so disable gc for this loop. gc.disable() try: for idx, obj in enumerate(self.objs.itervalues()): if self.show_progress and idx & 0x3f == 0: tnow = timer() if tnow - tlast > 0.1: tlast = tnow sys.stderr.write('compute parents %8d / %8d \r' % (idx, total)) address = obj.address for ref in obj.children: refs = get_refs(ref, None) # This is ugly, so it should be explained. # To save memory pressure, parents will point to one of 4 # types. # 1) A simple integer, representing a single referrer # this saves the allocation of a separate structure # entirely # 2) A tuple, slightly more efficient than a list, but # requires creating a new tuple to 'add' an entry. # 3) A list, as before, for things with lots of # parents, we use a regular list to let it grow. # 4) None, no references from this object t = type(refs) if refs is None: refs = address elif t in (int, long): refs = (refs, address) elif t is tuple: if len(refs) >= 5: refs = list(refs) refs.append(address) else: refs = refs + (address,) elif t is list: # if we are close to the maximum number of entries, put # it through a set() to make sure we get all the # duplicates if (self.max_parents > 0): if (len(refs) >= self.max_parents): # Our list has been filled, all done continue elif (len(refs) == self.max_parents - 1): # We are one step away from being full. We put # the content into a set() so that we are sure # any duplicates will get filtered out, leaving # space for the new ref. refs.append(address) refs[:] = set(refs) continue refs.append(address) # We don't need to set it, because we modify-in-place continue else: raise TypeError('unknown refs type: %s\n' % (t,)) parents[ref] = refs if self.show_progress: sys.stderr.write('compute parents %8d / %8d \r' % (idx, total)) for idx, obj in enumerate(self.objs.itervalues()): if self.show_progress and idx & 0x3f == 0: tnow = timer() if tnow - tlast > 0.1: tlast = tnow sys.stderr.write('set parents %8d / %8d \r' % (idx, total)) try: refs = parents.pop(obj.address) except KeyError: obj.parents = () else: if refs is None: obj.parents = () elif type(refs) in (int, long): obj.parents = (refs,) else: # We use a set() to remove duplicate parents obj.parents = set(refs) finally: if enabled: gc.enable() if self.show_progress: sys.stderr.write('set parents %8d / %8d \n' % (idx, total)) def remove_expensive_references(self): """Filter out references that are mere houskeeping links. module.__dict__ tends to reference lots of other modules, which in turn brings in the global reference cycle. Going further function.__globals__ references module.__dict__, so it *too* ends up in the global cycle. Generally these references aren't interesting, simply because they end up referring to *everything*. We filter out any reference to modules, frames, types, function globals pointers & LRU sideways references. """ source = lambda:self.objs.itervalues() total_objs = len(self.objs) # Add the 'null' object self.objs.add(0, '', 0, []) for changed, obj in remove_expensive_references(source, total_objs, self.show_progress): continue def compute_total_size(self, obj): """Sum the size of all referenced objects (recursively).""" obj.total_size = sum(c.size for c in obj.iter_recursive_refs()) return obj def summarize(self, obj=None, excluding=None): """Summarize the objects referenced from this one. :param obj: Given obj as the root object, aggregate the count and size of the types of each referenced object. If not supplied, we will walk all objects. :param excluding: A list of addresses to exclude from the aggregate :return: An _ObjSummary() of this subset of the graph """ summary = _ObjSummary() if obj is None: objs = self.objs.itervalues() else: objs = obj.iter_recursive_refs(excluding=excluding) for obj in objs: summary._add(obj) return summary def get_all(self, type_str): """Return all objects that match a given type.""" all = [o for o in self.objs.itervalues() if o.type_str == type_str] all.sort(key=lambda x:(x.size, len(x), x.num_parents), reverse=True) return all def collapse_instance_dicts(self): """Hide the __dict__ member of instances. When a class does not have __slots__ defined, all instances get a separate '__dict__' attribute that actually holds their contents. This adds a level of indirection that can make it harder than it needs to be, to actually find what instance holds what objects. So we collapse those references back into the object, and grow its 'size' at the same time. :return: True if some data was collapsed """ # The instances I'm focusing on have a custom type name, and every # instance has 2 pointers. The first is to __dict__, and the second is # to the 'type' object whose name matches the type of the instance. # Also __dict__ has only 1 referrer, and that is *this* object # TODO: Handle old style classes. They seem to have type 'instanceobj', # and reference a 'classobj' with the actual type name collapsed = 0 total = len(self.objs) tlast = timer()-20 to_be_removed = set() for item_idx, obj in enumerate(self.objs.itervalues()): if obj.type_str in ('str', 'dict', 'tuple', 'list', 'type', 'function', 'wrapper_descriptor', 'code', 'classobj', 'int', 'weakref'): continue if self.show_progress and item_idx & 0x3f: tnow = timer() if tnow - tlast > 0.1: tlast = tnow sys.stderr.write('checked %8d / %8d collapsed %8d \r' % (item_idx, total, collapsed)) if obj.type_str == 'module' and len(obj) == 1: (dict_obj,) = obj if dict_obj.type_str != 'dict': continue extra_refs = [] else: if len(obj) != 2: continue obj_1, obj_2 = obj if obj_1.type_str == 'dict' and obj_2.type_str == 'type': # This is a new-style class dict_obj = obj_1 type_obj = obj_2 elif (obj.type_str == 'instance' and obj_1.type_str == 'classobj' and obj_2.type_str == 'dict'): # This is an old-style class type_obj = obj_1 dict_obj = obj_2 else: continue extra_refs = [type_obj.address] collapsed += 1 # We found an instance \o/ new_refs = list(dict_obj.children) new_refs.extend(extra_refs) obj.children = new_refs obj.size = obj.size + dict_obj.size obj.total_size = 0 if obj.type_str == 'instance': obj.type_str = type_obj.value # Now that all the data has been moved into the instance, we # will want to remove the dict from the collection. We'll do the # actual deletion later, since we are using iteritems for this # loop. to_be_removed.add(dict_obj.address) # Now we can do the actual deletion. for address in to_be_removed: del self.objs[address] if self.show_progress: sys.stderr.write('checked %8d / %8d collapsed %8d \n' % (item_idx, total, collapsed)) if collapsed: self.compute_parents() return collapsed def refs_as_dict(self, obj): """Expand the ref list considering it to be a 'dict' structure. Often we have dicts that point to simple strings and ints, etc. This tries to expand that as much as possible. :param obj: Should be a MemObject representing an instance (that has been collapsed) or a dict. """ return obj.refs_as_dict() def refs_as_list(self, obj): """Expand the ref list, considering it to be a list structure.""" as_list = [] children = obj.children for addr in children: val = self.objs[addr] if val.type_str == 'bool': val = (val.value == 'True') elif val.value is not None: val = val.value elif val.type_str == 'NoneType': val = None as_list.append(val) return as_list def guess_intern_dict(self): """Try to find the string intern dict. This is a dict that only contains strings that point to themselves. """ for o in self.objs.itervalues(): o_len = len(o) if o.type_str != 'dict' or o_len == 0 or o.num_parents > 0: # Must be a non-empty dict continue # We avoid calling o.children so that we don't have to create # proxies for all objects for i in xrange(0, o_len, 2): # Technically, o[i].address == o[i+1].address, but the proxy # objects are smart enough to get reused... c_i = o[i] c_i1 = o[i+1] if c_i is not c_i1 or c_i.type_str != 'str': break else: return o def load(source, using_json=None, show_prog=True, collapse=True, max_parents=None): """Load objects from the given source. :param source: If this is a string, we will open it as a file and read all objects. For any other type, we will simply iterate and parse objects out, so the object should be an iterator of json lines. :param using_json: Use simplejson rather than the regex. This allows arbitrary ordered json dicts to be parsed but still requires per-line layout. Set to 'False' to indicate you want to use the regex, set to 'True' to force using simplejson. None will probe to see if simplejson is available, and use it if it is. (With _speedups built, simplejson parses faster and more accurately than the regex.) :param show_prog: If True, display the progress as we read in data :param collapse: If True, run collapse_instance_dicts() after loading. :param max_parents: See ObjManager.__init__(max_parents) """ cleanup = None if isinstance(source, str): source, cleanup = files.open_file(source) if isinstance(source, file): input_size = os.fstat(source.fileno()).st_size else: input_size = 0 elif isinstance(source, (list, tuple)): input_size = sum(map(len, source)) else: input_size = 0 if using_json is None: using_json = (simplejson is not None) try: manager = _load(source, using_json, show_prog, input_size, max_parents=max_parents) finally: if cleanup is not None: cleanup() if collapse: tstart = time.time() if not manager.collapse_instance_dicts(): manager.compute_parents() if show_prog: tend = time.time() sys.stderr.write('collapsed in %.1fs\n' % (tend - tstart,)) return manager def iter_objs(source, using_json=False, show_prog=False, input_size=0, objs=None, factory=None): """Iterate MemObjects from json. :param source: A line iterator. :param using_json: Use simplejson. See load(). :param show_prog: Show progress. :param input_size: The size of the input if known (in bytes) or 0. :param objs: Either None or a dict containing objects by address. If not None, then duplicate objects will not be parsed or output. :param factory: Use this to create new instances, if None, use _loader._MemObjectProxy.from_args :return: A generator of memory objects. """ # TODO: cStringIO? tstart = timer() input_mb = input_size / 1024. / 1024. temp_cache = {} address_re = re.compile( r'{"address": (?P
\d+)' ) bytes_read = count = 0 last = 0 mb_read = 0 if using_json: decoder = _from_json else: decoder = _from_line if factory is None: factory = _loader._MemObjectProxy_from_args for line_num, line in enumerate(source): bytes_read += len(line) if line in ("[\n", "]\n"): continue if line.endswith(',\n'): line = line[:-2] if objs: # Skip duplicate objects m = address_re.match(line) if not m: continue address = int(m.group('address')) if address in objs: continue yield decoder(factory, line, temp_cache=temp_cache) if show_prog and (line_num - last > 5000): last = line_num mb_read = bytes_read / 1024. / 1024 tdelta = timer() - tstart sys.stderr.write( 'loading... line %d, %d objs, %5.1f / %5.1f MiB read in %.1fs\r' % (line_num, len(objs), mb_read, input_mb, tdelta)) del temp_cache if show_prog: mb_read = bytes_read / 1024. / 1024 tdelta = timer() - tstart sys.stderr.write( 'loaded line %d, %d objs, %5.1f / %5.1f MiB read in %.1fs \n' % (line_num, len(objs), mb_read, input_mb, tdelta)) def _load(source, using_json, show_prog, input_size, max_parents=None): objs = _loader.MemObjectCollection() for memobj in iter_objs(source, using_json, show_prog, input_size, objs, factory=objs.add): # objs.add automatically adds the object as it is created pass return ObjManager(objs, show_progress=show_prog, max_parents=max_parents) def remove_expensive_references(source, total_objs=0, show_progress=False): """Filter out references that are mere houskeeping links. module.__dict__ tends to reference lots of other modules, which in turn brings in the global reference cycle. Going further function.__globals__ references module.__dict__, so it *too* ends up in the global cycle. Generally these references aren't interesting, simply because they end up referring to *everything*. We filter out any reference to modules, frames, types, function globals pointers & LRU sideways references. :param source: A callable that returns an iterator of MemObjects. This will be called twice. :param total_objs: The total objects to be filtered, if known. If show_progress is False or the count of objects is unknown, 0. :return: An iterator of (changed, MemObject) objects with expensive references removed. """ # First pass, find objects we don't want to reference any more noref_objs = _intset.IDSet() lru_objs = _intset.IDSet() total_steps = total_objs * 2 seen_zero = False for idx, obj in enumerate(source()): # 'module's have a single __dict__, which tends to refer to other # modules. As you start tracking into that, you end up getting into # reference cycles, etc, which generally ends up referencing every # object in memory. # 'frame' also tends to be self referential, and a single frame # ends up referencing the entire current state # 'type' generally is self referential through several attributes. # __bases__ means we recurse all the way up to object, and object # has __subclasses__, which means we recurse down into all types. # In general, not helpful for debugging memory consumption if show_progress and idx & 0x1ff == 0: sys.stderr.write('finding expensive refs... %8d / %8d \r' % (idx, total_steps)) if obj.type_str in ('module', 'frame', 'type'): noref_objs.add(obj.address) if obj.type_str == '_LRUNode': lru_objs.add(obj.address) if obj.address == 0: seen_zero = True # Second pass, any object which refers to something in noref_objs will # have that reference removed, and replaced with the null_memobj num_expensive = len(noref_objs) null_memobj = _loader._MemObjectProxy_from_args(0, '', 0, []) if not seen_zero: yield (True, null_memobj) if show_progress and total_objs == 0: total_objs = idx total_steps = total_objs * 2 for idx, obj in enumerate(source()): if show_progress and idx & 0x1ff == 0: sys.stderr.write('removing %d expensive refs... %8d / %8d \r' % (num_expensive, idx + total_objs, total_steps)) if obj.type_str == 'function': # Functions have a reference to 'globals' which is not very # helpful for having a clear understanding of what is going on # especially since the function itself is in its own globals # XXX: This is probably not a guaranteed order, but currently # func_traverse returns: # func_code, func_globals, func_module, func_defaults, # func_doc, func_name, func_dict, func_closure # We want to remove the reference to globals and module refs = list(obj.children) obj.children = refs[:1] + refs[3:] + [0] yield (True, obj) continue elif obj.type_str == '_LRUNode': # We remove the 'sideways' references obj.children = [ref for ref in obj.children if ref not in lru_objs] yield (True, obj) continue for ref in obj.children: if ref in noref_objs: break else: # No bad references, keep going yield (False, obj) continue new_ref_list = [ref for ref in obj.children if ref not in noref_objs] new_ref_list.append(0) obj.children = new_ref_list yield (True, obj) if show_progress: sys.stderr.write('removed %d expensive refs from %d objs%s\n' % (num_expensive, total_objs, ' '*20)) meliae-0.4.0/meliae/perf_counter.py0000644000000000000000000001456511605576210017224 0ustar rootroot00000000000000# Copyright (C) 2009, 2010 Canonical Ltd # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License version 3 as # published by the Free Software Foundation. # # This program is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . """Get information from the OS about current memory usage""" import ctypes import math import re import sys import time class _Counter(object): """Track various aspects of performance for a given action.""" def __init__(self, name, timer): self.name = name self.time_spent = 0.0 self._time_spent_squared = 0.0 self.count = 0 self._time_start = None self._timer = timer def tick(self): """Indicate that we are starting a section related to this counter.""" self._time_start = self._timer() def tock(self): """Indicate that we finished processing.""" if self._time_start is not None: now = self._timer() delta = now - self._time_start self.time_spent += delta self._time_spent_squared += (delta * delta) self._time_start = None self.count += 1 def time_stddev(self): # This uses a simple transformation on stddev to allow us to store 2 # numbers and compute the standard deviation, rather than needing a # list of times. # stddev = sqrt(mean(x^2) - mean(x)^2) if self.count == 0: return 0 diff = (self._time_spent_squared - (self.time_spent*self.time_spent)) if self.count == 1: return math.sqrt(diff) return math.sqrt(diff / (self.count-1)) class PerformanceCounter(object): """Abstract some system information about performance counters. This includes both memory and timing. """ def __init__(self): self._counters = {} def reset(self): self._counters.clear() def get_counter(self, name): """Create a Counter object that will track some aspect of processing. :param name: An identifier associated with this action. :return: A Counter instance. """ try: c = self._counters[name] except KeyError: c = _Counter(name, self.get_timer()) self._counters[name] = c return c def get_memory(self, process): """Ask the OS for the peak memory consumption at this point in time. :param process: is a subprocess.Popen object. :return: (current, peak) the memory used in bytes. """ raise NotImplementedError(self.get_memory) class _LinuxPerformanceCounter(PerformanceCounter): def get_timer(self): # This returns wall-clock time return time.time def get_memory(self, process): pid = process.pid try: f = open('/proc/%s/status' % (process.pid,), 'rb') except (IOError, OSError): return None, None try: content = f.read() finally: f.close() m = re.search(r'(?i)vmpeak:\s*(?P\d+) kB', content) peak = current = None if m is not None: peak = int(m.group('peak')) * 1024 m = re.search(r'(?i)vmsize:\s*(?P\d+) kB', content) if m is not None: current = int(m.group('current')) * 1024 return current, peak class _Win32PerformanceCounter(PerformanceCounter): def get_timer(self): # This returns wall-clock time, but using a much higher precision than # time.time() [which has a resolution of only 15ms] return time.clock def _get_memory_win32(self, process_handle): """Get the current memory consumption using win32 apis.""" mem_struct = PROCESS_MEMORY_COUNTERS_EX() ret = ctypes.windll.psapi.GetProcessMemoryInfo(process_handle, ctypes.byref(mem_struct), ctypes.sizeof(mem_struct)) if not ret: raise RuntimeError('Failed to call GetProcessMemoryInfo: %s' % ctypes.FormatError()) return { 'PageFaultCount': mem_struct.PageFaultCount, 'PeakWorkingSetSize': mem_struct.PeakWorkingSetSize, 'WorkingSetSize': mem_struct.WorkingSetSize, 'QuotaPeakPagedPoolUsage': mem_struct.QuotaPeakPagedPoolUsage, 'QuotaPagedPoolUsage': mem_struct.QuotaPagedPoolUsage, 'QuotaPeakNonPagedPoolUsage': mem_struct.QuotaPeakNonPagedPoolUsage, 'QuotaNonPagedPoolUsage': mem_struct.QuotaNonPagedPoolUsage, 'PagefileUsage': mem_struct.PagefileUsage, 'PeakPagefileUsage': mem_struct.PeakPagefileUsage, 'PrivateUsage': mem_struct.PrivateUsage, } def get_memory(self, process): """See PerformanceCounter.get_memory()""" process_handle = int(process._handle) mem = self._get_memory_win32(process_handle) return mem['WorkingSetSize'], mem['PeakWorkingSetSize'] # what to do about darwin, freebsd, etc? if sys.platform == 'win32': perf_counter = _Win32PerformanceCounter() # Define this here so we don't have to re-define it on every function call class PROCESS_MEMORY_COUNTERS_EX(ctypes.Structure): """Used by GetProcessMemoryInfo""" _fields_ = [('cb', ctypes.c_ulong), ('PageFaultCount', ctypes.c_ulong), ('PeakWorkingSetSize', ctypes.c_size_t), ('WorkingSetSize', ctypes.c_size_t), ('QuotaPeakPagedPoolUsage', ctypes.c_size_t), ('QuotaPagedPoolUsage', ctypes.c_size_t), ('QuotaPeakNonPagedPoolUsage', ctypes.c_size_t), ('QuotaNonPagedPoolUsage', ctypes.c_size_t), ('PagefileUsage', ctypes.c_size_t), ('PeakPagefileUsage', ctypes.c_size_t), ('PrivateUsage', ctypes.c_size_t), ] else: perf_counter = _LinuxPerformanceCounter() def _get_process_win32(): """Similar to getpid() but returns an OS handle.""" return ctypes.windll.kernel32.GetCurrentProcess() meliae-0.4.0/meliae/scanner.py0000644000000000000000000001656511605576210016164 0ustar rootroot00000000000000# Copyright (C) 2009, 2010, 2011 Canonical Ltd # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License version 3 as # published by the Free Software Foundation. # # This program is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . """Some bits for helping to scan objects looking for referenced memory.""" import gc import types from meliae import ( _intset, _scanner, ) size_of = _scanner.size_of get_referents = _scanner.get_referents add_special_size = _scanner.add_special_size def _size_of_ndarray(ndarray_obj): """ Return the size of a Numpy ndarray's internal storage. Doesn't yet handle views into other arrays. """ return ndarray_obj.nbytes add_special_size("numpy.ndarray", _size_of_ndarray, _size_of_ndarray) def dump_all_referenced(outf, obj, is_pending=False): """Recursively dump everything that is referenced from obj.""" if isinstance(outf, str): outf = open(outf, 'wb') if is_pending: pending = obj else: pending = [obj] last_offset = len(pending) - 1 # TODO: Instead of using an IDSet, we could use a BloomFilter. It would # mean some objects may not get dumped (blooms say "yes you # definitely are not present", but only "you might already be # present", collisions cause false positives.) # However, you can get by with 8-10bits for a 1% FPR, rather than # using 32/64-bit pointers + overhead for avoiding hash collisions. # So on 64-bit we drop from 16bytes/object to 1... seen = _intset.IDSet() if is_pending: seen.add(id(pending)) while last_offset >= 0: next = pending[last_offset] last_offset -= 1 id_next = id(next) if id_next in seen: continue seen.add(id_next) # We will recurse here, so tell dump_object_info to not recurse _scanner.dump_object_info(outf, next, recurse_depth=0) for ref in get_referents(next): if id(ref) not in seen: last_offset += 1 if len(pending) > last_offset: pending[last_offset] = ref else: pending.append(ref) def dump_gc_objects(outf, recurse_depth=1): """Dump everything that is available via gc.get_objects(). """ if isinstance(outf, basestring): opened = True outf = open(outf, 'wb') else: opened = False # Get the list of everything before we start building new objects all_objs = gc.get_objects() # Dump out a few specific objects, so they don't get repeated forever nodump = [None, True, False] # In current versions of python, these are all pre-cached nodump.extend(xrange(-5, 256)) nodump.extend([chr(c) for c in xrange(256)]) nodump.extend([t for t in types.__dict__.itervalues() if type(t) is types.TypeType]) nodump.extend([set, dict]) # Some very common interned strings nodump.extend(('__doc__', 'self', 'operator', '__init__', 'codecs', '__new__', '__builtin__', '__builtins__', 'error', 'len', 'errors', 'keys', 'None', '__module__', 'file', 'name', '', 'sys', 'True', 'False')) nodump.extend((BaseException, Exception, StandardError, ValueError)) for obj in nodump: _scanner.dump_object_info(outf, obj, nodump=None, recurse_depth=0) # Avoid dumping the all_objs list and this function as well. This helps # avoid getting a 'reference everything in existence' problem. nodump.append(dump_gc_objects) # This currently costs us ~16kB during dumping, but means we won't write # out those objects multiple times in the log file. # TODO: we might want to make nodump a variable-size dict, and add anything # with ob_refcnt > 1000 or so. nodump = frozenset(nodump) for obj in all_objs: _scanner.dump_object_info(outf, obj, nodump=nodump, recurse_depth=recurse_depth) del all_objs[:] if opened: outf.close() else: outf.flush() def dump_all_objects(outf): """Dump everything that is referenced from gc.get_objects() This recurses, and tracks dumped objects in an IDSet. Which means it costs memory, which is often about 10% of currently active memory. Otherwise, this usually results in smaller dump files than dump_gc_objects(). This also can be faster, because it doesn't dump the same item multiple times. """ if isinstance(outf, basestring): opened = True outf = open(outf, 'wb') else: opened = False all_objs = gc.get_objects() dump_all_referenced(outf, all_objs, is_pending=True) del all_objs[:] if opened: outf.close() else: outf.flush() def get_recursive_size(obj): """Get the memory referenced from this object. This returns the memory of the direct object, and all of the memory referenced by child objects. It also returns the total number of objects. """ total_size = 0 pending = [obj] last_item = 0 seen = _intset.IDSet() size_of = _scanner.size_of while last_item >= 0: item = pending[last_item] last_item -= 1 id_item = id(item) if id_item in seen: continue seen.add(id_item) total_size += size_of(item) for child in get_referents(item): if id(child) not in seen: last_item += 1 if len(pending) > last_item: pending[last_item] = child else: pending.append(child) return len(seen), total_size def get_recursive_items(obj): """Walk all referred items and return the unique list of them.""" all = [] pending = [obj] last_item = 0 seen = _intset.IDSet() while last_item >= 0: item = pending[last_item] last_item -= 1 id_item = id(item) if id_item in seen: continue seen.add(id_item) all.append(item) for child in get_referents(item): if id(child) not in seen: last_item += 1 if len(pending) > last_item: pending[last_item] = child else: pending.append(child) return all def find_interned_dict(): """Go through all gc objects and find the interned python dict.""" for obj in gc.get_objects(): if (type(obj) is not dict or 'find_interned_dict' not in obj or obj['find_interned_dict'] is not 'find_interned_dict' or 'get_recursive_items' not in obj or obj['get_recursive_items'] is not 'get_recursive_items'): # The above check assumes that local strings will be interned, # which is the standard cpython behavior, but perhaps not the best # to require? However, if we used something like a custom string # that we intern() we still could have problems with locals(), etc. continue return obj meliae-0.4.0/meliae/tests/0000755000000000000000000000000011605576210015306 5ustar rootroot00000000000000meliae-0.4.0/meliae/warn.py0000644000000000000000000000236511605576210015473 0ustar rootroot00000000000000# Copyright (C) 2010 Canonical Ltd # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License version 3 as # published by the Free Software Foundation. # # This program is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . """Routines for giving warnings.""" import warnings _warn_func = warnings.warn def deprecated(msg): """Issue a warning about something that is deprecated.""" warn(msg, DeprecationWarning, stacklevel=3) def warn(msg, klass=None, stacklevel=1): """Emit a warning about something.""" _warn_func(msg, klass, stacklevel=stacklevel) def trap_warnings(new_warning_func): """Redirect warnings to a different function. :param new_warning_func: A function with the same signature as warning.warn :return: The old warning function """ global _warn_func old = _warn_func _warn_func = new_warning_func return old meliae-0.4.0/meliae/tests/__init__.py0000644000000000000000000000266411605576210017427 0ustar rootroot00000000000000# Copyright (C) 2009, 2011 Canonical Ltd # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License version 3 as # published by the Free Software Foundation. # # This program is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . """Helpers for loading and running the test suite.""" import unittest TestCase = unittest.TestCase def run_suite(verbose=False): if verbose: verbosity = 2 else: verbosity = 1 runner = unittest.TextTestRunner(verbosity=verbosity) suite = test_suite() result = runner.run(suite) return result.wasSuccessful() def test_suite(): module_names = [ 'test__intset', 'test__loader', 'test__scanner', 'test_loader', 'test_perf_counter', 'test_scanner', ] full_names = [__name__ + '.' + n for n in module_names] loader = unittest.TestLoader() suite = loader.suiteClass() for full_name in full_names: module = __import__(full_name, {}, {}, [None]) suite.addTests(loader.loadTestsFromModule(module)) return suite meliae-0.4.0/meliae/tests/test__intset.py0000644000000000000000000001202511605576210020364 0ustar rootroot00000000000000# Copyright (C) 2009, 2010, 2011 Canonical Ltd # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License version 3 as # published by the Free Software Foundation. # # This program is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . """Test the Set of Integers object.""" import sys from meliae import ( _intset, _scanner, tests, ) class TestIntSet(tests.TestCase): _set_type = _intset.IntSet def test__init__(self): s = self._set_type() def test__len__(self): self.assertEqual(0, len(self._set_type())) def test__contains__(self): iset = self._set_type() self.assertFalse(0 in iset) def test__contains__not_int(self): iset = self._set_type() def test_contain(): return 'a' in iset self.assertRaises(TypeError, test_contain) def test_add_singletons(self): iset = self._set_type() iset.add(-1) self.assertTrue(-1 in iset) self.assertEqual(1, len(iset)) iset.add(-2) self.assertTrue(-2 in iset) self.assertEqual(2, len(iset)) def test_add_not_int(self): iset = self._set_type() self.assertRaises(TypeError, iset.add, 'foo') def test_add_general(self): iset = self._set_type() self.assertEqual(0, len(iset)) self.assertFalse(1 in iset) self.assertFalse(2 in iset) self.assertFalse(128 in iset) self.assertFalse(129 in iset) self.assertFalse(130 in iset) iset.add(1) self.assertEqual(1, len(iset)) self.assertTrue(1 in iset) self.assertFalse(2 in iset) self.assertFalse(128 in iset) self.assertFalse(129 in iset) self.assertFalse(130 in iset) iset.add(2) self.assertEqual(2, len(iset)) self.assertTrue(1 in iset) self.assertTrue(2 in iset) self.assertFalse(128 in iset) self.assertFalse(129 in iset) self.assertFalse(130 in iset) iset.add(129) self.assertTrue(1 in iset) self.assertTrue(2 in iset) self.assertFalse(128 in iset) self.assertTrue(129 in iset) self.assertFalse(130 in iset) def test_add_and_grow(self): iset = self._set_type() for i in xrange(0, 10000): iset.add(i) self.assertEqual(10000, len(iset)) def test_from_list(self): iset = self._set_type([0, 1, 2, 3, 4, 5]) self.assertTrue(0 in iset) self.assertTrue(1 in iset) self.assertTrue(2 in iset) self.assertTrue(3 in iset) self.assertTrue(4 in iset) self.assertTrue(5 in iset) self.assertFalse(6 in iset) def test_discard(self): # Not supported yet... KnownFailure pass def test_remove(self): # Not supported yet... KnownFailure pass def assertSizeOf(self, num_words, obj, extra_size=0, has_gc=True): expected_size = extra_size + num_words * _scanner._word_size if has_gc: expected_size += _scanner._gc_head_size self.assertEqual(expected_size, _scanner.size_of(obj)) def test__sizeof__(self): # The intset class should report a size iset = self._set_type([]) # I'm a bit leery of has_gc=False, as I think some versions of pyrex # will put the object into GC even though it doesn't have any 'object' # references... # We could do something with a double-entry check # Size is: # 1: PyType* # 2: refcnt # 3: vtable* # 4: _count # 5: _mask # 6: _array # 4-byte int _has_singleton # However, most compliers will align the struct based on the width of # the largest entry. So while we could put 2 4-byte ints into the # struct, it will waste 4-bytes anyway. self.assertSizeOf(7, iset, has_gc=False) iset.add(12345) # Min allocation is 256 entries self.assertSizeOf(7+256, iset, has_gc=False) class TestIDSet(TestIntSet): _set_type = _intset.IDSet def test_high_bit(self): # Python ids() are considered to be unsigned values, but python # integers are considered to be signed longs. As such, we need to play # some tricks to get them to fit properly. Otherwise we get # 'Overflow' exceptions bigint = sys.maxint + 1 self.assertTrue(isinstance(bigint, long)) iset = self._set_type() self.assertFalse(bigint in iset) iset.add(bigint) self.assertTrue(bigint in iset) def test_add_singletons(self): pass # Negative values cannot be checked in IDSet, because we cast them to # unsigned long first. meliae-0.4.0/meliae/tests/test__loader.py0000644000000000000000000006366711605576210020346 0ustar rootroot00000000000000# Copyright (C) 2009, 2010, 2011 Canonical Ltd # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License version 3 as # published by the Free Software Foundation. # # This program is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . """Pyrex extension for tracking loaded objects""" from meliae import ( _loader, _scanner, warn, tests, ) # Empty table size is 1024 pointers # 1: PyType* # 2: refcnt # 3: vtable* # 4: _table* # 3 4-byte int attributes # Note that on 64-bit platforms, alignment issues mean we will still # round to a multiple-of-8 bytes. _memobj_extra_size = 3*4 if (_memobj_extra_size % _scanner._word_size) != 0: _memobj_extra_size += (_scanner._word_size - (_memobj_extra_size % _scanner._word_size)) class TestMemObjectCollection(tests.TestCase): def test__init__(self): moc = _loader.MemObjectCollection() self.assertEqual(0, moc._active) self.assertEqual(0, moc._filled) self.assertEqual(1023, moc._table_mask) def test__lookup_direct(self): moc = _loader.MemObjectCollection() self.assertEqual(1023, moc._table_mask) self.assertEqual(0, moc._test_lookup(0)) self.assertEqual(0, moc._test_lookup(1024)) self.assertEqual(255, moc._test_lookup(255)) self.assertEqual(933, moc._test_lookup(933)) self.assertEqual(933, moc._test_lookup(933+1024)) self.assertEqual(933, moc._test_lookup(933L+1024L)) def test__len__(self): moc = _loader.MemObjectCollection() self.assertEqual(0, len(moc)) moc.add(0, 'foo', 100) self.assertEqual(1, len(moc)) moc.add(1024, 'foo', 100) self.assertEqual(2, len(moc)) del moc[0] self.assertEqual(1, len(moc)) del moc[1024] self.assertEqual(0, len(moc)) def test__lookup_collide(self): moc = _loader.MemObjectCollection() self.assertEqual(1023, moc._table_mask) self.assertEqual(0, moc._test_lookup(0)) self.assertEqual(0, moc._test_lookup(1024)) moc.add(0, 'foo', 100) self.assertEqual(0, moc._test_lookup(0)) self.assertEqual(1, moc._test_lookup(1024)) moc.add(1024, 'bar', 200) self.assertEqual(0, moc._test_lookup(0)) self.assertEqual(1, moc._test_lookup(1024)) def test__contains__(self): moc = _loader.MemObjectCollection() self.assertEqual(0, moc._test_lookup(0)) self.assertEqual(0, moc._test_lookup(1024)) self.assertFalse(0 in moc) self.assertFalse(1024 in moc) moc.add(0, 'foo', 100) self.assertTrue(0 in moc) self.assertFalse(1024 in moc) moc.add(1024, 'bar', 200) self.assertTrue(0 in moc) self.assertTrue(1024 in moc) def test__getitem__(self): moc = _loader.MemObjectCollection() def get(offset): return moc[offset] self.assertRaises(KeyError, get, 0) self.assertRaises(KeyError, get, 1024) moc.add(0, 'foo', 100) mop = moc[0] self.assertTrue(isinstance(mop, _loader._MemObjectProxy)) self.assertEqual('foo', mop.type_str) self.assertEqual(100, mop.size) self.assertRaises(KeyError, get, 1024) self.assertTrue(mop is moc[mop]) def test_get(self): moc = _loader.MemObjectCollection() self.assertEqual(None, moc.get(0, None)) moc.add(0, 'foo', 100) self.assertEqual(100, moc.get(0, None).size) def test__delitem__(self): moc = _loader.MemObjectCollection() def get(offset): return moc[offset] def delete(offset): del moc[offset] self.assertRaises(KeyError, delete, 0) self.assertRaises(KeyError, delete, 1024) moc.add(0, 'foo', 100) self.assertTrue(0 in moc) self.assertFalse(1024 in moc) self.assertRaises(KeyError, delete, 1024) moc.add(1024, 'bar', 200) del moc[0] self.assertFalse(0 in moc) self.assertRaises(KeyError, get, 0) mop = moc[1024] del moc[mop] self.assertRaises(KeyError, get, 1024) def test_add_until_resize(self): moc = _loader.MemObjectCollection() for i in xrange(1025): moc.add(i, 'foo', 100+i) self.assertEqual(1025, moc._filled) self.assertEqual(1025, moc._active) self.assertEqual(2047, moc._table_mask) mop = moc[1024] self.assertEqual(1024, mop.address) self.assertEqual(1124, mop.size) def test_itervalues_to_tip(self): moc = _loader.MemObjectCollection() moc.add(0, 'bar', 100) moc.add(1024, 'baz', 102) moc.add(512, 'bing', 104) self.assertEqual([0, 1024, 512], [x.address for x in moc.itervalues()]) del moc[0] self.assertEqual([1024, 512], [x.address for x in moc.itervalues()]) moc.add(1023, 'booze', 103) self.assertEqual([1024, 512, 1023], [x.address for x in moc.itervalues()]) del moc[1023] self.assertEqual([1024, 512], [x.address for x in moc.itervalues()]) def test_items(self): moc = _loader.MemObjectCollection() moc.add(0, 'bar', 100) moc.add(1024, 'baz', 102) moc.add(512, 'bing', 103) items = moc.items() self.assertTrue(isinstance(items, list)) self.assertEqual([(0, 0), (1024, 1024), (512, 512)], [(k, v.address) for k,v in items]) del moc[0] self.assertEqual([(1024, 1024), (512, 512)], [(k, v.address) for k,v in moc.items()]) def test_iteritems(self): moc = _loader.MemObjectCollection() moc.add(0, 'bar', 100) moc.add(1024, 'baz', 102) moc.add(512, 'bing', 103) self.assertEqual([(0, 0), (1024, 1024), (512, 512)], [(k, v.address) for k,v in moc.iteritems()]) del moc[0] self.assertEqual([(1024, 1024), (512, 512)], [(k, v.address) for k,v in moc.iteritems()]) def test_keys(self): moc = _loader.MemObjectCollection() moc.add(0, 'bar', 100) moc.add(1024, 'baz', 102) moc.add(512, 'bing', 103) keys = moc.keys() self.assertTrue(isinstance(keys, list)) self.assertEqual([0, 1024, 512], keys) del moc[0] self.assertEqual([1024, 512], moc.keys()) def test__iter__(self): moc = _loader.MemObjectCollection() moc.add(0, 'bar', 100) moc.add(1024, 'baz', 102) moc.add(512, 'bing', 103) self.assertEqual([0, 1024, 512], list(moc)) self.assertEqual([0, 1024, 512], list(moc.iterkeys())) del moc[0] self.assertEqual([1024, 512], list(moc)) self.assertEqual([1024, 512], list(moc.iterkeys())) def assertSizeOf(self, num_words, obj, extra_size=0, has_gc=True): expected_size = extra_size + num_words * _scanner._word_size if has_gc: expected_size += _scanner._gc_head_size self.assertEqual(expected_size, _scanner.size_of(obj)) def test__sizeof__base(self): moc = _loader.MemObjectCollection() # Empty table size is 1024 pointers # 1: PyType* # 2: refcnt # 3: vtable* # 4: _table* # 3 4-byte int attributes # Note that on 64-bit platforms, alignment issues mean we will still # round to a multiple-of-8 bytes. self.assertSizeOf(4+1024, moc, extra_size=_memobj_extra_size, has_gc=False) def test__sizeof__one_item(self): moc = _loader.MemObjectCollection() # We also track the size of the referenced _MemObject entries # Which is: # 1: PyObject *address # 2: PyObject *type_str # 3: long size # 4: *child_list # 5: *value # 6: *parent_list # 7: ulong total_size # 8: *proxy moc.add(0, 'foo', 100) self.assertSizeOf(4+1024+8, moc, extra_size=_memobj_extra_size, has_gc=False) def test__sizeof__with_reflists(self): moc = _loader.MemObjectCollection() # We should assign the memory for ref-lists to the container. A # ref-list allocates the number of entries + 1 # Each _memobject also takes up moc.add(0, 'foo', 100, children=[1234], parent_list=[3456, 7890]) self.assertSizeOf(4+1024+8+2+3, moc, extra_size=_memobj_extra_size, has_gc=False) def test__sizeof__with_dummy(self): moc = _loader.MemObjectCollection() moc.add(0, 'foo', 100, children=[1234], parent_list=[3456, 7890]) moc.add(1, 'foo', 100, children=[1234], parent_list=[3456, 7890]) del moc[1] self.assertSizeOf(4+1024+8+2+3, moc, extra_size=_memobj_extra_size, has_gc=False) def test_traverse_empty(self): # With nothing present, we return no referents moc = _loader.MemObjectCollection() self.assertEqual([], _scanner.get_referents(moc)) def test_traverse_simple_item(self): moc = _loader.MemObjectCollection() moc.add(1234, 'foo', 100) self.assertEqual([1234, 'foo', None], _scanner.get_referents(moc)) def test_traverse_multiple_and_parents_and_children(self): moc = _loader.MemObjectCollection() moc.add(1, 'foo', 100) moc.add(2, 'bar', 200, children=[8, 9], parent_list=[10, 11], value='test val') self.assertEqual([1, 'foo', None, 2, 'bar', 'test val', 8, 9, 10, 11], _scanner.get_referents(moc)) class Test_MemObjectProxy(tests.TestCase): def setUp(self): super(Test_MemObjectProxy, self).setUp() self.moc = _loader.MemObjectCollection() self.moc.add(1024, 'bar', 200) self.moc.add(0, 'foo', 100) self.moc.add(255, 'baz', 300) del self.moc[1024] def test_basic_proxy(self): mop = self.moc[0] self.assertEqual(0, mop.address) self.assertEqual('foo', mop.type_str) self.assertEqual(100, mop.size) mop.size = 1024 self.assertEqual(1024, mop.size) self.assertEqual(1024, self.moc[0].size) self.assertEqual(0, len(mop)) def test_deleted_proxy(self): mop = self.moc[0] del self.moc[0] self.assertEqual('foo', mop.type_str) self.assertEqual(100, mop.size) self.assertEqual(0, len(mop)) def test_value(self): mop = self.moc.add(1234, 'type', 256, value='testval') self.assertEqual('testval', mop.value) mop.value = None self.assertEqual(None, mop.value) mop.value = 'a str' self.assertEqual('a str', mop.value) mop.value = 'a str' self.assertEqual('a str', mop.value) def test_type_str(self): mop = self.moc.add(1234, 'type', 256, value='testval') self.assertEqual('type', mop.type_str) mop.type_str = 'difftype' self.assertEqual('difftype', mop.type_str) def test_name(self): mop = self.moc.add(1234, 'type', 256, name='the name') # 'name' entries get mapped as value self.assertEqual('the name', mop.value) def test__intern_from_cache(self): cache = {} addr = 1234567 mop = self.moc.add(addr, 'my ' + ' type', 256) mop._intern_from_cache(cache) self.assertTrue(addr in cache) self.assertTrue(mop.address is addr) self.assertTrue(cache[addr] is addr) t = cache['my type'] self.assertTrue(mop.type_str is t) del self.moc[addr] mop = self.moc.add(1234566+1, 'my ' + ' ty' + 'pe', 256) addr876543 = 876543 cache[addr876543] = addr876543 addr654321 = 654321 cache[addr654321] = addr654321 mop.children = [876542+1, 654320+1] mop.parents = [876542+1, 654320+1] self.assertFalse(mop.address is addr) self.assertFalse(mop.type_str is t) rl = mop.children self.assertFalse(rl[0] is addr876543) self.assertFalse(rl[1] is addr654321) rfrs = mop.parents self.assertFalse(rl[0] is addr876543) self.assertFalse(rl[1] is addr654321) mop._intern_from_cache(cache) self.assertTrue(mop.address is addr) self.assertTrue(mop.type_str is t) rl = mop.children self.assertTrue(rl[0] is addr876543) self.assertTrue(rl[1] is addr654321) rfrs = mop.parents self.assertTrue(rfrs[0] is addr876543) self.assertTrue(rfrs[1] is addr654321) def test_children(self): mop = self.moc.add(1234567, 'type', 256, children=[1, 2, 3]) self.assertEqual(3, len(mop)) self.assertEqual([1, 2, 3], mop.children) mop.children = [87654321, 23456] self.assertEqual([87654321, 23456], mop.children) self.assertEqual(2, len(mop)) def test_c(self): mop = self.moc.add(1234567, 'type', 256, children=[0, 255]) mop0 = self.moc[0] self.assertEqual([], mop0.c) c = mop.c self.assertEqual(2, len(c)) self.assertEqual(0, c[0].address) self.assertEqual(255, c[1].address) def test_ref_list(self): # Deprecated logged = [] def log_warn(msg, klass, stacklevel=None): logged.append((msg, klass, stacklevel)) old_func = warn.trap_warnings(log_warn) try: mop = self.moc.add(1234567, 'type', 256, children=[1, 2, 3]) self.assertEqual(3, len(mop)) self.assertEqual(3, mop.num_refs) self.assertEqual([('Attribute .num_refs deprecated.' ' Use len() instead.', DeprecationWarning, 3), ], logged) del logged[:] self.assertEqual([1, 2, 3], mop.ref_list) self.assertEqual([('Attribute .ref_list deprecated.' ' Use .children instead.', DeprecationWarning, 3), ], logged) mop.ref_list = [87654321, 23456] self.assertEqual([('Attribute .ref_list deprecated.' ' Use .children instead.', DeprecationWarning, 3), ]*2, logged) self.assertEqual([87654321, 23456], mop.children) self.assertEqual(2, len(mop)) finally: warn.trap_warnings(old_func) def test__getitem__(self): mop = self.moc.add(1234567, 'type', 256, children=[0, 255]) self.assertEqual(2, len(mop)) self.assertEqual(2, len(list(mop))) mop0 = mop[0] mop255 = mop[1] self.assertEqual([mop0, mop255], list(mop)) self.assertEqual(0, mop0.address) self.assertEqual('foo', mop0.type_str) self.assertEqual(255, mop255.address) self.assertEqual('baz', mop255.type_str) def test_total_size(self): mop = self.moc[0] self.assertEqual(0, mop.total_size) mop.total_size = 10245678 self.assertEqual(10245678, mop.total_size) mop.total_size = (2**31+1) self.assertEqual(2**31+1, mop.total_size) def test_parents(self): mop = self.moc.add(1234567, 'type', 256, children=[0, 255]) mop0 = self.moc[0] self.assertEqual((), mop0.parents) self.assertEqual(0, mop0.num_parents) mop255 = self.moc[255] self.assertEqual((), mop255.parents) self.assertEqual(0, mop255.num_parents) mop0.parents = [1234567] self.assertEqual(1, mop0.num_parents) self.assertEqual([1234567], mop0.parents) mop255.parents = [1234567] self.assertEqual(1, mop255.num_parents) self.assertEqual([1234567], mop255.parents) def test_p(self): mop = self.moc.add(1234567, 'type', 256, children=[0, 255]) mop0 = self.moc[0] self.assertEqual([], mop0.p) mop0.parents = [1234567] p = mop0.p self.assertEqual(1, len(p)) self.assertEqual(mop, p[0]) def test_referrers(self): mop = self.moc.add(1234567, 'type', 256, children=[0, 255]) # Referrers is deprecated logged = [] def log_warn(msg, klass, stacklevel=None): logged.append((msg, klass, stacklevel)) old_func = warn.trap_warnings(log_warn) try: mop0 = self.moc[0] self.assertEqual((), mop0.referrers) self.assertEqual([('Attribute .referrers deprecated.' ' Use .parents instead.', DeprecationWarning, 3) ], logged) mop0.referrers = [1234567] self.assertEqual([('Attribute .referrers deprecated.' ' Use .parents instead.', DeprecationWarning, 3) ]*2, logged) self.assertEqual([1234567], mop0.parents) del logged[:] self.assertEqual(1, mop0.num_referrers) self.assertEqual([('Attribute .num_referrers deprecated.' ' Use .num_parents instead.', DeprecationWarning, 3) ], logged) finally: warn.trap_warnings(old_func) def test_parents(self): mop = self.moc.add(1234567, 'type', 256, children=[0, 255]) mop0 = self.moc[0] self.assertEqual((), mop0.parents) self.assertEqual(0, mop0.num_parents) mop255 = self.moc[255] self.assertEqual((), mop255.parents) self.assertEqual(0, mop255.num_parents) mop0.parents = [1234567] self.assertEqual(1, mop0.num_parents) self.assertEqual([1234567], mop0.parents) mop255.parents = [1234567] self.assertEqual(1, mop255.num_parents) self.assertEqual([1234567], mop255.parents) def test__repr__(self): mop = self.moc.add(1234, 'str', 24) self.assertEqual('str(1234 24B)', repr(mop)) mop = self.moc.add(1235, 'tuple', 12, [4567, 8900]) self.assertEqual('tuple(1235 12B 2refs)', repr(mop)) mop = self.moc.add(1236, 'module', 12, [4568, 8900], name='named') # TODO: Will we show the refs? If so, we will want to truncate self.assertEqual("module(1236 12B 2refs 'named')", repr(mop)) mop = self.moc.add(1237, 'module', 12, range(20), name='named') self.assertEqual("module(1237 12B 20refs 'named')", repr(mop)) mop = self.moc.add(1238, 'foo', 12, [10], parent_list=[20, 30]) self.assertEqual("foo(1238 12B 1refs 2par)", repr(mop)) mop = self.moc.add(1239, 'str', 24, value='teststr') self.assertEqual("str(1239 24B 'teststr')", repr(mop)) # TODO: Will we want to truncate value? mop.value = 'averylongstringwithmorestuff' self.assertEqual("str(1239 24B 'averylongstringwithmorestuff')", repr(mop)) mop = self.moc.add(1240, 'int', 12, value=12345) self.assertEqual('int(1240 12B 12345)', repr(mop)) mop.total_size = 12 self.assertEqual('int(1240 12B 12345 12.0Btot)', repr(mop)) mop.total_size = 1024 self.assertEqual('int(1240 12B 12345 1.0Ktot)', repr(mop)) mop.total_size = int(1024*1024*10.5) self.assertEqual('int(1240 12B 12345 10.5Mtot)', repr(mop)) def test_expand_refs_as_dict(self): self.moc.add(1, 'str', 25, value='a') self.moc.add(2, 'int', 12, value=1) mop = self.moc.add(3, 'dict', 140, children=[1, 2]) as_dict = mop.refs_as_dict() self.assertEqual({'a': 1}, mop.refs_as_dict()) # It should even work if there is a 'trailing' entry, as after # collapse, instances have the dict inline, and end with the reference # to the type mop = self.moc.add(4, 'MyClass', 156, children=[2, 1, 8]) self.assertEqual({1: 'a'}, mop.refs_as_dict()) def test_expand_refs_as_dict_unknown(self): # We only expand objects that fall into known types, not just any value mop_f = self.moc.add(1, 'frame', 300, value='func_foo') self.moc.add(2, 'str', 25, value='a') mop = self.moc.add(3, 'dict', 140, children=[2, 1]) as_dict = mop.refs_as_dict() self.assertEqual({'a': mop_f}, as_dict) def assertSizeOf(self, num_words, obj, extra_size=0, has_gc=True): expected_size = extra_size + num_words * _scanner._word_size if has_gc: expected_size += _scanner._gc_head_size self.assertEqual(expected_size, _scanner.size_of(obj)) def test__sizeof__(self): mop = self.moc[0] # 1: PyType* # 2: refcnt # 3: collection* # 4: _MemObject* # 5: _managed_obj * # No vtable because we have no cdef functions self.assertSizeOf(5, mop, has_gc=True) def test__sizeof__managed(self): mop = self.moc[0] del self.moc[0] # If the underlying object has been removed from the collection, then # the memory is now being managed by the mop itself. So it grows by: # 1: PyObject* address # 2: PyObject* type_str # 3: long size # 4: RefList *child_list # 5: PyObject *value # 6: RefList *parent_list # 7: unsigned long total_size # 8: PyObject *proxy self.assertSizeOf(5+8, mop, has_gc=True) def test_traverse(self): # When a Proxied object is removed from its Collection, it becomes # owned by the Proxy itself, and should be returned from tp_traverse mop = self.moc[0] # At this point, moc still controls the _MemObject self.assertEqual([self.moc], _scanner.get_referents(mop)) # But now, it references everything else, too del self.moc[0] self.assertEqual([self.moc, mop.address, mop.type_str, mop.value], _scanner.get_referents(mop)) def test_traverse_with_parent_and_children(self): mop = self.moc.add(1, 'my_class', 1234, children=[5,6,7], parent_list=[8,9,10], value='test str') # While in the moc, the Collection manages the references self.assertEqual([self.moc], _scanner.get_referents(mop)) # But once gone, we refer to everything directly del self.moc[1] self.assertEqual([self.moc, mop.address, mop.type_str, mop.value, 5, 6, 7, 8, 9, 10], _scanner.get_referents(mop)) def test_compute_total_size(self): obj = self.moc.add(1, 'test', 1234, children=[0]) obj = self.moc.add(2, 'other', 4567, children=[1]) self.assertEqual(1234+4567+100, obj.compute_total_size()) self.assertEqual(1234+4567, obj.compute_total_size(excluding=[0])) def test_all(self): self.moc.add(1, 'foo', 1234, children=[0]) obj = self.moc.add(2, 'other', 4567, children=[1]) self.assertEqual([1, 0], [o.address for o in obj.all('foo')]) self.assertEqual([1], [o.address for o in obj.all('foo', excluding=[0])]) # Excluding 1 actually prevents us from walking further self.assertEqual([], [o.address for o in obj.all('foo', excluding=[1])]) class Test_MemObjectProxyIterRecursiveRefs(tests.TestCase): def setUp(self): super(Test_MemObjectProxyIterRecursiveRefs, self).setUp() self.moc = _loader.MemObjectCollection() self.moc.add(1024, 'bar', 200) self.moc.add(0, 'foo', 100) self.moc.add(255, 'baz', 300) def assertIterRecursiveRefs(self, addresses, obj, excluding=None): self.assertEqual(addresses, [o.address for o in obj.iter_recursive_refs(excluding=excluding)]) def test_no_refs(self): self.assertIterRecursiveRefs([1024], self.moc[1024]) def test_single_depth_refs(self): obj = self.moc.add(1, 'test', 1234, children=[1024]) self.assertIterRecursiveRefs([1, 1024], obj) def test_deep_refs(self): obj = self.moc.add(1, '1', 1234, children=[2]) self.moc.add(2, '2', 1234, children=[3]) self.moc.add(3, '3', 1234, children=[4]) self.moc.add(4, '4', 1234, children=[5]) self.moc.add(5, '5', 1234, children=[6]) self.moc.add(6, '6', 1234, children=[]) self.assertIterRecursiveRefs([1, 2, 3, 4, 5, 6], obj) def test_self_referenced(self): self.moc.add(1, 'test', 1234, children=[1024, 2]) obj = self.moc.add(2, 'test2', 1234, children=[1]) self.assertIterRecursiveRefs([2, 1, 1024], obj) def test_excluding(self): obj = self.moc.add(1, 'test', 1234, children=[1024]) self.assertIterRecursiveRefs([1], obj, excluding=[1024]) def test_excluding_nested(self): self.moc.add(1, 'test', 1234, children=[1024]) obj = self.moc.add(2, 'test', 1234, children=[1]) self.assertIterRecursiveRefs([2, 1, 1024], obj) self.assertIterRecursiveRefs([2], obj, excluding=[1]) def test_excluding_wraparound(self): self.moc.add(1, 'test', 1234, children=[1024]) self.moc.add(2, 'test', 1234, children=[1024]) obj = self.moc.add(3, 'test', 1234, children=[1, 2]) self.assertIterRecursiveRefs([3, 2, 1024, 1], obj) self.assertIterRecursiveRefs([3, 2, 1024], obj, excluding=[1]) refed = (o.address for o in self.moc[1].iter_recursive_refs()) self.assertIterRecursiveRefs([3, 2], obj, excluding=refed) def test_excluding_self(self): self.assertIterRecursiveRefs([], self.moc[1024], excluding=[1024]) obj = self.moc.add(1, '1', 1234, children=[1024]) self.assertIterRecursiveRefs([], obj, excluding=[1]) meliae-0.4.0/meliae/tests/test__scanner.py0000644000000000000000000004227611605576210020522 0ustar rootroot00000000000000# Copyright (C) 2009, 2010, 2011 Canonical Ltd # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License version 3 as # published by the Free Software Foundation. # # This program is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . """Tests for the object scanner.""" import gc import sys import tempfile import types import zlib from meliae import ( _scanner, tests, ) STRING_BASE = 8 STRING_SCALING = 4 if sys.version_info[:2] >= (2, 7): # In python2.7 they 'shrunk' strings by a couple bytes, by changing where # the pointer was. So their base size is a few bytes smaller STRING_BASE = 5 class TestSizeOf(tests.TestCase): def assertSizeOf(self, num_words, obj, extra_size=0, has_gc=True): expected_size = extra_size + num_words * _scanner._word_size if has_gc: expected_size += _scanner._gc_head_size self.assertEqual(expected_size, _scanner.size_of(obj)) def test_empty_string(self): fixed_size = STRING_BASE self.assertSizeOf(STRING_SCALING, '', extra_size=0+STRING_BASE, has_gc=False) def test_short_string(self): self.assertSizeOf(STRING_SCALING, 'a', extra_size=1+STRING_BASE, has_gc=False) def test_long_string(self): self.assertSizeOf(STRING_SCALING, ('abcd'*25)*1024, extra_size=100*1024+STRING_BASE, has_gc=False) def test_tuple(self): self.assertSizeOf(3, ()) def test_tuple_one(self): self.assertSizeOf(3+1, ('a',)) def test_tuple_n(self): self.assertSizeOf(3+3, (1, 2, 3)) def test_empty_list(self): self.assertSizeOf(5, []) def test_list_with_one(self): self.assertSizeOf(5+1, [1]) def test_list_with_three(self): self.assertSizeOf(5+3, [1, 2, 3]) def test_int(self): self.assertSizeOf(3, 1, has_gc=False) def test_list_appended(self): # Lists over-allocate when you append to them, we want the *allocated* # size lst = [] lst.append(1) self.assertSizeOf(5+4, lst) def test_empty_set(self): self.assertSizeOf(25, set()) self.assertSizeOf(25, frozenset()) def test_small_sets(self): self.assertSizeOf(25, set(range(1))) self.assertSizeOf(25, set(range(2))) self.assertSizeOf(25, set(range(3))) self.assertSizeOf(25, set(range(4))) self.assertSizeOf(25, set(range(5))) self.assertSizeOf(25, frozenset(range(3))) def test_medium_sets(self): self.assertSizeOf(25 + 512*2, set(range(100))) self.assertSizeOf(25 + 512*2, frozenset(range(100))) def test_empty_dict(self): self.assertSizeOf(31, dict()) def test_small_dict(self): self.assertSizeOf(31, dict.fromkeys(range(1))) self.assertSizeOf(31, dict.fromkeys(range(2))) self.assertSizeOf(31, dict.fromkeys(range(3))) self.assertSizeOf(31, dict.fromkeys(range(4))) self.assertSizeOf(31, dict.fromkeys(range(5))) def test_medium_dict(self): self.assertSizeOf(31+512*3, dict.fromkeys(range(100))) def test_basic_types(self): type_size = 106 if sys.version_info[:2] >= (2, 6): type_size = 109 self.assertSizeOf(type_size, dict) self.assertSizeOf(type_size, set) self.assertSizeOf(type_size, tuple) def test_user_type(self): class Foo(object): pass if sys.version_info[:2] >= (2, 6): self.assertSizeOf(109, Foo) else: self.assertSizeOf(106, Foo) def test_simple_object(self): obj = object() self.assertSizeOf(2, obj, has_gc=False) def test_user_instance(self): class Foo(object): pass # This has a pointer to a dict and a weakref list f = Foo() self.assertSizeOf(4, f) def test_slotted_instance(self): class One(object): __slots__ = ['one'] # The basic object plus memory for one member self.assertSizeOf(3, One()) class Two(One): __slots__ = ['two'] self.assertSizeOf(4, Two()) def test_empty_unicode(self): self.assertSizeOf(6, u'', extra_size=0, has_gc=False) def test_small_unicode(self): self.assertSizeOf(6, u'a', extra_size=_scanner._unicode_size*1, has_gc=False) self.assertSizeOf(6, u'abcd', extra_size=_scanner._unicode_size*4, has_gc=False) self.assertSizeOf(6, u'\xbe\xe5', extra_size=_scanner._unicode_size*2, has_gc=False) def test_None(self): self.assertSizeOf(2, None, has_gc=False) def test__sizeof__instance(self): # __sizeof__ appears to have been introduced in python 2.6, and # is meant to return the number of bytes allocated to this # object. It does not include GC overhead, that seems to be added back # in as part of sys.getsizeof(). So meliae does the same in size_of() class CustomSize(object): def __init__(self, size): self.size = size def __sizeof__(self): return self.size self.assertSizeOf(0, CustomSize(10), 10, has_gc=True) self.assertSizeOf(0, CustomSize(20), 20, has_gc=True) # If we get '-1' as the size we assume something is wrong, and fall # back to the original size self.assertSizeOf(4, CustomSize(-1), has_gc=True) def test_size_of_special(self): class CustomWithoutSizeof(object): pass log = [] def _size_32(obj): log.append(obj) return 800 def _size_64(obj): log.append(obj) return 1600 obj = CustomWithoutSizeof() self.assertSizeOf(4, obj) _scanner.add_special_size('CustomWithoutSizeof', _size_32, _size_64) try: self.assertSizeOf(200, obj) finally: _scanner.add_special_size('CustomWithoutSizeof', None, None) self.assertEqual([obj], log) del log[:] self.assertSizeOf(4, obj) self.assertEqual([], log) def test_size_of_special_neg1(self): # Returning -1 falls back to the regular __sizeof__, etc interface class CustomWithoutSizeof(object): pass log = [] def _size_neg1(obj): log.append(obj) return -1 obj = CustomWithoutSizeof() self.assertSizeOf(4, obj) _scanner.add_special_size('CustomWithoutSizeof', _size_neg1, _size_neg1) try: self.assertSizeOf(4, obj) finally: _scanner.add_special_size('CustomWithoutSizeof', None, None) self.assertEqual([obj], log) def test_size_of_zlib_compress_obj(self): # zlib compress objects allocate a lot of extra buffers, we want to # track that. Note that we are approximating it, because we don't # actually inspect the C attributes. But it is a closer approximation # than not doing this. c = zlib.compressobj() self.assertTrue(_scanner.size_of(c) > 256000) self.assertEqual(0, _scanner.size_of(c) % _scanner._word_size) def test_size_of_zlib_decompress_obj(self): d = zlib.decompressobj() self.assertTrue(_scanner.size_of(d) > 30000) self.assertEqual(0, _scanner.size_of(d) % _scanner._word_size) def _string_to_json(s): out = ['"'] for c in s: if c <= '\x1f' or c > '\x7e': out.append(r'\u%04x' % ord(c)) elif c in r'\/"': # Simple escape out.append('\\' + c) else: out.append(c) out.append('"') return ''.join(out) def _unicode_to_json(u): out = ['"'] for c in u: if c <= u'\u001f' or c > u'\u007e': out.append(r'\u%04x' % ord(c)) elif c in ur'\/"': # Simple escape out.append('\\' + str(c)) else: out.append(str(c)) out.append('"') return ''.join(out) class TestJSONString(tests.TestCase): def assertJSONString(self, exp, input): self.assertEqual(exp, _string_to_json(input)) def test_empty_string(self): self.assertJSONString('""', '') def test_simple_strings(self): self.assertJSONString('"foo"', 'foo') self.assertJSONString('"aoeu aoeu"', 'aoeu aoeu') def test_simple_escapes(self): self.assertJSONString(r'"\\x\/y\""', r'\x/y"') def test_control_escapes(self): self.assertJSONString(r'"\u0000\u0001\u0002\u001f"', '\x00\x01\x02\x1f') class TestJSONUnicode(tests.TestCase): def assertJSONUnicode(self, exp, input): val = _unicode_to_json(input) self.assertEqual(exp, val) self.assertTrue(isinstance(val, str)) def test_empty(self): self.assertJSONUnicode('""', u'') def test_ascii_chars(self): self.assertJSONUnicode('"abcdefg"', u'abcdefg') def test_unicode_chars(self): self.assertJSONUnicode(r'"\u0012\u00b5\u2030\u001f"', u'\x12\xb5\u2030\x1f') def test_simple_escapes(self): self.assertJSONUnicode(r'"\\x\/y\""', ur'\x/y"') # A pure python implementation of dump_object_info def _py_dump_json_obj(obj): klass = getattr(obj, '__class__', None) if klass is None: # This is an old style class klass = type(obj) content = [( '{"address": %d' ', "type": %s' ', "size": %d' ) % (id(obj), _string_to_json(klass.__name__), _scanner.size_of(obj)) ] name = getattr(obj, '__name__', None) if name is not None: content.append(', "name": %s' % (_string_to_json(name),)) if getattr(obj, '__len__', None) is not None: content.append(', "len": %s' % (len(obj),)) if isinstance(obj, str): content.append(', "value": %s' % (_string_to_json(obj[:100]),)) elif isinstance(obj, unicode): content.append(', "value": %s' % (_unicode_to_json(obj[:100]),)) elif obj is True: content.append(', "value": "True"') elif obj is False: content.append(', "value": "False"') elif isinstance(obj, int): content.append(', "value": %d' % (obj,)) elif isinstance(obj, types.FrameType): content.append(', "value": "%s"' % (obj.f_code.co_name,)) first = True content.append(', "refs": [') ref_strs = [] for ref in gc.get_referents(obj): ref_strs.append('%d' % (id(ref),)) content.append(', '.join(ref_strs)) content.append(']') content.append('}\n') return ''.join(content) def py_dump_object_info(obj, nodump=None): if nodump is not None: if obj is nodump: return '' try: if obj in nodump: return '' except TypeError: # This is probably an 'unhashable' object, which means it can't be # put into a set, and thus we are sure it isn't in the 'nodump' # set. pass obj_info = _py_dump_json_obj(obj) # Now we walk again, for certain types we dump them directly child_vals = [] for ref in gc.get_referents(obj): if (isinstance(ref, (str, unicode, int, types.CodeType)) or ref is None or type(ref) is object): # These types have no traverse func, so we dump them right away if nodump is None or ref not in nodump: child_vals.append(_py_dump_json_obj(ref)) return obj_info + ''.join(child_vals) class TestPyDumpJSONObj(tests.TestCase): def assertDumpText(self, expected, obj): self.assertEqual(expected, _py_dump_json_obj(obj)) def test_str(self): mystr = 'a string' self.assertDumpText( '{"address": %d, "type": "str", "size": %d, "len": 8' ', "value": "a string", "refs": []}\n' % (id(mystr), _scanner.size_of(mystr)), mystr) mystr = 'a \\str/with"control' self.assertDumpText( '{"address": %d, "type": "str", "size": %d, "len": 19' ', "value": "a \\\\str\\/with\\"control", "refs": []}\n' % (id(mystr), _scanner.size_of(mystr)), mystr) def test_unicode(self): myu = u'a \xb5nicode' self.assertDumpText( '{"address": %d, "type": "unicode", "size": %d' ', "len": 9, "value": "a \\u00b5nicode", "refs": []}\n' % ( id(myu), _scanner.size_of(myu)), myu) def test_obj(self): obj = object() self.assertDumpText( '{"address": %d, "type": "object", "size": %d, "refs": []}\n' % (id(obj), _scanner.size_of(obj)), obj) def test_tuple(self): a = object() b = object() t = (a, b) self.assertDumpText( '{"address": %d, "type": "tuple", "size": %d' ', "len": 2, "refs": [%d, %d]}\n' % (id(t), _scanner.size_of(t), id(b), id(a)), t) def test_module(self): m = _scanner self.assertDumpText( '{"address": %d, "type": "module", "size": %d' ', "name": "meliae._scanner", "refs": [%d]}\n' % (id(m), _scanner.size_of(m), id(m.__dict__)), m) def test_bool(self): a = True b = False self.assertDumpText( '{"address": %d, "type": "bool", "size": %d' ', "value": "True", "refs": []}\n' % (id(a), _scanner.size_of(a)), a) self.assertDumpText( '{"address": %d, "type": "bool", "size": %d' ', "value": "False", "refs": []}\n' % (id(b), _scanner.size_of(b)), b) class TestDumpInfo(tests.TestCase): """dump_object_info should give the same result at py_dump_object_info""" def assertDumpInfo(self, obj, nodump=None): t = tempfile.TemporaryFile(prefix='meliae-') # On some platforms TemporaryFile returns a wrapper object with 'file' # being the real object, on others, the returned object *is* the real # file object t_file = getattr(t, 'file', t) _scanner.dump_object_info(t_file, obj, nodump=nodump) t.seek(0) as_bytes = t.read() self.assertEqual(py_dump_object_info(obj, nodump=nodump), as_bytes) as_list = [] _scanner.dump_object_info(as_list.append, obj, nodump=nodump) self.assertEqual(as_bytes, ''.join(as_list)) def test_dump_int(self): self.assertDumpInfo(1) def test_dump_tuple(self): obj1 = object() obj2 = object() t = (obj1, obj2) self.assertDumpInfo(t) def test_dump_dict(self): key = object() val = object() d = {key: val} self.assertDumpInfo(d) def test_class(self): class Foo(object): pass class Child(Foo): pass self.assertDumpInfo(Child) def test_module(self): self.assertDumpInfo(_scanner) def test_instance(self): class Foo(object): pass f = Foo() self.assertDumpInfo(f) def test_slot_instance(self): class One(object): __slots__ = ['one'] one = One() one.one = object() self.assertDumpInfo(one) class Two(One): __slots__ = ['two'] two = Two() two.one = object() two.two = object() self.assertDumpInfo(two) def test_None(self): self.assertDumpInfo(None) def test_str(self): self.assertDumpInfo('this is a short \x00 \x1f \xffstring\n') self.assertDumpInfo('a \\string / with " control chars') def test_long_str(self): self.assertDumpInfo('abcd'*1000) def test_unicode(self): self.assertDumpInfo(u'this is a short \u1234 \x00 \x1f \xffstring\n') def test_long_unicode(self): self.assertDumpInfo(u'abcd'*1000) def test_nodump(self): self.assertDumpInfo(None, nodump=set([None])) def test_ref_nodump(self): self.assertDumpInfo((None, None), nodump=set([None])) def test_nodump_the_nodump(self): nodump = set([None, 1]) t = (20, nodump) self.assertDumpInfo(t, nodump=nodump) def test_function(self): def myfunction(): pass self.assertDumpInfo(myfunction) def test_class(self): class MyClass(object): pass self.assertDumpInfo(MyClass, nodump=set([object])) inst = MyClass() self.assertDumpInfo(inst) def test_old_style_class(self): class MyOldClass: pass self.assertDumpInfo(MyOldClass) def test_bool(self): self.assertDumpInfo(True) self.assertDumpInfo(False) def test_frame(self): def local_frame(): f = sys._getframe() return f f = local_frame() self.assertDumpInfo(f) class TestGetReferents(tests.TestCase): def test_list_referents(self): l = ['one', 2, object(), 4.0] self.assertEqual(gc.get_referents(l), _scanner.get_referents(l)) meliae-0.4.0/meliae/tests/test_loader.py0000644000000000000000000004652411605576210020200 0ustar rootroot00000000000000# Copyright (C) 2009, 2010 Canonical Ltd # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License version 3 as # published by the Free Software Foundation. # # This program is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . """Read back in a dump file and process it""" import gzip import os import sys import tempfile from meliae import ( _loader, loader, scanner, tests, warn, ) # A simple dump, with a couple of cross references, etc. # a@5 = 1 # b@4 = 2 # c@6 = 'a str' # t@7 = (a, b) # d@2 = {a:b, c:t} # l@3 = [a, b] # l.append(l) # outer@1 = (d, l) _example_dump = [ '{"address": 1, "type": "tuple", "size": 20, "len": 2, "refs": [2, 3]}', '{"address": 3, "type": "list", "size": 44, "len": 3, "refs": [3, 4, 5]}', '{"address": 5, "type": "int", "size": 12, "value": 1, "refs": []}', '{"address": 4, "type": "int", "size": 12, "value": 2, "refs": []}', '{"address": 2, "type": "dict", "size": 124, "len": 2, "refs": [4, 5, 6, 7]}', '{"address": 7, "type": "tuple", "size": 20, "len": 2, "refs": [4, 5]}', '{"address": 6, "type": "str", "size": 29, "len": 5, "value": "a str"' ', "refs": []}', '{"address": 8, "type": "module", "size": 60, "name": "mymod", "refs": [2]}', ] # Note that this doesn't have a complete copy of the references. Namely when # you subclass object you get a lot of references, and type instances also # reference other stuff that tends to chain to stuff like 'sys', which ends up # referencing everything. _instance_dump = [ '{"address": 1, "type": "MyClass", "size": 32, "refs": [2, 3]}', '{"address": 3, "type": "type", "size": 452, "name": "MyClass", "refs": []}', '{"address": 2, "type": "dict", "size": 140, "len": 4' ', "refs": [4, 5, 6, 7, 9, 10, 11, 12]}', '{"address": 4, "type": "str", "size": 25, "len": 1, "value": "a", "refs": []}', '{"address": 5, "type": "int", "size": 12, "value": 1, "refs": []}', '{"address": 6, "type": "str", "size": 25, "len": 1, "value": "c", "refs": []}', '{"address": 7, "type": "dict", "size": 140, "len": 1, "refs": [8, 6]}', '{"address": 8, "type": "str", "size": 25, "len": 1, "value": "s", "refs": []}', '{"address": 9, "type": "str", "size": 25, "len": 1, "value": "b", "refs": []}', '{"address": 10, "type": "str", "size": 30, "len": 6' ', "value": "string", "refs": []}', '{"address": 11, "type": "str", "size": 25, "len": 1, "value": "d", "refs": []}', '{"address": 12, "type": "tuple", "size": 32, "len": 1, "refs": [13]}', '{"address": 13, "type": "int", "size": 12, "value": 2, "refs": []}', '{"address": 14, "type": "module", "size": 28, "name": "sys", "refs": [15]}', '{"address": 15, "type": "dict", "size": 140, "len": 2, "refs": [5, 6, 9, 6]}', ] _old_instance_dump = [ '{"address": 1, "type": "instance", "size": 36, "refs": [2, 3]}', '{"address": 3, "type": "dict", "size": 140, "len": 2, "refs": [4, 5, 6, 7]}', '{"address": 7, "type": "int", "size": 12, "value": 2, "refs": []}', '{"address": 6, "type": "str", "size": 25, "len": 1, "value": "b", "refs": []}', '{"address": 5, "type": "int", "size": 12, "value": 1, "refs": []}', '{"address": 4, "type": "str", "size": 25, "len": 1, "value": "a", "refs": []}', '{"address": 2, "type": "classobj", "size": 48, "name": "OldStyle"' ', "refs": [8, 43839680, 9]}', '{"address": 9, "type": "str", "size": 32, "len": 8, "value": "OldStyle"' ', "refs": []}', '{"address": 8, "type": "tuple", "size": 28, "len": 0, "refs": []}', ] _intern_dict_dump = [ '{"address": 2, "type": "str", "size": 25, "len": 1, "value": "a", "refs": []}', '{"address": 3, "type": "str", "size": 25, "len": 1, "value": "b", "refs": []}', '{"address": 4, "type": "str", "size": 25, "len": 1, "value": "c", "refs": []}', '{"address": 5, "type": "str", "size": 25, "len": 1, "value": "d", "refs": []}', '{"address": 6, "type": "dict", "size": 512, "refs": [2, 5, 5, 5, 4, 4, 3, 3]}', '{"address": 7, "type": "dict", "size": 512, "refs": [6, 6, 5, 5, 4, 4, 3, 3]}', '{"address": 8, "type": "dict", "size": 512, "refs": [2, 2, 5, 5, 4, 4, 3, 3]}', ] class TestLoad(tests.TestCase): def test_load_smoketest(self): test_dict = {1:2, None:'a string'} t = tempfile.TemporaryFile(prefix='meliae-') # On some platforms TemporaryFile returns a wrapper object with 'file' # being the real object, on others, the returned object *is* the real # file object t_file = getattr(t, 'file', t) scanner.dump_all_referenced(t_file, test_dict) t_file.seek(0) manager = loader.load(t_file, show_prog=False) test_dict_id = id(test_dict) self.assertTrue(test_dict_id in manager.objs, '%s not found in %s' % (test_dict_id, manager.objs.keys())) def test_load_one(self): objs = loader.load([ '{"address": 1234, "type": "int", "size": 12, "value": 10' ', "refs": []}'], show_prog=False).objs keys = objs.keys() self.assertEqual([1234], keys) obj = objs[1234] self.assertTrue(isinstance(obj, _loader._MemObjectProxy)) # The address should be exactly the same python object as the key in # the objs dictionary. self.assertTrue(keys[0] is obj.address) def test_load_without_simplejson(self): objs = loader.load([ '{"address": 1234, "type": "int", "size": 12, "value": 10' ', "refs": []}', '{"address": 2345, "type": "module", "size": 60, "name": "mymod"' ', "refs": [1234]}', '{"address": 4567, "type": "str", "size": 150, "len": 126' ', "value": "Test \\\'whoami\\\'\\u000a\\"Your name\\"' ', "refs": []}' ], using_json=False, show_prog=False).objs keys = sorted(objs.keys()) self.assertEqual([1234, 2345, 4567], keys) obj = objs[1234] self.assertTrue(isinstance(obj, _loader._MemObjectProxy)) # The address should be exactly the same python object as the key in # the objs dictionary. self.assertTrue(keys[0] is obj.address) self.assertEqual(10, obj.value) obj = objs[2345] self.assertEqual("module", obj.type_str) self.assertEqual("mymod", obj.value) obj = objs[4567] # Known failure? We don't unescape properly, also, I'm surprised this # works. " should exit the " string, but \" seems to leave it. But the # '\' is also left verbatim because it is a raw string... self.assertEqual(r"Test \'whoami\'\u000a\"Your name\"", obj.value) def test_load_example(self): objs = loader.load(_example_dump, show_prog=False) def test_load_defaults_to_computing_and_collapsing(self): manager = loader.load(_instance_dump, show_prog=False, collapse=False) instance_obj = manager[1] self.assertEqual([2, 3], instance_obj.children) manager = loader.load(_instance_dump, show_prog=False) instance_obj = manager[1] self.assertEqual([4, 5, 6, 7, 9, 10, 11, 12, 3], instance_obj.children) def test_load_compressed(self): # unfortunately NamedTemporaryFile's cannot be re-opened on Windows fd, name = tempfile.mkstemp(prefix='meliae-') f = os.fdopen(fd, 'wb') try: content = gzip.GzipFile(mode='wb', compresslevel=6, fileobj=f) for line in _example_dump: content.write(line + '\n') content.flush() content.close() del content f.close() objs = loader.load(name, show_prog=False).objs objs[1] finally: f.close() os.remove(name) def test_get_all(self): om = loader.load(_example_dump, show_prog=False) the_ints = om.get_all('int') self.assertEqual(2, len(the_ints)) self.assertEqual([4, 5], sorted([i.address for i in the_ints])) def test_one(self): om = loader.load(_example_dump, show_prog=False) an_int = om[5] self.assertEqual(5, an_int.address) self.assertEqual('int', an_int.type_str) class TestRemoveExpensiveReferences(tests.TestCase): def test_remove_expensive_references(self): lines = list(_example_dump) lines.pop(-1) # Remove the old module lines.append('{"address": 8, "type": "module", "size": 12' ', "name": "mymod", "refs": [9]}') lines.append('{"address": 9, "type": "dict", "size": 124' ', "refs": [10, 11]}') lines.append('{"address": 10, "type": "module", "size": 12' ', "name": "mod2", "refs": [12]}') lines.append('{"address": 11, "type": "str", "size": 27' ', "value": "boo", "refs": []}') lines.append('{"address": 12, "type": "dict", "size": 124' ', "refs": []}') source = lambda:loader.iter_objs(lines) mymod_dict = list(source())[8] self.assertEqual([10, 11], mymod_dict.children) result = list(loader.remove_expensive_references(source)) null_obj = result[0][1] self.assertEqual(0, null_obj.address) self.assertEqual('', null_obj.type_str) self.assertEqual([11, 0], result[9][1].children) class TestMemObj(tests.TestCase): def test_to_json(self): manager = loader.load(_example_dump, show_prog=False, collapse=False) objs = manager.objs.values() objs.sort(key=lambda x:x.address) expected = [ '{"address": 1, "type": "tuple", "size": 20, "refs": [2, 3]}', '{"address": 2, "type": "dict", "size": 124, "refs": [4, 5, 6, 7]}', '{"address": 3, "type": "list", "size": 44, "refs": [3, 4, 5]}', '{"address": 4, "type": "int", "size": 12, "value": 2, "refs": []}', '{"address": 5, "type": "int", "size": 12, "value": 1, "refs": []}', '{"address": 6, "type": "str", "size": 29, "value": "a str", "refs": []}', '{"address": 7, "type": "tuple", "size": 20, "refs": [4, 5]}', '{"address": 8, "type": "module", "size": 60, "value": "mymod", "refs": [2]}', ] self.assertEqual(expected, [obj.to_json() for obj in objs]) class TestObjManager(tests.TestCase): def test_compute_parents(self): manager = loader.load(_example_dump, show_prog=False) objs = manager.objs self.assertEqual((), objs[1].parents) self.assertEqual([1, 3], objs[3].parents) self.assertEqual([3, 7, 8], sorted(objs[4].parents)) self.assertEqual([3, 7, 8], sorted(objs[5].parents)) self.assertEqual([8], objs[6].parents) self.assertEqual([8], objs[7].parents) self.assertEqual((), objs[8].parents) def test_compute_referrers(self): # Deprecated logged = [] def log_warn(msg, klass, stacklevel=None): logged.append((msg, klass, stacklevel)) old_func = warn.trap_warnings(log_warn) try: manager = loader.load(_example_dump, show_prog=False) manager.compute_referrers() self.assertEqual([('.compute_referrers is deprecated.' ' Use .compute_parents instead.', DeprecationWarning, 3), ], logged) objs = manager.objs finally: warn.trap_warnings(old_func) self.assertEqual((), objs[1].parents) self.assertEqual([1, 3], objs[3].parents) self.assertEqual([3, 7, 8], sorted(objs[4].parents)) self.assertEqual([3, 7, 8], sorted(objs[5].parents)) self.assertEqual([8], objs[6].parents) self.assertEqual([8], objs[7].parents) self.assertEqual((), objs[8].parents) def test_compute_parents_ignore_repeated(self): manager = loader.load(_intern_dict_dump, show_prog=False) str_5 = manager[5] # Each of these refers to str_5 multiple times, but they should only # show up 1 time in the parent list. self.assertEqual([6, 7, 8], sorted(str_5.parents)) def test_compute_parents_no_parents(self): manager = loader.load(_intern_dict_dump, show_prog=False, max_parents=0) str_5 = manager[5] # Each of these refers to str_5 multiple times, but they should only # show up 1 time in the parent list. self.assertEqual([], sorted(str_5.parents)) def test_compute_parents_many_parents(self): content = [ '{"address": 2, "type": "str", "size": 25, "len": 1, "value": "a", "refs": []}', ] for x in xrange(200): content.append('{"address": %d, "type": "tuple", "size": 20,' ' "len": 2, "refs": [2, 2]}' % (x+100)) # By default, we only track 100 parents manager = loader.load(content, show_prog=False) self.assertEqual(100, manager[2].num_parents) manager = loader.load(content, show_prog=False, max_parents=0) self.assertEqual(0, manager[2].num_parents) manager = loader.load(content, show_prog=False, max_parents=-1) self.assertEqual(200, manager[2].num_parents) manager = loader.load(content, show_prog=False, max_parents=10) self.assertEqual(10, manager[2].num_parents) def test_compute_total_size(self): manager = loader.load(_example_dump, show_prog=False) objs = manager.objs manager.compute_total_size(objs[8]) self.assertEqual(257, objs[8].total_size) def test_compute_total_size_missing_ref(self): lines = list(_example_dump) # 999 isn't in the dump, not sure how we get these in real life, but # they exist. we should live with references that can't be resolved. lines[-1] = ('{"address": 8, "type": "tuple", "size": 16, "len": 1' ', "refs": [999]}') manager = loader.load(lines, show_prog=False) obj = manager[8] manager.compute_total_size(obj) self.assertEqual(16, obj.total_size) def test_remove_expensive_references(self): lines = list(_example_dump) lines.pop(-1) # Remove the old module lines.append('{"address": 8, "type": "module", "size": 12' ', "name": "mymod", "refs": [9]}') lines.append('{"address": 9, "type": "dict", "size": 124' ', "refs": [10, 11]}') lines.append('{"address": 10, "type": "module", "size": 12' ', "name": "mod2", "refs": [12]}') lines.append('{"address": 11, "type": "str", "size": 27' ', "value": "boo", "refs": []}') lines.append('{"address": 12, "type": "dict", "size": 124' ', "refs": []}') manager = loader.load(lines, show_prog=False, collapse=False) mymod_dict = manager.objs[9] self.assertEqual([10, 11], mymod_dict.children) manager.remove_expensive_references() self.assertTrue(0 in manager.objs) null_obj = manager.objs[0] self.assertEqual(0, null_obj.address) self.assertEqual('', null_obj.type_str) self.assertEqual([11, 0], mymod_dict.children) def test_collapse_instance_dicts(self): manager = loader.load(_instance_dump, show_prog=False, collapse=False) # This should collapse all of the references from the instance's dict # @2 into the instance @1 instance = manager.objs[1] self.assertEqual(32, instance.size) self.assertEqual([2, 3], instance.children) inst_dict = manager.objs[2] self.assertEqual(140, inst_dict.size) self.assertEqual([4, 5, 6, 7, 9, 10, 11, 12], inst_dict.children) mod = manager.objs[14] self.assertEqual([15], mod.children) mod_dict = manager.objs[15] self.assertEqual([5, 6, 9, 6], mod_dict.children) manager.compute_parents() tpl = manager.objs[12] self.assertEqual([2], tpl.parents) self.assertEqual([1], inst_dict.parents) self.assertEqual([14], mod_dict.parents) manager.collapse_instance_dicts() # The instance dict has been removed self.assertEqual([4, 5, 6, 7, 9, 10, 11, 12, 3], instance.children) self.assertEqual(172, instance.size) self.assertFalse(2 in manager.objs) self.assertEqual([1], tpl.parents) self.assertEqual([5, 6, 9, 6], mod.children) self.assertFalse(15 in manager.objs) def test_collapse_old_instance_dicts(self): manager = loader.load(_old_instance_dump, show_prog=False, collapse=False) instance = manager.objs[1] self.assertEqual('instance', instance.type_str) self.assertEqual(36, instance.size) self.assertEqual([2, 3], instance.children) inst_dict = manager[3] self.assertEqual(140, inst_dict.size) self.assertEqual([4, 5, 6, 7], inst_dict.children) manager.compute_parents() manager.collapse_instance_dicts() # The instance dict has been removed, and its references moved into the # instance, further, the type has been updated from generic 'instance' # to being 'OldStyle'. self.assertFalse(3 in manager.objs) self.assertEqual(176, instance.size) self.assertEqual([4, 5, 6, 7, 2], instance.children) self.assertEqual('OldStyle', instance.type_str) def test_expand_refs_as_dict(self): # TODO: This test fails if simplejson is not installed, because the # regex extractor does not cast to integers (they stay as # strings). We could fix the test, or fix the extractor. manager = loader.load(_instance_dump, show_prog=False, collapse=False) as_dict = manager.refs_as_dict(manager[15]) self.assertEqual({1: 'c', 'b': 'c'}, as_dict) manager.compute_parents() manager.collapse_instance_dicts() self.assertEqual({1: 'c', 'b': 'c'}, manager.refs_as_dict(manager[14])) self.assertEqual({'a': 1, 'c': manager[7], 'b': 'string', 'd': manager[12]}, manager.refs_as_dict(manager[1])) def test_expand_refs_as_list(self): # TODO: This test fails if simplejson is not installed, because the # regex extractor does not cast to integers (they stay as # strings). We could fix the test, or fix the extractor. manager = loader.load(_instance_dump, show_prog=False) self.assertEqual([2], manager.refs_as_list(manager[12])) def test_guess_intern_dict(self): manager = loader.load(_intern_dict_dump, show_prog=False) obj = manager.guess_intern_dict() self.assertEqual(8, obj.address) def test_summarize_refs(self): manager = loader.load(_example_dump, show_prog=False) summary = manager.summarize(manager[8]) # Note that the module is included in the summary self.assertEqual(['int', 'module', 'str', 'tuple'], sorted(summary.type_summaries.keys())) self.assertEqual(257, summary.total_size) def test_summarize_excluding(self): manager = loader.load(_example_dump, show_prog=False) summary = manager.summarize(manager[8], excluding=[4, 5]) # No ints when they are explicitly filtered self.assertEqual(['module', 'str', 'tuple'], sorted(summary.type_summaries.keys())) self.assertEqual(233, summary.total_size) meliae-0.4.0/meliae/tests/test_perf_counter.py0000644000000000000000000000667311605576210021426 0ustar rootroot00000000000000# Copyright (C) 2009, 2010 Canonical Ltd # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License version 3 as # published by the Free Software Foundation. # # This program is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . import subprocess import sys from meliae import ( perf_counter, tests, ) class _FakeTimer(object): def __init__(self): self._current = 0.0 def __call__(self): self._current += 0.5 return self._current class Test_Counter(tests.TestCase): def test_tick_tock(self): counter = perf_counter._Counter('test', _FakeTimer()) counter.tick() counter.tock() self.assertEqual(1, counter.count) self.assertEqual(0.5, counter.time_spent) self.assertEqual(0.0, counter.time_stddev()) counter.tock() self.assertEqual(2, counter.count) self.assertEqual(0.5, counter.time_spent) class TestPerformanceCounter(tests.TestCase): def setUp(self): super(TestPerformanceCounter, self).setUp() perf_counter.perf_counter.reset() def tearDown(self): perf_counter.perf_counter.reset() super(TestPerformanceCounter, self).tearDown() def test_perf_counter_is_not_none(self): self.assertNotEqual(None, perf_counter.perf_counter) def test_create_counter(self): counter = perf_counter.perf_counter.get_counter('test-counter') self.assertEqual('test-counter', counter.name) self.assertEqual(counter._timer, perf_counter.perf_counter.get_timer()) self.assertTrue('test-counter' in perf_counter.perf_counter._counters) def test_get_counter(self): counter = perf_counter.perf_counter.get_counter('test-counter') counter2 = perf_counter.perf_counter.get_counter('test-counter') self.assertTrue(counter is counter2) def test_get_memory(self): # we don't have a great way to actually measure that the peak-memory # value is accurate, but we can at least try # Create a very large string, and then delete it. p = subprocess.Popen([sys.executable, '-c', 'x = "abcd"*(10*1000*1000); del x;' 'import sys;' 'sys.stdout.write(sys.stdin.read(3));' 'sys.stdout.flush();' 'sys.stdout.write(sys.stdin.read(4));' 'sys.stdout.flush(); sys.stdout.close()'], stdin=subprocess.PIPE, stdout=subprocess.PIPE) p.stdin.write('pre') p.stdin.flush() p.stdout.read(3) cur_mem, peak_mem = perf_counter.perf_counter.get_memory(p) if cur_mem is None or peak_mem is None: # fail gracefully, though we may want a stronger assertion here return self.assertTrue(isinstance(cur_mem, (int, long))) self.assertTrue(isinstance(peak_mem, (int, long))) p.stdin.write('post') p.stdin.flush() p.stdout.read() self.assertEqual(0, p.wait()) # We allocated a 40MB string, we should have peaked at at least 20MB more # than we are using now. self.assertTrue(peak_mem > cur_mem + 2*1000*1000) meliae-0.4.0/meliae/tests/test_scanner.py0000644000000000000000000001134111605576210020350 0ustar rootroot00000000000000# Copyright (C) 2009 Canonical Ltd # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License version 3 as # published by the Free Software Foundation. # # This program is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . """The core routines for scanning python references and dumping memory info.""" import tempfile from meliae import ( scanner, tests, ) from meliae.tests import test__scanner class TestDumpAllReferenced(tests.TestCase): def assertDumpAllReferenced(self, ref_objs, obj, is_pending=False): t = tempfile.TemporaryFile(prefix='meliae-') # On some platforms TemporaryFile returns a wrapper object with 'file' # being the real object, on others, the returned object *is* the real # file object t_file = getattr(t, 'file', t) scanner.dump_all_referenced(t_file, obj, is_pending=is_pending) t.flush() t.seek(0) # We don't care if the same entries are printed multiple times, just # that they are all correct lines = t.readlines() # py_dump_object_info will create a string that covers multpile lines, # so we need to split it back into 1-line-per-record ref_lines = [test__scanner.py_dump_object_info(ref_obj) for ref_obj in ref_objs] ref_lines = set(''.join(ref_lines).splitlines(True)) self.assertEqual(sorted(ref_lines), sorted(lines)) def test_dump_str(self): s = 'a test string' self.assertDumpAllReferenced([s], s) def test_dump_obj(self): o = object() self.assertDumpAllReferenced([o], o) def test_dump_simple_tuple(self): k = 10245 v = 'a value string' t = (k, v) self.assertDumpAllReferenced([k, v, t], t) def test_dump_list_of_tuple(self): k = 10245 v = 'a value string' t = (k, v) l = [k, v, t] self.assertDumpAllReferenced([k, v, l, t], l) def test_dump_list_of_tuple_is_pending(self): k = 10245 v = 'a value string' t = (v,) l = [t, k] self.assertDumpAllReferenced([k, t, v], l, is_pending=True) def test_dump_recursive(self): a = 1 b = 'str' c = {} l = [a, b, c] c[a] = l # We have a reference cycle here, but we should not loop endlessly :) self.assertDumpAllReferenced([a, b, c, l], l) self.assertDumpAllReferenced([a, b, c, l], c) class TestGetRecursiveSize(tests.TestCase): def assertRecursiveSize(self, n_objects, total_size, obj): self.assertEqual((n_objects, total_size), scanner.get_recursive_size(obj)) def test_single_object(self): i = 1 self.assertRecursiveSize(1, scanner.size_of(i), i) d = {} self.assertRecursiveSize(1, scanner.size_of(d), d) l = [] self.assertRecursiveSize(1, scanner.size_of(l), l) def test_referenced(self): s1 = 'this is a simple string' s2 = 'and this is another one' s3 = s1 + s2 s4 = 'this is a' + ' simple string'# same as s1, but diff object self.assertTrue(s1 is not s4) self.assertTrue(s1 == s4) d = {s1:s2, s3:s4} total_size = (scanner.size_of(s1) + scanner.size_of(s2) + scanner.size_of(s3) + scanner.size_of(s4) + scanner.size_of(d)) self.assertRecursiveSize(5, total_size, d) def test_recursive(self): s1 = 'this is a simple string' s2 = 'and this is another one' l = [s1, s2] l.append(l) total_size = (scanner.size_of(s1) + scanner.size_of(s2) + scanner.size_of(l)) self.assertRecursiveSize(3, total_size, l) class TestGetRecursiveItems(tests.TestCase): def assertRecursiveItems(self, expected, root): actual = scanner.get_recursive_items(root) self.assertEqual(len(expected), len(actual)) by_id = dict((id(x), x) for x in actual) for obj in expected: id_obj = id(obj) self.assertTrue(id_obj in by_id) self.assertTrue(by_id[id_obj] is obj) def test_single(self): a = 1 self.assertRecursiveItems([a], a) def test_dict(self): a = 1 b = 2 c = 'three' d = 'four' dd = {a:b, c:d} self.assertRecursiveItems([a, b, c, d, dd], dd)