pax_global_header00006660000000000000000000000064121745074630014523gustar00rootroot0000000000000052 comment=40fc9389c4e18510ba2a19c8bd8502fdbdd7e165 joblib-0.7.1/000077500000000000000000000000001217450746300127715ustar00rootroot00000000000000joblib-0.7.1/.bzrignore000066400000000000000000000000621217450746300147710ustar00rootroot00000000000000build .coverage **/.coverage dist joblib.egg-info joblib-0.7.1/.gitignore000066400000000000000000000006311217450746300147610ustar00rootroot00000000000000*.py[oc] *.so # setup.py working directory build # setup.py dist directory dist # Editor temporary/working/backup files *$ .*.sw[nop] .sw[nop] *~ [#]*# .#* *.bak *.tmp *.tgz *.rej *.org .project *.diff .settings/ *.svn/ # Egg metadata *.egg-info # The shelf plugin uses this dir ./.shelf # Mac droppings .DS_Store doc/documentation.zip doc/generated doc/CHANGES.rst doc/README.rst # Coverage report .coverage joblib-0.7.1/.mailmap000066400000000000000000000007231217450746300144140ustar00rootroot00000000000000Gael Varoquaux Gael Varoquaux Gael Varoquaux Gael varoquaux Gael Varoquaux GaelVaroquaux Gael Varoquaux Gael VAROQUAUX Gael Varoquaux gvaroquaux joblib-0.7.1/.travis.yml000066400000000000000000000003051217450746300151000ustar00rootroot00000000000000language: python python: - "2.6" - "2.7" - "3.2" - "3.3" before_install: - pip --quiet install --use-mirrors numpy install: - python setup.py install script: - make joblib-0.7.1/CHANGES.rst000066400000000000000000000155171217450746300146040ustar00rootroot00000000000000Latest changes =============== Release 0.7.1 --------------- 2013-07-25 Gael Varoquaux MISC: capture meaningless argument (n_jobs=0) in Parallel 2013-07-09 Lars Buitinck ENH Handles tuples, sets and Python 3's dict_keys type the same as lists. in pre_dispatch 2013-05-23 Martin Luessi ENH: fix function caching for IPython Release 0.7.0 --------------- **This release drops support for Python 2.5 in favor of support for Python 3.0** 2013-02-13 Gael Varoquaux BUG: fix nasty hash collisions 2012-11-19 Gael Varoquaux ENH: Parallel: Turn of pre-dispatch for already expanded lists Gael Varoquaux 2012-11-19 ENH: detect recursive sub-process spawning, as when people do not protect the __main__ in scripts under Windows, and raise a useful error. Gael Varoquaux 2012-11-16 ENH: Full python 3 support Release 0.6.5 --------------- 2012-09-15 Yannick Schwartz BUG: make sure that sets and dictionnaries give reproducible hashes 2012-07-18 Marek Rudnicki BUG: make sure that object-dtype numpy array hash correctly 2012-07-12 GaelVaroquaux BUG: Bad default n_jobs for Parallel Release 0.6.4 --------------- 2012-05-07 Vlad Niculae ENH: controlled randomness in tests and doctest fix 2012-02-21 GaelVaroquaux ENH: add verbosity in memory 2012-02-21 GaelVaroquaux BUG: non-reproducible hashing: order of kwargs The ordering of a dictionnary is random. As a result the function hashing was not reproducible. 
Pretty hard to test Release 0.6.3 --------------- 2012-02-14 GaelVaroquaux BUG: fix joblib Memory pickling 2012-02-11 GaelVaroquaux BUG: fix hasher with Python 3 2012-02-09 GaelVaroquaux API: filter_args: `*args, **kwargs -> args, kwargs` Release 0.6.2 --------------- 2012-02-06 Gael Varoquaux BUG: make sure Memory pickles even if cachedir=None Release 0.6.1 --------------- Bugfix release because of a merge error in release 0.6.0 Release 0.6.0 --------------- **Beta 3** 2012-01-11 Gael Varoquaux BUG: ensure compatibility with old numpy DOC: update installation instructions BUG: file semantic to work under Windows 2012-01-10 Yaroslav Halchenko BUG: a fix toward 2.5 compatibility **Beta 2** 2012-01-07 Gael Varoquaux ENH: hash: bugware to be able to hash objects defined interactively in IPython 2012-01-07 Gael Varoquaux ENH: Parallel: warn and not fail for nested loops ENH: Parallel: n_jobs=-2 now uses all CPUs but one 2012-01-01 Juan Manuel Caicedo Carvajal and Gael Varoquaux ENH: add verbosity levels in Parallel Release 0.5.7 --------------- 2011-12-28 Gael varoquaux API: zipped -> compress 2011-12-26 Gael varoquaux ENH: Add a zipped option to Memory API: Memory no longer accepts save_npy 2011-12-22 Kenneth C. Arnold and Gael varoquaux BUG: fix numpy_pickle for array subclasses 2011-12-21 Gael varoquaux ENH: add zip-based pickling 2011-12-19 Fabian Pedregosa Py3k: compatibility fixes. This makes run fine the tests test_disk and test_parallel Release 0.5.6 --------------- 2011-12-11 Lars Buitinck ENH: Replace os.path.exists before makedirs with exception check New disk.mkdirp will fail with other errnos than EEXIST. 2011-12-10 Bala Subrahmanyam Varanasi MISC: pep8 compliant Release 0.5.5 --------------- 2011-19-10 Fabian Pedregosa ENH: Make joblib installable under Python 3.X Release 0.5.4 --------------- 2011-09-29 Jon Olav Vik BUG: Make mangling path to filename work on Windows 2011-09-25 Olivier Grisel FIX: doctest heisenfailure on execution time 2011-08-24 Ralf Gommers STY: PEP8 cleanup. Release 0.5.3 --------------- 2011-06-25 Gael varoquaux API: All the usefull symbols in the __init__ Release 0.5.2 --------------- 2011-06-25 Gael varoquaux ENH: Add cpu_count 2011-06-06 Gael varoquaux ENH: Make sure memory hash in a reproducible way Release 0.5.1 --------------- 2011-04-12 Gael varoquaux TEST: Better testing of parallel and pre_dispatch Yaroslav Halchenko 2011-04-12 DOC: quick pass over docs -- trailing spaces/spelling Yaroslav Halchenko 2011-04-11 ENH: JOBLIB_MULTIPROCESSING env var to disable multiprocessing from the environment Alexandre Gramfort 2011-04-08 ENH : adding log message to know how long it takes to load from disk the cache Release 0.5.0 --------------- 2011-04-01 Gael varoquaux BUG: pickling MemoizeFunc does not store timestamp 2011-03-31 Nicolas Pinto TEST: expose hashing bug with cached method 2011-03-26...2011-03-27 Pietro Berkes BUG: fix error management in rm_subdirs BUG: fix for race condition during tests in mem.clear() Gael varoquaux 2011-03-22...2011-03-26 TEST: Improve test coverage and robustness Gael varoquaux 2011-03-19 BUG: hashing functions with only \*var \**kwargs Gael varoquaux 2011-02-01... 2011-03-22 BUG: Many fixes to capture interprocess race condition when mem.cache is used by several processes on the same cache. Fabian Pedregosa 2011-02-28 First work on Py3K compatibility Gael varoquaux 2011-02-27 ENH: pre_dispatch in parallel: lazy generation of jobs in parallel for to avoid drowning memory. 
GaelVaroquaux 2011-02-24 ENH: Add the option of overloading the arguments of the mother 'Memory' object in the cache method that is doing the decoration. Gael varoquaux 2010-11-21 ENH: Add a verbosity level for more verbosity Release 0.4.6 ---------------- Gael varoquaux 2010-11-15 ENH: Deal with interruption in parallel Gael varoquaux 2010-11-13 BUG: Exceptions raised by Parallel when n_job=1 are no longer captured. Gael varoquaux 2010-11-13 BUG: Capture wrong arguments properly (better error message) Release 0.4.5 ---------------- Pietro Berkes 2010-09-04 BUG: Fix Windows peculiarities with path separators and file names BUG: Fix more windows locking bugs Gael varoquaux 2010-09-03 ENH: Make sure that exceptions raised in Parallel also inherit from the original exception class ENH: Add a shadow set of exceptions Fabian Pedregosa 2010-09-01 ENH: Clean up the code for parallel. Thanks to Fabian Pedregosa for the patch. Release 0.4.4 ---------------- Gael varoquaux 2010-08-23 BUG: Fix Parallel on computers with only one CPU, for n_jobs=-1. Gael varoquaux 2010-08-02 BUG: Fix setup.py for extra setuptools args. Gael varoquaux 2010-07-29 MISC: Silence tests (and hopefuly Yaroslav :P) Release 0.4.3 ---------------- Gael Varoquaux 2010-07-22 BUG: Fix hashing for function with a side effect modifying their input argument. Thanks to Pietro Berkes for reporting the bug and proving the patch. Release 0.4.2 ---------------- Gael Varoquaux 2010-07-16 BUG: Make sure that joblib still works with Python2.5. => release 0.4.2 Release 0.4.1 ---------------- joblib-0.7.1/MANIFEST.in000066400000000000000000000002001217450746300145170ustar00rootroot00000000000000include *.txt *.py recursive-include joblib *.rst *.py graft doc graft doc/_static graft doc/_templates global-exclude *~ *.swp joblib-0.7.1/Makefile000066400000000000000000000001501217450746300144250ustar00rootroot00000000000000 all: test test: nosetests test-no-multiprocessing: export JOBLIB_MULTIPROCESSING=0 && nosetests joblib-0.7.1/README.rst000066400000000000000000000105501217450746300144610ustar00rootroot00000000000000The homepage of joblib with user documentation is located on: http://packages.python.org/joblib/ Getting the latest code ========================= To get the latest code using git, simply type:: git clone git://github.com/joblib/joblib.git If you don't have git installed, you can download a zip or tarball of the latest code: http://github.com/joblib/joblib/archives/master Installing ========================= As any Python packages, to install joblib, simply do:: python setup.py install in the source code directory. Joblib has no other mandatory dependency than Python (at least version 2.6). Numpy (at least version 1.3) is an optional dependency for array manipulation. Workflow to contribute ========================= To contribute to joblib, first create an account on `github `_. Once this is done, fork the `joblib repository `_ to have you own repository, clone it using 'git clone' on the computers where you want to work. Make your changes in your clone, push them to your github account, test them on several computer, and when you are happy with them, send a pull request to the main repository. Running the test suite ========================= To run the test suite, you need nosetests and the coverage modules. Run the test suite using:: nosetests from the root of the project. .. 
image:: https://secure.travis-ci.org/joblib/joblib.png :target: https://secure.travis-ci.org/joblib/joblib :alt: Build status :align: right Building the docs ========================= To build the docs you need to have setuptools and sphinx (>=0.5) installed. Run the command:: python setup.py build_sphinx The docs are built in the build/sphinx/html directory. Making a source tarball ========================= To create a source tarball, eg for packaging or distributing, run the following command:: python setup.py sdist The tarball will be created in the `dist` directory. This command will compile the docs, and the resulting tarball can be installed with no extra dependencies than the Python standard library. You will need setuptool and sphinx. Making a release and uploading it to PyPI ================================================== This command is only run by project manager, to make a release, and upload in to PyPI:: python setup.py sdist bdist_egg register upload Updating the changelog ======================== Changes are listed in the CHANGES.rst file. They must be manually updated but, the following git command may be used to generate the lines:: git log --abbrev-commit --date=short --no-merges --sparse Licensing ---------- joblib is **BSD-licenced** (3 clause): This software is OSI Certified Open Source Software. OSI Certified is a certification mark of the Open Source Initiative. Copyright (c) 2009-2011, joblib developpers All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of Gael Varoquaux. nor the names of other joblib contributors may be used to endorse or promote products derived from this software without specific prior written permission. **This software is provided by the copyright holders and contributors "as is" and any express or implied warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose are disclaimed. In no event shall the copyright owner or contributors be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.** joblib-0.7.1/TODO.rst000066400000000000000000000032601217450746300142710ustar00rootroot00000000000000Tasks at hand on joblib, in increasing order of difficulty. * Add a changelog! * In parallel: need to deal with return arguments that don't pickle. * Improve test coverage and documentation * Store a repr of the arguments for each call in the corresponding cachedir * Try to use Mike McKerns's Dill pickling module in Parallel: Implementation idea: * Create a new function that is wrapped and takes Dillo pickles as inputs as output, feed this one to multiprocessing * pickle everything using Dill in the Parallel object. 
  http://dev.danse.us/trac/pathos/browser/dill

* Make a sensible error message when wrong keyword arguments are given,
  currently we have::

    from joblib import Memory
    mem = Memory(cachedir='cache')

    def f(a=0, b=2):
        return a, b

    g = mem.cache(f)
    g(c=2)

    /home/varoquau/dev/joblib/joblib/func_inspect.pyc in filter_args(func, ignore_lst, *args, **kwargs), line 168
    TypeError: Ignore list for diffusion_reorder() contains and unexpected keyword argument 'cachedir'

* add a 'depends' keyword argument to memory.cache, to be able to specify
  that a function depends on other functions, and thus that the cache
  should be cleared.

* add an 'argument_hash' keyword argument to Memory.cache, to be able to
  replace the hashing logic of memory for the input arguments. It should
  accept as an input the dictionary of arguments, as returned in
  func_inspect, and return a string.

* add a sqlite db for provenance tracking. Store computation time and
  usage timestamps, to be able to do 'garbage-collection-like' cleaning
  of unused results, based on a cost function balancing computation cost
  and frequency of use.

joblib-0.7.1/doc/__init__.py

"""
This is a phony __init__.py file, so that nose finds the doctests in this
directory.
"""

joblib-0.7.1/doc/_templates/layout.html

{% extends '!layout.html' %}
{%- if pagename == 'index' %}
{% set title = 'Joblib: running Python functions as pipeline jobs' %}
{%- endif %}
{%- block sidebarsourcelink %}
{% endblock %}
{%- block sidebarsearch %}
{{ super() }}

Mailing list

joblib@librelist.com

Send an email to subscribe

{%- if show_source and has_source and sourcename %}
{{ _('Show this page source') }} {%- endif %} {% endblock %} joblib-0.7.1/doc/conf.py000066400000000000000000000161161217450746300150420ustar00rootroot00000000000000# -*- coding: utf-8 -*- # # joblib documentation build configuration file, created by # sphinx-quickstart on Thu Oct 23 16:36:51 2008. # # This file is execfile()d with the current directory set to its # containing dir. # # The contents of this file are pickled, so don't put values in the # namespace that aren't pickleable (module imports are okay, # they're removed automatically). # # All configuration values have a default; values that are commented out # serve to show the default. import sys import os import joblib # If your extensions are in another directory, add it here. If the directory # is relative to the documentation root, use os.path.abspath to make it # absolute, like shown here. #sys.path.append(os.path.abspath('.')) sys.path.append(os.path.abspath('./sphinxext')) # General configuration # --------------------- # Add any Sphinx extension module names here, as strings. They can be # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom ones. extensions = ['sphinx.ext.autodoc', 'sphinx.ext.pngmath', 'numpydoc', 'phantom_import', 'autosummary', 'sphinx.ext.coverage'] #extensions = ['sphinx.ext.autodoc', 'sphinx.ext.doctest'] # Add any paths that contain templates here, relative to this directory. templates_path = ['_templates'] # The suffix of source filenames. source_suffix = '.rst' # The encoding of source files. #source_encoding = 'utf-8' # The master toctree document. master_doc = 'index' # General information about the project. project = 'joblib' copyright = '2008-2009, Gael Varoquaux' # The version info for the project you're documenting, acts as replacement for # |version| and |release|, also used in various other places throughout the # built documents. # # The short X.Y version. version = joblib.__version__ # The full version, including alpha/beta/rc tags. release = version # The language for content autogenerated by Sphinx. Refer to documentation # for a list of supported languages. #language = None # There are two options for replacing |today|: either, you set today to some # non-false value, then it is used: #today = '' # Else, today_fmt is used as the format for a strftime call. #today_fmt = '%B %d, %Y' # List of documents that shouldn't be included in the build. #unused_docs = [] # List of directories, relative to source directory, that shouldn't be searched # for source files. exclude_trees = [] # The reST default role (used for this markup: `text`) to use for all # documents. #default_role = None # If true, '()' will be appended to :func: etc. cross-reference text. #add_function_parentheses = True # If true, the current module name will be prepended to all description # unit titles (such as .. function::). #add_module_names = True # If true, sectionauthor and moduleauthor directives will be shown in the # output. They are ignored by default. #show_authors = False # The name of the Pygments (syntax highlighting) style to use. pygments_style = 'sphinx' # Avoid '+DOCTEST...' comments in the docs trim_doctest_flags = True # Options for HTML output # ----------------------- # The style sheet to use for HTML and HTML Help pages. A file of that name # must exist either in Sphinx' static/ path, or in one of the custom paths # given in html_static_path. html_style = 'default.css' # The name for this set of Sphinx documents. If None, it defaults to # " v documentation". 
#html_title = None # A shorter title for the navigation bar. Default is the same as html_title. #html_short_title = None # The name of an image file (relative to this directory) to place at the top # of the sidebar. #html_logo = None # The name of an image file (within the static path) to use as favicon of the # docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32 # pixels large. #html_favicon = None # Add any paths that contain custom static files (such as style sheets) here, # relative to this directory. They are copied after the builtin static files, # so a file named "default.css" will overwrite the builtin "default.css". html_static_path = ['_static'] # If not '', a 'Last updated on:' timestamp is inserted at every page bottom, # using the given strftime format. #html_last_updated_fmt = '%b %d, %Y' # If true, SmartyPants will be used to convert quotes and dashes to # typographically correct entities. #html_use_smartypants = True # Custom sidebar templates, maps document names to template names. #html_sidebars = {} # Additional templates that should be rendered to pages, maps page names to # template names. #html_additional_pages = {} # If false, no module index is generated. #html_use_modindex = True # If false, no index is generated. #html_use_index = True # If true, the index is split into individual pages for each letter. #html_split_index = False # If true, the reST sources are included in the HTML build as _sources/. #html_copy_source = True # If true, an OpenSearch description file will be output, and all pages will # contain a tag referring to it. The value of this option must be the # base URL from which the finished HTML is served. #html_use_opensearch = '' # If nonempty, this is the file name suffix for HTML files (e.g. ".xhtml"). #html_file_suffix = '' # Output file base name for HTML help builder. htmlhelp_basename = 'joblibdoc' # Options for LaTeX output # ------------------------ # The paper size ('letter' or 'a4'). #latex_paper_size = 'letter' # The font size ('10pt', '11pt' or '12pt'). #latex_font_size = '10pt' # Grouping the document tree into LaTeX files. List of tuples # (source start file, target name, title, author, # document class [howto/manual]). latex_documents = [ ('index', 'joblib.tex', 'joblib Documentation', 'Gael Varoquaux', 'manual'), ] # The name of an image file (relative to this directory) to place at the top of # the title page. #latex_logo = None # For "manual" documents, if this is true, then toplevel headings are parts, # not chapters. #latex_use_parts = False # Additional stuff for the LaTeX preamble. #latex_preamble = '' # Documents to append as an appendix to all manuals. #latex_appendices = [] # If false, no module index is generated. 
#latex_use_modindex = True html_theme_options = { # "bgcolor": "#fff", # "footertextcolor": "#666", "relbarbgcolor": "#333", # "relbarlinkcolor": "#445481", # "relbartextcolor": "#445481", "sidebarlinkcolor": "#e15617", "sidebarbgcolor": "#000", # "sidebartextcolor": "#333", "footerbgcolor": "#111", "linkcolor": "#aa560c", # "bodyfont": '"Lucida Grande",Verdana,Lucida,Helvetica,Arial,sans-serif', # "headfont": "georgia, 'bitstream vera sans serif', 'lucida grande', # helvetica, verdana, sans-serif", # "headbgcolor": "#F5F5F5", "headtextcolor": "#643200", "codebgcolor": "#f5efe7", } ############################################################################## # Hack to copy the CHANGES.rst file import shutil try: shutil.copyfile('../CHANGES.rst', 'CHANGES.rst') shutil.copyfile('../README.rst', 'README.rst') except IOError: pass # This fails during the tesing, as the code is ran in a different # directory joblib-0.7.1/doc/developing.rst000066400000000000000000000001411217450746300164200ustar00rootroot00000000000000 =============== Development =============== .. include:: README.rst .. include:: CHANGES.rst joblib-0.7.1/doc/index.rst000066400000000000000000000015401217450746300153770ustar00rootroot00000000000000.. raw:: html .. raw:: html

Joblib: running Python functions as pipeline jobs

Introduction
------------

.. automodule:: joblib

User manual
--------------

 .. toctree::
    :maxdepth: 2

    why.rst
    installing.rst
    memory.rst
    parallel.rst
    developing.rst

Module reference
-----------------

.. autosummary::
   :toctree: generated

   Memory
   Parallel
   dump
   load
   hash

joblib-0.7.1/doc/installing.rst

Installing joblib
===================

The `easy_install` way
-----------------------

For the easiest way to install joblib you need to have `setuptools`
installed.

* For installing for all users, you need to run::

    easy_install joblib

  You may need to run the above command as administrator.

  On a Unix environment, it is better to install outside of the hierarchy
  managed by the system::

    easy_install --prefix /usr/local joblib

* Installing only for a specific user is easy if you use Python 2.6 or
  above::

    easy_install --user joblib

.. warning::

   Packages installed via `easy_install` override the Python module
   look-up mechanism and thus can confuse people not familiar with
   setuptools. Although it may seem harder, we suggest that you use the
   manual way, as described in the following paragraph.

Using distributions
--------------------

Joblib is packaged for several Linux distributions: archlinux, debian,
ubuntu, and altlinux. For minimum administration overhead, using the
package manager is the recommended installation strategy on these
systems.

The manual way
---------------

To install joblib, first download the latest tarball (follow the link on
the bottom of http://pypi.python.org/pypi/joblib) and expand it.

Installing in a local environment
..................................

If you don't need to install for all users, we strongly suggest that you
create a local environment and install `joblib` in it. One of the pros of
this method is that you never have to become administrator, and thus all
the changes are local to your account and easy to clean up.

* **If you are under Python 2.6 or above**

  Simply go to the directory created by expanding the `joblib` tarball
  and run the following command::

    python setup.py install --user

* **If you are under Python 2.5**

  #. First, create the following directory (where `~` is your home
     directory, or any directory that you want to use as a base for your
     local Python environment, and `X` is your Python version number,
     e.g. `2.6`)::

        ~/usr/lib/pythonX/site-packages

  #. Second, make sure that you add this directory to your environment
     variable `PYTHONPATH`. Under Windows you can do this by editing your
     environment variables in the system parameters dialog. Under Unix
     you can add the following line to your `.bashrc` or any file sourced
     at login::

        export PYTHONPATH=$HOME/usr/lib/python2.6/site-packages:$PYTHONPATH

  #. In the directory created by expanding the `joblib` tarball, run the
     following command::

        python setup.py install --prefix ~/usr

     You should not be required to become administrator, if you have
     write access to the directory you are installing to.

Installing for all users
........................

If you have administrator rights and want to install for all users, all
you need to do is to go to the directory created by expanding the
`joblib` tarball and run the following line::

    python setup.py install

If you are under Unix, we suggest that you install in '/usr/local' in
order not to interfere with your system::

    python setup.py install --prefix /usr/local

joblib-0.7.1/doc/memory.rst

..
For doctests: >>> from joblib.testing import warnings_to_stdout >>> warnings_to_stdout() .. _memory: =========================================== On demand recomputing: the `Memory` class =========================================== .. currentmodule:: joblib.memory Usecase -------- The `Memory` class defines a context for lazy evaluation of function, by storing the results to the disk, and not rerunning the function twice for the same arguments. .. Commented out in favor of briefness You can use it as a context, with its `eval` method: .. automethod:: Memory.eval or decorate functions with the `cache` method: .. automethod:: Memory.cache It works by explicitly saving the output to a file and it is designed to work with non-hashable and potentially large input and output data types such as numpy arrays. A simple example: ~~~~~~~~~~~~~~~~~ First we create a temporary directory, for the cache:: >>> from tempfile import mkdtemp >>> cachedir = mkdtemp() We can instantiate a memory context, using this cache directory:: >>> from joblib import Memory >>> memory = Memory(cachedir=cachedir, verbose=0) Then we can decorate a function to be cached in this context:: >>> @memory.cache ... def f(x): ... print('Running f(%s)' % x) ... return x When we call this function twice with the same argument, it does not get executed the second time, and the output gets loaded from the pickle file:: >>> print(f(1)) Running f(1) 1 >>> print(f(1)) 1 However, when we call it a third time, with a different argument, the output gets recomputed:: >>> print(f(2)) Running f(2) 2 Comparison with `memoize` ~~~~~~~~~~~~~~~~~~~~~~~~~ The `memoize` decorator (http://code.activestate.com/recipes/52201/) caches in memory all the inputs and outputs of a function call. It can thus avoid running twice the same function, but with a very small overhead. However, it compares input objects with those in cache on each call. As a result, for big objects there is a huge overhead. More over this approach does not work with numpy arrays, or other objects subject to non-significant fluctuations. Finally, using `memoize` with large object will consume all the memory, where with `Memory`, objects are persisted to the disk, using a persister optimized for speed and memory usage (:func:`joblib.dump`). In short, `memoize` is best suited for functions with "small" input and output objects, whereas `Memory` is best suited for functions with complex input and output objects, and aggressive persistence to the disk. Using with `numpy` ------------------- The original motivation behind the `Memory` context was to be able to a memoize-like pattern on numpy arrays. `Memory` uses fast cryptographic hashing of the input arguments to check if they have been computed; An example ~~~~~~~~~~~ We define two functions, the first with a number as an argument, outputting an array, used by the second one. We decorate both functions with `Memory.cache`:: >>> import numpy as np >>> @memory.cache ... def g(x): ... print('A long-running calculation, with parameter %s' % x) ... return np.hamming(x) >>> @memory.cache ... def h(x): ... print('A second long-running calculation, using g(x)') ... return np.vander(x) If we call the function h with the array created by the same call to g, h is not re-run:: >>> a = g(3) A long-running calculation, with parameter 3 >>> a array([ 0.08, 1. , 0.08]) >>> g(3) array([ 0.08, 1. , 0.08]) >>> b = h(a) A second long-running calculation, using g(x) >>> b2 = h(a) >>> b2 array([[ 0.0064, 0.08 , 1. ], [ 1. , 1. , 1. ], [ 0.0064, 0.08 , 1. 
]]) >>> np.allclose(b, b2) True Using memmapping ~~~~~~~~~~~~~~~~ To speed up cache looking of large numpy arrays, you can load them using memmapping (memory mapping):: >>> cachedir2 = mkdtemp() >>> memory2 = Memory(cachedir=cachedir2, mmap_mode='r') >>> square = memory2.cache(np.square) >>> a = np.vander(np.arange(3)).astype(np.float) >>> square(a) ________________________________________________________________________________ [Memory] Calling square... square(array([[ 0., 0., 1.], [ 1., 1., 1.], [ 4., 2., 1.]])) ___________________________________________________________square - 0.0s, 0.0min array([[ 0., 0., 1.], [ 1., 1., 1.], [ 16., 4., 1.]]) .. note:: Notice the debug mode used in the above example. It is useful for tracing of what is being reexecuted, and where the time is spent. If the `square` function is called with the same input argument, its return value is loaded from the disk using memmapping:: >>> res = square(a) >>> print(repr(res)) memmap([[ 0., 0., 1.], [ 1., 1., 1.], [ 16., 4., 1.]]) .. We need to close the memmap file to avoid file locking on Windows; closing numpy.memmap objects is done with del, which flushes changes to the disk >>> del res .. note:: If the memory mapping mode used was 'r', as in the above example, the array will be read only, and will be impossible to modified in place. On the other hand, using 'r+' or 'w+' will enable modification of the array, but will propagate these modification to the disk, which will corrupt the cache. If you want modification of the array in memory, we suggest you use the 'c' mode: copy on write. .. warning:: Because in the first run the array is a plain ndarray, and in the second run the array is a memmap, you can have side effects of using the `Memory`, especially when using `mmap_mode='r'` as the array is writable in the first run, and not the second. Gotchas -------- * **Function cache is identified by the function's name**. Thus if you have the same name to different functions, their cache will override each-others (you have 'name collisions'), and you will get unwanted re-run:: >>> @memory.cache ... def func(x): ... print('Running func(%s)' % x) >>> func2 = func >>> @memory.cache ... def func(x): ... print('Running a different func(%s)' % x) >>> func(1) Running a different func(1) >>> func2(1) memory.rst:0: JobLibCollisionWarning: Possible name collisions between functions 'func' (:30) and 'func' (:28) Running func(1) >>> func(1) memory.rst:0: JobLibCollisionWarning: Possible name collisions between functions 'func' (:28) and 'func' (:30) Running a different func(1) >>> func2(1) Running func(1) Beware that with Python 2.6 lambda functions cannot be separated out:: >>> def my_print(x): ... print(x) >>> f = memory.cache(lambda : my_print(1)) >>> g = memory.cache(lambda : my_print(2)) >>> f() 1 >>> f() >>> g() # doctest: +SKIP memory.rst:0: JobLibCollisionWarning: Cannot detect name collisions for function '' 2 >>> g() # doctest: +SKIP >>> f() # doctest: +SKIP 1 * **memory cannot be used on some complex objects**, e.g. a callable object with a `__call__` method. However, it works on numpy ufuncs:: >>> sin = memory.cache(np.sin) >>> print(sin(0)) 0.0 * **caching methods**: you cannot decorate a method at class definition, because when the class is instantiated, the first argument (self) is *bound*, and no longer accessible to the `Memory` object. 
The following code won't work:: class Foo(object): @mem.cache # WRONG def method(self, args): pass The right way to do this is to decorate at instantiation time:: class Foo(object): def __init__(self, args): self.method = mem.cache(self.method) def method(self, ...): pass Ignoring some arguments ------------------------ It may be useful not to recalculate a function when certain arguments change, for instance a debug flag. `Memory` provides the `ignore` list:: >>> @memory.cache(ignore=['debug']) ... def my_func(x, debug=True): ... print('Called with x = %s' % x) >>> my_func(0) Called with x = 0 >>> my_func(0, debug=False) >>> my_func(0, debug=True) >>> # my_func was not reevaluated .. _memory_reference: Reference documentation of the `Memory` class ---------------------------------------------- .. autoclass:: Memory :members: __init__, cache, eval, clear Useful methods of decorated functions -------------------------------------- Function decorated by :meth:`Memory.cache` are :class:`MemorizedFunc` objects that, in addition of behaving like normal functions, expose methods useful for cache exploration and management. .. autoclass:: MemorizedFunc :members: __init__, call, clear, format_signature, format_call, get_output_dir, load_output .. Let us not forget to clean our cache dir once we are finished:: >>> import shutil >>> shutil.rmtree(cachedir) >>> import shutil >>> shutil.rmtree(cachedir2) And we check that it has indeed been remove:: >>> import os ; os.path.exists(cachedir) False >>> os.path.exists(cachedir2) False joblib-0.7.1/doc/parallel.rst000066400000000000000000000032731217450746300160710ustar00rootroot00000000000000 ===================================================== Embarrassingly parallel for loops ===================================================== Joblib provides a simple helper class to write parallel for loops using multiprocessing. The core idea is to write the code to be executed as a generator expression, and convert it to parallel computing:: >>> from math import sqrt >>> [sqrt(i**2) for i in range(10)] [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0] can be spread over 2 CPUs using the following:: >>> from math import sqrt >>> from joblib import Parallel, delayed >>> Parallel(n_jobs=2)(delayed(sqrt)(i**2) for i in range(10)) [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0] Under the hood, the :class:`Parallel` object create a multiprocessing `pool` that forks the Python interpreter in multiple processes to execute each of the items of the list. The `delayed` function is a simple trick to be able to create a tuple `(function, args, kwargs)` with a function-call syntax. .. warning:: Under Windows, it is important to protect the main loop of code to avoid recursive spawning of subprocesses when using joblib.Parallel. In other words, you should be writing code like this: .. code-block:: python import .... def function1(...): ... def function2(...): ... ... if __name__ == '__main__': # do stuff with imports and functions defined about ... **No** code should *run* outside of the "if __name__ == '__main__'" blocks, only imports and definitions. `Parallel` reference documentation =================================== .. 
autoclass:: joblib.Parallel :members: auto joblib-0.7.1/doc/sphinxext/000077500000000000000000000000001217450746300155705ustar00rootroot00000000000000joblib-0.7.1/doc/sphinxext/LICENSE.txt000066400000000000000000000030271217450746300174150ustar00rootroot00000000000000------------------------------------------------------------------------------- The files - numpydoc.py - autosummary.py - autosummary_generate.py - docscrape.py - docscrape_sphinx.py - phantom_import.py have the following license: Copyright (C) 2008 Stefan van der Walt , Pauli Virtanen Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. joblib-0.7.1/doc/sphinxext/__init__.py000066400000000000000000000000001217450746300176670ustar00rootroot00000000000000joblib-0.7.1/doc/sphinxext/autosummary.py000066400000000000000000000256541217450746300205440ustar00rootroot00000000000000""" =========== autosummary =========== Sphinx extension that adds an autosummary:: directive, which can be used to generate function/method/attribute/etc. summary lists, similar to those output eg. by Epydoc and other API doc generation tools. An :autolink: role is also provided. autosummary directive --------------------- The autosummary directive has the form:: .. autosummary:: :nosignatures: :toctree: generated/ module.function_1 module.function_2 ... and it generates an output table (containing signatures, optionally) ======================== ============================================= module.function_1(args) Summary line from the docstring of function_1 module.function_2(args) Summary line from the docstring ... ======================== ============================================= If the :toctree: option is specified, files matching the function names are inserted to the toctree with the given prefix: generated/module.function_1 generated/module.function_2 ... Note: The file names contain the module:: or currentmodule:: prefixes. .. seealso:: autosummary_generate.py autolink role ------------- The autolink role functions as ``:obj:`` when the name referred can be resolved to a Python object, and otherwise it becomes simple emphasis. This can be used as the default role to make links 'smart'. 
""" import sys import posixpath import re try: from docutils import nodes except ImportError: # This won't work, but we have to have the module importable, # so that nose can do its discovery scan, so we stub docutils class nodes(object): comment = object if sys.version_info[0] == 2: from docscrape_sphinx import get_doc_object else: from .docscrape_sphinx import get_doc_object def setup(app): from docutils.parsers.rst import directives app.add_directive('autosummary', autosummary_directive, True, (0, 0, False), toctree=directives.unchanged, nosignatures=directives.flag) app.add_role('autolink', autolink_role) app.add_node(autosummary_toc, html=(autosummary_toc_visit_html, autosummary_toc_depart_noop), latex=(autosummary_toc_visit_latex, autosummary_toc_depart_noop)) app.connect('doctree-read', process_autosummary_toc) #------------------------------------------------------------------------------ # autosummary_toc node #------------------------------------------------------------------------------ class autosummary_toc(nodes.comment): pass def process_autosummary_toc(app, doctree): """ Insert items described in autosummary:: to the TOC tree, but do not generate the toctree:: list. """ import sphinx.addnodes env = app.builder.env crawled = {} def crawl_toc(node, depth=1): crawled[node] = True for j, subnode in enumerate(node): try: if (isinstance(subnode, autosummary_toc) and isinstance(subnode[0], sphinx.addnodes.toctree)): env.note_toctree(env.docname, subnode[0]) continue except IndexError: continue if not isinstance(subnode, nodes.section): continue if subnode not in crawled: crawl_toc(subnode, depth + 1) crawl_toc(doctree) def autosummary_toc_visit_html(self, node): """Hide autosummary toctree list in HTML output""" raise nodes.SkipNode def autosummary_toc_visit_latex(self, node): """Show autosummary toctree (= put the referenced pages here) in Latex""" pass def autosummary_toc_depart_noop(self, node): pass #------------------------------------------------------------------------------ # .. autosummary:: #------------------------------------------------------------------------------ def autosummary_directive(dirname, arguments, options, content, lineno, content_offset, block_text, state, state_machine): """ Pretty table containing short signatures and summaries of functions etc. autosummary also generates a (hidden) toctree:: node. """ import sphinx.addnodes names = [] names += [x.strip() for x in content if x.strip()] table, warnings, real_names = get_autosummary(names, state, 'nosignatures' in options) node = table env = state.document.settings.env suffix = env.config.source_suffix all_docnames = env.found_docs.copy() dirname = posixpath.dirname(env.docname) if 'toctree' in options: tree_prefix = options['toctree'].strip() docnames = [] for name in names: name = real_names.get(name, name) docname = tree_prefix + name if docname.endswith(suffix): docname = docname[:-len(suffix)] docname = posixpath.normpath(posixpath.join(dirname, docname)) if docname not in env.found_docs: warnings.append(state.document.reporter.warning( 'toctree references unknown document %r' % docname, line=lineno)) docnames.append(docname) tocnode = sphinx.addnodes.toctree() tocnode['includefiles'] = docnames tocnode['maxdepth'] = -1 tocnode['glob'] = None tocnode['entries'] = [] tocnode = autosummary_toc('', '', tocnode) return warnings + [node] + [tocnode] else: return warnings + [node] def get_autosummary(names, state, no_signatures=False): """ Generate a proper table node for autosummary:: directive. 
Parameters ---------- names : list of str Names of Python objects to be imported and added to the table. document : document Docutils document object """ from docutils.statemachine import ViewList document = state.document real_names = {} warnings = [] prefixes = [''] prefixes.insert(0, document.settings.env.currmodule) table = nodes.table('') group = nodes.tgroup('', cols=2) table.append(group) group.append(nodes.colspec('', colwidth=30)) group.append(nodes.colspec('', colwidth=70)) body = nodes.tbody('') group.append(body) def append_row(*column_texts): row = nodes.row('') for text in column_texts: node = nodes.paragraph('') vl = ViewList() vl.append(text, '') state.nested_parse(vl, 0, node) row.append(nodes.entry('', node)) body.append(row) for name in names: try: obj, real_name = import_by_name(name, prefixes=prefixes) except ImportError: warnings.append(document.reporter.warning( 'failed to import %s' % name)) append_row(":obj:`%s`" % name, "") continue real_names[name] = real_name doc = get_doc_object(obj) if doc['Summary']: title = " ".join(doc['Summary']) else: title = "" col1 = ":obj:`%s <%s>`" % (name, real_name) if doc['Signature']: sig = re.sub('^[a-zA-Z_0-9.-]*', '', doc['Signature']) if '=' in sig: # abbreviate optional arguments sig = re.sub(r', ([a-zA-Z0-9_]+)=', r'[, \1=', sig, count=1) sig = re.sub(r'\(([a-zA-Z0-9_]+)=', r'([\1=', sig, count=1) sig = re.sub(r'=[^,)]+,', ',', sig) sig = re.sub(r'=[^,)]+\)$', '])', sig) # shorten long strings sig = re.sub(r'(\[.{16,16}[^,)]*?),.*?\]\)', r'\1, ...])', sig) else: sig = re.sub(r'(\(.{16,16}[^,)]*?),.*?\)', r'\1, ...)', sig) col1 += " " + sig col2 = title append_row(col1, col2) return table, warnings, real_names def import_by_name(name, prefixes=[None]): """ Import a Python object that has the given name, under one of the prefixes. Parameters ---------- name : str Name of a Python object, eg. 'numpy.ndarray.view' prefixes : list of (str or None), optional Prefixes to prepend to the name (None implies no prefix). The first prefixed name that results to successful import is used. Returns ------- obj The imported object name Name of the imported object (useful if `prefixes` was used) """ for prefix in prefixes: try: if prefix: prefixed_name = '.'.join([prefix, name]) else: prefixed_name = name return _import_by_name(prefixed_name), prefixed_name except ImportError: pass raise ImportError def _import_by_name(name): """Import a Python object given its full name""" try: # try first interpret `name` as MODNAME.OBJ name_parts = name.split('.') try: modname = '.'.join(name_parts[:-1]) __import__(modname) return getattr(sys.modules[modname], name_parts[-1]) except (ImportError, IndexError, AttributeError): pass # ... then as MODNAME, MODNAME.OBJ1, MODNAME.OBJ1.OBJ2, ... last_j = 0 modname = None for j in reversed(range(1, len(name_parts) + 1)): last_j = j modname = '.'.join(name_parts[:j]) try: __import__(modname) except ImportError: continue if modname in sys.modules: break if last_j < len(name_parts): obj = sys.modules[modname] for obj_name in name_parts[last_j:]: obj = getattr(obj, obj_name) return obj else: return sys.modules[modname] except (ValueError, ImportError, AttributeError, KeyError) as e: raise ImportError(e) #------------------------------------------------------------------------------ # :autolink: (smart default role) #------------------------------------------------------------------------------ def autolink_role(typ, rawtext, etext, lineno, inliner, options={}, content=[]): """ Smart linking role. 
Expands to ":obj:`text`" if `text` is an object that can be imported; otherwise expands to "*text*". """ import sphinx.roles r = sphinx.roles.xfileref_role('obj', rawtext, etext, lineno, inliner, options, content) pnode = r[0][0] prefixes = [None] #prefixes.insert(0, inliner.document.settings.env.currmodule) try: obj, name = import_by_name(pnode['reftarget'], prefixes) except ImportError: content = pnode[0] r[0][0] = nodes.emphasis(rawtext, content[0].astext(), classes=content['classes']) return r joblib-0.7.1/doc/sphinxext/autosummary_generate.py000077500000000000000000000171121217450746300224070ustar00rootroot00000000000000#!/usr/bin/env python r""" autosummary_generate.py OPTIONS FILES Generate automatic RST source files for items referred to in autosummary:: directives. Each generated RST file contains a single auto*:: directive which extracts the docstring of the referred item. Example Makefile rule:: generate: ./ext/autosummary_generate.py -o source/generated source/*.rst """ import re import inspect import os import optparse import pydoc import sys if sys.version_info[0] == 2: from autosummary import import_by_name else: from .autosummary import import_by_name try: from phantom_import import import_phantom_module except ImportError: import_phantom_module = lambda x: x def main(): p = optparse.OptionParser(__doc__.strip()) p.add_option("-p", "--phantom", action="store", type="string", dest="phantom", default=None, help="Phantom import modules from a file") p.add_option("-o", "--output-dir", action="store", type="string", dest="output_dir", default=None, help=("Write all output files to the given " "directory (instead of writing them as specified " "in the autosummary::directives)")) options, args = p.parse_args() if len(args) == 0: p.error("wrong number of arguments") if options.phantom and os.path.isfile(options.phantom): import_phantom_module(options.phantom) # read names = {} for name, loc in get_documented(args).items(): for (filename, sec_title, keyword, toctree) in loc: if toctree is not None: path = os.path.join(os.path.dirname(filename), toctree) names[name] = os.path.abspath(path) # write for name, path in sorted(names.items()): if options.output_dir is not None: path = options.output_dir if not os.path.isdir(path): os.makedirs(path) try: obj, name = import_by_name(name) except ImportError, e: print "Failed to import '%s': %s" % (name, e) continue fn = os.path.join(path, '%s.rst' % name) if os.path.exists(fn): # skip continue f = open(fn, 'w') try: f.write('%s\n%s\n\n' % (name, '=' * len(name))) if inspect.isclass(obj): if issubclass(obj, Exception): f.write(format_modulemember(name, 'autoexception')) else: f.write(format_modulemember(name, 'autoclass')) elif inspect.ismodule(obj): f.write(format_modulemember(name, 'automodule')) elif inspect.ismethod(obj) or inspect.ismethoddescriptor(obj): f.write(format_classmember(name, 'automethod')) elif callable(obj): f.write(format_modulemember(name, 'autofunction')) elif hasattr(obj, '__get__'): f.write(format_classmember(name, 'autoattribute')) else: f.write(format_modulemember(name, 'autofunction')) finally: f.close() def format_modulemember(name, directive): parts = name.split('.') mod, name = '.'.join(parts[:-1]), parts[-1] return ".. currentmodule:: %s\n\n.. %s:: %s\n" % (mod, directive, name) def format_classmember(name, directive): parts = name.split('.') mod, name = '.'.join(parts[:-2]), '.'.join(parts[-2:]) return ".. currentmodule:: %s\n\n.. 
%s:: %s\n" % (mod, directive, name) def get_documented(filenames): """ Find out what items are documented in source/*.rst See `get_documented_in_lines`. """ documented = {} for filename in filenames: f = open(filename, 'r') lines = f.read().splitlines() documented.update(get_documented_in_lines(lines, filename=filename)) f.close() return documented def get_documented_in_docstring(name, module=None, filename=None): """ Find out what items are documented in the given object's docstring. See `get_documented_in_lines`. """ try: obj, real_name = import_by_name(name) lines = pydoc.getdoc(obj).splitlines() return get_documented_in_lines(lines, module=name, filename=filename) except AttributeError: pass except ImportError, e: print "Failed to import '%s': %s" % (name, e) return {} def get_documented_in_lines(lines, module=None, filename=None): """ Find out what items are documented in the given lines Returns ------- documented : dict of list of (filename, title, keyword, toctree) Dictionary whose keys are documented names of objects. The value is a list of locations where the object was documented. Each location is a tuple of filename, the current section title, the name of the directive, and the value of the :toctree: argument (if present) of the directive. """ title_underline_re = re.compile("^[-=*_^#]{3,}\s*$") autodoc_re = re.compile(".. auto(function|method|attribute|class \ |exception|module)::\s*([A-Za-z0-9_.]+)\s*$") autosummary_re = re.compile(r'^\.\.\s+autosummary::\s*') module_re = re.compile( r'^\.\.\s+(current)?module::\s*([a-zA-Z0-9_.]+)\s*$') autosummary_item_re = re.compile(r'^\s+([_a-zA-Z][a-zA-Z0-9_.]*)\s*') toctree_arg_re = re.compile(r'^\s+:toctree:\s*(.*?)\s*$') documented = {} current_title = [] last_line = None toctree = None current_module = module in_autosummary = False for line in lines: try: if in_autosummary: m = toctree_arg_re.match(line) if m: toctree = m.group(1) continue if line.strip().startswith(':'): continue # skip options m = autosummary_item_re.match(line) if m: name = m.group(1).strip() if current_module and not name.startswith( current_module + '.'): name = "%s.%s" % (current_module, name) documented.setdefault(name, []).append( (filename, current_title, 'autosummary', toctree)) continue if line.strip() == '': continue in_autosummary = False m = autosummary_re.match(line) if m: in_autosummary = True continue m = autodoc_re.search(line) if m: name = m.group(2).strip() if m.group(1) == "module": current_module = name documented.update(get_documented_in_docstring( name, filename=filename)) elif current_module and not name.startswith( current_module + '.'): name = "%s.%s" % (current_module, name) documented.setdefault(name, []).append( (filename, current_title, "auto" + m.group(1), None)) continue m = title_underline_re.match(line) if m and last_line: current_title = last_line.strip() continue m = module_re.match(line) if m: current_module = m.group(2) continue finally: last_line = line return documented if __name__ == "__main__": main() joblib-0.7.1/doc/sphinxext/docscrape.py000066400000000000000000000346211217450746300201130ustar00rootroot00000000000000"""Extract reference documentation from the NumPy source tree. """ import inspect import textwrap import re import pydoc from warnings import warn class Reader(object): """A line-based string reader. """ def __init__(self, data): """ Parameters ---------- data : str String with lines separated by '\n'. 
""" if isinstance(data, list): self._str = data else: self._str = data.split('\n') # store string as list of lines self.reset() def __getitem__(self, n): return self._str[n] def reset(self): self._l = 0 # current line nr def read(self): if not self.eof(): out = self[self._l] self._l += 1 return out else: return '' def seek_next_non_empty_line(self): for l in self[self._l:]: if l.strip(): break else: self._l += 1 def eof(self): return self._l >= len(self._str) def read_to_condition(self, condition_func): start = self._l for line in self[start:]: if condition_func(line): return self[start:self._l] self._l += 1 if self.eof(): return self[start:self._l + 1] return [] def read_to_next_empty_line(self): self.seek_next_non_empty_line() def is_empty(line): return not line.strip() return self.read_to_condition(is_empty) def read_to_next_unindented_line(self): def is_unindented(line): return (line.strip() and (len(line.lstrip()) == len(line))) return self.read_to_condition(is_unindented) def peek(self, n=0): if self._l + n < len(self._str): return self[self._l + n] else: return '' def is_empty(self): return not ''.join(self._str).strip() class NumpyDocString(object): def __init__(self, docstring): docstring = textwrap.dedent(docstring).split('\n') self._doc = Reader(docstring) self._parsed_data = { 'Signature': '', 'Summary': [''], 'Extended Summary': [], 'Parameters': [], 'Returns': [], 'Raises': [], 'Warns': [], 'Other Parameters': [], 'Attributes': [], 'Methods': [], 'See Also': [], 'Notes': [], 'Warnings': [], 'References': '', 'Examples': '', 'index': {} } self._parse() def __getitem__(self, key): return self._parsed_data[key] def __setitem__(self, key, val): if not self._parsed_data.has_key(key): warn("Unknown section %s" % key) else: self._parsed_data[key] = val def _is_at_section(self): self._doc.seek_next_non_empty_line() if self._doc.eof(): return False l1 = self._doc.peek().strip() # e.g. Parameters if l1.startswith('.. 
index::'): return True l2 = self._doc.peek(1).strip() # ---------- or ========== return l2.startswith('-' * len(l1)) or l2.startswith('=' * len(l1)) def _strip(self, doc): i = 0 j = 0 for i, line in enumerate(doc): if line.strip(): break for j, line in enumerate(doc[::-1]): if line.strip(): break return doc[i:len(doc) - j] def _read_to_next_section(self): section = self._doc.read_to_next_empty_line() while not self._is_at_section() and not self._doc.eof(): if not self._doc.peek(-1).strip(): # previous line was empty section += [''] section += self._doc.read_to_next_empty_line() return section def _read_sections(self): while not self._doc.eof(): data = self._read_to_next_section() name = data[0].strip() if name.startswith('..'): # index section yield name, data[1:] elif len(data) < 2: yield StopIteration else: yield name, self._strip(data[2:]) def _parse_param_list(self, content): r = Reader(content) params = [] while not r.eof(): header = r.read().strip() if ' : ' in header: arg_name, arg_type = header.split(' : ')[:2] else: arg_name, arg_type = header, '' desc = r.read_to_next_unindented_line() desc = dedent_lines(desc) params.append((arg_name, arg_type, desc)) return params _name_rgx = re.compile(r"^\s*(:(?P\w+):`(?P[a-zA-Z0-9_.-]+)`|" r" (?P[a-zA-Z0-9_.-]+))\s*", re.X) def _parse_see_also(self, content): """ func_name : Descriptive text continued text another_func_name : Descriptive text func_name1, func_name2, :meth:`func_name`, func_name3 """ items = [] def parse_item_name(text): """Match ':role:`name`' or 'name'""" m = self._name_rgx.match(text) if m: g = m.groups() if g[1] is None: return g[3], None else: return g[2], g[1] raise ValueError("%s is not a item name" % text) def push_item(name, rest): if not name: return name, role = parse_item_name(name) items.append((name, list(rest), role)) del rest[:] current_func = None rest = [] for line in content: if not line.strip(): continue m = self._name_rgx.match(line) if m and line[m.end():].strip().startswith(':'): push_item(current_func, rest) current_func, line = line[:m.end()], line[m.end():] rest = [line.split(':', 1)[1].strip()] if not rest[0]: rest = [] elif not line.startswith(' '): push_item(current_func, rest) current_func = None if ',' in line: for func in line.split(','): push_item(func, []) elif line.strip(): current_func = line elif current_func is not None: rest.append(line.strip()) push_item(current_func, rest) return items def _parse_index(self, section, content): """ .. 
index: default :refguide: something, else, and more """ def strip_each_in(lst): return [s.strip() for s in lst] out = {} section = section.split('::') if len(section) > 1: out['default'] = strip_each_in(section[1].split(','))[0] for line in content: line = line.split(':') if len(line) > 2: out[line[1]] = strip_each_in(line[2].split(',')) return out def _parse_summary(self): """Grab signature (if given) and summary""" if self._is_at_section(): return summary = self._doc.read_to_next_empty_line() summary_str = " ".join([s.strip() for s in summary]).strip() if re.compile('^([\w., ]+=)?\s*[\w\.]+\(.*\)$').match(summary_str): self['Signature'] = summary_str if not self._is_at_section(): self['Summary'] = self._doc.read_to_next_empty_line() else: self['Summary'] = summary if not self._is_at_section(): self['Extended Summary'] = self._read_to_next_section() def _parse(self): self._doc.reset() self._parse_summary() for (section, content) in self._read_sections(): if not section.startswith('..'): section = ' '.join([s.capitalize() for s in section.split(' ') ]) if section in ('Parameters', 'Attributes', 'Methods', 'Returns', 'Raises', 'Warns'): self[section] = self._parse_param_list(content) elif section.startswith('.. index::'): self['index'] = self._parse_index(section, content) elif section == 'See Also': self['See Also'] = self._parse_see_also(content) else: self[section] = content # string conversion routines def _str_header(self, name, symbol='-'): return [name, len(name) * symbol] def _str_indent(self, doc, indent=4): out = [] for line in doc: out += [' ' * indent + line] return out def _str_signature(self): if self['Signature']: return [self['Signature'].replace('*', '\*')] + [''] else: return [''] def _str_summary(self): if self['Summary']: return self['Summary'] + [''] else: return [] def _str_extended_summary(self): if self['Extended Summary']: return self['Extended Summary'] + [''] else: return [] def _str_param_list(self, name): out = [] if self[name]: out += self._str_header(name) for param, param_type, desc in self[name]: out += ['%s : %s' % (param, param_type)] out += self._str_indent(desc) out += [''] return out def _str_section(self, name): out = [] if self[name]: out += self._str_header(name) out += self[name] out += [''] return out def _str_see_also(self, func_role): if not self['See Also']: return [] out = [] out += self._str_header("See Also") last_had_desc = True for func, desc, role in self['See Also']: if role: link = ':%s:`%s`' % (role, func) elif func_role: link = ':%s:`%s`' % (func_role, func) else: link = "`%s`_" % func if desc or last_had_desc: out += [''] out += [link] else: out[-1] += ", %s" % link if desc: out += self._str_indent([' '.join(desc)]) last_had_desc = True else: last_had_desc = False out += [''] return out def _str_index(self): idx = self['index'] out = [] out += ['.. 
index:: %s' % idx.get('default', '')] for section, references in idx.iteritems(): if section == 'default': continue out += [' :%s: %s' % (section, ', '.join(references))] return out def __str__(self, func_role=''): out = [] out += self._str_signature() out += self._str_summary() out += self._str_extended_summary() for param_list in ('Parameters', 'Returns', 'Raises'): out += self._str_param_list(param_list) out += self._str_section('Warnings') out += self._str_see_also(func_role) for s in ('Notes', 'References', 'Examples'): out += self._str_section(s) out += self._str_index() return '\n'.join(out) def indent(str, indent=4): indent_str = ' ' * indent if str is None: return indent_str lines = str.split('\n') return '\n'.join(indent_str + l for l in lines) def dedent_lines(lines): """Deindent a list of lines maximally""" return textwrap.dedent("\n".join(lines)).split("\n") def header(text, style='-'): return text + '\n' + style * len(text) + '\n' class FunctionDoc(NumpyDocString): def __init__(self, func, role='func'): self._f = func self._role = role # e.g. "func" or "meth" try: NumpyDocString.__init__(self, inspect.getdoc(func) or '') except ValueError as e: print('*' * 78) print("ERROR: '%s' while parsing `%s`" % (e, self._f)) print('*' * 78) if not self['Signature']: func, func_name = self.get_func() try: # try to read signature argspec = inspect.getargspec(func) argspec = inspect.formatargspec(*argspec) argspec = argspec.replace('*', '\*') signature = '%s%s' % (func_name, argspec) except TypeError as e: signature = '%s()' % func_name self['Signature'] = signature def get_func(self): func_name = getattr(self._f, '__name__', self.__class__.__name__) if inspect.isclass(self._f): func = getattr(self._f, '__call__', self._f.__init__) else: func = self._f return func, func_name def __str__(self): out = '' func, func_name = self.get_func() signature = self['Signature'].replace('*', '\*') roles = {'func': 'function', 'meth': 'method'} if self._role: if not roles.has_key(self._role): print("Warning: invalid role %s" % self._role) out += '.. %s:: %s\n \n\n' % (roles.get(self._role, ''), func_name) out += super(FunctionDoc, self).__str__(func_role=self._role) return out class ClassDoc(NumpyDocString): def __init__(self, cls, modulename='', func_doc=FunctionDoc): if not inspect.isclass(cls): raise ValueError("Initialise using a class. Got %r" % cls) self._cls = cls if modulename and not modulename.endswith('.'): modulename += '.' self._mod = modulename self._name = cls.__name__ self._func_doc = func_doc NumpyDocString.__init__(self, pydoc.getdoc(cls)) @property def methods(self): return [name for name, func in inspect.getmembers(self._cls) if not name.startswith('_') and callable(func)] def __str__(self): out = '' out += super(ClassDoc, self).__str__() out += "\n\n" #for m in self.methods: # print "Parsing `%s`" % m # out += str(self._func_doc(getattr(self._cls,m), 'meth')) + '\n\n' # out += '.. index::\n single: %s; %s\n\n' % (self._name, m) return out joblib-0.7.1/doc/sphinxext/docscrape_sphinx.py000066400000000000000000000104521217450746300215000ustar00rootroot00000000000000import inspect import textwrap import pydoc import sys if sys.version_info[0] == 2: from docscrape import NumpyDocString from docscrape import FunctionDoc from docscrape import ClassDoc else: from .docscrape import NumpyDocString from .docscrape import FunctionDoc from .docscrape import ClassDoc class SphinxDocString(NumpyDocString): # string conversion routines def _str_header(self, name, symbol='`'): return ['.. 
rubric:: ' + name, ''] def _str_field_list(self, name): return [':' + name + ':'] def _str_indent(self, doc, indent=4): out = [] for line in doc: out += [' ' * indent + line] return out def _str_signature(self): return [''] if self['Signature']: return ['``%s``' % self['Signature']] + [''] else: return [''] def _str_summary(self): return self['Summary'] + [''] def _str_extended_summary(self): return self['Extended Summary'] + [''] def _str_param_list(self, name): out = [] if self[name]: out += self._str_field_list(name) out += [''] for param, param_type, desc in self[name]: out += self._str_indent(['**%s** : %s' % (param.strip(), param_type)]) out += [''] out += self._str_indent(desc, 8) out += [''] return out def _str_section(self, name): out = [] if self[name]: out += self._str_header(name) out += [''] content = textwrap.dedent("\n".join(self[name])).split("\n") out += content out += [''] return out def _str_see_also(self, func_role): out = [] if self['See Also']: see_also = super(SphinxDocString, self)._str_see_also(func_role) out = ['.. seealso::', ''] out += self._str_indent(see_also[2:]) return out def _str_warnings(self): out = [] if self['Warnings']: out = ['.. warning::', ''] out += self._str_indent(self['Warnings']) return out def _str_index(self): idx = self['index'] out = [] if len(idx) == 0: return out out += ['.. index:: %s' % idx.get('default', '')] for section, references in idx.iteritems(): if section == 'default': continue elif section == 'refguide': out += [' single: %s' % (', '.join(references))] else: out += [' %s: %s' % (section, ','.join(references))] return out def _str_references(self): out = [] if self['References']: out += self._str_header('References') if isinstance(self['References'], str): self['References'] = [self['References']] out.extend(self['References']) out += [''] return out def __str__(self, indent=0, func_role="obj"): out = [] out += self._str_signature() out += self._str_index() + [''] out += self._str_summary() out += self._str_extended_summary() for param_list in ('Parameters', 'Attributes', 'Methods', 'Returns', 'Raises'): out += self._str_param_list(param_list) out += self._str_warnings() out += self._str_see_also(func_role) out += self._str_section('Notes') out += self._str_references() out += self._str_section('Examples') out = self._str_indent(out, indent) return '\n'.join(out) class SphinxFunctionDoc(SphinxDocString, FunctionDoc): pass class SphinxClassDoc(SphinxDocString, ClassDoc): pass def get_doc_object(obj, what=None): if what is None: if inspect.isclass(obj): what = 'class' elif inspect.ismodule(obj): what = 'module' elif callable(obj): what = 'function' else: what = 'object' if what == 'class': return SphinxClassDoc(obj, '', func_doc=SphinxFunctionDoc) elif what in ('function', 'method'): return SphinxFunctionDoc(obj, '') else: return SphinxDocString(pydoc.getdoc(obj)) joblib-0.7.1/doc/sphinxext/numpydoc.py000066400000000000000000000077111217450746300200060ustar00rootroot00000000000000""" ======== numpydoc ======== Sphinx extension that handles docstrings in the Numpy standard format. [1] It will: - Convert Parameters etc. sections to field lists. - Convert See Also section to a See also entry. - Renumber references. - Extract the signature from the docstring, if it can't be determined otherwise. .. 
[1] http://projects.scipy.org/scipy/numpy/wiki/CodingStyleGuidelines#docstring-standard """ import re import pydoc import inspect import sys if sys.version_info[0] == 2: from docscrape_sphinx import get_doc_object from docscrape_sphinx import SphinxDocString else: from .docscrape_sphinx import get_doc_object from .docscrape_sphinx import SphinxDocString def mangle_docstrings(app, what, name, obj, options, lines, reference_offset=[0]): if what == 'module': # Strip top title title_re = re.compile(r'^\s*[#*=]{4,}\n[a-z0-9 -]+\n[#*=]{4,}\s*', re.I | re.S) lines[:] = title_re.sub('', "\n".join(lines)).split("\n") else: doc = get_doc_object(obj, what) lines[:] = str(doc).split("\n") if app.config.numpydoc_edit_link and hasattr(obj, '__name__') and \ obj.__name__: v = dict(full_name=obj.__name__) lines += [''] + (app.config.numpydoc_edit_link % v).split("\n") # replace reference numbers so that there are no duplicates references = [] for l in lines: l = l.strip() if l.startswith('.. ['): try: references.append(int(l[len('.. ['):l.index(']')])) except ValueError: print("WARNING: invalid reference in %s docstring" % name) # Start renaming from the biggest number, otherwise we may # overwrite references. references.sort() if references: for i, line in enumerate(lines): for r in references: new_r = reference_offset[0] + r lines[i] = lines[i].replace('[%d]_' % r, '[%d]_' % new_r) lines[i] = lines[i].replace('.. [%d]' % r, '.. [%d]' % new_r) reference_offset[0] += len(references) def mangle_signature(app, what, name, obj, options, sig, retann): # Do not try to inspect classes that don't define `__init__` if (inspect.isclass(obj) and 'initializes x; see ' in pydoc.getdoc(obj.__init__)): return '', '' if not (callable(obj) or hasattr(obj, '__argspec_is_invalid_')): return if not hasattr(obj, '__doc__'): return doc = SphinxDocString(pydoc.getdoc(obj)) if doc['Signature']: sig = re.sub("^[^(]*", "", doc['Signature']) return sig, '' def initialize(app): try: app.connect('autodoc-process-signature', mangle_signature) except: monkeypatch_sphinx_ext_autodoc() def setup(app, get_doc_object_=get_doc_object): global get_doc_object get_doc_object = get_doc_object_ app.connect('autodoc-process-docstring', mangle_docstrings) app.connect('builder-inited', initialize) app.add_config_value('numpydoc_edit_link', None, True) #------------------------------------------------------------------------------ # Monkeypatch sphinx.ext.autodoc to accept argspecless autodocs (Sphinx < 0.5) #------------------------------------------------------------------------------ def monkeypatch_sphinx_ext_autodoc(): global _original_format_signature import sphinx.ext.autodoc if sphinx.ext.autodoc.format_signature is our_format_signature: return print("[numpydoc] Monkeypatching sphinx.ext.autodoc ...") _original_format_signature = sphinx.ext.autodoc.format_signature sphinx.ext.autodoc.format_signature = our_format_signature def our_format_signature(what, obj): r = mangle_signature(None, what, None, obj, None, None, None) if r is not None: return r[0] else: return _original_format_signature(what, obj) joblib-0.7.1/doc/sphinxext/phantom_import.py000066400000000000000000000133061217450746300212050ustar00rootroot00000000000000""" ============== phantom_import ============== Sphinx extension to make directives from ``sphinx.ext.autodoc`` and similar extensions to use docstrings loaded from an XML file. This extension loads an XML file in the Pydocweb format [1] and creates a dummy module that contains the specified docstrings. 
This can be used to get the current docstrings from a Pydocweb instance without needing to rebuild the documented module. .. [1] http://code.google.com/p/pydocweb """ import imp import sys import os import inspect import re def setup(app): app.connect('builder-inited', initialize) app.add_config_value('phantom_import_file', None, True) def initialize(app): fn = app.config.phantom_import_file if (fn and os.path.isfile(fn)): print("[numpydoc] Phantom importing modules from %s ..." % fn) import_phantom_module(fn) #------------------------------------------------------------------------------ # Creating 'phantom' modules from an XML description #------------------------------------------------------------------------------ def import_phantom_module(xml_file): """ Insert a fake Python module to sys.modules, based on a XML file. The XML file is expected to conform to Pydocweb DTD. The fake module will contain dummy objects, which guarantee the following: - Docstrings are correct. - Class inheritance relationships are correct (if present in XML). - Function argspec is *NOT* correct (even if present in XML). Instead, the function signature is prepended to the function docstring. - Class attributes are *NOT* correct; instead, they are dummy objects. Parameters ---------- xml_file : str Name of an XML file to read """ import lxml.etree as etree object_cache = {} tree = etree.parse(xml_file) root = tree.getroot() # Sort items so that # - Base classes come before classes inherited from them # - Modules come before their contents all_nodes = dict([(n.attrib['id'], n) for n in root]) def _get_bases(node, recurse=False): bases = [x.attrib['ref'] for x in node.findall('base')] if recurse: j = 0 while True: try: b = bases[j] except IndexError: break if b in all_nodes: bases.extend(_get_bases(all_nodes[b])) j += 1 return bases type_index = ['module', 'class', 'callable', 'object'] def base_cmp(a, b): x = cmp(type_index.index(a.tag), type_index.index(b.tag)) if x != 0: return x if a.tag == 'class' and b.tag == 'class': a_bases = _get_bases(a, recurse=True) b_bases = _get_bases(b, recurse=True) x = cmp(len(a_bases), len(b_bases)) if x != 0: return x if a.attrib['id'] in b_bases: return -1 if b.attrib['id'] in a_bases: return 1 return cmp(a.attrib['id'].count('.'), b.attrib['id'].count('.')) nodes = root.getchildren() nodes.sort(base_cmp) # Create phantom items for node in nodes: name = node.attrib['id'] doc = (node.text or '').decode('string-escape') + "\n" if doc == "\n": doc = "" # create parent, if missing parent = name while True: parent = '.'.join(parent.split('.')[:-1]) if not parent: break if parent in object_cache: break obj = imp.new_module(parent) object_cache[parent] = obj sys.modules[parent] = obj # create object if node.tag == 'module': obj = imp.new_module(name) obj.__doc__ = doc sys.modules[name] = obj elif node.tag == 'class': bases = [object_cache[b] for b in _get_bases(node) if b in object_cache] bases.append(object) init = lambda self: None init.__doc__ = doc obj = type(name, tuple(bases), {'__doc__': doc, '__init__': init}) obj.__name__ = name.split('.')[-1] elif node.tag == 'callable': funcname = node.attrib['id'].split('.')[-1] argspec = node.attrib.get('argspec') if argspec: argspec = re.sub('^[^(]*', '', argspec) doc = "%s%s\n\n%s" % (funcname, argspec, doc) obj = lambda: 0 obj.__argspec_is_invalid_ = True obj.func_name = funcname obj.__name__ = name obj.__doc__ = doc if inspect.isclass(object_cache[parent]): obj.__objclass__ = object_cache[parent] else: class Dummy(object): pass obj = 
Dummy() obj.__name__ = name obj.__doc__ = doc if inspect.isclass(object_cache[parent]): obj.__get__ = lambda: None object_cache[name] = obj if parent: if inspect.ismodule(object_cache[parent]): obj.__module__ = parent setattr(object_cache[parent], name.split('.')[-1], obj) # Populate items for node in root: obj = object_cache.get(node.attrib['id']) if obj is None: continue for ref in node.findall('ref'): if node.tag == 'class': if ref.attrib['ref'].startswith(node.attrib['id'] + '.'): setattr(obj, ref.attrib['name'], object_cache.get(ref.attrib['ref'])) else: setattr(obj, ref.attrib['name'], object_cache.get(ref.attrib['ref'])) joblib-0.7.1/doc/why.rst000066400000000000000000000032671217450746300151070ustar00rootroot00000000000000 Why joblib: project goals =========================== What pipelines bring us -------------------------- Pipeline processing systems can provide a set of useful features: Data-flow programming for performance ...................................... * **On-demand computing:** in pipeline systems such as labView, or VTK calculations are performed as needed by the outputs and only when inputs change. * **Transparent parallelization:** a pipeline topology can be inspected to deduce which operations can be run in parallel (it is equivalent to purely functional programming). Provenance tracking for understanding the code ............................................... * **Tracking of data and computations:** to be able to fully reproduce a computational experiment: requires tracking of the data and operation implemented. * **Inspecting data flow:** Inspecting intermediate results helps debugging and understanding. .. topic:: But pipeline frameworks can get in the way :class: warning We want our code to look like the underlying algorithm, not like a software framework. Joblib's approach -------------------- Functions are the simplest abstraction used by everyone. Our pipeline jobs (or tasks) are made of decorated functions. Tracking of parameters in a meaningful way requires specification of data model. We give up on that and use hashing for performance and robustness. Design choices --------------- * No dependencies other than Python * Robust, well-tested code, at the cost of functionality * Fast and suitable for scientific computing on big dataset without changing the original code * Only local imports: **embed joblib in your code by copying it** joblib-0.7.1/joblib/000077500000000000000000000000001217450746300142325ustar00rootroot00000000000000joblib-0.7.1/joblib/__init__.py000066400000000000000000000105161217450746300163460ustar00rootroot00000000000000""" Joblib is a set of tools to provide **lightweight pipelining in Python**. In particular, joblib offers: 1. transparent disk-caching of the output values and lazy re-evaluation (memoize pattern) 2. easy simple parallel computing 3. logging and tracing of the execution Joblib is optimized to be **fast** and **robust** in particular on large data and has specific optimizations for `numpy` arrays. It is **BSD-licensed**. 
============================== ============================================ **User documentation**: http://packages.python.org/joblib **Download packages**: http://pypi.python.org/pypi/joblib#downloads **Source code**: http://github.com/joblib/joblib **Report issues**: http://github.com/joblib/joblib/issues ============================== ============================================ Vision -------- The vision is to provide tools to easily achieve better performance and reproducibility when working with long running jobs. In addition, Joblib can also be used to provide a light-weight make replacement or caching solution. * **Avoid computing twice the same thing**: code is rerun over an over, for instance when prototyping computational-heavy jobs (as in scientific development), but hand-crafted solution to alleviate this issue is error-prone and often leads to unreproducible results * **Persist to disk transparently**: persisting in an efficient way arbitrary objects containing large data is hard. Using joblib's caching mechanism avoids hand-written persistence and implicitly links the file on disk to the execution context of the original Python object. As a result, joblib's persistence is good for resuming an application status or computational job, eg after a crash. Joblib strives to address these problems while **leaving your code and your flow control as unmodified as possible** (no framework, no new paradigms). Main features ------------------ 1) **Transparent and fast disk-caching of output value:** a memoize or make-like functionality for Python functions that works well for arbitrary Python objects, including very large numpy arrays. Separate persistence and flow-execution logic from domain logic or algorithmic code by writing the operations as a set of steps with well-defined inputs and outputs: Python functions. Joblib can save their computation to disk and rerun it only if necessary:: >>> import numpy as np >>> from joblib import Memory >>> mem = Memory(cachedir='/tmp/joblib') >>> import numpy as np >>> a = np.vander(np.arange(3)).astype(np.float) >>> square = mem.cache(np.square) >>> b = square(a) # doctest: +ELLIPSIS ________________________________________________________________________________ [Memory] Calling square... square(array([[ 0., 0., 1.], [ 1., 1., 1.], [ 4., 2., 1.]])) ___________________________________________________________square - 0...s, 0.0min >>> c = square(a) >>> # The above call did not trigger an evaluation 2) **Embarrassingly parallel helper:** to make is easy to write readable parallel code and debug it quickly:: >>> from joblib import Parallel, delayed >>> from math import sqrt >>> Parallel(n_jobs=1)(delayed(sqrt)(i**2) for i in range(10)) [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0] 3) **Logging/tracing:** The different functionalities will progressively acquire better logging mechanism to help track what has been ran, and capture I/O easily. In addition, Joblib will provide a few I/O primitives, to easily define define logging and display streams, and provide a way of compiling a report. We want to be able to quickly inspect what has been run. 4) **Fast compressed Persistence**: a replacement for pickle to work efficiently on Python objects containing large data ( *joblib.dump* & *joblib.load* ). .. 
>>> import shutil ; shutil.rmtree('/tmp/joblib/') """ __version__ = '0.7.1' from .memory import Memory from .logger import PrintTime from .logger import Logger from .hashing import hash from .numpy_pickle import dump from .numpy_pickle import load from .parallel import Parallel from .parallel import delayed from .parallel import cpu_count joblib-0.7.1/joblib/_compat.py000066400000000000000000000002161217450746300162250ustar00rootroot00000000000000""" Compatibility layer for Python 3/Python 2 single codebase """ try: _basestring = basestring except NameError: _basestring = str joblib-0.7.1/joblib/disk.py000066400000000000000000000063201217450746300155370ustar00rootroot00000000000000""" Disk management utilities. """ # Authors: Gael Varoquaux # Lars Buitinck # Copyright (c) 2010 Gael Varoquaux # License: BSD Style, 3 clauses. import errno import os import shutil import sys import time def disk_used(path): """ Return the disk usage in a directory.""" size = 0 for file in os.listdir(path) + ['.']: stat = os.stat(os.path.join(path, file)) if hasattr(stat, 'st_blocks'): size += stat.st_blocks * 512 else: # on some platform st_blocks is not available (e.g., Windows) # approximate by rounding to next multiple of 512 size += (stat.st_size // 512 + 1) * 512 # We need to convert to int to avoid having longs on some systems (we # don't want longs to avoid problems we SQLite) return int(size / 1024.) def memstr_to_kbytes(text): """ Convert a memory text to it's value in kilobytes. """ kilo = 1024 units = dict(K=1, M=kilo, G=kilo ** 2) try: size = int(units[text[-1]] * float(text[:-1])) except (KeyError, ValueError): raise ValueError( "Invalid literal for size give: %s (type %s) should be " "alike '10G', '500M', '50K'." % (text, type(text)) ) return size def mkdirp(d): """Ensure directory d exists (like mkdir -p on Unix) No guarantee that the directory is writable. """ try: os.makedirs(d) except OSError as e: if e.errno != errno.EEXIST: raise # if a rmtree operation fails in rm_subdirs, wait for this much time (in secs), # then retry once. if it still fails, raise the exception RM_SUBDIRS_RETRY_TIME = 0.1 def rm_subdirs(path, onerror=None): """Remove all subdirectories in this path. The directory indicated by `path` is left in place, and its subdirectories are erased. If onerror is set, it is called to handle the error with arguments (func, path, exc_info) where func is os.listdir, os.remove, or os.rmdir; path is the argument to that function that caused it to fail; and exc_info is a tuple returned by sys.exc_info(). If onerror is None, an exception is raised. """ # NOTE this code is adapted from the one in shutil.rmtree, and is # just as fast names = [] try: names = os.listdir(path) except os.error as err: if onerror is not None: onerror(os.listdir, path, sys.exc_info()) else: raise for name in names: fullname = os.path.join(path, name) if os.path.isdir(fullname): if onerror is not None: shutil.rmtree(fullname, False, onerror) else: # allow the rmtree to fail once, wait and re-try. # if the error is raised again, fail err_count = 0 while True: try: shutil.rmtree(fullname, False, None) break except os.error: if err_count > 0: raise err_count += 1 time.sleep(RM_SUBDIRS_RETRY_TIME) joblib-0.7.1/joblib/format_stack.py000066400000000000000000000362261217450746300172720ustar00rootroot00000000000000""" Represent an exception with a lot of information. Provides 2 useful functions: format_exc: format an exception into a complete traceback, with full debugging instruction. 
format_outer_frames: format the current position in the stack call. Adapted from IPython's VerboseTB. """ # Authors: Gael Varoquaux < gael dot varoquaux at normalesup dot org > # Nathaniel Gray # Fernando Perez # Copyright: 2010, Gael Varoquaux # 2001-2004, Fernando Perez # 2001 Nathaniel Gray # License: BSD 3 clause import inspect import keyword import linecache import os import pydoc import sys import time import tokenize import traceback import types try: # Python 2 generate_tokens = tokenize.generate_tokens except AttributeError: # Python 3 generate_tokens = tokenize.tokenize PY3 = (sys.version[0] == '3') INDENT = ' ' * 8 from ._compat import _basestring ############################################################################### # some internal-use functions def safe_repr(value): """Hopefully pretty robust repr equivalent.""" # this is pretty horrible but should always return *something* try: return pydoc.text.repr(value) except KeyboardInterrupt: raise except: try: return repr(value) except KeyboardInterrupt: raise except: try: # all still in an except block so we catch # getattr raising name = getattr(value, '__name__', None) if name: # ick, recursion return safe_repr(name) klass = getattr(value, '__class__', None) if klass: return '%s instance' % safe_repr(klass) except KeyboardInterrupt: raise except: return 'UNRECOVERABLE REPR FAILURE' def eq_repr(value, repr=safe_repr): return '=%s' % repr(value) ############################################################################### def uniq_stable(elems): """uniq_stable(elems) -> list Return from an iterable, a list of all the unique elements in the input, but maintaining the order in which they first appear. A naive solution to this problem which just makes a dictionary with the elements as keys fails to respect the stability condition, since dictionaries are unsorted by nature. Note: All elements in the input must be hashable. """ unique = [] unique_set = set() for nn in elems: if nn not in unique_set: unique.append(nn) unique_set.add(nn) return unique ############################################################################### def fix_frame_records_filenames(records): """Try to fix the filenames in each record from inspect.getinnerframes(). Particularly, modules loaded from within zip files have useless filenames attached to their code object, and inspect.getinnerframes() just uses it. """ fixed_records = [] for frame, filename, line_no, func_name, lines, index in records: # Look inside the frame's globals dictionary for __file__, which should # be better. better_fn = frame.f_globals.get('__file__', None) if isinstance(better_fn, str): # Check the type just in case someone did something weird with # __file__. It might also be None if the error occurred during # import. 
filename = better_fn fixed_records.append((frame, filename, line_no, func_name, lines, index)) return fixed_records def _fixed_getframes(etb, context=1, tb_offset=0): LNUM_POS, LINES_POS, INDEX_POS = 2, 4, 5 records = fix_frame_records_filenames(inspect.getinnerframes(etb, context)) # If the error is at the console, don't build any context, since it would # otherwise produce 5 blank lines printed out (there is no file at the # console) rec_check = records[tb_offset:] try: rname = rec_check[0][1] if rname == '' or rname.endswith(''): return rec_check except IndexError: pass aux = traceback.extract_tb(etb) assert len(records) == len(aux) for i, (file, lnum, _, _) in enumerate(aux): maybeStart = lnum - 1 - context // 2 start = max(maybeStart, 0) end = start + context lines = linecache.getlines(file)[start:end] # pad with empty lines if necessary if maybeStart < 0: lines = (['\n'] * -maybeStart) + lines if len(lines) < context: lines += ['\n'] * (context - len(lines)) buf = list(records[i]) buf[LNUM_POS] = lnum buf[INDEX_POS] = lnum - 1 - start buf[LINES_POS] = lines records[i] = tuple(buf) return records[tb_offset:] def _format_traceback_lines(lnum, index, lines, lvals=None): numbers_width = 7 res = [] i = lnum - index for line in lines: if i == lnum: # This is the line with the error pad = numbers_width - len(str(i)) if pad >= 3: marker = '-' * (pad - 3) + '-> ' elif pad == 2: marker = '> ' elif pad == 1: marker = '>' else: marker = '' num = marker + str(i) else: num = '%*s' % (numbers_width, i) line = '%s %s' % (num, line) res.append(line) if lvals and i == lnum: res.append(lvals + '\n') i = i + 1 return res def format_records(records): # , print_globals=False): # Loop over all records printing context and info frames = [] abspath = os.path.abspath for frame, file, lnum, func, lines, index in records: try: file = file and abspath(file) or '?' except OSError: # if file is '' or something not in the filesystem, # the abspath call will throw an OSError. Just ignore it and # keep the original file string. pass link = file try: args, varargs, varkw, locals = inspect.getargvalues(frame) except: # This can happen due to a bug in python2.3. We should be # able to remove this try/except when 2.4 becomes a # requirement. Bug details at http://python.org/sf/1005466 print("\nJoblib's exception reporting continues...\n") if func == '?': call = '' else: # Decide whether to include variable details or not try: call = 'in %s%s' % (func, inspect.formatargvalues(args, varargs, varkw, locals, formatvalue=eq_repr)) except KeyError: # Very odd crash from inspect.formatargvalues(). The # scenario under which it appeared was a call to # view(array,scale) in NumTut.view.view(), where scale had # been defined as a scalar (it should be a tuple). Somehow # inspect messes up resolving the argument list of view() # and barfs out. At some point I should dig into this one # and file a bug report about it. print("\nJoblib's exception reporting continues...\n") call = 'in %s(***failed resolving arguments***)' % func # Initialize a list of names on the current line, which the # tokenizer below will populate. names = [] def tokeneater(token_type, token, start, end, line): """Stateful tokeneater which builds dotted names. The list of names it appends to (from the enclosing scope) can contain repeated composite names. This is unavoidable, since there is no way to disambiguate partial dotted structures until the full list is known. 
The caller is responsible for pruning the final list of duplicates before using it.""" # build composite names if token == '.': try: names[-1] += '.' # store state so the next token is added for x.y.z names tokeneater.name_cont = True return except IndexError: pass if token_type == tokenize.NAME and token not in keyword.kwlist: if tokeneater.name_cont: # Dotted names names[-1] += token tokeneater.name_cont = False else: # Regular new names. We append everything, the caller # will be responsible for pruning the list later. It's # very tricky to try to prune as we go, b/c composite # names can fool us. The pruning at the end is easy # to do (or the caller can print a list with repeated # names if so desired. names.append(token) elif token_type == tokenize.NEWLINE: raise IndexError # we need to store a bit of state in the tokenizer to build # dotted names tokeneater.name_cont = False def linereader(file=file, lnum=[lnum], getline=linecache.getline): line = getline(file, lnum[0]) lnum[0] += 1 return line # Build the list of names on this line of code where the exception # occurred. try: # This builds the names list in-place by capturing it from the # enclosing scope. for token in generate_tokens(linereader): tokeneater(*token) except (IndexError, UnicodeDecodeError): # signals exit of tokenizer pass except tokenize.TokenError as msg: _m = ("An unexpected error occurred while tokenizing input\n" "The following traceback may be corrupted or invalid\n" "The error message is: %s\n" % msg) print(_m) # prune names list of duplicates, but keep the right order unique_names = uniq_stable(names) # Start loop over vars lvals = [] for name_full in unique_names: name_base = name_full.split('.', 1)[0] if name_base in frame.f_code.co_varnames: if name_base in locals.keys(): try: value = repr(eval(name_full, locals)) except: value = "undefined" else: value = "undefined" name = name_full lvals.append('%s = %s' % (name, value)) #elif print_globals: # if frame.f_globals.has_key(name_base): # try: # value = repr(eval(name_full,frame.f_globals)) # except: # value = "undefined" # else: # value = "undefined" # name = 'global %s' % name_full # lvals.append('%s = %s' % (name,value)) if lvals: lvals = '%s%s' % (INDENT, ('\n%s' % INDENT).join(lvals)) else: lvals = '' level = '%s\n%s %s\n' % (75 * '.', link, call) if index is None: frames.append(level) else: frames.append('%s%s' % (level, ''.join( _format_traceback_lines(lnum, index, lines, lvals)))) return frames ############################################################################### def format_exc(etype, evalue, etb, context=5, tb_offset=0): """ Return a nice text document describing the traceback. Parameters ----------- etype, evalue, etb: as returned by sys.exc_info context: number of lines of the source file to plot tb_offset: the number of stack frame not to use (0 = use all) """ # some locals try: etype = etype.__name__ except AttributeError: pass # Header with the exception type, python version, and date pyver = 'Python ' + sys.version.split()[0] + ': ' + sys.executable date = time.ctime(time.time()) pid = 'PID: %i' % os.getpid() head = '%s%s%s\n%s%s%s' % (etype, ' ' * (75 - len(str(etype)) - len(date)), date, pid, ' ' * (75 - len(str(pid)) - len(pyver)), pyver) # Flush cache before calling inspect. This helps alleviate some of the # problems with python 2.3's inspect.py. 
linecache.checkcache() # Drop topmost frames if requested try: records = _fixed_getframes(etb, context, tb_offset) except: raise print('\nUnfortunately, your original traceback can not be ' 'constructed.\n') return '' # Get (safely) a string form of the exception info try: etype_str, evalue_str = map(str, (etype, evalue)) except: # User exception is improperly defined. etype, evalue = str, sys.exc_info()[:2] etype_str, evalue_str = map(str, (etype, evalue)) # ... and format it exception = ['%s: %s' % (etype_str, evalue_str)] frames = format_records(records) return '%s\n%s\n%s' % (head, '\n'.join(frames), ''.join(exception[0])) ############################################################################### def format_outer_frames(context=5, stack_start=None, stack_end=None, ignore_ipython=True): LNUM_POS, LINES_POS, INDEX_POS = 2, 4, 5 records = inspect.getouterframes(inspect.currentframe()) output = list() for i, (frame, filename, line_no, func_name, lines, index) \ in enumerate(records): # Look inside the frame's globals dictionary for __file__, which should # be better. better_fn = frame.f_globals.get('__file__', None) if isinstance(better_fn, str): # Check the type just in case someone did something weird with # __file__. It might also be None if the error occurred during # import. filename = better_fn if filename.endswith('.pyc'): filename = filename[:-4] + '.py' if ignore_ipython: # Hack to avoid printing the internals of IPython if (os.path.basename(filename) == 'iplib.py' and func_name in ('safe_execfile', 'runcode')): break maybeStart = line_no - 1 - context // 2 start = max(maybeStart, 0) end = start + context lines = linecache.getlines(filename)[start:end] # pad with empty lines if necessary if maybeStart < 0: lines = (['\n'] * -maybeStart) + lines if len(lines) < context: lines += ['\n'] * (context - len(lines)) buf = list(records[i]) buf[LNUM_POS] = line_no buf[INDEX_POS] = line_no - 1 - start buf[LINES_POS] = lines output.append(tuple(buf)) return '\n'.join(format_records(output[stack_end:stack_start:-1])) joblib-0.7.1/joblib/func_inspect.py000066400000000000000000000235171217450746300172740ustar00rootroot00000000000000""" My own variation on function-specific inspect-like features. """ # Author: Gael Varoquaux # Copyright (c) 2009 Gael Varoquaux # License: BSD Style, 3 clauses. from itertools import islice import inspect import warnings import re import os from ._compat import _basestring def get_func_code(func): """ Attempts to retrieve a reliable function code hash. The reason we don't use inspect.getsource is that it caches the source, whereas we want this to be modified on the fly when the function is modified. Returns ------- func_code: string The function code source_file: string The path to the file in which the function is defined. first_line: int The first line of the code in the source file. Notes ------ This function does a bit more magic than inspect, and is thus more robust. """ source_file = None try: code = func.__code__ source_file = code.co_filename if not os.path.exists(source_file): # Use inspect for lambda functions and functions defined in an # interactive shell, or in doctests source_code = ''.join(inspect.getsourcelines(func)[0]) line_no = 1 if source_file.startswith('', source_file).groups() line_no = int(line_no) source_file = '' % source_file return source_code, source_file, line_no # Try to retrieve the source code. 
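        # Read the source file starting at the function's first line
        # (code.co_firstlineno) and let inspect.getblock() cut out only the
        # lines that belong to this function's definition.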
with open(source_file) as source_file_obj: first_line = code.co_firstlineno # All the lines after the function definition: source_lines = list(islice(source_file_obj, first_line - 1, None)) return ''.join(inspect.getblock(source_lines)), source_file, first_line except: # If the source code fails, we use the hash. This is fragile and # might change from one session to another. if hasattr(func, '__code__'): # Python 3.X return str(func.__code__.__hash__()), source_file, -1 else: # Weird objects like numpy ufunc don't have __code__ # This is fragile, as quite often the id of the object is # in the repr, so it might not persist across sessions, # however it will work for ufuncs. return repr(func), source_file, -1 def _clean_win_chars(string): "Windows cannot encode some characters in filenames" import urllib if hasattr(urllib, 'quote'): quote = urllib.quote else: # In Python 3, quote is elsewhere quote = urllib.parse.quote for char in ('<', '>', '!', ':', '\\'): string = string.replace(char, quote(char)) return string def get_func_name(func, resolv_alias=True, win_characters=True): """ Return the function import path (as a list of module names), and a name for the function. Parameters ---------- func: callable The func to inspect resolv_alias: boolean, optional If true, possible local aliases are indicated. win_characters: boolean, optional If true, substitute special characters using urllib.quote This is useful in Windows, as it cannot encode some filenames """ if hasattr(func, '__module__'): module = func.__module__ else: try: module = inspect.getmodule(func) except TypeError: if hasattr(func, '__class__'): module = func.__class__.__module__ else: module = 'unknown' if module is None: # Happens in doctests, eg module = '' if module == '__main__': try: filename = os.path.abspath(inspect.getsourcefile(func)) except: filename = None if filename is not None: # mangling of full path to filename parts = filename.split(os.sep) if parts[-1].startswith(' # Copyright (c) 2009 Gael Varoquaux # License: BSD Style, 3 clauses. import warnings import pickle import hashlib import sys import types import struct import io if sys.version_info[0] < 3: Pickler = pickle.Pickler else: Pickler = pickle._Pickler class _ConsistentSet(object): """ Class used to ensure the hash of Sets is preserved whatever the order of its items. """ def __init__(self, set_sequence): self._sequence = sorted(set_sequence) class _MyHash(object): """ Class used to hash objects that won't normally pickle """ def __init__(self, *args): self.args = args class Hasher(Pickler): """ A subclass of pickler, to do cryptographic hashing, rather than pickling. 
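    A minimal usage sketch (the module-level ``hash`` helper defined below is
    the usual entry point and wraps this class)::

        digest = Hasher(hash_name='md5').hash(('any', 'picklable', 'object'))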
""" def __init__(self, hash_name='md5'): self.stream = io.BytesIO() Pickler.__init__(self, self.stream, protocol=2) # Initialise the hash obj self._hash = hashlib.new(hash_name) def hash(self, obj, return_digest=True): try: self.dump(obj) except pickle.PicklingError as e: warnings.warn('PicklingError while hashing %r: %r' % (obj, e)) dumps = self.stream.getvalue() self._hash.update(dumps) if return_digest: return self._hash.hexdigest() def save(self, obj): if isinstance(obj, (types.MethodType, type({}.pop))): # the Pickler cannot pickle instance methods; here we decompose # them into components that make them uniquely identifiable if hasattr(obj, '__func__'): func_name = obj.__func__.__name__ else: func_name = obj.__name__ inst = obj.__self__ if type(inst) == type(pickle): obj = _MyHash(func_name, inst.__name__) elif inst is None: # type(None) or type(module) do not pickle obj = _MyHash(func_name, inst) else: cls = obj.__self__.__class__ obj = _MyHash(func_name, inst, cls) Pickler.save(self, obj) # The dispatch table of the pickler is not accessible in Python # 3, as these lines are only bugware for IPython, we skip them. def save_global(self, obj, name=None, pack=struct.pack): # We have to override this method in order to deal with objects # defined interactively in IPython that are not injected in # __main__ try: Pickler.save_global(self, obj, name=name, pack=pack) except pickle.PicklingError: Pickler.save_global(self, obj, name=name, pack=pack) module = getattr(obj, "__module__", None) if module == '__main__': my_name = name if my_name is None: my_name = obj.__name__ mod = sys.modules[module] if not hasattr(mod, my_name): # IPython doesn't inject the variables define # interactively in __main__ setattr(mod, my_name, obj) dispatch = Pickler.dispatch.copy() # builtin dispatch[type(len)] = save_global # type dispatch[type(object)] = save_global # classobj dispatch[type(Pickler)] = save_global # function dispatch[type(pickle.dump)] = save_global def _batch_setitems(self, items): # forces order of keys in dict to ensure consistent hash Pickler._batch_setitems(self, iter(sorted(items))) def save_set(self, set_items): # forces order of items in Set to ensure consistent hash Pickler.save(self, _ConsistentSet(set_items)) dispatch[type(set())] = save_set class NumpyHasher(Hasher): """ Special case the hasher for when numpy is loaded. """ def __init__(self, hash_name='md5', coerce_mmap=False): """ Parameters ---------- hash_name: string The hash algorithm to be used coerce_mmap: boolean Make no difference between np.memmap and np.ndarray objects. """ self.coerce_mmap = coerce_mmap Hasher.__init__(self, hash_name=hash_name) # delayed import of numpy, to avoid tight coupling import numpy as np self.np = np if hasattr(np, 'getbuffer'): self._getbuffer = np.getbuffer else: self._getbuffer = memoryview def save(self, obj): """ Subclass the save method, to hash ndarray subclass, rather than pickling them. Off course, this is a total abuse of the Pickler class. """ if isinstance(obj, self.np.ndarray) and not obj.dtype.hasobject: # Compute a hash of the object: try: self._hash.update(self._getbuffer(obj)) except (TypeError, BufferError): # Cater for non-single-segment arrays: this creates a # copy, and thus aleviates this issue. # XXX: There might be a more efficient way of doing this self._hash.update(self._getbuffer(obj.flatten())) # We store the class, to be able to distinguish between # Objects with the same binary content, but different # classes. 
if self.coerce_mmap and isinstance(obj, self.np.memmap): # We don't make the difference between memmap and # normal ndarrays, to be able to reload previously # computed results with memmap. klass = self.np.ndarray else: klass = obj.__class__ # We also return the dtype and the shape, to distinguish # different views on the same data with different dtypes. # The object will be pickled by the pickler hashed at the end. obj = (klass, ('HASHED', obj.dtype, obj.shape, obj.strides)) Hasher.save(self, obj) def hash(obj, hash_name='md5', coerce_mmap=False): """ Quick calculation of a hash to identify uniquely Python objects containing numpy arrays. Parameters ----------- hash_name: 'md5' or 'sha1' Hashing algorithm used. sha1 is supposedly safer, but md5 is faster. coerce_mmap: boolean Make no difference between np.memmap and np.ndarray """ if 'numpy' in sys.modules: hasher = NumpyHasher(hash_name=hash_name, coerce_mmap=coerce_mmap) else: hasher = Hasher(hash_name=hash_name) return hasher.hash(obj) joblib-0.7.1/joblib/logger.py000066400000000000000000000116641217450746300160730ustar00rootroot00000000000000""" Helpers for logging. This module needs much love to become useful. """ # Author: Gael Varoquaux # Copyright (c) 2008 Gael Varoquaux # License: BSD Style, 3 clauses. from __future__ import print_function import time import sys import os import shutil import logging import pprint from .disk import mkdirp def _squeeze_time(t): """Remove .1s to the time under Windows: this is the time it take to stat files. This is needed to make results similar to timings under Unix, for tests """ if sys.platform.startswith('win'): return max(0, t - .1) else: return t def format_time(t): t = _squeeze_time(t) return "%.1fs, %.1fmin" % (t, t / 60.) def short_format_time(t): t = _squeeze_time(t) if t > 60: return "%4.1fmin" % (t / 60.) else: return " %5.1fs" % (t) ############################################################################### # class `Logger` ############################################################################### class Logger(object): """ Base class for logging messages. """ def __init__(self, depth=3): """ Parameters ---------- depth: int, optional The depth of objects printed. """ self.depth = depth def warn(self, msg): logging.warn("[%s]: %s" % (self, msg)) def debug(self, msg): # XXX: This conflicts with the debug flag used in children class logging.debug("[%s]: %s" % (self, msg)) def format(self, obj, indent=0): """ Return the formated representation of the object. """ if 'numpy' in sys.modules: import numpy as np print_options = np.get_printoptions() np.set_printoptions(precision=6, threshold=64, edgeitems=1) else: print_options = None out = pprint.pformat(obj, depth=self.depth, indent=indent) if print_options: np.set_printoptions(**print_options) return out ############################################################################### # class `PrintTime` ############################################################################### class PrintTime(object): """ Print and log messages while keeping track of time. 
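    A minimal usage sketch (not run as a doctest)::

        print_time = PrintTime(logfile=None)
        # ... do some work ...
        print_time('step 1 done')        # prints "step 1 done: <elapsed time>"
        print_time('total', total=True)  # time since this PrintTime was created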
""" def __init__(self, logfile=None, logdir=None): if logfile is not None and logdir is not None: raise ValueError('Cannot specify both logfile and logdir') # XXX: Need argument docstring self.last_time = time.time() self.start_time = self.last_time if logdir is not None: logfile = os.path.join(logdir, 'joblib.log') self.logfile = logfile if logfile is not None: mkdirp(os.path.dirname(logfile)) if os.path.exists(logfile): # Rotate the logs for i in range(1, 9): try: shutil.move(logfile + '.%i' % i, logfile + '.%i' % (i + 1)) except: "No reason failing here" # Use a copy rather than a move, so that a process # monitoring this file does not get lost. try: shutil.copy(logfile, logfile + '.1') except: "No reason failing here" try: with open(logfile, 'w') as logfile: logfile.write('\nLogging joblib python script\n') logfile.write('\n---%s---\n' % time.ctime(self.last_time)) except: """ Multiprocessing writing to files can create race conditions. Rather fail silently than crash the computation. """ # XXX: We actually need a debug flag to disable this # silent failure. def __call__(self, msg='', total=False): """ Print the time elapsed between the last call and the current call, with an optional message. """ if not total: time_lapse = time.time() - self.last_time full_msg = "%s: %s" % (msg, format_time(time_lapse)) else: # FIXME: Too much logic duplicated time_lapse = time.time() - self.start_time full_msg = "%s: %.2fs, %.1f min" % (msg, time_lapse, time_lapse / 60) print(full_msg, file=sys.stderr) if self.logfile is not None: try: print >> file(self.logfile, 'a'), full_msg except: """ Multiprocessing writing to files can create race conditions. Rather fail silently than crash the calculation. """ # XXX: We actually need a debug flag to disable this # silent failure. self.last_time = time.time() joblib-0.7.1/joblib/memory.py000066400000000000000000000547271217450746300161330ustar00rootroot00000000000000""" A context object for caching a function's return value each time it is called with the same input arguments. """ # Author: Gael Varoquaux # Copyright (c) 2009 Gael Varoquaux # License: BSD Style, 3 clauses. from __future__ import with_statement import os import shutil import time import pydoc try: import cPickle as pickle except ImportError: import pickle import functools import traceback import warnings import inspect import json # Local imports from .hashing import hash from .func_inspect import get_func_code, get_func_name, filter_args from .logger import Logger, format_time from . import numpy_pickle from .disk import mkdirp, rm_subdirs FIRST_LINE_TEXT = "# first line:" # TODO: The following object should have a data store object as a sub # object, and the interface to persist and query should be separated in # the data store. # # This would enable creating 'Memory' objects with a different logic for # pickling that would simply span a MemorizedFunc with the same # store (or do we want to copy it to avoid cross-talks?), for instance to # implement HDF5 pickling. # TODO: Same remark for the logger, and probably use the Python logging # mechanism. def extract_first_line(func_code): """ Extract the first line information from the function code text if available. """ if func_code.startswith(FIRST_LINE_TEXT): func_code = func_code.split('\n') first_line = int(func_code[0][len(FIRST_LINE_TEXT):]) func_code = '\n'.join(func_code[1:]) else: first_line = -1 return func_code, first_line class JobLibCollisionWarning(UserWarning): """ Warn that there might be a collision between names of functions. 
""" ############################################################################### # class `MemorizedFunc` ############################################################################### class MemorizedFunc(Logger): """ Callable object decorating a function for caching its return value each time it is called. All values are cached on the filesystem, in a deep directory structure. Methods are provided to inspect the cache or clean it. Attributes ---------- func: callable The original, undecorated, function. cachedir: string Path to the base cache directory of the memory context. ignore: list or None List of variable names to ignore when choosing whether to recompute. mmap_mode: {None, 'r+', 'r', 'w+', 'c'} The memmapping mode used when loading from cache numpy arrays. See numpy.load for the meaning of the arguments. compress: boolean Whether to zip the stored data on disk. Note that compressed arrays cannot be read by memmapping. verbose: int, optional The verbosity flag, controls messages that are issued as the function is evaluated. """ #------------------------------------------------------------------------- # Public interface #------------------------------------------------------------------------- def __init__(self, func, cachedir, ignore=None, mmap_mode=None, compress=False, verbose=1, timestamp=None): """ Parameters ---------- func: callable The function to decorate cachedir: string The path of the base directory to use as a data store ignore: list or None List of variable names to ignore. mmap_mode: {None, 'r+', 'r', 'w+', 'c'}, optional The memmapping mode used when loading from cache numpy arrays. See numpy.load for the meaning of the arguments. verbose: int, optional Verbosity flag, controls the debug messages that are issued as functions are evaluated. The higher, the more verbose timestamp: float, optional The reference time from which times in tracing messages are reported. 
""" Logger.__init__(self) self._verbose = verbose self.cachedir = cachedir self.func = func self.mmap_mode = mmap_mode self.compress = compress if compress and mmap_mode is not None: warnings.warn('Compressed results cannot be memmapped', stacklevel=2) if timestamp is None: timestamp = time.time() self.timestamp = timestamp if ignore is None: ignore = [] self.ignore = ignore mkdirp(self.cachedir) try: functools.update_wrapper(self, func) except: " Objects like ufunc don't like that " if inspect.isfunction(func): doc = pydoc.TextDoc().document(func ).replace('\n', '\n\n', 1) else: # Pydoc does a poor job on other objects doc = func.__doc__ self.__doc__ = 'Memoized version of %s' % doc def __call__(self, *args, **kwargs): # Compare the function code with the previous to see if the # function code has changed output_dir, argument_hash = self.get_output_dir(*args, **kwargs) # FIXME: The statements below should be try/excepted if not (self._check_previous_func_code(stacklevel=3) and os.path.exists(output_dir)): if self._verbose > 10: _, name = get_func_name(self.func) self.warn('Computing func %s, argument hash %s in ' 'directory %s' % (name, argument_hash, output_dir)) return self.call(*args, **kwargs) else: try: t0 = time.time() out = self.load_output(output_dir) if self._verbose > 4: t = time.time() - t0 _, name = get_func_name(self.func) msg = '%s cache loaded - %s' % (name, format_time(t)) print(max(0, (80 - len(msg))) * '_' + msg) return out except Exception: # XXX: Should use an exception logger self.warn('Exception while loading results for ' '(args=%s, kwargs=%s)\n %s' % (args, kwargs, traceback.format_exc())) shutil.rmtree(output_dir, ignore_errors=True) return self.call(*args, **kwargs) def __reduce__(self): """ We don't store the timestamp when pickling, to avoid the hash depending from it. In addition, when unpickling, we run the __init__ """ return (self.__class__, (self.func, self.cachedir, self.ignore, self.mmap_mode, self.compress, self._verbose)) #------------------------------------------------------------------------- # Private interface #------------------------------------------------------------------------- def _get_func_dir(self, mkdir=True): """ Get the directory corresponding to the cache for the function. """ module, name = get_func_name(self.func) module.append(name) func_dir = os.path.join(self.cachedir, *module) if mkdir: mkdirp(func_dir) return func_dir def get_output_dir(self, *args, **kwargs): """ Returns the directory in which are persisted the results of the function corresponding to the given arguments. The results can be loaded using the .load_output method. """ coerce_mmap = (self.mmap_mode is not None) argument_hash = hash(filter_args(self.func, self.ignore, args, kwargs), coerce_mmap=coerce_mmap) output_dir = os.path.join(self._get_func_dir(self.func), argument_hash) return output_dir, argument_hash def _write_func_code(self, filename, func_code, first_line): """ Write the function code and the filename to a file. """ func_code = '%s %i\n%s' % (FIRST_LINE_TEXT, first_line, func_code) with open(filename, 'w') as out: out.write(func_code) def _check_previous_func_code(self, stacklevel=2): """ stacklevel is the depth a which this function is called, to issue useful warnings to the user. """ # Here, we go through some effort to be robust to dynamically # changing code and collision. We cannot inspect.getsource # because it is not reliable when using IPython's magic "%run". 
        func_code, source_file, first_line = get_func_code(self.func)
        func_dir = self._get_func_dir()
        func_code_file = os.path.join(func_dir, 'func_code.py')

        try:
            with open(func_code_file) as infile:
                old_func_code, old_first_line = \
                    extract_first_line(infile.read())
        except IOError:
            self._write_func_code(func_code_file, func_code, first_line)
            return False
        if old_func_code == func_code:
            return True

        # We have differing code: is this because we are referring to
        # different functions, or because the function we are referring
        # to has changed?

        _, func_name = get_func_name(self.func, resolv_alias=False,
                                     win_characters=False)
        if old_first_line == first_line == -1 or func_name == '<lambda>':
            if not first_line == -1:
                func_description = '%s (%s:%i)' % (func_name,
                                                   source_file, first_line)
            else:
                func_description = func_name
            warnings.warn(JobLibCollisionWarning(
                "Cannot detect name collisions for function '%s'"
                % func_description), stacklevel=stacklevel)

        # Fetch the code at the old location and compare it. If it is the
        # same as the stored code, we have a collision: the code in the
        # file has not changed, but the name we have is pointing to a new
        # code block.
        if not old_first_line == first_line and source_file is not None:
            possible_collision = False
            if os.path.exists(source_file):
                _, func_name = get_func_name(self.func, resolv_alias=False)
                num_lines = len(func_code.split('\n'))
                with open(source_file) as f:
                    on_disk_func_code = f.readlines()[
                        old_first_line - 1:old_first_line - 1 + num_lines - 1]
                on_disk_func_code = ''.join(on_disk_func_code)
                possible_collision = (on_disk_func_code.rstrip()
                                      == old_func_code.rstrip())
            else:
                possible_collision = source_file.startswith('<doctest ')
            if possible_collision:
                warnings.warn(JobLibCollisionWarning(
                    'Possible name collisions between functions '
                    "'%s' (%s:%i) and '%s' (%s:%i)" %
                    (func_name, source_file, old_first_line,
                     func_name, source_file, first_line)),
                    stacklevel=stacklevel)

        # The function has changed, wipe the cache directory.
        if self._verbose > 10:
            _, func_name = get_func_name(self.func, resolv_alias=False)
            self.warn("Function %s (stored in %s) has changed."
                      % (func_name, func_dir))
        self.clear(warn=True)
        return False

    def clear(self, warn=True):
        """ Empty the function's cache.
        """
        func_dir = self._get_func_dir(mkdir=False)
        if self._verbose and warn:
            self.warn("Clearing cache %s" % func_dir)
        if os.path.exists(func_dir):
            shutil.rmtree(func_dir, ignore_errors=True)
        mkdirp(func_dir)
        func_code, _, first_line = get_func_code(self.func)
        func_code_file = os.path.join(func_dir, 'func_code.py')
        self._write_func_code(func_code_file, func_code, first_line)

    def call(self, *args, **kwargs):
        """ Force the execution of the function with the given arguments and
            persist the output values.
        """
        start_time = time.time()
        output_dir, argument_hash = self.get_output_dir(*args, **kwargs)
        if self._verbose:
            print(self.format_call(*args, **kwargs))
        output = self.func(*args, **kwargs)
        self._persist_output(output, output_dir)
        self._persist_input(output_dir, *args, **kwargs)
        duration = time.time() - start_time
        if self._verbose:
            _, name = get_func_name(self.func)
            msg = '%s - %s' % (name, format_time(duration))
            print(max(0, (80 - len(msg))) * '_' + msg)
        return output

    def format_call(self, *args, **kwds):
        """ Returns a nicely formatted statement displaying the function
            call with the given arguments.
        """
        path, signature = self.format_signature(self.func, *args, **kwds)
        msg = '%s\n[Memory] Calling %s...\n%s' % (80 * '_', path, signature)
        return msg
        # XXX: Not using logging framework
        #self.debug(msg)

    def format_signature(self, func, *args, **kwds):
        # XXX: This should be moved out to a function
        # XXX: Should this use inspect.formatargvalues/formatargspec?
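        # Builds a human-readable description of the call and returns a
        # (module_path, signature) pair, e.g. ('mymodule.myfunc',
        # 'myfunc(1, y=2)'); the example values are purely illustrative.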
module, name = get_func_name(func) module = [m for m in module if m] if module: module.append(name) module_path = '.'.join(module) else: module_path = name arg_str = list() previous_length = 0 for arg in args: arg = self.format(arg, indent=2) if len(arg) > 1500: arg = '%s...' % arg[:700] if previous_length > 80: arg = '\n%s' % arg previous_length = len(arg) arg_str.append(arg) arg_str.extend(['%s=%s' % (v, self.format(i)) for v, i in kwds.items()]) arg_str = ', '.join(arg_str) signature = '%s(%s)' % (name, arg_str) return module_path, signature # Make make public def _persist_output(self, output, dir): """ Persist the given output tuple in the directory. """ try: mkdirp(dir) filename = os.path.join(dir, 'output.pkl') numpy_pickle.dump(output, filename, compress=self.compress) if self._verbose > 10: print('Persisting in %s' % dir) except OSError: " Race condition in the creation of the directory " def _persist_input(self, output_dir, *args, **kwargs): """ Save a small summary of the call using json format in the output directory. """ argument_dict = filter_args(self.func, self.ignore, args, kwargs) input_repr = dict((k, repr(v)) for k, v in argument_dict.items()) # This can fail do to race-conditions with multiple # concurrent joblibs removing the file or the directory try: mkdirp(output_dir) json.dump( input_repr, file(os.path.join(output_dir, 'input_args.json'), 'w'), ) except: pass return input_repr def load_output(self, output_dir): """ Read the results of a previous calculation from the directory it was cached in. """ if self._verbose > 1: t = time.time() - self.timestamp if self._verbose < 10: print('[Memory]% 16s: Loading %s...' % ( format_time(t), self.format_signature(self.func)[0] )) else: print('[Memory]% 16s: Loading %s from %s' % ( format_time(t), self.format_signature(self.func)[0], output_dir )) filename = os.path.join(output_dir, 'output.pkl') return numpy_pickle.load(filename, mmap_mode=self.mmap_mode) # XXX: Need a method to check if results are available. #------------------------------------------------------------------------- # Private `object` interface #------------------------------------------------------------------------- def __repr__(self): return '%s(func=%s, cachedir=%s)' % ( self.__class__.__name__, self.func, repr(self.cachedir), ) ############################################################################### # class `Memory` ############################################################################### class Memory(Logger): """ A context object for caching a function's return value each time it is called with the same input arguments. All values are cached on the filesystem, in a deep directory structure. see :ref:`memory_reference` """ #------------------------------------------------------------------------- # Public interface #------------------------------------------------------------------------- def __init__(self, cachedir, mmap_mode=None, compress=False, verbose=1): """ Parameters ---------- cachedir: string or None The path of the base directory to use as a data store or None. If None is given, no caching is done and the Memory object is completely transparent. mmap_mode: {None, 'r+', 'r', 'w+', 'c'}, optional The memmapping mode used when loading from cache numpy arrays. See numpy.load for the meaning of the arguments. compress: boolean Whether to zip the stored data on disk. Note that compressed arrays cannot be read by memmapping. verbose: int, optional Verbosity flag, controls the debug messages that are issued as functions are evaluated. 
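
            Examples
            --------
            A short sketch; the throw-away temporary directory is chosen
            here only for illustration::

                >>> from tempfile import mkdtemp
                >>> memory = Memory(cachedir=mkdtemp(), verbose=0)
                >>> @memory.cache
                ... def square(x):
                ...     return x ** 2
                >>> square(3)
                9
                >>> square(3)  # loaded from the cache, square is not re-run
                9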
""" # XXX: Bad explanation of the None value of cachedir Logger.__init__(self) self._verbose = verbose self.mmap_mode = mmap_mode self.timestamp = time.time() self.compress = compress if compress and mmap_mode is not None: warnings.warn('Compressed results cannot be memmapped', stacklevel=2) if cachedir is None: self.cachedir = None else: self.cachedir = os.path.join(cachedir, 'joblib') mkdirp(self.cachedir) def cache(self, func=None, ignore=None, verbose=None, mmap_mode=False): """ Decorates the given function func to only compute its return value for input arguments not cached on disk. Parameters ---------- func: callable, optional The function to be decorated ignore: list of strings A list of arguments name to ignore in the hashing verbose: integer, optional The verbosity mode of the function. By default that of the memory object is used. mmap_mode: {None, 'r+', 'r', 'w+', 'c'}, optional The memmapping mode used when loading from cache numpy arrays. See numpy.load for the meaning of the arguments. By default that of the memory object is used. Returns ------- decorated_func: MemorizedFunc object The returned object is a MemorizedFunc object, that is callable (behaves like a function), but offers extra methods for cache lookup and management. See the documentation for :class:`joblib.memory.MemorizedFunc`. """ if func is None: # Partial application, to be able to specify extra keyword # arguments in decorators return functools.partial(self.cache, ignore=ignore) if self.cachedir is None: return func if verbose is None: verbose = self._verbose if mmap_mode is False: mmap_mode = self.mmap_mode if isinstance(func, MemorizedFunc): func = func.func return MemorizedFunc(func, cachedir=self.cachedir, mmap_mode=mmap_mode, ignore=ignore, compress=self.compress, verbose=verbose, timestamp=self.timestamp) def clear(self, warn=True): """ Erase the complete cache directory. """ if warn: self.warn('Flushing completely the cache') rm_subdirs(self.cachedir) def eval(self, func, *args, **kwargs): """ Eval function func with arguments `*args` and `**kwargs`, in the context of the memory. This method works similarly to the builtin `apply`, except that the function is called only if the cache is not up to date. """ if self.cachedir is None: return func(*args, **kwargs) return self.cache(func)(*args, **kwargs) #------------------------------------------------------------------------- # Private `object` interface #------------------------------------------------------------------------- def __repr__(self): return '%s(cachedir=%s)' % ( self.__class__.__name__, repr(self.cachedir), ) def __reduce__(self): """ We don't store the timestamp when pickling, to avoid the hash depending from it. In addition, when unpickling, we run the __init__ """ # We need to remove 'joblib' from the end of cachedir cachedir = self.cachedir[:-7] if self.cachedir is not None else None return (self.__class__, (cachedir, self.mmap_mode, self.compress, self._verbose)) joblib-0.7.1/joblib/my_exceptions.py000066400000000000000000000053411217450746300174750ustar00rootroot00000000000000""" Exceptions """ # Author: Gael Varoquaux < gael dot varoquaux at normalesup dot org > # Copyright: 2010, Gael Varoquaux # License: BSD 3 clause import sys class JoblibException(Exception): """ A simple exception with an error message that you can get to. 
""" def __init__(self, message): self.message = message def __reduce__(self): # For pickling return self.__class__, (self.message,), {} def __repr__(self): return '%s\n%s\n%s\n%s' % ( self.__class__.__name__, 75 * '_', self.message, 75 * '_') __str__ = __repr__ class TransportableException(JoblibException): """ An exception containing all the info to wrap an original exception and recreate it. """ def __init__(self, message, etype): self.message = message self.etype = etype def __reduce__(self): # For pickling return self.__class__, (self.message, self.etype), {} _exception_mapping = dict() def _mk_exception(exception, name=None): # Create an exception inheriting from both JoblibException # and that exception if name is None: name = exception.__name__ this_name = 'Joblib%s' % name if this_name in _exception_mapping: # Avoid creating twice the same exception this_exception = _exception_mapping[this_name] else: this_exception = type(this_name, (exception, JoblibException), dict(__repr__=JoblibException.__repr__, __str__=JoblibException.__str__), ) _exception_mapping[this_name] = this_exception return this_exception, this_name def _mk_common_exceptions(): namespace = dict() if sys.version_info[0] == 3: import builtins as _builtin_exceptions common_exceptions = filter( lambda x: x.endswith('Error'), dir(_builtin_exceptions)) else: import exceptions as _builtin_exceptions common_exceptions = dir(_builtin_exceptions) for name in common_exceptions: obj = getattr(_builtin_exceptions, name) if isinstance(obj, type) and issubclass(obj, BaseException): try: this_obj, this_name = _mk_exception(obj, name=name) namespace[this_name] = this_obj except TypeError: # Cannot create a consistent method resolution order: # a class that we can't subclass properly, probably # BaseException pass return namespace # Updating module locals so that the exceptions pickle right. AFAIK this # works only at module-creation time locals().update(_mk_common_exceptions()) joblib-0.7.1/joblib/numpy_pickle.py000066400000000000000000000360331217450746300173100ustar00rootroot00000000000000""" Utilities for fast persistence of big data, with optional compression. """ # Author: Gael Varoquaux # Copyright (c) 2009 Gael Varoquaux # License: BSD Style, 3 clauses. import pickle import traceback import sys import os import zlib import warnings from ._compat import _basestring from io import BytesIO if sys.version_info[0] >= 3: Unpickler = pickle._Unpickler Pickler = pickle._Pickler def asbytes(s): if isinstance(s, bytes): return s return s.encode('latin1') else: Unpickler = pickle.Unpickler Pickler = pickle.Pickler asbytes = str _MEGA = 2 ** 20 _MAX_LEN = len(hex(2 ** 64)) # To detect file types _ZFILE_PREFIX = asbytes('ZF') ############################################################################### # Compressed file with Zlib def _read_magic(file_handle): """ Utility to check the magic signature of a file identifying it as a Zfile """ magic = file_handle.read(len(_ZFILE_PREFIX)) # Pickling needs file-handles at the beginning of the file file_handle.seek(0) return magic def read_zfile(file_handle): """Read the z-file and return the content as a string Z-files are raw data compressed with zlib used internally by joblib for persistence. Backward compatibility is not guaranteed. Do not use for external purposes. 
""" file_handle.seek(0) assert _read_magic(file_handle) == _ZFILE_PREFIX, \ "File does not have the right magic" length = file_handle.read(len(_ZFILE_PREFIX) + _MAX_LEN) length = length[len(_ZFILE_PREFIX):] length = int(length, 16) # We use the known length of the data to tell Zlib the size of the # buffer to allocate. data = zlib.decompress(file_handle.read(), 15, length) assert len(data) == length, ( "Incorrect data length while decompressing %s." "The file could be corrupted." % file_handle) return data def write_zfile(file_handle, data, compress=1): """Write the data in the given file as a Z-file. Z-files are raw data compressed with zlib used internally by joblib for persistence. Backward compatibility is not guarantied. Do not use for external purposes. """ file_handle.write(_ZFILE_PREFIX) length = hex(len(data)) if sys.version_info[0] < 3 and type(length) is long: # We need to remove the trailing 'L' in the hex representation length = length[:-1] # Store the length of the data file_handle.write(asbytes(length.ljust(_MAX_LEN))) file_handle.write(zlib.compress(asbytes(data), compress)) ############################################################################### # Utility objects for persistence. class NDArrayWrapper(object): """ An object to be persisted instead of numpy arrays. The only thing this object does, is to carry the filename in which the array has been persisted, and the array subclass. """ def __init__(self, filename, subclass): "Store the useful information for later" self.filename = filename self.subclass = subclass def read(self, unpickler): "Reconstruct the array" filename = os.path.join(unpickler._dirname, self.filename) # Load the array from the disk if unpickler.np.__version__ >= '1.3': array = unpickler.np.load(filename, mmap_mode=unpickler.mmap_mode) else: # Numpy does not have mmap_mode before 1.3 array = unpickler.np.load(filename) # Reconstruct subclasses. This does not work with old # versions of numpy if (hasattr(array, '__array_prepare__') and not self.subclass in (unpickler.np.ndarray, unpickler.np.memmap)): # We need to reconstruct another subclass new_array = unpickler.np.core.multiarray._reconstruct( self.subclass, (0,), 'b') new_array.__array_prepare__(array) array = new_array return array #def __reduce__(self): # return None class ZNDArrayWrapper(NDArrayWrapper): """An object to be persisted instead of numpy arrays. This object store the Zfile filename in which the data array has been persisted, and the meta information to retrieve it. The reason that we store the raw buffer data of the array and the meta information, rather than array representation routine (tostring) is that it enables us to use completely the strided model to avoid memory copies (a and a.T store as fast). In addition saving the heavy information separately can avoid creating large temporary buffers when unpickling data with large arrays. 
""" def __init__(self, filename, init_args, state): "Store the useful information for later" self.filename = filename self.state = state self.init_args = init_args def read(self, unpickler): "Reconstruct the array from the meta-information and the z-file" # Here we a simply reproducing the unpickling mechanism for numpy # arrays filename = os.path.join(unpickler._dirname, self.filename) array = unpickler.np.core.multiarray._reconstruct(*self.init_args) data = read_zfile(open(filename, 'rb')) state = self.state + (data,) array.__setstate__(state) return array ############################################################################### # Pickler classes class NumpyPickler(Pickler): """A pickler to persist of big data efficiently. The main features of this object are: * persistence of numpy arrays in separate .npy files, for which I/O is fast. * optional compression using Zlib, with a special care on avoid temporaries. """ def __init__(self, filename, compress=0, cache_size=100): self._filename = filename self._filenames = [filename, ] self.cache_size = cache_size self.compress = compress if not self.compress: self.file = open(filename, 'wb') else: self.file = BytesIO() # Count the number of npy files that we have created: self._npy_counter = 0 Pickler.__init__(self, self.file, protocol=pickle.HIGHEST_PROTOCOL) # delayed import of numpy, to avoid tight coupling try: import numpy as np except ImportError: np = None self.np = np def _write_array(self, array, filename): if not self.compress: self.np.save(filename, array) container = NDArrayWrapper(os.path.basename(filename), type(array)) else: filename += '.z' # Efficient compressed storage: # The meta data is stored in the container, and the core # numerics in a z-file _, init_args, state = array.__reduce__() # the last entry of 'state' is the data itself zfile = open(filename, 'wb') write_zfile(zfile, state[-1], compress=self.compress) zfile.close() state = state[:-1] container = ZNDArrayWrapper(os.path.basename(filename), init_args, state) return container, filename def save(self, obj): """ Subclass the save method, to save ndarray subclasses in npy files, rather than pickling them. Of course, this is a total abuse of the Pickler class. """ if self.np is not None and type(obj) in (self.np.ndarray, self.np.matrix, self.np.memmap): size = obj.size * obj.itemsize if self.compress and size < self.cache_size * _MEGA: # When compressing, as we are not writing directly to the # disk, it is more efficient to use standard pickling if type(obj) is self.np.memmap: # Pickling doesn't work with memmaped arrays obj = self.np.asarray(obj) return Pickler.save(self, obj) self._npy_counter += 1 try: filename = '%s_%02i.npy' % (self._filename, self._npy_counter) # This converts the array in a container obj, filename = self._write_array(obj, filename) self._filenames.append(filename) except: self._npy_counter -= 1 # XXX: We should have a logging mechanism print('Failed to save %s to .npy file:\n%s' % ( type(obj), traceback.format_exc())) return Pickler.save(self, obj) def close(self): if self.compress: zfile = open(self._filename, 'wb') write_zfile(zfile, self.file.getvalue(), self.compress) zfile.close() class NumpyUnpickler(Unpickler): """A subclass of the Unpickler to unpickle our numpy pickles. 
""" dispatch = Unpickler.dispatch.copy() def __init__(self, filename, file_handle, mmap_mode=None): self._filename = os.path.basename(filename) self._dirname = os.path.dirname(filename) self.mmap_mode = mmap_mode self.file_handle = self._open_pickle(file_handle) Unpickler.__init__(self, self.file_handle) try: import numpy as np except ImportError: np = None self.np = np def _open_pickle(self, file_handle): return file_handle def load_build(self): """ This method is called to set the state of a newly created object. We capture it to replace our place-holder objects, NDArrayWrapper, by the array we are interested in. We replace them directly in the stack of pickler. """ Unpickler.load_build(self) if isinstance(self.stack[-1], NDArrayWrapper): if self.np is None: raise ImportError('Trying to unpickle an ndarray, ' "but numpy didn't import correctly") nd_array_wrapper = self.stack.pop() array = nd_array_wrapper.read(self) self.stack.append(array) # Be careful to register our new method. if sys.version_info[0] >= 3: dispatch[pickle.BUILD[0]] = load_build else: dispatch[pickle.BUILD] = load_build class ZipNumpyUnpickler(NumpyUnpickler): """A subclass of our Unpickler to unpickle on the fly from compressed storage.""" def __init__(self, filename, file_handle): NumpyUnpickler.__init__(self, filename, file_handle, mmap_mode=None) def _open_pickle(self, file_handle): return BytesIO(read_zfile(file_handle)) ############################################################################### # Utility functions def dump(value, filename, compress=0, cache_size=100): """Fast persistence of an arbitrary Python object into a files, with dedicated storage for numpy arrays. Parameters ----------- value: any Python object The object to store to disk filename: string The name of the file in which it is to be stored compress: integer for 0 to 9, optional Optional compression level for the data. 0 is no compression. Higher means more compression, but also slower read and write times. Using a value of 3 is often a good compromise. See the notes for more details. cache_size: positive number, optional Fixes the order of magnitude (in megabytes) of the cache used for in-memory compression. Note that this is just an order of magnitude estimate and that for big arrays, the code will go over this value at dump and at load time. Returns ------- filenames: list of strings The list of file names in which the data is stored. If compress is false, each array is stored in a different file. See Also -------- joblib.load : corresponding loader Notes ----- Memmapping on load cannot be used for compressed files. Thus using compression can significantly slow down loading. In addition, compressed files take extra extra memory during dump and load. """ if not isinstance(filename, _basestring): # People keep inverting arguments, and the resulting error is # incomprehensible raise ValueError( 'Second argument should be a filename, %s (type %s) was given' % (filename, type(filename)) ) try: pickler = NumpyPickler(filename, compress=compress, cache_size=cache_size) pickler.dump(value) pickler.close() finally: if 'pickler' in locals() and hasattr(pickler, 'file'): pickler.file.flush() pickler.file.close() return pickler._filenames def load(filename, mmap_mode=None): """Reconstruct a Python object from a file persisted with joblib.load. Parameters ----------- filename: string The name of the file from which to load the object mmap_mode: {None, 'r+', 'r', 'w+', 'c'}, optional If not None, the arrays are memory-mapped from the disk. 
This mode has not effect for compressed files. Note that in this case the reconstructed object might not longer match exactly the originally pickled object. Returns ------- result: any Python object The object stored in the file. See Also -------- joblib.dump : function to save an object Notes ----- This function can load numpy array files saved separately during the dump. If the mmap_mode argument is given, it is passed to np.load and arrays are loaded as memmaps. As a consequence, the reconstructed object might not match the original pickled object. Note that if the file was saved with compression, the arrays cannot be memmaped. """ file_handle = open(filename, 'rb') # We are careful to open the file handle early and keep it open to # avoid race-conditions on renames. That said, if data are stored in # companion files, moving the directory will create a race when # joblib tries to access the companion files. if _read_magic(file_handle) == _ZFILE_PREFIX: if mmap_mode is not None: warnings.warn('file "%(filename)s" appears to be a zip, ' 'ignoring mmap_mode "%(mmap_mode)s" flag passed' % locals(), Warning, stacklevel=2) unpickler = ZipNumpyUnpickler(filename, file_handle=file_handle) else: unpickler = NumpyUnpickler(filename, file_handle=file_handle, mmap_mode=mmap_mode) try: obj = unpickler.load() finally: if hasattr(unpickler, 'file_handle'): unpickler.file_handle.close() return obj joblib-0.7.1/joblib/parallel.py000066400000000000000000000522511217450746300164050ustar00rootroot00000000000000""" Helpers for embarrassingly parallel code. """ # Author: Gael Varoquaux < gael dot varoquaux at normalesup dot org > # Copyright: 2010, Gael Varoquaux # License: BSD 3 clause import os import sys import warnings from collections import Sized from math import sqrt import functools import time import threading import itertools try: import cPickle as pickle except: import pickle # Obtain possible configuration from the environment, assuming 1 (on) # by default, upon 0 set to None. Should instructively fail if some non # 0/1 value is set. multiprocessing = int(os.environ.get('JOBLIB_MULTIPROCESSING', 1)) or None if multiprocessing: try: import multiprocessing except ImportError: multiprocessing = None # 2nd stage: validate that locking is available on the system and # issue a warning if not if multiprocessing: try: _sem = multiprocessing.Semaphore() del _sem # cleanup except (ImportError, OSError) as e: multiprocessing = None warnings.warn('%s. joblib will operate in serial mode' % (e,)) from .format_stack import format_exc, format_outer_frames from .logger import Logger, short_format_time from .my_exceptions import TransportableException, _mk_exception ############################################################################### # CPU that works also when multiprocessing is not installed (python2.5) def cpu_count(): """ Return the number of CPUs. """ if multiprocessing is None: return 1 return multiprocessing.cpu_count() ############################################################################### # For verbosity def _verbosity_filter(index, verbose): """ Returns False for indices increasingly apart, the distance depending on the value of verbose. 
We use a lag increasing as the square of index """ if not verbose: return True elif verbose > 10: return False if index == 0: return False verbose = .5 * (11 - verbose) ** 2 scale = sqrt(index / verbose) next_scale = sqrt((index + 1) / verbose) return (int(next_scale) == int(scale)) ############################################################################### class WorkerInterrupt(Exception): """ An exception that is not KeyboardInterrupt to allow subprocesses to be interrupted. """ pass ############################################################################### class SafeFunction(object): """ Wraps a function to make it exception with full traceback in their representation. Useful for parallel computing with multiprocessing, for which exceptions cannot be captured. """ def __init__(self, func): self.func = func def __call__(self, *args, **kwargs): try: return self.func(*args, **kwargs) except KeyboardInterrupt: # We capture the KeyboardInterrupt and reraise it as # something different, as multiprocessing does not # interrupt processing for a KeyboardInterrupt raise WorkerInterrupt() except: e_type, e_value, e_tb = sys.exc_info() text = format_exc(e_type, e_value, e_tb, context=10, tb_offset=1) raise TransportableException(text, e_type) ############################################################################### def delayed(function): """ Decorator used to capture the arguments of a function. """ # Try to pickle the input function, to catch the problems early when # using with multiprocessing pickle.dumps(function) def delayed_function(*args, **kwargs): return function, args, kwargs try: delayed_function = functools.wraps(function)(delayed_function) except AttributeError: " functools.wraps fails on some callable objects " return delayed_function ############################################################################### class ImmediateApply(object): """ A non-delayed apply function. """ def __init__(self, func, args, kwargs): # Don't delay the application, to avoid keeping the input # arguments in memory self.results = func(*args, **kwargs) def get(self): return self.results ############################################################################### class CallBack(object): """ Callback used by parallel: it is used for progress reporting, and to add data to be processed """ def __init__(self, index, parallel): self.parallel = parallel self.index = index def __call__(self, out): self.parallel.print_progress(self.index) if self.parallel._iterable: self.parallel.dispatch_next() ############################################################################### class Parallel(Logger): ''' Helper class for readable parallel mapping. Parameters ----------- n_jobs: int The number of jobs to use for the computation. If -1 all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one are used. verbose: int, optional The verbosity level: if non zero, progress messages are printed. Above 50, the output is sent to stdout. The frequency of the messages increases with the verbosity level. If it more than 10, all iterations are reported. pre_dispatch: {'all', integer, or expression, as in '3*n_jobs'} The amount of jobs to be pre-dispatched. Default is 'all', but it may be memory consuming, for instance if each job involves a lot of a data. 
Notes ----- This object uses the multiprocessing module to compute in parallel the application of a function to many different arguments. The main functionality it brings in addition to using the raw multiprocessing API are (see examples for details): * More readable code, in particular since it avoids constructing list of arguments. * Easier debugging: - informative tracebacks even when the error happens on the client side - using 'n_jobs=1' enables to turn off parallel computing for debugging without changing the codepath - early capture of pickling errors * An optional progress meter. * Interruption of multiprocesses jobs with 'Ctrl-C' Examples -------- A simple example: >>> from math import sqrt >>> from joblib import Parallel, delayed >>> Parallel(n_jobs=1)(delayed(sqrt)(i**2) for i in range(10)) [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0] Reshaping the output when the function has several return values: >>> from math import modf >>> from joblib import Parallel, delayed >>> r = Parallel(n_jobs=1)(delayed(modf)(i/2.) for i in range(10)) >>> res, i = zip(*r) >>> res (0.0, 0.5, 0.0, 0.5, 0.0, 0.5, 0.0, 0.5, 0.0, 0.5) >>> i (0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0) The progress meter: the higher the value of `verbose`, the more messages:: >>> from time import sleep >>> from joblib import Parallel, delayed >>> r = Parallel(n_jobs=2, verbose=5)(delayed(sleep)(.1) for _ in range(10)) #doctest: +SKIP [Parallel(n_jobs=2)]: Done 1 out of 10 | elapsed: 0.1s remaining: 0.9s [Parallel(n_jobs=2)]: Done 3 out of 10 | elapsed: 0.2s remaining: 0.5s [Parallel(n_jobs=2)]: Done 6 out of 10 | elapsed: 0.3s remaining: 0.2s [Parallel(n_jobs=2)]: Done 9 out of 10 | elapsed: 0.5s remaining: 0.1s [Parallel(n_jobs=2)]: Done 10 out of 10 | elapsed: 0.5s finished Traceback example, note how the line of the error is indicated as well as the values of the parameter passed to the function that triggered the exception, even though the traceback happens in the child process:: >>> from heapq import nlargest >>> from joblib import Parallel, delayed >>> Parallel(n_jobs=2)(delayed(nlargest)(2, n) for n in (range(4), 'abcde', 3)) #doctest: +SKIP #... --------------------------------------------------------------------------- Sub-process traceback: --------------------------------------------------------------------------- TypeError Mon Nov 12 11:37:46 2012 PID: 12934 Python 2.7.3: /usr/bin/python ........................................................................... /usr/lib/python2.7/heapq.pyc in nlargest(n=2, iterable=3, key=None) 419 if n >= size: 420 return sorted(iterable, key=key, reverse=True)[:n] 421 422 # When key is none, use simpler decoration 423 if key is None: --> 424 it = izip(iterable, count(0,-1)) # decorate 425 result = _nlargest(n, it) 426 return map(itemgetter(0), result) # undecorate 427 428 # General case, slowest method TypeError: izip argument #1 must support iteration ___________________________________________________________________________ Using pre_dispatch in a producer/consumer situation, where the data is generated on the fly. Note how the producer is first called a 3 times before the parallel loop is initiated, and then called to generate new data on the fly. In this case the total number of iterations cannot be reported in the progress messages:: >>> from math import sqrt >>> from joblib import Parallel, delayed >>> def producer(): ... for i in range(6): ... print('Produced %s' % i) ... yield i >>> out = Parallel(n_jobs=2, verbose=100, pre_dispatch='1.5*n_jobs')( ... 
delayed(sqrt)(i) for i in producer()) #doctest: +SKIP Produced 0 Produced 1 Produced 2 [Parallel(n_jobs=2)]: Done 1 jobs | elapsed: 0.0s Produced 3 [Parallel(n_jobs=2)]: Done 2 jobs | elapsed: 0.0s Produced 4 [Parallel(n_jobs=2)]: Done 3 jobs | elapsed: 0.0s Produced 5 [Parallel(n_jobs=2)]: Done 4 jobs | elapsed: 0.0s [Parallel(n_jobs=2)]: Done 5 out of 6 | elapsed: 0.0s remaining: 0.0s [Parallel(n_jobs=2)]: Done 6 out of 6 | elapsed: 0.0s finished ''' def __init__(self, n_jobs=1, verbose=0, pre_dispatch='all'): self.verbose = verbose self.n_jobs = n_jobs self.pre_dispatch = pre_dispatch self._pool = None # Not starting the pool in the __init__ is a design decision, to be # able to close it ASAP, and not burden the user with closing it. self._output = None self._jobs = list() # A flag used to abort the dispatching of jobs in case an # exception is found self._aborting = False def dispatch(self, func, args, kwargs): """ Queue the function for computing, with or without multiprocessing """ if self._pool is None: job = ImmediateApply(func, args, kwargs) index = len(self._jobs) if not _verbosity_filter(index, self.verbose): self._print('Done %3i jobs | elapsed: %s', (index + 1, short_format_time(time.time() - self._start_time) )) self._jobs.append(job) self.n_dispatched += 1 else: # If job.get() catches an exception, it closes the queue: if self._aborting: return try: self._lock.acquire() job = self._pool.apply_async(SafeFunction(func), args, kwargs, callback=CallBack(self.n_dispatched, self)) self._jobs.append(job) self.n_dispatched += 1 except AssertionError: print('[Parallel] Pool seems closed') finally: self._lock.release() def dispatch_next(self): """ Dispatch more data for parallel processing """ self._dispatch_amount += 1 while self._dispatch_amount: try: # XXX: possible race condition shuffling the order of # dispatches in the next two lines. func, args, kwargs = next(self._iterable) self.dispatch(func, args, kwargs) self._dispatch_amount -= 1 except ValueError: """ Race condition in accessing a generator, we skip, the dispatch will be done later. """ except StopIteration: self._iterable = None return def _print(self, msg, msg_args): """ Display the message on stout or stderr depending on verbosity """ # XXX: Not using the logger framework: need to # learn to use logger better. if not self.verbose: return if self.verbose < 50: writer = sys.stderr.write else: writer = sys.stdout.write msg = msg % msg_args writer('[%s]: %s\n' % (self, msg)) def print_progress(self, index): """Display the process of the parallel execution only a fraction of time, controlled by self.verbose. 
""" if not self.verbose: return elapsed_time = time.time() - self._start_time # This is heuristic code to print only 'verbose' times a messages # The challenge is that we may not know the queue length if self._iterable: if _verbosity_filter(index, self.verbose): return self._print('Done %3i jobs | elapsed: %s', (index + 1, short_format_time(elapsed_time), )) else: # We are finished dispatching queue_length = self.n_dispatched # We always display the first loop if not index == 0: # Display depending on the number of remaining items # A message as soon as we finish dispatching, cursor is 0 cursor = (queue_length - index + 1 - self._pre_dispatch_amount) frequency = (queue_length // self.verbose) + 1 is_last_item = (index + 1 == queue_length) if (is_last_item or cursor % frequency): return remaining_time = (elapsed_time / (index + 1) * (self.n_dispatched - index - 1.)) self._print('Done %3i out of %3i | elapsed: %s remaining: %s', (index + 1, queue_length, short_format_time(elapsed_time), short_format_time(remaining_time), )) def retrieve(self): self._output = list() while self._jobs: # We need to be careful: the job queue can be filling up as # we empty it if hasattr(self, '_lock'): self._lock.acquire() job = self._jobs.pop(0) if hasattr(self, '_lock'): self._lock.release() try: self._output.append(job.get()) except tuple(self.exceptions) as exception: try: self._aborting = True self._lock.acquire() if isinstance(exception, (KeyboardInterrupt, WorkerInterrupt)): # We have captured a user interruption, clean up # everything if hasattr(self, '_pool'): self._pool.close() self._pool.terminate() # We can now allow subprocesses again os.environ.pop('__JOBLIB_SPAWNED_PARALLEL__', 0) raise exception elif isinstance(exception, TransportableException): # Capture exception to add information on the local # stack in addition to the distant stack this_report = format_outer_frames(context=10, stack_start=1) report = """Multiprocessing exception: %s --------------------------------------------------------------------------- Sub-process traceback: --------------------------------------------------------------------------- %s""" % ( this_report, exception.message, ) # Convert this to a JoblibException exception_type = _mk_exception(exception.etype)[0] raise exception_type(report) raise exception finally: self._lock.release() def __call__(self, iterable): if self._jobs: raise ValueError('This Parallel instance is already running') n_jobs = self.n_jobs if n_jobs == 0: raise ValueError('n_jobs == 0 in Parallel has no meaning') if n_jobs < 0 and multiprocessing is not None: n_jobs = max(multiprocessing.cpu_count() + 1 + n_jobs, 1) # The list of exceptions that we will capture self.exceptions = [TransportableException] if n_jobs is None or multiprocessing is None or n_jobs == 1: n_jobs = 1 self._pool = None else: if multiprocessing.current_process()._daemonic: # Daemonic processes cannot have children n_jobs = 1 self._pool = None warnings.warn( 'Parallel loops cannot be nested, setting n_jobs=1', stacklevel=2) else: already_forked = int(os.environ.get('__JOBLIB_SPAWNED_PARALLEL__', 0)) if already_forked: raise ImportError('[joblib] Attempting to do parallel computing' 'without protecting your import on a system that does ' 'not support forking. To use parallel-computing in a ' 'script, you must protect you main loop using "if ' "__name__ == '__main__'" '". 
Please see the joblib documentation on Parallel ' 'for more information' ) # Set an environment variable to avoid infinite loops os.environ['__JOBLIB_SPAWNED_PARALLEL__'] = '1' self._pool = multiprocessing.Pool(n_jobs) self._lock = threading.Lock() # We are using multiprocessing, we also want to capture # KeyboardInterrupts self.exceptions.extend([KeyboardInterrupt, WorkerInterrupt]) pre_dispatch = self.pre_dispatch if isinstance(iterable, Sized): # We are given a sized (an object with len). No need to be lazy. pre_dispatch = 'all' if pre_dispatch == 'all' or n_jobs == 1: self._iterable = None self._pre_dispatch_amount = 0 else: self._iterable = iterable self._dispatch_amount = 0 if hasattr(pre_dispatch, 'endswith'): pre_dispatch = eval(pre_dispatch) self._pre_dispatch_amount = pre_dispatch = int(pre_dispatch) iterable = itertools.islice(iterable, pre_dispatch) self._start_time = time.time() self.n_dispatched = 0 try: for function, args, kwargs in iterable: self.dispatch(function, args, kwargs) self.retrieve() # Make sure that we get a last message telling us we are done elapsed_time = time.time() - self._start_time self._print('Done %3i out of %3i | elapsed: %s finished', (len(self._output), len(self._output), short_format_time(elapsed_time) )) finally: if n_jobs > 1: self._pool.close() self._pool.join() os.environ.pop('__JOBLIB_SPAWNED_PARALLEL__', 0) self._jobs = list() output = self._output self._output = None return output def __repr__(self): return '%s(n_jobs=%s)' % (self.__class__.__name__, self.n_jobs) joblib-0.7.1/joblib/test/000077500000000000000000000000001217450746300152115ustar00rootroot00000000000000joblib-0.7.1/joblib/test/__init__.py000066400000000000000000000000001217450746300173100ustar00rootroot00000000000000joblib-0.7.1/joblib/test/common.py000066400000000000000000000007351217450746300170600ustar00rootroot00000000000000""" Small utilities for testing. """ import nose # A decorator to run tests only when numpy is available try: import numpy as np def with_numpy(func): """ A decorator to skip tests requiring numpy. """ return func except ImportError: def with_numpy(func): """ A decorator to skip tests requiring numpy. """ def my_func(): raise nose.SkipTest('Test requires numpy') return my_func np = None joblib-0.7.1/joblib/test/test_disk.py000066400000000000000000000036011217450746300175540ustar00rootroot00000000000000""" Unit tests for the disk utilities. """ # Authors: Gael Varoquaux # Lars Buitinck # Copyright (c) 2010 Gael Varoquaux # License: BSD Style, 3 clauses. from __future__ import with_statement import os import shutil import array from tempfile import mkdtemp import nose from ..disk import disk_used, memstr_to_kbytes, mkdirp ############################################################################### def test_disk_used(): cachedir = mkdtemp() try: if os.path.exists(cachedir): shutil.rmtree(cachedir) os.mkdir(cachedir) # Not write a file that is 1M big in this directory, and check the # size. The reason we use such a big file is that it makes us robust # to errors due to block allocation. 
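        # Build the ~1M payload with the standard library array module
        # (no numpy dependency needed for this test).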
a = array.array('i') sizeof_i = a.itemsize target_size = 1024 n = int(target_size * 1024 / sizeof_i) a = array.array('i', n * (1,)) with open(os.path.join(cachedir, 'test'), 'wb') as output: a.tofile(output) nose.tools.assert_true(disk_used(cachedir) >= target_size) nose.tools.assert_true(disk_used(cachedir) < target_size + 12) finally: shutil.rmtree(cachedir) def test_memstr_to_kbytes(): for text, value in zip(('80G', '1.4M', '120M', '53K'), (80 * 1024 ** 2, int(1.4 * 1024), 120 * 1024, 53)): yield nose.tools.assert_equal, memstr_to_kbytes(text), value nose.tools.assert_raises(ValueError, memstr_to_kbytes, 'foobar') def test_mkdirp(): try: tmp = mkdtemp() mkdirp(os.path.join(tmp, "ham")) mkdirp(os.path.join(tmp, "ham")) mkdirp(os.path.join(tmp, "spam", "spam")) # Not all OSErrors are ignored nose.tools.assert_raises(OSError, mkdirp, "") finally: shutil.rmtree(tmp) joblib-0.7.1/joblib/test/test_format_stack.py000066400000000000000000000006771217450746300213110ustar00rootroot00000000000000""" Unit tests for the stack formatting utilities """ # Author: Gael Varoquaux # Copyright (c) 2010 Gael Varoquaux # License: BSD Style, 3 clauses. import nose from ..format_stack import safe_repr ############################################################################### class Vicious(object): def __repr__(self): raise ValueError def test_safe_repr(): safe_repr(Vicious()) joblib-0.7.1/joblib/test/test_func_inspect.py000066400000000000000000000131331217450746300213030ustar00rootroot00000000000000""" Test the func_inspect module. """ # Author: Gael Varoquaux # Copyright (c) 2009 Gael Varoquaux # License: BSD Style, 3 clauses. import nose import tempfile import functools from ..func_inspect import filter_args, get_func_name, get_func_code, \ _clean_win_chars from ..memory import Memory ############################################################################### # Module-level functions, for tests def f(x, y=0): pass def f2(x): pass # Create a Memory object to test decorated functions. # We should be careful not to call the decorated functions, so that # cache directories are not created in the temp dir. 
mem = Memory(cachedir=tempfile.gettempdir()) @mem.cache def g(x): return x def h(x, y=0, *args, **kwargs): pass def i(x=1): pass def j(x, y, **kwargs): pass def k(*args, **kwargs): pass class Klass(object): def f(self, x): return x ############################################################################### # Tests def test_filter_args(): yield nose.tools.assert_equal, filter_args(f, [], (1, )),\ {'x': 1, 'y': 0} yield nose.tools.assert_equal, filter_args(f, ['x'], (1, )),\ {'y': 0} yield nose.tools.assert_equal, filter_args(f, ['y'], (0, )),\ {'x': 0} yield nose.tools.assert_equal, filter_args(f, ['y'], (0, ), dict(y=1)), {'x': 0} yield nose.tools.assert_equal, filter_args(f, ['x', 'y'], (0, )), {} yield nose.tools.assert_equal, filter_args(f, [], (0,), dict(y=1)), {'x': 0, 'y': 1} yield nose.tools.assert_equal, filter_args(f, ['y'], (), dict(x=2, y=1)), {'x': 2} yield nose.tools.assert_equal, filter_args(i, [], (2, )), {'x': 2} yield nose.tools.assert_equal, filter_args(f2, [], (), dict(x=1)), {'x': 1} def test_filter_args_method(): obj = Klass() nose.tools.assert_equal(filter_args(obj.f, [], (1, )), {'x': 1, 'self': obj}) def test_filter_varargs(): yield nose.tools.assert_equal, filter_args(h, [], (1, )), \ {'x': 1, 'y': 0, '*': [], '**': {}} yield nose.tools.assert_equal, filter_args(h, [], (1, 2, 3, 4)), \ {'x': 1, 'y': 2, '*': [3, 4], '**': {}} yield nose.tools.assert_equal, filter_args(h, [], (1, 25), dict(ee=2)), \ {'x': 1, 'y': 25, '*': [], '**': {'ee': 2}} yield nose.tools.assert_equal, filter_args(h, ['*'], (1, 2, 25), dict(ee=2)), \ {'x': 1, 'y': 2, '**': {'ee': 2}} def test_filter_kwargs(): nose.tools.assert_equal(filter_args(k, [], (1, 2), dict(ee=2)), {'*': [1, 2], '**': {'ee': 2}}) nose.tools.assert_equal(filter_args(k, [], (3, 4)), {'*': [3, 4], '**': {}}) def test_filter_args_2(): nose.tools.assert_equal(filter_args(j, [], (1, 2), dict(ee=2)), {'x': 1, 'y': 2, '**': {'ee': 2}}) nose.tools.assert_raises(ValueError, filter_args, f, 'a', (None, )) # Check that we capture an undefined argument nose.tools.assert_raises(ValueError, filter_args, f, ['a'], (None, )) ff = functools.partial(f, 1) # filter_args has to special-case partial nose.tools.assert_equal(filter_args(ff, [], (1, )), {'*': [1], '**': {}}) nose.tools.assert_equal(filter_args(ff, ['y'], (1, )), {'*': [1], '**': {}}) def test_func_name(): yield nose.tools.assert_equal, 'f', get_func_name(f)[1] # Check that we are not confused by the decoration yield nose.tools.assert_equal, 'g', get_func_name(g)[1] def test_func_inspect_errors(): # Check that func_inspect is robust and will work on weird objects nose.tools.assert_equal(get_func_name('a'.lower)[-1], 'lower') nose.tools.assert_equal(get_func_code('a'.lower)[1:], (None, -1)) ff = lambda x: x nose.tools.assert_equal(get_func_name(ff, win_characters=False)[-1], '') nose.tools.assert_equal(get_func_code(ff)[1], __file__.replace('.pyc', '.py')) # Simulate a function defined in __main__ ff.__module__ = '__main__' nose.tools.assert_equal(get_func_name(ff, win_characters=False)[-1], '') nose.tools.assert_equal(get_func_code(ff)[1], __file__.replace('.pyc', '.py')) def test_bound_methods(): """ Make sure that calling the same method on two different instances of the same class does resolv to different signatures. """ a = Klass() b = Klass() nose.tools.assert_not_equal(filter_args(a.f, [], (1, )), filter_args(b.f, [], (1, ))) def test_filter_args_error_msg(): """ Make sure that filter_args returns decent error messages, for the sake of the user. 
""" nose.tools.assert_raises(ValueError, filter_args, f, []) def test_clean_win_chars(): string = r'C:\foo\bar\main.py' mangled_string = _clean_win_chars(string) for char in ('\\', ':', '<', '>', '!'): nose.tools.assert_false(char in mangled_string) joblib-0.7.1/joblib/test/test_hashing.py000066400000000000000000000206011217450746300202420ustar00rootroot00000000000000""" Test the hashing module. """ # Author: Gael Varoquaux # Copyright (c) 2009 Gael Varoquaux # License: BSD Style, 3 clauses. import nose import time import hashlib import tempfile import os import gc import io import collections from ..hashing import hash from ..func_inspect import filter_args from ..memory import Memory from .common import np, with_numpy from .test_memory import env as test_memory_env from .test_memory import setup_module as test_memory_setup_func from .test_memory import teardown_module as test_memory_teardown_func try: # Python 2/Python 3 compat unicode('str') except NameError: unicode = lambda s: s ############################################################################### # Helper functions for the tests def time_func(func, *args): """ Time function func on *args. """ times = list() for _ in range(3): t1 = time.time() func(*args) times.append(time.time() - t1) return min(times) def relative_time(func1, func2, *args): """ Return the relative time between func1 and func2 applied on *args. """ time_func1 = time_func(func1, *args) time_func2 = time_func(func2, *args) relative_diff = 0.5 * (abs(time_func1 - time_func2) / (time_func1 + time_func2)) return relative_diff class Klass(object): def f(self, x): return x class KlassWithCachedMethod(object): def __init__(self): mem = Memory(cachedir=test_memory_env['dir']) self.f = mem.cache(self.f) def f(self, x): return x ############################################################################### # Tests def test_trival_hash(): """ Smoke test hash on various types. """ obj_list = [1, 2, 1., 2., 1 + 1j, 2. + 1j, 'a', 'b', (1, ), (1, 1, ), [1, ], [1, 1, ], {1: 1}, {1: 2}, {2: 1}, None, gc.collect, [1, ].append, ] for obj1 in obj_list: for obj2 in obj_list: # Check that 2 objects have the same hash only if they are # the same. yield nose.tools.assert_equal, hash(obj1) == hash(obj2), \ obj1 is obj2 def test_hash_methods(): # Check that hashing instance methods works a = io.StringIO(unicode('a')) nose.tools.assert_equal(hash(a.flush), hash(a.flush)) a1 = collections.deque(range(10)) a2 = collections.deque(range(9)) nose.tools.assert_not_equal(hash(a1.extend), hash(a2.extend)) @with_numpy def test_hash_numpy(): """ Test hashing with numpy arrays. """ rnd = np.random.RandomState(0) arr1 = rnd.random_sample((10, 10)) arr2 = arr1.copy() arr3 = arr2.copy() arr3[0] += 1 obj_list = (arr1, arr2, arr3) for obj1 in obj_list: for obj2 in obj_list: yield nose.tools.assert_equal, hash(obj1) == hash(obj2), \ np.all(obj1 == obj2) d1 = {1: arr1, 2: arr1} d2 = {1: arr2, 2: arr2} yield nose.tools.assert_equal, hash(d1), hash(d2) d3 = {1: arr2, 2: arr3} yield nose.tools.assert_not_equal, hash(d1), hash(d3) yield nose.tools.assert_not_equal, hash(arr1), hash(arr1.T) @with_numpy def test_hash_memmap(): """ Check that memmap and arrays hash identically if coerce_mmap is True. 
""" filename = tempfile.mktemp() try: m = np.memmap(filename, shape=(10, 10), mode='w+') a = np.asarray(m) for coerce_mmap in (False, True): yield (nose.tools.assert_equal, hash(a, coerce_mmap=coerce_mmap) == hash(m, coerce_mmap=coerce_mmap), coerce_mmap) finally: if 'm' in locals(): del m # Force a garbage-collection cycle, to be certain that the # object is delete, and we don't run in a problem under # Windows with a file handle still open. gc.collect() try: os.unlink(filename) except OSError as e: # Under windows, some files don't get erased. if not os.name == 'nt': raise e @with_numpy def test_hash_numpy_performance(): """ Check the performance of hashing numpy arrays: In [22]: a = np.random.random(1000000) In [23]: %timeit hashlib.md5(a).hexdigest() 100 loops, best of 3: 20.7 ms per loop In [24]: %timeit hashlib.md5(pickle.dumps(a, protocol=2)).hexdigest() 1 loops, best of 3: 73.1 ms per loop In [25]: %timeit hashlib.md5(cPickle.dumps(a, protocol=2)).hexdigest() 10 loops, best of 3: 53.9 ms per loop In [26]: %timeit hash(a) 100 loops, best of 3: 20.8 ms per loop """ rnd = np.random.RandomState(0) a = rnd.random_sample(1000000) if hasattr(np, 'getbuffer'): # Under python 3, there is no getbuffer getbuffer = np.getbuffer else: getbuffer = memoryview md5_hash = lambda x: hashlib.md5(getbuffer(x)).hexdigest() relative_diff = relative_time(md5_hash, hash, a) yield nose.tools.assert_true, relative_diff < 0.1 # Check that hashing an tuple of 3 arrays takes approximately # 3 times as much as hashing one array time_hashlib = 3 * time_func(md5_hash, a) time_hash = time_func(hash, (a, a, a)) relative_diff = 0.5 * (abs(time_hash - time_hashlib) / (time_hash + time_hashlib)) yield nose.tools.assert_true, relative_diff < 0.2 def test_bound_methods_hash(): """ Make sure that calling the same method on two different instances of the same class does resolve to the same hashes. """ a = Klass() b = Klass() nose.tools.assert_equal(hash(filter_args(a.f, [], (1, ))), hash(filter_args(b.f, [], (1, )))) @nose.tools.with_setup(test_memory_setup_func, test_memory_teardown_func) def test_bound_cached_methods_hash(): """ Make sure that calling the same _cached_ method on two different instances of the same class does resolve to the same hashes. 
""" a = KlassWithCachedMethod() b = KlassWithCachedMethod() nose.tools.assert_equal(hash(filter_args(a.f.func, [], (1, ))), hash(filter_args(b.f.func, [], (1, )))) @with_numpy def test_hash_object_dtype(): """ Make sure that ndarrays with dtype `object' hash correctly.""" a = np.array([np.arange(i) for i in range(6)], dtype=object) b = np.array([np.arange(i) for i in range(6)], dtype=object) nose.tools.assert_equal(hash(a), hash(b)) @with_numpy def test_numpy_scalar(): # Numpy scalars are built from compiled functions, and lead to # strange pickling paths explored, that can give hash collisions a = np.float64(2.0) b = np.float64(3.0) nose.tools.assert_not_equal(hash(a), hash(b)) def test_dict_hash(): # Check that dictionaries hash consistently, eventhough the ordering # of the keys is not garanteed k = KlassWithCachedMethod() d = {'#s12069__c_maps.nii.gz': [33], '#s12158__c_maps.nii.gz': [33], '#s12258__c_maps.nii.gz': [33], '#s12277__c_maps.nii.gz': [33], '#s12300__c_maps.nii.gz': [33], '#s12401__c_maps.nii.gz': [33], '#s12430__c_maps.nii.gz': [33], '#s13817__c_maps.nii.gz': [33], '#s13903__c_maps.nii.gz': [33], '#s13916__c_maps.nii.gz': [33], '#s13981__c_maps.nii.gz': [33], '#s13982__c_maps.nii.gz': [33], '#s13983__c_maps.nii.gz': [33]} a = k.f(d) b = k.f(a) nose.tools.assert_equal(hash(a), hash(b)) def test_set_hash(): # Check that sets hash consistently, eventhough their ordering # is not garanteed k = KlassWithCachedMethod() s = set(['#s12069__c_maps.nii.gz', '#s12158__c_maps.nii.gz', '#s12258__c_maps.nii.gz', '#s12277__c_maps.nii.gz', '#s12300__c_maps.nii.gz', '#s12401__c_maps.nii.gz', '#s12430__c_maps.nii.gz', '#s13817__c_maps.nii.gz', '#s13903__c_maps.nii.gz', '#s13916__c_maps.nii.gz', '#s13981__c_maps.nii.gz', '#s13982__c_maps.nii.gz', '#s13983__c_maps.nii.gz']) a = k.f(s) b = k.f(a) nose.tools.assert_equal(hash(a), hash(b)) joblib-0.7.1/joblib/test/test_logger.py000066400000000000000000000034321217450746300201030ustar00rootroot00000000000000""" Test the logger module. """ # Author: Gael Varoquaux # Copyright (c) 2009 Gael Varoquaux # License: BSD Style, 3 clauses. import shutil import os import sys import io from tempfile import mkdtemp import re from ..logger import PrintTime try: # Python 2/Python 3 compat unicode('str') except NameError: unicode = lambda s: s ############################################################################### # Test fixtures env = dict() def setup(): """ Test setup. """ cachedir = mkdtemp() if os.path.exists(cachedir): shutil.rmtree(cachedir) env['dir'] = cachedir def teardown(): """ Test teardown. """ #return True shutil.rmtree(env['dir']) ############################################################################### # Tests def test_print_time(): # A simple smoke test for PrintTime. try: orig_stderr = sys.stderr sys.stderr = io.StringIO() print_time = PrintTime(logfile=os.path.join(env['dir'], 'test.log')) print_time(unicode('Foo')) # Create a second time, to smoke test log rotation. 
print_time = PrintTime(logfile=os.path.join(env['dir'], 'test.log')) print_time(unicode('Foo')) # And a third time print_time = PrintTime(logfile=os.path.join(env['dir'], 'test.log')) print_time(unicode('Foo')) printed_text = sys.stderr.getvalue() # Use regexps to be robust to time variations match = r"Foo: 0\..s, 0\.0min\nFoo: 0\..s, 0.0min\nFoo: " + \ r".\..s, 0.0min\n" if not re.match(match, printed_text): raise AssertionError('Excepted %s, got %s' % (match, printed_text)) finally: sys.stderr = orig_stderr joblib-0.7.1/joblib/test/test_memory.py000066400000000000000000000320221217450746300201310ustar00rootroot00000000000000""" Test the memory module. """ # Author: Gael Varoquaux # Copyright (c) 2009 Gael Varoquaux # License: BSD Style, 3 clauses. import shutil import os from tempfile import mkdtemp import pickle import warnings import io import sys import nose from ..memory import Memory, MemorizedFunc from .common import with_numpy, np ############################################################################### # Module-level variables for the tests def f(x, y=1): """ A module-level function for testing purposes. """ return x ** 2 + y ############################################################################### # Test fixtures env = dict() def setup_module(): """ Test setup. """ cachedir = mkdtemp() #cachedir = 'foobar' env['dir'] = cachedir if os.path.exists(cachedir): shutil.rmtree(cachedir) # Don't make the cachedir, Memory should be able to do that on the fly print(80 * '_') print('test_memory setup') print(80 * '_') def _rmtree_onerror(func, path, excinfo): print('!' * 79) print('os function failed: %r' % func) print('file to be removed: %s' % path) print('exception was: %r' % excinfo[1]) print('!' * 79) def teardown_module(): """ Test teardown. """ shutil.rmtree(env['dir'], False, _rmtree_onerror) print(80 * '_') print('test_memory teardown') print(80 * '_') ############################################################################### # Helper function for the tests def check_identity_lazy(func, accumulator): """ Given a function and an accumulator (a list that grows every time the function is called, check that the function can be decorated by memory to be a lazy identity. """ # Call each function with several arguments, and check that it is # evaluated only once per argument. memory = Memory(cachedir=env['dir'], verbose=0) memory.clear(warn=False) func = memory.cache(func) for i in range(3): for _ in range(2): yield nose.tools.assert_equal, func(i), i yield nose.tools.assert_equal, len(accumulator), i + 1 ############################################################################### # Tests def test_memory_integration(): """ Simple test of memory lazy evaluation. """ accumulator = list() # Rmk: this function has the same name than a module-level function, # thus it serves as a test to see that both are identified # as different. 
def f(l): accumulator.append(1) return l for test in check_identity_lazy(f, accumulator): yield test # Now test clearing for compress in (False, True): # We turn verbosity on to smoke test the verbosity code, however, # we capture it, as it is ugly try: # To smoke-test verbosity, we capture stdout orig_stdout = sys.stdout orig_stderr = sys.stdout if sys.version_info[0] == 3: sys.stderr = io.StringIO() sys.stderr = io.StringIO() else: sys.stdout = io.BytesIO() sys.stderr = io.BytesIO() memory = Memory(cachedir=env['dir'], verbose=10, compress=compress) # First clear the cache directory, to check that our code can # handle that # NOTE: this line would raise an exception, as the database file is # still open; we ignore the error since we want to test what # happens if the directory disappears shutil.rmtree(env['dir'], ignore_errors=True) g = memory.cache(f) g(1) g.clear(warn=False) current_accumulator = len(accumulator) out = g(1) finally: sys.stdout = orig_stdout sys.stderr = orig_stderr yield nose.tools.assert_equal, len(accumulator), \ current_accumulator + 1 # Also, check that Memory.eval works similarly yield nose.tools.assert_equal, memory.eval(f, 1), out yield nose.tools.assert_equal, len(accumulator), \ current_accumulator + 1 # Now do a smoke test with a function defined in __main__, as the name # mangling rules are more complex f.__module__ = '__main__' memory = Memory(cachedir=env['dir'], verbose=0) memory.cache(f)(1) def test_no_memory(): """ Test memory with cachedir=None: no memoize """ accumulator = list() def ff(l): accumulator.append(1) return l mem = Memory(cachedir=None, verbose=0) gg = mem.cache(ff) for _ in range(4): current_accumulator = len(accumulator) gg(1) yield nose.tools.assert_equal, len(accumulator), \ current_accumulator + 1 def test_memory_kwarg(): " Test memory with a function with keyword arguments." accumulator = list() def g(l=None, m=1): accumulator.append(1) return l for test in check_identity_lazy(g, accumulator): yield test memory = Memory(cachedir=env['dir'], verbose=0) g = memory.cache(g) # Smoke test with an explicit keyword argument: nose.tools.assert_equal(g(l=30, m=2), 30) def test_memory_lambda(): " Test memory with a function with a lambda." accumulator = list() def helper(x): """ A helper function to define l as a lambda. """ accumulator.append(1) return x l = lambda x: helper(x) for test in check_identity_lazy(l, accumulator): yield test def test_memory_name_collision(): " Check that name collisions with functions will raise warnings" memory = Memory(cachedir=env['dir'], verbose=0) @memory.cache def name_collision(x): """ A first function called name_collision """ return x a = name_collision @memory.cache def name_collision(x): """ A second function called name_collision """ return x b = name_collision if not hasattr(warnings, 'catch_warnings'): # catch_warnings is new in Python 2.6 return with warnings.catch_warnings(record=True) as w: # Cause all warnings to always be triggered. warnings.simplefilter("always") a(1) b(1) yield nose.tools.assert_equal, len(w), 1 yield nose.tools.assert_true, "collision" in str(w[-1].message) def test_memory_warning_lambda_collisions(): # Check that multiple use of lambda will raise collisions memory = Memory(cachedir=env['dir'], verbose=0) # For isolation with other tests memory.clear() a = lambda x: x a = memory.cache(a) b = lambda x: x + 1 b = memory.cache(b) with warnings.catch_warnings(record=True) as w: # Cause all warnings to always be triggered. 
warnings.simplefilter("always") nose.tools.assert_equal(0, a(0)) nose.tools.assert_equal(2, b(1)) nose.tools.assert_equal(1, a(1)) # In recent Python versions, we can retrieve the code of lambdas, # thus nothing is raised nose.tools.assert_equal(len(w), 4) def test_memory_warning_collision_detection(): # Check that collisions impossible to detect will raise appropriate # warnings. memory = Memory(cachedir=env['dir'], verbose=0) # For isolation with other tests memory.clear() a1 = eval('lambda x: x') a1 = memory.cache(a1) b1 = eval('lambda x: x+1') b1 = memory.cache(b1) if not hasattr(warnings, 'catch_warnings'): # catch_warnings is new in Python 2.6 return with warnings.catch_warnings(record=True) as w: # Cause all warnings to always be triggered. warnings.simplefilter("always") a1(1) b1(1) a1(0) yield nose.tools.assert_equal, len(w), 2 yield nose.tools.assert_true, \ "cannot detect" in str(w[-1].message).lower() def test_memory_partial(): " Test memory with functools.partial." accumulator = list() def func(x, y): """ A helper function to define l as a lambda. """ accumulator.append(1) return y import functools function = functools.partial(func, 1) for test in check_identity_lazy(function, accumulator): yield test def test_memory_eval(): " Smoke test memory with a function with a function defined in an eval." memory = Memory(cachedir=env['dir'], verbose=0) m = eval('lambda x: x') mm = memory.cache(m) yield nose.tools.assert_equal, 1, mm(1) def count_and_append(x=[]): """ A function with a side effect in its arguments. Return the lenght of its argument and append one element. """ len_x = len(x) x.append(None) return len_x def test_argument_change(): """ Check that if a function has a side effect in its arguments, it should use the hash of changing arguments. """ mem = Memory(cachedir=env['dir'], verbose=0) func = mem.cache(count_and_append) # call the function for the first time, is should cache it with # argument x=[] assert func() == 0 # the second time the argument is x=[None], which is not cached # yet, so the functions should be called a second time assert func() == 1 @with_numpy def test_memory_numpy(): " Test memory with a function with numpy arrays." # Check with memmapping and without. for mmap_mode in (None, 'r'): accumulator = list() def n(l=None): accumulator.append(1) return l memory = Memory(cachedir=env['dir'], mmap_mode=mmap_mode, verbose=0) memory.clear(warn=False) cached_n = memory.cache(n) rnd = np.random.RandomState(0) for i in range(3): a = rnd.random_sample((10, 10)) for _ in range(3): yield nose.tools.assert_true, np.all(cached_n(a) == a) yield nose.tools.assert_equal, len(accumulator), i + 1 def test_memory_exception(): """ Smoketest the exception handling of Memory. 
""" memory = Memory(cachedir=env['dir'], verbose=0) class MyException(Exception): pass @memory.cache def h(exc=0): if exc: raise MyException # Call once, to initialise the cache h() for _ in range(3): # Call 3 times, to be sure that the Exception is always raised yield nose.tools.assert_raises, MyException, h, 1 def test_memory_ignore(): " Test the ignore feature of memory " memory = Memory(cachedir=env['dir'], verbose=0) accumulator = list() @memory.cache(ignore=['y']) def z(x, y=1): accumulator.append(1) yield nose.tools.assert_equal, z.ignore, ['y'] z(0, y=1) yield nose.tools.assert_equal, len(accumulator), 1 z(0, y=1) yield nose.tools.assert_equal, len(accumulator), 1 z(0, y=2) yield nose.tools.assert_equal, len(accumulator), 1 def test_func_dir(): # Test the creation of the memory cache directory for the function. memory = Memory(cachedir=env['dir'], verbose=0) path = __name__.split('.') path.append('f') path = os.path.join(env['dir'], 'joblib', *path) g = memory.cache(f) # Test that the function directory is created on demand yield nose.tools.assert_equal, g._get_func_dir(), path yield nose.tools.assert_true, os.path.exists(path) # Test that the code is stored. yield nose.tools.assert_false, \ g._check_previous_func_code() yield nose.tools.assert_true, \ os.path.exists(os.path.join(path, 'func_code.py')) yield nose.tools.assert_true, \ g._check_previous_func_code() # Test the robustness to failure of loading previous results. dir, _ = g.get_output_dir(1) a = g(1) yield nose.tools.assert_true, os.path.exists(dir) os.remove(os.path.join(dir, 'output.pkl')) yield nose.tools.assert_equal, a, g(1) def test_persistence(): # Test the memorized functions can be pickled and restored. memory = Memory(cachedir=env['dir'], verbose=0) g = memory.cache(f) output = g(1) h = pickle.loads(pickle.dumps(g)) output_dir, _ = g.get_output_dir(1) yield nose.tools.assert_equal, output, h.load_output(output_dir) memory2 = pickle.loads(pickle.dumps(memory)) yield nose.tools.assert_equal, memory.cachedir, memory2.cachedir # Smoke test that pickling a memory with cachedir=None works memory = Memory(cachedir=None, verbose=0) pickle.loads(pickle.dumps(memory)) def test_format_signature(): # Test the signature formatting. func = MemorizedFunc(f, cachedir=env['dir']) path, sgn = func.format_signature(f, list(range(10))) yield nose.tools.assert_equal, \ sgn, \ 'f([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])' path, sgn = func.format_signature(f, list(range(10)), y=list(range(10))) yield nose.tools.assert_equal, \ sgn, \ 'f([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], y=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9])' @with_numpy def test_format_signature_numpy(): """ Test the format signature formatting with numpy. """ joblib-0.7.1/joblib/test/test_my_exceptions.py000066400000000000000000000006711217450746300215140ustar00rootroot00000000000000""" Test my automatically generate exceptions """ from nose.tools import assert_true from .. import my_exceptions def test_inheritance(): assert_true(isinstance(my_exceptions.JoblibNameError(), NameError)) assert_true(isinstance(my_exceptions.JoblibNameError(), my_exceptions.JoblibException)) assert_true(my_exceptions.JoblibNameError is my_exceptions._mk_exception(NameError)[0]) joblib-0.7.1/joblib/test/test_numpy_pickle.py000066400000000000000000000165351217450746300213330ustar00rootroot00000000000000""" Test the numpy pickler as a replacement of the standard pickler. 
""" from tempfile import mkdtemp import copy import shutil import os import random import nose from .common import np, with_numpy # numpy_pickle is not a drop-in replacement of pickle, as it takes # filenames instead of open files as arguments. from .. import numpy_pickle ############################################################################### # Define a list of standard types. # Borrowed from dill, initial author: Micheal McKerns: # http://dev.danse.us/trac/pathos/browser/dill/dill_test2.py typelist = [] # testing types _none = None typelist.append(_none) _type = type typelist.append(_type) _bool = bool(1) typelist.append(_bool) _int = int(1) typelist.append(_int) try: _long = long(1) typelist.append(_long) except NameError: # long is not defined in python 3 pass _float = float(1) typelist.append(_float) _complex = complex(1) typelist.append(_complex) _string = str(1) typelist.append(_string) try: _unicode = unicode(1) typelist.append(_unicode) except NameError: # unicode is not defined in python 3 pass _tuple = () typelist.append(_tuple) _list = [] typelist.append(_list) _dict = {} typelist.append(_dict) try: _file = file typelist.append(_file) except NameError: pass # file does not exists in Python 3 try: _buffer = buffer typelist.append(_buffer) except NameError: # buffer does not exists in Python 3 pass _builtin = len typelist.append(_builtin) def _function(x): yield x class _class: def _method(self): pass class _newclass(object): def _method(self): pass typelist.append(_function) typelist.append(_class) typelist.append(_newclass) # _instance = _class() typelist.append(_instance) _object = _newclass() typelist.append(_object) # ############################################################################### # Test fixtures env = dict() def setup_module(): """ Test setup. """ env['dir'] = mkdtemp() env['filename'] = os.path.join(env['dir'], 'test.pkl') print(80 * '_') print('setup numpy_pickle') print(80 * '_') def teardown_module(): """ Test teardown. """ shutil.rmtree(env['dir']) #del env['dir'] #del env['filename'] print(80 * '_') print('teardown numpy_pickle') print(80 * '_') ############################################################################### # Tests def test_standard_types(): # Test pickling and saving with standard types. filename = env['filename'] for compress in [0, 1]: for member in typelist: # Change the file name to avoid side effects between tests this_filename = filename + str(random.randint(0, 1000)) numpy_pickle.dump(member, this_filename, compress=compress) _member = numpy_pickle.load(this_filename) # We compare the pickled instance to the reloaded one only if it # can be compared to a copied one if member == copy.deepcopy(member): yield nose.tools.assert_equal, member, _member def test_value_error(): # Test inverting the input arguments to dump nose.tools.assert_raises(ValueError, numpy_pickle.dump, 'foo', dict()) @with_numpy def test_numpy_persistence(): filename = env['filename'] rnd = np.random.RandomState(0) a = rnd.random_sample((10, 2)) for compress, cache_size in ((0, 0), (1, 0), (1, 10)): # We use 'a.T' to have a non C-contiguous array. 
for index, obj in enumerate(((a,), (a.T,), (a, a), [a, a, a])): # Change the file name to avoid side effects between tests this_filename = filename + str(random.randint(0, 1000)) filenames = numpy_pickle.dump(obj, this_filename, compress=compress, cache_size=cache_size) # Check that one file was created per array if not compress: nose.tools.assert_equal(len(filenames), len(obj) + 1) # Check that these files do exist for file in filenames: nose.tools.assert_true( os.path.exists(os.path.join(env['dir'], file))) # Unpickle the object obj_ = numpy_pickle.load(this_filename) # Check that the items are indeed arrays for item in obj_: nose.tools.assert_true(isinstance(item, np.ndarray)) # And finally, check that all the values are equal. nose.tools.assert_true(np.all(np.array(obj) == np.array(obj_))) # Now test with array subclasses for obj in ( np.matrix(np.zeros(10)), np.core.multiarray._reconstruct(np.memmap, (), np.float) ): this_filename = filename + str(random.randint(0, 1000)) filenames = numpy_pickle.dump(obj, this_filename, compress=compress, cache_size=cache_size) obj_ = numpy_pickle.load(this_filename) if (type(obj) is not np.memmap and hasattr(obj, '__array_prepare__')): # We don't reconstruct memmaps nose.tools.assert_true(isinstance(obj_, type(obj))) # Finally smoke test the warning in case of compress + mmap_mode this_filename = filename + str(random.randint(0, 1000)) numpy_pickle.dump(a, this_filename, compress=1) numpy_pickle.load(this_filename, mmap_mode='r') @with_numpy def test_memmap_persistence(): rnd = np.random.RandomState(0) a = rnd.random_sample(10) filename = env['filename'] + str(random.randint(0, 1000)) numpy_pickle.dump(a, filename) b = numpy_pickle.load(filename, mmap_mode='r') if np.__version__ >= '1.3': nose.tools.assert_true(isinstance(b, np.memmap)) @with_numpy def test_masked_array_persistence(): # The special-case picker fails, because saving masked_array # not implemented, but it just delegates to the standard pickler. rnd = np.random.RandomState(0) a = rnd.random_sample(10) a = np.ma.masked_greater(a, 0.5) filename = env['filename'] + str(random.randint(0, 1000)) numpy_pickle.dump(a, filename) b = numpy_pickle.load(filename, mmap_mode='r') nose.tools.assert_true(isinstance(b, np.ma.masked_array)) def test_z_file(): # Test saving and loading data with Zfiles filename = env['filename'] + str(random.randint(0, 1000)) data = numpy_pickle.asbytes('Foo, \n Bar, baz, \n\nfoobar') numpy_pickle.write_zfile(open(filename, 'wb'), data) data_read = numpy_pickle.read_zfile(open(filename, 'rb')) nose.tools.assert_equal(data, data_read) ################################################################################ # Test dumping array subclasses if np is not None: class SubArray(np.ndarray): def __reduce__(self): return (_load_sub_array, (np.asarray(self), )) def _load_sub_array(arr): d = SubArray(arr.shape) d[:] = arr return d @with_numpy def test_numpy_subclass(): filename = env['filename'] a = SubArray((10,)) numpy_pickle.dump(a, filename) c = numpy_pickle.load(filename) nose.tools.assert_true(isinstance(c, SubArray)) joblib-0.7.1/joblib/test/test_parallel.py000066400000000000000000000177451217450746300204340ustar00rootroot00000000000000""" Test the parallel module. """ # Author: Gael Varoquaux # Copyright (c) 2010-2011 Gael Varoquaux # License: BSD Style, 3 clauses. 
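# Illustrative sketch (not part of the original test suite): the minimal
# Parallel/delayed usage pattern that the tests below exercise. It assumes
# joblib is importable as an installed package, and is kept inside a helper
# that is never called here, so importing this module spawns no workers.
def _parallel_usage_sketch():
    from joblib import Parallel, delayed
    from math import sqrt
    # Run sqrt over 0..9 with two worker processes; results come back in
    # the same order as the inputs.
    return Parallel(n_jobs=2)(delayed(sqrt)(i) for i in range(10))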
import time import sys import io import os try: import cPickle as pickle PickleError = TypeError except: import pickle PickleError = pickle.PicklingError if sys.version_info[0] == 3: PickleError = pickle.PicklingError try: # Python 2/Python 3 compat unicode('str') except NameError: unicode = lambda s: s from ..parallel import Parallel, delayed, SafeFunction, WorkerInterrupt, \ multiprocessing, cpu_count from ..my_exceptions import JoblibException import nose ############################################################################### def division(x, y): return x / y def square(x): return x ** 2 def exception_raiser(x): if x == 7: raise ValueError return x def interrupt_raiser(x): time.sleep(.05) raise KeyboardInterrupt def f(x, y=0, z=0): """ A module-level function so that it can be spawn with multiprocessing. """ return x ** 2 + y + z ############################################################################### def test_cpu_count(): assert cpu_count() > 0 ############################################################################### # Test parallel def test_simple_parallel(): X = range(5) for n_jobs in (1, 2, -1, -2): yield (nose.tools.assert_equal, [square(x) for x in X], Parallel(n_jobs=-1)( delayed(square)(x) for x in X)) try: # To smoke-test verbosity, we capture stdout orig_stdout = sys.stdout orig_stderr = sys.stdout if sys.version_info[0] == 3: sys.stderr = io.StringIO() sys.stderr = io.StringIO() else: sys.stdout = io.BytesIO() sys.stderr = io.BytesIO() for verbose in (2, 11, 100): Parallel(n_jobs=-1, verbose=verbose)( delayed(square)(x) for x in X) Parallel(n_jobs=1, verbose=verbose)( delayed(square)(x) for x in X) Parallel(n_jobs=2, verbose=verbose, pre_dispatch=2)( delayed(square)(x) for x in X) except Exception as e: my_stdout = sys.stdout my_stderr = sys.stderr sys.stdout = orig_stdout sys.stderr = orig_stderr print(unicode(my_stdout.getvalue())) print(unicode(my_stderr.getvalue())) raise e finally: sys.stdout = orig_stdout sys.stderr = orig_stderr def nested_loop(): Parallel(n_jobs=2)(delayed(square)(.01) for _ in range(2)) def test_nested_loop(): Parallel(n_jobs=2)(delayed(nested_loop)() for _ in range(2)) def test_parallel_kwargs(): """ Check the keyword argument processing of pmap. """ lst = range(10) for n_jobs in (1, 4): yield (nose.tools.assert_equal, [f(x, y=1) for x in lst], Parallel(n_jobs=n_jobs)(delayed(f)(x, y=1) for x in lst) ) def test_parallel_pickling(): """ Check that pmap captures the errors when it is passed an object that cannot be pickled. """ def g(x): return x ** 2 nose.tools.assert_raises(PickleError, Parallel(), (delayed(g)(x) for x in range(10)) ) def test_error_capture(): # Check that error are captured, and that correct exceptions # are raised. 
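    # (With multiprocessing available, errors raised in worker processes
    # reach the caller wrapped in joblib exception types such as
    # JoblibException or WorkerInterrupt; with n_jobs=1 the original
    # exception propagates unwrapped, as the last block below checks.)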
if multiprocessing is not None: # A JoblibException will be raised only if there is indeed # multiprocessing nose.tools.assert_raises(JoblibException, Parallel(n_jobs=2), [delayed(division)(x, y) for x, y in zip((0, 1), (1, 0))], ) nose.tools.assert_raises(WorkerInterrupt, Parallel(n_jobs=2), [delayed(interrupt_raiser)(x) for x in (1, 0)], ) else: nose.tools.assert_raises(KeyboardInterrupt, Parallel(n_jobs=2), [delayed(interrupt_raiser)(x) for x in (1, 0)], ) nose.tools.assert_raises(ZeroDivisionError, Parallel(n_jobs=2), [delayed(division)(x, y) for x, y in zip((0, 1), (1, 0))], ) try: ex = JoblibException() Parallel(n_jobs=1)( delayed(division)(x, y) for x, y in zip((0, 1), (1, 0))) except Exception: # Cannot use 'except as' to maintain Python 2.5 compatibility ex = sys.exc_info()[1] nose.tools.assert_false(isinstance(ex, JoblibException)) class Counter(object): def __init__(self, list1, list2): self.list1 = list1 self.list2 = list2 def __call__(self, i): self.list1.append(i) nose.tools.assert_equal(len(self.list1), len(self.list2)) def consumer(queue, item): queue.append('Consumed %s' % item) def test_dispatch_one_job(): """ Test that with only one job, Parallel does act as a iterator. """ queue = list() def producer(): for i in range(6): queue.append('Produced %i' % i) yield i Parallel(n_jobs=1)(delayed(consumer)(queue, x) for x in producer()) nose.tools.assert_equal(queue, ['Produced 0', 'Consumed 0', 'Produced 1', 'Consumed 1', 'Produced 2', 'Consumed 2', 'Produced 3', 'Consumed 3', 'Produced 4', 'Consumed 4', 'Produced 5', 'Consumed 5'] ) nose.tools.assert_equal(len(queue), 12) def test_dispatch_multiprocessing(): """ Check that using pre_dispatch Parallel does indeed dispatch items lazily. """ if multiprocessing is None: raise nose.SkipTest() manager = multiprocessing.Manager() queue = manager.list() def producer(): for i in range(6): queue.append('Produced %i' % i) yield i Parallel(n_jobs=2, pre_dispatch=3)(delayed(consumer)(queue, i) for i in producer()) nose.tools.assert_equal(list(queue)[:4], ['Produced 0', 'Produced 1', 'Produced 2', 'Consumed 0', ]) nose.tools.assert_equal(len(queue), 12) def test_exception_dispatch(): "Make sure that exception raised during dispatch are indeed captured" nose.tools.assert_raises( ValueError, Parallel(n_jobs=6, pre_dispatch=16, verbose=0), (delayed(exception_raiser)(i) for i in range(30)), ) def _reload_joblib(): # Retrieve the path of the parallel module in a robust way joblib_path = Parallel.__module__.split(os.sep) joblib_path = joblib_path[:1] joblib_path.append('parallel.py') joblib_path = '/'.join(joblib_path) module = __import__(joblib_path) # Reload the module. 
This should trigger a fail reload(module) def test_multiple_spawning(): # Test that attempting to launch a new Python after spawned # subprocesses will raise an error, to avoid infinite loops on # systems that do not support fork if not int(os.environ.get('JOBLIB_MULTIPROCESSING', 1)): raise nose.SkipTest() nose.tools.assert_raises(ImportError, Parallel(n_jobs=2), [delayed(_reload_joblib)() for i in range(10)]) ############################################################################### # Test helpers def test_joblib_exception(): # Smoke-test the custom exception e = JoblibException('foobar') # Test the repr repr(e) # Test the pickle pickle.dumps(e) def test_safe_function(): safe_division = SafeFunction(division) nose.tools.assert_raises(JoblibException, safe_division, 1, 0) joblib-0.7.1/joblib/testing.py000066400000000000000000000006321217450746300162620ustar00rootroot00000000000000""" Helper for testing. """ import sys import warnings import os.path def warnings_to_stdout(): """ Redirect all warnings to stdout. """ showwarning_orig = warnings.showwarning def showwarning(msg, cat, fname, lno, file=None, line=0): showwarning_orig(msg, cat, os.path.basename(fname), line, sys.stdout) warnings.showwarning = showwarning #warnings.simplefilter('always') joblib-0.7.1/setup.cfg000066400000000000000000000010321217450746300146060ustar00rootroot00000000000000[aliases] release = egg_info -RDb '' # Make sure the sphinx docs are built each time we do a dist. bdist = build_sphinx bdist sdist = build_sphinx sdist # Make sure a zip file is created each time we build the sphinx docs build_sphinx = generate_help build_sphinx zip_help # Make sure the docs are uploaded when we do an upload upload = upload upload_help [bdist_rpm] doc-files = doc [nosetests] verbosity = 2 detailed-errors = 1 with-coverage = 1 cover-package = joblib #pdb = 1 #pdb-failures = 1 with-doctest=1 doctest-extension=rst joblib-0.7.1/setup.py000077500000000000000000000043041217450746300145070ustar00rootroot00000000000000#!/usr/bin/env python from distutils.core import setup import sys import joblib # For some commands, use setuptools if len(set(('develop', 'sdist', 'release', 'bdist_egg', 'bdist_rpm', 'bdist', 'bdist_dumb', 'bdist_wininst', 'install_egg_info', 'build_sphinx', 'egg_info', 'easy_install', 'upload', )).intersection(sys.argv)) > 0: from setupegg import extra_setuptools_args # extra_setuptools_args is injected by the setupegg.py script, for # running the setup with setuptools. if not 'extra_setuptools_args' in globals(): extra_setuptools_args = dict() # if nose available, provide test command try: from nose.commands import nosetests cmdclass = extra_setuptools_args.pop('cmdclass', {}) cmdclass['test'] = nosetests cmdclass['nosetests'] = nosetests extra_setuptools_args['cmdclass'] = cmdclass except ImportError: pass setup(name='joblib', version=joblib.__version__, summary='Tools to use Python functions as pipeline jobs.', author='Gael Varoquaux', author_email='gael.varoquaux@normalesup.org', url='http://packages.python.org/joblib/', description=""" Lightweight pipelining: using Python functions as pipeline jobs. 
""", long_description=joblib.__doc__, license='BSD', classifiers=[ 'Development Status :: 5 - Production/Stable', 'Environment :: Console', 'Intended Audience :: Developers', 'Intended Audience :: Science/Research', 'Intended Audience :: Education', 'License :: OSI Approved :: BSD License', 'Operating System :: OS Independent', 'Programming Language :: Python :: 2.6', 'Programming Language :: Python :: 2.7', 'Programming Language :: Python :: 3', 'Programming Language :: Python :: 3.0', 'Programming Language :: Python :: 3.1', 'Programming Language :: Python :: 3.2', 'Topic :: Scientific/Engineering', 'Topic :: Utilities', 'Topic :: Software Development :: Libraries', ], platforms='any', #package_data={'joblib': ['joblib/*.rst'],}, packages=['joblib', 'joblib.test'], **extra_setuptools_args) joblib-0.7.1/setupegg.py000077500000000000000000000061171217450746300151760ustar00rootroot00000000000000#!/usr/bin/env python """Wrapper to run setup.py using setuptools.""" import zipfile import os import sys from setuptools import Command from sphinx_pypi_upload import UploadDoc ############################################################################### # Code to copy the sphinx-generated html docs in the distribution. DOC_BUILD_DIR = os.path.join('build', 'sphinx', 'html') def relative_path(filename): """ Return the relative path to the file, assuming the file is in the DOC_BUILD_DIR directory. """ length = len(os.path.abspath(DOC_BUILD_DIR)) + 1 return os.path.abspath(filename)[length:] class ZipHelp(Command): description = "zip the help created by the build_sphinx, " + \ "and put it in the source distribution. " user_options = [ ('None', None, 'this command has no options'), ] def run(self): if not os.path.exists(DOC_BUILD_DIR): raise OSError('Doc directory does not exist.') target_file = os.path.join('doc', 'documentation.zip') # ZIP_DEFLATED actually compresses the archive. However, there # will be a RuntimeError if zlib is not installed, so we check # for it. ZIP_STORED produces an uncompressed zip, but does not # require zlib. try: zf = zipfile.ZipFile(target_file, 'w', compression=zipfile.ZIP_DEFLATED) except RuntimeError: zf = zipfile.ZipFile(target_file, 'w', compression=zipfile.ZIP_STORED) for root, dirs, files in os.walk(DOC_BUILD_DIR): relative = relative_path(root) if not relative.startswith('.doctrees'): for f in files: zf.write(os.path.join(root, f), os.path.join(relative, f)) zf.close() def initialize_options(self): pass def finalize_options(self): pass class GenerateHelp(Command): description = " Generate the autosummary files " user_options = [ ('None', None, 'this command has no options'), ] def run(self): os.system( \ "%s doc/sphinxext/autosummary_generate.py " % sys.executable + \ "-o doc/generated/ doc/*.rst") def initialize_options(self): pass def finalize_options(self): pass ############################################################################### # Call the setup.py script, injecting the setuptools-specific arguments. 
extra_setuptools_args = dict( tests_require=['nose', 'coverage'], test_suite='nose.collector', cmdclass={'zip_help': ZipHelp, 'generate_help': GenerateHelp, 'upload_help': UploadDoc}, zip_safe=False, ) if __name__ == '__main__': execfile('setup.py', dict(__name__='__main__', extra_setuptools_args=extra_setuptools_args)) joblib-0.7.1/sphinx_pypi_upload.py000066400000000000000000000117011217450746300172610ustar00rootroot00000000000000# -*- coding: utf-8 -*- """ sphinx_pypi_upload ~~~~~~~~~~~~~~~~~~ setuptools command for uploading Sphinx documentation to PyPI :author: Jannis Leidel :contact: jannis@leidel.info :copyright: Copyright 2009, Jannis Leidel. :license: BSD, see LICENSE for details. Modified for joblib by Gael Varoquaux """ import sys import os import socket try: import httplib import urlparse from cStringIO import StringIO as BytesIO except ImportError: # Python3k import http as httplib from urllib import parse as urlparse from io import BytesIO import base64 from distutils import log from distutils.command.upload import upload class UploadDoc(upload): """Distutils command to upload Sphinx documentation.""" description = 'Upload Sphinx documentation to PyPI' user_options = [ ('repository=', 'r', "url of repository [default: %s]" % upload.DEFAULT_REPOSITORY), ('show-response', None, 'display full response text from server'), ('upload-file=', None, 'file to upload'), ] boolean_options = upload.boolean_options def initialize_options(self): upload.initialize_options(self) self.upload_file = None def finalize_options(self): upload.finalize_options(self) if self.upload_file is None: self.upload_file = 'doc/documentation.zip' self.announce('Using upload file %s' % self.upload_file) def upload(self, filename): content = open(filename, 'rb').read() meta = self.distribution.metadata data = { ':action': 'doc_upload', 'name': meta.get_name(), 'content': (os.path.basename(filename), content), } # set up the authentication auth = "Basic " + base64.encodestring(self.username + ":" + \ self.password).strip() # Build up the MIME payload for the POST data boundary = '--------------GHSKFJDLGDS7543FJKLFHRE75642756743254' sep_boundary = '\n--' + boundary end_boundary = sep_boundary + '--' body = BytesIO() for key, value in data.items(): # handle multiple entries for the same name if type(value) != type([]): value = [value] for value in value: if type(value) is tuple: fn = ';filename="%s"' % value[0] value = value[1] else: fn = "" value = str(value) body.write(sep_boundary) body.write('\nContent-Disposition: form-data; name="%s"' % key) body.write(fn) body.write("\n\n") body.write(value) if value and value[-1] == '\r': body.write('\n') # write an extra newline (lurve Macs) body.write(end_boundary) body.write("\n") body = body.getvalue() self.announce("Submitting documentation to %s" % (self.repository), log.INFO) # build the Request # We can't use urllib2 since we need to send the Basic # auth right with the first request schema, netloc, url, params, query, fragments = \ urlparse.urlparse(self.repository) assert not params and not query and not fragments if schema == 'http': http = httplib.HTTPConnection(netloc) elif schema == 'https': http = httplib.HTTPSConnection(netloc) else: raise AssertionError("unsupported schema " + schema) data = '' loglevel = log.INFO try: http.connect() http.putrequest("POST", url) http.putheader('Content-type', 'multipart/form-data; boundary=%s' % boundary) http.putheader('Content-length', str(len(body))) http.putheader('Authorization', auth) http.endheaders() 
            http.send(body)
        except socket.error:
            # Cannot use 'except as' to maintain Python 2.5 compatibility
            e = sys.exc_info()[1]
            self.announce(str(e), log.ERROR)
            return

        response = http.getresponse()
        if response.status == 200:
            self.announce('Server response (%s): %s' % (response.status,
                          response.reason), log.INFO)
        elif response.status == 301:
            location = response.getheader('Location')
            if location is None:
                location = 'http://packages.python.org/%s/' % meta.get_name()
            self.announce('Upload successful. Visit %s' % location, log.INFO)
        else:
            self.announce('Upload failed (%s): %s' % \
                          (response.status, response.reason), log.ERROR)
        if self.show_response:
            # response.read() returns bytes under Python 3; decode so the
            # concatenation below works on both Python 2 and 3.
            content = response.read()
            if not isinstance(content, str):
                content = content.decode('utf-8', 'replace')
            print('-' * 75 + content + '-' * 75)

    def run(self):
        zip_file = self.upload_file
        self.upload(zip_file)
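# Typical invocation, as wired through the [aliases] section of setup.cfg:
#   python setup.py build_sphinx   # runs generate_help, build_sphinx, zip_help
#   python setup.py upload         # runs upload, then upload_help (this command)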