pax_global_header: comment=318facd88412e23feb3f6faa27a5302f956cdb72

petl-1.7.15/.coveragerc
[report]
exclude_lines =
    pragma: no cover
    pragma: ${PY_MAJOR_VERSION} no cover

petl-1.7.15/.github/CONTRIBUTING.md
Contributing
============

Please see the [project documentation](http://petl.readthedocs.io/en/stable/contributing.html)
for information about contributing to petl.

petl-1.7.15/.github/ISSUE_TEMPLATE/bug_report.md
---
name: Bug report
about: Create a report to help us improve petl
title: ''
labels: 'Bug'
assignees: ''

---

## Problem description

### What's happening

A clear and concise description of what the bug is. Please explain:

- what the current output/behavior is
- what the bug is preventing you from doing
- whether it had worked before but regressed and stopped working

### Expected behavior

A clear and concise description of the intended behavior. Please explain:

- what you expected to happen
- how the current output/behavior doesn't match the intended behavior

## Scenario for reproduction

### Reproducible test case

Please provide a minimal, reproducible code sample, a copy-pastable example if possible:

```python
# Your code here
```

### Version and installation information

Please provide the following:

- Value of ``petl.__version__``
- Version information for any third-party package dependencies that are relevant
- Version of Python interpreter
- Operating system (Linux/Windows/Mac)
- How petl was installed (e.g., "using pip into virtual environment", or "using conda")

Also, if you think it might be relevant, please provide the output from ``pip freeze`` or
``conda env export`` depending on which was used to install petl.

### Additional context

Add any other context about the problem here.

Also, feel free to remove all sections and text that aren't relevant.

petl-1.7.15/.github/ISSUE_TEMPLATE/feature_request.md
---
name: Feature request
about: Suggest an idea for this project
title: ''
labels: 'Feature'
assignees: ''

---

## Problem description

### Is your feature request related to a problem?

A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

## Change description

### Describe the solution you'd like

A clear and concise description of what you want to happen. Whenever relevant, please provide
a code sample of what the syntax would be, the way you mean to use it:

```python
# Your code here
```

### Advantages

Explain why the current behavior is a problem, what the expected output/behavior is, and why
the expected output/behavior is a better solution.

### Describe alternatives you've considered

A clear and concise description of any alternative solutions or features you've considered.

## Additional context

Add any other context or information about the feature request here.

Please feel free to use whatever template makes sense.
Also, feel free to remove all sections and text that aren't relevant.

petl-1.7.15/.github/PULL_REQUEST_TEMPLATE.md
This PR has the objective of .

## Changes

1. Added new feature for...
2. Fixed a bug in...
3. Changed the behavior of...
4. Improved the docs about...

## Checklist

Use this checklist to ensure the quality of pull requests that include new code and/or make
changes to existing code.

* [ ] Source Code guidelines:
  * [ ] Includes unit tests
  * [ ] New functions have docstrings with examples that can be run with doctest
  * [ ] New functions are included in API docs
  * [ ] Docstrings include notes for any changes to API or behavior
  * [ ] All changes are documented in docs/changes.rst
* [ ] Versioning and history tracking guidelines:
  * [ ] Using atomic commits whenever possible
  * [ ] Commits are reversible whenever possible
  * [ ] There are no incomplete changes in the pull request
  * [ ] There is no accidental garbage added to the source code
* [ ] Testing guidelines:
  * [ ] Tested locally using `tox` / `pytest`
  * [ ] Rebased to `master` branch and tested before sending the PR
  * [ ] Automated testing passes (see [CI](https://github.com/petl-developers/petl/actions))
  * [ ] Unit test coverage has not decreased (see [Coveralls](https://coveralls.io/github/petl-developers/petl))
* [ ] State of these changes is:
  * [ ] Just a proof of concept
  * [ ] Work in progress / Further changes needed
  * [ ] Ready to review
  * [ ] Ready to merge

petl-1.7.15/.github/workflows/codacy-analysis.yml
# Codacy is an automated code review tool that makes it easy to ensure your team is writing high-quality code
# This workflow checks out code, performs a Codacy security scan and integrates
# the results with the GitHub Advanced Security code scanning feature.

# The following scenario is implemented:
# - Integration with GitHub code scanning:
#   Analyzes each commit and pull request and uploads the results to GitHub,
#   which displays the identified issues under your repository's Security tab.

# For more information on the Codacy security scan action usage, see:
# - https://github.com/marketplace/actions/codacy-analysis-cli
# - https://github.com/codacy/codacy-analysis-cli-action

# For more information on Codacy Analysis CLI in general, see
# https://github.com/codacy/codacy-analysis-cli.
name: Codacy Security Scan

on:
  push:
    branches: [ master, main ]
  pull_request:
    branches: [ master, main ]

jobs:
  codacy-security-scan:
    name: Codacy Security Scan
    runs-on: ubuntu-latest
    steps:
      # Checkout the repository to the GitHub Actions runner
      - name: Checkout code
        uses: actions/checkout@main

      # Execute Codacy Analysis CLI and generate a SARIF output with the security issues identified during the analysis
      - name: Run Codacy Analysis CLI
        uses: codacy/codacy-analysis-cli-action@master
        with:
          # To get your project token from your Codacy repository check:
          # https://github.com/codacy/codacy-analysis-cli#project-token
          # You can also omit the token and run the tools that support default configurations
          project-token: ${{ secrets.CODACY_PROJECT_TOKEN }}
          verbose: true
          output: results.sarif
          format: sarif
          # Adjust severity of non-security issues
          gh-code-scanning-compat: true
          # Force 0 exit code to allow SARIF file generation
          # This will hand over control of PR rejection to the GitHub side
          max-allowed-issues: 2147483647

      # Upload the SARIF file generated in the previous step
      - name: Upload SARIF results file
        uses: github/codeql-action/upload-sarif@main
        with:
          sarif_file: results.sarif

# end of file #

petl-1.7.15/.github/workflows/codeql-analysis.yml
# For most projects, this workflow file will not need changing; you simply need
# to commit it to your repository.
#
# You may wish to alter this file to override the set of languages analyzed,
# or to provide custom queries or build logic.
#
# ******** NOTE ********
# We have attempted to detect the languages in your repository. Please check
# the `language` matrix defined below to confirm you have the correct set of
# supported CodeQL languages.
#
name: "CodeQL"

on:
  push:
    branches: [ master ]
  pull_request:
    # The branches below must be a subset of the branches above
    branches: [ master ]
  schedule:
    - cron: '44 10 * * 0'

jobs:
  analyze:
    name: Analyze
    runs-on: ubuntu-latest

    strategy:
      fail-fast: false
      matrix:
        language: [ 'python' ]
        # CodeQL supports [ 'cpp', 'csharp', 'go', 'java', 'javascript', 'python' ]
        # Learn more:
        # https://docs.github.com/en/free-pro-team@latest/github/finding-security-vulnerabilities-and-errors-in-your-code/configuring-code-scanning#changing-the-languages-that-are-analyzed

    steps:
      - name: Checkout repository
        uses: actions/checkout@main

      # Initializes the CodeQL tools for scanning.
      - name: Initialize CodeQL
        uses: github/codeql-action/init@v2
        with:
          languages: ${{ matrix.language }}
          # If you wish to specify custom queries, you can do so here or in a config file.
          # By default, queries listed here will override any specified in a config file.
          # Prefix the list here with "+" to use these queries and those in the config file.
          # queries: ./path/to/local/query, your-org/your-repo/queries@main

      # Autobuild attempts to build any compiled languages (C/C++, C#, or Java).
      # If this step fails, then you should remove it and run the build manually (see below)
      - name: Autobuild
        uses: github/codeql-action/autobuild@v2

      # ℹ️ Command-line programs to run using the OS shell.
      # 📚 https://git.io/JvXDl

      # ✏️ If the Autobuild fails above, remove it and uncomment the following three lines
      #    and modify them (or add more) to build your code if your project
      #    uses a compiled language

      #- run: |
      #    make bootstrap
      #    make release

      - name: Perform CodeQL Analysis
        uses: github/codeql-action/analyze@v2

petl-1.7.15/.github/workflows/publish-release.yml
name: release

on:
  release:
    types: [published]

jobs:
  pypi:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout source code
        uses: actions/checkout@main

      - name: Set up Python ${{ matrix.python }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python }}

      - name: Install pypa/build
        run: |
          python -m pip install build --user

      - name: Build the petl package as source tarball
        run: |
          # python setup.py sdist
          python -m build --sdist --outdir dist/ .

      - name: Publish the package version ${{ github.event.release.tag_name }} to PyPI
        if: startsWith(github.ref, 'refs/tags')
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          password: ${{ secrets.PYPI_API_TOKEN }}
          print_hash: true

petl-1.7.15/.github/workflows/test-changes.yml
# This workflow will install Python dependencies, run tests and lint with a variety of Python versions and Operating Systems
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions

name: Test Changes

on: [push, pull_request]

jobs:
  run-guard:
    # it succeeds if any of the following conditions are met:
    # - when the PR is not a draft and is not labeled 'prevent-ci'
    # - when the PR is labeled 'force-ci'
    runs-on: ubuntu-latest
    if: |
      (
        (!github.event.pull_request.draft) &&
        (github.event.action != 'labeled') &&
        (!contains( github.event.pull_request.labels.*.name, 'prevent-ci'))
      ) ||
      ((github.event.action != 'labeled') && contains( github.event.pull_request.labels.*.name, 'force-ci')) ||
      (github.event.label.name == 'force-ci')
    steps:
      - name: Checking if CI should run for this push/PR...
        run: echo Resuming CI. Continuing next jobs...

  test-source-code:
    needs: run-guard
    strategy:
      fail-fast: false
      matrix:
        os: [ "ubuntu-latest", "windows-latest", "macos-latest" ]
        python: ['3.6', '3.7', '3.8', '3.9', '3.10', '3.11']
        include:
          - python: '2.7'
            os: "ubuntu-latest"
    runs-on: "${{ matrix.os }}"
    env:
      testing: simple
      python_eol: no
    steps:
      - name: Determine what scope of testing is available on ${{ matrix.os }}
        if: |
          (matrix.python >= '3.' ) && ( matrix.os != 'windows-latest' )
        run: |
          echo 'testing=full' >> $GITHUB_ENV

      - name: Determine if the python ${{ matrix.python }} is available on ${{ matrix.os }}
        if: |
          (matrix.python < '3.' ) ||
          ( matrix.python == '3.6' && matrix.os == 'ubuntu-latest' )
        run: |
          echo 'python_eol=yes' >> $GITHUB_ENV

      - name: Checkout source code
        uses: actions/checkout@main

      - name: Install linux tools
        if: matrix.os == 'ubuntu-latest'
        run: |
          sudo apt-get update
          sudo apt-get install -y --no-install-recommends python3-h5py

      - name: Set up Python ${{ matrix.python }}
        if: env.python_eol == 'no'
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python }}

      - name: Set up Python ${{ matrix.python }} discontinued on ${{ matrix.os }}
        if: env.python_eol == 'yes'
        uses: MatteoH2O1999/setup-python@v1
        with:
          python-version: ${{ matrix.python }}
          cache: pip

      - name: Install test dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install --prefer-binary -r requirements-tests.txt

      - name: Setup environment variables for remote filesystem testing
        if: matrix.os == 'ubuntu-latest' && matrix.python == '3.8'
        run: |
          echo "Setup SMB environment variable to trigger testing in Petl"
          echo 'PETL_TEST_SMB=smb://WORKGROUP;petl:test@localhost/public/' >> $GITHUB_ENV
          echo "Setup SFTP environment variable to trigger testing in Petl"
          echo 'PETL_TEST_SFTP=sftp://petl:test@localhost:2244/public/' >> $GITHUB_ENV
          echo "::group::Install remote test dependencies"
          python -m pip install --prefer-binary -r requirements-remote.txt
          echo "::endgroup::"

      - name: Install optional test dependencies for mode ${{ env.testing }}
        if: matrix.os == 'ubuntu-latest' && matrix.python >= '3.7'
        env:
          DISABLE_BLOSC_AVX2: 1
        run: |
          echo "::group::Install tricky test dependencies"
          if ! pip install --prefer-binary -r requirements-optional.txt ; then
            echo 'Dismissed failure installing some optional package. Resuming tests...'
          fi
          # DISABLE_BLOSC_AVX2=1 && export DISABLE_BLOSC_AVX2
          # pip install --prefer-binary bcolz
          echo "::endgroup::"

      - name: Install containers for remote filesystem testing
        if: matrix.os == 'ubuntu-latest' && matrix.python == '3.8'
        run: |
          echo "::group::Setup docker for SMB at: ${{ env.PETL_TEST_SMB }}$"
          docker run -it --name samba -p 139:139 -p 445:445 -d "dperson/samba" -p -u "petl;test" -s "public;/public-dir;yes;no;yes;all"
          echo "::endgroup::"
          echo "::group::Setup docker for SFTP at: ${{ env.PETL_TEST_SFTP }}$"
          docker run -it --name sftp -p 2244:22 -d atmoz/sftp petl:test:::public
          echo "::endgroup::"

      - name: Install containers for remote database testing
        if: matrix.os == 'ubuntu-latest' && matrix.python >= '3.6'
        run: |
          echo "::group::Setup docker for MySQL"
          docker run -it --name mysql -p 3306:3306 -p 33060:33060 -e MYSQL_ROOT_PASSWORD=pass0 -e MYSQL_DATABASE=petl -e MYSQL_USER=petl -e MYSQL_PASSWORD=test -d mysql:latest
          echo "::endgroup::"
          echo "::group::Setup docker for Postgres"
          docker run -it --name postgres -p 5432:5432 -e POSTGRES_DB=petl -e POSTGRES_USER=petl -e POSTGRES_PASSWORD=test -d postgres:latest
          echo "::endgroup::"
          echo "::group::Install database test dependencies"
          python -m pip install --prefer-binary -r requirements-database.txt
          echo "::endgroup::"

      - name: Setup petl package
        run: python setup.py sdist bdist_wheel

      - name: Test python source code for mode simple
        if: env.testing == 'simple'
        run: pytest --cov=petl petl

      - name: Test documentation inside source code for mode full
        if: env.testing == 'full'
        run: |
          echo "::group::Install extra packages test dependencies"
          python -m pip install --prefer-binary -r requirements-formats.txt
          echo "::endgroup::"
          echo "::group::Perform doctest-modules execution with coverage"
          pytest --doctest-modules --cov=petl petl
          echo "::endgroup::"

      - name: Coveralls
        if: matrix.os == 'ubuntu-latest' && matrix.python == '3.8'
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          python -m pip install --upgrade coveralls
          coveralls --service=github

      - name: Print source code coverage
        run: coverage report -m

  test-documentation:
    needs: run-guard
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python: [3.8]
    steps:
      - name: Checkout source code
        uses: actions/checkout@main

      - name: Set up Python ${{ matrix.python }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python }}

      - name: Install doc generation dependencies
        run: |
          python -m pip install --prefer-binary -r requirements-docs.txt

      - name: Setup petl package
        run: python setup.py build

      - name: Test docs generation
        run: |
          cd docs
          sphinx-build -W -b singlehtml -d ../build/doctrees . ../build/singlehtml

petl-1.7.15/.gitignore
## https://raw.githubusercontent.com/github/gitignore/main/Python.gitignore ##

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
#   For a library or package, you might want to ignore these files since the code is
#   intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
#   However, in case of collaboration, if having platform-specific dependencies or dependencies
#   having no cross-platform support, pipenv may install dependencies that don't work, or not
#   install all needed dependencies.
#Pipfile.lock

# poetry
#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
#   This is especially recommended for binary packages to ensure reproducibility, and is more
#   commonly ignored for libraries.
#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
#  and can be added to the global gitignore or merged into this file.
#  For a more nuclear option (not recommended) you can uncomment the following
#  to ignore the entire idea folder.
#.idea/

## Custom section for petl ##

# Python generated files
*.pyc

# Jupyter notebook temp files
.ipynb_checkpoints/
**/.ipynb_checkpoints/*

# Editor backup files
*~
*.backup

# Petl build generated files
petl/version.py
**/tmp/

# Petl doctest generated files
example*.*

# Ignore these patterns for development convenience
sketch*

# Ignore this folder for IDEA users
.idea/

## end of .gitignore file ##

petl-1.7.15/.vscode/extensions.json
{
    "recommendations": [
        //
        //-- Used for IDE, Workbench, Tools --------------------------------------------
        //
        "editorconfig.editorconfig",
        "VisualStudioExptTeam.vscodeintellicode",
        //
        //-- Used for linters, formatters ----------------------------------------------
        //
        "ms-python.python",
        "ms-python.vscode-pylance",
        "charliermarsh.ruff",
        "njpwerner.autodocstring",
        "njqdev.vscode-python-typehint",
        // "ms-python.mypy-type-checker",
        //
        //-- Used for: Git, Code Quality -----------------------------------------------
        //
        "vivaxy.vscode-conventional-commits",
        "codezombiech.gitignore"
    ]
}

petl-1.7.15/.vscode/launch.json
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "python: with Args",
            "type": "debugpy",
            "request": "launch",
            "program": "${file}",
            "args": "${input:arguments}",
            "cwd": "${input:debug_working_dir}",
            "justMyCode": true,
            "autoReload": {
                "enable": true
            }
        },
        {
            "name": "python: Within Libs",
            "type": "debugpy",
            "request": "launch",
            "program": "${file}",
            "args": "${input:last_arguments}",
            "cwd": "${input:debug_working_dir}",
            "justMyCode": false,
            "autoReload": {
                "enable": true
            }
        }
    ],
    "inputs": [
        {
            // Usage: "args": "${input:arguments}",
            "id": "arguments",
            "type": "promptString",
            "description": "Which arguments to pass to the command?"
        },
        {
            // Usage: "cwd": "${input:debug_working_dir}"
            "id": "debug_working_dir",
            "type": "pickString",
            "description": "Debug the python program in which of these folders?",
            "options": [
                "${fileDirname}",
                "${fileWorkspaceFolder}",
                "${fileWorkspaceFolder}/petl",
                "${fileWorkspaceFolder}/petl/tests",
                "${fileWorkspaceFolder}/examples",
                "${relativeFileDirname}",
                "${userHome}",
                "${cwd}",
                "${selectedText}",
                ""
            ],
            "default": "${fileDirname}"
        },
    ]
}

petl-1.7.15/.vscode/settings.json
{
    //-- Editor settings for all files -------------------------------------------------
    //
    "editor.formatOnSave": false, // Using this for reducing git changes between commits
    "editor.formatOnPaste": true,
    "editor.tabSize": 4,
    "editor.wordWrapColumn": 83,
    "editor.renderWhitespace": "boundary",
    "files.eol": "\n",
    "files.encoding": "utf8",
    //
    //-- Editor settings for search/view files -----------------------------------------
    //
    "files.exclude": {
        // python
        "**/.tox": true,
        "**/.nox": true,
        "**/.eggs": true,
        "**/*.egg-info": true,
        "**/__pycache__": true,
        "**/__pypackages__": true,
        "**/.pylint.d": true,
        "**/.cache": true,
        "**/.mypy_cache": true,
        "**/.pytest_cache": true,
        "**/.ruff_cache": true,
        "**/.ipynb_checkpoints": true,
        "**/*.pyc": true,
        "**/*.egg": true,
        "**/*.pyenv": true,
        "**/*.pytype": true,
        "**/.vscode-server": true,
    },
    "files.watcherExclude": {
        // git
        "**/.git/objects/**": true,
        "**/.git/subtree-cache/**": true,
        // python
        "**/.tox": true,
        "**/.nox": true,
        "**/.eggs": true,
        "**/*.egg-info": true,
        "**/__pycache__": true,
        "**/__pypackages__": true,
        "**/.cache": true,
        "**/.mypy_cache": true,
        "**/.pytest_cache": true,
        "**/.ruff_cache": true,
        "**/.ipynb_checkpoints": true,
        "**/*.pyc": true,
        "**/*.egg": true,
        "**/*.venv": true,
        "**/*.pyenv": true,
        "**/*.pytype": true,
        "**/_build": true,
        "**/build": true,
        "**/dist": true,
        "**/site-packages": true,
        // others
        "**/logs": true,
        "**/*.log": true,
        "**/.vscode-server": true,
        "**/example.*": true,
    },
    "search.exclude": {
        // git
        "**/.git/objects/**": true,
        "**/.git/subtree-cache/**": true,
        // python
        "**/.tox": true,
        "**/.nox": true,
        "**/.eggs": true,
        "**/*.egg-info": true,
        "**/__pycache__": true,
        "**/__pypackages__": true,
        "**/.cache": true,
        "**/.mypy_cache": true,
        "**/.pytest_cache": true,
        "**/.ruff_cache": true,
        "**/.ipynb_checkpoints": true,
        "**/*.pyc": true,
        "**/*.egg": true,
        "**/*.venv": true,
        "**/*.pyenv": true,
        "**/*.pytype": true,
        "**/_build": true,
        "**/build": true,
        "**/dist": true,
        "**/site-packages": true,
        // others
        "**/*.log": true,
        "**/logs": true,
        "**/.vscode-server": true,
        "**/example.*": true,
    },
    //
    //-- Python language settings ------------------------------------------------------
    //
    "[python]": {
        // Editor settings for python files
        "editor.tabSize": 4,
        "editor.wordWrapColumn": 83,
        "files.eol": "\n",
        "files.encoding": "utf8",
        "editor.defaultFormatter": "charliermarsh.ruff",
        // "editor.defaultFormatter": "ms-python.black-formatter",
        //-- Settings for keeping source code compatibility
        "editor.codeActionsOnSave": {
            "source.fixAll": "explicit",
            "source.organizeImports": "explicit"
        },
        "editor.insertSpaces": true,
        "files.insertFinalNewline": true,
        //-- Settings to reduce git changes due to whitespaces in source code (disabled for now)
        // "editor.trimAutoWhitespace": true,
        // "files.trimFinalNewlines": true,
        // "files.trimTrailingWhitespace": true,
    },
    "python.autoComplete.extraPaths": [
        "${workspaceFolder}/petl/",
        "${workspaceFolder}/petl/io/",
        "${workspaceFolder}/petl/transform/",
"${workspaceFolder}/petl/util/", "${workspaceFolder}/petl/test/", "${workspaceFolder}/petl/test/io", ], // //-- Python analysis/tools settings ------------------------------------------------ // "python.analysis.logLevel": "Warning", "pylint.ignorePatterns": [ ".vscode/*.py", ".git", ".tox", ".nox", ".venv", ".eggs", ".egg-info", ".cache", ".mypy_cache", ".pytest_cache", "__pycache__", "_build", "*.pyc", ".vscode", ".vscode-server", ], "pylint.args": [ "--max-line-length=83", "--reports=y", "--disable=import-error,invalid-name,bad-continuation,import-outside-toplevel,missing-module-docstring,missing-function-docstring,trailing-whitespace,line-too-long,bad-whitespace" ], "flake8.args": [ "--max-line-length", "83", "--max-complexity", "10", "--extend-ignore", "E201,E202,E203,E501,W503,W291", "--exclude", ".git,.tox,.venv,.eggs,.egg-info,.cache,.mypy_cache,.pytest_cache,.vscode,__pycache__,_build,*.pyc,.vscode-server" ], "mypy-type-checker.args": [ "--allow-untyped-defs", "--allow-untyped-calls", "--allow-untyped-globals", // "--ignore-missing-imports", // "--follow-imports", // "silent", ], "python.testing.unittestEnabled": false, "python.testing.pytestEnabled": true, "python.analysis.extraPaths": [ "${workspaceFolder}/petl/", "${workspaceFolder}/petl/io/", "${workspaceFolder}/petl/transform/", "${workspaceFolder}/petl/util/", "${workspaceFolder}/petl/test/", "${workspaceFolder}/petl/test/io" ], "bandit.args": [ "--configfile", "${workspaceFolder}/pyproject.toml" ] }petl-1.7.15/.vscode/tasks.json000066400000000000000000000024631457414240700161470ustar00rootroot00000000000000{ // See https://go.microsoft.com/fwlink/?LinkId=733558 // for the documentation about the tasks.json format "version": "2.0.0", "tasks": [ { "label": "Package: build", "command": "python2", "args": [ "setup.py", "build" ], "presentation": { "echo": true, "panel": "shared", "focus": true } }, { "label": "Package: install", "command": "python3", "group": { "kind": "build", "isDefault": true }, "args": [ "setup.py", "install" ], "presentation": { "echo": true, "panel": "shared", "focus": true } } ], "problemMatcher": [ { "fileLocation": "absolute", "pattern": [ { "regexp": "^\\s+File \"(.*)\", line (\\d+), in (.*)$", "file": 1, "line": 2 }, { "regexp": "^\\s+(.*)$", "message": 1 } ] } ] }petl-1.7.15/LICENSE.txt000066400000000000000000000020551457414240700144060ustar00rootroot00000000000000Copyright (c) 2012 Alistair Miles Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
petl-1.7.15/MANIFEST.in
include *.txt
recursive-include docs *.txt

petl-1.7.15/README.rst
petl - Extract, Transform and Load
==================================

``petl`` is a general purpose Python package for extracting, transforming and
loading tables of data.

.. image:: docs/petl-architecture.png
    :align: center
    :alt: petl architecture example

Resources
---------

- Documentation: http://petl.readthedocs.org/
- Mailing List: http://groups.google.com/group/python-etl
- PyPI: http://pypi.python.org/pypi/petl
- Conda Forge: https://anaconda.org/conda-forge/petl

DevOps Status
-------------

.. image:: https://github.com/petl-developers/petl/actions/workflows/test-changes.yml/badge.svg
    :target: https://github.com/petl-developers/petl/actions/workflows/test-changes.yml
    :alt: Continuous Integration build status

.. image:: https://github.com/petl-developers/petl/actions/workflows/publish-release.yml/badge.svg
    :target: https://github.com/petl-developers/petl/actions/workflows/publish-release.yml
    :alt: PyPI release status

.. image:: https://github.com/conda-forge/petl-feedstock/actions/workflows/automerge.yml/badge.svg
    :target: https://github.com/conda-forge/petl-feedstock/actions/workflows/automerge.yml
    :alt: Conda Forge release status

.. image:: https://readthedocs.org/projects/petl/badge/?version=stable
    :target: http://petl.readthedocs.io/en/stable/?badge=stable
    :alt: readthedocs.org release status

.. image:: https://coveralls.io/repos/github/petl-developers/petl/badge.svg?branch=master
    :target: https://coveralls.io/github/petl-developers/petl?branch=master
    :alt: Coveralls release status

.. image:: https://zenodo.org/badge/2233194.svg
    :target: https://zenodo.org/badge/latestdoi/2233194

petl-1.7.15/README.txt
petl - Extract, Transform and Load
==================================

``petl`` is a general purpose Python package for extracting, transforming and
loading tables of data.

Resources
---------

- Documentation: http://petl.readthedocs.org/
- Mailing List: http://groups.google.com/group/python-etl
- Source Code: https://github.com/petl-developers/petl
- Download:
  - PyPI: http://pypi.python.org/pypi/petl
  - Conda Forge: https://anaconda.org/conda-forge/petl

Getting Help
------------

Please feel free to ask questions via the mailing list
(python-etl@googlegroups.com).

To report installation problems, bugs or any other issues please email
python-etl@googlegroups.com or `raise an issue on GitHub `_.

petl-1.7.15/bin/petl
#!/usr/bin/env python

from __future__ import print_function, division, absolute_import

import sys
import os
import os.path
import glob
from optparse import OptionParser

from petl import __version__
from petl import *


parser = OptionParser(
    usage="%prog [options] expression",
    description="Evaluate a Python expression. The expression will be "
                "evaluated using eval(), with petl functions imported.",
    version=__version__)
options, args = parser.parse_args()

try:
    (expression,) = args
except ValueError:
    parser.error("invalid number of arguments (%s)" % len(args))

r = eval(expression)

if r is not None:
    if isinstance(r, Table):
        print(look(r))
    else:
        print(str(r))

petl-1.7.15/docs/Makefile
# Makefile for Sphinx documentation
#

# You can set these variables from the command line.
SPHINXOPTS    =
SPHINXBUILD   = sphinx-build
PAPER         =
BUILDDIR      = _build

# Internal variables.
PAPEROPT_a4     = -D latex_paper_size=a4
PAPEROPT_letter = -D latex_paper_size=letter
ALLSPHINXOPTS   = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .

.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest

help:
	@echo "Please use \`make <target>' where <target> is one of"
	@echo "  html       to make standalone HTML files"
	@echo "  dirhtml    to make HTML files named index.html in directories"
	@echo "  singlehtml to make a single large HTML file"
	@echo "  pickle     to make pickle files"
	@echo "  json       to make JSON files"
	@echo "  htmlhelp   to make HTML files and a HTML help project"
	@echo "  qthelp     to make HTML files and a qthelp project"
	@echo "  devhelp    to make HTML files and a Devhelp project"
	@echo "  epub       to make an epub"
	@echo "  latex      to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
	@echo "  latexpdf   to make LaTeX files and run them through pdflatex"
	@echo "  text       to make text files"
	@echo "  man        to make manual pages"
	@echo "  changes    to make an overview of all changed/added/deprecated items"
	@echo "  linkcheck  to check all external links for integrity"
	@echo "  doctest    to run all doctests embedded in the documentation (if enabled)"

clean:
	-rm -rf $(BUILDDIR)/*

html:
	$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."

dirhtml:
	$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."

singlehtml:
	$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
	@echo
	@echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."

pickle:
	$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
	@echo
	@echo "Build finished; now you can process the pickle files."

json:
	$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
	@echo
	@echo "Build finished; now you can process the JSON files."

htmlhelp:
	$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
	@echo
	@echo "Build finished; now you can run HTML Help Workshop with the" \
	      ".hhp project file in $(BUILDDIR)/htmlhelp."

qthelp:
	$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
	@echo
	@echo "Build finished; now you can run "qcollectiongenerator" with the" \
	      ".qhcp project file in $(BUILDDIR)/qthelp, like this:"
	@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/petl.qhcp"
	@echo "To view the help file:"
	@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/petl.qhc"

devhelp:
	$(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
	@echo
	@echo "Build finished."
@echo "To view the help file:" @echo "# mkdir -p $$HOME/.local/share/devhelp/petl" @echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/petl" @echo "# devhelp" epub: $(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub @echo @echo "Build finished. The epub file is in $(BUILDDIR)/epub." latex: $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex @echo @echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex." @echo "Run \`make' in that directory to run these through (pdf)latex" \ "(use \`make latexpdf' here to do that automatically)." latexpdf: $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex @echo "Running LaTeX files through pdflatex..." make -C $(BUILDDIR)/latex all-pdf @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." text: $(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text @echo @echo "Build finished. The text files are in $(BUILDDIR)/text." man: $(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man @echo @echo "Build finished. The manual pages are in $(BUILDDIR)/man." changes: $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes @echo @echo "The overview file is in $(BUILDDIR)/changes." linkcheck: $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck @echo @echo "Link check complete; look for any errors in the above output " \ "or in $(BUILDDIR)/linkcheck/output.txt." doctest: $(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest @echo "Testing of doctests in the sources finished, look at the " \ "results in $(BUILDDIR)/doctest/output.txt." petl-1.7.15/docs/acknowledgments.rst000066400000000000000000000053371457414240700174400ustar00rootroot00000000000000Acknowledgments =============== This is community-maintained software. The following people have contributed to the development of this package: * Alexander Stauber * Alistair Miles (`alimanfoo `_) * Andreas Porevopoulos (`sv1jsb `_) * Andrew Kim (`andrewakim `_) * Brad Maggard (`bmaggard `_) * Caleb Lloyd (`caleblloyd `_) * César Roldán (`ihuro `_) * Chris Lasher (`gotgenes `_) * Dean Way (`DeanWay `_) * Dustin Engstrom (`engstrom `_) * Fahad Siddiqui (`fahadsiddiqui `_) * Florent Xicluna (`florentx `_) * Henry Rizzi (`henryrizzi `_) * Jonathan Camile (`deytao `_) * Jonathan Moss (`a-musing-moose `_) * Juarez Rudsatz (`juarezr `_) * Kenneth Borthwick * Krisztián Fekete (`krisztianfekete `_) * Matt Katz (`mattkatz `_) * Matthew Scholefield (`MatthewScholefield `_) * Michael Rea (`rea725 `_) * Olivier Macchioni (`omacchioni `_) * Olivier Poitrey (`rs `_) * Pablo Castellano (`PabloCastellano `_) * Paul Jensen (`psnj `_) * Paulo Scardine (`scardine `_) * Peder Jakobsen (`pjakobsen `_) * Phillip Knaus (`phillipknaus `_) * Richard Pearson (`podpearson `_) * Robert DeSimone (`icenine457 `_) * Robin Moss (`LupusUmbrae `_) * Roger Woodley (`rogerkwoodley `_) * Tucker Beck (`dusktreader `_) * Viliam Segeďa (`vilos `_) * Zach Palchick (`palchicz `_) * `adamsdarlingtower `_ * `hugovk `_ * `imazor `_ * `james-unified `_ * `Mgutjahr `_ * `shayh `_ * `thatneat `_ * `titusz `_ * `zigen `_ Development of petl has been supported by an open source license for `PyCharm `_. petl-1.7.15/docs/changes.rst000066400000000000000000000620461457414240700156630ustar00rootroot00000000000000Changes ======= Version 1.7.15 -------------- * Add unit tests for randomtable, dummytable, and their supporting functions and classes. By :user:`bmos`, :issue:`657`. 
* Fix: DeprecationWarning: Seeding based on hashing is deprecated since
  Python 3.9 and will be removed in a subsequent version.
  By :user:`bmos`, :issue:`657`.

Version 1.7.14
--------------

* Enhancement: Fix other functions to conform with PEP 479
  By :user:`augustomen`, :issue:`645`.

* CI: fix build as SQLAlchemy 2 is not supported yet
  By :user:`juarezr`, :issue:`635`.

* CI: workaround for actions/setup-python#672 as Github removed python 2.7
  and 3.6
  By :user:`juarezr`, :issue:`649`.

* CI: Gh actions upgrade
  By :user:`juarezr`, :issue:`639`.

Version 1.7.13
--------------

* Fix in case a custom protocol was registered in fsspec
  By :user:`timheb`, :issue:`647`.

Version 1.7.12
--------------

* Fix: calling functions to*() should output by default to stdout
  By :user:`juarezr`, :issue:`632`.

* Add python 3.11 to the build and testing
  By :user:`juarezr`, :issue:`635`.

* Add support for writing to JSONL files
  By :user:`mzaeemz`, :issue:`524`.

Version 1.7.11
--------------

* Fix generator support in fromdicts to use the file cache
  By :user:`arturponinski`, :issue:`625`.

Version 1.7.10
--------------

* Fix fromtsv() to pass on the header argument
  By :user:`jfitzell`, :issue:`622`.

Version 1.7.9
-------------

* Feature: Add improved support for working with Google Sheets
  By :user:`juarezr`, :issue:`615`.

* Maintenance: Improve test helpers testing
  By :user:`juarezr`, :issue:`614`.

Version 1.7.8
-------------

* Fix iterrowslice() to conform with PEP 479
  By :user:`arturponinski`, :issue:`575`.

* Clean up and unclutter old and unused files in the repository
  By :user:`juarezr`, :issue:`606`.

* Add tohtml with css styles test case
  By :user:`juarezr`, :issue:`609`.

* Fix sortheader() to not overwrite data for duplicate column names
  By :user:`arturponinski`, :issue:`392`.

* Add NotImplementedError to IterContainer's __iter__
  By :user:`arturponinski`, :issue:`483`.

* Add casting of headers to strings in toxlsx and appendxlsx
  By :user:`arturponinski`, :issue:`530`.

* Fix sorting of rows with different lengths
  By :user:`arturponinski`, :issue:`385`.

Version 1.7.7
-------------

* New pull request template. No python changes.
  By :user:`juarezr`, :issue:`594`.

Version 1.7.6
-------------

* Fix convertall not working when the table header has non-string elements
  By :user:`dnicolodi`, :issue:`579`.

* Fix todataframe() to not iterate the table multiple times
  By :user:`dnicolodi`, :issue:`578`.

* Fix broken aggregate when supplying a single key
  By :user:`MalayGoel`, :issue:`552`.

* Migrated to pytest
  By :user:`arturponinski`, :issue:`584`.

* Testing python 3.10 on Github Actions. No python changes.
  By :user:`juarezr`, :issue:`591`.

* Codacy: upgrade to latest/main github action version. No python changes.
  By :user:`juarezr`, :issue:`585`.

* Publish releases to PyPI with Github Actions. No python changes.
  By :user:`juarezr`, :issue:`593`.

Version 1.7.5
-------------

* Added Decimal to numeric types
  By :user:`blas`, :issue:`573`.

* Add support for the ignore_workbook_corruption parameter in xls
  By :user:`arturponinski`, :issue:`572`.

* Add support for generators in petl.fromdicts (see the sketch below)
  By :user:`arturponinski`, :issue:`570`.

* Add function to support fromdb, todb, appenddb via clickhouse_driver
  By :user:`superjcd`, :issue:`566`.

* Fix fromdicts(...).header() raising TypeError
  By :user:`romainernandez`, :issue:`555`.

Version 1.7.4
-------------

* Use python 3.6 instead of 2.7 for deploy on travis-ci. No python changes.
  By :user:`juarezr`, :issue:`550`.
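A minimal sketch of the generator support for :func:`petl.io.json.fromdicts`
added in 1.7.5 above (the generator and field names here are illustrative)::

    import petl as etl

    def gen_rows():
        # any generator (or other iterable) of dicts will do
        for i in range(3):
            yield {'foo': i, 'bar': i * 2}

    tbl = etl.fromdicts(gen_rows())
    print(etl.look(tbl))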
Version 1.7.3
-------------

* Fixed SQLAlchemy 1.4 removing the Engine.contextual_connect method
  By :user:`juarezr`, :issue:`545`.

* How to use convert with a custom function and a reference row
  By :user:`javidy`, :issue:`542`.

Version 1.7.2
-------------

* Allow aggregation over the entire table (without a key)
  By :user:`bmaggard`, :issue:`541`.

* Allow specifying the output field name for simple aggregation
  By :user:`bmaggard`, :issue:`370`.

* Bumped version of package dependency on lxml from 4.4.0 to 4.6.2
  By :user:`juarezr`, :issue:`536`.

Version 1.7.1
-------------

* Fixed conda packaging failures.
  By :user:`juarezr`, :issue:`534`.

Version 1.7.0
-------------

* Added `toxml()` as a convenience wrapper over `totext()`.
  By :user:`juarezr`, :issue:`529`.

* Document behavior of multi-field convert-with-row.
  By :user:`chrullrich`, :issue:`532`.

* Allow user defined sources from fsspec for :ref:`remote I/O `.
  By :user:`juarezr`, :issue:`533`.

Version 1.6.8
-------------

* Allow using a custom/restricted xml parser in `fromxml()`.
  By :user:`juarezr`, :issue:`527`.

Version 1.6.7
-------------

* Reduced memory footprint for JSONL files, a huge improvement.
  By :user:`fahadsiddiqui`, :issue:`522`.

Version 1.6.6
-------------

* Added python versions 3.8 and 3.9 to tox.ini for use in newer distros.
  By :user:`juarezr`, :issue:`517`.

* Fixed compatibility with python3.8 in `petl.timings.clock()`.
  By :user:`juarezr`, :issue:`484`.

* Added json lines support in `fromjson()`.
  By :user:`fahadsiddiqui`, :issue:`521`.

Version 1.6.5
-------------

* Fixed `fromxlsx()` crashing with read_only.
  By :user:`juarezr`, :issue:`514`.

Version 1.6.4
-------------

* Fixed exception when writing to S3 with ``fsspec`` ``auto_mkdir=True``.
  By :user:`juarezr`, :issue:`512`.

Version 1.6.3
-------------

* Allowed reading and writing Excel files in remote sources.
  By :user:`juarezr`, :issue:`506`.

* Allow `toxlsx()` to add or replace a worksheet.
  By :user:`churlrich`, :issue:`502`.

* Improved avro: improve message on schema or data mismatch.
  By :user:`juarezr`, :issue:`507`.

* Fixed build for failed test case.
  By :user:`juarezr`, :issue:`508`.

Version 1.6.2
-------------

* Fixed boolean type detection in toavro().
  By :user:`juarezr`, :issue:`504`.

* Fix unavoidable warning if fsspec is installed but some optional package is
  not installed.
  By :user:`juarezr`, :issue:`503`.

Version 1.6.1
-------------

* Added `extras_require` for the `petl` pip package.
  By :user:`juarezr`, :issue:`501`.

* Fix unavoidable warning if fsspec is not installed.
  By :user:`juarezr`, :issue:`500`.

Version 1.6.0
-------------

* Added class :class:`petl.io.remotes.RemoteSource` using the package
  **fsspec** for reading and writing files on remote servers, using the
  protocol in the url for selecting the implementation.
  By :user:`juarezr`, :issue:`494`.

* Removed class :class:`petl.io.source.s3.S3Source` as it's handled by fsspec
  By :user:`juarezr`, :issue:`494`.

* Removed classes :class:`petl.io.codec.xz.XZCodec`,
  :class:`petl.io.codec.xz.LZ4Codec` and
  :class:`petl.io.codec.zstd.ZstandardCodec` as they're handled by fsspec.
  By :user:`juarezr`, :issue:`494`.

* Fix bug in connection to a JDBC database using jaydebeapi.
  By :user:`miguelosana`, :issue:`497`.

Version 1.5.0
-------------

* Added functions :func:`petl.io.sources.register_reader` and
  :func:`petl.io.sources.register_writer` for registering custom source
  helpers for handling I/O from remote protocols.
  By :user:`juarezr`, :issue:`491`.
* Added function :func:`petl.io.sources.register_codec` for registering custom
  helpers for compressing and decompressing files with other algorithms.
  By :user:`juarezr`, :issue:`491`.

* Added classes :class:`petl.io.codec.xz.XZCodec`,
  :class:`petl.io.codec.xz.LZ4Codec` and
  :class:`petl.io.codec.zstd.ZstandardCodec` for compressing files with `XZ`
  and the "state of the art" `LZ4` and `Zstandard` algorithms.
  By :user:`juarezr`, :issue:`491`.

* Added classes :class:`petl.io.source.s3.S3Source` and
  :class:`petl.io.source.smb.SMBSource` for reading and writing files on
  remote servers using the protocols `s3://` and `smb://` in the url.
  By :user:`juarezr`, :issue:`491`.

Version 1.4.0
-------------

* Added functions :func:`petl.io.avro.fromavro`, :func:`petl.io.avro.toavro`,
  and :func:`petl.io.avro.appendavro` for reading and writing to
  `Apache Avro ` files. Avro generally is faster and safer than text formats
  like Json, XML or CSV.
  By :user:`juarezr`, :issue:`490`.

Version 1.3.0
-------------

.. note::

    The parameters to the :func:`petl.io.xlsx.fromxlsx` function have changed
    in this release. The parameters ``row_offset`` and ``col_offset`` are no
    longer supported. Please use ``min_row``, ``min_col``, ``max_row`` and
    ``max_col`` instead.

* A new configuration option `failonerror` has been added to the
  :mod:`petl.config` module. This option affects various transformation
  functions including :func:`petl.transform.conversions.convert`,
  :func:`petl.transform.maps.fieldmap`, :func:`petl.transform.maps.rowmap` and
  :func:`petl.transform.maps.rowmapmany`. The option can have values `True`
  (raise any exceptions encountered during conversion), `False` (silently use
  a given `errorvalue` if any exceptions arise during conversion) or
  `"inline"` (use any exceptions as the output value). The default value is
  `False`, which maintains compatibility with previous releases. See the
  sketch at the end of this section for an illustration.
  By :user:`bmaggard`, :issue:`460`, :issue:`406`, :issue:`365`.

* A new function :func:`petl.util.timing.log_progress` has been added, which
  behaves in a similar way to :func:`petl.util.timing.progress` but writes to
  a Python logger.
  By :user:`dusktreader`, :issue:`408`, :issue:`407`.

* Added new function :func:`petl.transform.regex.splitdown` for splitting a
  value into multiple rows.
  By :user:`John-Dennert`, :issue:`430`, :issue:`386`.

* Added new function :func:`petl.transform.basics.addfields` to add multiple
  new fields at a time.
  By :user:`mjumbewu`, :issue:`417`.

* Pass through keyword arguments to :func:`xlrd.open_workbook`.
  By :user:`gjunqueira`, :issue:`470`, :issue:`473`.

* Added new function :func:`petl.io.xlsx.appendxlsx`.
  By :user:`victormpa` and :user:`alimanfoo`, :issue:`424`, :issue:`475`.

* Fixes for upstream API changes in the openpyxl and intervaltree modules.
  N.B., the arguments to :func:`petl.io.xlsx.fromxlsx` have changed for
  specifying row and column offsets to match openpyxl
  (:issue:`472` - :user:`alimanfoo`).

* Exposed the `read_only` argument in :func:`petl.io.xlsx.fromxlsx` and set
  the default to False to prevent truncation of files created by LibreOffice.
  By :user:`mbelmadani`, :issue:`457`.

* Added support for reading from remote sources with gzip or bz2 compression
  (:issue:`463` - :user:`H-Max`).

* The function :func:`petl.transform.dedup.distinct` has been fixed for the
  case where ``None`` values appear in the table.
  By :user:`bmaggard`, :issue:`414`, :issue:`412`.

* Changed keyed sorts so that comparisons are only by keys.
  By :user:`DiegoEPaez`, :issue:`466`.

* Documentation improvements by :user:`gamesbook` (:issue:`458`).
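A minimal sketch of the `failonerror` option described above (the table and
values here are illustrative; all three calls return lazy table views)::

    import petl as etl

    table = [['foo'], ['1'], ['x']]   # 'x' cannot be parsed as an int

    etl.convert(table, 'foo', int, errorvalue=-1)         # 'x' becomes -1
    etl.convert(table, 'foo', int, failonerror='inline')  # the exception becomes the value
    etl.convert(table, 'foo', int, failonerror=True)      # raises when the table is iterated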
Version 1.2.0
-------------

Please note that this version drops support for Python 2.6 (:issue:`443`,
:issue:`444` - :user:`hugovk`).

* Function :func:`petl.transform.basics.addrownumbers` now supports a "field"
  argument to allow specifying the name of the new field to be added
  (:issue:`366`, :issue:`367` - :user:`thatneat`).

* Fix to :func:`petl.io.xlsx.fromxslx` to ensure that the underlying workbook
  is closed after iteration is complete (:issue:`387` - :user:`mattkatz`).

* Resolve compatibility issues with newer versions of openpyxl (:issue:`393`,
  :issue:`394` - :user:`henryrizzi`).

* Fix deprecation warnings from openpyxl (:issue:`447`, :issue:`445` -
  :user:`scardine`; :issue:`449` - :user:`alimanfoo`).

* Changed exceptions to use standard exception classes instead of
  ArgumentError (:issue:`396` - :user:`bmaggard`).

* Add support for non-numeric quoting in CSV files (:issue:`377`,
  :issue:`378` - :user:`vilos`).

* Fix bug in handling of mode in MemorySource (:issue:`403` -
  :user:`bmaggard`).

* Added a get() method to the Record class (:issue:`401`, :issue:`402` -
  :user:`dusktreader`).

* Added ability to make constraints optional, i.e., support validation on
  optional fields (:issue:`399`, :issue:`400` - :user:`dusktreader`).

* Added support for CSV files without a header row (:issue:`421` -
  :user:`LupusUmbrae`).

* Documentation fixes (:issue:`379` - :user:`DeanWay`; :issue:`381` -
  :user:`PabloCastellano`).

Version 1.1.0
-------------

* Fixed :func:`petl.transform.reshape.melt` to work with non-string key
  argument (`#209 `_).

* Added example to docstring of :func:`petl.transform.dedup.conflicts` to
  illustrate how to analyse the source of conflicts when rows are merged from
  multiple tables (`#256 `_).

* Added functions for working with bcolz ctables, see :mod:`petl.io.bcolz`
  (`#310 `_).

* Added :func:`petl.io.base.fromcolumns` (`#316 `_).

* Added :func:`petl.transform.reductions.groupselectlast`
  (`#319 `_).

* Added example in docstring for :class:`petl.io.sources.MemorySource`
  (`#323 `_).

* Added function :func:`petl.transform.basics.stack` as a simpler alternative
  to :func:`petl.transform.basics.cat`. Also the behaviour of
  :func:`petl.transform.basics.cat` has changed for tables where the header
  row contains duplicate fields. This was part of addressing a bug in
  :func:`petl.transform.basics.addfield` for tables where the header contains
  duplicate fields (`#327 `_).

* Change in behaviour of :func:`petl.io.json.fromdicts` to preserve ordering
  of keys if ordered dicts are used. Also added
  :func:`petl.transform.headers.sortheader` to deal with unordered cases
  (`#332 `_).

* Added keyword `strict` to functions in the :mod:`petl.transform.setops`
  module to enable users to enforce strict set-like behaviour if desired
  (`#333 `_).

* Added `epilogue` argument to :func:`petl.util.vis.display` to enable further
  customisation of content of table display in Jupyter notebooks
  (`#337 `_).

* Added :func:`petl.transform.selects.biselect` as a convenience for obtaining
  two tables, one with rows matching a condition, the other with rows not
  matching the condition (`#339 `_).

* Changed :func:`petl.io.json.fromdicts` to avoid making two passes through
  the data (`#341 `_).

* Changed :func:`petl.transform.basics.addfieldusingcontext` to enable running
  calculations (`#343 `_).

* Fix behaviour of join functions when tables have no non-key fields
  (`#345 `_).

* Fix incorrect default value for the 'errors' argument when using the codec
  module (`#347 `_).
* Added some documentation on how to write extension classes, see
  :doc:`intro` (`#349 `_).

* Fix issue with unicode field names (`#350 `_).

Version 1.0
-----------

Version 1.0 is a new major release of :mod:`petl`. The main purpose of version
1.0 is to introduce support for Python 3.4, in addition to the existing
support for Python 2.6 and 2.7.

Much of the functionality available in :mod:`petl` versions 0.x has remained
unchanged in version 1.0, and most existing code that uses :mod:`petl` should
work unchanged with version 1.0 or with minor changes. However there have been
a number of API changes, and some functionality has been migrated from the
`petlx`_ package, described below.

If you have any questions about migrating to version 1.0 or find any problems
or issues please email python-etl@googlegroups.com.

Text file encoding
~~~~~~~~~~~~~~~~~~

Version 1.0 unifies the API for working with ASCII and non-ASCII encoded text
files, including CSV and HTML.

The following functions now accept an 'encoding' argument, which defaults to
the value of ``locale.getpreferredencoding()`` (usually 'utf-8'): `fromcsv`,
`tocsv`, `appendcsv`, `teecsv`, `fromtsv`, `totsv`, `appendtsv`, `teetsv`,
`fromtext`, `totext`, `appendtext`, `tohtml`, `teehtml`.

The following functions have been removed as they are now redundant:
`fromucsv`, `toucsv`, `appenducsv`, `teeucsv`, `fromutsv`, `toutsv`,
`appendutsv`, `teeutsv`, `fromutext`, `toutext`, `appendutext`, `touhtml`,
`teeuhtml`.

To migrate code, in most cases it should be possible to simply replace
'fromucsv' with 'fromcsv', etc.

`petl.fluent` and `petl.interactive`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The functionality previously available through the `petl.fluent` and
`petl.interactive` modules is now available through the root petl module.

This means two things.

First, it is now possible to use either functional or fluent (i.e.,
object-oriented) styles of programming with the root :mod:`petl` module, as
described in the introductory section on :ref:`intro_programming_styles`.

Second, the default representation of table objects uses the
:func:`petl.util.vis.look` function, so you can simply return a table from the
prompt to inspect it, as described in the introductory section on
:ref:`intro_interactive_use`.

The `petl.fluent` and `petl.interactive` modules have been removed as they are
now redundant. To migrate code, it should be possible to simply replace
"import petl.fluent as etl" or "import petl.interactive as etl" with "import
petl as etl".

Note that the automatic caching behaviour of the `petl.interactive` module has
**not** been retained. If you want to enable caching behaviour for a
particular table, make an explicit call to the
:func:`petl.util.materialise.cache` function. See also :ref:`intro_caching`.

IPython notebook integration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In version 1.0 :mod:`petl` table container objects implement `_repr_html_()`
so can be returned from a cell in an IPython notebook and will automatically
format as an HTML table.

Also, the :func:`petl.util.vis.display` and :func:`petl.util.vis.displayall`
functions have been migrated across from the `petlx.ipython` package. If you
are working within the IPython notebook these functions give greater control
over how tables are rendered, as in the sketch below.
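A minimal sketch of using :func:`petl.util.vis.display` in a notebook cell
(the table here is illustrative)::

    import petl as etl

    table = etl.dummytable(100)   # 100 rows of generated example data
    etl.display(table, 5)         # render only the first 5 rows as HTML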
For some examples, see:
http://nbviewer.ipython.org/github/petl-developers/petl/blob/v1.0/repr_html.ipynb

Database extract/load functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The :func:`petl.io.db.todb` function now supports automatic table creation,
inferring a schema from data in the table to be loaded. This functionality has
been migrated across from the `petlx`_ package, and requires
`SQLAlchemy `_ to be installed.

The functions `fromsqlite3`, `tosqlite3` and `appendsqlite3` have been removed
as they duplicate functionality available from the existing functions
:func:`petl.io.db.fromdb`, :func:`petl.io.db.todb` and
:func:`petl.io.db.appenddb`. These existing functions have been modified so
that if a string is provided as the `dbo` argument it is interpreted as the
name of an :mod:`sqlite3` file. It should be possible to migrate code by
simply replacing 'fromsqlite3' with 'fromdb', etc.

Other functions removed or renamed
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The following functions have been removed because they are overly complicated
and/or hardly ever used. If you use any of these functions and would like to
see them re-instated then please email python-etl@googlegroups.com:
`rangefacet`, `rangerowreduce`, `rangeaggregate`, `rangecounts`,
`multirangeaggregate`, `lenstats`.

The following functions were marked as deprecated in petl 0.x and have been
removed in version 1.0: `dataslice` (use `data` instead), `fieldconvert` (use
`convert` instead), `fieldselect` (use `select` instead), `parsenumber` (use
`numparser` instead), `recordmap` (use `rowmap` instead), `recordmapmany` (use
`rowmapmany` instead), `recordreduce` (use `rowreduce` instead),
`recordselect` (use `rowselect` instead), `valueset` (use
``table.values('foo').set()`` instead).

The following functions are no longer available in the root :mod:`petl`
namespace, but are still available from a subpackage if you really need them:
`iterdata` (use `data` instead), `iterdicts` (use `dicts` instead),
`iternamedtuples` (use `namedtuples` instead), `iterrecords` (use `records`
instead), `itervalues` (use `values` instead).

The following functions have been renamed: `isordered` (renamed to
`issorted`), `StringSource` (renamed to `MemorySource`).

The function `selectre` has been removed as it duplicates functionality, use
`search` instead.

Sorting and comparison
~~~~~~~~~~~~~~~~~~~~~~

A major difference between Python 2 and Python 3 involves comparison and
sorting of objects of different types. Python 3 is a lot stricter about what
you can compare with what, e.g., ``None < 1 < 'foo'`` works in Python 2.x but
raises an exception in Python 3. The strict comparison behaviour of Python 3
is generally a problem for typical usages of :mod:`petl`, where data can be
highly heterogeneous and a column in a table may have a mixture of values of
many different types, including `None` for missing.

To maintain the usability of :mod:`petl` in this type of scenario, and to
ensure that the behaviour of :mod:`petl` is as consistent as possible across
different Python versions, the :func:`petl.transform.sorts.sort` function and
anything that depends on it (as well as any other functions making use of rich
comparisons) emulate the relaxed comparison behaviour that is available under
Python 2.x. In fact :mod:`petl` goes further than this, allowing comparison of
a wider range of types than is possible under Python 2.x (e.g., ``datetime``
with ``None``).
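A minimal sketch of this relaxed sorting under Python 3 (the table is
illustrative; the exact ordering of mixed types is determined by petl's
internal comparison logic, with `None` sorting lowest)::

    import petl as etl

    table = [['foo', 'bar'],
             ['a', 1],
             ['b', None],
             ['c', 'x']]

    # would raise TypeError with the builtin sorted() under Python 3,
    # but petl sorts it without error
    print(etl.look(etl.sort(table, 'bar')))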
Other functions removed or renamed
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The following functions have been removed because they are overly
complicated and/or hardly ever used. If you use any of these functions and
would like to see them re-instated then please email
python-etl@googlegroups.com: `rangefacet`, `rangerowreduce`,
`rangeaggregate`, `rangecounts`, `multirangeaggregate`, `lenstats`.

The following functions were marked as deprecated in petl 0.x and have been
removed in version 1.0: `dataslice` (use `data` instead), `fieldconvert`
(use `convert` instead), `fieldselect` (use `select` instead), `parsenumber`
(use `numparser` instead), `recordmap` (use `rowmap` instead),
`recordmapmany` (use `rowmapmany` instead), `recordreduce` (use `rowreduce`
instead), `recordselect` (use `rowselect` instead), `valueset` (use
``table.values('foo').set()`` instead).

The following functions are no longer available in the root :mod:`petl`
namespace, but are still available from a subpackage if you really need
them: `iterdata` (use `data` instead), `iterdicts` (use `dicts` instead),
`iternamedtuples` (use `namedtuples` instead), `iterrecords` (use `records`
instead), `itervalues` (use `values` instead).

The following functions have been renamed: `isordered` (renamed to
`issorted`), `StringSource` (renamed to `MemorySource`).

The function `selectre` has been removed as it duplicates functionality,
use `search` instead.

Sorting and comparison
~~~~~~~~~~~~~~~~~~~~~~

A major difference between Python 2 and Python 3 involves comparison and
sorting of objects of different types. Python 3 is a lot stricter about what
you can compare with what, e.g., ``None < 1 < 'foo'`` works in Python 2.x
but raises an exception in Python 3. The strict comparison behaviour of
Python 3 is generally a problem for typical usages of :mod:`petl`, where
data can be highly heterogeneous and a column in a table may have a mixture
of values of many different types, including `None` for missing.

To maintain the usability of :mod:`petl` in this type of scenario, and to
ensure that the behaviour of :mod:`petl` is as consistent as possible across
different Python versions, the :func:`petl.transform.sorts.sort` function
and anything that depends on it (as well as any other functions making use
of rich comparisons) emulate the relaxed comparison behaviour that is
available under Python 2.x. In fact :mod:`petl` goes further than this,
allowing comparison of a wider range of types than is possible under Python
2.x (e.g., ``datetime`` with ``None``).

As the underlying code to achieve this has been completely reworked, there
may be inconsistencies or unexpected behaviour, so it's worth testing
carefully the results of any code previously run using :mod:`petl` 0.x,
especially if you are also migrating from Python 2 to Python 3.

The different comparison behaviour under different Python versions may also
give unexpected results when selecting rows of a table. E.g., the following
will work under Python 2.x but raise an exception under Python 3.4::

    >>> import petl as etl
    >>> table = [['foo', 'bar'],
    ...          ['a', 1],
    ...          ['b', None]]
    >>> # raises exception under Python 3
    ... etl.select(table, 'bar', lambda v: v > 0)

To get the more relaxed behaviour under Python 3.4, use the
:func:`petl.transform.selects.selectgt` function, or wrap values with
:class:`petl.comparison.Comparable`, e.g.::

    >>> # works under Python 3
    ... etl.selectgt(table, 'bar', 0)
    +-----+-----+
    | foo | bar |
    +=====+=====+
    | 'a' | 1   |
    +-----+-----+

    >>> # or ...
    ... etl.select(table, 'bar', lambda v: v > etl.Comparable(0))
    +-----+-----+
    | foo | bar |
    +=====+=====+
    | 'a' | 1   |
    +-----+-----+

New extract/load modules
~~~~~~~~~~~~~~~~~~~~~~~~

Several new extract/load modules have been added, migrating functionality
previously available from the `petlx`_ package:

* :ref:`io_xls`
* :ref:`io_xlsx`
* :ref:`io_numpy`
* :ref:`io_pandas`
* :ref:`io_pytables`
* :ref:`io_whoosh`

These modules all have dependencies on third party packages, but these have
been kept as optional dependencies so are not required for installing
:mod:`petl`.

New validate function
~~~~~~~~~~~~~~~~~~~~~

A new :func:`petl.transform.validation.validate` function has been added to
provide a convenient interface when validating a table against a set of
constraints.

New intervals module
~~~~~~~~~~~~~~~~~~~~

A new module has been added providing transformation functions based on
intervals, migrating functionality previously available from the `petlx`_
package:

* :ref:`transform_intervals`

This module requires the `intervaltree `_ module.

New configuration module
~~~~~~~~~~~~~~~~~~~~~~~~

All configuration variables have been brought together into a new
:mod:`petl.config` module. See the source code for the variables available,
they should be self-explanatory.

:mod:`petl.push` moved to :mod:`petlx`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The :mod:`petl.push` module remains in an experimental state and has been
moved to the `petlx`_ extensions project.

Argument names and other minor changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Argument names for a small number of functions have been changed to create
consistency across the API. There are some other minor changes as well. If
you are migrating from :mod:`petl` version 0.x the best thing is to run your
code and inspect any errors. Email python-etl@googlegroups.com if you have
any questions.

Source code reorganisation
~~~~~~~~~~~~~~~~~~~~~~~~~~

The source code has been substantially reorganised. This should not affect
users of the :mod:`petl` package, however, as all functions in the public
API are available through the root :mod:`petl` namespace.

.. _petlx: http://petlx.readthedocs.org
petl-1.7.15/docs/conf.py000066400000000000000000000164611457414240700150170ustar00rootroot00000000000000# -*- coding: utf-8 -*-
#
# petl documentation build configuration file, created by
# sphinx-quickstart on Fri Aug 19 11:16:43 2011.
#
# This file is execfile()d with the current directory set to its containing dir.
# # Note that not all possible configuration values are present in this # autogenerated file. # # All configuration values have a default; values that are commented out # serve to show the default. import sys, os # If extensions (or modules to document with autodoc) are in another directory, # add these directories to sys.path here. If the directory is relative to the # documentation root, use os.path.abspath to make it absolute, like shown here. sys.path.insert(0, os.path.abspath('..')) import petl # -- General configuration ----------------------------------------------------- # If your documentation needs a minimal Sphinx version, state it here. #needs_sphinx = '1.0' # Add any Sphinx extension module names here, as strings. They can be extensions # coming with Sphinx (named 'sphinx.ext.*') or your custom ones. extensions = ['sphinx.ext.autodoc', 'sphinx.ext.doctest', 'sphinx.ext.intersphinx', 'sphinx.ext.todo', 'sphinx.ext.coverage', 'sphinx.ext.imgmath', 'sphinx.ext.ifconfig', 'sphinx.ext.viewcode', 'sphinx_issues'] # Add any paths that contain templates here, relative to this directory. templates_path = ['_templates'] # The suffix of source filenames. source_suffix = '.rst' issues_github_path = 'petl-developers/petl' # The encoding of source files. #source_encoding = 'utf-8-sig' # The master toctree document. master_doc = 'index' # General information about the project. project = u'petl' copyright = u'2014, Alistair Miles' # The version info for the project you're documenting, acts as replacement for # |version| and |release|, also used in various other places throughout the # built documents. # # The short X.Y version. version = petl.__version__ # The full version, including alpha/beta/rc tags. release = petl.__version__ # The language for content autogenerated by Sphinx. Refer to documentation # for a list of supported languages. #language = None # There are two options for replacing |today|: either, you set today to some # non-false value, then it is used: #today = '' # Else, today_fmt is used as the format for a strftime call. #today_fmt = '%B %d, %Y' # List of patterns, relative to source directory, that match files and # directories to ignore when looking for source files. exclude_patterns = ['_build', 'examples', 'notes', 'bin', 'dist'] # The reST default role (used for this markup: `text`) to use for all documents. #default_role = None # If true, '()' will be appended to :func: etc. cross-reference text. #add_function_parentheses = True # If true, the current module name will be prepended to all description # unit titles (such as .. function::). #add_module_names = True # If true, sectionauthor and moduleauthor directives will be shown in the # output. They are ignored by default. #show_authors = False # The name of the Pygments (syntax highlighting) style to use. pygments_style = 'sphinx' # A list of ignored prefixes for module index sorting. #modindex_common_prefix = [] # -- Options for HTML output --------------------------------------------------- # The theme to use for HTML and HTML Help pages. See the documentation for # a list of builtin themes. html_theme = 'default' # Theme options are theme-specific and customize the look and feel of a theme # further. For a list of options available for each theme, see the # documentation. #html_theme_options = {} # Add any paths that contain custom themes here, relative to this directory. #html_theme_path = [] # The name for this set of Sphinx documents. If None, it defaults to # " v documentation". 
#html_title = None # A shorter title for the navigation bar. Default is the same as html_title. #html_short_title = None # The name of an image file (relative to this directory) to place at the top # of the sidebar. #html_logo = None # The name of an image file (within the static path) to use as favicon of the # docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32 # pixels large. #html_favicon = None # Add any paths that contain custom static files (such as style sheets) here, # relative to this directory. They are copied after the builtin static files, # so a file named "default.css" will overwrite the builtin "default.css". #html_static_path = ['_static'] # If not '', a 'Last updated on:' timestamp is inserted at every page bottom, # using the given strftime format. #html_last_updated_fmt = '%b %d, %Y' # If true, SmartyPants will be used to convert quotes and dashes to # typographically correct entities. #html_use_smartypants = True # Custom sidebar templates, maps document names to template names. #html_sidebars = {} # Additional templates that should be rendered to pages, maps page names to # template names. #html_additional_pages = {} # If false, no module index is generated. #html_domain_indices = True # If false, no index is generated. #html_use_index = True # If true, the index is split into individual pages for each letter. #html_split_index = False # If true, links to the reST sources are added to the pages. #html_show_sourcelink = True # If true, "Created using Sphinx" is shown in the HTML footer. Default is True. #html_show_sphinx = True # If true, "(C) Copyright ..." is shown in the HTML footer. Default is True. #html_show_copyright = True # If true, an OpenSearch description file will be output, and all pages will # contain a tag referring to it. The value of this option must be the # base URL from which the finished HTML is served. #html_use_opensearch = '' # This is the file name suffix for HTML files (e.g. ".xhtml"). #html_file_suffix = None # Output file base name for HTML help builder. htmlhelp_basename = 'petldoc' # -- Options for LaTeX output -------------------------------------------------- # The paper size ('letter' or 'a4'). #latex_paper_size = 'letter' # The font size ('10pt', '11pt' or '12pt'). #latex_font_size = '10pt' # Grouping the document tree into LaTeX files. List of tuples # (source start file, target name, title, author, documentclass [howto/manual]). latex_documents = [ ('index', 'petl.tex', u'petl Documentation', u'Alistair Miles', 'manual'), ] # The name of an image file (relative to this directory) to place at the top of # the title page. #latex_logo = None # For "manual" documents, if this is true, then toplevel headings are parts, # not chapters. #latex_use_parts = False # If true, show page references after internal links. #latex_show_pagerefs = False # If true, show URL addresses after external links. #latex_show_urls = False # Additional stuff for the LaTeX preamble. #latex_preamble = '' # Documents to append as an appendix to all manuals. #latex_appendices = [] # If false, no module index is generated. #latex_domain_indices = True # -- Options for manual page output -------------------------------------------- # One entry per manual page. List of tuples # (source start file, name, description, authors, manual section). man_pages = [ ('index', 'petl', u'petl Documentation', [u'Alistair Miles'], 1) ] # Example configuration for intersphinx: refer to the Python standard library. 
# disable temporarily
# intersphinx_mapping = {'http://docs.python.org/': None}
petl-1.7.15/docs/config.rst000066400000000000000000000001321457414240700155060ustar00rootroot00000000000000Configuration
===============================

.. automodule:: petl.config
    :members:
petl-1.7.15/docs/contributing.rst000066400000000000000000000110321457414240700167510ustar00rootroot00000000000000Contributing
============

Contributions to :mod:`petl` are welcome in any form; please feel free to
email the python-etl@googlegroups.com mailing list if you have some code or
ideas you'd like to discuss.

Please note that the :mod:`petl` package is intended as a stable, general
purpose library for ETL work. If you would like to extend :mod:`petl` with
functionality that is domain-specific, or if you have an experimental or
tentative feature that is not yet ready for inclusion in the core
:mod:`petl` package but you would like to distribute it, please contribute
to the `petlx `_ project instead, or distribute your code as a separate
package.

If you are thinking of developing or modifying the :mod:`petl` code base in
any way, here is some information on how to set up your development
environment to run tests etc.

Running the test suite
----------------------

The main :mod:`petl` test suite can be run with `pytest `_. E.g., assuming
you have the source code repository cloned to the current working directory,
you can run the test suite with::

    $ pip install -r requirements-tests.txt
    $ pytest -v petl

Currently :mod:`petl` supports Python 2.7 and 3.6 up to 3.11, so the tests
should pass under all these Python versions.

Dependencies
------------

To keep installation as simple as possible on different platforms,
:mod:`petl` has no installation dependencies. Most functionality also
depends only on the Python core libraries.

Some :mod:`petl` functions depend on third party packages, however these
should be kept as optional requirements. Any tests for modules requiring
third party packages should be written so that they are skipped if the
packages are not available. See the existing tests for examples of how to do
this.

Running database tests
----------------------

There are some additional tests within the test suite that require database
servers to be set up correctly on the local host. To run these additional
tests, make sure you have both MySQL and PostgreSQL servers running locally,
and have created a user "petl" with password "test" and all permissions
granted on a database called "petl". Install dependencies::

    $ pip install pymysql psycopg2 sqlalchemy

If these dependencies are not installed, or if a local database server is
not found, these tests are skipped.

Running doctests
----------------

Doctests in docstrings should (almost) all be runnable, and should pass if
run with Python 3.6. Doctests can be run with `pytest `_. See the tox.ini
file for example doctest commands.

Building the documentation
--------------------------

Documentation is built with `sphinx `_. To build::

    $ pip install -r requirements-docs.txt
    $ cd docs
    $ make html

Built docs can then be found in the ``docs/_build/html/`` directory.

Automatically running all tests
-------------------------------

All of the above tests can be run automatically using `tox `_. You will need
binaries for Python 2.7 and 3.6 available on your system.
To run all tests **without** installing any of the optional dependencies,
do::

    $ tox -e py27,py36,docs

To run the entire test suite, including installation of **all** optional
dependencies, do::

    $ tox

The first time you run this it will take some time, while all the optional
dependencies are installed in each environment.

Contributing code via GitHub
----------------------------

The best way to contribute code is via a GitHub pull request. Please include
unit tests with any code contributed.

If you are able, please run tox and ensure that all the above tests pass
before making a pull request. Thanks!

Guidelines for core developers
------------------------------

Before merging a pull request that includes new or modified code, all items
in the `PR checklist `_ should be complete.

Pull requests containing new and/or modified code that is anything other
than a trivial bug fix should be approved by at least one core developer
before being merged. If a core developer is making a PR themselves, it is OK
to merge their own PR if they first allow some reasonable time (e.g., at
least one working day) for other core devs to raise any objections, e.g., by
posting a comment like "merging soon if no objections" on the PR. If the PR
contains substantial new features or modifications, the PR author might want
to allow a little more time to ensure other core devs have an opportunity to
see it.
petl-1.7.15/docs/index.rst000066400000000000000000000034401457414240700153530ustar00rootroot00000000000000.. module:: petl

petl - Extract, Transform and Load
===================================

:mod:`petl` is a general purpose Python package for extracting, transforming
and loading tables of data.

.. image:: petl-architecture.png
    :width: 750
    :align: center
    :alt: petl use cases diagram

Resources
---------

- Documentation: http://petl.readthedocs.org/
- Mailing List: http://groups.google.com/group/python-etl
- Source Code: https://github.com/petl-developers/petl
- Download:

  - PyPI: http://pypi.python.org/pypi/petl
  - Conda Forge: https://anaconda.org/conda-forge/petl

.. note::

    - Version 2.0 will be a major milestone for :mod:`petl`.
    - This version will introduce some changes that could affect current
      behaviour.
    - We will try to keep compatibility to the maximum possible, except
      when the current behaviour is inconsistent or has shortcomings.
    - The biggest change is the end of support of Python `2.7`.
    - The minimum supported version will be Python `3.6`.

Getting Help
------------

Please feel free to ask questions via the mailing list
(python-etl@googlegroups.com).

To report installation problems, bugs or any other issues please email
python-etl@googlegroups.com or `raise an issue on GitHub `_.

For an example of :mod:`petl` in use, see the `case study on comparing
tables `_.

Contents
--------

For an alphabetic list of all functions in the package, see the
:ref:`genindex`.

.. toctree::
    :maxdepth: 2

    install
    intro
    io
    transform
    util
    config
    changes
    contributing
    acknowledgments
    related_work

Indices and tables
------------------

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
petl-1.7.15/docs/install.rst000066400000000000000000000046571457414240700157210ustar00rootroot00000000000000Installation
============

.. _intro_installation:

Getting Started
---------------

This package is available from the `Python Package Index `_. If you have
`pip `_ you should be able to do::

    $ pip install petl

You can also download manually, extract and run ``python setup.py install``.
To verify the installation, the test suite can be run with `pytest `_,
e.g.::

    $ pip install pytest
    $ pytest -v petl

:mod:`petl` has been tested with Python versions 2.7 and 3.4-3.6 under Linux
and Windows operating systems.

.. _intro_dependencies:

Dependencies and extensions
---------------------------

This package is written in pure Python and has no installation requirements
other than the Python core modules.

Some domain-specific and/or experimental extensions to :mod:`petl` are
available from the petlx_ package.

.. _petlx: http://petlx.readthedocs.org

Some of the functions in this package require installation of third party
packages. These packages are indicated in the relevant parts of the
documentation for each file format.

It is also possible to install some of the dependencies when installing
`petl` by specifying optional extra features, e.g.::

    $ pip install petl['avro', 'interval', 'remote']

The available extra features are:

db
    For using records from :ref:`Databases ` with `SQLAlchemy`. Note that it
    is also required to install the package for the desired database.

interval
    For using :ref:`Interval transformations ` with `intervaltree`

avro
    For using :ref:`Avro files ` with `fastavro`

pandas
    For using :ref:`DataFrames ` with `pandas`

numpy
    For using :ref:`Arrays ` with `numpy`

xls
    For using :ref:`Excel/LO files ` with `xlrd`/`xlwt`

xlsx
    For using :ref:`Excel/LO files ` with `openpyxl`

xpath
    For using :ref:`XPath expressions ` with `lxml`

bcolz
    For using :ref:`Bcolz ctables ` with `bcolz`

whoosh
    For using :ref:`Text indexes ` with `whoosh`

hdf5
    For using :ref:`HDF5 files ` with `PyTables`. Note that additional
    software also needs to be installed.

remote
    For reading and writing from :ref:`Remote Sources ` with `fsspec`. Note
    that `fsspec` also depends on other packages for providing support for
    each protocol as described in :class:`petl.io.remotes.RemoteSource`.
petl-1.7.15/docs/intro.rst000066400000000000000000000311721457414240700154020ustar00rootroot00000000000000Introduction
============

.. _intro_design_goals:

Design goals
------------

This package is designed primarily for convenience and ease of use,
especially when working interactively with data that are unfamiliar,
heterogeneous and/or of mixed quality.

:mod:`petl` transformation pipelines make minimal use of system memory and
can scale to millions of rows if speed is not a priority. However, if you
are working with very large datasets and/or performance-critical
applications then other packages may be more suitable, e.g., see
`pandas `_, `pytables `_, `bcolz `_ and `blaze `_. See also
:doc:`related_work`.

.. _intro_pipelines:

ETL pipelines
-------------

This package makes extensive use of lazy evaluation and iterators. This
means, generally, that a pipeline will not actually be executed until data
is requested.

E.g., given a file at 'example.csv' in the current working directory::

    >>> example_data = """foo,bar,baz
    ... a,1,3.4
    ... b,2,7.4
    ... c,6,2.2
    ... d,9,8.1
    ... """
    >>> with open('example.csv', 'w') as f:
    ...     f.write(example_data)
    ...

...the following code **does not** actually read the file or load any of its
contents into memory::

    >>> import petl as etl
    >>> table1 = etl.fromcsv('example.csv')

Rather, `table1` is a **table container** (see :ref:`intro_conventions`
below) which can be iterated over, extracting data from the underlying file
on demand.
Similarly, if one or more transformation functions are applied, e.g.:: >>> table2 = etl.convert(table1, 'foo', 'upper') >>> table3 = etl.convert(table2, 'bar', int) >>> table4 = etl.convert(table3, 'baz', float) >>> table5 = etl.addfield(table4, 'quux', lambda row: row.bar * row.baz) ...no actual transformation work will be done until data are requested from `table5` (or any of the other tables returned by the intermediate steps). So in effect, a 5 step pipeline has been set up, and rows will pass through the pipeline on demand, as they are pulled from the end of the pipeline via iteration. A call to a function like :func:`petl.util.vis.look`, or any of the functions which write data to a file or database (e.g., :func:`petl.io.csv.tocsv`, :func:`petl.io.text.totext`, :func:`petl.io.sqlite3.tosqlite3`, :func:`petl.io.db.todb`), will pull data through the pipeline and cause all of the transformation steps to be executed on the requested rows, e.g.:: >>> etl.look(table5) +-----+-----+-----+--------------------+ | foo | bar | baz | quux | +=====+=====+=====+====================+ | 'A' | 1 | 3.4 | 3.4 | +-----+-----+-----+--------------------+ | 'B' | 2 | 7.4 | 14.8 | +-----+-----+-----+--------------------+ | 'C' | 6 | 2.2 | 13.200000000000001 | +-----+-----+-----+--------------------+ | 'D' | 9 | 8.1 | 72.89999999999999 | +-----+-----+-----+--------------------+ ...although note that :func:`petl.util.vis.look` will by default only request the first 5 rows, and so the minimum amount of processing will be done to produce 5 rows. .. _intro_programming_styles: Functional and object-oriented programming styles ------------------------------------------------- The :mod:`petl` package supports both functional and object-oriented programming styles. For example, the example in the section on :ref:`intro_pipelines` above could also be written as:: >>> import petl as etl >>> table = ( ... etl ... .fromcsv('example.csv') ... .convert('foo', 'upper') ... .convert('bar', int) ... .convert('baz', float) ... .addfield('quux', lambda row: row.bar * row.baz) ... ) >>> table.look() +-----+-----+-----+--------------------+ | foo | bar | baz | quux | +=====+=====+=====+====================+ | 'A' | 1 | 3.4 | 3.4 | +-----+-----+-----+--------------------+ | 'B' | 2 | 7.4 | 14.8 | +-----+-----+-----+--------------------+ | 'C' | 6 | 2.2 | 13.200000000000001 | +-----+-----+-----+--------------------+ | 'D' | 9 | 8.1 | 72.89999999999999 | +-----+-----+-----+--------------------+ A ``wrap()`` function is also provided to use the object-oriented style with any valid table container object, e.g.:: >>> l = [['foo', 'bar'], ['a', 1], ['b', 2], ['c', 2]] >>> table = etl.wrap(l) >>> table.look() +-----+-----+ | foo | bar | +=====+=====+ | 'a' | 1 | +-----+-----+ | 'b' | 2 | +-----+-----+ | 'c' | 2 | +-----+-----+ .. _intro_interactive_use: Interactive use --------------- When using :mod:`petl` from within an interactive Python session, the default representation for table objects uses the :func:`petl.util.vis.look()` function, so a table object can be returned at the prompt to inspect it, e.g.:: >>> l = [['foo', 'bar'], ['a', 1], ['b', 2], ['c', 2]] >>> table = etl.wrap(l) >>> table +-----+-----+ | foo | bar | +=====+=====+ | 'a' | 1 | +-----+-----+ | 'b' | 2 | +-----+-----+ | 'c' | 2 | +-----+-----+ By default data values are rendered using the built-in :func:`repr` function. 
To see the string (:func:`str`) values instead, :func:`print` the table, e.g.: >>> print(table) +-----+-----+ | foo | bar | +=====+=====+ | a | 1 | +-----+-----+ | b | 2 | +-----+-----+ | c | 2 | +-----+-----+ .. _intro_ipython_notebook: IPython notebook integration ---------------------------- Table objects also implement ``_repr_html_()`` and so will be displayed as an HTML table if returned from a cell in an IPython notebook. The functions :func:`petl.util.vis.display` and :func:`petl.util.vis.displayall` also provide more control over rendering of tables within an IPython notebook. For examples of usage see the `repr_html notebook `_. .. _intro_executable: ``petl`` executable ------------------- Also included in the ``petl`` distribution is a script to execute simple transformation pipelines directly from the operating system shell. E.g.:: $ petl "dummytable().tocsv()" > example.csv $ cat example.csv | petl "fromcsv().cut('foo', 'baz').convert('baz', float).selectgt('baz', 0.5).head().data().totsv()" The ``petl`` script is extremely simple, it expects a single positional argument, which is evaluated as Python code but with all of the functions in the :mod:`petl` namespace imported. .. _intro_conventions: Conventions - table containers and table iterators -------------------------------------------------- This package defines the following convention for objects acting as containers of tabular data and supporting row-oriented iteration over the data. A **table container** (also referred to here as a **table**) is any object which satisfies the following: 1. implements the `__iter__` method 2. `__iter__` returns a **table iterator** (see below) 3. all table iterators returned by `__iter__` are independent, i.e., consuming items from one iterator will not affect any other iterators A **table iterator** is an iterator which satisfies the following: 4. each item returned by the iterator is a sequence (e.g., tuple or list) 5. the first item returned by the iterator is a **header row** comprising a sequence of **header values** 6. each subsequent item returned by the iterator is a **data row** comprising a sequence of **data values** 7. a **header value** is typically a string (`str`) but may be an object of any type as long as it implements `__str__` and is pickleable 8. a **data value** is any pickleable object So, for example, a list of lists is a valid table container:: >>> table = [['foo', 'bar'], ['a', 1], ['b', 2]] Note that an object returned by the :func:`csv.reader` function from the standard Python :mod:`csv` module is a table iterator and **not** a table container, because it can only be iterated over once. However, it is straightforward to define functions that support the table container convention and provide access to data from CSV or other types of file or data source, see e.g. the :func:`petl.io.csv.fromcsv` function. The main reason for requiring that table containers support independent table iterators (point 3) is that data from a table may need to be iterated over several times within the same program or interactive session. E.g., when using :mod:`petl` in an interactive session to build up a sequence of data transformation steps, the user might want to examine outputs from several intermediate steps, before all of the steps are defined and the transformation is executed in full. Note that this convention does not place any restrictions on the lengths of header and data rows. A table may contain a header row and/or data rows of varying lengths. .. 
_intro_extending:

Extensions - integrating custom data sources
--------------------------------------------

The :mod:`petl.io` module has functions for extracting data from a number of
well-known data sources. However, it is also straightforward to write an
extension that enables integration with other data sources. For an object to
be usable as a :mod:`petl` table it has to implement the **table container**
convention described above.

Below is the source code for an :class:`ArrayView` class which allows
integration of :mod:`petl` with numpy arrays. This class is included within
the :mod:`petl.io.numpy` module but also provides an example of how other
data sources might be integrated::

    >>> import petl as etl
    >>> class ArrayView(etl.Table):
    ...     def __init__(self, a):
    ...         # assume that a is a numpy array
    ...         self.a = a
    ...     def __iter__(self):
    ...         # yield the header row
    ...         header = tuple(self.a.dtype.names)
    ...         yield header
    ...         # yield the data rows
    ...         for row in self.a:
    ...             yield tuple(row)
    ...

Now this class enables the use of numpy arrays with :mod:`petl` functions,
e.g.::

    >>> import numpy as np
    >>> a = np.array([('apples', 1, 2.5),
    ...               ('oranges', 3, 4.4),
    ...               ('pears', 7, 0.1)],
    ...              dtype='U8, i4,f4')
    >>> t1 = ArrayView(a)
    >>> t1
    +-----------+----+-----------+
    | f0        | f1 | f2        |
    +===========+====+===========+
    | 'apples'  | 1  | 2.5       |
    +-----------+----+-----------+
    | 'oranges' | 3  | 4.4000001 |
    +-----------+----+-----------+
    | 'pears'   | 7  | 0.1       |
    +-----------+----+-----------+

    >>> t2 = t1.cut('f0', 'f2').convert('f0', 'upper').addfield('f3', lambda row: row.f2 * 2)
    >>> t2
    +-----------+-----------+---------------------+
    | f0        | f2        | f3                  |
    +===========+===========+=====================+
    | 'APPLES'  | 2.5       | 5.0                 |
    +-----------+-----------+---------------------+
    | 'ORANGES' | 4.4000001 | 8.8000001907348633  |
    +-----------+-----------+---------------------+
    | 'PEARS'   | 0.1       | 0.20000000298023224 |
    +-----------+-----------+---------------------+

If you develop an extension for a data source that you think would also be
useful for others, please feel free to submit a PR to the `petl GitHub
repository `_, or if it is a domain-specific data source, the `petlx GitHub
repository `_.

.. _intro_caching:

Caching
-------

This package tries to make efficient use of memory by using iterators and
lazy evaluation where possible. However, some transformations cannot be done
without building data structures, either in memory or on disk.

An example is the :func:`petl.transform.sorts.sort` function, which will
either sort a table entirely in memory, or will sort the table in memory in
chunks, writing chunks to disk and performing a final merge sort on the
chunks. Which strategy is used will depend on the arguments passed into the
:func:`petl.transform.sorts.sort` function when it is called. In either
case, the sorting can take some time, and if the sorted data will be used
more than once, it is undesirable to start again from scratch each time. It
is better to cache the sorted data, if possible, so it can be re-used.

The :func:`petl.transform.sorts.sort` function, and all functions which use
it internally, provide a `cache` keyword argument which can be used to turn
on or off the caching of sorted data.

There is also an explicit :func:`petl.util.materialise.cache` function,
which can be used to cache in memory up to a configurable number of rows
from any table.
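For example, a minimal sketch of both approaches (the table here is invented
for illustration)::

    >>> import petl as etl
    >>> table = etl.wrap([['foo', 'bar'], ['c', 2], ['a', 1], ['b', 3]])
    >>> # sort, caching the sorted data for re-use (the default)
    >>> table2 = table.sort('foo', cache=True)
    >>> # explicitly cache up to 1000 rows of any table in memory
    >>> table3 = etl.cache(table2, n=1000)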
petl-1.7.15/docs/io.rst000066400000000000000000000310271457414240700146550ustar00rootroot00000000000000.. module:: petl.io

.. _io_usage:

Usage - reading/writing tables
==============================

`petl` uses simple Python functions to provide a rows-and-columns
abstraction for reading and writing data from files, databases, and other
sources. The main features `petl` was designed around are:

- A pure Python implementation based on streams, iterators, and other core
  Python types.
- An extensible approach, only requiring third party package dependencies
  when their functionality is used.
- A DataFrame/table-like paradigm similar to pandas, R, and others.
- A lightweight alternative to develop and maintain compared to heavier,
  full-featured frameworks like PySpark, PyArrow and other ETL tools.

.. _io_overview:

Brief Overview
--------------

.. _io_extract:

Extract (read)
^^^^^^^^^^^^^^

The "from..." functions extract a table from a file-like source or database.
For everything except :func:`petl.io.db.fromdb` the ``source`` argument
provides information about where to extract the underlying data from. If the
``source`` argument is ``None`` or a string it is interpreted as follows:

* ``None`` - read from stdin
* string starting with `http://`, `https://` or `ftp://` - read from URL
* string ending with `.gz` or `.bgz` - read from file via gzip decompression
* string ending with `.bz2` - read from file via bz2 decompression
* any other string - read directly from file

.. _io_extract_codec:

Some helper classes are also available for reading from other types of
file-like sources, e.g., reading data from a Zip file, a string or a
subprocess, see the section on :ref:`io_helpers` below for more information.

Be aware that loading data from stdin breaks the table container convention,
because data can usually only be read once. If you are sure that data will
only be read once in your script or interactive session then this may not be
a problem, however note that some :mod:`petl` functions do access the
underlying data source more than once and so will not work as expected with
data from stdin.

.. _io_load:

Load (write)
^^^^^^^^^^^^

The "to..." functions load data from a table into a file-like source or
database. For functions that accept a ``source`` argument, if the ``source``
argument is ``None`` or a string it is interpreted as follows:

* ``None`` - write to stdout
* string ending with `.gz` or `.bgz` - write to file via gzip compression
* string ending with `.bz2` - write to file via bz2 compression
* any other string - write directly to file

.. _io_load_codec:

Some helper classes are also available for writing to other types of
file-like sources, e.g., writing to a Zip file or string buffer, see the
section on :ref:`io_helpers` below for more information.
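For example, a minimal sketch of how the ``source`` argument is interpreted
in practice (the URL and file names below are hypothetical)::

    >>> import petl as etl
    >>> # read a CSV file directly from a URL
    >>> table1 = etl.fromcsv('http://example.com/example.csv')
    >>> # the .gz extension selects gzip compression on write...
    >>> etl.tocsv(table1, 'example.csv.gz')
    >>> # ...and transparent decompression on read
    >>> table2 = etl.fromcsv('example.csv.gz')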
.. _io_builtin_formats:

Built-in File Formats
---------------------

.. module:: petl.io.csv
.. _io_csv:

Python objects
^^^^^^^^^^^^^^

.. autofunction:: petl.io.base.fromcolumns

Delimited files
^^^^^^^^^^^^^^^

.. autofunction:: petl.io.csv.fromcsv
.. autofunction:: petl.io.csv.tocsv
.. autofunction:: petl.io.csv.appendcsv
.. autofunction:: petl.io.csv.teecsv
.. autofunction:: petl.io.csv.fromtsv
.. autofunction:: petl.io.csv.totsv
.. autofunction:: petl.io.csv.appendtsv
.. autofunction:: petl.io.csv.teetsv

.. module:: petl.io.pickle
.. _io_pickle:

Pickle files
^^^^^^^^^^^^

.. autofunction:: petl.io.pickle.frompickle
.. autofunction:: petl.io.pickle.topickle
.. autofunction:: petl.io.pickle.appendpickle
.. autofunction:: petl.io.pickle.teepickle

.. module:: petl.io.text
.. _io_text:

Text files
^^^^^^^^^^

.. autofunction:: petl.io.text.fromtext
.. autofunction:: petl.io.text.totext
.. autofunction:: petl.io.text.appendtext
.. autofunction:: petl.io.text.teetext

.. module:: petl.io.xml
.. _io_xml:

XML files
^^^^^^^^^

.. autofunction:: petl.io.xml.fromxml
.. autofunction:: petl.io.xml.toxml

.. module:: petl.io.html
.. _io_html:

HTML files
^^^^^^^^^^

.. autofunction:: petl.io.html.tohtml
.. autofunction:: petl.io.html.teehtml

.. module:: petl.io.json
.. _io_json:

JSON files
^^^^^^^^^^

.. autofunction:: petl.io.json.fromjson
.. autofunction:: petl.io.json.fromdicts
.. autofunction:: petl.io.json.tojson
.. autofunction:: petl.io.json.tojsonarrays

.. module:: petl.io.streams
.. _io_helpers:

Python I/O streams
^^^^^^^^^^^^^^^^^^

The following classes are helpers for extract (``from...()``) and load
(``to...()``) functions that use a file-like data source.

An instance of any of the following classes can be used as the ``source``
argument to data extraction functions like :func:`petl.io.csv.fromcsv` etc.,
with the exception of :class:`petl.io.sources.StdoutSource` which is
write-only.

An instance of any of the following classes can also be used as the
``source`` argument to data loading functions like :func:`petl.io.csv.tocsv`
etc., with the exception of :class:`petl.io.sources.StdinSource`,
:class:`petl.io.sources.URLSource` and :class:`petl.io.sources.PopenSource`
which are read-only.

The behaviour of each source can usually be configured by passing arguments
to the constructor, see the source code of the :mod:`petl.io.sources` module
for full details.

.. autoclass:: petl.io.sources.StdinSource
.. autoclass:: petl.io.sources.StdoutSource
.. autoclass:: petl.io.sources.MemorySource
.. autoclass:: petl.io.sources.PopenSource
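For example, a minimal sketch using :class:`petl.io.sources.MemorySource` to
read from and write to in-memory buffers (the data are invented for
illustration)::

    >>> import petl as etl
    >>> # read CSV data from an in-memory bytes buffer
    >>> source = etl.MemorySource(b'foo,bar\na,1\nb,2\n')
    >>> table = etl.fromcsv(source)
    >>> # write back out to an empty MemorySource and retrieve the bytes
    >>> sink = etl.MemorySource()
    >>> etl.tocsv(table, sink)
    >>> data = sink.getvalue()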
.. module:: petl.io.register
.. _io_register:

Custom I/O streams
^^^^^^^^^^^^^^^^^^

For creating custom helpers for :ref:`remote I/O ` or compression, use the
following functions:

.. autofunction:: petl.io.sources.register_reader
.. autofunction:: petl.io.sources.register_writer
.. autofunction:: petl.io.sources.get_reader
.. autofunction:: petl.io.sources.get_writer

See the source code of the classes in :mod:`petl.io.sources` module for more
details.

.. _io_extended_formats:

Supported File Formats
----------------------

.. module:: petl.io.xls
.. _io_xls:

Excel .xls files (xlrd/xlwt)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. note::

    The following functions require `xlrd `_ and `xlwt `_ to be installed,
    e.g.::

        $ pip install xlrd xlwt-future

.. autofunction:: petl.io.xls.fromxls
.. autofunction:: petl.io.xls.toxls

.. module:: petl.io.xlsx
.. _io_xlsx:

Excel .xlsx files (openpyxl)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. note::

    The following functions require `openpyxl `_ to be installed, e.g.::

        $ pip install openpyxl

.. autofunction:: petl.io.xlsx.fromxlsx
.. autofunction:: petl.io.xlsx.toxlsx
.. autofunction:: petl.io.xlsx.appendxlsx

.. module:: petl.io.numpy
.. _io_numpy:

Arrays (NumPy)
^^^^^^^^^^^^^^

.. note::

    The following functions require `numpy `_ to be installed, e.g.::

        $ pip install numpy

.. autofunction:: petl.io.numpy.fromarray
.. autofunction:: petl.io.numpy.toarray
.. autofunction:: petl.io.numpy.torecarray
.. autofunction:: petl.io.numpy.valuestoarray

.. module:: petl.io.pandas
.. _io_pandas:

DataFrames (pandas)
^^^^^^^^^^^^^^^^^^^

.. note::

    The following functions require `pandas `_ to be installed, e.g.::

        $ pip install pandas

.. autofunction:: petl.io.pandas.fromdataframe
.. autofunction:: petl.io.pandas.todataframe

.. module:: petl.io.pytables
.. _io_pytables:

HDF5 files (PyTables)
^^^^^^^^^^^^^^^^^^^^^

.. note::

    The following functions require `PyTables `_ to be installed, e.g.::

        $ # install HDF5
        $ apt-get install libhdf5-7 libhdf5-dev
        $ # install other prerequisites
        $ pip install cython
        $ pip install numpy
        $ pip install numexpr
        $ # install PyTables
        $ pip install tables

.. autofunction:: petl.io.pytables.fromhdf5
.. autofunction:: petl.io.pytables.fromhdf5sorted
.. autofunction:: petl.io.pytables.tohdf5
.. autofunction:: petl.io.pytables.appendhdf5

.. module:: petl.io.bcolz
.. _io_bcolz:

Bcolz ctables
^^^^^^^^^^^^^

.. note::

    The following functions require `bcolz `_ to be installed, e.g.::

        $ pip install bcolz

.. autofunction:: petl.io.bcolz.frombcolz
.. autofunction:: petl.io.bcolz.tobcolz
.. autofunction:: petl.io.bcolz.appendbcolz

.. module:: petl.io.whoosh
.. _io_whoosh:

Text indexes (Whoosh)
^^^^^^^^^^^^^^^^^^^^^

.. note::

    The following functions require `Whoosh `_ to be installed, e.g.::

        $ pip install whoosh

.. autofunction:: petl.io.whoosh.fromtextindex
.. autofunction:: petl.io.whoosh.searchtextindex
.. autofunction:: petl.io.whoosh.searchtextindexpage
.. autofunction:: petl.io.whoosh.totextindex
.. autofunction:: petl.io.whoosh.appendtextindex

.. module:: petl.io.avro
.. _io_avro:

Avro files (fastavro)
^^^^^^^^^^^^^^^^^^^^^

.. note::

    The following functions require `fastavro `_ to be installed, e.g.::

        $ pip install fastavro

.. autofunction:: petl.io.avro.fromavro
.. autofunction:: petl.io.avro.toavro
.. autofunction:: petl.io.avro.appendavro

.. literalinclude:: ../petl/test/io/test_avro_schemas.py
    :name: logical_schema
    :language: python
    :caption: Avro schema for logical types
    :start-after: begin_logical_schema
    :end-before: end_logical_schema

.. literalinclude:: ../petl/test/io/test_avro_schemas.py
    :name: nullable_schema
    :language: python
    :caption: Avro schema with nullable fields
    :start-after: begin_nullable_schema
    :end-before: end_nullable_schema

.. literalinclude:: ../petl/test/io/test_avro_schemas.py
    :name: array_schema
    :language: python
    :caption: Avro schema with array values in fields
    :start-after: begin_array_schema
    :end-before: end_array_schema

.. literalinclude:: ../petl/test/io/test_avro_schemas.py
    :name: complex_schema
    :language: python
    :caption: Example of recursive complex Avro schema
    :start-after: begin_complex_schema
    :end-before: end_complex_schema

.. module:: petl.io.gsheet
.. _io_gsheet:

Google Sheets (gspread)
^^^^^^^^^^^^^^^^^^^^^^^

.. warning::

    This is an experimental feature. API and behavior may change between
    releases with some possible breaking changes.

.. note::

    The following functions require `gspread `_ to be installed, e.g.::

        $ pip install gspread

.. autofunction:: petl.io.gsheet.fromgsheet
.. autofunction:: petl.io.gsheet.togsheet
.. autofunction:: petl.io.gsheet.appendgsheet

.. module:: petl.io.db
.. _io_db:

Databases
---------

.. note::

    For reading and writing to databases, the following functions require
    `SQLAlchemy ` and the database-specific driver to be installed along
    with petl (:mod:`sqlite3` ships with the Python standard library),
    e.g.::

        $ pip install sqlalchemy
        $ pip install pymysql

.. autofunction:: petl.io.db.fromdb
.. autofunction:: petl.io.db.todb
.. autofunction:: petl.io.db.appenddb

.. module:: petl.io.remote
.. _io_remotes:

Remote and Cloud Filesystems
----------------------------

The following classes are helpers that let the reading (``from...()``) and
writing (``to...()``) functions work transparently with remote file-like
sources. There is no need to instantiate them; they are used by the
mechanism described in :ref:`Extract ` and :ref:`Load `. It's possible to
read and write just by prefixing the protocol (e.g., `s3://`) in the source
path of the file.

.. note::

    For reading and writing to remote filesystems, the following functions
    require `fsspec ` to be installed along with the petl package, e.g.::

        $ pip install fsspec

The supported filesystems with their URI formats can be found in the fsspec
documentation:

- `Built-in Implementations `__
- `Other Known Implementations `__

Remote sources
^^^^^^^^^^^^^^

.. autoclass:: petl.io.remotes.RemoteSource
.. autoclass:: petl.io.remotes.SMBSource
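For example, a minimal sketch of reading from and writing to S3 (the bucket
and paths are hypothetical, and the corresponding fsspec protocol
implementation, e.g. `s3fs`, must also be installed)::

    >>> import petl as etl
    >>> # fsspec resolves the filesystem from the URL prefix
    >>> table = etl.fromcsv('s3://mybucket/input/example.csv')
    >>> etl.tocsv(table, 's3://mybucket/output/example.csv')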
.. _io_deprecated:

Deprecated I/O sources
^^^^^^^^^^^^^^^^^^^^^^

The following helpers are deprecated and will be removed in a future
version. Their functionality has been replaced by the helpers in
:ref:`Remote helpers `.

.. autoclass:: petl.io.sources.FileSource
.. autoclass:: petl.io.sources.GzipSource
.. autoclass:: petl.io.sources.BZ2Source
.. autoclass:: petl.io.sources.ZipSource
.. autoclass:: petl.io.sources.URLSource
petl-1.7.15/docs/make.bat000066400000000000000000000100061457414240700151130ustar00rootroot00000000000000@ECHO OFF

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
    set SPHINXBUILD=sphinx-build
)
set BUILDDIR=_build
set ALLSPHINXOPTS=-d %BUILDDIR%/doctrees %SPHINXOPTS% .
if NOT "%PAPER%" == "" (
    set ALLSPHINXOPTS=-D latex_paper_size=%PAPER% %ALLSPHINXOPTS%
)

if "%1" == "" goto help

if "%1" == "help" (
    :help
    echo.Please use `make ^<target^>` where ^<target^> is one of
    echo.  html       to make standalone HTML files
    echo.  dirhtml    to make HTML files named index.html in directories
    echo.  singlehtml to make a single large HTML file
    echo.  pickle     to make pickle files
    echo.  json       to make JSON files
    echo.  htmlhelp   to make HTML files and a HTML help project
    echo.  qthelp     to make HTML files and a qthelp project
    echo.  devhelp    to make HTML files and a Devhelp project
    echo.  epub       to make an epub
    echo.  latex      to make LaTeX files, you can set PAPER=a4 or PAPER=letter
    echo.  text       to make text files
    echo.  man        to make manual pages
    echo.  changes    to make an overview over all changed/added/deprecated items
    echo.  linkcheck  to check all external links for integrity
    echo.  doctest    to run all doctests embedded in the documentation if enabled
    goto end
)

if "%1" == "clean" (
    for /d %%i in (%BUILDDIR%\*) do rmdir /q /s %%i
    del /q /s %BUILDDIR%\*
    goto end
)

if "%1" == "html" (
    %SPHINXBUILD% -b html %ALLSPHINXOPTS% %BUILDDIR%/html
    echo.
    echo.Build finished. The HTML pages are in %BUILDDIR%/html.
    goto end
)

if "%1" == "dirhtml" (
    %SPHINXBUILD% -b dirhtml %ALLSPHINXOPTS% %BUILDDIR%/dirhtml
    echo.
    echo.Build finished. The HTML pages are in %BUILDDIR%/dirhtml.
    goto end
)

if "%1" == "singlehtml" (
    %SPHINXBUILD% -b singlehtml %ALLSPHINXOPTS% %BUILDDIR%/singlehtml
    echo.
    echo.Build finished. The HTML pages are in %BUILDDIR%/singlehtml.
    goto end
)

if "%1" == "pickle" (
    %SPHINXBUILD% -b pickle %ALLSPHINXOPTS% %BUILDDIR%/pickle
    echo.
    echo.Build finished; now you can process the pickle files.
    goto end
)

if "%1" == "json" (
    %SPHINXBUILD% -b json %ALLSPHINXOPTS% %BUILDDIR%/json
    echo.
    echo.Build finished; now you can process the JSON files.
    goto end
)

if "%1" == "htmlhelp" (
    %SPHINXBUILD% -b htmlhelp %ALLSPHINXOPTS% %BUILDDIR%/htmlhelp
    echo.
    echo.Build finished; now you can run HTML Help Workshop with the ^
.hhp project file in %BUILDDIR%/htmlhelp.
    goto end
)

if "%1" == "qthelp" (
    %SPHINXBUILD% -b qthelp %ALLSPHINXOPTS% %BUILDDIR%/qthelp
    echo.
    echo.Build finished; now you can run "qcollectiongenerator" with the ^
.qhcp project file in %BUILDDIR%/qthelp, like this:
    echo.^> qcollectiongenerator %BUILDDIR%\qthelp\petl.qhcp
    echo.To view the help file:
    echo.^> assistant -collectionFile %BUILDDIR%\qthelp\petl.ghc
    goto end
)

if "%1" == "devhelp" (
    %SPHINXBUILD% -b devhelp %ALLSPHINXOPTS% %BUILDDIR%/devhelp
    echo.
    echo.Build finished.
    goto end
)

if "%1" == "epub" (
    %SPHINXBUILD% -b epub %ALLSPHINXOPTS% %BUILDDIR%/epub
    echo.
    echo.Build finished. The epub file is in %BUILDDIR%/epub.
    goto end
)

if "%1" == "latex" (
    %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex
    echo.
    echo.Build finished; the LaTeX files are in %BUILDDIR%/latex.
    goto end
)

if "%1" == "text" (
    %SPHINXBUILD% -b text %ALLSPHINXOPTS% %BUILDDIR%/text
    echo.
    echo.Build finished. The text files are in %BUILDDIR%/text.
    goto end
)

if "%1" == "man" (
    %SPHINXBUILD% -b man %ALLSPHINXOPTS% %BUILDDIR%/man
    echo.
    echo.Build finished. The manual pages are in %BUILDDIR%/man.
    goto end
)

if "%1" == "changes" (
    %SPHINXBUILD% -b changes %ALLSPHINXOPTS% %BUILDDIR%/changes
    echo.
    echo.The overview file is in %BUILDDIR%/changes.
    goto end
)

if "%1" == "linkcheck" (
    %SPHINXBUILD% -b linkcheck %ALLSPHINXOPTS% %BUILDDIR%/linkcheck
    echo.
    echo.Link check complete; look for any errors in the above output ^
or in %BUILDDIR%/linkcheck/output.txt.
    goto end
)

if "%1" == "doctest" (
    %SPHINXBUILD% -b doctest %ALLSPHINXOPTS% %BUILDDIR%/doctest
    echo.
    echo.Testing of doctests in the sources finished, look at the ^
results in %BUILDDIR%/doctest/output.txt.
    goto end
)

:end
petl-1.7.15/docs/petl-architecture.png000066400000000000000000004560161457414240700176550ustar00rootroot00000000000000[binary PNG payload omitted: petl-architecture.png, the "petl use cases diagram" image referenced from docs/index.rst]
w۫'51kE 5+J2 X㯖I}]͛Ba> sB2xf -[BB'/J>ԍKMSL-3CɤNY,M24{'HXIR脙!AbБ#G,^CRa5U5=2pWwGռ_hûG2I}z5ed'8MtX2aC^D=);5D݆bVFVBBBض\pEfmZbLu <{5l\#foF-]={7JemmR`w W*:W|sˡ#ٙ8B_P||Ÿg<Ͽ}"SSm &WAG.xjS>/r?Wo$Ƿo] `Y[[~}{T*}_94)B^sսi9=:u34ŵ% JsB^_aUWW+ |mqbbdֶ >>nT*󟑽z „78L&S*'5x$Z){7+jΏE!R/An|[b :Ê?~x̟N={-+hP*iNqW\ qcBix>U [YܼJ BC888HRSۦuE 4I%1/YcBD29j@XtA^IyVYOry|4]?lE9 _Ǐ 򪓈-0g]*C3f܌ jnnԁ% ?Ea/[^ {{TZY,fDܞELXSӗEx ܷ3#/RhA+00 [`W_o֋4zشtՅniq붠K O?j~~-[ژ+R'䅜}{|x#g}_|ðkRG,-y i IDATݔ1b0 -W*&%I2ؖ Z+-@Jt:k?${6"n(.۾i{fؕjo?zKqϞ1$ubM;Jŵ›pLaҙ^ uGI:Ar1RIiK/B ...ӦO~-[\ۊkxM3覚g>toe3(|VcVT*Dx(შ@u~N5&@ %!C7ξ-(r@sKnb <߱&[YYJqE z [|Y"O\.t[HYoQ@Ÿm.vy=vB5 5pJ=~5fXfdT4.3֤2t(bZT?/STss@؈Q'zQ'⋍KLri2a@*+aƴfg_ 0ٳ"6400gc %b]ĨFԓIa?tlⅺl5169>WN8th$...~g8_p7hbB4G pIocċWtμSݬQ.{Q~;"U'N4⭷5ϋ0 WgƘQ/ l”RAT?j?b067n>P׊<o/Ty|gENc4Q(oYÀPsT).$g]*xԃ/6OѻǓ9C f0 :,'򃂂nEMhgC4שs]Q"NEm"u#7C'GM~~WZW4KO`)ۜ6ۡS2/~',gϔJ`ð 6\R/,7(" T- 1lr[#G7 I]1#-U[P:.C -RT׭_!;'$Ek>y|~vv6-J_՛WZZ6{?˵ݺ}B#,ǩiAIZZ* z/2ؒZ*WP ^lߍ+/}NZ pD2tϟ?S EL:sY\j]+bc4#cCHSaY71$ڦS /?|̙3虀l^ oT9'!$;2+m5"F??UC]  런[C{-XI"]1%Bȍe20Zn(k; Ȁ{ ejϣJ cX%`gir!|R*ڣG%&NN9..Ύk3"@sg¤3Z| R%`l-iU/a@M݈)e>!ܙӰcha1b6gΜeWLM+*ΆZ_#0qyP\BۀoB`ҙL35`;1Q_@ !`@ lm}\nYy6ksR 4D{GoI6!!V,ʴf[RrX("CW T"%_M;TB5Z`0HGjڣÇ9h΂4Pj]jʓ)=@Uѡxw'SR5U}}}}K#1q8k֬%\16.t-aw;11&1e|,+RxV3 ;fљ\kUC ]ɓ'GDD^?%~~~66KmٲMl3LOUC22r}䦺'եyyyw)Y9y(p-Ԡc̭iXQމ&-yMZo6K$B,^^\.XD->vkL~*Ml+Rw8Sp4rw(J:t#Guڑ%&$½m ʍ C+Vv $K"m\"~~ jE`@ =DO7@qϫV.0ʕP<H&MxO( ,(p Z(W*=aZͩ QbL8M_>%2*J<"04@U>p8P+M$:7<}|z=&1Rj}Y|A;(C+B!#G+u...|0G3= lGK{e7^bŰP\.#CL,kd59i j3\vI^ :ϊzyJ$m۶͞=UV!I!)|>[fLNՐJTOaG1\c<2E$MuUۤWTKU&$Fi"wa2^N.k?/zJŋ6%1RZ0u @hzzZ'<9sLAqE Q>21kJ"6FZW5+djˏ9 1⭴4__ߣGԒ$p^mEFVNИdGU[&&#-ѧ\qco uTU"GYY9 $UY5DQ(poׯ_(L.Ob(<&+ByA604QW./j ~F$A%N) Mڳg7a (}ե0b ĄDiD29i>)h'S]u ,kA @"X"{'a@S!h=^ ݀o~zdjI&|'rEpXM'pqqYZ&gT+Rh$D,:`TmQ5^~K@3ZPP]VRm$xMط__GPCG6(*6|ҏAyΝGtI+/}h^?%$e?U&(<ԯgG u_(fiI;{R&+6jt!yvN@1)HAwY-r\^ec8GoI)F(P !o?sǎoѓGty-%xċQͪU+jOd%7r+6<ɶVu+0I2K5aYӿD ?wv簋 B9)Jm %BKAN\N>,ΚUi")h |OTՉR<8/Z h، ]bA?g'@|B$x-ZEQQ\=sdT[mjՀ̔_3s ϯLP?WI"ົA I$;;U(K޼3/<1RlSatWxjtyK|¶W Brw2_ҨgGְ N\\2A]jE}}}{ȓf:K{߸9mЩbM2ÀJ &etL\y!KCC/~t 1vw1AnwH7\>Ci")h"[ⷡr"b:;iȤ3w+5h @l%K>xWtAm[~S]]՞6Ȩ0O7et(Bycƍuv죖i ZU_c0rr9{-c=z| .> _#n7-WCKHI+r5 |D+Rhq&TQdҙ9ddL S ijbSZYB )tJy mmP7. 
]ii٪UuAEkRRR;(7BT> A!G {]^L,A?/;1kv,J3%bX"!XQQQ &R\zQ%nzfn3kH8:(tXĉ7mu7Ri .,<EImaa,B"섙aAz9>}*a(#H/UVm0ݻs8+z NWgK>GiS&u[输E]- Wq\љAHqE qWsa{ȍڼN"^Z<]<20:KBwA㴷 0,xSc㌬=mgZ+Wnѣ'`2#+NT y-^WݥcǏ˿~(*?zML9BwaoD5~~~>[siiYtjhEAA )V<҆z KkH> M=QQZZ4)" _=lT(KOK89:z+Osr*+X,&Vǽp&>~hN"*(Ą|V~1SQnn-?zD-vS5uH۱D KK+KN%f<{y ts JJ*+*܉~0s"?Wdrto޺uKPq8 ,5p(!>sa< 2]ZZZ1`8g#eJ|^%%EEVILLFS*%eQȭ'q5:kAw(t> FZZzNNհa_fߛ7^SǗeP9wt cyjz-Fn *W{WCegg'bҥsRHrQ~ʵK0fC5 R(J$'m`pA ʤ3VV9|{TIQ%y.@%6̝>:.z3Yv ;jŽ[յt|-^zr8˩biN֤I~SIdXΝ=Ӟ˗ DbX,n޼ا6B2|Ą8*hQFFƄc'O]zÞ} cF@y_T&Ϯ͎{~?4>YkE+v69]g rp47635,LY=pGG1cIJ$LBY{㩑[ fGaɕ)(tҒD, kT4dg&3g`JAA?7Śh:6aZ[D߆=4Zh&4mܹs ·jjWTKCy O&=~z(YPό}{iPs7qɉ"F8NMr`nE!!4#1vS3 JHZXZYw4QKaJE#n=HL2d56zzD"ADSu Ew$+R=zL:ӧ(A%zET4שK<i(777kJ^'!i(@qШ=s>MQV^daeV-oOjQQް×~ >FAQΞngڵ_1 WTTۉ0}=wR-W|$>>0ZMM,-{~w׳#z>722$ͿIy:,ZV>}"Q-AtNjX` ۰o3 `eVj7&[X,9Ɉa'Nۚ/&If:WlL1;GK >jk$ԤVxO*B )uw:eʔm۶͞=j [ %!Bv0q|;{j~ڠEH<<<ƼOMQh+GFE i08p~ص70p-w3t䷬ߐQ>R1.q8'Lm1EW:絵9Q~;"-IUz P= :c}R :;ge5(IgӻsϞ6CF8//A[ e^l,=>%612aX-5 ++yIl7PCOT+eZDP wjh¸zRmmY4SFl-h4z]m`aea*skCC}DDʕêxgQ0^?ggZ=∈'YYzvp ,i+,--47FG=v0uظqSko5 a5F,w&Dd)F2"ǕiҰzE=ܝNtS \.FL-,---I$H22R=+ݧ7txc|||tmg'}DbFZ֭[j&!(~vU(4mͤPuE"Ȩ bg400X)ӁQ~sWO IDATsa=ƤRz8Qr/ڛv2deI 6<9sLeE恃GXiHܜM 0zMLĤ3gNȺ.o 9f#!10q8~;z^APߪ֡ſ]F.3j"/j#"naUW~000WF ˆiU03 Ni4gejv&:X.mknnή~P$VJNqIbYRHAÊGb؂fxn3"& KbY'B~p8(}B ^044LJJ222?33۷|pà3`aaq^lvv&eVȨ0k+3ٓ'OLͰ‚ '7cA +>_|E&yyzYXX-(,/o55'ɁͰ[_nF*iJM쀹K\ب011&ax<*#*=(څNJe01ذ4RU13覙Sz!L&K dgg߸*2ظ 򳿞޷ooqq1!5Q-qW\l˗b]A̙'-56bk{\0AiS?y_V9 wFzZ066!h,tN J -@~_@o033kojl_ff ϿuZȕ{1pƍÇ:ksSTw˃<ƖRZۤ《Qk2bNtT*㺻SKiL.zq/]kB6 D@a8rQ"&RdR]>v0Ό]xxCdRx /yPFV`nL @_l ogɩYImhhzL4^"9wH$,Xzg%wX,P3zԩDAmjRFĕ>՛mmE (yڵ;ۉdʳ&o~іrTD2wk=m...ӦO11h'tG|\bBB,r󛬹h+Vv|%@%>bϹpNRr߾䃪,qBm"fuł?,[jV[-68x$AsU.0ՊwW\{q@ii٦Mllzi>I!Diiٓ,<{D7t `o|Ah#W*&%I1:;QA0It+~08$sQh KG2h!">DVyO;g)\yQW{~<\b[ (=/MrG]m NT:޳Y,&.b#D{\v(#UV-w2C((HIJY)_}͢)e!Q)DFV4T%Z);wn S(*{+w{}aZ3}o9iB$C|^L,0`L6yΞFGg$o#>~\CWMM.o y \(ÎDM8! N>y&&&&3g*:R+&U"~C9$VUHL'6 ΥTlq^ vBTPI ?Efͫ?OhU%Kz[huEDrNV섙a$֐lð~法Hi yhǁBMFĶrWwʪ~D!|InIqE4>+jfY&o VWagF,o&˄T&5QBV m,7ؘ͝WvXHWD M#7Rוn @*oѕV *Y(bdצ5NFLwǍr|7Hvl85ͭ%yah%UI\gX>>𙌬LkDY/Cdm$b swsmx\qCow@@UGwҢ\5MrMiRR<_ ܷo?b/ݴ׎EiSx,,hHj !*b$D՟k꡸"nC E'Na:Iv5@Xq{J[5ѩhYɿSZQRoUVJP Yae˖2g׷͢ j9*| JcXEESE ~Sh "F*3PTC6j/M|O83bAq"4M A4X3"Q-Ւ4=\"Ʈ5pQϘ։X Mgڙ1.Z8ms0T+ \&wߘzC!_CYq-_凶tjf%5""VZمU5 *),,d>Bm]FX#ʆ Qѻwî g4%QjkVǾWwpKUXVVط[ajE!WD3cN̚WRQ[gՑ\_\zzoGA2 ;+D##opkqIqŮh,qE7k.lY|EBmAf6*}BTHP*(SeL:ӓٟrscrLL,!KG(Y7碇jb؝G@QÕ(t៎mf3Z\|GK=xfccy{O,XhSq?VH|~nIdiiCnٲMBsG >TRy7ym& 6}5Z驵9F挥 @kBoLǾ]~,xi%l9I1!C|RQ&;ШI9$4' 9EHZ%2յ%Oӟ3N#'fߵAD_<ف)VgjэɥzB_8qV5ٹ3(..%m/9 Jen.0?O 3n,pa@Yy Si ~JFjV=۝*"1G'?JyII!pfx9=;8"Fy7\z;wn>۰Vi"ɋE>F -17UUjp675[kP`cЁn-zAE̖`Cf[ⶵzo8;ut-EwPĞ'V<o 7.ArrJbggg- n~}U idZjT*㺻ڠ{?|&N$'vÆ g! l|7~<uTx֧S42*ZXXl*"HD==<<L5#+'`wV~}{kPm-,ss5UdMî M{a sY~g07WärΜ9ӅX{i"I HPíxU60«4 N EE$E'~n$HѪf:U'.跩=a\-2yv}>^tŪ/r^L,@t҄\.tĩ.ͫ[P)ŒDȑ#" : @5jFHNNՊ<=Ix% lÌk('KPYqAAxi"#G~hV г=Rq\l΄D+ziTW*Z2L8TseC'sCs4ኺݗadmI/^kBFVq\ݦBVM(. 
L^³)SP1P*lR*z3QV^冯/\`lڴ [EB)e3 ർy?W7yC K$BC/yaKj#$ZJS恃 },8ီ #U$uWTV=3nlk!,\Pۅุ8+v=W< sch*VEV>x+mG/Mrvi=@CuzuFbYRD䠇2)p$"iaO M ߇Ծ=?/MoO4Wn4C.[|S,,|mҲO0c55^r'***0 ;tΝbqsK}혥(}I"4SzNqLq]ڡBI2 :ݴQcgˍY,k ޴B7͔noϩD7oްapt{7ٹŦF6!:oddx>>'':sΝGr%Ç46莎RС55/xCy ~55Bر۴ff.nRѣI -((b2 LJ$yy|Lfiiaani5k = 4˗Mz#0˚ljjJe?3lTdrC#rzn-x)`hhdfffdlT*c422x3sJÆXZVVYQqNĄlbNV&ۖW~֭[ ŽY`ѾfnkkKhMMcҙWӦy|skfqu;_aa2=H?RHЄ,/ u.Zxq Z:=%w1u0s.,7nJ"fdp)Reγf~2`ҙn\)G-ɏw1c>Fsv/^8&5w;tݻ(|V?S"<~#?UTT=C)&wʟyސ!Ȩ踸CNNnSSa#鹠NlδfjP'nGPMeU;Қʪ?|k 0 f:ǾELHAO+&am 0j+mnn`/^6$+'10>FmVސX K?D@'dRfcHZ'yxoXt,:9xf'΅Gfo(Y7זk]x_-7K筃+׮>-C(`kÖH$n#aa[%Imzz޽?ҲUk?բ(o0}vÍnģGOM.MN0/cE[̊LlL`sH.ȩj yj5ؾh0;o"FW!3mifz!o5mLJbcQtA pߦ& L:S~CHjğE$RpڡVcEM] L.8xOq2= z:s>@-qP|f3$RauBp`p_rrʡCEEEذK} P/`oC2t@dTXTd4w_WhWHj / Nkjf^ںVYL.z5""Nv|Vi\GgJ}"yclT5Mi= vꑌQ;]$ \0gڀk;+Fsuӆ+ߋ9s \dwqq6}=aEcIqvS"Tz|ly 0o-mSRX٬4t:yێ))eqILeY9iI^'y uEɱpdkk9?F3ft1J&ѧHT>ɩ`X9x,_Ϳ{oIuZ=MLh66lfjjzVK?й)cǁcwe乵݊G!R: F D.du doz`Hkj\ Y9-*BW';}zZ/sU Ջd`UrRBcAAAƍrssQ.ߡwsUðQwt=f [5ɪUO\r9d8+Wvrr=b=dOP:_~h|kFVNƥKGҲҲ6jE!W;رxN05A7ojөf cw"'I#];ukհs *#%MDH 8jf/j#"n!՟ΙdFf+7JSQ4;+74@,S\Qw&%dAxؒ НFV#U5N"^\7k.b2HZ%8@;St~N5"pDB(MFP.Z8 +j_u-̘_P峝JV-]R<*!/j"(+KyP(^DDGxBCՆ#D(}gΜ}}}޽# q^6NuH$wcb,vD0iBr=l6 ݞ],df|h $IbBKKoG܂ }y矯Gَ?@F$7|3ڜ8c/~ ג0 >A3x|/dV ;(2i^jf;hZ@}:jՋP5Uentf5ʪV|^- ȶ.@V!XJ+gx~^G?d۷o_m /_~1ON!>YrذE7>@@#'`H!c7e>ѓUG̙BCC֬]ep3aJ$U bp$+K*mh4))>$4^ۅoԻ(LM%b*JZ-?eF}&4:h"v!'00A78O+l1g dGdHD*h ->oFYZZzdmX[:HPW|.(B0X0lI#NEBA#?w@MڿQ[#Tղ^S /x⹖n[xpAݳhy%4שK<Ik^-}1JHk`o{/ogC/IY#%3=ƶw is"4b>uć{wX;䔝;Z3n1/'S=Wka๠X/mKj=K0|͹3;jjGͽH7[΋ɓ'b#KMM8CL oX"%MlWOH:w4014 /b0lΜ9˖v!WT6ʃϝE(yCHLLvJ=VbNfg'acA}I{ج{AeAk3j@EߞI/ r=>zh_a줥uPkkt @*C1!䨥@ΖllzD<x^[>+*>p`6hw~eK9" ]$FXlW*s܂ӧޫx>@>nt&ko`t S01aZaMMJ${ ]$<?:1[v x D7i7͒NzN ;L*ӈ;\\\lپxBxS{܅/*R3&% BEҊ4jt-{Xl τ.EF^0@anVZacca:yBBlhuozi$ APNDԭqAA&B!KvL=bwoF}E;Q啑$.>>''K. 2e&,bʕ 0/UV~~~gϞI{Х¤3s_^Ie%z :dd<wt;n֮@uvRg8x٩zq;mm:_N7D\agL0q淵,k v\_+B3mذaW_kFjblxc[,HAWC͞K>^&(~fJϘ>Bkў#T\?~?rp*e?'rx7ҥ'No@qΆj!"^@QNlMdL-T$t6d3LSH 9ӌ+JDnRA} /|}8,:skZ(E U{*49UGR R|lm۶m9@տS3AZ]] ػw"\wwSc١4զҲ9n }Ԥ&vp8Wh265!z*qVDYRP HFתȌo(7nڽ1IHrmC"6t&ZKxW$sEhvmuljٯ(ɫU;jD YN$ӌhazN>~/m-7sR\b1Ϝ9_VVT* __ߕ37B._Dy<]5X #b Zt㊿]0|o ??Ԇ|>7~G=Z:h!ݎr e+ ݤu˨Z)##c6T ҴY_[߻JEM zhnn B9k֮ݶu}n7 zZs|VTf陪ħywaHQ'3ڬ(ܼpEgvmS;g)\9<oUݒ]FUQ !riSiAUe;ڝ100_sHB5L!a!pגKSrK\L}}}wGQ*'OJpS]Q vי9rȸMAv#˴,oiPǬ=?7Ram͔H$nCwKa TTދ܆*W*4FGE5kn'] ZzϨj fM${AvEz$WzՃ²jg4뚵k<`7u겁@E1%??w' m`)tZ*Sbbw%+*b$;+Wo\&9|{Dq4שٍm yR& 1N?+6KWe :웁|%ͲU?O0>}+SW?Nѽ--٦&tg{4jqBķ{AۧU.6jәl'ٙU $%2T=CC#O{\Q}r%sJsssyEeFfcu4uы`BԨu5BF;\ZZGP[Tւ ` nTU(US(% D$ H%1㽗{wA psk۶g.[>ѳ{z+Zl#\T*) snkĩ쭧3wҽboQ&^M'|(:իSG-ng`k9n1V .z"rss?o꧟=>b}:ׯ_?.(hӦ B~sé[ڵm?6jh]ZRIWsnox0AByϝQ*nnnIj\Q\TT(Lr8uڿEc*NH!)U6>lؠ]mJVV&Njbh߷ׯʪ߄VV(/_*k׶U֘!:vЭR)6m5Ç0mĩ1l4lXبq#gҜ7ڣ6##cĉ}/sxoN{U3ӎ?2zgm,Lvv R|;986lX6򋟿m~mg{ͧO^pKE۶mlm-P?w^Tp<==g̜~ /_*W:uΝk)pPV)S=oc*-PA _ E?\2΍@g.[#FIM Veo'ar=g%B޵,.}!cF"8p…t)۬[7aWR&n.͂RPoi~iGmz,6ᷪױU=0>Cs9Nt{;ʨa} -kO\,#9fc(S”E|?TjLODm)#dFUvRyee:.Iu!rIԩI9F'=ZT)uXIisYYY+Wm1E,a.F&&5y/J"Ci\Qv q?^1
]m9ym"0=6M|U[B1BH~CL}JxЪ61]s5E Kl7r}4h/ׯwkLj|r՚9:=`ݻwCc2W.jW < +Z1(F\ܪV+XVp:hX##)n[*@m!XhWİ˕_scu.|1X&'Eqqss _i j@Fztm>wJ s ? 6lD;aCB;̞=S&Υ"0 ED"paAJ/^BMn04LT+{:P_ He- %!_7+iKU E ׮_^A:~i&N2/,>>bkҤ{wqnXڢ!`RR.[rⅸ'::`$.X\ըЌRAJ5a\ZAi;DET?#׷Q5ve?qеeW(6R bK>E&@+À5MtvL69nh_;

>& 0BgD5iҀ*,,HNN7/^|۾;bŶhޜaLVt1sS$chE.M.MDoZU*HDqbI,#,r]Zu0#P)iBq!fz2)!hR~#BCC;Vbb~4ScDr7L<‚EK۹˗égH :0Xhׁ.ӦV^ٽiX.)MO5]XR8 F r슼{$1Gl+q-4'r%97eZ"i dMΌ |,[+WF@>zшgG䱪dq鏰4qLþlR W}n8~!@ 0.&&%MW1cL{+W>tq%-7w?A劲cGq"3{2hdÙ$ҮTE7s@>q젯XnJ|\&"_YN&:_إk_st" .Ft]a\% և`y*?Cx>o%ZlvZvƷ-Ԣ΋"Mv}Ag%puEed\^f'OzɼYܹR,!1 OIRȑ,-o\OtY/ܷ/X޻Ӳ&-߽{=rrϘI8 Ps-3u^RJO+}N}? ~E+3充bR".}n֠{)鞧(OSa IDAT )2JYN={.]@ɂr܍g7n\eǟuwn`r`bTT-5~0y9y녱G3 m?xP'v`?dv7lۃK߁ZT墴:;: mQk!wlރ~MS̛;iE6&M1GǫW&ALI]aFI!B('7Bn>ScjA˖-)Vtj#yyw[b U)c˶MSӸ\ĵv:u{\I>}Zo[CcRRkVu5|NxYF.ǐVE\ r !U>>>K.ej,X@y b E|>}:ruKTdTTT3.CPS'Āqh>yŗ/_1MB#\1=-lg9Aoetm}˄k9Gmx&]ǜ2%h(Vޯp2Jl\Sz иkak&jtіH~zEYڨT+7I~D4piaJ5Vas](sD$#%5ǧ"C}\ Ѭ"ysȪ`L=6zǎ]t,S*U#`<#-U)sgJ>%!18 Reݺt3i4a^X7r. X7i뺬lP:Ii18[[u֮mWK[JRW O1mc1U0 :c0OmHRVj8?M8o,XX/=|~Я}pŴvso:|=bjlrE+VFH+Rk#ԋ͚F1~B(9^4ћfԀ_ʅ q\\tVWl 譑#>v$aCɠ0z{6Sz@y|ȑPfDNΟ7-n]$T1qKcɂYP&WZ)R1jL{=VQWTTw"-rE-WCwVΎΫz/gaϐ]ϕHY)@z刣 JI_H(ɆԇP(ɛxMIPHhOo6sˀ;6ƟB0v~Q#QEb)i*ophu1}LL6uss\xQXXpmClg|Y)}䉎Vx譛aG*l,E_шXُ!IM$"FQ)]G' P+W4>bՀb BTJ}~C=͕U>ydժP!=xX0#Hf\n@4-+[Wㅄ1M:Q&x$ m;p'7xύ &QRt =S' 7$ n9 tG8^V&+$54 _ r_ܪ#*zaf?֮kͰG#\ mOq078ƒL-bR^zmCȍѯ̙3<wb?6 E\WκD\V劗vd"BH"<lHz(quAz?˵E-^,HWdVa-,Gt)=.D\laoر$]Unqb@ 7ǏpZ4,-aR)SAm[#"hȑXȼLJ [n"FFcpc'K ?";Ed:>4l|fރ+$fUE/V# &j~)Hmg1iIu{ݭ=WH#@匏ܻiY׺m۶B vLMOGg8{kn4ebvu2HI/?,,j"nnnӧrjԢ"M[U\X8y${ueqgrJτ٤S]'5Lp|ABv;z]  .vmUw8y`橱Դ?_ҁnKHIM LaR)Ϥ۽k'Vy%Kڣolly(y[63 1PАyկGKR/鷹AxG./,(DqS톛f~!!#h=)ܸnZ@%:[uZD KDuڿz5VXĥy{E2iIm3DV]ݩ$B+# B<o ;84k|!z0PX(n׮-@"Fn|XX5$̙3hڭMhiuEKEmp8'Ot\XVsa}SIGǍ}60zQ&آf/+~9/7B8FGiL0e1 Q,:@0(e. ]$kT*ɓwr iVsĊ564!tF&kҍ"#&&n0ܬϝ` o9\IYv+^J=&B>N<%lm\=7'N8z5}eQ\ Q4X1Eq]!$~SGkC=7Z-K"rKcr{͛7ryÆu~:> B'Mѣ7GMa,}|'m"F+DK`Aa)I݃tÔcpmb ˄*DIdt:;:/w]E~/lQ7Lr򉈈)Oc/2Ht2Fi1ux&IzHWBڵ=.&7_BTJ1\[Wyereb\JD"F /KXRH&2 Y~Μ9 !C22.cp&YkQ1 &Rܜz\5)IXerE]{k2~ރڵ-_\Fږ/]DA^olXy=|99Hta,Eh1ZH]1nm[ceu?I+AJRѹUVd iܢ/լq{*r$ٽIe!ʃ`C&s<ֵEĭP}cI@3.](g8CRb@`ի59z5+sʔM8UTCOםȃ?=6׻1wK\ ~ MHǏ/,|VRRBa AgڶIIMۧpE9E7%QF-$~o.A1{ {2AFfϾcq۪s :7yJea'%~Lh` c7n8~DV `aZv 0յYꊩp@ ذac5bb%WT(J tBX.;.߉^͓ŲR?a?T+ǐi'c[ =~vT*卢_HtN0dp=ִ(.}aV֕𕆍r슏ÄKz=LKyN0 a1cz9$F϶NIM]P%422L gU;2*rIB+zFjBOzͦODgJȹmdBaX?^j$T3_l 9|n|B"0W˖{4XVuS)AY⩲_?ׯ?BJ  'N4ZEU0ܳ| h{SIsA,M _j\*ž Hb >q젯nI2p C* rr%9PT'ۙR.#/>DvV2!p"̬w^ ixz], j@Fԣkso鷓+9ĀΈ0x,ڔHⰦ\ L1 i֮Y fϚq]&\1%5ǧ"CAqJ./f~CBN- '{eB*btڽO75Ü;ߖ9pcF|}PD"ќ9sp9kp8c'1o}0B|11qFAg?}qt)t4,eߜzFs}U*6Bkqvtfal3He)s !I Lo5e73FRi8I(kϞgN2}M=w5v%5mZ`V4rGب_uO_/6^$3+WDEFEE5r?tq ,XTE?{2/lxzXR[Y긶>eB%n9@e9/dQɌ|r딼ڦCvG aC۷F^ԴOqx՟2|<ʠXrx0r2ͨ11qv! @த+Y%'rExlU:k~qƭ]ݰv#P/J)9R_ibM YtN_r[d]h|dK>P(B,W 8;;gffb~?@>ۏܸ Z9Ѥ otI86v`a {_,WdA!~-2bI~kuwjhM5NdЅnNK;"F>>T32m@D/]vosBڊj:!1 [jkE5R <˃Sp 1kma5+lvlhgWao'P"^'^"uo(ԋ1VIr5Az9s~< w!]݈`K//[ N6͸z9zu`a$)6g/+BQA̚7x)Kv::q87viȹIgد1kg͂޺rZ}3MXI iD"Y/4R^pJKK{]??e+Whׅ¡.ƤA+&&r$&& "Z'G>Q5cRi֬0`Ioa`S׌Jkre?֬[O>~"񇑸-%44OoZ|F,3AG2#|FӠa/_l# ,ȀP3,8;:O<@&iś4ΎΫz/742O:(ftrpjT5. aT,%<ey:uJ4 4a/~eSbYd % N">/.\M?ޔ=>mZbbx4e#FY{ 7o6e2W&ѰHІ3fcU(Q҆dƛz8 JO%5`/+0Z4{XTouc LZ^~1W0qJ(Gg篋ZoeO_SPwڥ+$*Lp/́VuaLvՏBYRG+*vķk&%5ozqE4QSzkVT6\TcԫbQElM(FFV f\nĚc!#3k 5jJr EoyU2D(Dk鸶8Y%bD5>ybTt x-e uvv37oP͛72p|F!|>vRCܼ%&,lStR.K$de]-,|֤I/92r&sۗ]W! p{YXȢW6+)/5Oȥ.έj+>u2Y?,cŲRiAt1r8%$C KOAݻ9*h)))1;?yzoZ&"ocfE^q̈́j|is?'?ɖWlAa~"O:5Jș<4$$ـH CTJȪlTJAR<7 ͟7w+2 %T0ҳt|ĝl5_,WdQkUR)  Fݛ(T> I'*BRiv=? 
>6^LqB~XV:^сe˖~+Bv]PhK#䊗Sp^̘@yΪRM]"[^32Lrƍ3C\eQkk@Oƻo,6aP*_%q۩ATpDj#2:W6vDžRB5L kŋ{gt:.D-[mdH'pus+vɓU"/_rŚ^0ωYTWt8;:s; !!pvt}aWV$L ձqeﭜ;+wء1el lXVxaHM,+,bG6BtذN4J#+3|}~DHac]̏.!+*qtTj6$Ҩ6T(lZe|aaa0  J6jpђys0YlX~wrWսz`J,(y<6ŝD#!-ωYT!MLsS*ޔܦS;T Y\&ʥ +gd#|0l!p4NJe2,*)ɏ%>s挗K{mBqvmaL \%H#7F-YђEU"`^~`rE!lx߽Sأ jkHe[ͩZ1:܎=J%`/ϞIFe r.9b< Xq5N1nC99pZx2)!b.MjiZCSRmȩu~W\D6MšwyJM"qc50J52WJ#F.ի칳?SZ`ܰlbBOѣwaa3g5kXuV}!-'6ω,, ڢ ĸ){ ˨idۺTAi:xW4tzzXBFfU=Hi"ð dyK ؟FGoױ y~WiQTJ$Rʪ߲+(+JiauggƛFM:79RL_Q1x?WSv'| ?_t ٬ūpTioP}=|99 sGrv-guH<'+.M`T+l# ?[YN#Fvm'H beڰag}j{Pu'iQ A*"U߲da5&FҨP螐rYc 53g4W#+X &ʗyp\j7%UBh5UÛWT 3w< XbU&1/8m#79*~^l\'O _Uk zF}A,Wda5Ж!UHqAIRgGUW|HeBjV^=N2mn5YNvi)+a-ɯiAX(4\+.Q#ѻYAAAq:O"hȑ&$U]?Z[ VQ 4  \ȤerEO?rrcLY^]BUyj+]aG[\IRGEdB<"P,+ݺ)z謬,q>o'gQ\]B\#a4Q!$ O!B,!%5yodEgΚ :Uv숧:4 \&W*ʴDP~7kSl"L_X]n4^#E0kB4{r{͛7ryÆu~:q U$Z:ꅅ;<._a=o,j#0*â!>^G"P +%nuw^EGW$Rㆍpg ŝ# 0E ͨV2 v0n]edHU!naDVV 0uR/, )?46B ̲Z4bUEf ѭtvr}\ 㹙}N}?5-73s!dhz#) K|~`LrLhEG+Ua;zDŽZ6}:&98gO<lo(GI>*-gu;VSٺ"Nť ;t _`Ud Z";Qe>^?R7n?jVfƍt;Ӆ0Hs 9Wf\nX~uv !aib@faP,_']!$o޼Y/zEwP2d-(_J$PY6;P8U@t+ :l }n4ekkuŸv1}U 5-7Þ/mf62ygrݻ4lp ?O\O>s8IBX@ 8v]\z";Z=qDy<)Rڌ5jTϛ9,Q56etAeBO|J+7sT7Q9eT=oc*Xั&Z^~1xfG\]o_5perŜ9 kvٳgv#z<}<èTi)lB5*ԚKĈIȟ.ӝeVD״(ik^Q+=־?|r/+\V*bj&)`;۳ayȨT<`B"#8X5r݅Quރ~"D"Q=۷En͢aPvQB7RЮjLYXCAQUOP<-fłAyYa> 9F^,Bݔ^S4qL¯޷2K/ܢ]bb. ÍO|> zO\򕒞V-wD>ck*N=Yk?h(Tu6CڕSP&o^oջ_5HakJK[jXL3ڰ(44)Fׯ7㭜;vODXVC#G͛H\P,+sw7A,7Xr8B(gJ@N=YȢ:V6+6t(HRGVZ9:6”XVjX'%C"^{w!XRN2:Q 3#!věkSV aT4HU!? [)!*,ӑڿ/ƍ !t*%`IYieSR?H&Y5Ϧs/xԓ"7D>>3$9U*(x<ȭy$Q=ԸOx#F`!.q{eH:uk5Bs)Bt[C2!jjFSٺ"j6"rEJ9G?tiܽiB峐PzyZ˵>|WP`S#^ŌE2M&jZmklR62>%5opENaCuݫvk k^ąA BwWXuRLD5LķWCѣa  N$͙3'--MRI?wnB^nݺ'!OBxUŢ'b=F;x+ Sw56%Rv=Q궦,*IO?+nt1[|_ڢƾPrљ+A` X,+]¬+WܳR7oܧȍ2444,,Plr+gN,Dި I?y{ł-"?A1_FrE2>˖&;DW"?YII %jE >}ZUѭ,T*WRnTJ&,MmT4.[]-L^!ʹ]^Mo$u2E$E:FhZ<9ιsi!H7lyD `#>h$B)om.Aۙd-Q5Վ4|qAB(y1?S϶ -vyerb6\in.zK8 ٹ"9 .\dHTLEWÆ%w!/R)e>]w"&In͎,A\WH8;l٩SOu6sЌVk?,[W 9*Fin$',dG\F` w0JL0zgo=%ޮ[1T_x 5x{yBC&D{z|>ʕ+N4ҀigYD7,, Lb ~ރؘ踸8`p^SJ #[WՏ0C^R)V7~6- QZGST$uhA8td苍G@󒒚-6fP!(!H9lG{JydB߯o+dUlWBǡ|tC;ˣ[.+9H)0$\CV?,}+$+`,fXǬm^'6vǚ5+^A5HDQ=jGZSRӠd 4 q̞5ݻYѧ7]He FX|jbc`8TDU?ըۢAC=RA>%sDbd˄ )UIvW&W,[TN*rheDt_+ʘq~x">>>C _ B~y{wQ'$(F}=: A勅7Au1Ɔ<0D-nkpn(m eyojB.D"=l(M4fyBs߸TV{[W|>?33n T.~ЄmllH5BmͺZR@=o }j!CVgS AlJDٕ=)/$8buB"[?Qq5a"w~H`X*;n0ƠC M8Z} `t:۶ M}RR<<~|*x<4I9Z ] FGFGo= C"Μ9#66 b+e] feZv޽{e2ӧ ՝;wʢKtYh!wj0Ca8.֦7tK&ءCq8C{RI6x z5PVۮur#֮R~JͯsC\*csh3gn_d2}x/_3J)srEE1Qt -nnnVΝOyB.oذQ=}O0kcc׶_B^֮=)hL~,3g/^R%waPAQ}9d/ YZ0"b$rlξ>o`؍pQYj|4Q߽R.D:sq.qD"رcY_~?i|t VɽyF112Sq3Dv 4r##`e^,e__B*bb OoAD0',WUjdc߷gD;ղx=@ ?O$#.~WI:xp s8vDt\l8;:~9!n:Dxի SnrVr)Cķ^rE&^CksPs.GqN񉉉1(#V< ,M}e#FBlìYaZn͖&b*6vc"/Ze\\)Xq5".VݒoB}Ne\.:TiǁT 1cz]9U!2j/z-6sco=o(WO""V\PPeKkK$45MR)_P fSL,4E,Wda e}Gn#_f-N^L<􂂂?/Wo xDJlJ18WB!!,p-+&MƓōnnn;v '[]h2]+Vh6zR:ަ:T3ih߾=̘ame?78!4a .*zNl=M(I?}VszPI?B|2&6"('BOI;NKT+xDt5+[\T\SOH1r8%$C KOAݻ9*h)))1㌢R)e>]w"!TΘ-_o@ d2mܘ9(|> lW^'n7läGz800EJ9k6d'ItYXmP D`D%S*lImaWvJe֞aЇxk ݧh掟~3MkthAF< hBǽ1>Xj,WdQ]sm2_æ,ŲB`-mS9{c4}(aC/il~T& 03 h4& BO4 >b![lAXI"nٌ`Dgg={ &,Wv^ΨVjuSQ&W=PByCaDsIKK#v(ш]PO .u"Wl׮-+BQFZ#vS4Gؤfˉ8DI4ʙZ?5a"j8c=q+N;vhLY4oB9)M1,aŭ:drdeL o׶MJjZ> UH."Vѱ(+{:"¡t7'~+b n"FGF!uڿ EVQ^>efinccCvIqeD#BB~E&x-w ξ>i?=TF;4nJ+{ۜzϢot-QHyO|~JU$#c_heu\XVxO㏯feFnHabYc'SSOݺyHc۶.?`2".%À陔#5FDPhp*X&W|ÙrUEZ!ebZ[uK*䊗S=D5\X=cG p8!p3*iNu $9 6  ɍv,p ]v"D1!2AiMsjIGWU9ϢiŰtG+~r1]&W̙L5VZ} 0@8˖ XIvCRy}jcJ0S 81b?Pܭ[Wnnnd΂p#BȵE劗̍v,=!#0&L0vJQdQ폠F]>ҫW*s̛H &lvXȂE卞ÇK ZP|>B(++˼EJo4&@5 04HB/ROl(HgΜ+N>S(t߻w1גCMi"(#3;kxsMAR~\—@?J[K}Nu6u/L&B6XZ=p@Ix.eH$*z^JCRp0{ej\躠uVLviר< n!my$VzAyYS14P\"KźT/Z~,,erEnNX"cвEsc}y;%7+444,,PlޡK@m]1 Ҳ{{{BL[3jeCt`Yaņ:6q@"&iߧdMkdO+Z:Ҫs5`dD{Hhۦi\.׌6lF!,, 
:Jqi;W32=C-Qh IRZ4Bڌv5**MD彦=ȳV6sw,m0 Srub99Dq)nirEVGcc팏"5r 8Ǘ\acBp8^^^ 6JN>W.QEߒa^4Z}ptrB|eCxtp 0N+>c2ԑ"(&&./e5OڵӶ0E*?$ f\ p#BkEYibUK*q۷((x|'<Fa2rk][vs;£]Qt`HT*]aUH3TDġ` 6,Wda)عg:R)o:KXVxE T*yAPz5=Lrj\nDRVAd:ڍ6rE1VhI3c{-Q[0C{ի7~W'*}~X($=! O[j!VaU^棢BHz Lb#b%j&bd" 8mNL쓏B׬Y!>IX'p8~\&*Pu'!MdeAZaFOBgÚ8?~"H7jѸֶ]/%j`EDQR[ к *jU@ Z&"D $d~2LBH}?I&;Kf9}FBC tlj۹g$GeZfY=_v|a mQLiCBȦ ֖95(^jhBəZ~",_sڡX셞_QSڵ,ZWY)eF1*T*Lٖ>e[Pu{%% #%*GoͽF9vt"()9Ő\vJՠm' 'BEwK02ggNƣH$"-#UQx4郟̊[iRk`ZRr U[N2Y_问Px GD"A mGu-i$K/P L}z\JNkGj> BIy~P.ZK?_{cg˖H$"{; Tq)t*jOS,9TGH`n5?L2=(iEs;;,25NR8;;֭3#g۷&!!!ǎE߿W!r35(R(뿹P놻vۻtFP=LoDkƌFH$ ̮.--H$EEEF+ay*UC@``ZZI}}}KJQ/YH$n;)W^4X\Rq19%5::&hJ̹3qǏ_ rM wf+‹/zJɕJY"o=5rrI=Gdee93^8hРI'\x iO<QPp:BY;p[NND")--<M^|tt N41--M$-.(pkBY@v*y> Zѿ{\NvۗDΓv@d@=o d>vaÀ([Пy)e@ٴjàoEVjUQP(O}}M2]ǼăJ0o)OgSSS !CGK9UU/P(ܾ}4gS,7K(:s>ѫ±.sƸx<ىҮe=7m4T%f6K0#=(\]] ***P.k}Q0i7KQ wlpobSu,D/x`p%f~^l=zM(S$mP1m!?[ yy\/+'1* _аUA ?1ʀ߁۷oCӆcOYy2\[=^ͪKT'YE ~[m[IO2l/eÀ ?fnNZqZnFVL:cdU׳% c-5iroVL1TB{So5*]BP0/]A[O |>g~# I8E+k@Ta,r~SmTVJp[\"{r K ›}Kll?ݧwKݻ%xPoP4Z1tݗ:뱤dUqZY)%ŎڵaݧjC"? *?HYh gu;b9xOM>j*UC-;#Z`TJ}.>xxTWh} @p:6R)Um+@*PNA7jKHV`S2.!MTnϯ-&Fs8moT؋zSaHĆEh~An t0C'l2YD"AQӱа oUY7HLؑyPJp+CG:$kEϏsxqYPה>.3ÁY".:ِAE˯@|36ɪm#7PP'Ҋ*UݗM*qDtD|'jj*Q]-9V;hD[?Q$)9>DQQG_B,wKq-5Fi֊V4M t-ZL ԡI M0*3~f/E:랕f^޳w>H$ac-k,zNߍ̌ziClC)/ut{*c1Dd*:6u?zjJ|4x y===9kr㮜aC v)T74zڤhcicytwE,2Y tlumA?p!>L̚OHUJR_P6#_}$v 5^7^l|||0 .})ٷk-Y}ď7zp3ng+蘒;pbxZan껸9@/`+mV=w8sδH K7>y˶UAvE~q=*WPqg*** .eؚeULXXѣǍgrlllݺ2$ e/>B  xŰ!(t%s/2.elY *c3d-]Q΄zA(hy8W;[ q~ ͝ACg m_=F|СvS*ڻwC:92p[=-[6F^?0ܵ[;CjeK<'11SR7nPU?kY9AFֳ5hE`UdTd+k}+ ^\.o}^+*Y~ J IDAT-oq p|vRc#@cH1M[Xn喝 @v*p+Z*zX3D~whI@0+\64֪p^eȷ[$j2jKhѱ1~hEB=)(v}m+χt  pug{\ïΗ@nv+a}6DEy1=ML;9: atz'']g3Ch 4,"h rP7I\@" f3 (h(P A$k}<6aԫ w<0V$k &N.*)-,}έc=0X+5םLO?$.XYd1e{8[bbkbފT(cnŜ8vu=$+T) eMMR%&&D" d@|@,e 'Nfe7Mʅl:S͹yi^6.%4k:ƶGN tZRپ˺??}mNU~ao8/!Z@hx/(''o9MRLf|bU)*驉]bb H$gD8ҴQ[[7jzǦZX~(?N-hY83?;q'ŐOeSPcȫ,@Y=[nf sZe 93>c%1DoۼUYGG/Ћ=o9qω&WWjBY" vW€Н; G[BN3N+6!ph{86syc&rW6Ӡ+߹MS Pn6zPx{wwyĒZjuŒڵRHX?2/8ΜV%Kz1\0%ՂNQAB^-MXDT7pI"plۼ]nEQgrW7c]į"E'̛+}XzS\u3! Z[fD8=UUnH/ ]UR"$ HMmw]b˘dJ S(6n-9!X]Ț-;=|7b:%ϙ"(hU07ߩ̼Z?֖Y؀X9N$fR%ϷXJuWH\$ZЎܺkOur:qbѝ/FdÆ #r&43Z im azj~խx==={v[gՁ(&kޝn+o kȫHb7 W7|+22 wS_%kKԱ ;t *d-`ſ D+ڝ$ƾNN!!4v iƤz<]i̛^-8Nb֊t40!Tv{)85q样xk52\ɄilnX(WwCFP̙]]FNh;V|$5ݻxc웳gM,<. ʖ֓h<:$tqíJ"`uW{+_n@j** &f#KZ  -H$6|HbLH^gg6L5-~o]Bm0P眙o5OݰmVx6Y ƿV~|AC !t}5Hj"``ᄪvE"YC^b獶o^{V$)3&Jv$|m\{E1[8P(t crϙVUU6!]ámF^qzt x-ٖ^jSzj yyXᴛ2pRvaIeg0L-HR kPM ?'U%eeѿhE³Ƃםz܊KJsfFX "1@K|I)$ i¡ײrWw8-3wΏA*qsUVYlZHMJ568Tmr=~fNx FbP47[#A XVd9UU7=W<\8N yjʶdȍզ\ @xFyTW9`ʶ)T*>Q(}||Я@"dUuϏ:¹Q"nq EQI)虤Lt@>ݯh`Rr P(ߒ@m،դK(>Q(۸+M%fy=Vے*U>G_6-.>Uڸ tl<޳׳u H6}Hu;bbMHxQC*αҎNl * 0/~X KjE4={ iwKB![/eC.EN@``TeAkJ J^S8LxssoO3Po,[zzzjs2(3aTq);50BlHŔ@ &İLfH_i~H>*UcM?vHA A4 NMTT 8Xٍ*Gun=N({HbdCvÎ@ >.é:aN!NhX)jXr!8 'fw`g6>ЖhtCZŸã]ݰ10j%J=C=@h/ #X~ DZK'Y+:4z=6D+h4zD &/_#fHٻ_ O mR:B Bm[3*U:áTa,2~S+:=*^mm[Wvth͉T:Š#}JR ]!hXoƉ2Oe^RFIH"Q]-(`وFx~~~wKQ[| ţQEnqK"М\zj"1mC/3q&ѦrWjb+bH?p"@ ^%H$Iut`B*򃤏"ъ7+PDRwB#i:Re*(tIzPoU9M#¡0&SV)^0N[ܰz?uHU+Fr!'ӃeꔈS >.mZ:!q`HD ъճN-<75 Ub-m`x8BIOM4s131Dh5U9\_ױa`tj_Ug>.mZh_bC}ǐ@ xE`X8o^ΪW Ԗ]M.AaV[n"Vr X  Ennd.ϛ br***ϟP!!!;w"(..w^ML0mo2._FwfΚuo`ΰe~K8DjpHDͼ SP785&n6@B@Ķvm߅RrU-]"MV-~덑ߙAα#ˀ2!EQ*˩TȚu]tqen뼩J"`yyÍv۽xT}BY"|v2  ݹֶ3LccǢbO%Ce6C "`EE\.G/G(&OAҫg7}$Xbɓ'8cscI&UBY|Z}DسcUU~(VЗD/^FD1G p{1R3@QԾ߮&B[O]ikۙε>Zdb7o^5q)R7wR(U.Nam,Nq{#K?A;6Rwz2r BA)'_GQ<IͯBShNC[޹P*hE_ \;4UN_0o1hر|oE(ͨ &o֭\.:qRUUz ,P~Lfz{Kgm!v{S{ݱIH{Eɧܼ<_,Zhݰ `cX&ޝwqB=lupU#m/IӢ.ȏZ(Uե/?_ cz_2!aF`GB`ן֏~^UU%ZXRa2h(uMI0}IY<:"68`?%.\0_G ]ޛbl!3g! 
bѧȲ{Gą//Џص碍Cvܜ,Ǚ3 'PtTLL ӢwJv{3p|wb?2kql:Dd8-NA\fBy5TM1ƈQ` h}lmm;!nj ϥϖ zi3Ǽ"3(҈]DV@*"O੧aGv}oPWSQOGtӍHD*l+dLϷ"fb VE;-7 FJ[UUi&8ĨV>d#^qPnYYY ٺ19%%EEE~~~Cw]֥F^w;?cWcVSPoǧzP4r ooo5O(M*@lզ&JI,fL~ D;nhT]|l׀yj4OWJfc^or=%}`\r%VDm#5qsuz<}UUՇ:E"QЪ`T'pȈxZ ڭNASeKf^ZJqIisX5+X.ɗxxz{MGvb ߛ8f׭v,E.5p4&o03UC9.=<3O+hsҗ$-1V$ ">ϯΗLzF6~ޖF|`ԉٲ3 H\v>bFp&jN}QQߠ  Ws6ի'H$ԃXF?NZa+od Z-"|v?BkY9~Eχ[逽3d 4$Ʀt'H(O-KB,{$F4`%l+-=꽶|VAE&2DwK(JJNA$%! 0->NRrH$OOJi[zYq^>yINeJհg~QQG4!hΜ@ y==y=I)a'4,а;tUԊ{S}W-o(;sUv?WAj4_>Oߘg̹JteÞCm M2=T,{%h4TA<.J6Uǘk$M';j!Z`Eʙ-RaS ig4I)lZ"A% C KWr4郟 x0VjHJN>\ [TkסЖɪ3o|B{TWk3B*UKǥ,( UPXzEIWK( {![<+؍eJ ZbT{ ^DJ}k*{m)lDG )e<3._~XQ!fΚuo`ΰe~KJ~F+G Dx:l8 =i?nSINI]hbةКa0*ӥ,5.8|5Elne96> T5{Oϭzoq:gJM>OK;(Qm:GYVGZ"ߒAwp]F BI>5C"\cZF^o0y?vG]ijb}B$Դ/_\R)D[a5NޱNOEEto\R;J^"f͎*GQT؍7[sNGKJt聡+w?>KߞL5M7Ia"퐝N%/)RhE.lcHˇ` e}Czj*økI&'̛+ֻuz?e`qHuzнRRaXo ?nn7E]PനLq}ԈQ`2zS!cWlABBzyf1&yY0E4iT-V$ eK|aį`,Swˇc%-ʒǠ~Gx8? șaC!q?,A||5w"gJ,D====f\ď(ttܴI%>a&ܻPAɝLs֝ݱtrϥOЂ&ɃfG4pry|kA ZСpu[7qClaDX0o>2 u"{Ó _(q{BavLN-kNMHYsot7qCqޝwo9]oOv;#UW~i"sb =I8zWwg(W$퇩=6?c$FъZh\ЫmQuą 棠I@خO7[Xna5}7D/  Da'L{-ݽ?Ҵة SxMY=zj—2TH$9zlۋmib0tުܼ<0`t/]GDnqYfV~Ȓj)LQԮqD]fݢ?OieSP.Pl+~+Խmk2O Y҆@"v^F-F^w8zJ^#GqL4Z>ӷVV;vu=h`T$uzݞn+6ݛR E"D"=Fӕ|BB܋n/J>9sw0Ѥ\&Dn ^.i<\ۆWA w,P.N!UWC'QJ\A+~Ew{n[/6'@x->[qo8uҘ}ZSbqM 係IM$Z@h Jͮ]c#7n%>tɓ'9[>_b \i^p@M.hhXz{ wZZk=mۏkY9 զ{ssg$S\Rz(:*&&#<{~Մ=!pj>B5e}y<oFzp͆[T@ڻ뵿FxBLe~"]1lYD(BQUX(cZO9OK?j)'_n`eSA<˃ w{(3}[>MY~ߡ$&1c:Go4qg***B=*&MNT ^9Hj{9~4PM>9%h2\kY9H"to< e}\$FSgnJmjw[ƮilTGxl'PHp+(h=#'']pKQlZߔ4p)n6SzZ[Id46ܮQ"2K+8E#~}`\rV$vQ-zA/ ъg|uMVװwhc?~ZZ5D,'D"Q\\@4P~ l^JQٟ.o:.+Ӎ`m+֠n 2 ՜4z V2ad5_ؾSSl>Rf{Ow?nH۞SȥRYƕLNk"@ ?MMj~?:ǎSnUъy͍ 5͔]Hr;r. ;w>Q#wc;5eժO>وK,ntmiΰ6{-+gr+>kY9uk_^*XR{w [/_}ڢF46&LD|~-R^fp(K) ɨwʩ{F@ XZe[)ъ0aa0ǎjjscij~752&￿(11h IqIiʕqBzNaުj/22 ׽M~(5]L+oP (v=0D|7V.?S*zețo}\[zK҅l)PoI'*@ Xڲ撧b@%Z@*l/FBkÆ}Mtt n>=V0,&YFټiӧ>_}אشiS_'{{gV&m  =i?tZ}x6Lfo3EM~ى !ٟܠ -׎^ކ*&`$Z@ ,#EnP+EQ$Z@[ &~>F^`|Ԓ!88U6@,:p jSYv,HbĪxu}ҢUQ^>tHMD0Ϫ‰EhL0֝ݱt.) zs0Ұ+?4OO@ n>⌦D%V$XQUۿ ⒄l2.G@"lIg5EQ?}Fx&4ʪժkP -?c&/cB^SFj"^uMkuSm]~{Fw‡%>j(0sw _6ӧNڅ;#CH F5A Z`(3wϛ'|m\. #"R4::fBWS{;>ιje5;5RKl88N0L^Cݻ205̈9zL}Mzj"|Ϛo~te!w]Љ~1yfϔw9WUߟ'(2`oC@"`+lS\R*3^0еCp]@`%K>}mܵ+ Sx㩿zg##Y~-oiZCHtQOioeLVHߒDjżɝ,UFp? D=qKsۓ.K hE_?:_2C Y^_ E8T)'M4B+4ʢ&@+֢wzEarwyjq^=} HMTU66VqE$c<5NVlm?kY9GN_~M@ ъEQ@7>յw>NN}];ruӫg/^z{ 0RtX WЊ e}̭q85eqpBYШ2yƗk1t>L?XѮV"(׻'k0fBa{ n6]~>i~-@ V4/('':98;;88E{prvurr>z}|8:me,?ŲhcהalV(*^ qsmFY h5'r,UCHW:b)xiQzih0lnRO6P(Jv@ bK6?XcǢCHT[[ba\:\aEEd߾={trrrvqիӳTPVݼ_6f/ X`vﴴF{K%Ww(?Ø9m|w(pWFYs&J-%7ʦMcq)2Zqήt:E=wcN3R)L}/fF(ۿP !CC2e\.VIgg޽]_x{Ыo޽z8@GQv\x)QW^yeW8hƼ"xblܙ5:t`]ߍ=eǼ"LJNA<346٠Fu= hNNΕ+WZ^EQLwWv*k;4YLͲj8av~ǙzhOÆ \Oozk_?[&G0҇CF p{q7mC ъ"ߝ8qZ0&\]][ R'IW>'8I}~WuO===/رF{J>Ь;)" IDATk\9MpB7 ժ3wΏcrIUNlttLIL8!;;xڴݢ]]]E#H{2(U]mZC*Զ 6蹈)j c-,R^upI'z^xzl ! 
EE:9 2Ă2tEzOyTWk=/ ۠]Ÿ6of*3*zU[&%DX<{(F1@ahtVujl^:CoB=IZfFj"aHaPQQEE|uGޑ 19p+*MA>oP3qwYYYgJ\`U0-3lW@hX%EQu=&2vHf4&')tRFe5LUU'驉 S:PU4 WC'QgYgmԠ%GPTUB +i#uu_W8胿5Ӗ0Wn:{oyhm}yK5@ X$nݺ1R;#FzV(@YEM zUcZ'r;^ÍfZ2NG0;MXeeVmQb'Oj54:4R;uD;Qe+ƽQ}`\ckzXZJ8SrFCED#@ D+v ER}Nkw>A-4x+Eݒ/?pW7V1R/B<#pPxiv7bo'9ao^}ifFmV4(k4g̘&"FPoI׊ شZ&ZJDdz/s*zo thY)EQz[#$>>d +='э#//?Ϝ_n{]q@ D+U+FEQ+ e^n۷y|>I./\~Q$Vֳ3|ɤzo5CMP)|_'{{gG4/Z8JM9%/-6<:/\0brCn7p#lݼe(Zuh>*ꈏ vmab BQs˦@,zF6.%ʐ]%ƳRU;ͨQͿ-꽙x>EQ0KJ7ߥKٟfbkN@ D+vH(Ǝ+ݵ͛RiıcǺ *)w>TcUqEBщ ,l{MŲeg:Fwh:S?؀Q&,*ʼ i e%KA0:4s` QG K#:Q(C$e;;wjQūl:DNZaXJ|R+,T덑o{N}ЏI#f\s5yޖ(2l n"&KJÎA>@@ ZhEn<=g (7/yla޽0ɱcjo'OdCqŪ~HtEPlKad\%,3g:cg2ct样xGtHQڴe6j3ysŠP α5 ðDm橿/Wӥ3Pb7QBrg`W+Gju;fK `q2._=Sf08Qہmob\}@ ZkĘQ1޳gO3B#t}x"0D"A:SʹconoToWxcNQ'}7_Fw׮Q:9%u0(|x6Hܰ!Đb?X Q&!V:vڷS);&:PyhgՐ^]õmepY.8,Ҍ˙:EQv&2 E_[wL1py|vz:K;@ iE.䚼Έ#x-eb=:q~󝜜߿k ZYGSF;6V=w8?Lh8eR#[0o>jܪ Xu$FjcG0=pogd8N٠"h(ܲ>LZ)EXe6}a 8V wqO/(Bh [+N13M hgG+b%pLM]Ǧct);Y Jtuu[VTurʿoDK-l*ʒQj",=g6v7Ve`>/hc ;dÆO8RCH[jϕ7K*IFnM~gdXR 6DX^ dAEO_T75'je\UUի-m-@ V^1I 51;0S+VUUoڼ њ"< r320L#(KZ$Nbiu]j]niiK&fXij5pWP\b8w|9s03l~?;f ͗}Ӭg.;WtgktƈvÞ zmuGm{'0b0x4,]~vBCC9j+ Qylk .[Mzbt~GׂJrҕlh]Mrs^w'<_ )77o #IAcQ'].G?%rٟHv!+K,ԉYQzG߿cNDtJCرsf֟1X.HeEA0.]oMlתVUk?SP4j"G,4UDQ(9+^t$hBy3ft 9qD6§QQUpͷ|qI3Z@7V$A~vJ~Y9gϞqjf6M|GuNN?vVNNHB-p[jQO)_ڠ\*h +XBXpRs8}C0,.1/O[L^OLL$?| ئ]O5_)GQ03udN&N~MC?0 "YnMtEdiiȓjٿܒDYKB5…7xYNΩ09i S"wGF;W9@VeF"4hPhz^laݻo¢xD /Trh4El^pkbՎp fXsUvYgW˥Q}^fau]_\R*U⣚lQÛ.ְϾ&]SR&M.ߖ5]gMZzWzo~%S/K mNﻬX8Dcxܗ0d~3Ėqqv̏K_ȭ6ҭ=6Tw%ˉ4~'N'>(vh7hVT5'aq$eDt'< ʿO 7o 6lRv2?2{9j>~4@{j.Al+Ochs xgEib#oCD7 sek٩hS6 -5sO)klձtcQ&_U[dEqRU!jr w 2*Z-?={VZZ - t(dKR; oCV&"Z{{9%~-f<ڡFJ9$"٭kv izd<})"mNXhV+ܦ(jCa8m\ :u&щLY5qDaU~]oޘ4o3? 5+2**zCtھ/:]vk(ap|hD^Wd]$ZSϹGCb;4oiϹ+1CCB:[pf#dڨ+ZXc-Oʂ[mccc8[,Z6/XxﱱV~FsƽNjgfh-z6}*ݚJr݉\ͦ X`66;z/J7\+ c,UND  OUt4+;ge. 
ڜ7 @VDV89Z{TmDpѩE|%UkLCL`%OC16M5y76}8 _w}eX{>hiSCubk"RZR cBӚ\nj_;O_` e>x#j4d61hs PϨTIv+H@eX)oϞ<~:mVE_ȶ&F1}T%X53Jx9|͛ N:qBZZڤI,utZ9D7ӥܒ}wLGX˃4."[BC[Mjejgm|qI%x6  GŽKo\g=uݺ)[Yˍ|e&GiPQyAdkFNNTTdҹvc^" jt(nsA;mRQ\I-}O~^zqح[c;/]'lylk(*WݙiA"[Gd뻣Q(޻܀tt=jmYyl4o{۟lvСoڜcXjg#:H)+TZQڭ/8c9RnɔnlSh(ZmMlߤ+5sɱw%wժ5Q$"Q5-:u>4DJ7G?j{;nKNݼ˧btyM .53_yG^l<Ɩ؟yO91+;ݕ̟:65lRme赵{]1+J9mc-ىMM q1& #(?\wߋ +7HDlƀG3(*zn h?6|ӦM1 ۜ#{?ʶ&lYefI4.JK^ ~->Mͯ!̌W!ࠂ"8YUMh;9$-A]eff ttbkl/w5wtU[pq$ft2 7eCdEp>+VkuEOY n?I|bpbfѪ{thȞ{Ng𭉏=6oMQ8'77/::СC9o<4eu(ݚxw{ t( @A9_~I~'YܻEqn8@VPD6KL{- ȊPY66+*phƌӷGqܹZyI&^FD^) Eߧmb0jk]dE%92#}E'ng̓toz[Tn5EqG]ڔcâ˒SRWl-]jmJ Ξt25kӫWO5 VjΙjun;8-@VfJ6m>hǾ8.??k7.e-ư)kO5a#ۖ[ng0Ě\((F6jgEEkN2L'pNݤbvLS_4]3L"B#'mkL+SܤLQPDSW kjMz}X?Hqٔ4~tno>"}1+1K+BJoB]Y͊4\8]{H#"_h|}}}ߛlkmߞ[kHp,9Z]l3,^,5hT0]K W:1єZT8U8mȳbEEeuEO~v-u޽5t?o)-\a">t< 5@?L)qQQ?,)Є/x< =KMҟHIէ<:btZ9WŪvit](bjiΑ~ikmڮ\>jq-R;YʊU5PFTdEETW'zz6m37.\NA/ۜ=幍lƈvl԰{8ݱ{ehruS5zoW&QUR"jC"Ѐ-7ݞU6P)<#nwc ضcW\RZD zGfUh(zoW,ڽ=(C;%c Rhݖxc&U?e*\Z\Rj~pP@~= IDAT)p2*'=>@YT뽽mK xɓ5 7rŕXͻED!`pW'[vaT&9/v>FS,asVwܙJDs}vTZ X،,f}wy-@RFUjK_Gd;fZ 'CF6{]j>RÉxr'cMj$ J~ۊ=D7.4)W6k)k\ MEQlcK׮o9pQO]]+VU~E{*ʢךq!",X kl\VJpn#5iG<3gTݏ(>[D_uADA0N=uWGaSNDC?|$%}w<8GEq[$𠴅(.d%#v7@ڳS;NE D qq׮=G6$\5iȨޮ*߯N3O##KFm邵rK~R'Z)wMatR-ݩ,.Z|"tl[ bebUeB{rB WWv sW^{c\3N~ bAQ6#{k+2(fȏ$"4U>B:c3]Ck#0+ۛe3Xe(֊Ve]v^ɉlbxxx/?p4݇}>gu Qxʳ#%xD jrmԚl(Z\R:u2v9l$beIWG\"9XW 裏D 6qqܹcu?M!,Eڽte, V/善/ڸYct7ɦ9,9~DԾWxG Vʛ:}QxgbE1;Fal9ۛ_kG|EEE9_“d oZ'j>\ʾ2<):JDeշ=TU:,1GS~rx{ 9i r_]nL)9PD{aĐ=k>޵h}Pm1rdО:Qm\l{|`F4Gգqji3$[5BY~~[&X#`Aާ!AE w'ʚh5^>hH3^wm͝/^3 _Y%W^7֣Ɋ2d {j6.}~R.qM7/7鉥OW]^7lon~Sиld[f6H,F)kk E,Y}CzRv/rvxDn[,(;ywC\X{;F :g_^x>7OmاK؆CDQݺm(r68Նj7.?IT 6rKƞKR-JƬ|J7?y~˰o*r>Y͢c.]ξ^z:ZՐy_hC*dEWSO*giTr"Y(]Z>NxyDOFo\zVFCcͲKFf Yu9u?w]s_6ʑMUNt{#?'fB;{pW]oذMqIt"kl@mg"VwO`QmG5-ձy&jQQѱ[Lh| Qּ̔~6( K.hVj]c)i+# RqSR vs+sĐ)c Y9+cwIhHpFѣ:+fP_}؟9"} ͬF5V%B(9s%K*m +JCź=?܋7 p۸t|u7&Ɍ3Z@7ABK%|Cc͓6NZ[yt]ww?ls,)w'1fDͤF\$F@}̊Vb~W|IҸ\kuEkV~8˗/W3&.ℯP; QyR-[2h:[!:mA0Zj"X38օ[5󧎓$ܝ(FWQ `lN<٧Oxl얛wpE|"I+ҽ Jޝ;wN>e4<۪)œ'Emț{sMKՠAKvݛ6PxFib-:7ųʵ,xh<~1Ŕ[iZ "/54񾘞F,/e 6mauW"5[Ws> {(c'd!pA;(II#  NVWT+VEѣTD qQќ<1[I쀪܅:JKkw s )fUh(zoWFLHpЬqgiSe7Qm؝SV6aN(ոzzӑN7GF0=9[hϋQ/be]R\R@-]`_<ݦIw#.ڟ[UٰT^Œ!_#?9wEbz~=NSR?kzOI8"v~r2v'}EQTe'A Y A0s7[o.:աC:t¸LeU[Wt":i*&j&MZz5ސu%%Lqu4QP%Ie׆|~ytvfVx{i C?|$%=IHްGʼns!G<}CN-EjjDYiNEF2o֓ys >Y+:-~{jQQQ*?Qşl$"i[ ~eҵ}JQ+X,88ϐ\w_}W3H{o2ԫ׮/ڴ? 
@}ˊ`T_c{G*dj /R_tP͛##!!]FFFWr߸ts̡bȾ0ss)O0P#U׮9&ucs=+2W e_ۤ]LmqH`׾?(ܕj\ PW(AfMɷ# #Bqj|OhI&7NqT&j4mNO`#4BCCtY۳Cnղ݉XckqIi<g>p_TVh(gkذ:޼ySZ$,.):B0_WogV 4"2EA0 kF^xF ^6U5_۷o'3ܵs۾}ޏa*ka'؎%"[eJEQt[}ʣ(~,V3EQt9{ {?ҝ9q:NiiNg ?2$8hʘឍ͊ZߘW:8qPc-}®9(;)@ϊDOW0+$z(`iIg*Z( yfQF:lSiiiEuEiPh*SaC{mR>Q?Ν;ǯ ~<1"Zzqku#.[,E$YG*̥̊ܒ_eDt{-:و[BD~->2,u"%"Mom^oK{ژiWvϯYFTdEK]R=u 3b>OV(c^uE"99NAj@[kZWNj릩Xb|KLGR>Q#vf5q۶={~{kląV`\rAŷ#Qj^MaEBQ n6lʷY%1Ujߪ}S/":~=h1+R^c ljz ]2w\"JLLcBBBcضXmMU3Wv^|'1O4f(8|Qgff1Z\ Gi>D]JlbpxGTyƿm*v5MW{xЃ KJmXkVSd#mk"fabSt}jJ0v{U'{7=zZkѼybb{K4.Z$M>^D#k+Nw2&gD.~GA'24%J!ꊵ$jB7eчJRn~Ove.3Ez[,\L1.()$o ^n UNt>,Lz(x9=r] v3}ƌg͜^|'ti׶jo~YD4x kNEk h%gZ̛6惷,VdEmDWJ MmT%61Onɼ ߚHWeaRvj='XxqӦM*<҉>Dd4 a033S=v>Px8DܬA~ o8A,vFu5] |usKԮ~ccEfi[5iqi*XfVK{c6ԅ^9*䎣ߥCV1lش QZ Ϯa{333񂗝P+L|t"ލ"3Y+r͚<}᥈p HFޙ&&h)"OWސD`?dj9,a5 y(gj^;X1O4V}@D]ۖ'=w2yB;扒wp7e/e>h/T:ۆ5rĠBBGS3KztJ%ߚ2a81 )+ZYSטQAb^DF״,.-|*텩'v0 دU+N(8VMPEb>&A0Rv %̆1dO![DRfvLE6\]y,4b]Om0S՞L6Qr ap ퟯ((+ +V>4t\v:|uv,lJ^?ۯ( uοj|g]3F+SgJ>yM{h]RedEe@VSmM; ~p^<0ɀ[ܯس>vSF֔y>x(lU̮0=8YEQ:q(.Nn?SrӵJ^*c<( g`0 EҜC,1'gۑ'|MS n|D:HU3DIDATZ;gYtD]ٛVj[vdạڣEKۜ/7|y#'FN4ηv""5-pCa_% ~ ސoR7dXػ(&wlZ> {ȐDDBŦ}\$g {֭ʖD]u ;pgAqmպ%|0cf=gꕗ/_!y֩j"wU0 O5@fa(ڛc(c,"WШVG9|H~~6X$--͞2Cv2 Ca¸^zmܹl㍜^iS5X XGh}YO>YulbY?G͛kמhXhMP>; 92bD|EdEדCy^8S78ct#)NSH&IJ5xpDl얼!+3JS_uiTW$]K^x`fVrmjluEin$]r-$+mH(AW=u)ZtP̎1 :jso)눣O'VlLaT*"kモ/88,^h0~޾Yx>cЂ=OQZfQ8gjz5t:tz߯_r՚Po҆?習/F³( +ڔO'[hHReXtJE'SvLMTl&eSvFէSr)ԙǶJR}eXȖw$"i[f͛_zմiϊlaz YCb չ/}'T>0eZ!CPdEyc ol@GE tGjn.ytYvk^ ȊCYHUؔ V-+U*"JDKVV9,?{d?]r8tdE;ͽ' iRwYZ,nMNVS8A]gδL"Ψ*KTYufT3Wݢ*δ8j_ssSSvl=K'Ԓ:մ1V1Оul'![ZC5mo즓O#%v-bPnMNچjhWUi0ݒv5? )u%ʃ:Q8;Zf(`Ք*LEQAS,(Hy:#(/rLE'[=DSҮ0jܙS'fC$\ eiMqrC]-Nq#- ~bMPgL}}LiKD$6coAYQU9¡,UP8r,CM8W- $6abDD@Vj!kӐCcȊf հ-ˀid<:OVAA0s77Ub,ar zȊ6eMBc,1,zޔrklŒ ֩bIY<,m#ꐆ))j,qG2 +Vv25~#͊N;.P?Q4 2%Ce":BP~;&1 +BeUEI>ǫifB"Rn4몊ȊP?cY.eS0FHɐF@VU*_"Ɇ{OŲ2A}ا{`_. **Data Wrangler** - http://vis.stanford.edu/wrangler/ - http://vis.stanford.edu/papers/wrangler - http://pypi.python.org/pypi/DataWrangler A web application for exploring, transforming and cleaning tabular data, in a similar vein to Google Refine but with a strong focus on usability, and more capabilities for transforming tables, including folding/unfolding (similar to R reshape's melt/cast) and cross-tabulation. Currently a client-side only web application, not available for download. There is also a Python library providing data transformation functions as found in the GUI. The research paper has a good discussion of data transformation and quality issues, esp. w.r.t. tool usability. **Pentaho Data Integration (a.k.a. Kettle)** - http://kettle.pentaho.com/ - http://wiki.pentaho.com/display/EAI/Getting+Started - http://wiki.pentaho.com/display/EAI/Pentaho+Data+Integration+Steps **SnapLogic** - http://www.snaplogic.com - https://www.snaplogic.org/Documentation/3.2/ComponentRef/index.html A data integration platform, where ETL components are web resources with a RESTful interface. Standard components for transforms like filter, join and sort. 
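The folding/unfolding capability noted above for Data Wrangler has a close analogue in petl itself, via the ``melt()`` and ``recast()`` functions documented under the reshaping section below. A minimal sketch, using a made-up table::

    import petl as etl

    table1 = [['id', 'gender', 'age'],
              [1, 'F', 12],
              [2, 'M', 17]]

    # fold: one row per (key, variable, value) triple
    table2 = etl.melt(table1, key='id')

    # unfold: back to one column per variable
    table3 = etl.recast(table2, key='id')
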
**Talend** - http://www.talend.com **Jaspersoft ETL** - http://www.jaspersoft.com/jasperetl **CloverETL** - http://www.cloveretl.com/ **Apatar** - http://apatar.com/ **Jitterbit** - http://www.jitterbit.com/ **Scriptella** - http://scriptella.javaforge.com/ **Kapow Katalyst** - http://kapowsoftware.com/products/kapow-katalyst-platform/index.php - http://kapowsoftware.com/products/kapow-katalyst-platform/extraction-browser.php - http://kapowsoftware.com/products/kapow-katalyst-platform/transformation-normalization.php **Flat File Checker (FlaFi)** - http://www.flat-file.net/ **Orange** - http://orange.biolab.si/ **North Concepts Data Pipeline** - http://northconcepts.com/data-pipeline/ **SAS Clinical Data Integration** - http://www.sas.com/industry/pharma/cdi/index.html **R Reshape Package** - http://had.co.nz/reshape/ **TableFu** - http://propublica.github.com/table-fu/ **python-tablefu** - https://github.com/eyeseast/python-tablefu **pygrametl (Python package)** - http://www.pygrametl.org/ - http://people.cs.aau.dk/~chr/pygrametl/pygrametl.html - http://dbtr.cs.aau.dk/DBPublications/DBTR-25.pdf **etlpy (Python package)** - http://sourceforge.net/projects/etlpy/ - http://etlpy.svn.sourceforge.net/viewvc/etlpy/source/samples/ Looks abandoned since 2009, but there is some code. **OpenETL** - https://launchpad.net/openetl - http://bazaar.launchpad.net/~openerp-commiter/openetl/OpenETL/files/head:/lib/openetl/component/transform/ **Data River** - http://www.datariver.it/ **Ruffus** - http://www.ruffus.org.uk/ **PyF** - http://pyfproject.org/ **PyDTA** - http://presbrey.mit.edu/PyDTA **Google Fusion Tables** - http://www.google.com/fusiontables/Home/ **pivottable (Python package)** - http://pypi.python.org/pypi/pivottable/0.8 **PrettyTable (Python package)** - http://pypi.python.org/pypi/PrettyTable **PyTables (Python package)** - http://www.pytables.org/ **plyr** - http://plyr.had.co.nz/ **Tablib** - https://github.com/jazzband/tablib - https://tablib.readthedocs.io Tablib is an MIT Licensed format-agnostic tabular dataset library, written in Python. It allows you to import, export, and manipulate tabular data sets. Advanced features include segregation, dynamic columns, tags & filtering, and seamless format import & export. **PowerShell** - http://technet.microsoft.com/en-us/library/ee176874.aspx - Import-Csv - http://technet.microsoft.com/en-us/library/ee176955.aspx - Select-Object - http://technet.microsoft.com/en-us/library/ee176968.aspx - Sort-Object - http://technet.microsoft.com/en-us/library/ee176864.aspx - Group-Object **SwiftRiver** - http://ushahidi.com/products/swiftriver-platform **Data Science Toolkit** - http://www.datasciencetoolkit.org/about **IncPy** - http://www.stanford.edu/~pgbovine/incpy.html Doesn't have any ETL functionality, but possibly (enormously) relevant to exploratory development of a transformation pipeline, because you could avoid having to rerun the whole pipeline every time you add a new step. 
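The incremental-recomputation idea behind IncPy is also partly addressed within petl: tables are lazy, and an intermediate step can be wrapped with ``cache()`` (documented under the materialising utilities below) so that rows are remembered on first iteration and later experiments do not recompute the whole upstream pipeline. A minimal sketch — the source file and field name here are made up::

    import petl as etl

    table1 = etl.fromcsv('example.csv')   # imagine an expensive source
    cached = etl.cache(table1)            # rows are remembered on first pass
    # steps added later during exploration iterate over the cached rows
    # rather than re-reading and re-parsing the CSV
    table2 = etl.convert(cached, 'bar', int)
    table3 = etl.selectgt(table2, 'bar', 0)
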
**Articles, Blogs, Other** - http://metadeveloper.blogspot.com/2008/02/iron-python-dsl-for-etl.html - http://www.cs.uoi.gr/~pvassil/publications/2009_IJDWM/IJDWM_2009.pdf - http://web.tagus.ist.utl.pt/~helena.galhardas/ajax.html - http://stackoverflow.com/questions/1321396/what-are-the-required-functionnalities-of-etl-frameworks - http://stackoverflow.com/questions/3762199/etl-using-python - http://www.jonathanlevin.co.uk/2008/03/open-source-etl-tools-vs-commerical-etl.html - http://www.quora.com/ETL/Why-should-I-use-an-existing-ETL-vs-writing-my-own-in-Python-for-my-data-warehouse-needs - http://synful.us/archives/41/the-poor-mans-etl-python - http://www.gossamer-threads.com/lists/python/python/418041?do=post_view_threaded#418041 - http://code.activestate.com/lists/python-list/592134/ - http://fuzzytolerance.info/code/open-source-etl-tools/ - http://www.protocolostomy.com/2009/12/28/codekata-4-data-munging/ - http://www.hanselman.com/blog/ParsingCSVsAndPoorMansWebLogAnalysisWithPowerShell.aspx - nice example of a data transformation problem, done in PowerShell - http://www.datascience.co.nz/blog/2011/04/01/the-science-of-data-munging/ - http://wesmckinney.com/blog/?p=8 - on grouping with pandas - http://stackoverflow.com/questions/4341756/data-recognition-parsing-filtering-and-transformation-gui On memoization... - http://wiki.python.org/moin/PythonDecoratorLibrary#Memoize - http://code.activestate.com/recipes/577219-minimalistic-memoization/ - http://ubuntuforums.org/showthread.php?t=850487 petl-1.7.15/docs/transform.rst000066400000000000000000000212461457414240700162630ustar00rootroot00000000000000.. module:: petl.transform Usage - transforming rows and columns ===================================== .. module:: petl.transform.basics .. _transform_basics: Basic transformations --------------------- .. autofunction:: petl.transform.basics.head .. autofunction:: petl.transform.basics.tail .. autofunction:: petl.transform.basics.rowslice .. autofunction:: petl.transform.basics.cut .. autofunction:: petl.transform.basics.cutout .. autofunction:: petl.transform.basics.movefield .. autofunction:: petl.transform.basics.cat .. autofunction:: petl.transform.basics.stack .. autofunction:: petl.transform.basics.skipcomments .. autofunction:: petl.transform.basics.addfield .. autofunction:: petl.transform.basics.addfields .. autofunction:: petl.transform.basics.addcolumn .. autofunction:: petl.transform.basics.addrownumbers .. autofunction:: petl.transform.basics.addfieldusingcontext .. autofunction:: petl.transform.basics.annex .. module:: petl.transform.headers .. _transform_headers: Header manipulations -------------------- .. autofunction:: petl.transform.headers.rename .. autofunction:: petl.transform.headers.setheader .. autofunction:: petl.transform.headers.extendheader .. autofunction:: petl.transform.headers.pushheader .. autofunction:: petl.transform.headers.prefixheader .. autofunction:: petl.transform.headers.suffixheader .. autofunction:: petl.transform.headers.sortheader .. autofunction:: petl.transform.headers.skip .. module:: petl.transform.conversions .. _transform_conversions: Converting values ----------------- .. autofunction:: petl.transform.conversions.convert .. autofunction:: petl.transform.conversions.convertall .. autofunction:: petl.transform.conversions.convertnumbers .. autofunction:: petl.transform.conversions.replace .. autofunction:: petl.transform.conversions.replaceall .. autofunction:: petl.transform.conversions.format .. 
autofunction:: petl.transform.conversions.formatall .. autofunction:: petl.transform.conversions.interpolate .. autofunction:: petl.transform.conversions.interpolateall .. autofunction:: petl.transform.conversions.update .. module:: petl.transform.selects .. _transform_selects: Selecting rows -------------- .. autofunction:: petl.transform.selects.select .. autofunction:: petl.transform.selects.selectop .. autofunction:: petl.transform.selects.selecteq .. autofunction:: petl.transform.selects.selectne .. autofunction:: petl.transform.selects.selectlt .. autofunction:: petl.transform.selects.selectle .. autofunction:: petl.transform.selects.selectgt .. autofunction:: petl.transform.selects.selectge .. autofunction:: petl.transform.selects.selectrangeopen .. autofunction:: petl.transform.selects.selectrangeopenleft .. autofunction:: petl.transform.selects.selectrangeopenright .. autofunction:: petl.transform.selects.selectrangeclosed .. autofunction:: petl.transform.selects.selectcontains .. autofunction:: petl.transform.selects.selectin .. autofunction:: petl.transform.selects.selectnotin .. autofunction:: petl.transform.selects.selectis .. autofunction:: petl.transform.selects.selectisnot .. autofunction:: petl.transform.selects.selectisinstance .. autofunction:: petl.transform.selects.selecttrue .. autofunction:: petl.transform.selects.selectfalse .. autofunction:: petl.transform.selects.selectnone .. autofunction:: petl.transform.selects.selectnotnone .. autofunction:: petl.transform.selects.selectusingcontext .. autofunction:: petl.transform.selects.rowlenselect .. autofunction:: petl.transform.selects.facet .. autofunction:: petl.transform.selects.biselect .. module:: petl.transform.regex .. _transform_regex: Regular expressions ------------------- .. autofunction:: petl.transform.regex.search .. autofunction:: petl.transform.regex.searchcomplement .. autofunction:: petl.transform.regex.sub .. autofunction:: petl.transform.regex.split .. autofunction:: petl.transform.regex.splitdown .. autofunction:: petl.transform.regex.capture .. module:: petl.transform.unpacks .. _transform_unpacks: Unpacking compound values ------------------------- .. autofunction:: petl.transform.unpacks.unpack .. autofunction:: petl.transform.unpacks.unpackdict .. module:: petl.transform.maps .. _transform_maps: Transforming rows ----------------- .. autofunction:: petl.transform.maps.fieldmap .. autofunction:: petl.transform.maps.rowmap .. autofunction:: petl.transform.maps.rowmapmany .. autofunction:: petl.transform.maps.rowgroupmap .. module:: petl.transform.sorts .. _transform_sorts: Sorting ------- .. autofunction:: petl.transform.sorts.sort .. autofunction:: petl.transform.sorts.mergesort .. autofunction:: petl.transform.sorts.issorted .. module:: petl.transform.joins .. _transform_joins: Joins ----- .. autofunction:: petl.transform.joins.join .. autofunction:: petl.transform.joins.leftjoin .. autofunction:: petl.transform.joins.lookupjoin .. autofunction:: petl.transform.joins.rightjoin .. autofunction:: petl.transform.joins.outerjoin .. autofunction:: petl.transform.joins.crossjoin .. autofunction:: petl.transform.joins.antijoin .. autofunction:: petl.transform.joins.unjoin .. autofunction:: petl.transform.hashjoins.hashjoin .. autofunction:: petl.transform.hashjoins.hashleftjoin .. autofunction:: petl.transform.hashjoins.hashlookupjoin .. autofunction:: petl.transform.hashjoins.hashrightjoin .. autofunction:: petl.transform.hashjoins.hashantijoin .. module:: petl.transform.setops .. 
_transform_setops: Set operations -------------- .. autofunction:: petl.transform.setops.complement .. autofunction:: petl.transform.setops.diff .. autofunction:: petl.transform.setops.recordcomplement .. autofunction:: petl.transform.setops.recorddiff .. autofunction:: petl.transform.setops.intersection .. autofunction:: petl.transform.setops.hashcomplement .. autofunction:: petl.transform.setops.hashintersection .. module:: petl.transform.dedup .. _transform_dedup: Deduplicating rows ------------------ .. autofunction:: petl.transform.dedup.duplicates .. autofunction:: petl.transform.dedup.unique .. autofunction:: petl.transform.dedup.conflicts .. autofunction:: petl.transform.dedup.distinct .. autofunction:: petl.transform.dedup.isunique .. module:: petl.transform.reductions .. _transform_reductions: Reducing rows (aggregation) --------------------------- .. autofunction:: petl.transform.reductions.aggregate .. autofunction:: petl.transform.reductions.rowreduce .. autofunction:: petl.transform.reductions.mergeduplicates .. autofunction:: petl.transform.reductions.merge .. autofunction:: petl.transform.reductions.fold .. autofunction:: petl.transform.reductions.groupcountdistinctvalues .. autofunction:: petl.transform.reductions.groupselectfirst .. autofunction:: petl.transform.reductions.groupselectlast .. autofunction:: petl.transform.reductions.groupselectmin .. autofunction:: petl.transform.reductions.groupselectmax .. module:: petl.transform.reshape .. _transform_reshape: Reshaping tables ---------------- .. autofunction:: petl.transform.reshape.melt .. autofunction:: petl.transform.reshape.recast .. autofunction:: petl.transform.reshape.transpose .. autofunction:: petl.transform.reshape.pivot .. autofunction:: petl.transform.reshape.flatten .. autofunction:: petl.transform.reshape.unflatten .. module:: petl.transform.fills .. _transform_fills: Filling missing values ---------------------- .. autofunction:: petl.transform.fills.filldown .. autofunction:: petl.transform.fills.fillright .. autofunction:: petl.transform.fills.fillleft .. module:: petl.transform.validation .. _transform_validation: Validation ---------- .. autofunction:: petl.transform.validation.validate .. module:: petl.transform.intervals .. _transform_intervals: Intervals (intervaltree) ------------------------ .. note:: The following functions require the package `intervaltree `_ to be installed, e.g.:: $ pip install intervaltree .. autofunction:: petl.transform.intervals.intervaljoin .. autofunction:: petl.transform.intervals.intervalleftjoin .. autofunction:: petl.transform.intervals.intervaljoinvalues .. autofunction:: petl.transform.intervals.intervalantijoin .. autofunction:: petl.transform.intervals.intervallookup .. autofunction:: petl.transform.intervals.intervallookupone .. autofunction:: petl.transform.intervals.intervalrecordlookup .. autofunction:: petl.transform.intervals.intervalrecordlookupone .. autofunction:: petl.transform.intervals.facetintervallookup .. autofunction:: petl.transform.intervals.facetintervallookupone .. autofunction:: petl.transform.intervals.facetintervalrecordlookup .. autofunction:: petl.transform.intervals.facetintervalrecordlookupone .. autofunction:: petl.transform.intervals.intervalsubtract .. autofunction:: petl.transform.intervals.collapsedintervals petl-1.7.15/docs/util.rst000066400000000000000000000057361457414240700152330ustar00rootroot00000000000000.. module:: petl.util Utility functions ================= Basic utilities --------------- .. 
autofunction:: petl.util.base.header .. autofunction:: petl.util.base.fieldnames .. autofunction:: petl.util.base.data .. autofunction:: petl.util.base.values .. autofunction:: petl.util.base.dicts .. autofunction:: petl.util.base.namedtuples .. autofunction:: petl.util.base.records .. autofunction:: petl.util.base.expr .. autofunction:: petl.util.base.rowgroupby .. autofunction:: petl.util.base.empty Visualising tables ------------------ .. autofunction:: petl.util.vis.look .. autofunction:: petl.util.vis.lookall .. autofunction:: petl.util.vis.see .. autofunction:: petl.util.vis.display .. autofunction:: petl.util.vis.displayall Lookup data structures ---------------------- .. autofunction:: petl.util.lookups.lookup .. autofunction:: petl.util.lookups.lookupone .. autofunction:: petl.util.lookups.dictlookup .. autofunction:: petl.util.lookups.dictlookupone .. autofunction:: petl.util.lookups.recordlookup .. autofunction:: petl.util.lookups.recordlookupone Parsing string/text values -------------------------- .. autofunction:: petl.util.parsers.dateparser .. autofunction:: petl.util.parsers.timeparser .. autofunction:: petl.util.parsers.datetimeparser .. autofunction:: petl.util.parsers.boolparser .. autofunction:: petl.util.parsers.numparser Counting -------- .. autofunction:: petl.util.counting.nrows .. autofunction:: petl.util.counting.valuecount .. autofunction:: petl.util.counting.valuecounter .. autofunction:: petl.util.counting.valuecounts .. autofunction:: petl.util.counting.stringpatterncounter .. autofunction:: petl.util.counting.stringpatterns .. autofunction:: petl.util.counting.rowlengths .. autofunction:: petl.util.counting.typecounter .. autofunction:: petl.util.counting.typecounts .. autofunction:: petl.util.counting.parsecounter .. autofunction:: petl.util.counting.parsecounts Timing ------ .. autofunction:: petl.util.timing.progress .. autofunction:: petl.util.timing.log_progress .. autofunction:: petl.util.timing.clock Statistics ---------- .. autofunction:: petl.util.statistics.limits .. autofunction:: petl.util.statistics.stats Materialising tables -------------------- .. autofunction:: petl.util.materialise.columns .. autofunction:: petl.util.materialise.facetcolumns .. autofunction:: petl.util.materialise.listoflists .. autofunction:: petl.util.materialise.listoftuples .. autofunction:: petl.util.materialise.tupleoflists .. autofunction:: petl.util.materialise.tupleoftuples .. autofunction:: petl.util.materialise.cache Randomly generated tables ------------------------- .. autofunction:: petl.util.random.randomtable .. autofunction:: petl.util.random.dummytable Miscellaneous ------------- .. autofunction:: petl.util.misc.typeset .. autofunction:: petl.util.misc.diffheaders .. autofunction:: petl.util.misc.diffvalues .. autofunction:: petl.util.misc.strjoin .. autofunction:: petl.util.misc.nthword .. autofunction:: petl.util.misc.coalesce petl-1.7.15/examples/000077500000000000000000000000001457414240700143775ustar00rootroot00000000000000petl-1.7.15/examples/comparison.py000066400000000000000000000005411457414240700171230ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division import petl as etl table = [['foo', 'bar'], ['a', 1], ['b', None]] # raises exception under Python 3 etl.select(table, 'bar', lambda v: v > 0) # no error under Python 3 etl.selectgt(table, 'bar', 0) # or ... 
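# etl.Comparable wraps a value to give Python-2-style ordering across mixed
# types (None orders before any number), so the comparison below tolerates
# the None in the 'bar' column instead of raising TypeError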
etl.select(table, 'bar', lambda v: v > etl.Comparable(0)) petl-1.7.15/examples/intro.py000066400000000000000000000015231457414240700161050ustar00rootroot00000000000000from __future__ import division, print_function, absolute_import example_data = """foo,bar,baz a,1,3.4 b,2,7.4 c,6,2.2 d,9,8.1 """ with open('example.csv', 'w') as f: f.write(example_data) import petl as etl table1 = etl.fromcsv('example.csv') table2 = etl.convert(table1, 'foo', 'upper') table3 = etl.convert(table2, 'bar', int) table4 = etl.convert(table3, 'baz', float) table5 = etl.addfield(table4, 'quux', lambda row: row.bar * row.baz) table5 table = ( etl .fromcsv('example.csv') .convert('foo', 'upper') .convert('bar', int) .convert('baz', float) .addfield('quux', lambda row: row.bar * row.baz) ) table l = [['foo', 'bar'], ['a', 1], ['b', 2], ['c', 2]] table = etl.wrap(l) table.look() l = [['foo', 'bar'], ['a', 1], ['b', 2], ['c', 2]] table = etl.wrap(l) table etl.config.look_index_header = True table petl-1.7.15/examples/io/000077500000000000000000000000001457414240700150065ustar00rootroot00000000000000petl-1.7.15/examples/io/csv.py000066400000000000000000000011751457414240700161570ustar00rootroot00000000000000from __future__ import division, print_function, absolute_import # fromcsv() ########### import petl as etl import csv # set up a CSV file to demonstrate with table1 = [['foo', 'bar'], ['a', 1], ['b', 2], ['c', 2]] with open('example.csv', 'w') as f: writer = csv.writer(f) writer.writerows(table1) # now demonstrate the use of fromcsv() table2 = etl.fromcsv('example.csv') table2 # tocsv() ######### import petl as etl table1 = [['foo', 'bar'], ['a', 1], ['b', 2], ['c', 2]] etl.tocsv(table1, 'example.csv') # look what it did print(open('example.csv').read()) petl-1.7.15/examples/io/html.py000066400000000000000000000004431457414240700163250ustar00rootroot00000000000000from __future__ import division, print_function, absolute_import # tohtml() ########## import petl as etl table1 = [['foo', 'bar'], ['a', 1], ['b', 2], ['c', 2]] etl.tohtml(table1, 'example.html', caption='example table') print(open('example.html').read()) petl-1.7.15/examples/io/json.py000066400000000000000000000017031457414240700163320ustar00rootroot00000000000000from __future__ import division, print_function, absolute_import # fromjson() ############ import petl as etl data = ''' [{"foo": "a", "bar": 1}, {"foo": "b", "bar": 2}, {"foo": "c", "bar": 2}] ''' with open('example.json', 'w') as f: f.write(data) table1 = etl.fromjson('example.json') table1 # fromdicts() ############# import petl as etl dicts = [{"foo": "a", "bar": 1}, {"foo": "b", "bar": 2}, {"foo": "c", "bar": 2}] table1 = etl.fromdicts(dicts) table1 # tojson() ########## import petl as etl table1 = [['foo', 'bar'], ['a', 1], ['b', 2], ['c', 2]] etl.tojson(table1, 'example.json', sort_keys=True) # check what it did print(open('example.json').read()) # tojsonarrays() ################ import petl as etl table1 = [['foo', 'bar'], ['a', 1], ['b', 2], ['c', 2]] etl.tojsonarrays(table1, 'example.json') # check what it did print(open('example.json').read()) petl-1.7.15/examples/io/numpy.py000066400000000000000000000016631457414240700165360ustar00rootroot00000000000000from __future__ import division, print_function, absolute_import # toarray() ########### import petl as etl table = [('foo', 'bar', 'baz'), ('apples', 1, 2.5), ('oranges', 3, 4.4), ('pears', 7, .1)] a = etl.toarray(table) a # the dtype can be specified as a string a = etl.toarray(table, dtype='a4, i2, f4') a # the dtype can also be 
partially specified a = etl.toarray(table, dtype={'foo': 'a4'}) a # fromarray() ############# import petl as etl import numpy as np a = np.array([('apples', 1, 2.5), ('oranges', 3, 4.4), ('pears', 7, 0.1)], dtype='U8, i4,f4') table = etl.fromarray(a) table # valuestoarray() ################# import petl as etl table = [('foo', 'bar', 'baz'), ('apples', 1, 2.5), ('oranges', 3, 4.4), ('pears', 7, .1)] table = etl.wrap(table) table.values('bar').array() # specify dtype table.values('bar').array(dtype='i4') petl-1.7.15/examples/io/pandas.py000066400000000000000000000010141457414240700166220ustar00rootroot00000000000000from __future__ import division, print_function, absolute_import # todataframe() ############### import petl as etl table = [('foo', 'bar', 'baz'), ('apples', 1, 2.5), ('oranges', 3, 4.4), ('pears', 7, .1)] df = etl.todataframe(table) df # fromdataframe() ################# import petl as etl import pandas as pd records = [('apples', 1, 2.5), ('oranges', 3, 4.4), ('pears', 7, 0.1)] df = pd.DataFrame.from_records(records, columns=('foo', 'bar', 'baz')) table = etl.fromdataframe(df) table petl-1.7.15/examples/io/pickle.py000066400000000000000000000012021457414240700166220ustar00rootroot00000000000000from __future__ import division, print_function, absolute_import # frompickle() ############## import petl as etl import pickle # set up a file to demonstrate with with open('example.p', 'wb') as f: pickle.dump(['foo', 'bar'], f) pickle.dump(['a', 1], f) pickle.dump(['b', 2], f) pickle.dump(['c', 2.5], f) # demonstrate the use of frompickle() table1 = etl.frompickle('example.p') table1 # topickle() ############ import petl as etl table1 = [['foo', 'bar'], ['a', 1], ['b', 2], ['c', 2]] etl.topickle(table1, 'example.p') # look what it did table2 = etl.frompickle('example.p') table2 petl-1.7.15/examples/io/pytables.py000066400000000000000000000050701457414240700172050ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division # fromhdf5() ############ import petl as etl import tables # set up a new hdf5 table to demonstrate with h5file = tables.openFile('example.h5', mode='w', title='Example file') h5file.createGroup('/', 'testgroup', 'Test Group') class FooBar(tables.IsDescription): foo = tables.Int32Col(pos=0) bar = tables.StringCol(6, pos=2) h5table = h5file.createTable('/testgroup', 'testtable', FooBar, 'Test Table') # load some data into the table table1 = (('foo', 'bar'), (1, b'asdfgh'), (2, b'qwerty'), (3, b'zxcvbn')) for row in table1[1:]: for i, f in enumerate(table1[0]): h5table.row[f] = row[i] h5table.row.append() h5file.flush() h5file.close() # # now demonstrate use of fromhdf5 table1 = etl.fromhdf5('example.h5', '/testgroup', 'testtable') table1 # alternatively just specify path to table node table1 = etl.fromhdf5('example.h5', '/testgroup/testtable') # ...or use an existing tables.File object h5file = tables.openFile('example.h5') table1 = etl.fromhdf5(h5file, '/testgroup/testtable') # ...or use an existing tables.Table object h5tbl = h5file.getNode('/testgroup/testtable') table1 = etl.fromhdf5(h5tbl) # use a condition to filter data table2 = etl.fromhdf5(h5tbl, condition='foo < 3') table2 h5file.close() # fromhdf5sorted() ################## import petl as etl import tables # set up a new hdf5 table to demonstrate with h5file = tables.openFile('example.h5', mode='w', title='Test file') h5file.createGroup('/', 'testgroup', 'Test Group') class FooBar(tables.IsDescription): foo = tables.Int32Col(pos=0) bar = 
# -- petl-1.7.15/examples/io/pytables.py --

# -*- coding: utf-8 -*-
from __future__ import absolute_import, print_function, division


# fromhdf5()
############

import petl as etl
import tables
# set up a new hdf5 table to demonstrate with
# (note: the snake_case PyTables 3 API is used here; the deprecated
# camelCase names such as openFile/createTable no longer work on
# current PyTables releases)
h5file = tables.open_file('example.h5', mode='w', title='Example file')
h5file.create_group('/', 'testgroup', 'Test Group')


class FooBar(tables.IsDescription):
    foo = tables.Int32Col(pos=0)
    bar = tables.StringCol(6, pos=2)


h5table = h5file.create_table('/testgroup', 'testtable', FooBar, 'Test Table')
# load some data into the table
table1 = (('foo', 'bar'),
          (1, b'asdfgh'),
          (2, b'qwerty'),
          (3, b'zxcvbn'))
for row in table1[1:]:
    for i, f in enumerate(table1[0]):
        h5table.row[f] = row[i]
    h5table.row.append()
h5file.flush()
h5file.close()
#
# now demonstrate use of fromhdf5
table1 = etl.fromhdf5('example.h5', '/testgroup', 'testtable')
table1
# alternatively just specify path to table node
table1 = etl.fromhdf5('example.h5', '/testgroup/testtable')
# ...or use an existing tables.File object
h5file = tables.open_file('example.h5')
table1 = etl.fromhdf5(h5file, '/testgroup/testtable')
# ...or use an existing tables.Table object
h5tbl = h5file.get_node('/testgroup/testtable')
table1 = etl.fromhdf5(h5tbl)
# use a condition to filter data
table2 = etl.fromhdf5(h5tbl, condition='foo < 3')
table2
h5file.close()


# fromhdf5sorted()
##################

import petl as etl
import tables
# set up a new hdf5 table to demonstrate with
h5file = tables.open_file('example.h5', mode='w', title='Test file')
h5file.create_group('/', 'testgroup', 'Test Group')


class FooBar(tables.IsDescription):
    foo = tables.Int32Col(pos=0)
    bar = tables.StringCol(6, pos=2)


h5table = h5file.create_table('/testgroup', 'testtable', FooBar, 'Test Table')
# load some data into the table
table1 = (('foo', 'bar'),
          (3, b'asdfgh'),
          (2, b'qwerty'),
          (1, b'zxcvbn'))
for row in table1[1:]:
    for i, f in enumerate(table1[0]):
        h5table.row[f] = row[i]
    h5table.row.append()
h5table.cols.foo.create_csindex()  # CS index is required
h5file.flush()
h5file.close()
#
# access the data, sorted by the indexed column
table2 = etl.fromhdf5sorted('example.h5', '/testgroup', 'testtable', sortby='foo')
table2


# tohdf5()
##########

import petl as etl
table1 = (('foo', 'bar'),
          (1, b'asdfgh'),
          (2, b'qwerty'),
          (3, b'zxcvbn'))
etl.tohdf5(table1, 'example.h5', '/testgroup', 'testtable', drop=True,
           create=True, createparents=True)
etl.fromhdf5('example.h5', '/testgroup', 'testtable')

# -- petl-1.7.15/examples/io/sqlite3.py --

from __future__ import division, print_function, absolute_import

import os


# fromsqlite3()
###############

# guard the cleanup so the script also runs on a fresh checkout
if os.path.exists('example.db'):
    os.remove('example.db')
import petl as etl
import sqlite3
# set up a database to demonstrate with
data = [['a', 1],
        ['b', 2],
        ['c', 2.0]]
connection = sqlite3.connect('example.db')
c = connection.cursor()
_ = c.execute('drop table if exists foobar')
_ = c.execute('create table foobar (foo, bar)')
for row in data:
    _ = c.execute('insert into foobar values (?, ?)', row)
connection.commit()
c.close()
# now demonstrate the petl.fromsqlite3 function
table = etl.fromsqlite3('example.db', 'select * from foobar')
table


# tosqlite3()
#############

if os.path.exists('example.db'):
    os.remove('example.db')
import petl as etl
table1 = [['foo', 'bar'],
          ['a', 1],
          ['b', 2],
          ['c', 2]]
_ = etl.tosqlite3(table1, 'example.db', 'foobar', create=True)
# look what it did
table2 = etl.fromsqlite3('example.db', 'select * from foobar')
table2
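# a minimal extra sketch (not part of the original examples): appendsqlite3()
# adds rows to an existing table rather than recreating it; this assumes the
# 'foobar' table created by the tosqlite3() example above.

import petl as etl

extra = [['foo', 'bar'],
         ['d', 3],
         ['e', 4]]
etl.appendsqlite3(extra, 'example.db', 'foobar')
# the table now contains the original rows plus the appended ones
etl.fromsqlite3('example.db', 'select * from foobar')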
# -- petl-1.7.15/examples/io/text.py --

from __future__ import division, print_function, absolute_import


# fromtext()
############

import petl as etl
# setup example file
text = 'a,1\nb,2\nc,2\n'
with open('example.txt', 'w') as f:
    f.write(text)
table1 = etl.fromtext('example.txt')
table1
# post-process, e.g., with capture()
table2 = table1.capture('lines', '(.*),(.*)$', ['foo', 'bar'])
table2


# totext()
##########

import petl as etl
table1 = [['foo', 'bar'],
          ['a', 1],
          ['b', 2],
          ['c', 2]]
prologue = '''{| class="wikitable"
|-
! foo
! bar
'''
template = '''|-
| {foo}
| {bar}
'''
epilogue = '|}'
etl.totext(table1, 'example.txt', template, prologue, epilogue)
# see what we did
print(open('example.txt').read())

# -- petl-1.7.15/examples/io/whoosh.py --

# -*- coding: utf-8 -*-
from __future__ import absolute_import, print_function, division


# fromtextindex()
#################

import petl as etl
import os
# set up an index and load some documents via the Whoosh API
from whoosh.index import create_in
from whoosh.fields import *
schema = Schema(title=TEXT(stored=True), path=ID(stored=True),
                content=TEXT)
dirname = 'example.whoosh'
if not os.path.exists(dirname):
    os.mkdir(dirname)
index = create_in(dirname, schema)
writer = index.writer()
writer.add_document(title=u"First document", path=u"/a",
                    content=u"This is the first document we've added!")
writer.add_document(title=u"Second document", path=u"/b",
                    content=u"The second one is even more interesting!")
writer.commit()
# extract documents as a table
table = etl.fromtextindex(dirname)
table


# totextindex()
###############

import petl as etl
import datetime
import os
# here is the table we want to load into an index
table = (('f0', 'f1', 'f2', 'f3', 'f4'),
         ('AAA', 12, 4.3, True, datetime.datetime.now()),
         ('BBB', 6, 3.4, False, datetime.datetime(1900, 1, 31)),
         ('CCC', 42, 7.8, True, datetime.datetime(2100, 12, 25)))
# define a schema for the index
from whoosh.fields import *
schema = Schema(f0=TEXT(stored=True),
                f1=NUMERIC(int, stored=True),
                f2=NUMERIC(float, stored=True),
                f3=BOOLEAN(stored=True),
                f4=DATETIME(stored=True))
# load index
dirname = 'example.whoosh'
if not os.path.exists(dirname):
    os.mkdir(dirname)
etl.totextindex(table, dirname, schema=schema)


# searchtextindex()
###################

import petl as etl
import os
# set up an index and load some documents via the Whoosh API
from whoosh.index import create_in
from whoosh.fields import *
schema = Schema(title=TEXT(stored=True), path=ID(stored=True),
                content=TEXT)
dirname = 'example.whoosh'
if not os.path.exists(dirname):
    os.mkdir(dirname)
index = create_in('example.whoosh', schema)
writer = index.writer()
writer.add_document(title=u"Oranges", path=u"/a",
                    content=u"This is the first document we've added!")
writer.add_document(title=u"Apples", path=u"/b",
                    content=u"The second document is even more "
                            u"interesting!")
writer.commit()
# demonstrate the use of searchtextindex()
table1 = etl.searchtextindex('example.whoosh', 'oranges')
table1
table2 = etl.searchtextindex('example.whoosh', 'doc*')
table2
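# a minimal extra sketch (not part of the original examples): the index built
# by the examples above is an ordinary Whoosh index, so it can also be queried
# directly with the Whoosh API, independent of petl.

from whoosh.index import open_dir
from whoosh.qparser import QueryParser

ix = open_dir('example.whoosh')
with ix.searcher() as searcher:
    query = QueryParser('content', ix.schema).parse('document')
    for hit in searcher.search(query):
        # stored fields are available on each hit
        print(hit['title'], hit['path'])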
# -- petl-1.7.15/examples/io/xml.py --

from __future__ import division, print_function, absolute_import

import petl as etl
# setup a file to demonstrate with
d = '''<table>
    <tr>
        <td>foo</td><td>bar</td>
    </tr>
    <tr>
        <td>a</td><td>1</td>
    </tr>
    <tr>
        <td>b</td><td>2</td>
    </tr>
    <tr>
        <td>c</td><td>2</td>
    </tr>
</table>'''
with open('example1.xml', 'w') as f:
    f.write(d)
table1 = etl.fromxml('example1.xml', 'tr', 'td')
table1
# if the data values are stored in an attribute, provide the attribute name
# as an extra positional argument
d = '''<table>
    <tr>
        <td v='foo'/><td v='bar'/>
    </tr>
    <tr>
        <td v='a'/><td v='1'/>
    </tr>
    <tr>
        <td v='b'/><td v='2'/>
    </tr>
    <tr>
        <td v='c'/><td v='2'/>
    </tr>
</table>'''
with open('example2.xml', 'w') as f:
    f.write(d)
table2 = etl.fromxml('example2.xml', 'tr', 'td', 'v')
table2
# data values can also be extracted by providing a mapping of field
# names to element paths
d = '''<table>
    <row>
        <foo>a</foo><baz><bar v='1'/><bar v='3'/></baz>
    </row>
    <row>
        <foo>b</foo><baz><bar v='2'/></baz>
    </row>
    <row>
        <foo>c</foo><baz><bar v='2'/></baz>
    </row>
</table>
''' with open('example3.xml', 'w') as f: f.write(d) table3 = etl.fromxml('example3.xml', 'row', {'foo': 'foo', 'bar': ('baz/bar', 'v')}) table3 petl-1.7.15/examples/notes/000077500000000000000000000000001457414240700155275ustar00rootroot00000000000000petl-1.7.15/examples/notes/.gitignore000066400000000000000000000000151457414240700175130ustar00rootroot00000000000000*.csv *.zip* petl-1.7.15/examples/notes/20140424_example.ipynb000066400000000000000000000311751457414240700212140ustar00rootroot00000000000000{ "metadata": { "name": "", "signature": "sha256:3b7c0da066835df2d372d5f622fbdba586eba2c48933fc5c4abb3750e5aaa882" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "code", "collapsed": false, "input": [ "data = \"\"\"type,price,quantity\n", "Apples\n", "Cortland,0.30,24\n", "Red Delicious,0.40,24\n", "Oranges\n", "Navel,0.50,12\n", "\"\"\"" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "code", "collapsed": false, "input": [ "import petl.interactive as etl\n", "from petl.io import StringSource" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "code", "collapsed": false, "input": [ "tbl1 = (etl\n", " .fromcsv(StringSource(data))\n", ")\n", "tbl1" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "
\r\n" ], "metadata": {}, "output_type": "pyout", "prompt_number": 3, "text": [ "+-----------------+---------+------------+\n", "| 'type' | 'price' | 'quantity' |\n", "+=================+=========+============+\n", "| 'Apples' | | |\n", "+-----------------+---------+------------+\n", "| 'Cortland' | '0.30' | '24' |\n", "+-----------------+---------+------------+\n", "| 'Red Delicious' | '0.40' | '24' |\n", "+-----------------+---------+------------+\n", "| 'Oranges' | | |\n", "+-----------------+---------+------------+\n", "| 'Navel' | '0.50' | '12' |\n", "+-----------------+---------+------------+" ] } ], "prompt_number": 3 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Option 1 - using existing petl functions" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def make_room_for_category(row):\n", " if len(row) == 1:\n", " return (row[0], 'X', 'X', 'X')\n", " else:\n", " return (None,) + tuple(row)\n", "\n", "tbl2 = tbl1.rowmap(make_room_for_category, fields=['category', 'type', 'price', 'quantity'])\n", "tbl2" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "
\r\n" ], "metadata": {}, "output_type": "pyout", "prompt_number": 4, "text": [ "+------------+-----------------+---------+------------+\n", "| 'category' | 'type' | 'price' | 'quantity' |\n", "+============+=================+=========+============+\n", "| 'Apples' | 'X' | 'X' | 'X' |\n", "+------------+-----------------+---------+------------+\n", "| None | 'Cortland' | '0.30' | '24' |\n", "+------------+-----------------+---------+------------+\n", "| None | 'Red Delicious' | '0.40' | '24' |\n", "+------------+-----------------+---------+------------+\n", "| 'Oranges' | 'X' | 'X' | 'X' |\n", "+------------+-----------------+---------+------------+\n", "| None | 'Navel' | '0.50' | '12' |\n", "+------------+-----------------+---------+------------+" ] } ], "prompt_number": 4 }, { "cell_type": "code", "collapsed": false, "input": [ "tbl3 = tbl2.filldown()\n", "tbl3" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "
\r\n" ], "metadata": {}, "output_type": "pyout", "prompt_number": 5, "text": [ "+------------+-----------------+---------+------------+\n", "| 'category' | 'type' | 'price' | 'quantity' |\n", "+============+=================+=========+============+\n", "| 'Apples' | 'X' | 'X' | 'X' |\n", "+------------+-----------------+---------+------------+\n", "| 'Apples' | 'Cortland' | '0.30' | '24' |\n", "+------------+-----------------+---------+------------+\n", "| 'Apples' | 'Red Delicious' | '0.40' | '24' |\n", "+------------+-----------------+---------+------------+\n", "| 'Oranges' | 'X' | 'X' | 'X' |\n", "+------------+-----------------+---------+------------+\n", "| 'Oranges' | 'Navel' | '0.50' | '12' |\n", "+------------+-----------------+---------+------------+" ] } ], "prompt_number": 5 }, { "cell_type": "code", "collapsed": false, "input": [ "tbl4 = tbl3.ne('type', 'X')\n", "tbl4" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "
\r\n" ], "metadata": {}, "output_type": "pyout", "prompt_number": 6, "text": [ "+------------+-----------------+---------+------------+\n", "| 'category' | 'type' | 'price' | 'quantity' |\n", "+============+=================+=========+============+\n", "| 'Apples' | 'Cortland' | '0.30' | '24' |\n", "+------------+-----------------+---------+------------+\n", "| 'Apples' | 'Red Delicious' | '0.40' | '24' |\n", "+------------+-----------------+---------+------------+\n", "| 'Oranges' | 'Navel' | '0.50' | '12' |\n", "+------------+-----------------+---------+------------+" ] } ], "prompt_number": 6 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Option 2 - custom transformer" ] }, { "cell_type": "code", "collapsed": false, "input": [ "class CustomTransformer(object):\n", " \n", " def __init__(self, source):\n", " self.source = source\n", " \n", " def __iter__(self):\n", " it = iter(self.source)\n", " \n", " # construct new header\n", " source_fields = it.next()\n", " out_fields = ('category',) + tuple(source_fields)\n", " yield out_fields\n", " \n", " # transform data\n", " current_category = None\n", " for row in it:\n", " if len(row) == 1:\n", " current_category = row[0]\n", " else:\n", " yield (current_category,) + tuple(row)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 7 }, { "cell_type": "code", "collapsed": false, "input": [ "tbl5 = CustomTransformer(tbl1)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 8 }, { "cell_type": "code", "collapsed": false, "input": [ "# just so it formats nicely as HTML in the notebook...\n", "etl.wrap(tbl5)" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "
\r\n" ], "metadata": {}, "output_type": "pyout", "prompt_number": 9, "text": [ "+------------+-----------------+---------+------------+\n", "| 'category' | 'type' | 'price' | 'quantity' |\n", "+============+=================+=========+============+\n", "| 'Apples' | 'Cortland' | '0.30' | '24' |\n", "+------------+-----------------+---------+------------+\n", "| 'Apples' | 'Red Delicious' | '0.40' | '24' |\n", "+------------+-----------------+---------+------------+\n", "| 'Oranges' | 'Navel' | '0.50' | '12' |\n", "+------------+-----------------+---------+------------+" ] } ], "prompt_number": 9 } ], "metadata": {} } ] }petl-1.7.15/examples/notes/20140424_example.py000066400000000000000000000027171457414240700205230ustar00rootroot00000000000000# -*- coding: utf-8 -*- # 3.0 # data = """type,price,quantity Apples Cortland,0.30,24 Red Delicious,0.40,24 Oranges Navel,0.50,12 """ # import petl.interactive as etl from petl.io import StringSource # tbl1 = (etl .fromcsv(StringSource(data)) ) tbl1 # # Option 1 - using existing petl functions # def make_room_for_category(row): if len(row) == 1: return (row[0], 'X', 'X', 'X') else: return (None,) + tuple(row) tbl2 = tbl1.rowmap(make_room_for_category, fields=['category', 'type', 'price', 'quantity']) tbl2 # tbl3 = tbl2.filldown() tbl3 # tbl4 = tbl3.ne('type', 'X') tbl4 # # Option 2 - custom transformer # class CustomTransformer(object): def __init__(self, source): self.source = source def __iter__(self): it = iter(self.source) # construct new header source_fields = it.next() out_fields = ('category',) + tuple(source_fields) yield out_fields # transform data current_category = None for row in it: if len(row) == 1: current_category = row[0] else: yield (current_category,) + tuple(row) # tbl5 = CustomTransformer(tbl1) # # just so it formats nicely as HTML in the notebook... 
etl.wrap(tbl5) petl-1.7.15/examples/notes/20141022_example.ipynb000066400000000000000000000100201457414240700211710ustar00rootroot00000000000000{ "metadata": { "name": "", "signature": "sha256:e33329756d366a4e9d0643f56d7eaab5b311404c9a5bae0c66dc53fc988c46b2" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "code", "collapsed": false, "input": [ "import petl.interactive as etl\n", "etl.__version__" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 1, "text": [ "'0.26'" ] } ], "prompt_number": 1 }, { "cell_type": "code", "collapsed": false, "input": [ "table1 = (('name', 'kids'),\n", " ('John', '1'),\n", " ('Jenny', '2'),\n", " ('James', '2'),\n", " ('Joan', '4'))\n", "\n", "table2 = (('name', 'age'),\n", " ('John', '33'),\n", " ('Jenni', ''),\n", " ('Jomes', '20'),\n", " ('Joan', ''))" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "code", "collapsed": false, "input": [ "from fuzzywuzzy import fuzz" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [ "table3 = (etl\n", " .wrap(table1)\n", " .prefixheader('l_')\n", " .crossjoin(etl.wrap(table2).prefixheader('r_'))\n", " .addfield('fuzz', lambda row: fuzz.partial_ratio(row.l_name, row.r_name))\n", " .selectge('fuzz', 80)\n", ")\n", "table3" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "
\r\n" ], "metadata": {}, "output_type": "pyout", "prompt_number": 4, "text": [ "+----------+----------+----------+---------+--------+\n", "| 'l_name' | 'l_kids' | 'r_name' | 'r_age' | 'fuzz' |\n", "+==========+==========+==========+=========+========+\n", "| 'John' | '1' | 'John' | '33' | 100 |\n", "+----------+----------+----------+---------+--------+\n", "| 'Jenny' | '2' | 'Jenni' | '' | 80 |\n", "+----------+----------+----------+---------+--------+\n", "| 'James' | '2' | 'Jomes' | '20' | 80 |\n", "+----------+----------+----------+---------+--------+\n", "| 'Joan' | '4' | 'Joan' | '' | 100 |\n", "+----------+----------+----------+---------+--------+" ] } ], "prompt_number": 4 }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 4 } ], "metadata": {} } ] }petl-1.7.15/examples/notes/20141110_example.ipynb000066400000000000000000000166751457414240700212150ustar00rootroot00000000000000{ "metadata": { "name": "", "signature": "sha256:81aed14b5c7bdf9461390525cbd56f1b2754be3bd424493d7fcd7b81d0350bdf" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "https://groups.google.com/forum/#!topic/python-etl/XRIJovpb6Qc\n", "\n", "The conversion I'm trying to make is:\n", "- If nb_day is not integer:\n", " - if notes contains 'am':\n", " - split line:\n", " 1 with full days, modifying date_end = previous date_end - 1\n", " 1 with 0.5 day, modifying date_begin = date_end\n", " - if notes contacts 'pm':\n", " - split line:\n", " 1 with 0.5 day, modifying date_end = date_begin\n", " 1 with full days, modifying date_begin = previous date_begin + 1" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import datetime\n", "import petl.interactive as etl\n", "print etl.__version__" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "0.26\n" ] } ], "prompt_number": 1 }, { "cell_type": "code", "collapsed": false, "input": [ "data = \"\"\"user_id,date_begin,date_end,nb_days,notes,projet_id\n", "user1,2014-07-31,2014-08-07,5.5,5 days + am,cp\n", "user2,2014-07-31,2014-08-07,5.5,5 days + pm,cp\n", "user3,2014-07-31,2014-08-06,5,5 days,cp\n", "\"\"\"" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "code", "collapsed": false, "input": [ "day = datetime.timedelta(days=1)\n", "\n", "\n", "def split_partial_days(row):\n", " if isinstance(row.nb_days, float):\n", " # split out partial days into separate row\n", " if 'am' in row.notes:\n", " # full days\n", " yield (row.user_id, \n", " row.date_begin, \n", " row.date_end - day,\n", " int(row.nb_days),\n", " row.notes.split('+')[0].strip(), \n", " row.projet_id)\n", " # partial days\n", " yield (row.user_id, \n", " row.date_end, \n", " row.date_end, \n", " row.nb_days - int(row.nb_days),\n", " row.notes.split('+')[1].strip(), \n", " row.projet_id)\n", " if 'pm' in row.notes:\n", " # partial days\n", " yield (row.user_id, \n", " row.date_begin, \n", " row.date_begin, \n", " row.nb_days - int(row.nb_days),\n", " row.notes.split('+')[1].strip(), \n", " row.projet_id)\n", " # full days\n", " yield (row.user_id, \n", " row.date_begin + day, \n", " row.date_end,\n", " int(row.nb_days),\n", " row.notes.split('+')[0].strip(), \n", " row.projet_id)\n", " else:\n", " # do nothing\n", " yield row\n", "\n", " \n", "tbl = (etl\n", " .fromcsv(etl.StringSource(data))\n", " .convert(('date_begin', 'date_end'), 
etl.dateparser('%Y-%m-%d'))\n", " .convert('nb_days', etl.parsenumber)\n", " .rowmapmany(split_partial_days, fields=['user_id', 'date_begin', 'date_end', 'nb_days', 'notes', 'projet_id'])\n", ")\n", "tbl" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "
\r\n" ], "metadata": {}, "output_type": "pyout", "prompt_number": 3, "text": [ "+-----------+----------------------------+----------------------------+-----------+----------+-------------+\n", "| 'user_id' | 'date_begin' | 'date_end' | 'nb_days' | 'notes' | 'projet_id' |\n", "+===========+============================+============================+===========+==========+=============+\n", "| 'user1' | datetime.date(2014, 7, 31) | datetime.date(2014, 8, 6) | 5 | '5 days' | 'cp' |\n", "+-----------+----------------------------+----------------------------+-----------+----------+-------------+\n", "| 'user1' | datetime.date(2014, 8, 7) | datetime.date(2014, 8, 7) | 0.5 | 'am' | 'cp' |\n", "+-----------+----------------------------+----------------------------+-----------+----------+-------------+\n", "| 'user2' | datetime.date(2014, 7, 31) | datetime.date(2014, 7, 31) | 0.5 | 'pm' | 'cp' |\n", "+-----------+----------------------------+----------------------------+-----------+----------+-------------+\n", "| 'user2' | datetime.date(2014, 8, 1) | datetime.date(2014, 8, 7) | 5 | '5 days' | 'cp' |\n", "+-----------+----------------------------+----------------------------+-----------+----------+-------------+\n", "| 'user3' | datetime.date(2014, 7, 31) | datetime.date(2014, 8, 6) | 5 | '5 days' | 'cp' |\n", "+-----------+----------------------------+----------------------------+-----------+----------+-------------+" ] } ], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 3 } ], "metadata": {} } ] }petl-1.7.15/examples/notes/20150319 resolve conflicts.ipynb000066400000000000000000000252641457414240700231140ustar00rootroot00000000000000{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Resolving conflicts" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is currently no in-built support for resolving conflicts within a table using petl, however this notebook gives an example of a workaround strategy." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "sys.version_info(major=3, minor=4, micro=2, releaselevel='final', serial=0)" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import sys\n", "sys.version_info" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'1.0.6'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import petl as etl\n", "etl.__version__" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n" ], "text/plain": [ "+-----+-----------+---------+------+------------+\n", "| id | name | value | age | master_age |\n", "+=====+===========+=========+======+============+\n", "| '1' | 'Tressa' | '1203' | '42' | '42' |\n", "+-----+-----------+---------+------+------------+\n", "| '2' | 'Phil' | '23997' | None | None |\n", "+-----+-----------+---------+------+------------+\n", "| '3' | 'Darius' | None | '78' | '78' |\n", "+-----+-----------+---------+------+------------+\n", "| '4' | 'Delinda' | '96501' | '64' | '64' |\n", "+-----+-----------+---------+------+------------+\n", "| '5' | 'Adelina' | '96508' | '50' | '50' |\n", "+-----+-----------+---------+------+------------+" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data_master = b\"\"\"id name value age\n", " 1 Tressa 1203 42\n", " 2 Phil 23997 \n", " 3 Darius . 78\n", " 4 Delinda 96501 64\n", " 5 Adelina 96508 50\n", "\"\"\"\n", "tbl_master = (\n", " etl\n", " .fromtext(etl.MemorySource(data_master))\n", " .split('lines', r'\\s+')\n", " .skip(1)\n", " .replaceall('.', None)\n", " .addfield('master_age', lambda row: row.age)\n", ")\n", "tbl_master" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n" ], "text/plain": [ "+-----+----------+--------+------+\n", "| id | name | value | age |\n", "+=====+==========+========+======+\n", "| '2' | 'Phil' | None | '53' |\n", "+-----+----------+--------+------+\n", "| '3' | 'Darius' | '5000' | '76' |\n", "+-----+----------+--------+------+" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data_other = b\"\"\"id name value age\n", " 2 Phil . 53\n", " 3 Darius 5000 76\n", "\"\"\"\n", "tbl_other = (\n", " etl\n", " .fromtext(etl.MemorySource(data_other))\n", " .split('lines', r'\\s+')\n", " .skip(1)\n", " .replaceall('.', None)\n", ")\n", "tbl_other" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n" ], "text/plain": [ "+-----+-----------+---------+------------------------+------------+\n", "| id | name | value | age | master_age |\n", "+=====+===========+=========+========================+============+\n", "| '1' | 'Tressa' | '1203' | '42' | '42' |\n", "+-----+-----------+---------+------------------------+------------+\n", "| '2' | 'Phil' | '23997' | '53' | None |\n", "+-----+-----------+---------+------------------------+------------+\n", "| '3' | 'Darius' | '5000' | Conflict({'76', '78'}) | '78' |\n", "+-----+-----------+---------+------------------------+------------+\n", "| '4' | 'Delinda' | '96501' | '64' | '64' |\n", "+-----+-----------+---------+------------------------+------------+\n", "| '5' | 'Adelina' | '96508' | '50' | '50' |\n", "+-----+-----------+---------+------------------------+------------+" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tbl_merge = etl.merge(tbl_master, tbl_other, key='id')\n", "tbl_merge" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n" ], "text/plain": [ "+-----+-----------+---------+------+\n", "| id | name | value | age |\n", "+=====+===========+=========+======+\n", "| '1' | 'Tressa' | '1203' | '42' |\n", "+-----+-----------+---------+------+\n", "| '2' | 'Phil' | '23997' | '53' |\n", "+-----+-----------+---------+------+\n", "| '3' | 'Darius' | '5000' | '78' |\n", "+-----+-----------+---------+------+\n", "| '4' | 'Delinda' | '96501' | '64' |\n", "+-----+-----------+---------+------+\n", "| '5' | 'Adelina' | '96508' | '50' |\n", "+-----+-----------+---------+------+" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tbl_merge_resolved = (\n", " tbl_merge\n", " .convert('age', lambda v, row: (row.master_age if isinstance(v, etl.Conflict) else v),\n", " pass_row=True)\n", " .cutout('master_age')\n", ")\n", "tbl_merge_resolved" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.4.2" } }, "nbformat": 4, "nbformat_minor": 0 } petl-1.7.15/examples/notes/20150331 split null.ipynb000066400000000000000000000061521457414240700215430ustar00rootroot00000000000000{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "sys.version_info(major=3, minor=4, micro=2, releaselevel='final', serial=0)" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import sys\n", "sys.version_info" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'1.0.6'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import petl as etl\n", "etl.__version__" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "tbl1 = [['foo', 'bar'],\n", " ['a b c', 1],\n", " ['d e f', 2],\n", " [None, 3]]" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n" ], "text/plain": [ "+-----+------+------+------+\n", "| bar | x | y | z |\n", "+=====+======+======+======+\n", "| 1 | 'a' | 'b' | 'c' |\n", "+-----+------+------+------+\n", "| 2 | 'd' | 'e' | 'f' |\n", "+-----+------+------+------+\n", "| 3 | None | None | None |\n", "+-----+------+------+------+" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tbl2 = etl.wrap(tbl1).replace('foo', None, ' ').split('foo', ' ', ['x', 'y', 'z']).replaceall('', None)\n", "tbl2" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.4.2" } }, "nbformat": 4, "nbformat_minor": 0 } petl-1.7.15/examples/notes/case_study_1.ipynb000066400000000000000000001510511457414240700211600ustar00rootroot00000000000000{ "metadata": { "name": "", "signature": "sha256:3bbbcfb0d62b99e5755186306ff0cf10e99bf3c5a78c08cf76b27d3ad4da5512" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "code", "collapsed": false, "input": [ "import sys\n", "sys.version_info" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 1, "text": [ "sys.version_info(major=3, minor=4, micro=2, releaselevel='final', serial=0)" ] } ], "prompt_number": 1 }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "[petl](http://petl.readthedocs.org) Case Study 1 - Comparing Tables" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This case study illustrates the use of the [petl](http://petl.readthedocs.org) package for doing some simple profiling and comparison of data from\n", "two tables.\n" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Introduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The files used in this case study can be downloaded from the following\n", "link:\n", "\n", "* http://aliman.s3.amazonaws.com/petl/petl-case-study-1-files.zip\n", "\n", "Download and unzip the files:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "!wget http://aliman.s3.amazonaws.com/petl/petl-case-study-1-files.zip\n", "!unzip -o petl-case-study-1-files.zip" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "--2015-01-19 17:37:39-- http://aliman.s3.amazonaws.com/petl/petl-case-study-1-files.zip\r\n", "Resolving aliman.s3.amazonaws.com (aliman.s3.amazonaws.com)... 54.231.9.241\r\n", "Connecting to aliman.s3.amazonaws.com (aliman.s3.amazonaws.com)|54.231.9.241|:80... " ] }, { "output_type": "stream", "stream": "stdout", "text": [ "connected.\r\n", "HTTP request sent, awaiting response... 
" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "200 OK\r\n", "Length: 3076773 (2.9M) [application/zip]\r\n", "Saving to: \u2018petl-case-study-1-files.zip\u2019\r\n", "\r\n", "\r", " 0% [ ] 0 --.-K/s " ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\r", " 2% [ ] 75,696 276KB/s " ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\r", " 8% [==> ] 265,496 484KB/s " ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\r", "22% [=======> ] 688,896 838KB/s " ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\r", "50% [==================> ] 1,567,816 1.39MB/s " ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\r", "100%[======================================>] 3,076,773 2.34MB/s in 1.3s \r\n", "\r\n", "2015-01-19 17:37:41 (2.34 MB/s) - \u2018petl-case-study-1-files.zip\u2019 saved [3076773/3076773]\r\n", "\r\n" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "Archive: petl-case-study-1-files.zip\r\n", " inflating: popdata.csv " ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\r\n", " inflating: snpdata.csv " ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\r\n" ] } ], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first file (`snpdata.csv`) contains a list of locations in the\n", "genome of the malaria parasite *P. falciparum*, along with some basic\n", "data about genetic variations found at those locations.\n", "\n", "The second file (`popdata.csv`) is supposed to contain the same list\n", "of genome locations, along with some additional data such as allele\n", "frequencies in different populations.\n", "\n", "The main point for this case study is that the first file\n", "(`snpdata.csv`) contains the canonical list of genome locations, and\n", "the second file (`popdata.csv`) contains some additional data about\n", "the same genome locations and therefore should be consistent with the\n", "first file. We want to check whether this second file is in fact\n", "consistent with the first file." 
] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Preparing the data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Start by importing the petl package:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import petl as etl\n", "etl.__version__" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 3, "text": [ "'1.0.0'" ] } ], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "To save some typing, let ***a*** be the table of data extracted from the\n", "first file (`snpdata.csv`), and let ***b*** be the table of data extracted\n", "from the second file (`popdata.csv`), using the `fromcsv()`\n", "function:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "a = etl.fromtsv('snpdata.csv')\n", "b = etl.fromtsv('popdata.csv')" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 4 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Examine the header from each file:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "a.header()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 5, "text": [ "('Chr',\n", " 'Pos',\n", " 'Ref',\n", " 'Nref',\n", " 'Der',\n", " 'Mut',\n", " 'isTypable',\n", " 'GeneId',\n", " 'GeneAlias',\n", " 'GeneDescr')" ] } ], "prompt_number": 5 }, { "cell_type": "code", "collapsed": false, "input": [ "b.header()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 6, "text": [ "('Chromosome',\n", " 'Coordinates',\n", " 'Ref. Allele',\n", " 'Non-Ref. Allele',\n", " 'Outgroup Allele',\n", " 'Ancestral Allele',\n", " 'Derived Allele',\n", " 'Ref. Aminoacid',\n", " 'Non-Ref. Aminoacid',\n", " 'Private Allele',\n", " 'Private population',\n", " 'maf AFR',\n", " 'maf PNG',\n", " 'maf SEA',\n", " 'daf AFR',\n", " 'daf PNG',\n", " 'daf SEA',\n", " 'nraf AFR',\n", " 'nraf PNG',\n", " 'nraf SEA',\n", " 'Mutation type',\n", " 'Gene',\n", " 'Gene Aliases',\n", " 'Gene Description',\n", " 'Gene Information')" ] } ], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is a common set of 9 fields that is present in both tables, and\n", "we would like focus on comparing these common fields, however\n", "different field names have been used in the two files. To simplify\n", "comparison, use `rename()` to rename some fields in the\n", "second file:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "b_renamed = b.rename({'Chromosome': 'Chr', \n", " 'Coordinates': 'Pos', \n", " 'Ref. Allele': 'Ref', \n", " 'Non-Ref. Allele': 'Nref', \n", " 'Derived Allele': 'Der', \n", " 'Mutation type': 'Mut', \n", " 'Gene': 'GeneId', \n", " 'Gene Aliases': 'GeneAlias', \n", " 'Gene Description': 'GeneDescr'})\n", "b_renamed.header()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 7, "text": [ "('Chr',\n", " 'Pos',\n", " 'Ref',\n", " 'Nref',\n", " 'Outgroup Allele',\n", " 'Ancestral Allele',\n", " 'Der',\n", " 'Ref. Aminoacid',\n", " 'Non-Ref. 
Aminoacid',\n", " 'Private Allele',\n", " 'Private population',\n", " 'maf AFR',\n", " 'maf PNG',\n", " 'maf SEA',\n", " 'daf AFR',\n", " 'daf PNG',\n", " 'daf SEA',\n", " 'nraf AFR',\n", " 'nraf PNG',\n", " 'nraf SEA',\n", " 'Mut',\n", " 'GeneId',\n", " 'GeneAlias',\n", " 'GeneDescr',\n", " 'Gene Information')" ] } ], "prompt_number": 7 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use `cut()` to extract only the fields we're interested in\n", "from both tables:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "common_fields = ['Chr', 'Pos', 'Ref', 'Nref', 'Der', 'Mut', 'GeneId', 'GeneAlias', 'GeneDescr']\n", "a_common = a.cut(common_fields)\n", "b_common = b_renamed.cut(common_fields)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Inspect the data:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "a_common" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 9, "text": [ "+--------+---------+-----+------+-----+-----+------------+-------------+----------------------------------------------------+\n", "| Chr | Pos | Ref | Nref | Der | Mut | GeneId | GeneAlias | GeneDescr |\n", "+========+=========+=====+======+=====+=====+============+=============+====================================================+\n", "| 'MAL1' | '91099' | 'G' | 'A' | '-' | 'S' | 'PFA0095c' | 'MAL1P1.10' | 'rifin' |\n", "+--------+---------+-----+------+-----+-----+------------+-------------+----------------------------------------------------+\n", "| 'MAL1' | '91104' | 'A' | 'T' | '-' | 'N' | 'PFA0095c' | 'MAL1P1.10' | 'rifin' |\n", "+--------+---------+-----+------+-----+-----+------------+-------------+----------------------------------------------------+\n", "| 'MAL1' | '93363' | 'T' | 'A' | '-' | 'N' | 'PFA0100c' | 'MAL1P1.11' | 'hypothetical protein, conserved in P. falciparum' |\n", "+--------+---------+-----+------+-----+-----+------------+-------------+----------------------------------------------------+\n", "| 'MAL1' | '93382' | 'T' | 'G' | '-' | 'N' | 'PFA0100c' | 'MAL1P1.11' | 'hypothetical protein, conserved in P. falciparum' |\n", "+--------+---------+-----+------+-----+-----+------------+-------------+----------------------------------------------------+\n", "| 'MAL1' | '93384' | 'G' | 'A' | '-' | 'N' | 'PFA0100c' | 'MAL1P1.11' | 'hypothetical protein, conserved in P. falciparum' |\n", "+--------+---------+-----+------+-----+-----+------------+-------------+----------------------------------------------------+\n", "..." ] } ], "prompt_number": 9 }, { "cell_type": "code", "collapsed": false, "input": [ "b_common" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 10, "text": [ "+--------+---------+-----+------+-----+-------+------------+-----------------+----------------------------------------------------------+\n", "| Chr | Pos | Ref | Nref | Der | Mut | GeneId | GeneAlias | GeneDescr |\n", "+========+=========+=====+======+=====+=======+============+=================+==========================================================+\n", "| 'MAL1' | '91099' | 'G' | 'A' | '-' | 'SYN' | 'PFA0095c' | 'MAL1P1.10,RIF' | 'rifin' |\n", "+--------+---------+-----+------+-----+-------+------------+-----------------+----------------------------------------------------------+\n", "| 'MAL1' | '91104' | 'A' | 'T' | '-' | 'NON' | 'PFA0095c' | 'MAL1P1.10,RIF' | 'rifin' |\n", "+--------+---------+-----+------+-----+-------+------------+-----------------+----------------------------------------------------------+\n", "| 'MAL1' | '93363' | 'T' | 'A' | '-' | 'NON' | 'PFA0100c' | 'MAL1P1.11' | 'Plasmodium exported protein (PHISTa), unknown function' |\n", "+--------+---------+-----+------+-----+-------+------------+-----------------+----------------------------------------------------------+\n", "| 'MAL1' | '93382' | 'T' | 'G' | '-' | 'NON' | 'PFA0100c' | 'MAL1P1.11' | 'Plasmodium exported protein (PHISTa), unknown function' |\n", "+--------+---------+-----+------+-----+-------+------------+-----------------+----------------------------------------------------------+\n", "| 'MAL1' | '93384' | 'G' | 'A' | '-' | 'NON' | 'PFA0100c' | 'MAL1P1.11' | 'Plasmodium exported protein (PHISTa), unknown function' |\n", "+--------+---------+-----+------+-----+-------+------------+-----------------+----------------------------------------------------------+\n", "..." ] } ], "prompt_number": 10 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `fromucsv()` function does not attempt to parse any of the\n", "values from the underlying CSV file, so all values are reported as\n", "strings:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "b_common.display(vrepr=repr)" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
[HTML table rendering, reconstructed as plain text]
Chr | Pos | Ref | Nref | Der | Mut | GeneId | GeneAlias | GeneDescr
'MAL1' | '91099' | 'G' | 'A' | '-' | 'SYN' | 'PFA0095c' | 'MAL1P1.10,RIF' | 'rifin'
'MAL1' | '91104' | 'A' | 'T' | '-' | 'NON' | 'PFA0095c' | 'MAL1P1.10,RIF' | 'rifin'
'MAL1' | '93363' | 'T' | 'A' | '-' | 'NON' | 'PFA0100c' | 'MAL1P1.11' | 'Plasmodium exported protein (PHISTa), unknown function'
'MAL1' | '93382' | 'T' | 'G' | '-' | 'NON' | 'PFA0100c' | 'MAL1P1.11' | 'Plasmodium exported protein (PHISTa), unknown function'
'MAL1' | '93384' | 'G' | 'A' | '-' | 'NON' | 'PFA0100c' | 'MAL1P1.11' | 'Plasmodium exported protein (PHISTa), unknown function'
... (further rows truncated in the display)
" ], "metadata": {}, "output_type": "display_data" } ], "prompt_number": 11 }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, the 'Pos' field should be interpreted as an integer.\n", "\n", "Also, the 'Mut' field has a different representation in the two\n", "tables, which needs to be converted before the data can be compared:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "a_common.valuecounts('Mut')" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n" ], "metadata": {}, "output_type": "pyout", "prompt_number": 12, "text": [ "+-----+-------+----------------------+\n", "| Mut | count | frequency |\n", "+=====+=======+======================+\n", "| 'N' | 71162 | 0.6865804123611875 |\n", "+-----+-------+----------------------+\n", "| 'S' | 31535 | 0.30425386166507473 |\n", "+-----+-------+----------------------+\n", "| '-' | 950 | 0.009165725973737783 |\n", "+-----+-------+----------------------+" ] } ], "prompt_number": 12 }, { "cell_type": "code", "collapsed": false, "input": [ "b_common.valuecounts('Mut')" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n" ], "metadata": {}, "output_type": "pyout", "prompt_number": 13, "text": [ "+-------+-------+---------------------+\n", "| Mut | count | frequency |\n", "+=======+=======+=====================+\n", "| 'NON' | 70880 | 0.6840510336042 |\n", "+-------+-------+---------------------+\n", "| 'SYN' | 32738 | 0.31594896639579995 |\n", "+-------+-------+---------------------+" ] } ], "prompt_number": 13 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use the `convert()` function to convert the type of the 'Pos'\n", "field in both tables and the representation of the 'Mut' field in\n", "table ***b***:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "a_conv = a_common.convert('Pos', int)\n", "b_conv = (\n", " b_common\n", " .convert('Pos', int)\n", " .convert('Mut', {'SYN': 'S', 'NON': 'N'})\n", ")" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 14 }, { "cell_type": "code", "collapsed": false, "input": [ "highlight = 'background-color: yellow'\n", "a_conv.display(caption='a', vrepr=repr, td_styles={'Pos': highlight})" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
[HTML table rendering, reconstructed as plain text; caption: a]
Chr | Pos | Ref | Nref | Der | Mut | GeneId | GeneAlias | GeneDescr
'MAL1' | 91099 | 'G' | 'A' | '-' | 'S' | 'PFA0095c' | 'MAL1P1.10' | 'rifin'
'MAL1' | 91104 | 'A' | 'T' | '-' | 'N' | 'PFA0095c' | 'MAL1P1.10' | 'rifin'
'MAL1' | 93363 | 'T' | 'A' | '-' | 'N' | 'PFA0100c' | 'MAL1P1.11' | 'hypothetical protein, conserved in P. falciparum'
'MAL1' | 93382 | 'T' | 'G' | '-' | 'N' | 'PFA0100c' | 'MAL1P1.11' | 'hypothetical protein, conserved in P. falciparum'
'MAL1' | 93384 | 'G' | 'A' | '-' | 'N' | 'PFA0100c' | 'MAL1P1.11' | 'hypothetical protein, conserved in P. falciparum'
... (further rows truncated in the display)
" ], "metadata": {}, "output_type": "display_data" } ], "prompt_number": 15 }, { "cell_type": "code", "collapsed": false, "input": [ "b_conv.display(caption='b', vrepr=repr, td_styles={'Pos': highlight, 'Mut': highlight})" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
[HTML table rendering, reconstructed as plain text; caption: b]
Chr | Pos | Ref | Nref | Der | Mut | GeneId | GeneAlias | GeneDescr
'MAL1' | 91099 | 'G' | 'A' | '-' | 'S' | 'PFA0095c' | 'MAL1P1.10,RIF' | 'rifin'
'MAL1' | 91104 | 'A' | 'T' | '-' | 'N' | 'PFA0095c' | 'MAL1P1.10,RIF' | 'rifin'
'MAL1' | 93363 | 'T' | 'A' | '-' | 'N' | 'PFA0100c' | 'MAL1P1.11' | 'Plasmodium exported protein (PHISTa), unknown function'
'MAL1' | 93382 | 'T' | 'G' | '-' | 'N' | 'PFA0100c' | 'MAL1P1.11' | 'Plasmodium exported protein (PHISTa), unknown function'
'MAL1' | 93384 | 'G' | 'A' | '-' | 'N' | 'PFA0100c' | 'MAL1P1.11' | 'Plasmodium exported protein (PHISTa), unknown function'
... (further rows truncated in the display)
" ], "metadata": {}, "output_type": "display_data" } ], "prompt_number": 16 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now the tables are ready for comparison." ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Looking for missing or unexpected rows" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because both tables should contain the same list of genome locations,\n", "they should have the same number of rows. Use `nrows()` to\n", "compare:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "a_conv.nrows()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 17, "text": [ "103647" ] } ], "prompt_number": 17 }, { "cell_type": "code", "collapsed": false, "input": [ "b_conv.nrows()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 18, "text": [ "103618" ] } ], "prompt_number": 18 }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is some discrepancy. First investigate by comparing just the\n", "genomic locations, defined by the 'Chr' and 'Pos' fields, using\n", "`complement()`:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "a_locs = a_conv.cut('Chr', 'Pos')\n", "b_locs = b_conv.cut('Chr', 'Pos')\n", "locs_only_in_a = a_locs.complement(b_locs)\n", "locs_only_in_a.nrows()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 19, "text": [ "29" ] } ], "prompt_number": 19 }, { "cell_type": "code", "collapsed": false, "input": [ "locs_only_in_a.displayall(caption='a only')" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
[HTML table rendering, reconstructed as plain text; caption: a only
(Chr/Pos column boundaries inferred from the MAL1-MAL14 chromosome naming)]
Chr | Pos
MAL1 | 216961
MAL10 | 538210
MAL10 | 548779
MAL10 | 1432969
MAL11 | 500289
MAL11 | 1119809
MAL11 | 1278859
MAL12 | 51827
MAL13 | 183727
MAL13 | 398404
MAL13 | 627342
MAL13 | 1216664
MAL13 | 2750149
MAL14 | 1991758
MAL14 | 2297918
MAL14 | 2372268
MAL14 | 2994810
MAL2 | 38577
MAL2 | 64017
MAL4 | 1094258
MAL5 | 1335335
MAL5 | 1338718
MAL7 | 670602
MAL7 | 690509
MAL8 | 489937
MAL9 | 416116
MAL9 | 868677
MAL9 | 1201970
MAL9 | 1475245
\n" ], "metadata": {}, "output_type": "display_data" } ], "prompt_number": 20 }, { "cell_type": "code", "collapsed": false, "input": [ "locs_only_in_b = b_locs.complement(a_locs)\n", "locs_only_in_b.nrows()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 21, "text": [ "0" ] } ], "prompt_number": 21 }, { "cell_type": "markdown", "metadata": {}, "source": [ "So it appears that 29 locations are missing from table ***b***. Export\n", "these missing locations to a CSV file using `toucsv()`:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "locs_only_in_a.tocsv('missing_locations.csv')" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 22 }, { "cell_type": "markdown", "metadata": {}, "source": [ "An alternative method for finding rows in one table where some key\n", "value is not present in another table is to use the `antijoin()`\n", "function:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "locs_only_in_a = a_conv.antijoin(b_conv, key=('Chr', 'Pos'))\n", "locs_only_in_a.nrows()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 23, "text": [ "29" ] } ], "prompt_number": 23 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Finding conflicts" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'd also like to compare the values given in the other fields, to\n", "find any discrepancies between the two tables.\n", "\n", "The simplest way to find conflicts is to `merge()` both tables under a given key:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "ab_merge = etl.merge(a_conv, b_conv, key=('Chr', 'Pos'))\n", "ab_merge.display(caption='ab_merge', \n", " td_styles=lambda v: highlight if isinstance(v, etl.Conflict) else '')" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
ab_merge
Chr  | Pos   | Ref | Nref | Der | Mut | GeneId   | GeneAlias                                | GeneDescr
MAL1 | 91099 | G   | A    | -   | S   | PFA0095c | Conflict({'MAL1P1.10', 'MAL1P1.10,RIF'}) | rifin
MAL1 | 91104 | A   | T    | -   | N   | PFA0095c | Conflict({'MAL1P1.10', 'MAL1P1.10,RIF'}) | rifin
MAL1 | 93363 | T   | A    | -   | N   | PFA0100c | MAL1P1.11 | Conflict({'Plasmodium exported protein (PHISTa), unknown function', 'hypothetical protein, conserved in P. falciparum'})
MAL1 | 93382 | T   | G    | -   | N   | PFA0100c | MAL1P1.11 | Conflict({'Plasmodium exported protein (PHISTa), unknown function', 'hypothetical protein, conserved in P. falciparum'})
MAL1 | 93384 | G   | A    | -   | N   | PFA0100c | MAL1P1.11 | Conflict({'Plasmodium exported protein (PHISTa), unknown function', 'hypothetical protein, conserved in P. falciparum'})
\n", "

...
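To see how `merge()` produces `Conflict` objects without wading through the genome tables, here is a minimal sketch on made-up data (the table contents and field names are illustrative only):

```python
import petl as etl

t1 = [['id', 'descr'], [1, 'foo'], [2, 'bar']]
t2 = [['id', 'descr'], [1, 'foo'], [2, 'baz']]

merged = etl.merge(t1, t2, key='id')
# where the sources agree the value passes through unchanged;
# where they disagree the cell holds a Conflict, e.g. Conflict({'bar', 'baz'})
print(list(merged))
```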

" ], "metadata": {}, "output_type": "display_data" } ], "prompt_number": 24 }, { "cell_type": "markdown", "metadata": {}, "source": [ "From a glance at the conflicts above, it appears there are\n", "discrepancies in the 'GeneAlias' and 'GeneDescr' fields. There may\n", "also be conflicts in other fields, so we need to investigate further.\n", "\n", "Note that the table ***ab_merge*** will contain all rows, not only those containing conflicts. To find only conflicting rows, use `cat()` then `conflicts()`, e.g.:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "ab = etl.cat(a_conv.addfield('source', 'a', index=0), \n", " b_conv.addfield('source', 'b', index=0))\n", "ab_conflicts = ab.conflicts(key=('Chr', 'Pos'), exclude='source')\n", "ab_conflicts.display(10)" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
source | Chr  | Pos   | Ref | Nref | Der | Mut | GeneId   | GeneAlias     | GeneDescr
a      | MAL1 | 91099 | G   | A    | -   | S   | PFA0095c | MAL1P1.10     | rifin
b      | MAL1 | 91099 | G   | A    | -   | S   | PFA0095c | MAL1P1.10,RIF | rifin
a      | MAL1 | 91104 | A   | T    | -   | N   | PFA0095c | MAL1P1.10     | rifin
b      | MAL1 | 91104 | A   | T    | -   | N   | PFA0095c | MAL1P1.10,RIF | rifin
a      | MAL1 | 93363 | T   | A    | -   | N   | PFA0100c | MAL1P1.11     | hypothetical protein, conserved in P. falciparum
b      | MAL1 | 93363 | T   | A    | -   | N   | PFA0100c | MAL1P1.11     | Plasmodium exported protein (PHISTa), unknown function
a      | MAL1 | 93382 | T   | G    | -   | N   | PFA0100c | MAL1P1.11     | hypothetical protein, conserved in P. falciparum
b      | MAL1 | 93382 | T   | G    | -   | N   | PFA0100c | MAL1P1.11     | Plasmodium exported protein (PHISTa), unknown function
a      | MAL1 | 93384 | G   | A    | -   | N   | PFA0100c | MAL1P1.11     | hypothetical protein, conserved in P. falciparum
b      | MAL1 | 93384 | G   | A    | -   | N   | PFA0100c | MAL1P1.11     | Plasmodium exported protein (PHISTa), unknown function
\n", "

...
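The `cat()`/`addfield()`/`conflicts()` pattern above generalises to any pair of tables. A minimal sketch, again on toy data with illustrative field names:

```python
import petl as etl

t1 = [['id', 'descr'], [1, 'foo'], [2, 'bar']]
t2 = [['id', 'descr'], [1, 'foo'], [2, 'baz']]

# tag each row with its source before concatenating
combined = etl.cat(etl.addfield(t1, 'source', 'a', index=0),
                   etl.addfield(t2, 'source', 'b', index=0))

# report rows where any field other than 'source' disagrees within a key group
conf = etl.conflicts(combined, key='id', exclude='source')
print(list(conf))
```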

" ], "metadata": {}, "output_type": "display_data" } ], "prompt_number": 25 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, let's find conflicts in a specific field:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "ab_conflicts_mut = ab.conflicts(key=('Chr', 'Pos'), include='Mut')\n", "ab_conflicts_mut.display(10, caption='Mut conflicts', td_styles={'Mut': highlight})" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Mut conflicts
source | Chr  | Pos    | Ref | Nref | Der | Mut | GeneId   | GeneAlias                     | GeneDescr
a      | MAL1 | 99099  | G   | T    | -   | -   | PFA0110w | MAL1P1.13,Pf155               | ring-infected erythrocyte surface antigen
b      | MAL1 | 99099  | G   | T    | -   | N   | PFA0110w | MAL1P1.13,Pf155,RESA          | ring-infected erythrocyte surface antigen
a      | MAL1 | 99211  | C   | T    | -   | -   | PFA0110w | MAL1P1.13,Pf155               | ring-infected erythrocyte surface antigen
b      | MAL1 | 99211  | C   | T    | -   | N   | PFA0110w | MAL1P1.13,Pf155,RESA          | ring-infected erythrocyte surface antigen
a      | MAL1 | 197903 | C   | A    | A   | S   | PFA0220w | MAL1P1.34b                    | ubiquitin carboxyl-terminal hydrolase, putative
b      | MAL1 | 197903 | C   | A    | A   | N   | PFA0220w | PFA0215w,MAL1P1.34b           | ubiquitin carboxyl-terminal hydrolase, putative
a      | MAL1 | 384429 | C   | T    | -   | N   | PFA0485w | MAL1P2.26                     | dolichol kinase
b      | MAL1 | 384429 | C   | T    | -   | S   | -        | -                             | -
a      | MAL1 | 513268 | A   | G    | -   | N   | PFA0650w | MAL1P3.12,MAL1P3.12a,PFA0655w | surface-associated interspersed gene pseudogene, (SURFIN) pseudogene
b      | MAL1 | 513268 | A   | G    | -   | S   | PFA0650w | MAL1P3.12,PFA0655,MAL1P3.12a,3D7surf1.2,PFA0655w,MAL1P12a | surface-associated interspersed gene (SURFIN), pseudogene
\n", "

...
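Restricting conflict detection to a single field works the same way on any table. A minimal sketch of the `include` argument, on toy data:

```python
import petl as etl

tbl = [['source', 'id', 'mut'],
       ['a', 1, '-'],
       ['b', 1, 'N'],
       ['a', 2, 'S'],
       ['b', 2, 'S']]

# only the 'mut' field is examined for disagreement, so the id 2 rows
# (which differ only in 'source') are not reported
conf_mut = etl.conflicts(tbl, key='id', include='mut')
print(list(conf_mut))  # header row plus the two id 1 rows
```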

" ], "metadata": {}, "output_type": "display_data" } ], "prompt_number": 26 }, { "cell_type": "code", "collapsed": false, "input": [ "ab_conflicts_mut.nrows()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 27, "text": [ "3592" ] } ], "prompt_number": 27 }, { "cell_type": "markdown", "metadata": {}, "source": [ "For more information about the `petl` package see the [petl online documentation](http://petl.readthedocs.org)." ] } ], "metadata": {} } ] }petl-1.7.15/examples/notes/issue_219.ipynb000066400000000000000000000315631457414240700203250ustar00rootroot00000000000000{ "metadata": { "name": "" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Using server-side cursors with PostgreSQL and MySQL " ] }, { "cell_type": "code", "collapsed": false, "input": [ "# see http://pynash.org/2013/03/06/timing-and-profiling.html for setup of profiling magics" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "code", "collapsed": false, "input": [ "import sys\n", "sys.path.insert(0, '../src')\n", "import petl; print petl.VERSION\n", "from petl.fluent import etl\n", "import psycopg2\n", "import MySQLdb" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "0.18-SNAPSHOT\n" ] } ], "prompt_number": 2 }, { "cell_type": "code", "collapsed": false, "input": [ "tbl_dummy_data = etl().dummytable(100000)\n", "tbl_dummy_data.look()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 3, "text": [ "+-------+-----------+----------------------+\n", "| 'foo' | 'bar' | 'baz' |\n", "+=======+===========+======================+\n", "| 31 | 'pears' | 0.12509339133627373 |\n", "+-------+-----------+----------------------+\n", "| 90 | 'oranges' | 0.05715662664829624 |\n", "+-------+-----------+----------------------+\n", "| 12 | 'oranges' | 0.8525934855236975 |\n", "+-------+-----------+----------------------+\n", "| 68 | 'apples' | 0.911131148945329 |\n", "+-------+-----------+----------------------+\n", "| 77 | 'apples' | 0.8115001426786242 |\n", "+-------+-----------+----------------------+\n", "| 94 | 'apples' | 0.6671472950408706 |\n", "+-------+-----------+----------------------+\n", "| 55 | 'apples' | 0.003432210002982883 |\n", "+-------+-----------+----------------------+\n", "| 50 | 'apples' | 0.7744929413714756 |\n", "+-------+-----------+----------------------+\n", "| 82 | 'oranges' | 0.46001316056152297 |\n", "+-------+-----------+----------------------+\n", "| 13 | 'bananas' | 0.9602502583307483 |\n", "+-------+-----------+----------------------+\n" ] } ], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [ "%memit print tbl_dummy_data.nrows()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "100000\n", "peak memory: 51.38 MiB, increment: 0.20 MiB\n" ] } ], "prompt_number": 4 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "PostgreSQL" ] }, { "cell_type": "code", "collapsed": false, "input": [ "psql_connection = psycopg2.connect(host='localhost', dbname='petl', user='petl', password='petl')" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 6 }, { "cell_type": "code", "collapsed": false, "input": [ "cursor = psql_connection.cursor()\n", "cursor.execute('DROP TABLE IF EXISTS 
issue_219;')\n", "cursor.execute('CREATE TABLE issue_219 (foo INTEGER, bar TEXT, baz FLOAT);')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "%memit -r1 tbl_dummy_data.progress(10000).todb(psql_connection, 'issue_219')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stderr", "text": [ "10000 rows in 3.61s (2770 row/s); batch in 3.61s (2770 row/s)\n", "20000 rows in 7.21s (2774 row/s); batch in 3.60s (2778 row/s)" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "30000 rows in 10.96s (2736 row/s); batch in 3.75s (2663 row/s)" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "40000 rows in 14.69s (2723 row/s); batch in 3.72s (2685 row/s)" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "50000 rows in 18.32s (2728 row/s); batch in 3.64s (2748 row/s)" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "60000 rows in 22.04s (2722 row/s); batch in 3.72s (2689 row/s)" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "70000 rows in 25.76s (2717 row/s); batch in 3.72s (2687 row/s)" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "80000 rows in 29.58s (2704 row/s); batch in 3.82s (2617 row/s)" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "90000 rows in 33.41s (2693 row/s); batch in 3.83s (2613 row/s)" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "100000 rows in 37.14s (2692 row/s); batch in 3.73s (2680 row/s)" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "peak memory: 53.32 MiB, increment: 0.01 MiB\n" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "100000 rows in 37.14s (2692 row/s)\n" ] } ], "prompt_number": 7 }, { "cell_type": "code", "collapsed": false, "input": [ "# memory usage using default cursor\n", "%memit print etl.fromdb(psql_connection, 'select * from issue_219 order by foo').look(2)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "+-------+-----------+-------------------+\n", "| 'foo' | 'bar' | 'baz' |\n", "+=======+===========+===================+\n", "| 0 | 'pears' | 0.625346298507174 |\n", "+-------+-----------+-------------------+\n", "| 0 | 'bananas' | 0.191535466509102 |\n", "+-------+-----------+-------------------+\n", "\n", "peak memory: 60.89 MiB, increment: 7.32 MiB\n" ] } ], "prompt_number": 8 }, { "cell_type": "code", "collapsed": false, "input": [ "# memory usage using server-side cursor\n", "%memit print etl.fromdb(lambda: psql_connection.cursor(name='server-side'), 'select * from issue_219 order by foo').look(2)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "+-------+-----------+-------------------+\n", "| 'foo' | 'bar' | 'baz' |\n", "+=======+===========+===================+\n", "| 0 | 'pears' | 0.625346298507174 |\n", "+-------+-----------+-------------------+\n", "| 0 | 'bananas' | 0.191535466509102 |\n", "+-------+-----------+-------------------+\n", "\n", "peak memory: 54.38 MiB, increment: 0.00 MiB\n" ] } ], "prompt_number": 9 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "MySQL" ] }, { "cell_type": "code", "collapsed": false, "input": [ "mysql_connection = MySQLdb.connect(host='127.0.0.1', db='petl', user='petl', passwd='petl')" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 10 }, { 
"cell_type": "code", "collapsed": false, "input": [ "cursor = mysql_connection.cursor()\n", "cursor.execute('SET SQL_MODE=ANSI_QUOTES')\n", "cursor.execute('DROP TABLE IF EXISTS issue_219;')\n", "cursor.execute('CREATE TABLE issue_219 (foo INTEGER, bar TEXT, baz FLOAT);')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 11, "text": [ "0L" ] } ], "prompt_number": 11 }, { "cell_type": "code", "collapsed": false, "input": [ "%memit -r1 tbl_dummy_data.progress(10000).todb(mysql_connection, 'issue_219')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stderr", "text": [ "10000 rows in 2.96s (3373 row/s); batch in 2.96s (3373 row/s)\n", "20000 rows in 6.22s (3214 row/s); batch in 3.26s (3069 row/s)" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "30000 rows in 9.45s (3174 row/s); batch in 3.23s (3097 row/s)" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "40000 rows in 12.50s (3199 row/s); batch in 3.05s (3278 row/s)" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "50000 rows in 15.60s (3205 row/s); batch in 3.10s (3229 row/s)" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "60000 rows in 18.65s (3217 row/s); batch in 3.05s (3280 row/s)" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "70000 rows in 21.90s (3196 row/s); batch in 3.25s (3072 row/s)" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "80000 rows in 24.88s (3215 row/s); batch in 2.98s (3353 row/s)" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "90000 rows in 28.08s (3205 row/s); batch in 3.19s (3130 row/s)" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "100000 rows in 31.24s (3200 row/s); batch in 3.17s (3158 row/s)" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "peak memory: 54.77 MiB, increment: 0.01 MiB\n" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "100000 rows in 31.24s (3200 row/s)\n" ] } ], "prompt_number": 12 }, { "cell_type": "code", "collapsed": false, "input": [ "# memory usage with default cursor\n", "%memit print etl.fromdb(mysql_connection, 'select * from issue_219 order by foo').look(2)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "+-------+-----------+-----------+\n", "| 'foo' | 'bar' | 'baz' |\n", "+=======+===========+===========+\n", "| 0L | 'bananas' | 0.191535 |\n", "+-------+-----------+-----------+\n", "| 0L | 'bananas' | 0.0228774 |\n", "+-------+-----------+-----------+\n", "\n", "peak memory: 79.88 MiB, increment: 25.11 MiB\n" ] } ], "prompt_number": 13 }, { "cell_type": "code", "collapsed": false, "input": [ "# memory usage with server-side cursor\n", "%memit print etl.fromdb(lambda: mysql_connection.cursor(MySQLdb.cursors.SSCursor), 'select * from issue_219 order by foo').look(2)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "+-------+-----------+-----------+\n", "| 'foo' | 'bar' | 'baz' |\n", "+=======+===========+===========+\n", "| 0L | 'bananas' | 0.191535 |\n", "+-------+-----------+-----------+\n", "| 0L | 'bananas' | 0.0228774 |\n", "+-------+-----------+-----------+\n", "\n", "peak memory: 80.00 MiB, increment: 0.09 MiB\n" ] } ], "prompt_number": 15 } ], "metadata": {} } ] 
}petl-1.7.15/examples/notes/issue_219.py000066400000000000000000000036701457414240700176320ustar00rootroot00000000000000# -*- coding: utf-8 -*- # 3.0 # # Using server-side cursors with PostgreSQL and MySQL # # see http://pynash.org/2013/03/06/timing-and-profiling.html for setup of profiling magics # import sys sys.path.insert(0, '../src') import petl; print petl.VERSION from petl.fluent import etl import psycopg2 import MySQLdb # tbl_dummy_data = etl().dummytable(100000) tbl_dummy_data.look() # print tbl_dummy_data.nrows() # # PostgreSQL # psql_connection = psycopg2.connect(host='localhost', dbname='petl', user='petl', password='petl') # cursor = psql_connection.cursor() cursor.execute('DROP TABLE IF EXISTS issue_219;') cursor.execute('CREATE TABLE issue_219 (foo INTEGER, bar TEXT, baz FLOAT);') # tbl_dummy_data.progress(10000).todb(psql_connection, 'issue_219') # # memory usage using default cursor print etl.fromdb(psql_connection, 'select * from issue_219 order by foo').look(2) # # memory usage using server-side cursor print etl.fromdb(lambda: psql_connection.cursor(name='server-side'), 'select * from issue_219 order by foo').look(2) # # MySQL # mysql_connection = MySQLdb.connect(host='127.0.0.1', db='petl', user='petl', passwd='petl') # cursor = mysql_connection.cursor() cursor.execute('SET SQL_MODE=ANSI_QUOTES') cursor.execute('DROP TABLE IF EXISTS issue_219;') cursor.execute('CREATE TABLE issue_219 (foo INTEGER, bar TEXT, baz FLOAT);') # tbl_dummy_data.progress(10000).todb(mysql_connection, 'issue_219') # # memory usage with default cursor print etl.fromdb(mysql_connection, 'select * from issue_219 order by foo').look(2) # # memory usage with server-side cursor print etl.fromdb(lambda: mysql_connection.cursor(MySQLdb.cursors.SSCursor), 'select * from issue_219 order by foo').look(2) petl-1.7.15/examples/notes/issue_256.ipynb000066400000000000000000000134111457414240700203160ustar00rootroot00000000000000{ "metadata": { "name": "", "signature": "sha256:de89820c5fab8cef559e00bba60995491ff1479456a94e72176224881326d1d8" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Notes supporting [issue #256](https://github.com/alimanfoo/petl/issues/256)." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import petl.interactive as etl" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "code", "collapsed": false, "input": [ "t1 = etl.wrap([['foo', 'bar'], [1, 'a'], [2, 'b']])\n", "t1" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "
foo | bar
1   | a
2   | b
\r\n" ], "metadata": {}, "output_type": "pyout", "prompt_number": 2, "text": [ "+-------+-------+\n", "| 'foo' | 'bar' |\n", "+=======+=======+\n", "| 1 | 'a' |\n", "+-------+-------+\n", "| 2 | 'b' |\n", "+-------+-------+" ] } ], "prompt_number": 2 }, { "cell_type": "code", "collapsed": false, "input": [ "t2 = etl.wrap([['foo', 'bar'], [1, 'a'], [2, 'c']])\n", "t2" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "
foo | bar
1   | a
2   | c
\r\n" ], "metadata": {}, "output_type": "pyout", "prompt_number": 3, "text": [ "+-------+-------+\n", "| 'foo' | 'bar' |\n", "+=======+=======+\n", "| 1 | 'a' |\n", "+-------+-------+\n", "| 2 | 'c' |\n", "+-------+-------+" ] } ], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [ "t3 = etl.merge(t1, t2, key='foo')\n", "t3" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "
foo | bar
1   | a
2   | Conflict(['c', 'b'])
\r\n" ], "metadata": {}, "output_type": "pyout", "prompt_number": 5, "text": [ "+-------+----------------------+\n", "| 'foo' | 'bar' |\n", "+=======+======================+\n", "| 1 | 'a' |\n", "+-------+----------------------+\n", "| 2 | Conflict(['c', 'b']) |\n", "+-------+----------------------+" ] } ], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The problem with the above is that you cannot tell from inspecting *t3* alone which conflicting value comes from which source.\n", "\n", "A workaround as suggested by [@pawl](https://github.com/pawl) is to use the [*conflicts()*](http://petl.readthedocs.org/en/latest/#petl.conflicts) function, e.g.: " ] }, { "cell_type": "code", "collapsed": false, "input": [ "t4 = (etl\n", " .cat(\n", " t1.addfield('source', 1),\n", " t2.addfield('source', 2)\n", " )\n", " .conflicts(key='foo', exclude='source')\n", ")\n", "t4" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "\r\n", "
foo | bar | source
2   | b   | 1
2   | c   | 2
\r\n" ], "metadata": {}, "output_type": "pyout", "prompt_number": 9, "text": [ "+-------+-------+----------+\n", "| 'foo' | 'bar' | 'source' |\n", "+=======+=======+==========+\n", "| 2 | 'b' | 1 |\n", "+-------+-------+----------+\n", "| 2 | 'c' | 2 |\n", "+-------+-------+----------+" ] } ], "prompt_number": 9 } ], "metadata": {} } ] }petl-1.7.15/examples/notes/issue_256.py000066400000000000000000000015361457414240700176320ustar00rootroot00000000000000# -*- coding: utf-8 -*- # 3.0 # # Notes supporting [issue #256](https://github.com/alimanfoo/petl/issues/256). # import petl.interactive as etl # t1 = etl.wrap([['foo', 'bar'], [1, 'a'], [2, 'b']]) t1 # t2 = etl.wrap([['foo', 'bar'], [1, 'a'], [2, 'c']]) t2 # t3 = etl.merge(t1, t2, key='foo') t3 # # The problem with the above is that you cannot tell from inspecting *t3* alone which conflicting value comes from which source. # # A workaround as suggested by [@pawl](https://github.com/pawl) is to use the [*conflicts()*](http://petl.readthedocs.org/en/latest/#petl.conflicts) function, e.g.: # t4 = (etl .cat( t1.addfield('source', 1), t2.addfield('source', 2) ) .conflicts(key='foo', exclude='source') ) t4 petl-1.7.15/examples/transform/000077500000000000000000000000001457414240700164125ustar00rootroot00000000000000petl-1.7.15/examples/transform/basics.py000066400000000000000000000103351457414240700202320ustar00rootroot00000000000000from __future__ import division, print_function, absolute_import # cut() ####### import petl as etl table1 = [['foo', 'bar', 'baz'], ['A', 1, 2.7], ['B', 2, 3.4], ['B', 3, 7.8], ['D', 42, 9.0], ['E', 12]] table2 = etl.cut(table1, 'foo', 'baz') table2 # fields can also be specified by index, starting from zero table3 = etl.cut(table1, 0, 2) table3 # field names and indices can be mixed table4 = etl.cut(table1, 'bar', 0) table4 # select a range of fields table5 = etl.cut(table1, *range(0, 2)) table5 # cutout() ########## import petl as etl table1 = [['foo', 'bar', 'baz'], ['A', 1, 2.7], ['B', 2, 3.4], ['B', 3, 7.8], ['D', 42, 9.0], ['E', 12]] table2 = etl.cutout(table1, 'bar') table2 # cat() ####### import petl as etl table1 = [['foo', 'bar'], [1, 'A'], [2, 'B']] table2 = [['bar', 'baz'], ['C', True], ['D', False]] table3 = etl.cat(table1, table2) table3 # can also be used to square up a single table with uneven rows table4 = [['foo', 'bar', 'baz'], ['A', 1, 2], ['B', '2', '3.4'], [u'B', u'3', u'7.8', True], ['D', 'xyz', 9.0], ['E', None]] table5 = etl.cat(table4) table5 # use the header keyword argument to specify a fixed set of fields table6 = [['bar', 'foo'], ['A', 1], ['B', 2]] table7 = etl.cat(table6, header=['A', 'foo', 'B', 'bar', 'C']) table7 # using the header keyword argument with two input tables table8 = [['bar', 'foo'], ['A', 1], ['B', 2]] table9 = [['bar', 'baz'], ['C', True], ['D', False]] table10 = etl.cat(table8, table9, header=['A', 'foo', 'B', 'bar', 'C']) table10 # addfield() ############ import petl as etl table1 = [['foo', 'bar'], ['M', 12], ['F', 34], ['-', 56]] # using a fixed value table2 = etl.addfield(table1, 'baz', 42) table2 # calculating the value table2 = etl.addfield(table1, 'baz', lambda rec: rec['bar'] * 2) table2 # rowslice() ############ import petl as etl table1 = [['foo', 'bar'], ['a', 1], ['b', 2], ['c', 5], ['d', 7], ['f', 42]] table2 = etl.rowslice(table1, 2) table2 table3 = etl.rowslice(table1, 1, 4) table3 table4 = etl.rowslice(table1, 0, 5, 2) table4 # head() ######## import petl as etl table1 = [['foo', 'bar'], ['a', 1], ['b', 2], ['c', 5], ['d', 7], ['f', 42], ['f', 3], 
['h', 90]] table2 = etl.head(table1, 4) table2 # tail() ######## import petl as etl table1 = [['foo', 'bar'], ['a', 1], ['b', 2], ['c', 5], ['d', 7], ['f', 42], ['f', 3], ['h', 90], ['k', 12], ['l', 77], ['q', 2]] table2 = etl.tail(table1, 4) table2 # skipcomments() ################ import petl as etl table1 = [['##aaa', 'bbb', 'ccc'], ['##mmm',], ['#foo', 'bar'], ['##nnn', 1], ['a', 1], ['b', 2]] table2 = etl.skipcomments(table1, '##') table2 # annex() ######### import petl as etl table1 = [['foo', 'bar'], ['A', 9], ['C', 2], ['F', 1]] table2 = [['foo', 'baz'], ['B', 3], ['D', 10]] table3 = etl.annex(table1, table2) table3 # addrownumbers() ################# import petl as etl table1 = [['foo', 'bar'], ['A', 9], ['C', 2], ['F', 1]] table2 = etl.addrownumbers(table1) table2 # addcolumn() ############# import petl as etl table1 = [['foo', 'bar'], ['A', 1], ['B', 2]] col = [True, False] table2 = etl.addcolumn(table1, 'baz', col) table2 # addfieldusingcontext() ######################## import petl as etl table1 = [['foo', 'bar'], ['A', 1], ['B', 4], ['C', 5], ['D', 9]] def upstream(prv, cur, nxt): if prv is None: return None else: return cur.bar - prv.bar def downstream(prv, cur, nxt): if nxt is None: return None else: return nxt.bar - cur.bar table2 = etl.addfieldusingcontext(table1, 'baz', upstream) table3 = etl.addfieldusingcontext(table2, 'quux', downstream) table3 petl-1.7.15/examples/transform/conversions.py000066400000000000000000000033311457414240700213340ustar00rootroot00000000000000from __future__ import division, print_function, absolute_import # convert() ########### import petl as etl table1 = [['foo', 'bar', 'baz'], ['A', '2.4', 12], ['B', '5.7', 34], ['C', '1.2', 56]] # using a built-in function: table2 = etl.convert(table1, 'bar', float) table2 # using a lambda function:: table3 = etl.convert(table1, 'baz', lambda v: v*2) table3 # a method of the data value can also be invoked by passing # the method name table4 = etl.convert(table1, 'foo', 'lower') table4 # arguments to the method invocation can also be given table5 = etl.convert(table1, 'foo', 'replace', 'A', 'AA') table5 # values can also be translated via a dictionary table7 = etl.convert(table1, 'foo', {'A': 'Z', 'B': 'Y'}) table7 # the same conversion can be applied to multiple fields table8 = etl.convert(table1, ('foo', 'bar', 'baz'), str) table8 # multiple conversions can be specified at the same time table9 = etl.convert(table1, {'foo': 'lower', 'bar': float, 'baz': lambda v: v * 2}) table9 # ...or alternatively via a list table10 = etl.convert(table1, ['lower', float, lambda v: v*2]) table10 # conversion can be conditional table11 = etl.convert(table1, 'baz', lambda v: v * 2, where=lambda r: r.foo == 'B') table11 # conversion can access other values from the same row table12 = etl.convert(table1, 'baz', lambda v, row: v * float(row.bar), pass_row=True) table12 # convertnumbers() ################## import petl as etl table1 = [['foo', 'bar', 'baz', 'quux'], ['1', '3.0', '9+3j', 'aaa'], ['2', '1.3', '7+2j', None]] table2 = etl.convertnumbers(table1) table2 petl-1.7.15/examples/transform/dedup.py000066400000000000000000000024541457414240700200720ustar00rootroot00000000000000from __future__ import division, print_function, absolute_import # duplicates() ############## import petl as etl table1 = [['foo', 'bar', 'baz'], ['A', 1, 2.0], ['B', 2, 3.4], ['D', 6, 9.3], ['B', 3, 7.8], ['B', 2, 12.3], ['E', None, 1.3], ['D', 4, 14.5]] table2 = etl.duplicates(table1, 'foo') table2 # compound keys are supported table3 = 
etl.duplicates(table1, key=['foo', 'bar']) table3 # unique() ########## import petl as etl table1 = [['foo', 'bar', 'baz'], ['A', 1, 2], ['B', '2', '3.4'], ['D', 'xyz', 9.0], ['B', u'3', u'7.8'], ['B', '2', 42], ['E', None, None], ['D', 4, 12.3], ['F', 7, 2.3]] table2 = etl.unique(table1, 'foo') table2 # conflicts() ############# import petl as etl table1 = [['foo', 'bar', 'baz'], ['A', 1, 2.7], ['B', 2, None], ['D', 3, 9.4], ['B', None, 7.8], ['E', None], ['D', 3, 12.3], ['A', 2, None]] table2 = etl.conflicts(table1, 'foo') table2 # isunique() ############ import petl as etl table1 = [['foo', 'bar'], ['a', 1], ['b'], ['b', 2], ['c', 3, True]] etl.isunique(table1, 'foo') etl.isunique(table1, 'bar') petl-1.7.15/examples/transform/fills.py000066400000000000000000000021421457414240700200740ustar00rootroot00000000000000from __future__ import division, print_function, absolute_import # filldown() ############ import petl as etl table1 = [['foo', 'bar', 'baz'], [1, 'a', None], [1, None, .23], [1, 'b', None], [2, None, None], [2, None, .56], [2, 'c', None], [None, 'c', .72]] table2 = etl.filldown(table1) table2.lookall() table3 = etl.filldown(table1, 'bar') table3.lookall() table4 = etl.filldown(table1, 'bar', 'baz') table4.lookall() # fillright() ############# import petl as etl table1 = [['foo', 'bar', 'baz'], [1, 'a', None], [1, None, .23], [1, 'b', None], [2, None, None], [2, None, .56], [2, 'c', None], [None, 'c', .72]] table2 = etl.fillright(table1) table2.lookall() # fillleft() ############ import petl as etl table1 = [['foo', 'bar', 'baz'], [1, 'a', None], [1, None, .23], [1, 'b', None], [2, None, None], [2, None, .56], [2, 'c', None], [None, 'c', .72]] table2 = etl.fillleft(table1) table2.lookall() petl-1.7.15/examples/transform/headers.py000066400000000000000000000021471457414240700204030ustar00rootroot00000000000000from __future__ import division, print_function, absolute_import # rename() ########## import petl as etl table1 = [['sex', 'age'], ['m', 12], ['f', 34], ['-', 56]] # rename a single field table2 = etl.rename(table1, 'sex', 'gender') table2 # rename multiple fields by passing a dictionary as the second argument table3 = etl.rename(table1, {'sex': 'gender', 'age': 'age_years'}) table3 # setheader() ############# import petl as etl table1 = [['foo', 'bar'], ['a', 1], ['b', 2]] table2 = etl.setheader(table1, ['foofoo', 'barbar']) table2 # extendheader() ################ import petl as etl table1 = [['foo'], ['a', 1, True], ['b', 2, False]] table2 = etl.extendheader(table1, ['bar', 'baz']) table2 # pushheader() ############## import petl as etl table1 = [['a', 1], ['b', 2]] table2 = etl.pushheader(table1, ['foo', 'bar']) table2 # skip() ######### import petl as etl table1 = [['#aaa', 'bbb', 'ccc'], ['#mmm'], ['foo', 'bar'], ['a', 1], ['b', 2]] table2 = etl.skip(table1, 2) table2 petl-1.7.15/examples/transform/intervals.py000066400000000000000000000072051457414240700207770ustar00rootroot00000000000000from __future__ import division, print_function, absolute_import # intervallookup() ################## import petl as etl table = [['start', 'stop', 'value'], [1, 4, 'foo'], [3, 7, 'bar'], [4, 9, 'baz']] lkp = etl.intervallookup(table, 'start', 'stop') lkp.search(0, 1) lkp.search(1, 2) lkp.search(2, 4) lkp.search(2, 5) lkp.search(9, 14) lkp.search(19, 140) lkp.search(0) lkp.search(1) lkp.search(2) lkp.search(4) lkp.search(5) import petl as etl table = [['start', 'stop', 'value'], [1, 4, 'foo'], [3, 7, 'bar'], [4, 9, 'baz']] lkp = etl.intervallookup(table, 'start', 'stop', 
include_stop=True, value='value') lkp.search(0, 1) lkp.search(1, 2) lkp.search(2, 4) lkp.search(2, 5) lkp.search(9, 14) lkp.search(19, 140) lkp.search(0) lkp.search(1) lkp.search(2) lkp.search(4) lkp.search(5) # intervallookupone() ##################### import petl as etl table = [['start', 'stop', 'value'], [1, 4, 'foo'], [3, 7, 'bar'], [4, 9, 'baz']] lkp = etl.intervallookupone(table, 'start', 'stop', strict=False) lkp.search(0, 1) lkp.search(1, 2) lkp.search(2, 4) lkp.search(2, 5) lkp.search(9, 14) lkp.search(19, 140) lkp.search(0) lkp.search(1) lkp.search(2) lkp.search(4) lkp.search(5) # facetintervallookup() ####################### import petl as etl table = (('type', 'start', 'stop', 'value'), ('apple', 1, 4, 'foo'), ('apple', 3, 7, 'bar'), ('orange', 4, 9, 'baz')) lkp = etl.facetintervallookup(table, key='type', start='start', stop='stop') lkp['apple'].search(1, 2) lkp['apple'].search(2, 4) lkp['apple'].search(2, 5) lkp['orange'].search(2, 5) lkp['orange'].search(9, 14) lkp['orange'].search(19, 140) lkp['apple'].search(1) lkp['apple'].search(2) lkp['apple'].search(4) lkp['apple'].search(5) lkp['orange'].search(5) # intervaljoin() ################ import petl as etl left = [['begin', 'end', 'quux'], [1, 2, 'a'], [2, 4, 'b'], [2, 5, 'c'], [9, 14, 'd'], [1, 1, 'e'], [10, 10, 'f']] right = [['start', 'stop', 'value'], [1, 4, 'foo'], [3, 7, 'bar'], [4, 9, 'baz']] table1 = etl.intervaljoin(left, right, lstart='begin', lstop='end', rstart='start', rstop='stop') table1.lookall() # include stop coordinate in intervals table2 = etl.intervaljoin(left, right, lstart='begin', lstop='end', rstart='start', rstop='stop', include_stop=True) table2.lookall() # with facet key import petl as etl left = (('fruit', 'begin', 'end'), ('apple', 1, 2), ('apple', 2, 4), ('apple', 2, 5), ('orange', 2, 5), ('orange', 9, 14), ('orange', 19, 140), ('apple', 1, 1)) right = (('type', 'start', 'stop', 'value'), ('apple', 1, 4, 'foo'), ('apple', 3, 7, 'bar'), ('orange', 4, 9, 'baz')) table3 = etl.intervaljoin(left, right, lstart='begin', lstop='end', lkey='fruit', rstart='start', rstop='stop', rkey='type') table3.lookall() # intervalleftjoin() #################### import petl as etl left = [['begin', 'end', 'quux'], [1, 2, 'a'], [2, 4, 'b'], [2, 5, 'c'], [9, 14, 'd'], [1, 1, 'e'], [10, 10, 'f']] right = [['start', 'stop', 'value'], [1, 4, 'foo'], [3, 7, 'bar'], [4, 9, 'baz']] table1 = etl.intervalleftjoin(left, right, lstart='begin', lstop='end', rstart='start', rstop='stop') table1.lookall() petl-1.7.15/examples/transform/joins.py000066400000000000000000000066661457414240700201240ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division # join() ######## import petl as etl table1 = [['id', 'colour'], [1, 'blue'], [2, 'red'], [3, 'purple']] table2 = [['id', 'shape'], [1, 'circle'], [3, 'square'], [4, 'ellipse']] table3 = etl.join(table1, table2, key='id') table3 # if no key is given, a natural join is tried table4 = etl.join(table1, table2) table4 # note behaviour if the key is not unique in either or both tables table5 = [['id', 'colour'], [1, 'blue'], [1, 'red'], [2, 'purple']] table6 = [['id', 'shape'], [1, 'circle'], [1, 'square'], [2, 'ellipse']] table7 = etl.join(table5, table6, key='id') table7 # compound keys are supported table8 = [['id', 'time', 'height'], [1, 1, 12.3], [1, 2, 34.5], [2, 1, 56.7]] table9 = [['id', 'time', 'weight'], [1, 2, 4.5], [2, 1, 6.7], [2, 2, 8.9]] table10 = etl.join(table8, table9, key=['id', 'time']) table10 # leftjoin() ############ import petl 
as etl table1 = [['id', 'colour'], [1, 'blue'], [2, 'red'], [3, 'purple']] table2 = [['id', 'shape'], [1, 'circle'], [3, 'square'], [4, 'ellipse']] table3 = etl.leftjoin(table1, table2, key='id') table3 # rightjoin() ############# import petl as etl table1 = [['id', 'colour'], [1, 'blue'], [2, 'red'], [3, 'purple']] table2 = [['id', 'shape'], [1, 'circle'], [3, 'square'], [4, 'ellipse']] table3 = etl.rightjoin(table1, table2, key='id') table3 # outerjoin() ############# import petl as etl table1 = [['id', 'colour'], [1, 'blue'], [2, 'red'], [3, 'purple']] table2 = [['id', 'shape'], [1, 'circle'], [3, 'square'], [4, 'ellipse']] table3 = etl.outerjoin(table1, table2, key='id') table3 # crossjoin() ############# import petl as etl table1 = [['id', 'colour'], [1, 'blue'], [2, 'red']] table2 = [['id', 'shape'], [1, 'circle'], [3, 'square']] table3 = etl.crossjoin(table1, table2) table3 # antijoin() ############ import petl as etl table1 = [['id', 'colour'], [0, 'black'], [1, 'blue'], [2, 'red'], [4, 'yellow'], [5, 'white']] table2 = [['id', 'shape'], [1, 'circle'], [3, 'square']] table3 = etl.antijoin(table1, table2, key='id') table3 # lookupjoin() ############## import petl as etl table1 = [['id', 'color', 'cost'], [1, 'blue', 12], [2, 'red', 8], [3, 'purple', 4]] table2 = [['id', 'shape', 'size'], [1, 'circle', 'big'], [1, 'circle', 'small'], [2, 'square', 'tiny'], [2, 'square', 'big'], [3, 'ellipse', 'small'], [3, 'ellipse', 'tiny']] table3 = etl.lookupjoin(table1, table2, key='id') table3 # unjoin() ########## import petl as etl # join key is present in the table table1 = (('foo', 'bar', 'baz'), ('A', 1, 'apple'), ('B', 1, 'apple'), ('C', 2, 'orange')) table2, table3 = etl.unjoin(table1, 'baz', key='bar') table2 table3 # an integer join key can also be reconstructed table4 = (('foo', 'bar'), ('A', 'apple'), ('B', 'apple'), ('C', 'orange')) table5, table6 = etl.unjoin(table4, 'bar') table5 table6 petl-1.7.15/examples/transform/maps.py000066400000000000000000000040731457414240700177300ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division # fieldmap() ############ import petl as etl from collections import OrderedDict table1 = [['id', 'sex', 'age', 'height', 'weight'], [1, 'male', 16, 1.45, 62.0], [2, 'female', 19, 1.34, 55.4], [3, 'female', 17, 1.78, 74.4], [4, 'male', 21, 1.33, 45.2], [5, '-', 25, 1.65, 51.9]] mappings = OrderedDict() # rename a field mappings['subject_id'] = 'id' # translate a field mappings['gender'] = 'sex', {'male': 'M', 'female': 'F'} # apply a calculation to a field mappings['age_months'] = 'age', lambda v: v * 12 # apply a calculation to a combination of fields mappings['bmi'] = lambda rec: rec['weight'] / rec['height']**2 # transform and inspect the output table2 = etl.fieldmap(table1, mappings) table2 # rowmap() ########## import petl as etl table1 = [['id', 'sex', 'age', 'height', 'weight'], [1, 'male', 16, 1.45, 62.0], [2, 'female', 19, 1.34, 55.4], [3, 'female', 17, 1.78, 74.4], [4, 'male', 21, 1.33, 45.2], [5, '-', 25, 1.65, 51.9]] def rowmapper(row): transmf = {'male': 'M', 'female': 'F'} return [row[0], transmf[row['sex']] if row['sex'] in transmf else None, row.age * 12, row.height / row.weight ** 2] table2 = etl.rowmap(table1, rowmapper, fields=['subject_id', 'gender', 'age_months', 'bmi']) table2 # rowmapmany() ############## import petl as etl table1 = [['id', 'sex', 'age', 'height', 'weight'], [1, 'male', 16, 1.45, 62.0], [2, 'female', 19, 1.34, 55.4], [3, '-', 17, 1.78, 74.4], [4, 'male', 21, 1.33]] def 
rowgenerator(row): transmf = {'male': 'M', 'female': 'F'} yield [row[0], 'gender', transmf[row['sex']] if row['sex'] in transmf else None] yield [row[0], 'age_months', row.age * 12] yield [row[0], 'bmi', row.height / row.weight ** 2] table2 = etl.rowmapmany(table1, rowgenerator, fields=['subject_id', 'variable', 'value']) table2.lookall() petl-1.7.15/examples/transform/reductions.py000066400000000000000000000045341457414240700211510ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division # rowreduce() ############# import petl as etl table1 = [['foo', 'bar'], ['a', 3], ['a', 7], ['b', 2], ['b', 1], ['b', 9], ['c', 4]] def sumbar(key, rows): return [key, sum(row[1] for row in rows)] table2 = etl.rowreduce(table1, key='foo', reducer=sumbar, fields=['foo', 'barsum']) table2 # aggregate() ############# import petl as etl table1 = [['foo', 'bar', 'baz'], ['a', 3, True], ['a', 7, False], ['b', 2, True], ['b', 2, False], ['b', 9, False], ['c', 4, True]] # aggregate whole rows table2 = etl.aggregate(table1, 'foo', len) table2 # aggregate single field table3 = etl.aggregate(table1, 'foo', sum, 'bar') table3 # alternative signature using keyword args table4 = etl.aggregate(table1, key=('foo', 'bar'), aggregation=list, value=('bar', 'baz')) table4 # aggregate multiple fields from collections import OrderedDict import petl as etl aggregation = OrderedDict() aggregation['count'] = len aggregation['minbar'] = 'bar', min aggregation['maxbar'] = 'bar', max aggregation['sumbar'] = 'bar', sum # default aggregation function is list aggregation['listbar'] = 'bar' aggregation['listbarbaz'] = ('bar', 'baz'), list aggregation['bars'] = 'bar', etl.strjoin(', ') table5 = etl.aggregate(table1, 'foo', aggregation) table5 # mergeduplicates() ################### import petl as etl table1 = [['foo', 'bar', 'baz'], ['A', 1, 2.7], ['B', 2, None], ['D', 3, 9.4], ['B', None, 7.8], ['E', None, 42.], ['D', 3, 12.3], ['A', 2, None]] table2 = etl.mergeduplicates(table1, 'foo') table2 # merge() ######### import petl as etl table1 = [['foo', 'bar', 'baz'], [1, 'A', True], [2, 'B', None], [4, 'C', True]] table2 = [['bar', 'baz', 'quux'], ['A', True, 42.0], ['B', False, 79.3], ['C', False, 12.4]] table3 = etl.merge(table1, table2, key='bar') table3 # fold() ######## import petl as etl table1 = [['id', 'count'], [1, 3], [1, 5], [2, 4], [2, 8]] import operator table2 = etl.fold(table1, 'id', operator.add, 'count', presorted=True) table2 petl-1.7.15/examples/transform/regex.py000066400000000000000000000032061457414240700200770ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division # capture() ############ import petl as etl table1 = [['id', 'variable', 'value'], ['1', 'A1', '12'], ['2', 'A2', '15'], ['3', 'B1', '18'], ['4', 'C12', '19']] table2 = etl.capture(table1, 'variable', '(\\w)(\\d+)', ['treat', 'time']) table2 # using the include_original argument table3 = etl.capture(table1, 'variable', '(\\w)(\\d+)', ['treat', 'time'], include_original=True) table3 # split() ######### import petl as etl table1 = [['id', 'variable', 'value'], ['1', 'parad1', '12'], ['2', 'parad2', '15'], ['3', 'tempd1', '18'], ['4', 'tempd2', '19']] table2 = etl.split(table1, 'variable', 'd', ['variable', 'day']) table2 # search() ########## import petl as etl table1 = [['foo', 'bar', 'baz'], ['orange', 12, 'oranges are nice fruit'], ['mango', 42, 'I like them'], ['banana', 74, 'lovely too'], ['cucumber', 41, 'better than mango']] # search any field table2 = etl.search(table1, '.g.') 
table2 # search a specific field table3 = etl.search(table1, 'foo', '.g.') table3 # searchcomplement() #################### import petl as etl table1 = [['foo', 'bar', 'baz'], ['orange', 12, 'oranges are nice fruit'], ['mango', 42, 'I like them'], ['banana', 74, 'lovely too'], ['cucumber', 41, 'better than mango']] # search any field table2 = etl.searchcomplement(table1, '.g.') table2 # search a specific field table3 = etl.searchcomplement(table1, 'foo', '.g.') table3 petl-1.7.15/examples/transform/reshape.py000066400000000000000000000074031457414240700204170ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division # melt() ######## import petl as etl table1 = [['id', 'gender', 'age'], [1, 'F', 12], [2, 'M', 17], [3, 'M', 16]] table2 = etl.melt(table1, 'id') table2.lookall() # compound keys are supported table3 = [['id', 'time', 'height', 'weight'], [1, 11, 66.4, 12.2], [2, 16, 53.2, 17.3], [3, 12, 34.5, 9.4]] table4 = etl.melt(table3, key=['id', 'time']) table4.lookall() # a subset of variable fields can be selected table5 = etl.melt(table3, key=['id', 'time'], variables=['height']) table5.lookall() # recast() ########## import petl as etl table1 = [['id', 'variable', 'value'], [3, 'age', 16], [1, 'gender', 'F'], [2, 'gender', 'M'], [2, 'age', 17], [1, 'age', 12], [3, 'gender', 'M']] table2 = etl.recast(table1) table2 # specifying variable and value fields table3 = [['id', 'vars', 'vals'], [3, 'age', 16], [1, 'gender', 'F'], [2, 'gender', 'M'], [2, 'age', 17], [1, 'age', 12], [3, 'gender', 'M']] table4 = etl.recast(table3, variablefield='vars', valuefield='vals') table4 # if there are multiple values for each key/variable pair, and no # reducers function is provided, then all values will be listed table6 = [['id', 'time', 'variable', 'value'], [1, 11, 'weight', 66.4], [1, 14, 'weight', 55.2], [2, 12, 'weight', 53.2], [2, 16, 'weight', 43.3], [3, 12, 'weight', 34.5], [3, 17, 'weight', 49.4]] table7 = etl.recast(table6, key='id') table7 # multiple values can be reduced via an aggregation function def mean(values): return float(sum(values)) / len(values) table8 = etl.recast(table6, key='id', reducers={'weight': mean}) table8 # missing values are padded with whatever is provided via the # missing keyword argument (None by default) table9 = [['id', 'variable', 'value'], [1, 'gender', 'F'], [2, 'age', 17], [1, 'age', 12], [3, 'gender', 'M']] table10 = etl.recast(table9, key='id') table10 # transpose() ############# import petl as etl table1 = [['id', 'colour'], [1, 'blue'], [2, 'red'], [3, 'purple'], [5, 'yellow'], [7, 'orange']] table2 = etl.transpose(table1) table2 # pivot() ######### import petl as etl table1 = [['region', 'gender', 'style', 'units'], ['east', 'boy', 'tee', 12], ['east', 'boy', 'golf', 14], ['east', 'boy', 'fancy', 7], ['east', 'girl', 'tee', 3], ['east', 'girl', 'golf', 8], ['east', 'girl', 'fancy', 18], ['west', 'boy', 'tee', 12], ['west', 'boy', 'golf', 15], ['west', 'boy', 'fancy', 8], ['west', 'girl', 'tee', 6], ['west', 'girl', 'golf', 16], ['west', 'girl', 'fancy', 1]] table2 = etl.pivot(table1, 'region', 'gender', 'units', sum) table2 table3 = etl.pivot(table1, 'region', 'style', 'units', sum) table3 table4 = etl.pivot(table1, 'gender', 'style', 'units', sum) table4 # flatten() ########### import petl as etl table1 = [['foo', 'bar', 'baz'], ['A', 1, True], ['C', 7, False], ['B', 2, False], ['C', 9, True]] list(etl.flatten(table1)) # unflatten() ############# import petl as etl a = ['A', 1, True, 'C', 7, False, 'B', 2, False, 'C', 
9] table1 = etl.unflatten(a, 3) table1 # a table and field name can also be provided as arguments table2 = [['lines'], ['A'], [1], [True], ['C'], [7], [False], ['B'], [2], [False], ['C'], [9]] table3 = etl.unflatten(table2, 'lines', 3) table3 petl-1.7.15/examples/transform/selects.py000066400000000000000000000033061457414240700204300ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division # select() ########## import petl as etl table1 = [['foo', 'bar', 'baz'], ['a', 4, 9.3], ['a', 2, 88.2], ['b', 1, 23.3], ['c', 8, 42.0], ['d', 7, 100.9], ['c', 2]] # the second positional argument can be a function accepting # a row table2 = etl.select(table1, lambda rec: rec.foo == 'a' and rec.baz > 88.1) table2 # the second positional argument can also be an expression # string, which will be converted to a function using petl.expr() table3 = etl.select(table1, "{foo} == 'a' and {baz} > 88.1") table3 # the condition can also be applied to a single field table4 = etl.select(table1, 'foo', lambda v: v == 'a') table4 # selectre() ############ import petl as etl table1 = [['foo', 'bar', 'baz'], ['aa', 4, 9.3], ['aaa', 2, 88.2], ['b', 1, 23.3], ['ccc', 8, 42.0], ['bb', 7, 100.9], ['c', 2]] table2 = etl.selectre(table1, 'foo', '[ab]{2}') table2 # selectusingcontext() ###################### import petl as etl table1 = [['foo', 'bar'], ['A', 1], ['B', 4], ['C', 5], ['D', 9]] def query(prv, cur, nxt): return ((prv is not None and (cur.bar - prv.bar) < 2) or (nxt is not None and (nxt.bar - cur.bar) < 2)) table2 = etl.selectusingcontext(table1, query) table2 # facet() ######### import petl as etl table1 = [['foo', 'bar', 'baz'], ['a', 4, 9.3], ['a', 2, 88.2], ['b', 1, 23.3], ['c', 8, 42.0], ['d', 7, 100.9], ['c', 2]] foo = etl.facet(table1, 'foo') sorted(foo.keys()) foo['a'] foo['c'] petl-1.7.15/examples/transform/setops.py000066400000000000000000000035201457414240700203010ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division # complement() ############## import petl as etl a = [['foo', 'bar', 'baz'], ['A', 1, True], ['C', 7, False], ['B', 2, False], ['C', 9, True]] b = [['x', 'y', 'z'], ['B', 2, False], ['A', 9, False], ['B', 3, True], ['C', 9, True]] aminusb = etl.complement(a, b) aminusb bminusa = etl.complement(b, a) bminusa # recordcomplement() #################### import petl as etl a = [['foo', 'bar', 'baz'], ['A', 1, True], ['C', 7, False], ['B', 2, False], ['C', 9, True]] b = [['bar', 'foo', 'baz'], [2, 'B', False], [9, 'A', False], [3, 'B', True], [9, 'C', True]] aminusb = etl.recordcomplement(a, b) aminusb bminusa = etl.recordcomplement(b, a) bminusa # diff() ######## import petl as etl a = [['foo', 'bar', 'baz'], ['A', 1, True], ['C', 7, False], ['B', 2, False], ['C', 9, True]] b = [['x', 'y', 'z'], ['B', 2, False], ['A', 9, False], ['B', 3, True], ['C', 9, True]] added, subtracted = etl.diff(a, b) # rows in b not in a added # rows in a not in b subtracted # recorddiff() ############## import petl as etl a = [['foo', 'bar', 'baz'], ['A', 1, True], ['C', 7, False], ['B', 2, False], ['C', 9, True]] b = [['bar', 'foo', 'baz'], [2, 'B', False], [9, 'A', False], [3, 'B', True], [9, 'C', True]] added, subtracted = etl.recorddiff(a, b) added subtracted # intersection() ################ import petl as etl table1 = [['foo', 'bar', 'baz'], ['A', 1, True], ['C', 7, False], ['B', 2, False], ['C', 9, True]] table2 = [['x', 'y', 'z'], ['B', 2, False], ['A', 9, False], ['B', 3, True], ['C', 9, True]] table3 = etl.intersection(table1, table2) 
table3 petl-1.7.15/examples/transform/sorts.py000066400000000000000000000021321457414240700201340ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division # sort() ######## import petl as etl table1 = [['foo', 'bar'], ['C', 2], ['A', 9], ['A', 6], ['F', 1], ['D', 10]] table2 = etl.sort(table1, 'foo') table2 # sorting by compound key is supported table3 = etl.sort(table1, key=['foo', 'bar']) table3 # if no key is specified, the default is a lexical sort table4 = etl.sort(table1) table4 # mergesort() ############# import petl as etl table1 = [['foo', 'bar'], ['A', 9], ['C', 2], ['D', 10], ['A', 6], ['F', 1]] table2 = [['foo', 'bar'], ['B', 3], ['D', 10], ['A', 10], ['F', 4]] table3 = etl.mergesort(table1, table2, key='foo') table3.lookall() # issorted() ############ import petl as etl table1 = [['foo', 'bar', 'baz'], ['a', 1, True], ['b', 3, True], ['b', 2]] etl.issorted(table1, key='foo') etl.issorted(table1, key='bar') etl.issorted(table1, key='foo', strict=True) etl.issorted(table1, key='foo', reverse=True) petl-1.7.15/examples/transform/unpacks.py000066400000000000000000000010151457414240700204250ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division # unpack() ########## import petl as etl table1 = [['foo', 'bar'], [1, ['a', 'b']], [2, ['c', 'd']], [3, ['e', 'f']]] table2 = etl.unpack(table1, 'bar', ['baz', 'quux']) table2 # unpackdict() ############## import petl as etl table1 = [['foo', 'bar'], [1, {'baz': 'a', 'quux': 'b'}], [2, {'baz': 'c', 'quux': 'd'}], [3, {'baz': 'e', 'quux': 'f'}]] table2 = etl.unpackdict(table1, 'bar') table2 petl-1.7.15/examples/transform/validation.py000066400000000000000000000016021457414240700211150ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division # validate() ############ import petl as etl # define some validation constraints header = ('foo', 'bar', 'baz') constraints = [ dict(name='foo_int', field='foo', test=int), dict(name='bar_date', field='bar', test=etl.dateparser('%Y-%m-%d')), dict(name='baz_enum', field='baz', assertion=lambda v: v in ['Y', 'N']), dict(name='not_none', assertion=lambda row: None not in row) ] # now validate a table table = (('foo', 'bar', 'bazzz'), (1, '2000-01-01', 'Y'), ('x', '2010-10-10', 'N'), (2, '2000/01/01', 'Y'), (3, '2015-12-12', 'x'), (4, None, 'N'), ('y', '1999-99-99', 'z'), (6, '2000-01-01'), (7, '2001-02-02', 'N', True)) problems = etl.validate(table, constraints=constraints, header=header) problems.lookall() petl-1.7.15/examples/util/000077500000000000000000000000001457414240700153545ustar00rootroot00000000000000petl-1.7.15/examples/util/base.py000066400000000000000000000033771457414240700166520ustar00rootroot00000000000000from __future__ import division, print_function, absolute_import, \ unicode_literals # values() ########## import petl as etl table1 = [['foo', 'bar'], ['a', True], ['b'], ['b', True], ['c', False]] foo = etl.values(table1, 'foo') foo list(foo) bar = etl.values(table1, 'bar') bar list(bar) # values from multiple fields table2 = [['foo', 'bar', 'baz'], [1, 'a', True], [2, 'bb', True], [3, 'd', False]] foobaz = etl.values(table2, 'foo', 'baz') foobaz list(foobaz) # header() ########## import petl as etl table = [['foo', 'bar'], ['a', 1], ['b', 2]] etl.header(table) # fieldnames() ############## import petl as etl table = [['foo', 'bar'], ['a', 1], ['b', 2]] etl.fieldnames(table) etl.header(table) # data() ######## import petl as etl table = [['foo', 'bar'], 
['a', 1], ['b', 2]] d = etl.data(table) list(d) # dicts() ######### import petl as etl table = [['foo', 'bar'], ['a', 1], ['b', 2]] d = etl.dicts(table) d list(d) # namedtuples() ############### import petl as etl table = [['foo', 'bar'], ['a', 1], ['b', 2]] d = etl.namedtuples(table) d list(d) # records() ############### import petl as etl table = [['foo', 'bar'], ['a', 1], ['b', 2]] d = etl.records(table) d list(d) # rowgroupby() ############## import petl as etl table1 = [['foo', 'bar', 'baz'], ['a', 1, True], ['b', 3, True], ['b', 2]] # group entire rows for key, group in etl.rowgroupby(table1, 'foo'): print(key, list(group)) # group specific values for key, group in etl.rowgroupby(table1, 'foo', 'bar'): print(key, list(group)) # empty() ######### import petl as etl table = ( etl .empty() .addcolumn('foo', ['A', 'B']) .addcolumn('bar', [1, 2]) ) table petl-1.7.15/examples/util/counting.py000066400000000000000000000052071457414240700175600ustar00rootroot00000000000000from __future__ import division, print_function, absolute_import # nrows() ######### import petl as etl table = [['foo', 'bar'], ['a', 1], ['b', 2]] etl.nrows(table) # valuecount() ############## import petl as etl table = [['foo', 'bar'], ['a', 1], ['b', 2], ['b', 7]] etl.valuecount(table, 'foo', 'b') # valuecounter() ################ import petl as etl table = [['foo', 'bar'], ['a', True], ['b'], ['b', True], ['c', False]] etl.valuecounter(table, 'foo').most_common() # valuecounts() ############### import petl as etl table = [['foo', 'bar', 'baz'], ['a', True, 0.12], ['a', True, 0.17], ['b', False, 0.34], ['b', False, 0.44], ['b']] etl.valuecounts(table, 'foo') etl.valuecounts(table, 'foo', 'bar') # parsecounter() ################ import petl as etl table = [['foo', 'bar', 'baz'], ['A', 'aaa', 2], ['B', u'2', '3.4'], [u'B', u'3', u'7.8', True], ['D', '3.7', 9.0], ['E', 42]] counter, errors = etl.parsecounter(table, 'bar') counter.most_common() errors.most_common() # parsecounts() ############### import petl as etl table = [['foo', 'bar', 'baz'], ['A', 'aaa', 2], ['B', u'2', '3.4'], [u'B', u'3', u'7.8', True], ['D', '3.7', 9.0], ['E', 42]] etl.parsecounts(table, 'bar') # typecounter() ############### import petl as etl table = [['foo', 'bar', 'baz'], ['A', 1, 2], ['B', u'2', '3.4'], [u'B', u'3', u'7.8', True], ['D', u'xyz', 9.0], ['E', 42]] etl.typecounter(table, 'foo').most_common() etl.typecounter(table, 'bar').most_common() etl.typecounter(table, 'baz').most_common() # typecounts() ############## import petl as etl table = [['foo', 'bar', 'baz'], [b'A', 1, 2], [b'B', '2', b'3.4'], ['B', '3', '7.8', True], ['D', u'xyz', 9.0], ['E', 42]] etl.typecounts(table, 'foo') etl.typecounts(table, 'bar') etl.typecounts(table, 'baz') # stringpatterns() ################## import petl as etl table = [['foo', 'bar'], ['Mr. Foo', '123-1254'], ['Mrs. Bar', '234-1123'], ['Mr. Spo', '123-1254'], [u'Mr. Baz', u'321 1434'], [u'Mrs. Baz', u'321 1434'], ['Mr. 
Quux', '123-1254-XX']] etl.stringpatterns(table, 'foo') etl.stringpatterns(table, 'bar') # rowlengths() ############### import petl as etl table = [['foo', 'bar', 'baz'], ['A', 1, 2], ['B', '2', '3.4'], [u'B', u'3', u'7.8', True], ['D', 'xyz', 9.0], ['E', None], ['F', 9]] etl.rowlengths(table) petl-1.7.15/examples/util/lookups.py000066400000000000000000000073461457414240700174340ustar00rootroot00000000000000from __future__ import division, print_function, absolute_import # lookup() ########## import petl as etl table1 = [['foo', 'bar'], ['a', 1], ['b', 2], ['b', 3]] lkp = etl.lookup(table1, 'foo', 'bar') lkp['a'] lkp['b'] # if no valuespec argument is given, defaults to the whole # row (as a tuple) lkp = etl.lookup(table1, 'foo') lkp['a'] lkp['b'] # compound keys are supported table2 = [['foo', 'bar', 'baz'], ['a', 1, True], ['b', 2, False], ['b', 3, True], ['b', 3, False]] lkp = etl.lookup(table2, ('foo', 'bar'), 'baz') lkp[('a', 1)] lkp[('b', 2)] lkp[('b', 3)] # data can be loaded into an existing dictionary-like # object, including persistent dictionaries created via the # shelve module import shelve lkp = shelve.open('example.dat', flag='n') lkp = etl.lookup(table1, 'foo', 'bar', lkp) lkp.close() lkp = shelve.open('example.dat', flag='r') lkp['a'] lkp['b'] # lookupone() ############# import petl as etl table1 = [['foo', 'bar'], ['a', 1], ['b', 2], ['b', 3]] # if the specified key is not unique and strict=False (default), # the first value wins lkp = etl.lookupone(table1, 'foo', 'bar') lkp['a'] lkp['b'] # if the specified key is not unique and strict=True, will raise # DuplicateKeyError try: lkp = etl.lookupone(table1, 'foo', strict=True) except etl.errors.DuplicateKeyError as e: print(e) # compound keys are supported table2 = [['foo', 'bar', 'baz'], ['a', 1, True], ['b', 2, False], ['b', 3, True], ['b', 3, False]] lkp = etl.lookupone(table2, ('foo', 'bar'), 'baz') lkp[('a', 1)] lkp[('b', 2)] lkp[('b', 3)] # data can be loaded into an existing dictionary-like # object, including persistent dictionaries created via the # shelve module import shelve lkp = shelve.open('example.dat', flag='n') lkp = etl.lookupone(table1, 'foo', 'bar', lkp) lkp.close() lkp = shelve.open('example.dat', flag='r') lkp['a'] lkp['b'] # dictlookup() ############## import petl as etl table1 = [['foo', 'bar'], ['a', 1], ['b', 2], ['b', 3]] lkp = etl.dictlookup(table1, 'foo') lkp['a'] lkp['b'] # compound keys are supported table2 = [['foo', 'bar', 'baz'], ['a', 1, True], ['b', 2, False], ['b', 3, True], ['b', 3, False]] lkp = etl.dictlookup(table2, ('foo', 'bar')) lkp[('a', 1)] lkp[('b', 2)] lkp[('b', 3)] # data can be loaded into an existing dictionary-like # object, including persistent dictionaries created via the # shelve module import shelve lkp = shelve.open('example.dat', flag='n') lkp = etl.dictlookup(table1, 'foo', lkp) lkp.close() lkp = shelve.open('example.dat', flag='r') lkp['a'] lkp['b'] # dictlookupone() ################# import petl as etl table1 = [['foo', 'bar'], ['a', 1], ['b', 2], ['b', 3]] # if the specified key is not unique and strict=False (default), # the first value wins lkp = etl.dictlookupone(table1, 'foo') lkp['a'] lkp['b'] # if the specified key is not unique and strict=True, will raise # DuplicateKeyError try: lkp = etl.dictlookupone(table1, 'foo', strict=True) except etl.errors.DuplicateKeyError as e: print(e) # compound keys are supported table2 = [['foo', 'bar', 'baz'], ['a', 1, True], ['b', 2, False], ['b', 3, True], ['b', 3, False]] lkp = etl.dictlookupone(table2, ('foo', 'bar')) 
lkp[('a', 1)] lkp[('b', 2)] lkp[('b', 3)] # data can be loaded into an existing dictionary-like # object, including persistent dictionaries created via the # shelve module import shelve lkp = shelve.open('example.dat', flag='n') lkp = etl.dictlookupone(table1, 'foo', lkp) lkp.close() lkp = shelve.open('example.dat', flag='r') lkp['a'] lkp['b'] petl-1.7.15/examples/util/materialise.py000066400000000000000000000006511457414240700202270ustar00rootroot00000000000000from __future__ import division, print_function, absolute_import # columns() ########### import petl as etl table = [['foo', 'bar'], ['a', 1], ['b', 2], ['b', 3]] cols = etl.columns(table) cols['foo'] cols['bar'] # facetcolumns() ################ import petl as etl table = [['foo', 'bar', 'baz'], ['a', 1, True], ['b', 2, True], ['b', 3]] fc = etl.facetcolumns(table, 'foo') fc['a'] fc['b'] petl-1.7.15/examples/util/misc.py000066400000000000000000000016711457414240700166660ustar00rootroot00000000000000from __future__ import division, print_function, absolute_import # typeset() ########### import petl as etl table = [['foo', 'bar', 'baz'], ['A', 1, '2'], ['B', u'2', '3.4'], [u'B', u'3', '7.8', True], ['D', u'xyz', 9.0], ['E', 42]] sorted(etl.typeset(table, 'foo')) sorted(etl.typeset(table, 'bar')) sorted(etl.typeset(table, 'baz')) # diffheaders() ############### import petl as etl table1 = [['foo', 'bar', 'baz'], ['a', 1, .3]] table2 = [['baz', 'bar', 'quux'], ['a', 1, .3]] add, sub = etl.diffheaders(table1, table2) add sub # diffvalues() ############## import petl as etl table1 = [['foo', 'bar'], ['a', 1], ['b', 3]] table2 = [['bar', 'foo'], [1, 'a'], [3, 'c']] add, sub = etl.diffvalues(table1, table2, 'foo') add sub # nthword() ########### import petl as etl s = 'foo bar' f = etl.nthword(0) f(s) g = etl.nthword(1) g(s) petl-1.7.15/examples/util/parsers.py000066400000000000000000000020471457414240700174100ustar00rootroot00000000000000from __future__ import division, print_function, absolute_import # datetimeparser() ################## from petl import datetimeparser isodatetime = datetimeparser('%Y-%m-%dT%H:%M:%S') isodatetime('2002-12-25T00:00:00') try: isodatetime('2002-12-25T00:00:99') except ValueError as e: print(e) # dateparser() ############## from petl import dateparser isodate = dateparser('%Y-%m-%d') isodate('2002-12-25') try: isodate('2002-02-30') except ValueError as e: print(e) # timeparser() ############## from petl import timeparser isotime = timeparser('%H:%M:%S') isotime('00:00:00') isotime('13:00:00') try: isotime('12:00:99') except ValueError as e: print(e) try: isotime('25:00:00') except ValueError as e: print(e) # boolparser() ############## from petl import boolparser mybool = boolparser(true_strings=['yes', 'y'], false_strings=['no', 'n']) mybool('y') mybool('yes') mybool('Y') mybool('No') try: mybool('foo') except ValueError as e: print(e) try: mybool('True') except ValueError as e: print(e) petl-1.7.15/examples/util/random.py000066400000000000000000000011051457414240700172030ustar00rootroot00000000000000from __future__ import division, print_function, absolute_import # randomtable() ############### import petl as etl table = etl.randomtable(3, 100, seed=42) table # dummytable() ############## import petl as etl table1 = etl.dummytable(100, seed=42) table1 # customise fields import random from functools import partial fields = [('foo', random.random), ('bar', partial(random.randint, 0, 500)), ('baz', partial(random.choice, ['chocolate', 'strawberry', 'vanilla']))] table2 = etl.dummytable(100, fields=fields, 
seed=42) table2 petl-1.7.15/examples/util/statistics.py000066400000000000000000000007011457414240700201160ustar00rootroot00000000000000from __future__ import division, print_function, absolute_import # limits() ########## import petl as etl table = [['foo', 'bar'], ['a', 1], ['b', 2], ['b', 3]] minv, maxv = etl.limits(table, 'bar') minv maxv # stats() ######### import petl as etl table = [['foo', 'bar', 'baz'], ['A', 1, 2], ['B', '2', '3.4'], [u'B', u'3', u'7.8', True], ['D', 'xyz', 9.0], ['E', None]] etl.stats(table, 'bar') petl-1.7.15/examples/util/timing.py000066400000000000000000000010521457414240700172130ustar00rootroot00000000000000from __future__ import division, print_function, absolute_import # progress() ############ import petl as etl table = etl.dummytable(100000) table.progress(10000).tocsv('example.csv') # clock() ######### import petl as etl t1 = etl.dummytable(100000) c1 = etl.clock(t1) t2 = etl.convert(c1, 'foo', lambda v: v**2) c2 = etl.clock(t2) p = etl.progress(c2, 10000) etl.tocsv(p, 'example.csv') # time consumed retrieving rows from t1 c1.time # time consumed retrieving rows from t2 c2.time # actual time consumed by the convert step c2.time - c1.time petl-1.7.15/examples/util/vis.py000066400000000000000000000010621457414240700165260ustar00rootroot00000000000000from __future__ import division, print_function, absolute_import # look() ######## import petl as etl table1 = [['foo', 'bar'], ['a', 1], ['b', 2]] etl.look(table1) # alternative formatting styles etl.look(table1, style='simple') etl.look(table1, style='minimal') # any irregularities in the length of header and/or data # rows will appear as blank cells table2 = [['foo', 'bar'], ['a'], ['b', 2, True]] etl.look(table2) # see() ####### import petl as etl table = [['foo', 'bar'], ['a', 1], ['b', 2]] etl.see(table) petl-1.7.15/petl/000077500000000000000000000000001457414240700135255ustar00rootroot00000000000000petl-1.7.15/petl/__init__.py000066400000000000000000000006201457414240700156340ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division from petl.version import version as __version__ from petl import comparison from petl.comparison import Comparable from petl import util from petl.util import * from petl import io from petl.io import * from petl import transform from petl.transform import * from petl import config from petl import errors from petl.errors import * petl-1.7.15/petl/comparison.py000066400000000000000000000072571457414240700162640ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division import operator from functools import partial from petl.compat import text_type, binary_type, numeric_types class Comparable(object): """Wrapper class to allow for flexible comparison of objects of different types, preserving the relaxed sorting behaviour of Python 2 with additional flexibility to allow for comparison of arbitrary objects with the `None` value (for example, the date and time objects from the standard library cannot be directly compared with `None` in Python 2). 
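For example, a minimal usage sketch (illustrative only; `Comparable` can be used as a sort key to order values that Python 3 would otherwise refuse to compare): >>> import datetime >>> from petl.comparison import Comparable >>> sorted([datetime.date(2000, 1, 2), None, datetime.date(1999, 1, 1)], ... key=Comparable) [None, datetime.date(1999, 1, 1), datetime.date(2000, 1, 2)]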
""" __slots__ = ['obj', 'inner'] def __init__(self, obj): # store wrapped object unchanged self.inner = obj # handle lists and tuples if isinstance(obj, (list, tuple)): obj = tuple(Comparable(o) for o in obj) self.obj = obj def __lt__(self, other): # convenience obj = self.obj if isinstance(other, Comparable): other = other.obj # None < everything else if other is None: return False if obj is None: return True # numbers < everything else (except None) if isinstance(obj, numeric_types) \ and not isinstance(other, numeric_types): return True if not isinstance(obj, numeric_types) \ and isinstance(other, numeric_types): return False # binary < unicode if isinstance(obj, text_type) and isinstance(other, binary_type): return False if isinstance(obj, binary_type) and isinstance(other, text_type): return True try: # attempt native comparison return obj < other except TypeError: # fall back to comparing type names return _typestr(obj) < _typestr(other) def __eq__(self, other): if isinstance(other, Comparable): return self.obj == other.obj return self.obj == other def __le__(self, other): return self < other or self == other def __gt__(self, other): return not (self < other or self == other) def __ge__(self, other): return not (self < other) def __str__(self): return str(self.obj) def __unicode__(self): return text_type(self.obj) def __repr__(self): return 'Comparable(' + repr(self.obj) + ')' def __iter__(self, *args, **kwargs): return iter(self.obj, *args, **kwargs) def __len__(self): return len(self.obj) def __getitem__(self, item): return self.obj.__getitem__(item) def _typestr(x): # attempt to preserve Python 2 name orderings if isinstance(x, binary_type): return 'str' if isinstance(x, text_type): return 'unicode' return type(x).__name__ def comparable_itemgetter(*args): getter = operator.itemgetter(*args) getter_with_default = _itemgetter_with_default(*args) def _getter_with_fallback(obj): try: return getter(obj) except (IndexError, KeyError): return getter_with_default(obj) g = lambda x: Comparable(_getter_with_fallback(x)) return g def _itemgetter_with_default(*args): """ itemgetter compatible with `operator.itemgetter` behavior, filling missing values with default instead of raising IndexError or KeyError """ def _get_default(obj, item, default): try: return obj[item] except (IndexError, KeyError): return default if len(args) == 1: return partial(_get_default, item=args[0], default=None) return lambda obj: tuple(_get_default(obj, item=item, default=None) for item in args) petl-1.7.15/petl/compat.py000066400000000000000000000033021457414240700153600ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division import sys ########################## # Python 3 compatibility # ########################## PY2 = sys.version_info.major == 2 PY3 = sys.version_info.major == 3 if PY2: from itertools import ifilter, ifilterfalse, imap, izip, izip_longest from string import maketrans from decimal import Decimal string_types = basestring, integer_types = int, long numeric_types = bool, int, long, float, Decimal text_type = unicode binary_type = str from urllib2 import urlopen try: from cStringIO import StringIO except ImportError: from StringIO import StringIO BytesIO = StringIO try: import cPickle as pickle except ImportError: import pickle maxint = sys.maxint long = long xrange = xrange reduce = reduce else: ifilter = filter imap = map izip = zip xrange = range from decimal import Decimal from itertools import filterfalse as ifilterfalse from itertools import zip_longest 
as izip_longest from functools import reduce maketrans = str.maketrans string_types = str, integer_types = int, numeric_types = bool, int, float, Decimal class_types = type, text_type = str binary_type = bytes long = int from urllib.request import urlopen from io import StringIO, BytesIO import pickle maxint = sys.maxsize try: advance_iterator = next except NameError: def advance_iterator(it): return it.next() next = advance_iterator try: callable = callable except NameError: def callable(obj): return any("__call__" in klass.__dict__ for klass in type(obj).__mro__) petl-1.7.15/petl/config.py000066400000000000000000000013731457414240700153500ustar00rootroot00000000000000from __future__ import division, print_function, absolute_import from petl.compat import text_type look_style = 'grid' # alternatives: 'simple', 'minimal' look_limit = 5 look_index_header = False look_vrepr = repr look_width = None see_limit = 5 see_index_header = False see_vrepr = repr display_limit = 5 display_index_header = False display_vrepr = text_type sort_buffersize = 100000 failonerror=False # False, True, 'inline' """ Controls what happens when unhandled exceptions are raised in a transformation: - If `False`, exceptions are suppressed. If present, the value provided in the `errorvalue` argument is returned. - If `True`, the first unhandled exception is raised. - If `'inline'`, unhandled exceptions are returned. """ petl-1.7.15/petl/errors.py000066400000000000000000000011331457414240700154110ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division class DuplicateKeyError(Exception): def __init__(self, key): self.key = key def __str__(self): return 'duplicate key: %r' % self.key class FieldSelectionError(Exception): def __init__(self, value): self.value = value def __str__(self): return 'selection is not a field or valid field index: %r' % self.value class ArgumentError(Exception): def __init__(self, message): self.message = message def __str__(self): return 'argument error: %s' % self.message petl-1.7.15/petl/io/000077500000000000000000000000001457414240700141345ustar00rootroot00000000000000petl-1.7.15/petl/io/__init__.py000066400000000000000000000026361457414240700162540ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division from petl.io.base import fromcolumns from petl.io.sources import FileSource, GzipSource, BZ2Source, ZipSource, \ StdinSource, StdoutSource, URLSource, StringSource, PopenSource, \ MemorySource from petl.io.csv import fromcsv, fromtsv, tocsv, appendcsv, totsv, appendtsv, \ teecsv, teetsv from petl.io.pickle import frompickle, topickle, appendpickle, teepickle from petl.io.text import fromtext, totext, appendtext, teetext from petl.io.xml import fromxml, toxml from petl.io.html import tohtml, teehtml from petl.io.json import fromjson, tojson, tojsonarrays, fromdicts from petl.io.db import fromdb, todb, appenddb from petl.io.xls import fromxls, toxls from petl.io.xlsx import fromxlsx, toxlsx, appendxlsx from petl.io.numpy import fromarray, toarray, torecarray from petl.io.pandas import fromdataframe, todataframe from petl.io.pytables import fromhdf5, fromhdf5sorted, tohdf5, appendhdf5 from petl.io.whoosh import fromtextindex, searchtextindex, \ searchtextindexpage, totextindex, appendtextindex from petl.io.bcolz import frombcolz, tobcolz, appendbcolz from petl.io.avro import fromavro, toavro, appendavro from petl.io.sources import register_codec, register_reader, register_writer from petl.io.remotes import RemoteSource 
from petl.io.remotes import SMBSource from petl.io.gsheet import fromgsheet, togsheet, appendgsheet petl-1.7.15/petl/io/avro.py000066400000000000000000000474401457414240700154660ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, division, print_function import sys import math from collections import OrderedDict from datetime import datetime, date, time from decimal import Decimal from petl.compat import izip_longest, text_type, string_types, PY3 from petl.io.sources import read_source_from_arg, write_source_from_arg from petl.transform.headers import skip, setheader from petl.util.base import Table, dicts, fieldnames, iterpeek, wrap # region API def fromavro(source, limit=None, skips=0, **avro_args): """Extract a table from the records of an avro file. The `source` argument (string or file-like or fastavro.reader) can either be the path of the file, a file-like input stream or an instance of fastavro.reader. The `limit` and `skips` arguments can be used to limit the range of rows to extract. The row fields read from the file can have scalar values like int, string, float, datetime, date and decimal but can also have compound types like enum, :ref:`array `, map, union and record. The field types can also have recursive structures defined in :ref:`complex schemas `. Values with :ref:`logical types ` are also read and translated to corresponding python types: long timestamp-millis and long timestamp-micros: datetime.datetime, int date: datetime.date, bytes decimal and fixed decimal: Decimal, int time-millis and long time-micros: datetime.time. Example usage for reading files:: >>> # set up an Avro file to demonstrate with ... >>> schema1 = { ... 'doc': 'Some people records.', ... 'name': 'People', ... 'namespace': 'test', ... 'type': 'record', ... 'fields': [ ... {'name': 'name', 'type': 'string'}, ... {'name': 'friends', 'type': 'int'}, ... {'name': 'age', 'type': 'int'}, ... ] ... } ... >>> records1 = [ ... {'name': 'Bob', 'friends': 42, 'age': 33}, ... {'name': 'Jim', 'friends': 13, 'age': 69}, ... {'name': 'Joe', 'friends': 86, 'age': 17}, ... {'name': 'Ted', 'friends': 23, 'age': 51} ... ] ... >>> import fastavro >>> parsed_schema1 = fastavro.parse_schema(schema1) >>> with open('example.file1.avro', 'wb') as f1: ... fastavro.writer(f1, parsed_schema1, records1) ... >>> # now demonstrate the use of fromavro() >>> import petl as etl >>> tbl1 = etl.fromavro('example.file1.avro') >>> tbl1 +-------+---------+-----+ | name | friends | age | +=======+=========+=====+ | 'Bob' | 42 | 33 | +-------+---------+-----+ | 'Jim' | 13 | 69 | +-------+---------+-----+ | 'Joe' | 86 | 17 | +-------+---------+-----+ | 'Ted' | 23 | 51 | +-------+---------+-----+ .. versionadded:: 1.4.0 """ source2 = read_source_from_arg(source) return AvroView(source=source2, limit=limit, skips=skips, **avro_args) def toavro(table, target, schema=None, sample=9, codec='deflate', compression_level=None, **avro_args): """ Write the table into a new avro file according to the schema passed. This method assumes that each column has values of the same type for all rows of the source `table`. `Apache Avro`_ is a data serialization framework. It is used for data serialization (especially in the Hadoop ecosystem), for data exchange with databases (e.g., Redshift) and in RPC protocols (like Kafka).
It has libraries to support many languages and generally is faster and safer than text formats like JSON, XML or CSV. The `target` argument is the file path for creating the avro file. Note that if a file already exists at the given location, it will be overwritten. The `schema` argument (dict) defines the field structure of the rows in the file. Check the fastavro `documentation`_ and the Avro schema `reference`_ for details. The `sample` argument (int, optional) defines how many rows are inspected for discovering the field types and building a schema for the avro file when the `schema` argument is not passed. The `codec` argument (string, optional) sets the compression codec used to shrink data in the file. It can be 'null', 'deflate' (default), 'bzip2', 'snappy', 'zstandard', 'lz4' or 'xz' (if installed). The `compression_level` argument (int, optional) sets the level of compression to use with the specified codec (if the codec supports it). Additionally there is support for passing extra options in the argument `**avro_args` that are forwarded directly to fastavro. Check the fastavro `documentation`_ for reference. The avro file format preserves type information, i.e., reading and writing is round-trippable for tables with non-string data values. However the conversion from Python value types to avro fields is not perfect. Use the `schema` argument to define the proper types for the conversion. The following avro types are supported by the schema: null, boolean, string, int, long, float, double, bytes, fixed, enum, :ref:`array `, map, union, record, and recursive types defined in :ref:`complex schemas `. Also :ref:`logical types ` are supported and translated to corresponding python types: long timestamp-millis, long timestamp-micros, int date, bytes decimal, fixed decimal, string uuid, int time-millis, long time-micros. Example usage for writing files:: >>> # set up a table to demonstrate with >>> table2 = [['name', 'friends', 'age'], ... ['Bob', 42, 33], ... ['Jim', 13, 69], ... ['Joe', 86, 17], ... ['Ted', 23, 51]] ... >>> schema2 = { ... 'doc': 'Some people records.', ... 'name': 'People', ... 'namespace': 'test', ... 'type': 'record', ... 'fields': [ ... {'name': 'name', 'type': 'string'}, ... {'name': 'friends', 'type': 'int'}, ... {'name': 'age', 'type': 'int'}, ... ] ... } ... >>> # now demonstrate writing with toavro() >>> import petl as etl >>> etl.toavro(table2, 'example.file2.avro', schema=schema2) ... >>> # this is what was saved above >>> tbl2 = etl.fromavro('example.file2.avro') >>> tbl2 +-------+---------+-----+ | name | friends | age | +=======+=========+=====+ | 'Bob' | 42 | 33 | +-------+---------+-----+ | 'Jim' | 13 | 69 | +-------+---------+-----+ | 'Joe' | 86 | 17 | +-------+---------+-----+ | 'Ted' | 23 | 51 | +-------+---------+-----+ .. versionadded:: 1.4.0 .. _Apache Avro: https://avro.apache.org/docs/current/spec.html .. _reference: https://avro.apache.org/docs/current/spec.html#schemas .. _documentation : https://fastavro.readthedocs.io/en/latest/writer.html """ _write_toavro(table, target=target, mode='wb', schema=schema, sample=sample, codec=codec, compression_level=compression_level, **avro_args) def appendavro(table, target, schema=None, sample=9, **avro_args): """ Append rows to an existing avro file or create a new one. The `target` argument can be either an existing avro file or the file path for creating a new one. The `schema` argument is checked against the schema of the existing file, so it must be the same schema as used by `toavro()` or the schema of the existing file. The `sample` argument (int, optional) defines how many rows are inspected for discovering the field types and building a schema for the avro file when the `schema` argument is not passed. Additionally there is support for passing extra options in the argument `**avro_args` that are forwarded directly to fastavro. Check the fastavro documentation for reference.
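Example usage (a minimal sketch, reusing the 'example.file2.avro' file written by the :meth:`petl.io.avro.toavro` example above): >>> import petl as etl >>> extra = [['name', 'friends', 'age'], ... ['Fred', 5, 44]] >>> etl.appendavro(extra, 'example.file2.avro')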
See :meth:`petl.io.avro.toavro` method for more information and examples. .. versionadded:: 1.4.0 """ _write_toavro(table, target=target, mode='a+b', schema=schema, sample=sample, **avro_args) # endregion API # region Implementation class AvroView(Table): '''Read rows from avro file with their types and logical types''' def __init__(self, source, limit, skips, **avro_args): self.source = source self.limit = limit self.skip = skips self.avro_args = avro_args self.avro_schema = None def get_avro_schema(self): '''gets the schema stored in the avro file header''' return self.avro_schema def __iter__(self): with self.source.open('rb') as source_file: avro_reader = self._open_reader(source_file) header = self._decode_schema(avro_reader) yield header for row in self._read_rows_from(avro_reader, header): yield row def _open_reader(self, source_file): '''This could raise an error when the file is corrupt or is not avro''' # delay the import of fastavro so it is not required when unused import fastavro avro_reader = fastavro.reader(source_file, **self.avro_args) return avro_reader def _decode_schema(self, avro_reader): '''extract the header from the schema stored in the avro file''' self.avro_schema = avro_reader.writer_schema if self.avro_schema is None: return None, None schema_fields = self.avro_schema['fields'] header = tuple(col['name'] for col in schema_fields) return header def _read_rows_from(self, avro_reader, header): count = 0 maximum = self.limit if self.limit is not None else sys.maxsize for i, record in enumerate(avro_reader): if i < self.skip: continue if count >= maximum: break count += 1 row = self._map_row_from(header, record) yield row def _map_row_from(self, header, record): ''' fastavro auto converts logical types defined in the avro schema to corresponding python types.
E.g.: - avro type: long logicalType: timestamp-millis -> python datetime - avro type: int logicalType: date -> python date - avro type: bytes logicalType: decimal -> python Decimal ''' if header is None or PY3: r = tuple(record.values()) else: # fastavro on python2 does not respect dict order r = tuple(record.get(col) for col in header) return r def _write_toavro(table, target, mode, schema, sample, codec='deflate', compression_level=None, **avro_args): if table is None: return # build a schema when not defined by user if not schema: schema, table2 = _build_schema_from_values(table, sample) else: table2 = _fix_missing_headers(table, schema) # fastavro expects an iterator of dicts rows = dicts(table2) if PY3 else _ordered_dict_iterator(table2) target2 = write_source_from_arg(target, mode=mode) with target2.open(mode) as target_file: # delay the import of fastavro so it is not required when unused from fastavro import parse_schema from fastavro.write import Writer parsed_schema = parse_schema(schema) writer = Writer(fo=target_file, schema=parsed_schema, codec=codec, compression_level=compression_level, **avro_args) num = 1 for record in rows: try: writer.write(record) num = num + 1 except ValueError as verr: vmsg = _get_error_details(target, num, verr, record, schema) _raise_error(ValueError, vmsg) except TypeError as terr: tmsg = _get_error_details(target, num, terr, record, schema) _raise_error(TypeError, tmsg) # finish writing writer.flush() # endregion Implementation # region Helpers def _build_schema_from_values(table, sample): # table2: try not to advance iterators samples, table2 = iterpeek(table, sample + 1) props = fieldnames(samples) peek = skip(samples, 1) schema_fields = _build_schema_fields_from_values(peek, props) schema_source = _build_schema_with(schema_fields) return schema_source, table2 def _build_schema_with(schema_fields): schema = { 'type': 'record', 'name': 'output', 'namespace': 'avro', 'doc': 'generated by petl', 'fields': schema_fields, } return schema def _build_schema_fields_from_values(peek, props): # store previous values for calculating max precision and max scale previous = OrderedDict() # set a default when value is None in the first row but allow override after fill_missing = True fields = OrderedDict() # iterate over sample rows to deal with columns containing None values for row in peek: _update_field_defs_from(props, row, fields, previous, fill_missing) fill_missing = False schema_fields = [item for item in fields.values()] return schema_fields def _update_field_defs_from(props, row, fields, previous, fill_missing): for prop, val in izip_longest(props, row): if prop is None: break dprev = previous.get(prop + '_prec') fprev = previous.get(prop + '_prop') fcurr = None if isinstance(val, dict): # get the fields from a recursive definition of record inside this field tdef, dcurr, fcurr = _get_definition_from_record(prop, val, fprev, dprev, fill_missing) else: # get the field definition for building the schema tdef, dcurr = _get_definition_from_type_of(prop, val, dprev) if tdef is not None: fields[prop] = {'name': prop, 'type': ['null', tdef]} elif fill_missing: fields[prop] = {'name': prop, 'type': ['null', 'string']} if dcurr is not None: previous[prop + '_prec'] = dcurr if fcurr is not None: previous[prop + '_prop'] = fcurr def _get_definition_from_type_of(prop, val, prev): # TODO: get type for enum, map and other python types tdef = None curr = None if isinstance(val, datetime): tdef = {'type': 'long', 'logicalType': 'timestamp-millis'} elif isinstance(val, time): tdef = {'type': 'int', 'logicalType': 'time-millis'} elif isinstance(val, date): tdef = {'type': 'int', 'logicalType': 'date'} elif isinstance(val, Decimal): curr, precision, scale = _get_precision_from_decimal(curr, val, prev) tdef = {'type': 'bytes', 'logicalType': 'decimal', 'precision': precision, 'scale': scale, } elif isinstance(val, bytes): tdef = 'bytes' elif isinstance(val, list): tdef, curr = _get_definition_from_array(prop, val, prev)
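# NB: bool must be tested before float/int below, because bool is a # subclass of int in Python and would otherwise be mapped to 'long'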
elif isinstance(val, bool): tdef = 'boolean' elif isinstance(val, float): tdef = 'double' elif isinstance(val, int): tdef = 'long' elif val is not None: tdef = 'string' else: return None, None return tdef, curr def _get_definition_from_array(prop, val, prev): afield = None for item in iter(val): if item is None: continue field2, curr2 = _get_definition_from_type_of(prop, item, prev) if field2 is not None: afield = field2 if curr2 is not None: prev = curr2 bfield = 'string' if afield is None else afield tdef = {'type': 'array', 'items': bfield} return tdef, prev def _get_definition_from_record(prop, val, fprev, dprev, fill_missing): if fprev is None: fprev = OrderedDict() if dprev is None: dprev = OrderedDict() props = list(val.keys()) row = list(val.values()) _update_field_defs_from(props, row, fprev, dprev, fill_missing) schema_fields = [item for item in fprev.values()] tdef = { 'type': 'record', 'name': prop + '_record', 'namespace': 'avro', 'fields': schema_fields, } return tdef, dprev, fprev def _get_precision_from_decimal(curr, val, prev): if val is None: prec = scale = 0 else: prec, scale, _, _ = precision_and_scale(val) if prev is not None: # get the greatest precision and scale of the sample prec0, scale0 = prev.get('precision'), prev.get('scale') prec, scale = max(prec, prec0), max(scale, scale0) prec = max(prec, 8) curr = {'precision': prec, 'scale': scale, } return curr, prec, scale def precision_and_scale(numeric_value): sign, digits, exp = numeric_value.as_tuple() number = 0 for digit in digits: number = (number * 10) + digit # delta = exp + scale delta = 1 number = 10 ** delta * number inumber = int(number) bits_req = inumber.bit_length() + 1 bytes_req = (bits_req + 8) // 8 if sign: inumber = - inumber prec = int(math.ceil(math.log10(abs(inumber)))) scale = abs(exp) return prec, scale, bytes_req, inumber def _fix_missing_headers(table, schema): '''add missing column headers from schema''' if schema is None or 'fields' not in schema: return table # table2: try not to advance iterators sample, table2 = iterpeek(table, 2) cols = fieldnames(sample) headers = _get_schema_header_names(schema) if len(cols) >= len(headers): return table2 table3 = setheader(table2, headers) return table3 def _get_error_details(target, num, err, record, schema): '''show the last row that failed writing, for troubleshooting''' headers = _get_schema_header_names(schema) if isinstance(record, dict): table = [headers, list(record.values())] else: table = [headers, record] example = wrap(table).look() dest = " output: %s" % target if isinstance(target, string_types) else '' printed = "failed writing on row #%d: %s\n%s\n schema: %s\n%s" details = printed % (num, err, dest, schema, example) return details def _get_schema_header_names(schema): fields = schema.get('fields') if fields is None: return [] header = [field.get('name') for field in fields] return header def _raise_error(ErrorType, new_message): """Works like raise Exception(msg) from prev_exp in python3.""" exinf = sys.exc_info() tracebk = exinf[2] try: if PY3: raise ErrorType(new_message).with_traceback(tracebk) # 
Python2 compatibility workaround exec('raise ErrorType, new_message, tracebk') finally: exinf = None tracebk = None # noqa: F841 def _ordered_dict_iterator(table): it = iter(table) hdr = next(it) flds = [text_type(f) for f in hdr] for row in it: items = list() for i, f in enumerate(flds): try: v = row[i] except IndexError: v = None items.append((f, v)) yield OrderedDict(items) Table.toavro = toavro Table.appendavro = appendavro # endregion petl-1.7.15/petl/io/base.py000066400000000000000000000035151457414240700154240ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import division, print_function, absolute_import import locale import codecs from petl.compat import izip_longest from petl.util.base import Table def getcodec(encoding): if encoding is None: encoding = locale.getpreferredencoding() codec = codecs.lookup(encoding) return codec def fromcolumns(cols, header=None, missing=None): """View a sequence of columns as a table, e.g.:: >>> import petl as etl >>> cols = [[0, 1, 2], ['a', 'b', 'c']] >>> tbl = etl.fromcolumns(cols) >>> tbl +----+-----+ | f0 | f1 | +====+=====+ | 0 | 'a' | +----+-----+ | 1 | 'b' | +----+-----+ | 2 | 'c' | +----+-----+ If columns are not the same length, values will be padded to the length of the longest column with `missing`, which is None by default, e.g.:: >>> cols = [[0, 1, 2], ['a', 'b']] >>> tbl = etl.fromcolumns(cols, missing='NA') >>> tbl +----+------+ | f0 | f1 | +====+======+ | 0 | 'a' | +----+------+ | 1 | 'b' | +----+------+ | 2 | 'NA' | +----+------+ See also :func:`petl.io.json.fromdicts`. .. versionadded:: 1.1.0 """ return ColumnsView(cols, header=header, missing=missing) class ColumnsView(Table): def __init__(self, cols, header=None, missing=None): self.cols = cols self.header = header self.missing = missing def __iter__(self): return itercolumns(self.cols, self.header, self.missing) def itercolumns(cols, header, missing): if header is None: header = ['f%s' % i for i in range(len(cols))] yield tuple(header) for row in izip_longest(*cols, **dict(fillvalue=missing)): yield row petl-1.7.15/petl/io/bcolz.py000066400000000000000000000137001457414240700156200ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division import itertools from petl.compat import string_types, text_type from petl.util.base import Table, iterpeek from petl.io.numpy import construct_dtype def frombcolz(source, expression=None, outcols=None, limit=None, skip=0): """Extract a table from a bcolz ctable, e.g.:: >>> import petl as etl >>> >>> def example_from_bcolz(): ... import bcolz ... cols = [ ... ['apples', 'oranges', 'pears'], ... [1, 3, 7], ... [2.5, 4.4, .1] ... ] ... names = ('foo', 'bar', 'baz') ... ctbl = bcolz.ctable(cols, names=names) ... return etl.frombcolz(ctbl) >>> >>> example_from_bcolz() # doctest: +SKIP +-----------+-----+-----+ | foo | bar | baz | +===========+=====+=====+ | 'apples' | 1 | 2.5 | +-----------+-----+-----+ | 'oranges' | 3 | 4.4 | +-----------+-----+-----+ | 'pears' | 7 | 0.1 | +-----------+-----+-----+ If `expression` is provided it will be executed by bcolz and only matching rows returned, e.g.:: >>> tbl2 = etl.frombcolz(ctbl, expression='bar > 1') # doctest: +SKIP >>> tbl2 # doctest: +SKIP +-----------+-----+-----+ | foo | bar | baz | +===========+=====+=====+ | 'oranges' | 3 | 4.4 | +-----------+-----+-----+ | 'pears' | 7 | 0.1 | +-----------+-----+-----+ .. 
versionadded:: 1.1.0 """ return BcolzView(source, expression=expression, outcols=outcols, limit=limit, skip=skip) class BcolzView(Table): def __init__(self, source, expression=None, outcols=None, limit=None, skip=0): self.source = source self.expression = expression self.outcols = outcols self.limit = limit self.skip = skip def __iter__(self): # obtain ctable if isinstance(self.source, string_types): import bcolz ctbl = bcolz.open(self.source, mode='r') else: # assume bcolz ctable ctbl = self.source # obtain header if self.outcols is None: header = tuple(ctbl.names) else: header = tuple(self.outcols) assert all(h in ctbl.names for h in header), 'invalid outcols' yield header # obtain iterator if self.expression is None: it = ctbl.iter(outcols=self.outcols, skip=self.skip, limit=self.limit) else: it = ctbl.where(self.expression, outcols=self.outcols, skip=self.skip, limit=self.limit) for row in it: yield row def tobcolz(table, dtype=None, sample=1000, **kwargs): """Load data into a bcolz ctable, e.g.:: >>> import petl as etl >>> >>> def example_to_bcolz(): ... table = [('foo', 'bar', 'baz'), ... ('apples', 1, 2.5), ... ('oranges', 3, 4.4), ... ('pears', 7, .1)] ... return etl.tobcolz(table) >>> >>> ctbl = example_to_bcolz() # doctest: +SKIP >>> ctbl # doctest: +SKIP ctable((3,), [('foo', '>> ctbl.names # doctest: +SKIP ['foo', 'bar', 'baz'] >>> ctbl['foo'] # doctest: +SKIP carray((3,), >> import petl as etl >>> import csv >>> # set up a CSV file to demonstrate with ... table1 = [['foo', 'bar'], ... ['a', 1], ... ['b', 2], ... ['c', 2]] >>> with open('example.csv', 'w') as f: ... writer = csv.writer(f) ... writer.writerows(table1) ... >>> # now demonstrate the use of fromcsv() ... table2 = etl.fromcsv('example.csv') >>> table2 +-----+-----+ | foo | bar | +=====+=====+ | 'a' | '1' | +-----+-----+ | 'b' | '2' | +-----+-----+ | 'c' | '2' | +-----+-----+ The `source` argument is the path of the delimited file, all other keyword arguments are passed to :func:`csv.reader`. So, e.g., to override the delimiter from the default CSV dialect, provide the `delimiter` keyword argument. Note that all data values are strings, and any intended numeric values will need to be converted, see also :func:`petl.transform.conversions.convert`. """ source = read_source_from_arg(source) csvargs.setdefault('dialect', 'excel') return fromcsv_impl(source=source, encoding=encoding, errors=errors, header=header, **csvargs) def fromtsv(source=None, encoding=None, errors='strict', header=None, **csvargs): """ Convenience function, as :func:`petl.io.csv.fromcsv` but with different default dialect (tab delimited). """ csvargs.setdefault('dialect', 'excel-tab') return fromcsv(source, encoding=encoding, errors=errors, header=header, **csvargs) def tocsv(table, source=None, encoding=None, errors='strict', write_header=True, **csvargs): """ Write the table to a CSV file. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar'], ... ['a', 1], ... ['b', 2], ... ['c', 2]] >>> etl.tocsv(table1, 'example.csv') >>> # look what it did ... print(open('example.csv').read()) foo,bar a,1 b,2 c,2 The `source` argument is the path of the delimited file, and the optional `write_header` argument specifies whether to include the field names in the delimited file. All other keyword arguments are passed to :func:`csv.writer`. So, e.g., to override the delimiter from the default CSV dialect, provide the `delimiter` keyword argument. Note that if a file already exists at the given location, it will be overwritten. 
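The dialect options also allow writing other delimited formats, e.g. a pipe-delimited file (an illustrative sketch; 'example.psv' is a hypothetical file name): >>> etl.tocsv(table1, 'example.psv', delimiter='|')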
""" source = write_source_from_arg(source) csvargs.setdefault('dialect', 'excel') tocsv_impl(table, source=source, encoding=encoding, errors=errors, write_header=write_header, **csvargs) Table.tocsv = tocsv def appendcsv(table, source=None, encoding=None, errors='strict', write_header=False, **csvargs): """ Append data rows to an existing CSV file. As :func:`petl.io.csv.tocsv` but the file is opened in append mode and the table header is not written by default. Note that no attempt is made to check that the fields or row lengths are consistent with the existing data, the data rows from the table are simply appended to the file. """ source = write_source_from_arg(source, mode='ab') csvargs.setdefault('dialect', 'excel') appendcsv_impl(table, source=source, encoding=encoding, errors=errors, write_header=write_header, **csvargs) Table.appendcsv = appendcsv def totsv(table, source=None, encoding=None, errors='strict', write_header=True, **csvargs): """ Convenience function, as :func:`petl.io.csv.tocsv` but with different default dialect (tab delimited). """ csvargs.setdefault('dialect', 'excel-tab') return tocsv(table, source=source, encoding=encoding, errors=errors, write_header=write_header, **csvargs) Table.totsv = totsv def appendtsv(table, source=None, encoding=None, errors='strict', write_header=False, **csvargs): """ Convenience function, as :func:`petl.io.csv.appendcsv` but with different default dialect (tab delimited). """ csvargs.setdefault('dialect', 'excel-tab') return appendcsv(table, source=source, encoding=encoding, errors=errors, write_header=write_header, **csvargs) Table.appendtsv = appendtsv def teecsv(table, source=None, encoding=None, errors='strict', write_header=True, **csvargs): """ Returns a table that writes rows to a CSV file as they are iterated over. """ source = write_source_from_arg(source) csvargs.setdefault('dialect', 'excel') return teecsv_impl(table, source=source, encoding=encoding, errors=errors, write_header=write_header, **csvargs) Table.teecsv = teecsv def teetsv(table, source=None, encoding=None, errors='strict', write_header=True, **csvargs): """ Convenience function, as :func:`petl.io.csv.teecsv` but with different default dialect (tab delimited). 
""" csvargs.setdefault('dialect', 'excel-tab') return teecsv(table, source=source, encoding=encoding, errors=errors, write_header=write_header, **csvargs) Table.teetsv = teetsv petl-1.7.15/petl/io/csv_py2.py000066400000000000000000000123141457414240700160740ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division # standard library dependencies import csv import cStringIO # internal dependencies from petl.util.base import Table, data from petl.io.base import getcodec def fromcsv_impl(source, **kwargs): return CSVView(source, **kwargs) class CSVView(Table): def __init__(self, source=None, encoding=None, errors='strict', header=None, **csvargs): self.source = source self.encoding = encoding self.errors = errors self.csvargs = csvargs self.header = header def __iter__(self): if self.header is not None: yield tuple(self.header) # determine encoding codec = getcodec(self.encoding) # ascii if codec.name == 'ascii': # bypass encoding with self.source.open('rU') as csvfile: reader = csv.reader(csvfile, **self.csvargs) for row in reader: yield tuple(row) # non-ascii else: with self.source.open('rb') as buf: reader = UnicodeReader(buf, encoding=self.encoding, errors=self.errors, **self.csvargs) for row in reader: yield tuple(row) def tocsv_impl(table, source, **kwargs): _writecsv(table, source=source, mode='wb', **kwargs) def appendcsv_impl(table, source, **kwargs): _writecsv(table, source=source, mode='ab', **kwargs) def _writecsv(table, source, mode, write_header, encoding, errors, **csvargs): rows = table if write_header else data(table) with source.open(mode) as buf: # determine encoding codec = getcodec(encoding) # ascii if codec.name == 'ascii': # bypass encoding writer = csv.writer(buf, **csvargs) # non-ascii else: writer = UnicodeWriter(buf, encoding=encoding, errors=errors, **csvargs) for row in rows: writer.writerow(row) def teecsv_impl(table, source, **kwargs): return TeeCSVView(table, source=source, **kwargs) class TeeCSVView(Table): def __init__(self, table, source=None, encoding=None, errors='strict', write_header=True, **csvargs): self.table = table self.source = source self.encoding = encoding self.errors = errors self.write_header = write_header self.csvargs = csvargs def __iter__(self): with self.source.open('wb') as buf: # determine encoding codec = getcodec(self.encoding) # ascii if codec.name == 'ascii': # bypass encoding writer = csv.writer(buf, **self.csvargs) # non-ascii else: writer = UnicodeWriter(buf, encoding=self.encoding, errors=self.errors, **self.csvargs) it = iter(self.table) # deal with header row try: hdr = next(it) except StopIteration: return if self.write_header: writer.writerow(hdr) # N.B., always yield header, even if we don't write it yield tuple(hdr) # data rows for row in it: writer.writerow(row) yield tuple(row) # Additional classes for Unicode CSV support in PY2 # taken from the original csv module docs # http://docs.python.org/2/library/csv.html#examples class UTF8Recoder: def __init__(self, buf, encoding, errors): codec = getcodec(encoding) self.reader = codec.streamreader(buf, errors=errors) def __iter__(self): return self def next(self): return self.reader.next().encode('utf-8') class UnicodeReader: def __init__(self, f, encoding=None, errors='strict', **csvargs): f = UTF8Recoder(f, encoding=encoding, errors=errors) self.reader = csv.reader(f, **csvargs) def next(self): row = self.reader.next() return [unicode(s, 'utf-8') if isinstance(s, basestring) else s for s in row] def __iter__(self): 
return self class UnicodeWriter: def __init__(self, buf, encoding=None, errors='strict', **csvargs): # Redirect output to a queue self.queue = cStringIO.StringIO() self.writer = csv.writer(self.queue, **csvargs) self.stream = buf codec = getcodec(encoding) self.encoder = codec.incrementalencoder(errors) def writerow(self, row): self.writer.writerow( [unicode(s).encode('utf-8') if s is not None else None for s in row] ) # Fetch UTF-8 output from the queue ... data = self.queue.getvalue() data = data.decode('utf-8') # ... and reencode it into the target encoding data = self.encoder.encode(data) # write to the target stream self.stream.write(data) # empty queue self.queue.truncate(0) def writerows(self, rows): for row in rows: self.writerow(row) petl-1.7.15/petl/io/csv_py3.py000066400000000000000000000056311457414240700161010ustar00rootroot00000000000000# -*- coding: utf-8 -*- import io import csv import logging from petl.util.base import Table, data logger = logging.getLogger(__name__) warning = logger.warning info = logger.info debug = logger.debug def fromcsv_impl(source, **kwargs): return CSVView(source, **kwargs) class CSVView(Table): def __init__(self, source, encoding, errors, header, **csvargs): self.source = source self.encoding = encoding self.errors = errors self.csvargs = csvargs self.header = header def __iter__(self): if self.header is not None: yield tuple(self.header) with self.source.open('rb') as buf: csvfile = io.TextIOWrapper(buf, encoding=self.encoding, errors=self.errors, newline='') try: reader = csv.reader(csvfile, **self.csvargs) for row in reader: yield tuple(row) finally: csvfile.detach() def tocsv_impl(table, source, **kwargs): _writecsv(table, source=source, mode='wb', **kwargs) def appendcsv_impl(table, source, **kwargs): _writecsv(table, source=source, mode='ab', **kwargs) def _writecsv(table, source, mode, write_header, encoding, errors, **csvargs): rows = table if write_header else data(table) with source.open(mode) as buf: # wrap buffer for text IO csvfile = io.TextIOWrapper(buf, encoding=encoding, errors=errors, newline='') try: writer = csv.writer(csvfile, **csvargs) for row in rows: writer.writerow(row) csvfile.flush() finally: csvfile.detach() def teecsv_impl(table, source, **kwargs): return TeeCSVView(table, source=source, **kwargs) class TeeCSVView(Table): def __init__(self, table, source=None, encoding=None, errors='strict', write_header=True, **csvargs): self.table = table self.source = source self.write_header = write_header self.encoding = encoding self.errors = errors self.csvargs = csvargs def __iter__(self): with self.source.open('wb') as buf: # wrap buffer for text IO csvfile = io.TextIOWrapper(buf, encoding=self.encoding, errors=self.errors, newline='') try: writer = csv.writer(csvfile, **self.csvargs) it = iter(self.table) try: hdr = next(it) except StopIteration: return if self.write_header: writer.writerow(hdr) yield tuple(hdr) for row in it: writer.writerow(row) yield tuple(row) csvfile.flush() finally: csvfile.detach() petl-1.7.15/petl/io/db.py000066400000000000000000000601411457414240700150750ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division # standard library dependencies import logging from petl.compat import next, text_type, string_types # internal dependencies from petl.errors import ArgumentError from petl.util.base import Table from petl.io.db_utils import _is_dbapi_connection, _is_dbapi_cursor, \ _is_sqlalchemy_connection, _is_sqlalchemy_engine, 
_is_sqlalchemy_session, \ _is_clikchouse_dbapi_connection, _quote, _placeholders from petl.io.db_create import drop_table, create_table logger = logging.getLogger(__name__) debug = logger.debug warning = logger.warning def fromdb(dbo, query, *args, **kwargs): """Provides access to data from any DB-API 2.0 connection via a given query. E.g., using :mod:`sqlite3`:: >>> import petl as etl >>> import sqlite3 >>> connection = sqlite3.connect('example.db') >>> table = etl.fromdb(connection, 'SELECT * FROM example') E.g., using :mod:`psycopg2` (assuming you've installed it first):: >>> import petl as etl >>> import psycopg2 >>> connection = psycopg2.connect('dbname=example user=postgres') >>> table = etl.fromdb(connection, 'SELECT * FROM example') E.g., using :mod:`pymysql` (assuming you've installed it first):: >>> import petl as etl >>> import pymysql >>> connection = pymysql.connect(password='moonpie', database='thangs') >>> table = etl.fromdb(connection, 'SELECT * FROM example') The `dbo` argument may also be a function that creates a cursor. N.B., each call to the function should return a new cursor. E.g.:: >>> import petl as etl >>> import psycopg2 >>> connection = psycopg2.connect('dbname=example user=postgres') >>> mkcursor = lambda: connection.cursor(cursor_factory=psycopg2.extras.DictCursor) >>> table = etl.fromdb(mkcursor, 'SELECT * FROM example') The parameter `dbo` may also be an SQLAlchemy engine, session or connection object. The parameter `dbo` may also be a string, in which case it is interpreted as the name of a file containing an :mod:`sqlite3` database. Note that the default behaviour of most database servers and clients is for the entire result set for each query to be sent from the server to the client. If your query returns a large result set this can result in significant memory usage at the client. Some databases support server-side cursors which provide a means for client libraries to fetch result sets incrementally, reducing memory usage at the client. To use a server-side cursor with a PostgreSQL database, e.g.:: >>> import petl as etl >>> import psycopg2 >>> connection = psycopg2.connect('dbname=example user=postgres') >>> table = etl.fromdb(lambda: connection.cursor(name='arbitrary'), ... 'SELECT * FROM example') For more information on server-side cursors see the following links: * http://initd.org/psycopg/docs/usage.html#server-side-cursors * http://mysql-python.sourceforge.net/MySQLdb.html#using-and-extending """ # convenience for working with sqlite3 if isinstance(dbo, string_types): import sqlite3 dbo = sqlite3.connect(dbo) return DbView(dbo, query, *args, **kwargs) class DbView(Table): def __init__(self, dbo, query, *args, **kwargs): self.dbo = dbo self.query = query self.args = args self.kwargs = kwargs def __iter__(self): # does it quack like a standard DB-API 2.0 connection? if _is_dbapi_connection(self.dbo): debug('assuming %r is standard DB-API 2.0 connection', self.dbo) _iter = _iter_dbapi_connection # does it quack like a standard DB-API 2.0 cursor? elif _is_dbapi_cursor(self.dbo): debug('assuming %r is standard DB-API 2.0 cursor') warning('using a DB-API cursor with fromdb() is not recommended ' 'and may lead to unexpected results, a DB-API connection ' 'is better') _iter = _iter_dbapi_cursor # does it quack like an SQLAlchemy engine? elif _is_sqlalchemy_engine(self.dbo): debug('assuming %r instance of sqlalchemy.engine.base.Engine', self.dbo) _iter = _iter_sqlalchemy_engine # does it quack like an SQLAlchemy session? 
elif _is_sqlalchemy_session(self.dbo): debug('assuming %r instance of sqlalchemy.orm.session.Session', self.dbo) _iter = _iter_sqlalchemy_session # does it quack like an SQLAlchemy connection? elif _is_sqlalchemy_connection(self.dbo): debug('assuming %r instance of sqlalchemy.engine.base.Connection', self.dbo) _iter = _iter_sqlalchemy_connection elif callable(self.dbo): debug('assuming %r is a function returning a cursor', self.dbo) _iter = _iter_dbapi_mkcurs # some other sort of duck... else: raise ArgumentError('unsupported database object type: %r' % self.dbo) return _iter(self.dbo, self.query, *self.args, **self.kwargs) def _iter_dbapi_mkcurs(mkcurs, query, *args, **kwargs): cursor = mkcurs() try: for row in _iter_dbapi_cursor(cursor, query, *args, **kwargs): yield row finally: cursor.close() def _iter_dbapi_connection(connection, query, *args, **kwargs): cursor = connection.cursor() try: for row in _iter_dbapi_cursor(cursor, query, *args, **kwargs): yield row finally: cursor.close() def _iter_dbapi_cursor(cursor, query, *args, **kwargs): cursor.execute(query, *args, **kwargs) # fetch one row before iterating, to force population of cursor.description # which may be postponed if using server-side cursors # not all database drivers populate cursor after execute so we call fetchall try: it = iter(cursor) except TypeError: it = iter(cursor.fetchall()) try: first_row = next(it) except StopIteration: first_row = None # fields should be available now hdr = [d[0] for d in cursor.description] yield tuple(hdr) if first_row is None: return yield first_row for row in it: yield row # don't wrap, return whatever the database engine returns def _iter_sqlalchemy_engine(engine, query, *args, **kwargs): connection = engine.connect() for row in _iter_sqlalchemy_connection(connection, query, *args, **kwargs): yield row connection.close() def _iter_sqlalchemy_connection(connection, query, *args, **kwargs): debug('connection: %r', connection) results = connection.execute(query, *args, **kwargs) hdr = results.keys() yield tuple(hdr) for row in results: yield row def _iter_sqlalchemy_session(session, query, *args, **kwargs): results = session.execute(query, *args, **kwargs) hdr = results.keys() yield tuple(hdr) for row in results: yield row def todb(table, dbo, tablename, schema=None, commit=True, create=False, drop=False, constraints=True, metadata=None, dialect=None, sample=1000): """ Load data into an existing database table via a DB-API 2.0 connection or cursor. Note that the database table will be truncated, i.e., all existing rows will be deleted prior to inserting the new data. E.g.:: >>> import petl as etl >>> table = [['foo', 'bar'], ... ['a', 1], ... ['b', 2], ... ['c', 2]] >>> # using sqlite3 ... import sqlite3 >>> connection = sqlite3.connect('example.db') >>> # assuming table "foobar" already exists in the database ... etl.todb(table, connection, 'foobar') >>> # using psycopg2 >>> import psycopg2 >>> connection = psycopg2.connect('dbname=example user=postgres') >>> # assuming table "foobar" already exists in the database ... etl.todb(table, connection, 'foobar') >>> # using pymysql >>> import pymysql >>> connection = pymysql.connect(password='moonpie', database='thangs') >>> # tell MySQL to use standard quote character ... connection.cursor().execute('SET SQL_MODE=ANSI_QUOTES') >>> # load data, assuming table "foobar" already exists in the database ... 
etl.todb(table, connection, 'foobar') N.B., for MySQL the statement ``SET SQL_MODE=ANSI_QUOTES`` is required to ensure MySQL uses SQL-92 standard quote characters. A cursor can also be provided instead of a connection, e.g.:: >>> import psycopg2 >>> connection = psycopg2.connect('dbname=example user=postgres') >>> cursor = connection.cursor() >>> etl.todb(table, cursor, 'foobar') The parameter `dbo` may also be an SQLAlchemy engine, session or connection object. The parameter `dbo` may also be a string, in which case it is interpreted as the name of a file containing an :mod:`sqlite3` database. If ``create=True`` this function will attempt to automatically create a database table before loading the data. This functionality requires `SQLAlchemy `_ to be installed. **Keyword arguments:** table : table container Table data to load dbo : database object DB-API 2.0 connection, callable returning a DB-API 2.0 cursor, or SQLAlchemy connection, engine or session tablename : string Name of the table in the database schema : string Name of the database schema to find the table in commit : bool If True commit the changes create : bool If True attempt to create the table before loading, inferring types from a sample of the data (requires SQLAlchemy) drop : bool If True attempt to drop the table before recreating (only relevant if create=True) constraints : bool If True use length and nullable constraints (only relevant if create=True) metadata : sqlalchemy.MetaData Custom table metadata (only relevant if create=True) dialect : string One of {'access', 'sybase', 'sqlite', 'informix', 'firebird', 'mysql', 'oracle', 'maxdb', 'postgresql', 'mssql'} (only relevant if create=True) sample : int Number of rows to sample when inferring types etc. Set to 0 to use the whole table (only relevant if create=True) .. note:: This function is in principle compatible with any DB-API 2.0 compliant database driver. However, at the time of writing some DB-API 2.0 implementations, including cx_Oracle and MySQL's Connector/Python, are not compatible with this function, because they only accept a list argument to the cursor.executemany() function called internally by :mod:`petl`. This can be worked around by proxying the cursor objects, e.g.:: >>> import cx_Oracle >>> connection = cx_Oracle.Connection(...) >>> class CursorProxy(object): ... def __init__(self, cursor): ... self._cursor = cursor ... def executemany(self, statement, parameters, **kwargs): ... # convert parameters to a list ... parameters = list(parameters) ... # pass through to proxied cursor ... return self._cursor.executemany(statement, parameters, **kwargs) ... def __getattr__(self, item): ... return getattr(self._cursor, item) ... >>> def get_cursor(): ... return CursorProxy(connection.cursor()) ... >>> import petl as etl >>> etl.todb(tbl, get_cursor, ...) Note however that this does imply loading the entire table into memory as a list prior to inserting into the database. 
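If the target table does not exist yet, ``todb()`` can attempt to create it first (requires SQLAlchemy; a minimal sketch assuming a fresh database): >>> etl.todb(table, connection, 'foobar', create=True, sample=0)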
""" needs_closing = False # convenience for working with sqlite3 if isinstance(dbo, string_types): import sqlite3 dbo = sqlite3.connect(dbo) needs_closing = True try: if create: if drop: drop_table(dbo, tablename, schema=schema, commit=commit) create_table(table, dbo, tablename, schema=schema, commit=commit, constraints=constraints, metadata=metadata, dialect=dialect, sample=sample) _todb(table, dbo, tablename, schema=schema, commit=commit, truncate=True) finally: if needs_closing: dbo.close() Table.todb = todb def _todb(table, dbo, tablename, schema=None, commit=True, truncate=False): # need to deal with polymorphic dbo argument # what sort of duck is it? if _is_clikchouse_dbapi_connection(dbo): debug('assuming %r is clickhosue DB-API 2.0 connection', dbo) _todb_clikchouse_dbapi_connection(table, dbo, tablename, schema=schema, commit=commit, truncate=truncate) # does it quack like a standard DB-API 2.0 connection? elif _is_dbapi_connection(dbo): debug('assuming %r is standard DB-API 2.0 connection', dbo) _todb_dbapi_connection(table, dbo, tablename, schema=schema, commit=commit, truncate=truncate) # does it quack like a standard DB-API 2.0 cursor? elif _is_dbapi_cursor(dbo): debug('assuming %r is standard DB-API 2.0 cursor') _todb_dbapi_cursor(table, dbo, tablename, schema=schema, commit=commit, truncate=truncate) # does it quack like an SQLAlchemy engine? elif _is_sqlalchemy_engine(dbo): debug('assuming %r instance of sqlalchemy.engine.base.Engine', dbo) _todb_sqlalchemy_engine(table, dbo, tablename, schema=schema, commit=commit, truncate=truncate) # does it quack like an SQLAlchemy session? elif _is_sqlalchemy_session(dbo): debug('assuming %r instance of sqlalchemy.orm.session.Session', dbo) _todb_sqlalchemy_session(table, dbo, tablename, schema=schema, commit=commit, truncate=truncate) # does it quack like an SQLAlchemy connection? elif _is_sqlalchemy_connection(dbo): debug('assuming %r instance of sqlalchemy.engine.base.Connection', dbo) _todb_sqlalchemy_connection(table, dbo, tablename, schema=schema, commit=commit, truncate=truncate) elif callable(dbo): debug('assuming %r is a function returning standard DB-API 2.0 cursor ' 'objects', dbo) _todb_dbapi_mkcurs(table, dbo, tablename, schema=schema, commit=commit, truncate=truncate) # some other sort of duck... else: raise ArgumentError('unsupported database object type: %r' % dbo) SQL_TRUNCATE_QUERY = 'DELETE FROM %s' SQL_INSERT_QUERY = 'INSERT INTO %s (%s) VALUES (%s)' def _todb_dbapi_connection(table, connection, tablename, schema=None, commit=True, truncate=False): # sanitise table name tablename = _quote(tablename) if schema is not None: tablename = _quote(schema) + '.' 
+ tablename debug('tablename: %r', tablename) # sanitise field names it = iter(table) hdr = next(it) flds = list(map(text_type, hdr)) colnames = [_quote(n) for n in flds] debug('column names: %r', colnames) # determine paramstyle and build placeholders string placeholders = _placeholders(connection, colnames) debug('placeholders: %r', placeholders) # get a cursor cursor = connection.cursor() if truncate: # TRUNCATE is not supported in some databases and causing locks with # MySQL used via SQLAlchemy, fall back to DELETE FROM for now truncatequery = SQL_TRUNCATE_QUERY % tablename debug('truncate the table via query %r', truncatequery) cursor.execute(truncatequery) # just in case, close and resurrect cursor cursor.close() cursor = connection.cursor() insertcolnames = ', '.join(colnames) insertquery = SQL_INSERT_QUERY % (tablename, insertcolnames, placeholders) debug('insert data via query %r' % insertquery) cursor.executemany(insertquery, it) # finish up debug('close the cursor') cursor.close() if commit: debug('commit transaction') connection.commit() def _todb_clikchouse_dbapi_connection(table, connection, tablename, schema=None, commit=True, truncate=False): # sanitise table name tablename = _quote(tablename) if schema is not None: tablename = _quote(schema) + '.' + tablename debug('tablename: %r', tablename) # sanitise field names it = iter(table) hdr = next(it) flds = list(map(text_type, hdr)) colnames = [_quote(n) for n in flds] debug('column names: %r', colnames) # determine paramstyle and build placeholders string placeholders = _placeholders(connection, colnames) debug('placeholders: %r', placeholders) # get a cursor cursor = connection.cursor() if truncate: # TRUNCATE is not supported in some databases and causing locks with # MySQL used via SQLAlchemy, fall back to DELETE FROM for now truncatequery = 'TRUNCATE TABLE IF EXISTS %s' % tablename debug('truncate the table via query %r', truncatequery) cursor.execute(truncatequery) # just in case, close and resurrect cursor cursor.close() cursor = connection.cursor() insertcolnames = ', '.join(colnames) insertquery = 'INSERT INTO %s (%s) VALUES' % (tablename, insertcolnames) debug('insert data via query %r' % insertquery) cursor.executemany(insertquery, it) # finish up debug('close the cursor') cursor.close() if commit: debug('commit transaction') connection.commit() def _todb_dbapi_mkcurs(table, mkcurs, tablename, schema=None, commit=True, truncate=False): # sanitise table name tablename = _quote(tablename) if schema is not None: tablename = _quote(schema) + '.' 
+ tablename debug('tablename: %r', tablename) # sanitise field names it = iter(table) hdr = next(it) flds = list(map(text_type, hdr)) colnames = [_quote(n) for n in flds] debug('column names: %r', colnames) debug('obtain cursor and connection') cursor = mkcurs() # N.B., we depend on this optional DB-API 2.0 attribute being implemented assert hasattr(cursor, 'connection'), \ 'could not obtain connection via cursor' connection = cursor.connection # determine paramstyle and build placeholders string placeholders = _placeholders(connection, colnames) debug('placeholders: %r', placeholders) if truncate: # TRUNCATE is not supported in some databases and causing locks with # MySQL used via SQLAlchemy, fall back to DELETE FROM for now truncatequery = SQL_TRUNCATE_QUERY % tablename debug('truncate the table via query %r', truncatequery) cursor.execute(truncatequery) # N.B., may be server-side cursor, need to resurrect cursor.close() cursor = mkcurs() insertcolnames = ', '.join(colnames) insertquery = SQL_INSERT_QUERY % (tablename, insertcolnames, placeholders) debug('insert data via query %r' % insertquery) cursor.executemany(insertquery, it) cursor.close() if commit: debug('commit transaction') connection.commit() def _todb_dbapi_cursor(table, cursor, tablename, schema=None, commit=True, truncate=False): # sanitise table name tablename = _quote(tablename) if schema is not None: tablename = _quote(schema) + '.' + tablename debug('tablename: %r', tablename) # sanitise field names it = iter(table) hdr = next(it) flds = list(map(text_type, hdr)) colnames = [_quote(n) for n in flds] debug('column names: %r', colnames) debug('obtain connection via cursor') # N.B., we depend on this optional DB-API 2.0 attribute being implemented assert hasattr(cursor, 'connection'), \ 'could not obtain connection via cursor' connection = cursor.connection # determine paramstyle and build placeholders string placeholders = _placeholders(connection, colnames) debug('placeholders: %r', placeholders) if truncate: # TRUNCATE is not supported in some databases and causing locks with # MySQL used via SQLAlchemy, fall back to DELETE FROM for now truncatequery = SQL_TRUNCATE_QUERY % tablename debug('truncate the table via query %r', truncatequery) cursor.execute(truncatequery) insertcolnames = ', '.join(colnames) insertquery = SQL_INSERT_QUERY % (tablename, insertcolnames, placeholders) debug('insert data via query %r' % insertquery) cursor.executemany(insertquery, it) # N.B., don't close the cursor, leave that to the application if commit: debug('commit transaction') connection.commit() def _todb_sqlalchemy_engine(table, engine, tablename, schema=None, commit=True, truncate=False): _todb_sqlalchemy_connection(table, engine.connect(), tablename, schema=schema, commit=commit, truncate=truncate) def _todb_sqlalchemy_connection(table, connection, tablename, schema=None, commit=True, truncate=False): debug('connection: %r', connection) # sanitise table name tablename = _quote(tablename) if schema is not None: tablename = _quote(schema) + '.' 
+ tablename debug('tablename: %r', tablename) # sanitise field names it = iter(table) hdr = next(it) flds = list(map(text_type, hdr)) colnames = [_quote(n) for n in flds] debug('column names: %r', colnames) # N.B., we need to obtain a reference to the underlying DB-API connection so # we can import the module and determine the paramstyle proxied_raw_connection = connection.connection actual_raw_connection = proxied_raw_connection.connection # determine paramstyle and build placeholders string placeholders = _placeholders(actual_raw_connection, colnames) debug('placeholders: %r', placeholders) if commit: debug('begin transaction') trans = connection.begin() if truncate: # TRUNCATE is not supported in some databases and causing locks with # MySQL used via SQLAlchemy, fall back to DELETE FROM for now truncatequery = SQL_TRUNCATE_QUERY % tablename debug('truncate the table via query %r', truncatequery) connection.execute(truncatequery) insertcolnames = ', '.join(colnames) insertquery = SQL_INSERT_QUERY % (tablename, insertcolnames, placeholders) debug('insert data via query %r' % insertquery) for row in it: connection.execute(insertquery, row) # finish up if commit: debug('commit transaction') trans.commit() # N.B., don't close connection, leave that to the application def _todb_sqlalchemy_session(table, session, tablename, schema=None, commit=True, truncate=False): _todb_sqlalchemy_connection(table, session.connection(), tablename, schema=schema, commit=commit, truncate=truncate) def appenddb(table, dbo, tablename, schema=None, commit=True): """ Load data into an existing database table via a DB-API 2.0 connection or cursor. As :func:`petl.io.db.todb` except that the database table will be appended, i.e., the new data will be inserted into the table, and any existing rows will remain. """ needs_closing = False # convenience for working with sqlite3 if isinstance(dbo, string_types): import sqlite3 dbo = sqlite3.connect(dbo) needs_closing = True try: _todb(table, dbo, tablename, schema=schema, commit=commit, truncate=False) finally: if needs_closing: dbo.close() Table.appenddb = appenddb petl-1.7.15/petl/io/db_create.py000066400000000000000000000254671457414240700164340ustar00rootroot00000000000000# -*- coding: utf-8 -*- """ Module providing some convenience functions for working with SQL databases. SQLAlchemy is required, try ``apt-get install python-sqlalchemy`` or ``pip install SQLAlchemy``. Acknowledgments: much of the code of this module is based on the ``csvsql`` utility in the `csvkit `_ package. """ import datetime import logging from petl.compat import long, text_type from petl.errors import ArgumentError from petl.util.materialise import columns from petl.transform.basics import head from petl.io.db_utils import _is_dbapi_connection, _is_dbapi_cursor, \ _is_sqlalchemy_engine, _is_sqlalchemy_session, _is_sqlalchemy_connection,\ _quote logger = logging.getLogger(__name__) debug = logger.debug DIALECTS = { 'access': 'access.base', 'firebird': 'firebird.kinterbasdb', 'informix': 'informix.informixdb', 'maxdb': 'maxdb.sapdb', 'mssql': 'mssql.pyodbc', 'mysql': 'mysql.mysqlconnector', 'oracle': 'oracle.cx_oracle', 'postgresql': 'postgresql.psycopg2', 'sqlite': 'sqlite.pysqlite', 'sybase': 'sybase.pyodbc' } SQL_INTEGER_MAX = 2147483647 SQL_INTEGER_MIN = -2147483647 NULL_COLUMN_MAX_LENGTH = 32 def make_sqlalchemy_column(col, colname, constraints=True): """ Infer an appropriate SQLAlchemy column type based on a sequence of values. 
Keyword arguments: col : sequence A sequence of values to use to infer type, length etc. colname : string Name of column constraints : bool If True use length and nullable constraints """ import sqlalchemy col_not_none = [v for v in col if v is not None] sql_column_kwargs = {} sql_type_kwargs = {} if len(col_not_none) == 0: sql_column_type = sqlalchemy.String if constraints: sql_type_kwargs['length'] = NULL_COLUMN_MAX_LENGTH elif all(isinstance(v, bool) for v in col_not_none): sql_column_type = sqlalchemy.Boolean elif all(isinstance(v, int) for v in col_not_none): if max(col_not_none) > SQL_INTEGER_MAX \ or min(col_not_none) < SQL_INTEGER_MIN: sql_column_type = sqlalchemy.BigInteger else: sql_column_type = sqlalchemy.Integer elif all(isinstance(v, long) for v in col_not_none): sql_column_type = sqlalchemy.BigInteger elif all(isinstance(v, (int, long)) for v in col_not_none): sql_column_type = sqlalchemy.BigInteger elif all(isinstance(v, (int, long, float)) for v in col_not_none): sql_column_type = sqlalchemy.Float elif all(isinstance(v, datetime.datetime) for v in col_not_none): sql_column_type = sqlalchemy.DateTime elif all(isinstance(v, datetime.date) for v in col_not_none): sql_column_type = sqlalchemy.Date elif all(isinstance(v, datetime.time) for v in col_not_none): sql_column_type = sqlalchemy.Time else: sql_column_type = sqlalchemy.String if constraints: sql_type_kwargs['length'] = max([len(text_type(v)) for v in col]) if constraints: sql_column_kwargs['nullable'] = len(col_not_none) < len(col) return sqlalchemy.Column(colname, sql_column_type(**sql_type_kwargs), **sql_column_kwargs) def make_sqlalchemy_table(table, tablename, schema=None, constraints=True, metadata=None): """ Create an SQLAlchemy table definition based on data in `table`. Keyword arguments: table : table container Table data to use to infer types etc. tablename : text Name of the table schema : text Name of the database schema to create the table in constraints : bool If True use length and nullable constraints metadata : sqlalchemy.MetaData Custom table metadata """ import sqlalchemy if not metadata: metadata = sqlalchemy.MetaData() sql_table = sqlalchemy.Table(tablename, metadata, schema=schema) cols = columns(table) flds = list(cols.keys()) for f in flds: sql_column = make_sqlalchemy_column(cols[f], f, constraints=constraints) sql_table.append_column(sql_column) return sql_table def make_create_table_statement(table, tablename, schema=None, constraints=True, metadata=None, dialect=None): """ Generate a CREATE TABLE statement based on data in `table`. Keyword arguments: table : table container Table data to use to infer types etc. 
    tablename : text
        Name of the table
    schema : text
        Name of the database schema to create the table in
    constraints : bool
        If True use length and nullable constraints
    metadata : sqlalchemy.MetaData
        Custom table metadata
    dialect : text
        One of {'access', 'sybase', 'sqlite', 'informix', 'firebird',
        'mysql', 'oracle', 'maxdb', 'postgresql', 'mssql'}

    """

    import sqlalchemy
    sql_table = make_sqlalchemy_table(table, tablename, schema=schema,
                                      constraints=constraints,
                                      metadata=metadata)
    if dialect:
        module = __import__('sqlalchemy.dialects.%s' % DIALECTS[dialect],
                            fromlist=['dialect'])
        sql_dialect = module.dialect()
    else:
        sql_dialect = None
    return text_type(sqlalchemy.schema.CreateTable(sql_table)
                     .compile(dialect=sql_dialect)).strip()


def create_table(table, dbo, tablename, schema=None, commit=True,
                 constraints=True, metadata=None, dialect=None, sample=1000):
    """
    Create a database table based on a sample of data in the given `table`.

    Keyword arguments:

    table : table container
        Table data to load
    dbo : database object
        DB-API 2.0 connection, callable returning a DB-API 2.0 cursor, or
        SQLAlchemy connection, engine or session
    tablename : text
        Name of the table
    schema : text
        Name of the database schema to create the table in
    commit : bool
        If True commit the changes
    constraints : bool
        If True use length and nullable constraints
    metadata : sqlalchemy.MetaData
        Custom table metadata
    dialect : text
        One of {'access', 'sybase', 'sqlite', 'informix', 'firebird',
        'mysql', 'oracle', 'maxdb', 'postgresql', 'mssql'}
    sample : int
        Number of rows to sample when inferring types etc., set to 0 to use
        the whole table

    """

    if sample > 0:
        table = head(table, sample)
    sql = make_create_table_statement(table, tablename, schema=schema,
                                      constraints=constraints,
                                      metadata=metadata, dialect=dialect)
    _execute(sql, dbo, commit=commit)


def drop_table(dbo, tablename, schema=None, commit=True):
    """
    Drop a database table.

    Keyword arguments:

    dbo : database object
        DB-API 2.0 connection, callable returning a DB-API 2.0 cursor, or
        SQLAlchemy connection, engine or session
    tablename : text
        Name of the table
    schema : text
        Name of the database schema the table is in
    commit : bool
        If True commit the changes

    """

    # sanitise table name
    tablename = _quote(tablename)
    if schema is not None:
        tablename = _quote(schema) + '.' + tablename

    sql = u'DROP TABLE %s' % tablename
    _execute(sql, dbo, commit)


def _execute(sql, dbo, commit):
    debug(sql)

    # need to deal with polymorphic dbo argument
    # what sort of duck is it?

    # does it quack like a standard DB-API 2.0 connection?
    if _is_dbapi_connection(dbo):
        debug('assuming %r is standard DB-API 2.0 connection', dbo)
        _execute_dbapi_connection(sql, dbo, commit)

    # does it quack like a standard DB-API 2.0 cursor?
    elif _is_dbapi_cursor(dbo):
        debug('assuming %r is standard DB-API 2.0 cursor', dbo)
        _execute_dbapi_cursor(sql, dbo, commit)

    # does it quack like an SQLAlchemy engine?
    elif _is_sqlalchemy_engine(dbo):
        debug('assuming %r is an instance of sqlalchemy.engine.base.Engine',
              dbo)
        _execute_sqlalchemy_engine(sql, dbo, commit)

    # does it quack like an SQLAlchemy session?
    elif _is_sqlalchemy_session(dbo):
        debug('assuming %r is an instance of sqlalchemy.orm.session.Session',
              dbo)
        _execute_sqlalchemy_session(sql, dbo, commit)

    # does it quack like an SQLAlchemy connection?
    elif _is_sqlalchemy_connection(dbo):
        debug('assuming %r is an instance of '
              'sqlalchemy.engine.base.Connection', dbo)
        _execute_sqlalchemy_connection(sql, dbo, commit)

    elif callable(dbo):
        debug('assuming %r is a function returning standard DB-API 2.0 '
              'cursor objects', dbo)
        _execute_dbapi_mkcurs(sql, dbo, commit)

    # some other sort of duck...
    else:
        raise ArgumentError('unsupported database object type: %r' % dbo)


def _execute_dbapi_connection(sql, connection, commit):
    debug('obtain a cursor')
    cursor = connection.cursor()
    debug('execute SQL')
    cursor.execute(sql)
    debug('close the cursor')
    cursor.close()
    if commit:
        debug('commit transaction')
        connection.commit()


def _execute_dbapi_mkcurs(sql, mkcurs, commit):
    debug('obtain a cursor')
    cursor = mkcurs()
    debug('execute SQL')
    cursor.execute(sql)
    debug('close the cursor')
    cursor.close()
    if commit:
        debug('commit transaction')
        # N.B., we depend on this optional DB-API 2.0 attribute being
        # implemented
        assert hasattr(cursor, 'connection'), \
            'could not obtain connection via cursor'
        connection = cursor.connection
        connection.commit()


def _execute_dbapi_cursor(sql, cursor, commit):
    debug('execute SQL')
    cursor.execute(sql)
    # don't close the cursor, leave that to the application
    if commit:
        debug('commit transaction')
        # N.B., we depend on this optional DB-API 2.0 attribute being
        # implemented
        assert hasattr(cursor, 'connection'), \
            'could not obtain connection via cursor'
        connection = cursor.connection
        connection.commit()


def _execute_sqlalchemy_connection(sql, connection, commit):
    if commit:
        debug('begin transaction')
        trans = connection.begin()
    debug('execute SQL')
    connection.execute(sql)
    if commit:
        debug('commit transaction')
        trans.commit()
    # N.B., don't close connection, leave that to the application


def _execute_sqlalchemy_engine(sql, engine, commit):
    _execute_sqlalchemy_connection(sql, engine.connect(), commit)


def _execute_sqlalchemy_session(sql, session, commit):
    _execute_sqlalchemy_connection(sql, session.connection(), commit)

petl-1.7.15/petl/io/db_utils.py000066400000000000000000000057061457414240700163210ustar00rootroot00000000000000# -*- coding: utf-8 -*-
from __future__ import absolute_import, print_function, division


import logging

from petl.compat import callable


logger = logging.getLogger(__name__)
debug = logger.debug


def _is_dbapi_connection(dbo):
    return _hasmethod(dbo, 'cursor')


def _is_clikchouse_dbapi_connection(dbo):
    return 'clickhouse_driver' in str(type(dbo))


def _is_dbapi_cursor(dbo):
    return _hasmethods(dbo, 'execute', 'executemany', 'fetchone',
                       'fetchmany', 'fetchall')


def _is_sqlalchemy_engine(dbo):
    return (_hasmethods(dbo, 'execute', 'connect', 'raw_connection')
            and _hasprop(dbo, 'driver'))


def _is_sqlalchemy_session(dbo):
    return _hasmethods(dbo, 'execute', 'connection', 'get_bind')


def _is_sqlalchemy_connection(dbo):
    # N.B., these are not completely selective conditions, so this test
    # needs to be applied after ruling out DB-API cursor
    return _hasmethod(dbo, 'execute') and _hasprop(dbo, 'connection')


def _hasmethod(o, n):
    return hasattr(o, n) and callable(getattr(o, n))


def _hasmethods(o, *l):
    return all(_hasmethod(o, n) for n in l)


def _hasprop(o, n):
    return hasattr(o, n) and not callable(getattr(o, n))


# default DB quote char per SQL-92
quotechar = '"'


def _quote(s):
    # crude way to sanitise table and field names
    # conform with the SQL-92 standard.
See http://stackoverflow.com/a/214344 return quotechar + s.replace(quotechar, quotechar+quotechar) + quotechar def _placeholders(connection, names): # discover the paramstyle if connection is None: # default to using question mark debug('connection is None, default to using qmark paramstyle') placeholders = ', '.join(['?'] * len(names)) else: mod = __import__(connection.__class__.__module__) if not hasattr(mod, 'paramstyle'): debug('module %r from connection %r has no attribute paramstyle, ' 'defaulting to qmark', mod, connection) # default to using question mark placeholders = ', '.join(['?'] * len(names)) elif mod.paramstyle == 'qmark': debug('found paramstyle qmark') placeholders = ', '.join(['?'] * len(names)) elif mod.paramstyle in ('format', 'pyformat'): debug('found paramstyle pyformat') placeholders = ', '.join(['%s'] * len(names)) elif mod.paramstyle == 'numeric': debug('found paramstyle numeric') placeholders = ', '.join([':' + str(i + 1) for i in range(len(names))]) elif mod.paramstyle == 'named': debug('found paramstyle named') placeholders = ', '.join([':%s' % name for name in names]) else: debug('found unexpected paramstyle %r, defaulting to qmark', mod.paramstyle) placeholders = ', '.join(['?'] * len(names)) return placeholders petl-1.7.15/petl/io/gsheet.py000066400000000000000000000175371457414240700160020ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division from petl.util.base import Table, iterdata from petl.compat import text_type from petl.errors import ArgumentError as PetlArgError def _get_gspread_client(auth_info): import gspread if isinstance(auth_info, gspread.Client): return auth_info if isinstance(auth_info, dict): gd = gspread.service_account_from_dict(auth_info) return gd import google if isinstance(auth_info, google.oauth2.service_account.Credentials): gc = gspread.authorize(auth_info) return gc if auth_info is None: ga = gspread.service_account() return ga raise PetlArgError("gspread: Invalid account credentials") def _open_spreadsheet(gspread_client, spreadsheet, open_by_key=False): if open_by_key: from gspread.exceptions import SpreadsheetNotFound try: wb = gspread_client.open_by_key(spreadsheet) except SpreadsheetNotFound: wb = gspread_client.open(spreadsheet) elif spreadsheet is not None: wb = gspread_client.open(spreadsheet) else: raise PetlArgError("gspread requires argument spreadsheet") return wb def _select_worksheet(wb, worksheet, find_or_create=False): # Allow for user to specify no sheet, sheet index or sheet name if worksheet is None: ws = wb.sheet1 elif isinstance(worksheet, int): ws = wb.get_worksheet(worksheet) elif isinstance(worksheet, text_type): sheetname = text_type(worksheet) if find_or_create: if worksheet in [wbs.title for wbs in wb.worksheets()]: ws = wb.worksheet(sheetname) else: ws = wb.add_worksheet(sheetname, 1, 1) else: # use text_type for cross version compatibility ws = wb.worksheet(sheetname) else: raise PetlArgError("Only can find worksheet by name or by number") return ws def fromgsheet( credentials_or_client, spreadsheet, worksheet=None, cell_range=None, open_by_key=False ): """ Extract a table from a google spreadsheet. The `credentials_or_client` are used to authenticate with the google apis. For more info, check `authentication`_. The `spreadsheet` can either be the key of the spreadsheet or its name. The `worksheet` argument can be omitted, in which case the first sheet in the workbook is used by default. 
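
    The `worksheet` can also be selected by its zero-based index (a sketch,
    assuming a `client` as in the example below)::

        >>> tbl0 = fromgsheet(client, 'example_spreadsheet', 0) # doctest: +SKIP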
    The `cell_range` argument can be used to provide a range string
    specifying the top left and bottom right corners of a set of cells to
    extract (e.g., ``'A1:C7'``).

    Set `open_by_key` to `True` in order to treat `spreadsheet` as a
    spreadsheet key.

    .. note::
        - Only the top level of google drive will be searched for the
          spreadsheet filename due to API limitations.
        - The worksheet name is case sensitive.

    Example usage follows::

        >>> from petl import fromgsheet
        >>> import gspread # doctest: +SKIP
        >>> client = gspread.service_account() # doctest: +SKIP
        >>> tbl1 = fromgsheet(client, 'example_spreadsheet', 'Sheet1') # doctest: +SKIP
        >>> tbl2 = fromgsheet(client, '9zDNETemfau0uY8ZJF0YzXEPB_5GQ75JV', open_by_key=True) # doctest: +SKIP

    This functionality relies heavily on the work by @burnash and his great
    `gspread module`_.

    .. _gspread module: http://gspread.readthedocs.io/
    .. _authentication: http://gspread.readthedocs.io/en/latest/oauth2.html

    """
    return GoogleSheetView(
        credentials_or_client,
        spreadsheet,
        worksheet,
        cell_range,
        open_by_key,
    )


class GoogleSheetView(Table):
    """Connects to a worksheet and iterates over its rows."""

    def __init__(
        self, credentials_or_client, spreadsheet, worksheet, cell_range,
        open_by_key
    ):
        self.auth_info = credentials_or_client
        self.spreadsheet = spreadsheet
        self.worksheet = worksheet
        self.cell_range = cell_range
        self.open_by_key = open_by_key

    def __iter__(self):
        gspread_client = _get_gspread_client(self.auth_info)
        wb = _open_spreadsheet(gspread_client, self.spreadsheet,
                               self.open_by_key)
        ws = _select_worksheet(wb, self.worksheet)
        # grab the range or grab the whole sheet
        if self.cell_range is not None:
            return self._yield_by_range(ws)
        return self._yield_all_rows(ws)

    def _yield_all_rows(self, ws):
        # no range specified, so return all the rows
        for row in ws.get_all_values():
            yield tuple(row)

    def _yield_by_range(self, ws):
        found = ws.get_values(self.cell_range)
        for row in found:
            yield tuple(row)


def togsheet(
    table, credentials_or_client, spreadsheet, worksheet=None,
    cell_range=None, share_emails=None, role="reader"
):
    """
    Write a table to a new google sheet.

    The `credentials_or_client` are used to authenticate with the google
    apis. For more info, check `authentication`_.

    The `spreadsheet` will be the title of the workbook created in google
    sheets. If there is a spreadsheet with the same title, a new one will
    still be created.

    If `worksheet` is specified, the first worksheet in the spreadsheet
    will be renamed to its value.

    The spreadsheet will be shared with all emails in `share_emails` with
    `role` permissions granted. For more info, check `sharing`_.

    Returns: the spreadsheet key that can be used in `appendgsheet` further.

    .. _sharing: https://developers.google.com/drive/v3/web/manage-sharing

    .. note::
        The `gspread`_ package doesn't support serialization of `date` and
        `datetime` types yet.
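
    As a workaround, such values can be converted to text before loading
    (a sketch, assuming a hypothetical `when` date field in `tbl`)::

        >>> from petl import convert
        >>> tbl = convert(tbl, 'when', lambda d: d.isoformat()) # doctest: +SKIP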
    Example usage::

        >>> from petl import fromcolumns, togsheet
        >>> import gspread # doctest: +SKIP
        >>> client = gspread.service_account() # doctest: +SKIP
        >>> cols = [[0, 1, 2], ['a', 'b', 'c']]
        >>> tbl = fromcolumns(cols)
        >>> togsheet(tbl, client, 'example_spreadsheet') # doctest: +SKIP

    """
    gspread_client = _get_gspread_client(credentials_or_client)
    wb = gspread_client.create(spreadsheet)
    ws = wb.sheet1
    ws.resize(rows=1, cols=1)  # make smallest table possible
    # rename sheet if set
    if worksheet is not None:
        ws.update_title(title=worksheet)
    # gspread indices start at 1, therefore row index insert starts at 1
    ws.append_rows(table, table_range=cell_range)
    # specify the user account to share to
    if share_emails is not None:
        for user_email in share_emails:
            wb.share(user_email, perm_type="user", role=role)
    return wb.id


def appendgsheet(
    table, credentials_or_client, spreadsheet, worksheet=None,
    open_by_key=False, include_header=False
):
    """
    Append a table to an existing google sheet, either as a new worksheet
    or at the end of an existing worksheet.

    The `credentials_or_client` are used to authenticate with the google
    apis. For more info, check `authentication`_.

    The `spreadsheet` is the name of the workbook to append to.

    The `worksheet` is the title of the worksheet to append to or create
    when it does not exist yet.

    Set `open_by_key` to `True` in order to treat `spreadsheet` as a
    spreadsheet key.

    Set `include_header` to `True` to include the fieldnames as the first
    row appended.

    .. note::
        The sheet index cannot be used, and None is not an option.

    """
    gspread_client = _get_gspread_client(credentials_or_client)
    # be able to give filename or key for file
    wb = _open_spreadsheet(gspread_client, spreadsheet, open_by_key)
    # check to see if worksheet exists, if so append, otherwise create
    ws = _select_worksheet(wb, worksheet, True)
    rows = table if include_header else list(iterdata(table))
    ws.append_rows(rows)
    return wb.id


Table.togsheet = togsheet
Table.appendgsheet = appendgsheet
petl-1.7.15/petl/io/html.py000066400000000000000000000207011457414240700154520ustar00rootroot00000000000000# -*- coding: utf-8 -*-
from __future__ import absolute_import, print_function, division


# standard library dependencies
import io

from petl.compat import text_type, numeric_types, next, PY2, izip_longest, \
    string_types, callable


# internal dependencies
from petl.errors import ArgumentError
from petl.util.base import Table, Record
from petl.io.base import getcodec
from petl.io.sources import write_source_from_arg


def tohtml(table, source=None, encoding=None, errors='strict', caption=None,
           vrepr=text_type, lineterminator='\n', index_header=False,
           tr_style=None, td_styles=None, truncate=None):
    """
    Write the table as HTML to a file. E.g.::

        >>> import petl as etl
        >>> table1 = [['foo', 'bar'],
        ...           ['a', 1],
        ...           ['b', 2],
        ...           ['c', 2]]
        >>> etl.tohtml(table1, 'example.html', caption='example table')
        >>> print(open('example.html').read())
        <table class='petl'>
        <caption>example table</caption>
        <thead>
        <tr>
        <th>foo</th>
        <th>bar</th>
        </tr>
        </thead>
        <tbody>
        <tr>
        <td>a</td>
        <td style='text-align: right'>1</td>
        </tr>
        <tr>
        <td>b</td>
        <td style='text-align: right'>2</td>
        </tr>
        <tr>
        <td>c</td>
        <td style='text-align: right'>2</td>
        </tr>
        </tbody>
        </table>
        <BLANKLINE>
    The `caption` keyword argument is used to provide a table caption in the
    output HTML.

    """

    source = write_source_from_arg(source)
    with source.open('wb') as buf:

        # deal with text encoding
        if PY2:
            codec = getcodec(encoding)
            f = codec.streamwriter(buf, errors=errors)
        else:
            f = io.TextIOWrapper(buf, encoding=encoding, errors=errors,
                                 newline='')

        # write the table
        try:
            it = iter(table)

            # write header
            try:
                hdr = next(it)
            except StopIteration:
                hdr = []
            _write_begin(f, hdr, lineterminator, caption, index_header,
                         truncate)

            # write body
            if tr_style and callable(tr_style):
                # wrap as records
                it = (Record(row, hdr) for row in it)
            for row in it:
                _write_row(f, hdr, row, lineterminator, vrepr, tr_style,
                           td_styles, truncate)

            # finish up
            _write_end(f, lineterminator)
            f.flush()

        finally:
            if not PY2:
                f.detach()


Table.tohtml = tohtml


def teehtml(table, source=None, encoding=None, errors='strict', caption=None,
            vrepr=text_type, lineterminator='\n', index_header=False,
            tr_style=None, td_styles=None, truncate=None):
    """
    Return a table that writes rows to a Unicode HTML file as they are
    iterated over.

    """

    source = write_source_from_arg(source)
    return TeeHTMLView(table, source=source, encoding=encoding,
                       errors=errors, caption=caption, vrepr=vrepr,
                       lineterminator=lineterminator,
                       index_header=index_header, tr_style=tr_style,
                       td_styles=td_styles, truncate=truncate)


Table.teehtml = teehtml


class TeeHTMLView(Table):
    def __init__(self, table, source=None, encoding=None, errors='strict',
                 caption=None, vrepr=text_type, lineterminator='\n',
                 index_header=False, tr_style=None, td_styles=None,
                 truncate=None):
        self.table = table
        self.source = source
        self.encoding = encoding
        self.errors = errors
        self.caption = caption
        self.vrepr = vrepr
        self.lineterminator = lineterminator
        self.index_header = index_header
        self.tr_style = tr_style
        self.td_styles = td_styles
        self.truncate = truncate

    def __iter__(self):
        table = self.table
        source = self.source
        encoding = self.encoding
        errors = self.errors
        lineterminator = self.lineterminator
        caption = self.caption
        index_header = self.index_header
        tr_style = self.tr_style
        td_styles = self.td_styles
        vrepr = self.vrepr
        truncate = self.truncate

        with source.open('wb') as buf:

            # deal with text encoding
            if PY2:
                codec = getcodec(encoding)
                f = codec.streamwriter(buf, errors=errors)
            else:
                f = io.TextIOWrapper(buf, encoding=encoding, errors=errors,
                                     newline='')

            # write the table
            try:
                it = iter(table)

                # write header
                try:
                    hdr = next(it)
                    yield hdr
                except StopIteration:
                    hdr = []
                _write_begin(f, hdr, lineterminator, caption, index_header,
                             truncate)

                # write body
                if tr_style and callable(tr_style):
                    # wrap as records
                    it = (Record(row, hdr) for row in it)
                for row in it:
                    _write_row(f, hdr, row, lineterminator, vrepr, tr_style,
                               td_styles, truncate)
                    yield row

                # finish up
                _write_end(f, lineterminator)
                f.flush()

            finally:
                if not PY2:
                    f.detach()


def _write_begin(f, flds, lineterminator, caption, index_header, truncate):
    f.write("<table class='petl'>" + lineterminator)
    if caption is not None:
        f.write(('<caption>%s</caption>' % caption) + lineterminator)
    if flds:
        f.write('<thead>' + lineterminator)
        f.write('<tr>' + lineterminator)
        for i, h in enumerate(flds):
            if index_header:
                h = '%s|%s' % (i, h)
            if truncate:
                h = h[:truncate]
            f.write(('<th>%s</th>' % h) + lineterminator)
        f.write('</tr>' + lineterminator)
        f.write('</thead>' + lineterminator)
    f.write('<tbody>' + lineterminator)


def _write_row(f, flds, row, lineterminator, vrepr, tr_style, td_styles,
               truncate):
    tr_css = _get_tr_css(row, tr_style)
    if tr_css:
        f.write(("<tr style='%s'>" % tr_css) + lineterminator)
    else:
        f.write("<tr>" + lineterminator)
    for h, v in izip_longest(flds, row, fillvalue=None):
        r = vrepr(v)
        if truncate:
            r = r[:truncate]
        td_css = _get_td_css(h, v, td_styles)
        if td_css:
            f.write(("<td style='%s'>%s</td>" % (td_css, r)) +
                    lineterminator)
        else:
            f.write(('<td>%s</td>' % r) + lineterminator)
    f.write('</tr>' + lineterminator)


def _get_tr_css(row, tr_style):
    # check for user-provided style
    if tr_style:
        if isinstance(tr_style, string_types):
            return tr_style
        elif callable(tr_style):
            return tr_style(row)
        else:
            raise ArgumentError('expected string or callable, got %r'
                                % tr_style)
    # fall back to default style
    return ''


def _get_td_css(h, v, td_styles):
    # check for user-provided style
    if td_styles:
        if isinstance(td_styles, string_types):
            return td_styles
        elif callable(td_styles):
            return td_styles(v)
        elif isinstance(td_styles, dict):
            if h in td_styles:
                s = td_styles[h]
                if isinstance(s, string_types):
                    return s
                elif callable(s):
                    return s(v)
                else:
                    raise ArgumentError('expected string or callable, got '
                                        '%r' % s)
        else:
            raise ArgumentError('expected string, callable or dict, got %r'
                                % td_styles)
    # fall back to default style
    if isinstance(v, numeric_types) and not isinstance(v, bool):
        return 'text-align: right'
    else:
        return ''


def _write_end(f, lineterminator):
    f.write('</tbody>' + lineterminator)
    f.write('</table>
' + lineterminator) petl-1.7.15/petl/io/json.py000066400000000000000000000324421457414240700154640ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division # standard library dependencies import io import json import inspect from json.encoder import JSONEncoder from os import unlink from tempfile import NamedTemporaryFile from petl.compat import PY2 from petl.compat import pickle from petl.io.sources import read_source_from_arg, write_source_from_arg # internal dependencies from petl.util.base import data, Table, dicts as _dicts, iterpeek def fromjson(source, *args, **kwargs): """ Extract data from a JSON file. The file must contain a JSON array as the top level object, and each member of the array will be treated as a row of data. E.g.:: >>> import petl as etl >>> data = ''' ... [{"foo": "a", "bar": 1}, ... {"foo": "b", "bar": 2}, ... {"foo": "c", "bar": 2}] ... ''' >>> with open('example.file1.json', 'w') as f: ... f.write(data) ... 74 >>> table1 = etl.fromjson('example.file1.json', header=['foo', 'bar']) >>> table1 +-----+-----+ | foo | bar | +=====+=====+ | 'a' | 1 | +-----+-----+ | 'b' | 2 | +-----+-----+ | 'c' | 2 | +-----+-----+ Setting argument `lines` to `True` will enable to infer the document as a JSON lines document. For more details about JSON lines please visit https://jsonlines.org/. >>> import petl as etl >>> data_with_jlines = '''{"name": "Gilbert", "wins": [["straight", "7S"], ["one pair", "10H"]]} ... {"name": "Alexa", "wins": [["two pair", "4S"], ["two pair", "9S"]]} ... {"name": "May", "wins": []} ... {"name": "Deloise", "wins": [["three of a kind", "5S"]]}''' ... >>> with open('example.file2.json', 'w') as f: ... f.write(data_with_jlines) ... 223 >>> table2 = etl.fromjson('example.file2.json', lines=True) >>> table2 +-----------+-------------------------------------------+ | name | wins | +===========+===========================================+ | 'Gilbert' | [['straight', '7S'], ['one pair', '10H']] | +-----------+-------------------------------------------+ | 'Alexa' | [['two pair', '4S'], ['two pair', '9S']] | +-----------+-------------------------------------------+ | 'May' | [] | +-----------+-------------------------------------------+ | 'Deloise' | [['three of a kind', '5S']] | +-----------+-------------------------------------------+ If your JSON file does not fit this structure, you will need to parse it via :func:`json.load` and select the array to treat as the data, see also :func:`petl.io.json.fromdicts`. .. versionchanged:: 1.1.0 If no `header` is specified, fields will be discovered by sampling keys from the first `sample` objects in `source`. The header will be constructed from keys in the order discovered. Note that this ordering may not be stable, and therefore it may be advisable to specify an explicit `header` or to use another function like :func:`petl.transform.headers.sortheader` on the resulting table to guarantee stability. 
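
    For example, where the array of rows is nested inside a larger document,
    parse the document first and pass the selected array to
    :func:`petl.io.json.fromdicts` (a sketch, assuming the rows live under a
    hypothetical "rows" key)::

        >>> import json
        >>> doc = '{"rows": [{"foo": "a", "bar": 1}, {"foo": "b", "bar": 2}]}'
        >>> table3 = etl.fromdicts(json.loads(doc)['rows'], header=['foo', 'bar'])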
""" source = read_source_from_arg(source) return JsonView(source, *args, **kwargs) class JsonView(Table): def __init__(self, source, *args, **kwargs): self.source = source self.missing = kwargs.pop('missing', None) self.header = kwargs.pop('header', None) self.sample = kwargs.pop('sample', 1000) self.lines = kwargs.pop('lines', False) self.args = args self.kwargs = kwargs def __iter__(self): with self.source.open('rb') as f: if not PY2: # wrap buffer for text IO f = io.TextIOWrapper(f, encoding='utf-8', newline='', write_through=True) try: if self.lines: for row in iterjlines(f, self.header, self.missing): yield row else: dicts = json.load(f, *self.args, **self.kwargs) for row in iterdicts(dicts, self.header, self.sample, self.missing): yield row finally: if not PY2: f.detach() def fromdicts(dicts, header=None, sample=1000, missing=None): """ View a sequence of Python :class:`dict` as a table. E.g.:: >>> import petl as etl >>> dicts = [{"foo": "a", "bar": 1}, ... {"foo": "b", "bar": 2}, ... {"foo": "c", "bar": 2}] >>> table1 = etl.fromdicts(dicts, header=['foo', 'bar']) >>> table1 +-----+-----+ | foo | bar | +=====+=====+ | 'a' | 1 | +-----+-----+ | 'b' | 2 | +-----+-----+ | 'c' | 2 | +-----+-----+ Argument `dicts` can also be a generator, the output of generator is iterated and cached using a temporary file to support further transforms and multiple passes of the table: >>> import petl as etl >>> dicts = ({"foo": chr(ord("a")+i), "bar":i+1} for i in range(3)) >>> table1 = etl.fromdicts(dicts, header=['foo', 'bar']) >>> table1 +-----+-----+ | foo | bar | +=====+=====+ | 'a' | 1 | +-----+-----+ | 'b' | 2 | +-----+-----+ | 'c' | 3 | +-----+-----+ If `header` is not specified, `sample` items from `dicts` will be inspected to discovery dictionary keys. Note that the order in which dictionary keys are discovered may not be stable, See also :func:`petl.io.json.fromjson`. .. versionchanged:: 1.1.0 If no `header` is specified, fields will be discovered by sampling keys from the first `sample` dictionaries in `dicts`. The header will be constructed from keys in the order discovered. Note that this ordering may not be stable, and therefore it may be advisable to specify an explicit `header` or to use another function like :func:`petl.transform.headers.sortheader` on the resulting table to guarantee stability. .. versionchanged:: 1.7.5 Full support of generators passed as `dicts` has been added, leveraging `itertools.tee`. .. versionchanged:: 1.7.11 Generator support has been modified to use temporary file cache instead of `itertools.tee` due to high memory usage. 
""" view = DictsGeneratorView if inspect.isgenerator(dicts) else DictsView return view(dicts, header=header, sample=sample, missing=missing) class DictsView(Table): def __init__(self, dicts, header=None, sample=1000, missing=None): self.dicts = dicts self._header = header self.sample = sample self.missing = missing def __iter__(self): return iterdicts(self.dicts, self._header, self.sample, self.missing) class DictsGeneratorView(DictsView): def __init__(self, dicts, header=None, sample=1000, missing=None): super(DictsGeneratorView, self).__init__(dicts, header, sample, missing) self._filecache = None self._cached = 0 def __iter__(self): if not self._header: self._determine_header() yield self._header if not self._filecache: if PY2: self._filecache = NamedTemporaryFile(delete=False, mode='wb+', bufsize=0) else: self._filecache = NamedTemporaryFile(delete=False, mode='wb+', buffering=0) position = 0 it = iter(self.dicts) while True: if position < self._cached: self._filecache.seek(position) row = pickle.load(self._filecache) position = self._filecache.tell() yield row continue try: o = next(it) except StopIteration: break row = tuple(o.get(f, self.missing) for f in self._header) self._filecache.seek(self._cached) pickle.dump(row, self._filecache, protocol=-1) self._cached = position = self._filecache.tell() yield row def _determine_header(self): it = iter(self.dicts) header = list() peek, it = iterpeek(it, self.sample) self.dicts = it if isinstance(peek, dict): peek = [peek] for o in peek: if hasattr(o, 'keys'): header += [k for k in o.keys() if k not in header] self._header = tuple(header) return it def __del__(self): if self._filecache: self._filecache.close() unlink(self._filecache.name) def iterjlines(f, header, missing): it = iter(f) if header is None: header = list() peek, it = iterpeek(it, 1) json_obj = json.loads(peek) if hasattr(json_obj, 'keys'): header += [k for k in json_obj.keys() if k not in header] yield tuple(header) for o in it: json_obj = json.loads(o) yield tuple(json_obj[f] if f in json_obj else missing for f in header) def iterdicts(dicts, header, sample, missing): it = iter(dicts) # determine header row if header is None: # discover fields header = list() peek, it = iterpeek(it, sample) for o in peek: if hasattr(o, 'keys'): header += [k for k in o.keys() if k not in header] yield tuple(header) # generate data rows for o in it: yield tuple(o.get(f, missing) for f in header) def tojson(table, source=None, prefix=None, suffix=None, *args, **kwargs): """ Write a table in JSON format, with rows output as JSON objects. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar'], ... ['a', 1], ... ['b', 2], ... ['c', 2]] >>> etl.tojson(table1, 'example.file3.json', sort_keys=True) >>> # check what it did ... print(open('example.file3.json').read()) [{"bar": 1, "foo": "a"}, {"bar": 2, "foo": "b"}, {"bar": 2, "foo": "c"}] Setting argument `lines` to `True` will enable to infer the writing format as a JSON lines . For more details about JSON lines please visit https://jsonlines.org/. >>> import petl as etl >>> table1 = [['name', 'wins'], ... ['Gilbert', [['straight', '7S'], ['one pair', '10H']]], ... ['Alexa', [['two pair', '4S'], ['two pair', '9S']]], ... ['May', []], ... ['Deloise',[['three of a kind', '5S']]]] >>> etl.tojson(table1, 'example.file3.jsonl', lines = True, sort_keys=True) >>> # check what it did ... 
print(open('example.file3.jsonl').read()) {"name": "Gilbert", "wins": [["straight", "7S"], ["one pair", "10H"]]} {"name": "Alexa", "wins": [["two pair", "4S"], ["two pair", "9S"]]} {"name": "May", "wins": []} {"name": "Deloise", "wins": [["three of a kind", "5S"]]} Note that this is currently not streaming, all data is loaded into memory before being written to the file. """ obj = list(_dicts(table)) _writejson(source, obj, prefix, suffix, *args, **kwargs) Table.tojson = tojson def tojsonarrays(table, source=None, prefix=None, suffix=None, output_header=False, *args, **kwargs): """ Write a table in JSON format, with rows output as JSON arrays. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar'], ... ['a', 1], ... ['b', 2], ... ['c', 2]] >>> etl.tojsonarrays(table1, 'example.file4.json') >>> # check what it did ... print(open('example.file4.json').read()) [["a", 1], ["b", 2], ["c", 2]] Note that this is currently not streaming, all data is loaded into memory before being written to the file. """ if output_header: obj = list(table) else: obj = list(data(table)) _writejson(source, obj, prefix, suffix, *args, **kwargs) Table.tojsonarrays = tojsonarrays def _writejson(source, obj, prefix, suffix, *args, **kwargs): lines = kwargs.pop('lines', False) encoder = JSONEncoder(*args, **kwargs) source = write_source_from_arg(source) with source.open('wb') as f: if PY2: # write directly to buffer _writeobj(encoder, obj, f, prefix, suffix, lines=lines) else: # wrap buffer for text IO f = io.TextIOWrapper(f, encoding='utf-8', newline='', write_through=True) try: _writeobj(encoder, obj, f, prefix, suffix, lines=lines) finally: f.detach() def _writeobj(encoder, obj, f, prefix, suffix, lines=False): if prefix is not None: f.write(prefix) if lines: for rec in obj: for chunk in encoder.iterencode(rec): f.write(chunk) f.write('\n') else: for chunk in encoder.iterencode(obj): f.write(chunk) if suffix is not None: f.write(suffix) petl-1.7.15/petl/io/numpy.py000066400000000000000000000122031457414240700156540ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import division, print_function, absolute_import from petl.compat import next, string_types from petl.util.base import iterpeek, ValuesView, Table from petl.util.materialise import columns def infer_dtype(table): import numpy as np # get numpy to infer dtype it = iter(table) hdr = next(it) flds = list(map(str, hdr)) rows = tuple(it) dtype = np.rec.array(rows).dtype dtype.names = flds return dtype def construct_dtype(flds, peek, dtype): import numpy as np if dtype is None: dtype = infer_dtype(peek) elif isinstance(dtype, string_types): # insert field names from source table typestrings = [s.strip() for s in dtype.split(',')] dtype = [(f, t) for f, t in zip(flds, typestrings)] elif (isinstance(dtype, dict) and ('names' not in dtype or 'formats' not in dtype)): # allow for partial specification of dtype cols = columns(peek) newdtype = {'names': [], 'formats': []} for f in flds: newdtype['names'].append(f) if f in dtype and isinstance(dtype[f], tuple): # assume fully specified newdtype['formats'].append(dtype[f][0]) elif f not in dtype: # not specified at all a = np.array(cols[f]) newdtype['formats'].append(a.dtype) else: # assume directly specified, just need to add offset newdtype['formats'].append(dtype[f]) dtype = newdtype return dtype def toarray(table, dtype=None, count=-1, sample=1000): """ Load data from the given `table` into a `numpy `_ structured array. E.g.:: >>> import petl as etl >>> table = [('foo', 'bar', 'baz'), ... 
('apples', 1, 2.5),
        ...          ('oranges', 3, 4.4),
        ...          ('pears', 7, .1)]
        >>> a = etl.toarray(table)
        >>> a
        array([('apples', 1, 2.5), ('oranges', 3, 4.4), ('pears', 7, 0.1)],
              dtype=(numpy.record, [('foo', '<U7'), ('bar', '<i8'), ('baz', '<f8')]))
        >>> # the dtype can be specified as a string
        ... a = etl.toarray(table, dtype='a4, i2, f4')
        >>> a
        array([(b'appl', 1, 2.5), (b'oran', 3, 4.4), (b'pear', 7, 0.1)],
              dtype=[('foo', 'S4'), ('bar', '<i2'), ('baz', '<f4')])
        >>> # the dtype can also be partially specified
        ... a = etl.toarray(table, dtype={'foo': 'a4'})
        >>> a
        array([(b'appl', 1, 2.5), (b'oran', 3, 4.4), (b'pear', 7, 0.1)],
              dtype=[('foo', 'S4'), ('bar', '<i8'), ('baz', '<f8')])

    """

    import numpy as np
    it = iter(table)
    peek, it = iterpeek(it, sample)
    hdr = next(it)
    flds = list(map(str, hdr))
    dtype = construct_dtype(flds, peek, dtype)
    # numpy expects tuples for structured rows
    it = (tuple(row) for row in it)
    sa = np.fromiter(it, dtype=dtype, count=count)
    return sa


Table.toarray = toarray


def torecarray(*args, **kwargs):
    """
    Convenient shorthand for ``toarray(*args, **kwargs).view(np.recarray)``.

    """

    import numpy as np
    return toarray(*args, **kwargs).view(np.recarray)


Table.torecarray = torecarray


def fromarray(a):
    """
    Extract a table from a `numpy <http://www.numpy.org/>`_ structured
    array, e.g.::

        >>> import petl as etl
        >>> import numpy as np
        >>> a = np.array([('apples', 1, 2.5),
        ...               ('oranges', 3, 4.4),
        ...               ('pears', 7, 0.1)],
        ...              dtype='U8, i4,f4')
        >>> table = etl.fromarray(a)
        >>> table
        +-----------+----+-----+
        | f0        | f1 | f2  |
        +===========+====+=====+
        | 'apples'  | 1  | 2.5 |
        +-----------+----+-----+
        | 'oranges' | 3  | 4.4 |
        +-----------+----+-----+
        | 'pears'   | 7  | 0.1 |
        +-----------+----+-----+

    """

    return ArrayView(a)


class ArrayView(Table):

    def __init__(self, a):
        self.a = a

    def __iter__(self):
        yield tuple(self.a.dtype.names)
        for row in self.a:
            yield tuple(row)


def valuestoarray(vals, dtype=None, count=-1, sample=1000):
    """
    Load values from a table column into a `numpy <http://www.numpy.org/>`_
    array, e.g.::

        >>> import petl as etl
        >>> table = [('foo', 'bar', 'baz'),
        ...          ('apples', 1, 2.5),
        ...          ('oranges', 3, 4.4),
        ...          ('pears', 7, .1)]
        >>> table = etl.wrap(table)
        >>> table.values('bar').array()
        array([1, 3, 7])
        >>> # specify dtype
        ... table.values('bar').array(dtype='i4')
        array([1, 3, 7], dtype=int32)

    """

    import numpy as np
    it = iter(vals)
    if dtype is None:
        peek, it = iterpeek(it, sample)
        dtype = np.array(peek).dtype
    a = np.fromiter(it, dtype=dtype, count=count)
    return a


ValuesView.toarray = valuestoarray
ValuesView.array = valuestoarray
petl-1.7.15/petl/io/pandas.py000066400000000000000000000052101457414240700157520ustar00rootroot00000000000000# -*- coding: utf-8 -*-
from __future__ import division, print_function, absolute_import


import inspect

from petl.util.base import Table


def todataframe(table, index=None, exclude=None, columns=None,
                coerce_float=False, nrows=None):
    """
    Load data from the given `table` into a `pandas
    <http://pandas.pydata.org/>`_ DataFrame. E.g.::

        >>> import petl as etl
        >>> table = [('foo', 'bar', 'baz'),
        ...          ('apples', 1, 2.5),
        ...          ('oranges', 3, 4.4),
        ...          ('pears', 7, .1)]
        >>> df = etl.todataframe(table)
        >>> df
               foo  bar  baz
        0   apples    1  2.5
        1  oranges    3  4.4
        2    pears    7  0.1

    """
    import pandas as pd
    it = iter(table)
    try:
        header = next(it)
    except StopIteration:
        header = None  # will create an empty DataFrame
    if columns is None:
        columns = header
    return pd.DataFrame.from_records(it, index=index, exclude=exclude,
                                     columns=columns,
                                     coerce_float=coerce_float, nrows=nrows)


Table.todataframe = todataframe
Table.todf = todataframe


def fromdataframe(df, include_index=False):
    """
    Extract a table from a `pandas <http://pandas.pydata.org/>`_ DataFrame.
E.g.:: >>> import petl as etl >>> import pandas as pd >>> records = [('apples', 1, 2.5), ('oranges', 3, 4.4), ('pears', 7, 0.1)] >>> df = pd.DataFrame.from_records(records, columns=('foo', 'bar', 'baz')) >>> table = etl.fromdataframe(df) >>> table +-----------+-----+-----+ | foo | bar | baz | +===========+=====+=====+ | 'apples' | 1 | 2.5 | +-----------+-----+-----+ | 'oranges' | 3 | 4.4 | +-----------+-----+-----+ | 'pears' | 7 | 0.1 | +-----------+-----+-----+ """ return DataFrameView(df, include_index=include_index) class DataFrameView(Table): def __init__(self, df, include_index=False): assert hasattr(df, 'columns') \ and hasattr(df, 'iterrows') \ and inspect.ismethod(df.iterrows), \ 'bad argument, expected pandas.DataFrame, found %r' % df self.df = df self.include_index = include_index def __iter__(self): if self.include_index: yield ('index',) + tuple(self.df.columns) for i, row in self.df.iterrows(): yield (i,) + tuple(row) else: yield tuple(self.df.columns) for _, row in self.df.iterrows(): yield tuple(row) petl-1.7.15/petl/io/pickle.py000066400000000000000000000106611457414240700157610ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division # standard library dependencies from petl.compat import pickle, next # internal dependencies from petl.util.base import Table from petl.io.sources import read_source_from_arg, write_source_from_arg def frompickle(source=None): """ Extract a table From data pickled in the given file. The rows in the table should have been pickled to the file one at a time. E.g.:: >>> import petl as etl >>> import pickle >>> # set up a file to demonstrate with ... with open('example.p', 'wb') as f: ... pickle.dump(['foo', 'bar'], f) ... pickle.dump(['a', 1], f) ... pickle.dump(['b', 2], f) ... pickle.dump(['c', 2.5], f) ... >>> # now demonstrate the use of frompickle() ... table1 = etl.frompickle('example.p') >>> table1 +-----+-----+ | foo | bar | +=====+=====+ | 'a' | 1 | +-----+-----+ | 'b' | 2 | +-----+-----+ | 'c' | 2.5 | +-----+-----+ """ source = read_source_from_arg(source) return PickleView(source) class PickleView(Table): def __init__(self, source): self.source = source def __iter__(self): with self.source.open('rb') as f: try: while True: yield tuple(pickle.load(f)) except EOFError: pass def topickle(table, source=None, protocol=-1, write_header=True): """ Write the table to a pickle file. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar'], ... ['a', 1], ... ['b', 2], ... ['c', 2]] >>> etl.topickle(table1, 'example.p') >>> # look what it did ... table2 = etl.frompickle('example.p') >>> table2 +-----+-----+ | foo | bar | +=====+=====+ | 'a' | 1 | +-----+-----+ | 'b' | 2 | +-----+-----+ | 'c' | 2 | +-----+-----+ Note that if a file already exists at the given location, it will be overwritten. The pickle file format preserves type information, i.e., reading and writing is round-trippable for tables with non-string data values. """ _writepickle(table, source=source, mode='wb', protocol=protocol, write_header=write_header) Table.topickle = topickle def appendpickle(table, source=None, protocol=-1, write_header=False): """ Append data to an existing pickle file. I.e., as :func:`petl.io.pickle.topickle` but the file is opened in append mode. Note that no attempt is made to check that the fields or row lengths are consistent with the existing data, the data rows from the table are simply appended to the file. 
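
    For example (a sketch, reusing the 'example.p' file written by the
    :func:`petl.io.pickle.topickle` example above)::

        >>> import petl as etl
        >>> table3 = [['foo', 'bar'],
        ...           ['d', 7]]
        >>> etl.appendpickle(table3, 'example.p')  # doctest: +SKIP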
""" _writepickle(table, source=source, mode='ab', protocol=protocol, write_header=write_header) Table.appendpickle = appendpickle def _writepickle(table, source, mode, protocol, write_header): source = write_source_from_arg(source, mode) with source.open(mode) as f: it = iter(table) try: hdr = next(it) except StopIteration: return if write_header: pickle.dump(hdr, f, protocol) for row in it: pickle.dump(row, f, protocol) def teepickle(table, source=None, protocol=-1, write_header=True): """ Return a table that writes rows to a pickle file as they are iterated over. """ return TeePickleView(table, source=source, protocol=protocol, write_header=write_header) Table.teepickle = teepickle class TeePickleView(Table): def __init__(self, table, source=None, protocol=-1, write_header=True): self.table = table self.source = source self.protocol = protocol self.write_header = write_header def __iter__(self): protocol = self.protocol source = write_source_from_arg(self.source) with source.open('wb') as f: it = iter(self.table) try: hdr = next(it) except StopIteration: return if self.write_header: pickle.dump(hdr, f, protocol) yield tuple(hdr) for row in it: pickle.dump(row, f, protocol) yield tuple(row) petl-1.7.15/petl/io/pytables.py000066400000000000000000000304601457414240700163340ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division from contextlib import contextmanager from petl.compat import string_types from petl.errors import ArgumentError from petl.util.base import Table, iterpeek, data from petl.io.numpy import infer_dtype def fromhdf5(source, where=None, name=None, condition=None, condvars=None, start=None, stop=None, step=None): """ Provides access to an HDF5 table. E.g.:: >>> import petl as etl >>> >>> # set up a new hdf5 table to demonstrate with >>> class FooBar(tables.IsDescription): # doctest: +SKIP ... foo = tables.Int32Col(pos=0) # doctest: +SKIP ... bar = tables.StringCol(6, pos=2) # doctest: +SKIP >>> # >>> def setup_hdf5_table(): ... import tables ... h5file = tables.open_file('example.h5', mode='w', ... title='Example file') ... h5file.create_group('/', 'testgroup', 'Test Group') ... h5table = h5file.create_table('/testgroup', 'testtable', FooBar, ... 'Test Table') ... # load some data into the table ... table1 = (('foo', 'bar'), ... (1, b'asdfgh'), ... (2, b'qwerty'), ... (3, b'zxcvbn')) ... for row in table1[1:]: ... for i, f in enumerate(table1[0]): ... h5table.row[f] = row[i] ... h5table.row.append() ... h5file.flush() ... h5file.close() >>> >>> setup_hdf5_table() # doctest: +SKIP >>> >>> # now demonstrate use of fromhdf5 >>> table1 = etl.fromhdf5('example.h5', '/testgroup', 'testtable') # doctest: +SKIP >>> table1 # doctest: +SKIP +-----+-----------+ | foo | bar | +=====+===========+ | 1 | b'asdfgh' | +-----+-----------+ | 2 | b'qwerty' | +-----+-----------+ | 3 | b'zxcvbn' | +-----+-----------+ >>> # alternatively just specify path to table node ... table1 = etl.fromhdf5('example.h5', '/testgroup/testtable') # doctest: +SKIP >>> # ...or use an existing tables.File object ... h5file = tables.open_file('example.h5') # doctest: +SKIP >>> table1 = etl.fromhdf5(h5file, '/testgroup/testtable') # doctest: +SKIP >>> # ...or use an existing tables.Table object ... h5tbl = h5file.get_node('/testgroup/testtable') # doctest: +SKIP >>> table1 = etl.fromhdf5(h5tbl) # doctest: +SKIP >>> # use a condition to filter data ... 
table2 = etl.fromhdf5(h5tbl, condition='foo < 3') # doctest: +SKIP >>> table2 # doctest: +SKIP +-----+-----------+ | foo | bar | +=====+===========+ | 1 | b'asdfgh' | +-----+-----------+ | 2 | b'qwerty' | +-----+-----------+ >>> h5file.close() # doctest: +SKIP """ return HDF5View(source, where=where, name=name, condition=condition, condvars=condvars, start=start, stop=stop, step=step) class HDF5View(Table): def __init__(self, source, where=None, name=None, condition=None, condvars=None, start=None, stop=None, step=None): self.source = source self.where = where self.name = name self.condition = condition self.condvars = condvars self.start = start self.stop = stop self.step = step def __iter__(self): return iterhdf5(self.source, self.where, self.name, self.condition, self.condvars, self.start, self.stop, self.step) @contextmanager def _get_hdf5_table(source, where, name, mode='r'): import tables needs_closing = False h5file = None # allow for polymorphic args if isinstance(source, tables.Table): # source is a table h5tbl = source elif isinstance(source, string_types): # assume source is the name of an HDF5 file, try to open it h5file = tables.open_file(source, mode=mode) needs_closing = True h5tbl = h5file.get_node(where, name=name) elif isinstance(source, tables.File): # source is an HDF5 file object h5file = source h5tbl = h5file.get_node(where, name=name) else: # invalid source raise ArgumentError('invalid source argument, expected file name or ' 'tables.File or tables.Table object, found: %r' % source) try: yield h5tbl finally: # tidy up if needs_closing: h5file.close() @contextmanager def _get_hdf5_file(source, mode='r'): import tables needs_closing = False # allow for polymorphic args if isinstance(source, string_types): # assume source is the name of an HDF5 file, try to open it h5file = tables.open_file(source, mode=mode) needs_closing = True elif isinstance(source, tables.File): # source is an HDF5 file object h5file = source else: # invalid source raise ArgumentError('invalid source argument, expected file name or ' 'tables.File object, found: %r' % source) try: yield h5file finally: if needs_closing: h5file.close() def iterhdf5(source, where, name, condition, condvars, start, stop, step): with _get_hdf5_table(source, where, name) as h5tbl: # header row hdr = tuple(h5tbl.colnames) yield hdr # determine how to iterate over the table if condition is not None: it = h5tbl.where(condition, condvars=condvars, start=start, stop=stop, step=step) else: it = h5tbl.iterrows(start=start, stop=stop, step=step) # data rows for row in it: yield row[:] # access row as a tuple def fromhdf5sorted(source, where=None, name=None, sortby=None, checkCSI=False, start=None, stop=None, step=None): """ Provides access to an HDF5 table, sorted by an indexed column, e.g.:: >>> import petl as etl >>> >>> # set up a new hdf5 table to demonstrate with >>> class FooBar(tables.IsDescription): # doctest: +SKIP ... foo = tables.Int32Col(pos=0) # doctest: +SKIP ... bar = tables.StringCol(6, pos=2) # doctest: +SKIP >>> >>> def setup_hdf5_index(): ... import tables ... h5file = tables.open_file('example.h5', mode='w', ... title='Example file') ... h5file.create_group('/', 'testgroup', 'Test Group') ... h5table = h5file.create_table('/testgroup', 'testtable', FooBar, ... 'Test Table') ... # load some data into the table, deliberately not in sorted order ... table1 = (('foo', 'bar'), ... (3, b'asdfgh'), ... (2, b'qwerty'), ... (1, b'zxcvbn')) ... for row in table1[1:]: ... for i, f in enumerate(table1[0]): ... h5table.row[f] = row[i] ... h5table.row.append() ... h5table.cols.foo.create_csindex() # CS index is required ... h5file.flush() ... h5file.close() >>> >>> setup_hdf5_index() # doctest: +SKIP >>> ... # access the data, sorted by the indexed column ... table2 = etl.fromhdf5sorted('example.h5', '/testgroup', 'testtable', sortby='foo') # doctest: +SKIP >>> table2 # doctest: +SKIP +-----+-----------+ | foo | bar | +=====+===========+ | 1 | b'zxcvbn' | +-----+-----------+ | 2 | b'qwerty' | +-----+-----------+ | 3 | b'asdfgh' | +-----+-----------+ """ assert sortby is not None, 'no column specified to sort by' return HDF5SortedView(source, where=where, name=name, sortby=sortby, checkCSI=checkCSI, start=start, stop=stop, step=step) class HDF5SortedView(Table): def __init__(self, source, where=None, name=None, sortby=None, checkCSI=False, start=None, stop=None, step=None): self.source = source self.where = where self.name = name self.sortby = sortby self.checkCSI = checkCSI self.start = start self.stop = stop self.step = step def __iter__(self): return iterhdf5sorted(self.source, self.where, self.name, self.sortby, self.checkCSI, self.start, self.stop, self.step) def iterhdf5sorted(source, where, name, sortby, checkCSI, start, stop, step): with _get_hdf5_table(source, where, name) as h5tbl: # header row hdr = tuple(h5tbl.colnames) yield hdr it = h5tbl.itersorted(sortby, checkCSI=checkCSI, start=start, stop=stop, step=step) for row in it: yield row[:] # access row as a tuple def tohdf5(table, source, where=None, name=None, create=False, drop=False, description=None, title='', filters=None, expectedrows=10000, chunkshape=None, byteorder=None, createparents=False, sample=1000): """ Write to an HDF5 table. If `create` is `False`, assumes the table already exists, and attempts to truncate it before loading. If `create` is `True`, a new table will be created, and if `drop` is `True`, any existing table will be dropped first. If `description` is `None`, the description will be guessed. E.g.:: >>> import petl as etl >>> table1 = (('foo', 'bar'), ... (1, b'asdfgh'), ... (2, b'qwerty'), ... (3, b'zxcvbn')) >>> etl.tohdf5(table1, 'example.h5', '/testgroup', 'testtable', ... drop=True, create=True, createparents=True) # doctest: +SKIP >>> etl.fromhdf5('example.h5', '/testgroup', 'testtable') # doctest: +SKIP +-----+-----------+ | foo | bar | +=====+===========+ | 1 | b'asdfgh' | +-----+-----------+ | 2 | b'qwerty' | +-----+-----------+ | 3 | b'zxcvbn' | +-----+-----------+ """ import tables it = iter(table) if create: with _get_hdf5_file(source, mode='a') as h5file: if drop: try: h5file.get_node(where, name) except tables.NoSuchNodeError: pass else: h5file.remove_node(where, name) # determine datatype if description is None: peek, it = iterpeek(it, sample) # use a numpy dtype description = infer_dtype(peek) # create the table h5file.create_table(where, name, description, title=title, filters=filters, expectedrows=expectedrows, chunkshape=chunkshape, byteorder=byteorder, createparents=createparents) with _get_hdf5_table(source, where, name, mode='a') as h5table: # truncate the existing table h5table.truncate(0) # load the data _insert(it, h5table) Table.tohdf5 = tohdf5 def appendhdf5(table, source, where=None, name=None): """ As :func:`petl.io.pytables.tohdf5` but don't truncate the target table before loading.
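For example (an editor's sketch, not part of the original petl docs; assumes the 'example.h5' table created in the tohdf5() example above already exists):: >>> etl.appendhdf5(table1, 'example.h5', '/testgroup', 'testtable') # doctest: +SKIP >>> etl.fromhdf5('example.h5', '/testgroup', 'testtable').nrows() # doctest: +SKIP 6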
""" with _get_hdf5_table(source, where, name, mode='a') as h5table: # load the data _insert(table, h5table) Table.appendhdf5 = appendhdf5 def _insert(table, h5table): it = data(table) # don't need header for row in it: for i, f in enumerate(h5table.colnames): # depends on order of fields being the same in input table # and hd5 table, but field names don't need to match h5table.row[f] = row[i] h5table.row.append() h5table.flush() petl-1.7.15/petl/io/remotes.py000066400000000000000000000204661457414240700161740ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division import logging import sys from contextlib import contextmanager from petl.compat import PY3 from petl.io.sources import register_reader, register_writer, get_reader, get_writer logger = logging.getLogger(__name__) # region RemoteSource class RemoteSource(object): """Read or write directly from files in remote filesystems. This source handles many filesystems that are selected based on the protocol passed in the `url` argument. The url should be specified in `to..()` and `from...()` functions. E.g.:: >>> import petl as etl >>> >>> def example_s3(): ... url = 's3://mybucket/prefix/to/myfilename.csv' ... data = b'foo,bar\\na,1\\nb,2\\nc,2\\n' ... ... etl.tocsv(data, url) ... tbl = etl.fromcsv(url) ... >>> example_s3() # doctest: +SKIP +-----+-----+ | foo | bar | +=====+=====+ | 'a' | '1' | +-----+-----+ | 'b' | '2' | +-----+-----+ | 'c' | '2' | +-----+-----+ This source uses `fsspec`_ to provide the data transfer with the remote filesystem. Check the `Built-in Implementations `_ for available remote implementations. Some filesystem can use `URL chaining `_ for compound I/O. .. note:: For working this source require `fsspec`_ to be installed, e.g.:: $ pip install fsspec Some remote filesystems require aditional packages to be installed. Check `Known Implementations `_ for checking what packages need to be installed, e.g.:: $ pip install s3fs # AWS S3 $ pip install gcsfs # Google Cloud Storage $ pip install adlfs # Azure Blob service $ pip install paramiko # SFTP $ pip install requests # HTTP, github .. versionadded:: 1.6.0 .. _fsspec: https://filesystem-spec.readthedocs.io/en/latest/ .. _fs_builtin: https://filesystem-spec.readthedocs.io/en/latest/api.html#built-in-implementations .. _fs_known: https://filesystem-spec.readthedocs.io/en/latest/api.html#other-known-implementations .. 
_fs_chain: https://filesystem-spec.readthedocs.io/en/latest/features.html#url-chaining """ def __init__(self, url, **kwargs): self.url = url self.kwargs = kwargs def open_file(self, mode="rb"): import fsspec # auto_mkdir=True can fail in some filesystems or without permission for full path # E.g.: s3fs tries to create a bucket when writing into a folder that does not exist fs = fsspec.open(self.url, mode=mode, compression='infer', auto_mkdir=False, **self.kwargs) return fs @contextmanager def open(self, mode="rb"): mode2 = mode[:1] + r"b" # python2 fs = self.open_file(mode=mode2) with fs as source: yield source # registering filesystems with packages installed def _register_filesystems(only_available=False): """Register all known fsspec implementations as remote sources.""" from fsspec.registry import known_implementations, registry # https://filesystem-spec.readthedocs.io/en/latest/api.html#built-in-implementations # https://filesystem-spec.readthedocs.io/en/latest/api.html#other-known-implementations _register_filesystems_from(known_implementations, only_available) # https://filesystem-spec.readthedocs.io/en/latest/api.html#fsspec.registry.register_implementation _register_filesystems_from(registry, only_available) def _register_filesystems_from(fsspec_registry, only_available): """Register each fsspec provider from this registry as a remote source.""" for protocol, spec in fsspec_registry.items(): missing_deps = isinstance(spec, dict) and "err" in spec if missing_deps and only_available: # this could leave only built-in implementations available # Other Known Implementations are reported with 'err' even when # the package is installed continue # When a package for fsspec is missing, use the source already available in petl # E.g.: fsspec requires the `requests` package installed for handling http and https # but petl has URLSource that can work with urllib has_reader = get_reader(protocol) if not missing_deps or has_reader is None: register_reader(protocol, RemoteSource) has_writer = get_writer(protocol) if not missing_deps or has_writer is None: register_writer(protocol, RemoteSource) def _try_register_filesystems(): try: # pylint: disable=unused-import import fsspec # noqa: F401 except ImportError: logger.debug("# Missing fsspec package. Install with: pip install fsspec") else: try: _register_filesystems() except Exception as ex: raise ImportError("# ERROR: failed to register fsspec filesystems", ex) if PY3: _try_register_filesystems() # endregion # region SMBSource class SMBSource(object): """Downloads or uploads to Windows and Samba network drives. E.g.:: >>> def example_smb(): ... import petl as etl ... url = 'smb://user:password@server/share/folder/file.csv' ... data = b'foo,bar\\na,1\\nb,2\\nc,2\\n' ... etl.tocsv(data, url) ... tbl = etl.fromcsv(url) ... >>> example_smb() # doctest: +SKIP +-----+-----+ | foo | bar | +=====+=====+ | 'a' | '1' | +-----+-----+ | 'b' | '2' | +-----+-----+ | 'c' | '2' | +-----+-----+ The argument `url` (str) must have a URI with format: `smb://workgroup;user:password@server:port/share/folder/file.csv`. Note that you need to pass in a valid hostname or IP address for the host component of the URL. Do not use the Windows/NetBIOS machine name for the host component. The first component of the path in the URL points to the name of the shared folder. Subsequent path components will point to the directory/folder/file. .. note:: For this source to work, `smbprotocol`_ must be installed, e.g.:: $ pip install smbprotocol[kerberos] .. versionadded:: 1.5.0 .. 
_smbprotocol: https://github.com/jborean93/smbprotocol#requirements """ def __init__(self, url, **kwargs): self.url = url self.kwargs = kwargs @contextmanager def open(self, mode="rb"): mode2 = mode[:1] + r"b" # python2 source = _open_file_smbprotocol(self.url, mode=mode2, **self.kwargs) try: yield source finally: source.close() def _open_file_smbprotocol(url, mode="rb", **kwargs): _domain, host, port, user, passwd, server_path = _parse_smb_url(url) import smbclient try: # register the server with explicit credentials if user: smbclient.register_session( host, username=user, password=passwd, port=port ) # Read an existing file as bytes mode2 = mode[:1] + r"b" filehandle = smbclient.open_file(server_path, mode=mode2, **kwargs) return filehandle except Exception as ex: raise ConnectionError("SMB error: %s" % ex).with_traceback(sys.exc_info()[2]) def _parse_smb_url(url): e = "SMB url must be smb://workgroup;user:password@server:port/share/folder/file.txt: " if not url: raise ValueError("SMB error: no host given") if not url.startswith("smb://"): raise ValueError(e + url) if PY3: from urllib.parse import urlparse else: from urlparse import urlparse parsed = urlparse(url) if not parsed.path: raise ValueError(e + url) unc_path = parsed.path.replace("/", "\\") server_path = "\\\\{}{}".format(parsed.hostname, unc_path) if not parsed.username: domain = None username = None elif ";" in parsed.username: domain, username = parsed.username.split(";") else: domain, username = None, parsed.username port = 445 if not parsed.port else int(parsed.port) return domain, parsed.hostname, port, username, parsed.password, server_path register_reader("smb", SMBSource) register_writer("smb", SMBSource) # endregion petl-1.7.15/petl/io/sources.py000066400000000000000000000305371457414240700162010ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division import os import io import gzip import sys import bz2 import zipfile from contextlib import contextmanager import subprocess import logging from petl.errors import ArgumentError from petl.compat import urlopen, StringIO, BytesIO, string_types, PY2 logger = logging.getLogger(__name__) warning = logger.warning info = logger.info debug = logger.debug class FileSource(object): def __init__(self, filename, **kwargs): self.filename = filename self.kwargs = kwargs def open(self, mode='r'): return io.open(self.filename, mode, **self.kwargs) class GzipSource(object): def __init__(self, filename, remote=False, **kwargs): self.filename = filename self.remote = remote self.kwargs = kwargs @contextmanager def open(self, mode='r'): if self.remote: if not mode.startswith('r'): raise ArgumentError('source is read-only') filehandle = urlopen(self.filename) else: filehandle = self.filename source = gzip.open(filehandle, mode, **self.kwargs) try: yield source finally: source.close() class BZ2Source(object): def __init__(self, filename, remote=False, **kwargs): self.filename = filename self.remote = remote self.kwargs = kwargs @contextmanager def open(self, mode='r'): if self.remote: if not mode.startswith('r'): raise ArgumentError('source is read-only') filehandle = urlopen(self.filename) else: filehandle = self.filename source = bz2.BZ2File(filehandle, mode, **self.kwargs) try: yield source finally: source.close() class ZipSource(object): def __init__(self, filename, membername, pwd=None, **kwargs): self.filename = filename self.membername = membername self.pwd = pwd self.kwargs = kwargs @contextmanager def open(self, mode): if 
PY2: mode = mode.translate(None, 'bU') else: mode = mode.translate({ord('b'): None, ord('U'): None}) zf = zipfile.ZipFile(self.filename, mode, **self.kwargs) try: if self.pwd is not None: yield zf.open(self.membername, mode, self.pwd) else: yield zf.open(self.membername, mode) finally: zf.close() class Uncloseable(object): def __init__(self, inner): object.__setattr__(self, '_inner', inner) def __getattr__(self, item): return getattr(self._inner, item) def __setattr__(self, key, value): setattr(self._inner, key, value) def close(self): debug('Uncloseable: close called (%r)' % self._inner) pass def _get_stdout_binary(): try: return sys.stdout.buffer except AttributeError: pass try: fd = sys.stdout.fileno() return os.fdopen(fd, 'ab', 0) except Exception: pass try: return sys.__stdout__.buffer except AttributeError: pass try: fd = sys.__stdout__.fileno() return os.fdopen(fd, 'ab', 0) except Exception: pass # fallback return sys.stdout stdout_binary = _get_stdout_binary() def _get_stdin_binary(): try: return sys.stdin.buffer except AttributeError: pass try: fd = sys.stdin.fileno() return os.fdopen(fd, 'rb', 0) except Exception: pass try: return sys.__stdin__.buffer except AttributeError: pass try: fd = sys.__stdin__.fileno() return os.fdopen(fd, 'rb', 0) except Exception: pass # fallback return sys.stdin stdin_binary = _get_stdin_binary() class StdoutSource(object): @contextmanager def open(self, mode): if mode.startswith('r'): raise ArgumentError('source is write-only') if 'b' in mode: yield Uncloseable(stdout_binary) else: yield Uncloseable(sys.stdout) class StdinSource(object): @contextmanager def open(self, mode='r'): if not mode.startswith('r'): raise ArgumentError('source is read-only') if 'b' in mode: yield Uncloseable(stdin_binary) else: yield Uncloseable(sys.stdin) class URLSource(object): def __init__(self, *args, **kwargs): self.args = args self.kwargs = kwargs @contextmanager def open(self, mode='r'): if not mode.startswith('r'): raise ArgumentError('source is read-only') f = urlopen(*self.args, **self.kwargs) try: yield f finally: f.close() class MemorySource(object): """Memory data source. E.g.:: >>> import petl as etl >>> data = b'foo,bar\\na,1\\nb,2\\nc,2\\n' >>> source = etl.MemorySource(data) >>> tbl = etl.fromcsv(source) >>> tbl +-----+-----+ | foo | bar | +=====+=====+ | 'a' | '1' | +-----+-----+ | 'b' | '2' | +-----+-----+ | 'c' | '2' | +-----+-----+ >>> sink = etl.MemorySource() >>> tbl.tojson(sink) >>> sink.getvalue() b'[{"foo": "a", "bar": "1"}, {"foo": "b", "bar": "2"}, {"foo": "c", "bar": "2"}]' Also supports appending. 
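For example, appending (an editor's sketch of the behaviour described above, not from the original docs; `tbl` is any petl table):: >>> sink2 = etl.MemorySource() # doctest: +SKIP >>> etl.tocsv(tbl, sink2) # doctest: +SKIP >>> etl.appendcsv(tbl, sink2) # doctest: +SKIP >>> sink2.getvalue() # doctest: +SKIP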
""" def __init__(self, s=None): self.s = s self.buffer = None @contextmanager def open(self, mode='rb'): try: if 'r' in mode: if self.s is not None: if 'b' in mode: self.buffer = BytesIO(self.s) else: self.buffer = StringIO(self.s) else: raise ArgumentError('no string data supplied') elif 'w' in mode: if self.buffer is not None: self.buffer.close() if 'b' in mode: self.buffer = BytesIO() else: self.buffer = StringIO() elif 'a' in mode: if self.buffer is None: if 'b' in mode: self.buffer = BytesIO() else: self.buffer = StringIO() yield Uncloseable(self.buffer) except: raise finally: pass # don't close the buffer def getvalue(self): if self.buffer: return self.buffer.getvalue() # backwards compatibility StringSource = MemorySource class PopenSource(object): def __init__(self, *args, **kwargs): self.args = args self.kwargs = kwargs @contextmanager def open(self, mode='r'): if not mode.startswith('r'): raise ArgumentError('source is read-only') self.kwargs['stdout'] = subprocess.PIPE proc = subprocess.Popen(*self.args, **self.kwargs) try: yield proc.stdout finally: pass class CompressedSource(object): '''Handle IO from a file-like object and (de)compress with a codec The `source` argument (source class) is the source class that will handle the actual input/output stream. E.g: :class:`petl.io.sources.URLSource`. The `codec` argument (source class) is the source class that will handle the (de)compression of the stream. E.g: :class:`petl.io.sources.GzipSource`. ''' def __init__(self, source, codec): self.source = source self.codec = codec @contextmanager def open(self, mode='rb'): with self.source.open(mode=mode) as filehandle: transcoder = self.codec(filehandle) with transcoder.open(mode=mode) as stream: yield stream _invalid_source_msg = 'invalid source argument, expected None or a string or ' \ 'an object implementing open(), found %r' _READERS = {} _CODECS = {} _WRITERS = {} def _assert_source_has_open(source_class): source = source_class('test') assert (hasattr(source, 'open') and callable(getattr(source, 'open'))), \ _invalid_source_msg % source def _register_handler(handler_type, handler_class, handler_list): assert isinstance(handler_type, string_types), _invalid_source_msg % handler_type assert isinstance(handler_class, type), _invalid_source_msg % handler_type _assert_source_has_open(handler_class) handler_list[handler_type] = handler_class def _get_handler(handler_type, handler_list): if isinstance(handler_type, string_types): if handler_type in handler_list: return handler_list[handler_type] return None def register_codec(extension, handler_class): ''' Register handler for automatic compression and decompression for file I/O Use of the handler is determined matching the file `extension` with the source specified in ``from...()`` and ``to...()`` functions. .. versionadded:: 1.5.0 ''' _register_handler(extension, handler_class, _CODECS) def register_reader(protocol, handler_class): ''' Register handler for automatic reading using a remote protocol. Use of the handler is determined matching the `protocol` with the scheme part of the url in ``from...()`` function (e.g: `http://`). .. versionadded:: 1.5.0 ''' _register_handler(protocol, handler_class, _READERS) def register_writer(protocol, handler_class): ''' Register handler for automatic writing using a remote protocol. Use of the handler is determined matching the `protocol` with the scheme part of the url in ``to...()`` function (e.g: `smb://`). .. 
versionadded:: 1.5.0 ''' _register_handler(protocol, handler_class, _WRITERS) def get_reader(protocol): ''' Retrieve the handler responsible for reading from a remote protocol. .. versionadded:: 1.6.0 ''' return _get_handler(protocol, _READERS) def get_writer(protocol): ''' Retrieve the handler responsible for writing from a remote protocol. .. versionadded:: 1.6.0 ''' return _get_handler(protocol, _WRITERS) # Setup default sources register_codec('.gz', GzipSource) register_codec('.bgz', GzipSource) register_codec('.bz2', BZ2Source) register_reader('ftp', URLSource) register_reader('http', URLSource) register_reader('https', URLSource) def _get_codec_for(source): for ext, codec_class in _CODECS.items(): if source.endswith(ext): return codec_class return None def _get_handler_from(source, handlers): protocol_index = source.find('://') if protocol_index <= 0: return None protocol = source[:protocol_index] for prefix, handler_class in handlers.items(): if prefix == protocol: return handler_class return None def _resolve_source_from_arg(source, handlers): if isinstance(source, string_types): handler = _get_handler_from(source, handlers) codec = _get_codec_for(source) if handler is None: if codec is not None: return codec(source) assert '://' not in source, _invalid_source_msg % source return FileSource(source) return handler(source) else: assert (hasattr(source, 'open') and callable(getattr(source, 'open'))), _invalid_source_msg % source return source def read_source_from_arg(source): ''' Retrieve an open stream for reading from the source provided. The resulting stream will be opened by a handler that returns raw bytes and transparently takes care of decompression, remote authentication, network transfer, format decoding, and data extraction. .. versionadded:: 1.4.0 ''' if source is None: return StdinSource() return _resolve_source_from_arg(source, _READERS) def write_source_from_arg(source, mode='wb'): ''' Retrieve an open stream for writing to the source provided. The resulting stream will be opened by a handler that writes raw bytes and transparently takes care of compression, remote authentication, network transfer, format encoding, and data writing. .. versionadded:: 1.4.0 ''' if source is None: return StdoutSource() return _resolve_source_from_arg(source, _WRITERS) petl-1.7.15/petl/io/text.py000066400000000000000000000177311457414240700155020ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division # standard library dependencies import io from petl.compat import next, PY2, text_type # internal dependencies from petl.util.base import Table, asdict from petl.io.base import getcodec from petl.io.sources import read_source_from_arg, write_source_from_arg def fromtext(source=None, encoding=None, errors='strict', strip=None, header=('lines',)): """ Extract a table from lines in the given text file. E.g.:: >>> import petl as etl >>> # setup example file ... text = 'a,1\\nb,2\\nc,2\\n' >>> with open('example.txt', 'w') as f: ... f.write(text) ... 12 >>> table1 = etl.fromtext('example.txt') >>> table1 +-------+ | lines | +=======+ | 'a,1' | +-------+ | 'b,2' | +-------+ | 'c,2' | +-------+ >>> # post-process, e.g., with capture() ... 
table2 = table1.capture('lines', '(.*),(.*)$', ['foo', 'bar']) >>> table2 +-----+-----+ | foo | bar | +=====+=====+ | 'a' | '1' | +-----+-----+ | 'b' | '2' | +-----+-----+ | 'c' | '2' | +-----+-----+ Note that the strip() function is called on each line, which by default will remove leading and trailing whitespace, including the end-of-line character; use the `strip` keyword argument to specify alternative characters to strip. Set the `strip` argument to `False` to disable this behaviour and leave line endings in place. """ source = read_source_from_arg(source) return TextView(source, header=header, encoding=encoding, errors=errors, strip=strip) class TextView(Table): def __init__(self, source, header=('lines',), encoding=None, errors='strict', strip=None): self.source = source self.header = header self.encoding = encoding self.errors = errors self.strip = strip def __iter__(self): with self.source.open('rb') as buf: # deal with text encoding if PY2: codec = getcodec(self.encoding) f = codec.streamreader(buf, errors=self.errors) else: f = io.TextIOWrapper(buf, encoding=self.encoding, errors=self.errors, newline='') # generate the table try: if self.header is not None: yield tuple(self.header) if self.strip is False: for line in f: yield (line,) else: for line in f: yield (line.strip(self.strip),) finally: if not PY2: f.detach() def totext(table, source=None, encoding=None, errors='strict', template=None, prologue=None, epilogue=None): """ Write the table to a text file. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar'], ... ['a', 1], ... ['b', 2], ... ['c', 2]] >>> prologue = '''{| class="wikitable" ... |- ... ! foo ... ! bar ... ''' >>> template = '''|- ... | {foo} ... | {bar} ... ''' >>> epilogue = '|}' >>> etl.totext(table1, 'example.txt', template=template, ... prologue=prologue, epilogue=epilogue) >>> # see what we did ... print(open('example.txt').read()) {| class="wikitable" |- ! foo ! bar |- | a | 1 |- | b | 2 |- | c | 2 |} The `template` will be used to format each row via `str.format <http://docs.python.org/library/string.html#format-string-syntax>`_. """ _writetext(table, source=source, mode='wb', encoding=encoding, errors=errors, template=template, prologue=prologue, epilogue=epilogue) Table.totext = totext def appendtext(table, source=None, encoding=None, errors='strict', template=None, prologue=None, epilogue=None): """ Append the table to a text file. """ _writetext(table, source=source, mode='ab', encoding=encoding, errors=errors, template=template, prologue=prologue, epilogue=epilogue) Table.appendtext = appendtext def _writetext(table, source, mode, encoding, errors, template, prologue, epilogue): # guard conditions assert template is not None, 'template is required' # prepare source source = write_source_from_arg(source, mode) with source.open(mode) as buf: # deal with text encoding if PY2: codec = getcodec(encoding) f = codec.streamwriter(buf, errors=errors) else: f = io.TextIOWrapper(buf, encoding=encoding, errors=errors, newline='') # write the table try: if prologue is not None: f.write(prologue) it = iter(table) try: hdr = next(it) except StopIteration: hdr = [] flds = list(map(text_type, hdr)) for row in it: rec = asdict(flds, row) s = template.format(**rec) f.write(s) if epilogue is not None: f.write(epilogue) f.flush() finally: if not PY2: f.detach() def teetext(table, source=None, encoding=None, errors='strict', template=None, prologue=None, epilogue=None): """ Return a table that writes rows to a text file as they are iterated over. 
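A rough example (an editor's sketch under the template conventions of totext() above; not part of the original docs):: >>> tbl = [['foo', 'bar'], ['a', 1], ['b', 2]] # doctest: +SKIP >>> list(etl.teetext(tbl, 'example-tee.txt', template='{foo},{bar}\\n')) # doctest: +SKIP [('foo', 'bar'), ['a', 1], ['b', 2]]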
""" assert template is not None, 'template is required' return TeeTextView(table, source=source, encoding=encoding, errors=errors, template=template, prologue=prologue, epilogue=epilogue) Table.teetext = teetext class TeeTextView(Table): def __init__(self, table, source=None, encoding=None, errors='strict', template=None, prologue=None, epilogue=None): self.table = table self.source = source self.encoding = encoding self.errors = errors self.template = template self.prologue = prologue self.epilogue = epilogue def __iter__(self): return _iterteetext(self.table, self.source, self.encoding, self.errors, self.template, self.prologue, self.epilogue) def _iterteetext(table, source, encoding, errors, template, prologue, epilogue): # guard conditions assert template is not None, 'template is required' # prepare source source = write_source_from_arg(source) with source.open('wb') as buf: # deal with text encoding if PY2: codec = getcodec(encoding) f = codec.streamwriter(buf, errors=errors) else: f = io.TextIOWrapper(buf, encoding=encoding, errors=errors) # write the data try: if prologue is not None: f.write(prologue) it = iter(table) try: hdr = next(it) except StopIteration: return yield tuple(hdr) flds = list(map(text_type, hdr)) for row in it: rec = asdict(flds, row) s = template.format(**rec) f.write(s) yield row if epilogue is not None: f.write(epilogue) f.flush() finally: if not PY2: f.detach() petl-1.7.15/petl/io/whoosh.py000066400000000000000000000417031457414240700160220ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division import operator from petl.compat import string_types, izip from petl.errors import ArgumentError from petl.util.base import Table, dicts def fromtextindex(index_or_dirname, indexname=None, docnum_field=None): """ Extract all documents from a Whoosh index. E.g.:: >>> import petl as etl >>> import os >>> # set up an index and load some documents via the Whoosh API ... from whoosh.index import create_in >>> from whoosh.fields import * >>> schema = Schema(title=TEXT(stored=True), path=ID(stored=True), ... content=TEXT) >>> dirname = 'example.whoosh' >>> if not os.path.exists(dirname): ... os.mkdir(dirname) ... >>> index = create_in(dirname, schema) >>> writer = index.writer() >>> writer.add_document(title=u"First document", path=u"/a", ... content=u"This is the first document we've added!") >>> writer.add_document(title=u"Second document", path=u"/b", ... content=u"The second one is even more interesting!") >>> writer.commit() >>> # extract documents as a table ... table = etl.fromtextindex(dirname) >>> table +------+-------------------+ | path | title | +======+===================+ | '/a' | 'First document' | +------+-------------------+ | '/b' | 'Second document' | +------+-------------------+ Keyword arguments: index_or_dirname Either an instance of `whoosh.index.Index` or a string containing the directory path where the index is stored. indexname String containing the name of the index, if multiple indexes are stored in the same directory. docnum_field If not None, an extra field will be added to the output table containing the internal document number stored in the index. The name of the field will be the value of this argument. 
""" return TextIndexView(index_or_dirname, indexname=indexname, docnum_field=docnum_field) class TextIndexView(Table): def __init__(self, index_or_dirname, indexname=None, docnum_field=None): self.index_or_dirname = index_or_dirname self.indexname = indexname self.docnum_field = docnum_field def __iter__(self): return itertextindex(self.index_or_dirname, self.indexname, self.docnum_field) def itertextindex(index_or_dirname, indexname, docnum_field): import whoosh.index if isinstance(index_or_dirname, string_types): dirname = index_or_dirname index = whoosh.index.open_dir(dirname, indexname=indexname, readonly=True) needs_closing = True elif isinstance(index_or_dirname, whoosh.index.Index): index = index_or_dirname needs_closing = False else: raise ArgumentError('expected string or index, found %r' % index_or_dirname) try: if docnum_field is None: # figure out the field names hdr = tuple(index.schema.stored_names()) yield hdr # yield all documents astuple = operator.itemgetter(*index.schema.stored_names()) for _, stored_fields_dict in index.reader().iter_docs(): yield astuple(stored_fields_dict) else: # figure out the field names hdr = (docnum_field,) + tuple(index.schema.stored_names()) yield hdr # yield all documents astuple = operator.itemgetter(*index.schema.stored_names()) for docnum, stored_fields_dict in index.reader().iter_docs(): yield (docnum,) + astuple(stored_fields_dict) except: raise finally: if needs_closing: # close the index if we're the ones who opened it index.close() def totextindex(table, index_or_dirname, schema=None, indexname=None, merge=False, optimize=False): """ Load all rows from `table` into a Whoosh index. N.B., this will clear any existing data in the index before loading. E.g.:: >>> import petl as etl >>> import datetime >>> import os >>> # here is the table we want to load into an index ... table = (('f0', 'f1', 'f2', 'f3', 'f4'), ... ('AAA', 12, 4.3, True, datetime.datetime.now()), ... ('BBB', 6, 3.4, False, datetime.datetime(1900, 1, 31)), ... ('CCC', 42, 7.8, True, datetime.datetime(2100, 12, 25))) >>> # define a schema for the index ... from whoosh.fields import * >>> schema = Schema(f0=TEXT(stored=True), ... f1=NUMERIC(int, stored=True), ... f2=NUMERIC(float, stored=True), ... f3=BOOLEAN(stored=True), ... f4=DATETIME(stored=True)) >>> # load index ... dirname = 'example.whoosh' >>> if not os.path.exists(dirname): ... os.mkdir(dirname) ... >>> etl.totextindex(table, dirname, schema=schema) Keyword arguments: table A table container with the data to be loaded. index_or_dirname Either an instance of `whoosh.index.Index` or a string containing the directory path where the index is to be stored. schema Index schema to use if creating the index. indexname String containing the name of the index, if multiple indexes are stored in the same directory. merge Merge small segments during commit? optimize Merge all segments together? 
""" import whoosh.index import whoosh.writing # deal with polymorphic argument if isinstance(index_or_dirname, string_types): dirname = index_or_dirname index = whoosh.index.create_in(dirname, schema, indexname=indexname) needs_closing = True elif isinstance(index_or_dirname, whoosh.index.Index): index = index_or_dirname needs_closing = False else: raise ArgumentError('expected string or index, found %r' % index_or_dirname) writer = index.writer() try: for d in dicts(table): writer.add_document(**d) writer.commit(merge=merge, optimize=optimize, mergetype=whoosh.writing.CLEAR) except: writer.cancel() raise finally: if needs_closing: index.close() def appendtextindex(table, index_or_dirname, indexname=None, merge=True, optimize=False): """ Load all rows from `table` into a Whoosh index, adding them to any existing data in the index. Keyword arguments: table A table container with the data to be loaded. index_or_dirname Either an instance of `whoosh.index.Index` or a string containing the directory path where the index is to be stored. indexname String containing the name of the index, if multiple indexes are stored in the same directory. merge Merge small segments during commit? optimize Merge all segments together? """ import whoosh.index # deal with polymorphic argument if isinstance(index_or_dirname, string_types): dirname = index_or_dirname index = whoosh.index.open_dir(dirname, indexname=indexname, readonly=False) needs_closing = True elif isinstance(index_or_dirname, whoosh.index.Index): index = index_or_dirname needs_closing = False else: raise ArgumentError('expected string or index, found %r' % index_or_dirname) writer = index.writer() try: for d in dicts(table): writer.add_document(**d) writer.commit(merge=merge, optimize=optimize) except Exception: writer.cancel() raise finally: if needs_closing: index.close() def searchtextindex(index_or_dirname, query, limit=10, indexname=None, docnum_field=None, score_field=None, fieldboosts=None, search_kwargs=None): """ Search a Whoosh index using a query. E.g.:: >>> import petl as etl >>> import os >>> # set up an index and load some documents via the Whoosh API ... from whoosh.index import create_in >>> from whoosh.fields import * >>> schema = Schema(title=TEXT(stored=True), path=ID(stored=True), ... content=TEXT) >>> dirname = 'example.whoosh' >>> if not os.path.exists(dirname): ... os.mkdir(dirname) ... >>> index = create_in('example.whoosh', schema) >>> writer = index.writer() >>> writer.add_document(title=u"Oranges", path=u"/a", ... content=u"This is the first document we've added!") >>> writer.add_document(title=u"Apples", path=u"/b", ... content=u"The second document is even more " ... u"interesting!") >>> writer.commit() >>> # demonstrate the use of searchtextindex() ... table1 = etl.searchtextindex('example.whoosh', 'oranges') >>> table1 +------+-----------+ | path | title | +======+===========+ | '/a' | 'Oranges' | +------+-----------+ >>> table2 = etl.searchtextindex('example.whoosh', 'doc*') >>> table2 +------+-----------+ | path | title | +======+===========+ | '/a' | 'Oranges' | +------+-----------+ | '/b' | 'Apples' | +------+-----------+ Keyword arguments: index_or_dirname Either an instance of `whoosh.index.Index` or a string containing the directory path where the index is to be stored. query Either a string or an instance of `whoosh.query.Query`. If a string, it will be parsed as a multi-field query, i.e., any terms not bound to a specific field will match **any** field. limit Return at most `limit` results. 
indexname String containing the name of the index, if multiple indexes are stored in the same directory. docnum_field If not None, an extra field will be added to the output table containing the internal document number stored in the index. The name of the field will be the value of this argument. score_field If not None, an extra field will be added to the output table containing the score of the result. The name of the field will be the value of this argument. fieldboosts An optional dictionary mapping field names to boosts. search_kwargs Any extra keyword arguments to be passed through to the Whoosh `search()` method. """ return SearchTextIndexView(index_or_dirname, query, limit=limit, indexname=indexname, docnum_field=docnum_field, score_field=score_field, fieldboosts=fieldboosts, search_kwargs=search_kwargs) def searchtextindexpage(index_or_dirname, query, pagenum, pagelen=10, indexname=None, docnum_field=None, score_field=None, fieldboosts=None, search_kwargs=None): """ Search an index using a query, returning a result page. Keyword arguments: index_or_dirname Either an instance of `whoosh.index.Index` or a string containing the directory path where the index is to be stored. query Either a string or an instance of `whoosh.query.Query`. If a string, it will be parsed as a multi-field query, i.e., any terms not bound to a specific field will match **any** field. pagenum Number of the page to return (e.g., 1 = first page). pagelen Number of results per page. indexname String containing the name of the index, if multiple indexes are stored in the same directory. docnum_field If not None, an extra field will be added to the output table containing the internal document number stored in the index. The name of the field will be the value of this argument. score_field If not None, an extra field will be added to the output table containing the score of the result. The name of the field will be the value of this argument. fieldboosts An optional dictionary mapping field names to boosts. search_kwargs Any extra keyword arguments to be passed through to the Whoosh `search()` method. 
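For example (an editor's sketch, assuming the index built in the searchtextindex() example above):: >>> page1 = etl.searchtextindexpage('example.whoosh', 'doc*', 1, pagelen=5, score_field='score') # doctest: +SKIP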
""" return SearchTextIndexView(index_or_dirname, query, pagenum=pagenum, pagelen=pagelen, indexname=indexname, docnum_field=docnum_field, score_field=score_field, fieldboosts=fieldboosts, search_kwargs=search_kwargs) class SearchTextIndexView(Table): def __init__(self, index_or_dirname, query, limit=None, pagenum=None, pagelen=None, indexname=None, docnum_field=None, score_field=None, fieldboosts=None, search_kwargs=None): self._index_or_dirname = index_or_dirname self._query = query self._limit = limit self._pagenum = pagenum self._pagelen = pagelen self._indexname = indexname self._docnum_field = docnum_field self._score_field = score_field self._fieldboosts = fieldboosts self._search_kwargs = search_kwargs def __iter__(self): return itersearchindex(self._index_or_dirname, self._query, self._limit, self._pagenum, self._pagelen, self._indexname, self._docnum_field, self._score_field, self._fieldboosts, self._search_kwargs) def itersearchindex(index_or_dirname, query, limit, pagenum, pagelen, indexname, docnum_field, score_field, fieldboosts, search_kwargs): import whoosh.index import whoosh.query import whoosh.qparser if not search_kwargs: search_kwargs = dict() if isinstance(index_or_dirname, string_types): dirname = index_or_dirname index = whoosh.index.open_dir(dirname, indexname=indexname, readonly=True) needs_closing = True elif isinstance(index_or_dirname, whoosh.index.Index): index = index_or_dirname needs_closing = False else: raise ArgumentError('expected string or index, found %r' % index_or_dirname) try: # figure out header hdr = tuple() if docnum_field is not None: hdr += (docnum_field,) if score_field is not None: hdr += (score_field,) stored_names = tuple(index.schema.stored_names()) hdr += stored_names yield hdr # parse the query if isinstance(query, string_types): # search all fields by default parser = whoosh.qparser.MultifieldParser( index.schema.names(), index.schema, fieldboosts=fieldboosts ) query = parser.parse(query) elif isinstance(query, whoosh.query.Query): pass else: raise ArgumentError( 'expected string or whoosh.query.Query, found %r' % query ) # make a function to turn docs into tuples astuple = operator.itemgetter(*index.schema.stored_names()) with index.searcher() as searcher: if limit is not None: results = searcher.search(query, limit=limit, **search_kwargs) else: results = searcher.search_page(query, pagenum, pagelen=pagelen, **search_kwargs) if docnum_field is None and score_field is None: for doc in results: yield astuple(doc) else: for (docnum, score), doc in izip(results.items(), results): row = tuple() if docnum_field is not None: row += (docnum,) if score_field is not None: row += (score,) row += astuple(doc) yield row except: raise finally: if needs_closing: # close the index if we're the ones who opened it index.close() # TODO guess schemapetl-1.7.15/petl/io/xls.py000066400000000000000000000067241457414240700153250ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import division, print_function, absolute_import import locale from petl.compat import izip_longest, next, xrange, BytesIO from petl.util.base import Table from petl.io.sources import read_source_from_arg, write_source_from_arg def fromxls(filename, sheet=None, use_view=True, **kwargs): """ Extract a table from a sheet in an Excel .xls file. Sheet is identified by its name or index number. N.B., the sheet name is case sensitive. 
""" return XLSView(filename, sheet=sheet, use_view=use_view, **kwargs) class XLSView(Table): def __init__(self, filename, sheet=None, use_view=True, **kwargs): self.filename = filename self.sheet = sheet self.use_view = use_view self.kwargs = kwargs def __iter__(self): # prefer implementation using xlutils.view as dates are automatically # converted if self.use_view: from petl.io import xlutils_view source = read_source_from_arg(self.filename) with source.open('rb') as source2: source3 = source2.read() wb = xlutils_view.View(source3, **self.kwargs) if self.sheet is None: ws = wb[0] else: ws = wb[self.sheet] for row in ws: yield tuple(row) else: import xlrd source = read_source_from_arg(self.filename) with source.open('rb') as source2: source3 = source2.read() with xlrd.open_workbook(file_contents=source3, on_demand=True, **self.kwargs) as wb: if self.sheet is None: ws = wb.sheet_by_index(0) elif isinstance(self.sheet, int): ws = wb.sheet_by_index(self.sheet) else: ws = wb.sheet_by_name(str(self.sheet)) for rownum in xrange(ws.nrows): yield tuple(ws.row_values(rownum)) def toxls(tbl, filename, sheet, encoding=None, style_compression=0, styles=None): """ Write a table to a new Excel .xls file. """ import xlwt if encoding is None: encoding = locale.getpreferredencoding() wb = xlwt.Workbook(encoding=encoding, style_compression=style_compression) ws = wb.add_sheet(sheet) if styles is None: # simple version, don't worry about styles for r, row in enumerate(tbl): for c, v in enumerate(row): ws.write(r, c, label=v) else: # handle styles it = iter(tbl) try: hdr = next(it) flds = list(map(str, hdr)) for c, f in enumerate(flds): ws.write(0, c, label=f) if f not in styles or styles[f] is None: styles[f] = xlwt.Style.default_style except StopIteration: pass # no header written # convert to list for easy zipping styles = [styles[f] for f in flds] for r, row in enumerate(it): for c, (v, style) in enumerate(izip_longest(row, styles, fillvalue=None)): ws.write(r+1, c, label=v, style=style) target = write_source_from_arg(filename) with target.open('wb') as target2: wb.save(target2) Table.toxls = toxls petl-1.7.15/petl/io/xlsx.py000066400000000000000000000153411457414240700155100ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division import itertools from petl.compat import PY3, text_type from petl.util.base import Table, data from petl.io.sources import read_source_from_arg, write_source_from_arg def fromxlsx(filename, sheet=None, range_string=None, min_row=None, min_col=None, max_row=None, max_col=None, read_only=False, **kwargs): """ Extract a table from a sheet in an Excel .xlsx file. N.B., the sheet name is case sensitive. The `sheet` argument can be omitted, in which case the first sheet in the workbook is used by default. The `range_string` argument can be used to provide a range string specifying a range of cells to extract. The `min_row`, `min_col`, `max_row` and `max_col` arguments can be used to limit the range of cells to extract. They will be ignored if `range_string` is provided. The `read_only` argument determines how openpyxl returns the loaded workbook. Default is `False` as it prevents some LibreOffice files from getting truncated at 65536 rows. `True` should be faster if the file use is read-only and the files are made with Microsoft Excel. Any other keyword arguments are passed through to :func:`openpyxl.load_workbook()`. 
""" return XLSXView(filename, sheet=sheet, range_string=range_string, min_row=min_row, min_col=min_col, max_row=max_row, max_col=max_col, read_only=read_only, **kwargs) class XLSXView(Table): def __init__(self, filename, sheet=None, range_string=None, min_row=None, min_col=None, max_row=None, max_col=None, read_only=False, **kwargs): self.filename = filename self.sheet = sheet self.range_string = range_string self.min_row = min_row self.min_col = min_col self.max_row = max_row self.max_col = max_col self.read_only = read_only self.kwargs = kwargs def __iter__(self): import openpyxl source = read_source_from_arg(self.filename) with source.open('rb') as source2: wb = openpyxl.load_workbook(filename=source2, read_only=self.read_only, **self.kwargs) if self.sheet is None: ws = wb[wb.sheetnames[0]] elif isinstance(self.sheet, int): ws = wb[wb.sheetnames[self.sheet]] else: ws = wb[str(self.sheet)] if self.range_string is not None: rows = ws[self.range_string] else: rows = ws.iter_rows(min_row=self.min_row, min_col=self.min_col, max_row=self.max_row, max_col=self.max_col) for row in rows: yield tuple(cell.value for cell in row) try: wb._archive.close() except AttributeError: # just here in case openpyxl stops exposing an _archive property. pass def toxlsx(tbl, filename, sheet=None, write_header=True, mode="replace"): """ Write a table to a new Excel .xlsx file. N.B., the sheet name is case sensitive. The `mode` argument controls how the file and sheet are treated: - `replace`: This is the default. It either replaces or adds a named sheet, or if no sheet name is provided, all sheets (overwrites the entire file). - `overwrite`: Always overwrites the file. This produces a file with a single sheet. - `add`: Adds a new sheet. Raises `ValueError` if a named sheet already exists. The `sheet` argument can be omitted in all cases. The new sheet will then get a default name. If the file does not exist, it will be created, unless `replace` mode is used with a named sheet. In the latter case, the file must exist and be a valid .xlsx file. """ wb = _load_or_create_workbook(filename, mode, sheet) ws = _insert_sheet_on_workbook(mode, sheet, wb) if write_header: it = iter(tbl) try: hdr = next(it) flds = list(map(text_type, hdr)) rows = itertools.chain([flds], it) except StopIteration: rows = it else: rows = data(tbl) for row in rows: ws.append(row) target = write_source_from_arg(filename) with target.open('wb') as target2: wb.save(target2) def _load_or_create_workbook(filename, mode, sheet): if PY3: FileNotFound = FileNotFoundError else: FileNotFound = IOError import openpyxl wb = None if not (mode == "overwrite" or (mode == "replace" and sheet is None)): try: source = read_source_from_arg(filename) with source.open('rb') as source2: wb = openpyxl.load_workbook(filename=source2, read_only=False) except FileNotFound: wb = None if wb is None: wb = openpyxl.Workbook(write_only=True) return wb def _insert_sheet_on_workbook(mode, sheet, wb): if mode == "replace": try: ws = wb[str(sheet)] ws.delete_rows(1, ws.max_row) except KeyError: ws = wb.create_sheet(title=sheet) elif mode == "add": ws = wb.create_sheet(title=sheet) # it creates a sheet named "foo1" if "foo" exists. 
if sheet is not None and ws.title != sheet: raise ValueError("Sheet %s already exists in file" % sheet) elif mode == "overwrite": ws = wb.create_sheet(title=sheet) else: raise ValueError("Unknown mode '%s'" % mode) return ws Table.toxlsx = toxlsx def appendxlsx(tbl, filename, sheet=None, write_header=False): """ Appends rows to an existing Excel .xlsx file. """ import openpyxl source = read_source_from_arg(filename) with source.open('rb') as source2: wb = openpyxl.load_workbook(filename=source2, read_only=False) if sheet is None: ws = wb[wb.sheetnames[0]] elif isinstance(sheet, int): ws = wb[wb.sheetnames[sheet]] else: ws = wb[str(sheet)] if write_header: it = iter(tbl) try: hdr = next(it) flds = list(map(text_type, hdr)) rows = itertools.chain([flds], it) except StopIteration: rows = it else: rows = data(tbl) for row in rows: ws.append(row) target = write_source_from_arg(filename) with target.open('wb') as target2: wb.save(target2) Table.appendxlsx = appendxlsx petl-1.7.15/petl/io/xlutils_view.py000066400000000000000000000104771457414240700172550ustar00rootroot00000000000000# -*- coding: utf-8 -*- # Copyright (c) 2013 Simplistix Ltd # # This Software is released under the MIT License: # http://www.opensource.org/licenses/mit-license.html # See license.txt for more details. from datetime import datetime, time from petl.compat import xrange class Index(object): def __init__(self, name): self.name = name class Row(Index): """ A one-based, end-inclusive row index for use in slices, eg:: ``[Row(1):Row(2), :]`` """ def __index__(self): return int(self.name) - 1 class Col(Index): """ An end-inclusive column label index for use in slices, eg: ``[:, Col('A'), Col('B')]`` """ def __index__(self): from xlwt.Utils import col_by_name return col_by_name(self.name) class SheetView(object): """ A view on a sheet in a workbook. Should be created by indexing a :class:`View`. These can be sliced to create smaller views. Views can be iterated over to return a set of iterables, one for each row in the view. Data is returned as in the cell values with the exception of dates and times which are converted into :class:`~datetime.datetime` instances. """ def __init__(self, book, sheet, row_slice=None, col_slice=None): #: The workbook used by this view. self.book = book #: The sheet in the workbook used by this view. self.sheet = sheet for name, source in (('rows', row_slice), ('cols', col_slice)): start = 0 stop = max_n = getattr(self.sheet, 'n'+name) if isinstance(source, slice): if source.start is not None: start_val = source.start if isinstance(start_val, Index): start_val = start_val.__index__() if start_val < 0: start = max(0, max_n + start_val) elif start_val > 0: start = min(max_n, start_val) if source.stop is not None: stop_val = source.stop if isinstance(stop_val, Index): stop_val = stop_val.__index__() + 1 if stop_val < 0: stop = max(0, max_n + stop_val) elif stop_val > 0: stop = min(max_n, stop_val) setattr(self, name, xrange(start, stop)) def __row(self, rowx): from xlrd import XL_CELL_DATE, xldate_as_tuple for colx in self.cols: value = self.sheet.cell_value(rowx, colx) if self.sheet.cell_type(rowx, colx) == XL_CELL_DATE: date_parts = xldate_as_tuple(value, self.book.datemode) # Times come out with a year of 0. 
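# (editor's note) xldate_as_tuple() yields (year, month, day, hour, minute, second); pure times carry a zero year, which datetime() would reject, hence the branch below.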
if date_parts[0]: value = datetime(*date_parts) else: value = time(*date_parts[3:]) yield value def __iter__(self): for rowx in self.rows: yield self.__row(rowx) def __getitem__(self, slices): assert isinstance(slices, tuple) assert len(slices) == 2 return self.__class__(self.book, self.sheet, *slices) class View(object): """ A view wrapper around a :class:`~xlrd.Book` that allows for easy iteration over the data in a group of cells. :param file_contents: The contents of the .xls from which to create views. :param class_: A class to use instead of :class:`SheetView` for views of sheets. """ #: This can be replaced in a sub-class to use something other than #: :class:`SheetView` for the views of sheets returned. class_ = SheetView def __init__(self, file_contents, class_=None, **kwargs): self.class_ = class_ or self.class_ from xlrd import open_workbook self.book = open_workbook(file_contents=file_contents, on_demand=True, **kwargs) def __getitem__(self, item): """ Returns a view of a sheet in the workbook this view is created for. :param item: either zero-based integer index or a sheet name. """ if isinstance(item, int): sheet = self.book.sheet_by_index(item) else: sheet = self.book.sheet_by_name(item) return self.class_(self.book, sheet) petl-1.7.15/petl/io/xml.py000066400000000000000000000364711457414240700153210ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division # standard library dependencies try: # prefer lxml as it supports XPath from lxml import etree except ImportError: import xml.etree.ElementTree as etree from operator import attrgetter import itertools from petl.compat import string_types, text_type # internal dependencies from petl.util.base import Table, fieldnames, iterpeek from petl.io.sources import read_source_from_arg from petl.io.text import totext def fromxml(source, *args, **kwargs): """ Extract data from an XML file. E.g.:: >>> import petl as etl >>> # setup a file to demonstrate with ... d = '''<table> ... <tr> ... <td>foo</td><td>bar</td> ... </tr> ... <tr> ... <td>a</td><td>1</td> ... </tr> ... <tr> ... <td>b</td><td>2</td> ... </tr> ... <tr> ... <td>c</td><td>2</td> ... </tr> ... </table>''' >>> with open('example.file1.xml', 'w') as f: ... f.write(d) ... 212 >>> table1 = etl.fromxml('example.file1.xml', 'tr', 'td') >>> table1 +-----+-----+ | foo | bar | +=====+=====+ | 'a' | '1' | +-----+-----+ | 'b' | '2' | +-----+-----+ | 'c' | '2' | +-----+-----+ If the data values are stored in an attribute, provide the attribute name as an extra positional argument:: >>> d = '''<table> ... <tr> ... <td v='foo'/><td v='bar'/> ... </tr> ... <tr> ... <td v='a'/><td v='1'/> ... </tr> ... <tr> ... <td v='b'/><td v='2'/> ... </tr> ... <tr> ... <td v='c'/><td v='2'/> ... </tr> ... </table>''' >>> with open('example.file2.xml', 'w') as f: ... f.write(d) ... 220 >>> table2 = etl.fromxml('example.file2.xml', 'tr', 'td', 'v') >>> table2 +-----+-----+ | foo | bar | +=====+=====+ | 'a' | '1' | +-----+-----+ | 'b' | '2' | +-----+-----+ | 'c' | '2' | +-----+-----+ Data values can also be extracted by providing a mapping of field names to element paths:: >>> d = '''<table> ... <row> ... <foo>a</foo><baz><bar v='1'/><bar v='3'/></baz> ... </row> ... <row> ... <foo>b</foo><baz><bar v='2'/></baz> ... </row> ... <row> ... <foo>c</foo><baz><bar v='2'/></baz> ... </row> ... </table>''' >>> with open('example.file3.xml', 'w') as f: ... f.write(d) ... 223 >>> table3 = etl.fromxml('example.file3.xml', 'row', ... {'foo': 'foo', 'bar': ('baz/bar', 'v')}) >>> table3 +------------+-----+ | bar | foo | +============+=====+ | ('1', '3') | 'a' | +------------+-----+ | '2' | 'b' | +------------+-----+ | '2' | 'c' | +------------+-----+ If `lxml <http://lxml.de/>`_ is installed, full XPath expressions can be used. Note that the implementation is currently **not** streaming, i.e., the whole document is loaded into memory. If multiple elements match a given field, all values are reported as a tuple. If there is more than one element name used for row values, a tuple or list of paths can be provided, e.g., ``fromxml('example.file.html', './/tr', ('th', 'td'))``. Optionally a custom parser can be provided, e.g.:: >>> from lxml import etree # doctest: +SKIP ... my_parser = etree.XMLParser(resolve_entities=False) # doctest: +SKIP ... table4 = etl.fromxml('example.file1.xml', 'tr', 'td', parser=my_parser) # doctest: +SKIP """ source = read_source_from_arg(source) return XmlView(source, *args, **kwargs) class XmlView(Table): def __init__(self, source, *args, **kwargs): self.source = source self.args = args if len(args) == 2 and isinstance(args[1], (string_types, tuple, list)): self.rmatch = args[0] self.vmatch = args[1] self.vdict = None self.attr = None elif len(args) == 2 and isinstance(args[1], dict): self.rmatch = args[0] self.vmatch = None self.vdict = args[1] self.attr = None elif len(args) == 3: self.rmatch = args[0] self.vmatch = args[1] self.vdict = None self.attr = args[2] else: assert False, 'bad parameters' self.missing = kwargs.get('missing', None) self.user_parser = kwargs.get('parser', None) def __iter__(self): vmatch = self.vmatch vdict = self.vdict with self.source.open('rb') as xmlf: parser2 = _create_xml_parser(self.user_parser) tree = etree.parse(xmlf, parser=parser2) if not hasattr(tree, 'iterfind'): # Python 2.6 compatibility tree.iterfind = tree.findall if vmatch is not None: # simple case, all value paths are the same for rowelm in tree.iterfind(self.rmatch): if self.attr is None: getv = attrgetter('text') else: getv = lambda e: e.get(self.attr) if isinstance(vmatch, string_types): # match only one path velms = rowelm.findall(vmatch) else: # match multiple paths velms = itertools.chain(*[rowelm.findall(enm) for enm in vmatch]) yield tuple(getv(velm) for velm in velms) else: # difficult case, deal with different paths for each field # determine output header flds = tuple(sorted(map(text_type, vdict.keys()))) yield flds # setup value getters vmatches = dict() vgetters = dict() for f in flds: vmatch = self.vdict[f] if isinstance(vmatch, string_types): # match element path vmatches[f] = vmatch vgetters[f] = element_text_getter(self.missing) else: # match element path and attribute name vmatches[f] = vmatch[0] attr = vmatch[1] vgetters[f] = attribute_text_getter(attr, self.missing) # determine data rows for rowelm in tree.iterfind(self.rmatch): yield tuple(vgetters[f](rowelm.findall(vmatches[f])) for f in flds) def _create_xml_parser(user_parser): if user_parser is not None: return user_parser try: # Default lxml parser. # This will throw an error if parser is not set and lxml could not be imported, # because Python's built-in XML parser doesn't like the `resolve_entities` kwarg. 
        return etree.XMLParser(resolve_entities=False)
    except TypeError:
        # lxml not available
        return None


def element_text_getter(missing):
    def _get(v):
        if len(v) > 1:
            return tuple(e.text for e in v)
        elif len(v) == 1:
            return v[0].text
        else:
            return missing
    return _get


def attribute_text_getter(attr, missing):
    def _get(v):
        if len(v) > 1:
            return tuple(e.get(attr) for e in v)
        elif len(v) == 1:
            return v[0].get(attr)
        else:
            return missing
    return _get


def toxml(table, target=None, root=None, head=None, rows=None,
          prologue=None, epilogue=None, style='tag', encoding='utf-8'):
    """
    Write the table into a new xml file according to elements defined in the
    function arguments.

    The `root`, `head` and `rows` (string, optional) arguments define the
    tags and the nesting of the xml file. Each one defines xml elements with
    tags separated by slashes (`/`) like in `root/level/tag`. They can have
    an arbitrary number of tags, and each extra tag adds a nesting level for
    the header or record/row written in the xml file. For details on tag
    naming and nesting rules check the xml `specification`_ or xml
    `references`_.

    The `rows` argument defines the elements for each row of data to be
    written in the xml file. When specified, it must have at least 2 tags,
    defining the tags for `row/column`. Additional tags will add nesting
    enclosing all records/rows/lines.

    The `head` argument is similar to `rows`, but applies only to the single
    line/row of header with fieldnames. When specified, it must have at
    least 2 tags for `fields/name` and the remaining tags will increase
    nesting.

    The `root` argument defines the elements enclosing `head` and `rows` and
    is required when using `head` in order to produce valid xml documents.

    When none of these arguments is specified, they default to tags that
    generate output similar to a html table:
    `root='table', head='thead/tr/th', rows='tbody/tr/td'`.

    The `prologue` argument (string, optional) can be a snippet of valid xml
    that will be inserted before the other elements in the xml. It can
    optionally specify the `XML Prolog` of the file.

    The `epilogue` argument (string, optional) can be a snippet of valid xml
    that will be inserted after all other xml elements except the root
    closing tag. It must specify a closing tag if the `root` argument is
    not specified.

    The `style` argument selects the format of the elements in the xml file.
    It can be `tag` (default), `name`, `attribute` or a custom string used
    to format each row via `str.format
    <https://docs.python.org/3/library/stdtypes.html#str.format>`_.

    Example usage for writing files::

        >>> import petl as etl
        >>> table1 = [['foo', 'bar'],
        ...           ['a', 1],
        ...           ['b', 2]]
        >>> etl.toxml(table1, 'example.file4.xml')
        >>> # see that what we wrote is similar to a html table:
        >>> print(open('example.file4.xml').read())
        <?xml version="1.0" encoding="UTF-8"?>
        <table><thead>
         <tr><th>foo</th><th>bar</th></tr>
        </thead><tbody>
         <tr><td>a</td><td>1</td></tr>
         <tr><td>b</td><td>2</td></tr>
        </tbody></table>
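
    A `prologue` or an `epilogue` can wrap the generated elements in extra
    snippets of xml. The call below is a sketch and is skipped under doctest;
    the comment text and file name are illustrative only::

        >>> etl.toxml(table1, 'example.file7.xml',  # doctest: +SKIP
        ...           rows='data/row/col',
        ...           prologue='<!-- written by petl -->')
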
        >>> # define the nesting in the xml file:
        >>> etl.toxml(table1, 'example.file5.xml', rows='plan/line/cell')
        >>> print(open('example.file5.xml').read())
        <?xml version="1.0" encoding="UTF-8"?>
        <plan>
         <line><cell>a</cell><cell>1</cell></line>
         <line><cell>b</cell><cell>2</cell></line>
        </plan>
        >>> # choose another style:
        >>> etl.toxml(table1, 'example.file6.xml', rows='row/col', style='attribute')
        >>> print(open('example.file6.xml').read())
        <?xml version="1.0" encoding="UTF-8"?>
        <row>
         <col foo="a" bar="1" />
         <col foo="b" bar="2" />
        </row>
        >>> etl.toxml(table1, 'example.file6.xml', rows='row/col', style='name')
        >>> print(open('example.file6.xml').read())
        <?xml version="1.0" encoding="UTF-8"?>
        <row>
         <col><foo>a</foo><bar>1</bar></col>
         <col><foo>b</foo><bar>2</bar></col>
        </row>

    The `toxml()` function is just a wrapper over :func:`petl.io.text.totext`.
    For advanced cases, use a template with `totext()` to generate xml files.

    .. versionadded:: 1.7.0

    .. _specification: https://www.w3.org/TR/xml/
    .. _references: https://www.w3schools.com/xml/xml_syntax.asp
    """
    if not root and not head and not rows:
        root = 'table'
        head = 'thead/tr/th'
        rows = 'tbody/tr/td'
    sample, table2 = iterpeek(table, 2)
    props = fieldnames(sample)
    top = _build_xml_header(style, props, root, head, rows, prologue, encoding)
    template = _build_cols(style, props, rows, True)
    bottom = _build_xml_footer(style, epilogue, rows, root)
    totext(table2, source=target, encoding=encoding, errors='strict',
           template=template, prologue=top, epilogue=bottom)


def _build_xml_header(style, props, root, head, rows, prologue, encoding):
    tab = _build_nesting(root, False, None) if root else ''
    nested = -1 if style in ('attribute', 'name') else -2
    if head:
        th1 = _build_nesting(head, False, nested)
        col = _build_cols(style, props, head, False)
        th2 = _build_nesting(head, True, nested)
        thd = '{0}\n{1}{2}'.format(th1, col, th2)
    else:
        thd = ''
    tbd = _build_nesting(rows, False, nested)
    if prologue and prologue.startswith('<?xml'):
        # allow the prologue to replace the default XML declaration
        xml = prologue
        prologue = None
    else:
        enc = encoding.upper() if encoding else 'UTF-8'
        xml = '<?xml version="1.0" encoding="%s"?>' % enc
    pre = prologue + '\n' if prologue and not root else ''
    pos = '\n' + prologue if prologue and root else ''
    res = '{0}\n{1}{2}{3}{4}{5}\n'.format(xml, pre, tab, thd, tbd, pos)
    return res


def _build_xml_footer(style, epilogue, rows, root):
    nested = -1 if style in ('attribute', 'name') else -2
    tbd = _build_nesting(rows, True, nested)
    tab = _build_nesting(root, True, 0)
    pre = epilogue + '\n' if epilogue and root else ''
    pos = '\n' + epilogue if epilogue and not root else ''
    return pre + tbd + tab + pos


def _build_nesting(path, closing, index):
    if not path:
        return ''
    fmt = '</%s>' if closing else '<%s>'
    if '/' not in path:
        return fmt % path
    parts = path.split('/')
    elements = parts[0:index] if index else parts
    if closing:
        elements.reverse()
    tags = [fmt % e for e in elements]
    return ''.join(tags)


def _build_cols(style, props, path, is_value):
    is_header = not is_value
    if style == 'tag' or is_header:
        return _build_cols_inline(props, path, is_value, True)
    if style == 'name':
        return _build_cols_inline(props, path, is_value, False)
    if style == 'attribute':
        return _build_cols_attribs(props, path)
    return style  # custom format string


def _build_cols_inline(props, path, is_value, use_tag):
    parts = path.split('/')
    if use_tag:
        if len(parts) < 2:
            raise ValueError("Tag not in format 'row/col': %s" % path)
        col = parts[-1]
        row = parts[-2]
    else:
        col = '{0}'
        row = parts[-1]
    fld = '{{{0}}}' if is_value else '{0}'
    fmt = '<{0}>{1}</{0}>'.format(col, fld)
    cols = [fmt.format(e) for e in props]
    tags = ''.join(cols)
    res = ' <{0}>{1}</{0}>\n'.format(row, tags)
    return res


def _build_cols_attribs(props, path):
    parts = path.split('/')
    row = parts[-1]
    fmt = '{0}="{{{0}}}"'
    cols = [fmt.format(e) for e in props]
    atts = ' '.join(cols)
    res = ' <{0} {1} />\n'.format(row, atts)
    return res
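

if __name__ == '__main__':
    # A minimal round-trip sketch, not part of the petl API surface; the
    # file name below is an illustrative assumption for this demo only.
    import petl as etl
    tbl = [['foo', 'bar'], ['a', 1], ['b', 2]]
    etl.toxml(tbl, 'example.roundtrip.xml')
    # the default layout writes header cells as <th> and data cells as
    # <td>, so match both element names when reading the table back:
    tbl2 = etl.fromxml('example.roundtrip.xml', './/tr', ('th', 'td'))
    # note: values come back as strings, e.g. ('a', '1') and ('b', '2')
    print(etl.look(tbl2))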
petl-1.7.15/petl/test/000077500000000000000000000000001457414240700145045ustar00rootroot00000000000000petl-1.7.15/petl/test/__init__.py000066400000000000000000000000001457414240700166030ustar00rootroot00000000000000petl-1.7.15/petl/test/conftest.py000066400000000000000000000003031457414240700166770ustar00rootroot00000000000000import logging def pytest_configure(): org = logging.Logger.debug def debug(self, msg, *args, **kwargs): org(self, str(msg), *args, **kwargs) logging.Logger.debug = debug petl-1.7.15/petl/test/failonerror.py000066400000000000000000000067771457414240700174210ustar00rootroot00000000000000import pytest from petl.test.helpers import ieq, eq_ import petl.config as config def assert_failonerror(input_fn, expected_output): """In the input rows, the first row should process through the transformation cleanly. The second row should generate an exception. There are no requirements for any other rows.""" #========================================================= # Test function parameters with default config settings #========================================================= # test the default config setting: failonerror == False eq_(config.failonerror, False) # By default, a bad conversion does not raise an exception, and # values for the failed conversion are returned as None table2 = input_fn() ieq(expected_output, table2) ieq(expected_output, table2) # When called with failonerror is False or None, a bad conversion # does not raise an exception, and values for the failed conversion # are returned as None table3 = input_fn(failonerror=False) ieq(expected_output, table3) ieq(expected_output, table3) table3 = input_fn(failonerror=None) ieq(expected_output, table3) ieq(expected_output, table3) # When called with failonerror=True, a bad conversion raises an # exception with pytest.raises(Exception): table4 = input_fn(failonerror=True) table4.nrows() # When called with failonerror='inline', a bad conversion # does not raise an exception, and an Exception for the failed # conversion is returned in the result. expect5 = expected_output[0], expected_output[1] table5 = input_fn(failonerror='inline') ieq(expect5, table5.head(1)) ieq(expect5, table5.head(1)) excp = table5[2][0] assert isinstance(excp, Exception) #========================================================= # Test config settings #========================================================= # Save config setting saved_config_failonerror = config.failonerror # When config.failonerror == True, a bad conversion raises an # exception config.failonerror = True with pytest.raises(Exception): table6 = input_fn() table6.nrows() # When config.failonerror == 'inline', a bad conversion # does not raise an exception, and an Exception for the failed # conversion is returned in the result. 
    expect7 = expected_output[0], expected_output[1]
    config.failonerror = 'inline'
    table7 = input_fn()
    ieq(expect7, table7.head(1))
    ieq(expect7, table7.head(1))
    excp = table7[2][0]
    assert isinstance(excp, Exception)

    # When config.failonerror is an invalid value, but still truthy, it
    # behaves the same as if == True
    config.failonerror = 'invalid'
    with pytest.raises(Exception):
        table8 = input_fn()
        table8.nrows()

    # When config.failonerror is None, it behaves the same as if
    # config.failonerror is False
    config.failonerror = None
    table9 = input_fn()
    ieq(expected_output, table9)
    ieq(expected_output, table9)

    # A False keyword parameter overrides config.failonerror == True
    config.failonerror = True
    table10 = input_fn(failonerror=False)
    ieq(expected_output, table10)
    ieq(expected_output, table10)

    # A None keyword parameter uses config.failonerror == True
    config.failonerror = True
    with pytest.raises(Exception):
        table11 = input_fn(failonerror=None)
        table11.nrows()

    # restore config setting
    config.failonerror = saved_config_failonerror
petl-1.7.15/petl/test/helpers.py000066400000000000000000000047341457414240700165300ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division


import os
import sys

import pytest

from petl.compat import izip_longest


def eq_(expect, actual, msg=None):
    """Test that two values are exactly equal (==)"""
    assert expect == actual, msg or ('%r != %s' % (expect, actual))


def assert_almost_equal(first, second, places=None, msg=None):
    """Test that two values are approximately equal, to a number of decimal
    places given by the `places` exponent"""
    vabs = None if places is None else 10 ** (- places)
    # compare with == against pytest.approx: a bare assert on the approx
    # object is always truthy and could never fail
    assert second == pytest.approx(first, abs=vabs), msg


def ieq(expect, actual, cast=None):
    """Test that the values of two iterables are equal for each row and
    column"""
    ie = iter(expect)
    ia = iter(actual)
    ir = 0
    for re, ra in izip_longest(ie, ia, fillvalue=None):
        if cast:
            ra = cast(ra)
        if re is None and ra is None:
            continue
        if type(re) in (int, float, bool, str):
            eq_(re, ra)
            continue
        _ieq_row(re, ra, ir)
        ir = ir + 1


def _ieq_row(re, ra, ir):
    assert ra is not None, \
        "Result row #%d is None, but expected row is not None" % ir
    assert re is not None, \
        "Expected row #%d is None, but result row is not None" % ir
    ic = 0
    for ve, va in izip_longest(re, ra, fillvalue=None):
        if isinstance(ve, list):
            for je, ja in izip_longest(ve, va, fillvalue=None):
                _ieq_col(je, ja, re, ra, ir, ic)
        elif not isinstance(ve, dict):
            _ieq_col(ve, va, re, ra, ir, ic)
        ic = ic + 1


def _ieq_col(ve, va, re, ra, ir, ic):
    """Print the two values when they aren't exactly equal (==)"""
    try:
        eq_(ve, va)
    except AssertionError as ea:
        # show the values, but only when they differ
        print('\nrow #%d' % ir, re, ' != ', ra, file=sys.stderr)
        print('col #%d: ' % ic, ve, ' != ', va, file=sys.stderr)
        raise ea


def ieq2(expect, actual, cast=None):
    """Test that iterable values are equal when iterated twice, checking for
    side effects"""
    ieq(expect, actual, cast)
    ieq(expect, actual, cast)


def get_env_vars_named(prefix, remove_prefix=True):
    """Get all environment variables whose names start with `prefix`"""
    res = {}
    varlen = len(prefix)
    for varname, varvalue in os.environ.items():
        if varname.upper().startswith(prefix.upper()):
            if remove_prefix:
                varname = varname[varlen:]
            res[varname] = varvalue
    if len(res) == 0:
        return None
    return res
petl-1.7.15/petl/test/io/000077500000000000000000000000001457414240700151135ustar00rootroot00000000000000petl-1.7.15/petl/test/io/__init__.py000066400000000000000000000001011457414240700172140ustar00rootroot00000000000000from __future__ import
absolute_import, print_function, division petl-1.7.15/petl/test/io/test_avro.py000066400000000000000000000262531457414240700175030ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division import math from datetime import datetime, date from decimal import Decimal from tempfile import NamedTemporaryFile import pytest from petl.compat import PY3 from petl.transform.basics import cat from petl.util.base import dicts from petl.util.vis import look from petl.test.helpers import ieq from petl.io.avro import fromavro, toavro, appendavro from petl.test.io.test_avro_schemas import schema0, schema1, schema2, \ schema3, schema4, schema5, schema6 if PY3: from datetime import timezone try: import fastavro # import fastavro dependencies import pytz except ImportError as e: pytest.skip('SKIP avro tests: %s' % e, allow_module_level=True) else: # region Test Cases def test_fromavro11(): _read_from_mavro_file(table1, schema1) def test_fromavro22(): _read_from_mavro_file(table2, schema2) def test_fromavro33(): _read_from_mavro_file(table3, schema3) def test_toavro11(): _write_to_avro_file(table1, schema1) def test_toavro22(): _write_to_avro_file(table2, schema2) def test_toavro33(): _write_to_avro_file(table3, schema3) def test_toavro10(): _write_to_avro_file(table1, None) def test_toavro13(): _write_to_avro_file(table01, schema0, table10) def test_toavro20(): _write_to_avro_file(table2, None) def test_toavro30(): _write_to_avro_file(table3, None) def test_toavro44(): _write_to_avro_file(table4, schema4) def test_toavro55(): _write_to_avro_file(table5, schema5) def test_toavro50(): _write_to_avro_file(table5, None) def test_toavro70(): _write_to_avro_file(table71, None) def test_toavro80(): _write_to_avro_file(table8, None) def test_toavro90(): _write_to_avro_file(table9, None) def test_toavro61(): _write_to_avro_file(table61, schema6, print_tables=False) def test_toavro62(): _write_to_avro_file(table62, schema6, print_tables=False) def test_toavro63(): _write_to_avro_file(table63, schema6, print_tables=False) def test_toavro60(): _write_to_avro_file(table60, schema6, print_tables=False) def test_appendavro11(): _append_to_avro_file(table11, table12, schema1, table1) def test_appendavro22(): _append_to_avro_file(table21, table22, schema2, table2) def test_appendavro10(): _append_to_avro_file(table11, table12, schema1) def test_toavro_troubleshooting10(): nullable_schema = dict(schema0) schema_fields = nullable_schema['fields'] for field in schema_fields: field['type'] = ['null', 'string'] try: _write_temp_avro_file(table1, nullable_schema) except ValueError as vex: bob = "%s" % vex assert 'Bob' in bob return assert False, 'Failed schema conversion' def test_toavro_troubleshooting11(): table0 = list(table1) table0[3][1] = None try: _write_temp_avro_file(table0, schema1) except TypeError as tex: joe = "%s" % tex assert 'Joe' in joe return assert False, 'Failed schema conversion' # endregion # region Execution def _read_from_mavro_file(test_rows, test_schema, test_expect=None, print_tables=True): _show__expect_rows(test_rows, print_tables) test_filename = _create_avro_example(test_schema, test_rows) test_actual = fromavro(test_filename) test_expect2 = test_rows if test_expect is None else test_expect _assert_rows_are_equals(test_expect2, test_actual, print_tables) return test_filename def _write_temp_avro_file(test_rows, test_schema): test_filename = _get_tempfile_path() print("Writing avro file:", test_filename) toavro(test_rows, test_filename, 
schema=test_schema) return test_filename def _write_to_avro_file(test_rows, test_schema, test_expect=None, print_tables=True): _show__expect_rows(test_rows, print_tables) test_filename = _write_temp_avro_file(test_rows, test_schema) test_actual = fromavro(test_filename) test_expect2 = test_rows if test_expect is None else test_expect _assert_rows_are_equals(test_expect2, test_actual, print_tables) def _append_to_avro_file(test_rows1, test_rows2, test_schema, test_expect=None, print_tables=True): _show__expect_rows(test_rows1, print_tables) _show__expect_rows(test_rows2, print_tables) test_filename = _get_tempfile_path() toavro(test_rows1, test_filename, schema=test_schema) appendavro(test_rows2, test_filename, schema=test_schema) test_actual = fromavro(test_filename) if test_expect is not None: test_expect2 = test_expect else: test_expect2 = cat(test_rows1, test_rows2) _assert_rows_are_equals(test_expect2, test_actual, print_tables) # endregion # region Helpers def _assert_rows_are_equals(test_expect, test_actual, print_tables=True): if print_tables: _show__rows_from('Actual:', test_actual) avro_schema = test_actual.get_avro_schema() print('\nSchema:\n', avro_schema) ieq(test_expect, test_actual) ieq(test_expect, test_actual) # verify can iterate twice def _show__expect_rows(test_rows, print_tables=True, limit=0): if print_tables: _show__rows_from('\nExpected:', test_rows, limit) def _show__rows_from(label, test_rows, limit=0): print(label) print(look(test_rows, limit=limit)) def _decs(float_value, rounding=12): return Decimal(str(round(float_value, rounding))) def _utc(year, month, day, hour=0, minute=0, second=0, microsecond=0): u = datetime(year, month, day, hour, minute, second, microsecond) if PY3: return u.replace(tzinfo=timezone.utc) return u.replace(tzinfo=pytz.utc) def _get_tempfile_path(delete_on_close=False): f = NamedTemporaryFile(delete=delete_on_close, mode='r') test_filename = f.name f.close() return test_filename def _create_avro_example(test_schema, test_table): parsed_schema = fastavro.parse_schema(test_schema) rows = dicts(test_table) with NamedTemporaryFile(delete=False, mode='wb') as fo: fastavro.writer(fo, parsed_schema, rows) return fo.name # endregion # region Mockup data header1 = [u'name', u'friends', u'age'] rows1 = [[u'Bob', 42, 33], [u'Jim', 13, 69], [u'Joe', 86, 17], [u'Ted', 23, 51]] table1 = [header1] + rows1 table11 = [header1] + rows1[0:2] table12 = [header1] + rows1[2:] table01 = [header1[0:2]] + [item[0:2] for item in rows1] table10 = [header1] + [item[0:2] + [None] for item in rows1] table2 = [[u'name', u'age', u'birthday', u'death', u'insurance', u'deny'], [u'pete', 17, date(2012, 10, 11), _utc(2018, 10, 14, 15, 16, 17, 18000), Decimal('1.100'), False], [u'mike', 27, date(2002, 11, 12), _utc(2015, 12, 13, 14, 15, 16, 17000), Decimal('1.010'), False], [u'zack', 37, date(1992, 12, 13), _utc(2010, 11, 12, 13, 14, 15, 16000), Decimal('123.456'), True], [u'gene', 47, date(1982, 12, 25), _utc(2009, 10, 11, 12, 13, 14, 15000), Decimal('-1.010'), False]] table21 = table2[0:3] table22 = [table2[0]] + table2[3:] table3 = [[u'name', u'age', u'birthday', u'death'], [u'pete', 17, date(2012, 10, 11), _utc(2018, 10, 14, 15, 16, 17, 18000)], [u'mike', 27, date(2002, 11, 12), _utc(2015, 12, 13, 14, 15, 16, 17000)], [u'zack', 37, date(1992, 12, 13), _utc(2010, 11, 12, 13, 14, 15, 16000)], [u'gene', 47, date(1982, 12, 25), _utc(2009, 10, 11, 12, 13, 14, 15000)]] table4 = [[u'name', u'friends', u'age', u'birthday'], [u'Bob', 42, 33, date(2012, 10, 11)], [u'Jim', 13, 69, 
None], [None, 86, 17, date(1992, 12, 13)], [u'Ted', 23, None, date(1982, 12, 25)]] table5 = [[u'palette', u'colors'], [u'red', [u'red', u'salmon', u'crimson', u'firebrick', u'coral']], [u'pink', [u'pink', u'rose']], [u'purple', [u'purple', u'violet', u'fuchsia', u'magenta', u'indigo', u'orchid', u'lavender']], [u'green', [u'green', u'lime', u'seagreen', u'grass', u'olive', u'forest', u'teal']], [u'blue', [u'blue', u'cyan', u'aqua', u'aquamarine', u'turquoise', u'royal', u'sky', u'navy']], [u'gold', [u'gold', u'yellow', u'khaki', u'mocassin', u'papayawhip', u'lemonchiffon']], [u'black', None]] header6 = [u'array_string', u'array_record', u'nulable_date', u'multi_union_time', u'array_bytes_decimal', u'array_fixed_decimal'] rows61 = [[u'a', u'b', u'c'], [{u'f1': u'1', u'f2': Decimal('654.321')}], date(2020, 1, 10), _utc(2020, 12, 19, 18, 17, 16, 15000), [Decimal('123.456')], [Decimal('987.654')], ] rows62 = [[u'a', u'b', u'c'], [{u'f1': u'1', u'f2': Decimal('654.321')}], date(2020, 1, 10), _utc(2020, 12, 19, 18, 17, 16, 15000), [Decimal('123.456'), Decimal('456.789')], [Decimal('987.654'), Decimal('321.123'), Decimal('456.654')]] table61 = [header6, rows61] table62 = [header6, rows62] table63 = [header6, rows61, rows62] table60 = [header6, [rows61[0], rows61[1], ]] header7 = [u'col', u'sqrt_pow_ij'] rows70 = [[j, [round(math.sqrt(math.pow(i*j, i+j)), 9) for i in range(1, j+1)]] for j in range(1, 7)] rows71 = [[j, [Decimal(str(round(math.sqrt(math.pow(i*j, i+j)), 9))) for i in range(1, j+1)]] for j in range(1, 7)] table70 = [header7] + rows70 table71 = [header7] + rows71 header8 = [u'number', u'properties'] rows8 = [[_decs(x), { u'atan': _decs(math.atan(x)), u'sin': math.sin(x), u'cos': math.cos(x), u'tan': math.tan(x), u'square': x*x, u'sqrt': math.sqrt(x), u'log': math.log(x), u'log10': math.log10(x), u'exp': math.log10(x), u'power_x': x**x, u'power_minus_x': x**-x, }] for x in range(1, 12)] table8 = [header8] + rows8 rows9 = [[1, { u'name': u'Bob', u'age': 20 }], [2, { u'name': u'Ted', u'budget': _decs(54321.25) }], [2, { u'name': u'Jim', u'color': u'blue' }], [2, { u'name': u'Joe', u'alias': u'terminator' }]] table9 = [header8] + rows9 # endregion # region testing # endregion # end of tests # petl-1.7.15/petl/test/io/test_avro_schemas.py000066400000000000000000000137651457414240700212120ustar00rootroot00000000000000# -*- coding: utf-8 -*- # begin_nullable_schema schema0 = { 'doc': 'Nullable records.', 'name': 'anyone', 'namespace': 'test', 'type': 'record', 'fields': [ {'name': 'name', 'type': ['null', 'string']}, {'name': 'friends', 'type': ['null', 'int']}, {'name': 'age', 'type': ['null', 'int']}, ], } # end_nullable_schema # begin_basic_schema schema1 = { 'doc': 'Some people records.', 'name': 'People', 'namespace': 'test', 'type': 'record', 'fields': [ {'name': 'name', 'type': 'string'}, {'name': 'friends', 'type': 'int'}, {'name': 'age', 'type': 'int'}, ], } # end_basic_schema # begin_logicalType_schema schema2 = { 'doc': 'Some random people.', 'name': 'Crowd', 'namespace': 'test', 'type': 'record', 'fields': [ {'name': 'name', 'type': 'string'}, {'name': 'age', 'type': 'int'}, {'name': 'birthday', 'type': { 'type': 'int', 'logicalType': 'date' }}, {'name': 'death', 'type': { 'type': 'long', 'logicalType': 'timestamp-millis' }}, {'name': 'insurance', 'type': { 'type': 'bytes', 'logicalType': 'decimal', 'precision': 12, 'scale': 3 }}, {'name': 'deny', 'type': 'boolean'}, ], } # end_logicalType_schema # begin_micros_schema schema3 = { 'doc': 'Some random people.', 'name': 'Crowd', 
'namespace': 'test', 'type': 'record', 'fields': [ {'name': 'name', 'type': 'string'}, {'name': 'age', 'type': 'int'}, {'name': 'birthday', 'type': { 'type': 'int', 'logicalType': 'date' }}, {'name': 'death', 'type': { 'type': 'long', 'logicalType': 'timestamp-micros' }}, ], } # end_micros_schema # begin_mixed_schema schema4 = { 'doc': 'Some people records.', 'name': 'People', 'namespace': 'test', 'type': 'record', 'fields': [ {'name': 'name', 'type': ['null', 'string']}, {'name': 'friends', 'type': ['null', 'long']}, {'name': 'age', 'type': ['null', 'int']}, {'name': 'birthday', 'type': ['null', {'type': 'int', 'logicalType': 'date'}] } ], } # end_mixed_schema # begin_array_schema schema5 = { 'name': 'palettes', 'namespace': 'color', 'type': 'record', 'fields': [ {'name': 'palette', 'type': 'string'}, {'name': 'colors', 'type': ['null', {'type': 'array', 'items': 'string'}] } ], } # end_array_schema # begin_complex_schema schema6 = { 'fields': [ { 'name': 'array_string', 'type': {'type': 'array', 'items': 'string'} }, { 'name': 'array_record', 'type': {'type': 'array', 'items': { 'type': 'record', 'name': 'some_record', 'fields': [ { 'name': 'f1', 'type': 'string' }, { 'name': 'f2', 'type': {'type': 'bytes', 'logicalType': 'decimal', 'precision': 18, 'scale': 6, } } ] } } }, { 'name': 'nulable_date', 'type': ['null', {'type': 'int', 'logicalType': 'date'}] }, { 'name': 'multi_union_time', 'type': ['null', 'string', {'type': 'long', 'logicalType': 'timestamp-micros'}] }, { 'name': 'array_bytes_decimal', 'type': ['null', {'type': 'array', 'items': {'type': 'bytes', 'logicalType': 'decimal', 'precision': 18, 'scale': 6, } }] }, { 'name': 'array_fixed_decimal', 'type': ['null', {'type': 'array', 'items': {'type': 'fixed', 'name': 'FixedDecimal', 'size': 8, 'logicalType': 'decimal', 'precision': 18, 'scale': 6, } }] }, ], 'namespace': 'namespace', 'name': 'name', 'type': 'record' } # end_complex_schema # begin_logical_schema logical_schema = { 'fields': [ { 'name': 'date', 'type': {'type': 'int', 'logicalType': 'date'} }, { 'name': 'datetime', 'type': {'type': 'long', 'logicalType': 'timestamp-millis'} }, { 'name': 'datetime2', 'type': {'type': 'long', 'logicalType': 'timestamp-micros'} }, { 'name': 'uuid', 'type': {'type': 'string', 'logicalType': 'uuid'} }, { 'name': 'time', 'type': {'type': 'int', 'logicalType': 'time-millis'} }, { 'name': 'time2', 'type': {'type': 'long', 'logicalType': 'time-micros'} }, { 'name': 'Decimal', 'type': { 'type': 'bytes', 'logicalType': 'decimal', 'precision': 15, 'scale': 6 } }, { 'name': 'Decimal2', 'type': { 'type': 'fixed', 'size': 8, 'logicalType': 'decimal', 'precision': 15, 'scale': 3 } } ], 'namespace': 'namespace', 'name': 'name', 'type': 'record' } # end_logical_schema petl-1.7.15/petl/test/io/test_bcolz.py000066400000000000000000000041121457414240700176330ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division import tempfile import pytest from petl.test.helpers import ieq, eq_ from petl.io.bcolz import frombcolz, tobcolz, appendbcolz try: import bcolz except ImportError as e: pytest.skip('SKIP bcolz tests: %s' % e, allow_module_level=True) else: def test_frombcolz(): cols = [ ['apples', 'oranges', 'pears'], [1, 3, 7], [2.5, 4.4, .1] ] names = ('foo', 'bar', 'baz') rootdir = tempfile.mkdtemp() ctbl = bcolz.ctable(cols, names=names, rootdir=rootdir, mode='w') ctbl.flush() expect = [names] + list(zip(*cols)) # from ctable object actual = frombcolz(ctbl) ieq(expect, actual) ieq(expect, 
actual) # from rootdir actual = frombcolz(rootdir) ieq(expect, actual) ieq(expect, actual) def test_tobcolz(): t = [('foo', 'bar', 'baz'), ('apples', 1, 2.5), ('oranges', 3, 4.4), ('pears', 7, .1)] ctbl = tobcolz(t) assert isinstance(ctbl, bcolz.ctable) eq_(t[0], tuple(ctbl.names)) ieq(t[1:], (tuple(r) for r in ctbl.iter())) ctbl = tobcolz(t, chunklen=2) assert isinstance(ctbl, bcolz.ctable) eq_(t[0], tuple(ctbl.names)) ieq(t[1:], (tuple(r) for r in ctbl.iter())) eq_(2, ctbl.cols[ctbl.names[0]].chunklen) def test_appendbcolz(): t = [('foo', 'bar', 'baz'), ('apples', 1, 2.5), ('oranges', 3, 4.4), ('pears', 7, .1)] # append to in-memory ctable ctbl = tobcolz(t) appendbcolz(t, ctbl) eq_(t[0], tuple(ctbl.names)) ieq(t[1:] + t[1:], (tuple(r) for r in ctbl.iter())) # append to on-disk ctable rootdir = tempfile.mkdtemp() tobcolz(t, rootdir=rootdir) appendbcolz(t, rootdir) ctbl = bcolz.open(rootdir, mode='r') eq_(t[0], tuple(ctbl.names)) ieq(t[1:] + t[1:], (tuple(r) for r in ctbl.iter())) petl-1.7.15/petl/test/io/test_csv.py000066400000000000000000000167031457414240700173260ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division from tempfile import NamedTemporaryFile import gzip import os import logging from petl.compat import PY2 from petl.test.helpers import ieq, eq_ from petl.io.csv import fromcsv, fromtsv, tocsv, appendcsv, totsv, appendtsv logger = logging.getLogger(__name__) debug = logger.debug def test_fromcsv(): data = [b'foo,bar', b'a,1', b'b,2', b'c,2'] f = NamedTemporaryFile(mode='wb', delete=False) f.write(b'\n'.join(data)) f.close() expect = (('foo', 'bar'), ('a', '1'), ('b', '2'), ('c', '2')) actual = fromcsv(f.name, encoding='ascii') debug(actual) ieq(expect, actual) ieq(expect, actual) # verify can iterate twice def test_fromcsv_lineterminators(): data = [b'foo,bar', b'a,1', b'b,2', b'c,2'] expect = (('foo', 'bar'), ('a', '1'), ('b', '2'), ('c', '2')) for lt in b'\r', b'\n', b'\r\n': debug(repr(lt)) f = NamedTemporaryFile(mode='wb', delete=False) f.write(lt.join(data)) f.close() with open(f.name, 'rb') as g: debug(repr(g.read())) actual = fromcsv(f.name, encoding='ascii') debug(actual) ieq(expect, actual) def test_fromcsv_quoted(): import csv data = [b'"foo","bar"', b'"a",1', b'"b",2', b'"c",2'] f = NamedTemporaryFile(mode='wb', delete=False) f.write(b'\n'.join(data)) f.close() expect = (('foo', 'bar'), ('a', 1), ('b', 2), ('c', 2)) actual = fromcsv(f.name, quoting=csv.QUOTE_NONNUMERIC) debug(actual) ieq(expect, actual) ieq(expect, actual) # verify can iterate twice def test_fromtsv(): data = [b'foo\tbar', b'a\t1', b'b\t2', b'c\t2'] f = NamedTemporaryFile(mode='wb', delete=False) f.write(b'\n'.join(data)) f.close() expect = (('foo', 'bar'), ('a', '1'), ('b', '2'), ('c', '2')) actual = fromtsv(f.name, encoding='ascii') ieq(expect, actual) ieq(expect, actual) # verify can iterate twice def test_fromcsv_gz(): data = [b'foo,bar', b'a,1', b'b,2', b'c,2'] expect = (('foo', 'bar'), ('a', '1'), ('b', '2'), ('c', '2')) # '\r' not supported in PY2 because universal newline mode is # not supported by gzip module if PY2: lts = b'\n', b'\r\n' else: lts = b'\r', b'\n', b'\r\n' for lt in lts: f = NamedTemporaryFile(delete=False) f.close() fn = f.name + '.gz' os.rename(f.name, fn) fz = gzip.open(fn, 'wb') fz.write(lt.join(data)) fz.close() actual = fromcsv(fn, encoding='ascii') ieq(expect, actual) ieq(expect, actual) # verify can iterate twice def test_tocsv_appendcsv(): # exercise function table = (('foo', 'bar'), ('a', 1), ('b', 2), 
('c', 2)) f = NamedTemporaryFile(delete=False) f.close() tocsv(table, f.name, encoding='ascii', lineterminator='\n') # check what it did with open(f.name, 'rb') as o: data = [b'foo,bar', b'a,1', b'b,2', b'c,2'] # don't forget final terminator expect = b'\n'.join(data) + b'\n' actual = o.read() eq_(expect, actual) # check appending table2 = (('foo', 'bar'), ('d', 7), ('e', 9), ('f', 1)) appendcsv(table2, f.name, encoding='ascii', lineterminator='\n') # check what it did with open(f.name, 'rb') as o: data = [b'foo,bar', b'a,1', b'b,2', b'c,2', b'd,7', b'e,9', b'f,1'] # don't forget final terminator expect = b'\n'.join(data) + b'\n' actual = o.read() eq_(expect, actual) def test_tocsv_noheader(): # check explicit no header table = (('foo', 'bar'), ('a', 1), ('b', 2), ('c', 2)) f = NamedTemporaryFile(delete=False) tocsv(table, f.name, encoding='ascii', lineterminator='\n', write_header=False) # check what it did with open(f.name, 'rb') as o: data = [b'a,1', b'b,2', b'c,2'] # don't forget final terminator expect = b'\n'.join(data) + b'\n' actual = o.read() eq_(expect, actual) def test_totsv_appendtsv(): # exercise function table = (('foo', 'bar'), ('a', 1), ('b', 2), ('c', 2)) f = NamedTemporaryFile(delete=False) f.close() totsv(table, f.name, encoding='ascii', lineterminator='\n') # check what it did with open(f.name, 'rb') as o: data = [b'foo\tbar', b'a\t1', b'b\t2', b'c\t2'] # don't forget final terminator expect = b'\n'.join(data) + b'\n' actual = o.read() eq_(expect, actual) # check appending table2 = (('foo', 'bar'), ('d', 7), ('e', 9), ('f', 1)) appendtsv(table2, f.name, encoding='ascii', lineterminator='\n') # check what it did with open(f.name, 'rb') as o: data = [b'foo\tbar', b'a\t1', b'b\t2', b'c\t2', b'd\t7', b'e\t9', b'f\t1'] # don't forget final terminator expect = b'\n'.join(data) + b'\n' actual = o.read() eq_(expect, actual) def test_tocsv_appendcsv_gz(): # exercise function table = (('foo', 'bar'), ('a', 1), ('b', 2), ('c', 2)) f = NamedTemporaryFile(delete=False) fn = f.name + '.gz' f.close() tocsv(table, fn, encoding='ascii', lineterminator='\n') # check what it did o = gzip.open(fn, 'rb') try: data = [b'foo,bar', b'a,1', b'b,2', b'c,2'] # don't forget final terminator expect = b'\n'.join(data) + b'\n' actual = o.read() eq_(expect, actual) finally: o.close() # check appending table2 = (('foo', 'bar'), ('d', 7), ('e', 9), ('f', 1)) appendcsv(table2, fn, encoding='ascii', lineterminator='\n') # check what it did o = gzip.open(fn, 'rb') try: data = [b'foo,bar', b'a,1', b'b,2', b'c,2', b'd,7', b'e,9', b'f,1'] # don't forget final terminator expect = b'\n'.join(data) + b'\n' actual = o.read() eq_(expect, actual) finally: o.close() def test_fromcsv_header(): header = ['foo', 'bar'] data = [b'a,1', b'b,2', b'c,2'] f = NamedTemporaryFile(mode='wb', delete=False) f.write(b'\n'.join(data)) f.close() expect = (('foo', 'bar'), ('a', '1'), ('b', '2'), ('c', '2')) actual = fromcsv(f.name, encoding='ascii', header=header) debug(actual) ieq(expect, actual) ieq(expect, actual) # verify can iterate twicepetl-1.7.15/petl/test/io/test_csv_unicode.py000066400000000000000000000101271457414240700210260ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division import io from tempfile import NamedTemporaryFile from petl.test.helpers import ieq, eq_ from petl.io.csv import fromcsv, tocsv, appendcsv def test_fromcsv(): data = ( u"name,id\n" u"Արամ Խաչատրյան,1\n" u"Johann Strauß,2\n" u"Вагиф Сәмәдоғлу,3\n" u"章子怡,4\n" ) fn = 
NamedTemporaryFile().name uf = io.open(fn, encoding='utf-8', mode='wt') uf.write(data) uf.close() actual = fromcsv(fn, encoding='utf-8') expect = ((u'name', u'id'), (u'Արամ Խաչատրյան', u'1'), (u'Johann Strauß', u'2'), (u'Вагиф Сәмәдоғлу', u'3'), (u'章子怡', u'4')) ieq(expect, actual) ieq(expect, actual) # verify can iterate twice def test_fromcsv_lineterminators(): data = (u'name,id', u'Արամ Խաչատրյան,1', u'Johann Strauß,2', u'Вагиф Сәмәдоғлу,3', u'章子怡,4') expect = ((u'name', u'id'), (u'Արամ Խաչատրյան', u'1'), (u'Johann Strauß', u'2'), (u'Вагиф Сәмәдоғлу', u'3'), (u'章子怡', u'4')) for lt in u'\r', u'\n', u'\r\n': fn = NamedTemporaryFile().name uf = io.open(fn, encoding='utf-8', mode='wt', newline='') uf.write(lt.join(data)) uf.close() actual = fromcsv(fn, encoding='utf-8') ieq(expect, actual) def test_tocsv(): tbl = ((u'name', u'id'), (u'Արամ Խաչատրյան', 1), (u'Johann Strauß', 2), (u'Вагиф Сәмәдоғлу', 3), (u'章子怡', 4)) fn = NamedTemporaryFile().name tocsv(tbl, fn, encoding='utf-8', lineterminator='\n') expect = ( u"name,id\n" u"Արամ Խաչատրյան,1\n" u"Johann Strauß,2\n" u"Вагиф Сәмәдоғлу,3\n" u"章子怡,4\n" ) uf = io.open(fn, encoding='utf-8', mode='rt', newline='') actual = uf.read() eq_(expect, actual) # Test with write_header=False tbl = ((u'name', u'id'), (u'Արամ Խաչատրյան', 1), (u'Johann Strauß', 2), (u'Вагиф Сәмәдоғлу', 3), (u'章子怡', 4)) tocsv(tbl, fn, encoding='utf-8', lineterminator='\n', write_header=False) expect = ( u"Արամ Խաչատրյան,1\n" u"Johann Strauß,2\n" u"Вагиф Сәмәдоғлу,3\n" u"章子怡,4\n" ) uf = io.open(fn, encoding='utf-8', mode='rt', newline='') actual = uf.read() eq_(expect, actual) def test_appendcsv(): data = ( u"name,id\n" u"Արամ Խաչատրյան,1\n" u"Johann Strauß,2\n" u"Вагиф Сәмәдоғлу,3\n" u"章子怡,4\n" ) fn = NamedTemporaryFile().name uf = io.open(fn, encoding='utf-8', mode='wt') uf.write(data) uf.close() tbl = ((u'name', u'id'), (u'ኃይሌ ገብረሥላሴ', 5), (u'ედუარდ შევარდნაძე', 6)) appendcsv(tbl, fn, encoding='utf-8', lineterminator='\n') expect = ( u"name,id\n" u"Արամ Խաչատրյան,1\n" u"Johann Strauß,2\n" u"Вагиф Сәмәдоғлу,3\n" u"章子怡,4\n" u"ኃይሌ ገብረሥላሴ,5\n" u"ედუარდ შევარდნაძე,6\n" ) uf = io.open(fn, encoding='utf-8', mode='rt') actual = uf.read() eq_(expect, actual) def test_tocsv_none(): tbl = ((u'col1', u'colNone'), (u'a', 1), (u'b', None), (u'c', None), (u'd', 4)) fn = NamedTemporaryFile().name tocsv(tbl, fn, encoding='utf-8', lineterminator='\n') expect = ( u'col1,colNone\n' u'a,1\n' u'b,\n' u'c,\n' u'd,4\n' ) uf = io.open(fn, encoding='utf-8', mode='rt', newline='') actual = uf.read() eq_(expect, actual) petl-1.7.15/petl/test/io/test_db.py000066400000000000000000000110761457414240700171160ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division import sqlite3 from tempfile import NamedTemporaryFile from petl.compat import next from petl.test.helpers import ieq, eq_ from petl.io.db import fromdb, todb, appenddb # N.B., this file only tests the DB-related functions using sqlite3, # as anything else requires database connection configuration. See # docs/dbtests.py for a script to exercise the DB-related functions with # MySQL and Postgres. 
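

# A sketch of how the same checks could be pointed at a different DB-API
# driver; the connection details below are illustrative assumptions and
# not part of this test suite:
#
#     import psycopg2
#     connection = psycopg2.connect('host=localhost dbname=petl user=petl')
#     tbl = fromdb(connection, 'select * from foobar')
#     ieq(expect, tbl)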
def test_fromdb(): # initial data data = (('a', 1), ('b', 2), ('c', 2.0)) connection = sqlite3.connect(':memory:') c = connection.cursor() c.execute('create table foobar (foo, bar)') for row in data: c.execute('insert into foobar values (?, ?)', row) connection.commit() c.close() # test the function actual = fromdb(connection, 'select * from foobar') expect = (('foo', 'bar'), ('a', 1), ('b', 2), ('c', 2.0)) ieq(expect, actual) ieq(expect, actual) # verify can iterate twice # test iterators are isolated i1 = iter(actual) i2 = iter(actual) eq_(('foo', 'bar'), next(i1)) eq_(('a', 1), next(i1)) eq_(('foo', 'bar'), next(i2)) eq_(('b', 2), next(i1)) def test_fromdb_mkcursor(): # initial data data = (('a', 1), ('b', 2), ('c', 2.0)) connection = sqlite3.connect(':memory:') c = connection.cursor() c.execute('create table foobar (foo, bar)') for row in data: c.execute('insert into foobar values (?, ?)', row) connection.commit() c.close() # test the function mkcursor = lambda: connection.cursor() actual = fromdb(mkcursor, 'select * from foobar') expect = (('foo', 'bar'), ('a', 1), ('b', 2), ('c', 2.0)) ieq(expect, actual) ieq(expect, actual) # verify can iterate twice # test iterators are isolated i1 = iter(actual) i2 = iter(actual) eq_(('foo', 'bar'), next(i1)) eq_(('a', 1), next(i1)) eq_(('foo', 'bar'), next(i2)) eq_(('b', 2), next(i1)) def test_fromdb_withargs(): # initial data data = (('a', 1), ('b', 2), ('c', 2.0)) connection = sqlite3.connect(':memory:') c = connection.cursor() c.execute('create table foobar (foo, bar)') for row in data: c.execute('insert into foobar values (?, ?)', row) connection.commit() c.close() # test the function actual = fromdb( connection, 'select * from foobar where bar > ? and bar < ?', (1, 3) ) expect = (('foo', 'bar'), ('b', 2), ('c', 2.0)) ieq(expect, actual) ieq(expect, actual) # verify can iterate twice def test_todb_appenddb(): f = NamedTemporaryFile(delete=False) conn = sqlite3.connect(f.name) conn.execute('create table foobar (foo, bar)') conn.commit() # exercise function table = (('foo', 'bar'), ('a', 1), ('b', 2), ('c', 2)) todb(table, conn, 'foobar') # check what it did actual = conn.execute('select * from foobar') expect = (('a', 1), ('b', 2), ('c', 2)) ieq(expect, actual) # try appending table2 = (('foo', 'bar'), ('d', 7), ('e', 9), ('f', 1)) appenddb(table2, conn, 'foobar') # check what it did actual = conn.execute('select * from foobar') expect = (('a', 1), ('b', 2), ('c', 2), ('d', 7), ('e', 9), ('f', 1)) ieq(expect, actual) def test_todb_appenddb_cursor(): f = NamedTemporaryFile(delete=False) conn = sqlite3.connect(f.name) conn.execute('create table foobar (foo, bar)') conn.commit() # exercise function table = (('foo', 'bar'), ('a', 1), ('b', 2), ('c', 2)) cursor = conn.cursor() todb(table, cursor, 'foobar') # check what it did actual = conn.execute('select * from foobar') expect = (('a', 1), ('b', 2), ('c', 2)) ieq(expect, actual) # try appending table2 = (('foo', 'bar'), ('d', 7), ('e', 9), ('f', 1)) appenddb(table2, cursor, 'foobar') # check what it did actual = conn.execute('select * from foobar') expect = (('a', 1), ('b', 2), ('c', 2), ('d', 7), ('e', 9), ('f', 1)) ieq(expect, actual) petl-1.7.15/petl/test/io/test_db_create.py000066400000000000000000000170741457414240700204450ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division import logging from datetime import datetime, date import sqlite3 import pytest from petl.io.db import fromdb, todb from petl.io.db_create import 
make_sqlalchemy_column from petl.test.helpers import ieq, eq_ from petl.util.vis import look from petl.test.io.test_db_server import user, password, host, database logger = logging.getLogger(__name__) debug = logger.debug def _test_create(dbo): expect = (('foo', 'bar'), ('a', 1), ('b', 2)) expect_extended = (('foo', 'bar', 'baz'), ('a', 1, 2.3), ('b', 2, 4.1)) actual = fromdb(dbo, 'SELECT * FROM test_create') debug('verify table does not exist to start with') try: debug(look(actual)) except Exception as e: debug('expected exception: ' + str(e)) else: raise Exception('expected exception not raised') debug('verify cannot write without create') try: todb(expect, dbo, 'test_create') except Exception as e: debug('expected exception: ' + str(e)) else: raise Exception('expected exception not raised') debug('create table and verify') todb(expect, dbo, 'test_create', create=True) ieq(expect, actual) debug(look(actual)) debug('verify cannot overwrite with new cols without recreate') try: todb(expect_extended, dbo, 'test_create') except Exception as e: debug('expected exception: ' + str(e)) else: raise Exception('expected exception not raised') debug('verify recreate') todb(expect_extended, dbo, 'test_create', create=True, drop=True) ieq(expect_extended, actual) debug(look(actual)) debug('horrendous identifiers') table = (('foo foo', 'bar.baz."spong`'), ('a', 1), ('b', 2), ('c', 2)) todb(table, dbo, 'foo " bar`', create=True) actual = fromdb(dbo, 'SELECT * FROM "foo "" bar`"') ieq(table, actual) def _setup_mysql(dbapi_connection): # setup table cursor = dbapi_connection.cursor() # deal with quote compatibility cursor.execute('SET SQL_MODE=ANSI_QUOTES') cursor.execute('DROP TABLE IF EXISTS test_create') cursor.execute('DROP TABLE IF EXISTS "foo "" bar`"') cursor.close() dbapi_connection.commit() def _setup_generic(dbapi_connection): # setup table cursor = dbapi_connection.cursor() cursor.execute('DROP TABLE IF EXISTS test_create') cursor.execute('DROP TABLE IF EXISTS "foo "" bar`"') cursor.close() dbapi_connection.commit() try: # noinspection PyUnresolvedReferences import sqlalchemy except ImportError as e: pytest.skip('SKIP generic create tests: %s' % e, allow_module_level=True) else: from sqlalchemy import Column, DateTime, Date def test_make_datetime_column(): sql_col = make_sqlalchemy_column([datetime(2014, 1, 1, 1, 1, 1, 1), datetime(2014, 1, 1, 1, 1, 1, 2)], 'name') expect = Column('name', DateTime(), nullable=False) eq_(str(expect.type), str(sql_col.type)) def test_make_date_column(): sql_col = make_sqlalchemy_column([date(2014, 1, 1), date(2014, 1, 2)], 'name') expect = Column('name', Date(), nullable=False) eq_(str(expect.type), str(sql_col.type)) def test_sqlite3_create(): dbapi_connection = sqlite3.connect(':memory:') # exercise using a dbapi_connection _setup_generic(dbapi_connection) _test_create(dbapi_connection) # exercise using a dbapi_cursor _setup_generic(dbapi_connection) dbapi_cursor = dbapi_connection.cursor() _test_create(dbapi_cursor) dbapi_cursor.close() SKIP_PYMYSQL = False try: import pymysql import sqlalchemy pymysql.connect(host=host, user=user, password=password, database=database) except Exception as e: SKIP_PYMYSQL = 'SKIP pymysql create tests: %s' % e finally: @pytest.mark.skipif(bool(SKIP_PYMYSQL), reason=str(SKIP_PYMYSQL)) def test_mysql_create(): import pymysql connect = pymysql.connect # assume database already created dbapi_connection = connect(host=host, user=user, password=password, database=database) # exercise using a dbapi_connection 
_setup_mysql(dbapi_connection) _test_create(dbapi_connection) # exercise using a dbapi_cursor _setup_mysql(dbapi_connection) dbapi_cursor = dbapi_connection.cursor() _test_create(dbapi_cursor) dbapi_cursor.close() # exercise sqlalchemy dbapi_connection _setup_mysql(dbapi_connection) from sqlalchemy import create_engine sqlalchemy_engine = create_engine('mysql+pymysql://%s:%s@%s/%s' % (user, password, host, database)) sqlalchemy_connection = sqlalchemy_engine.connect() sqlalchemy_connection.execute('SET SQL_MODE=ANSI_QUOTES') _test_create(sqlalchemy_connection) sqlalchemy_connection.close() # exercise sqlalchemy session _setup_mysql(dbapi_connection) from sqlalchemy.orm import sessionmaker Session = sessionmaker(bind=sqlalchemy_engine) sqlalchemy_session = Session() _test_create(sqlalchemy_session) sqlalchemy_session.close() SKIP_POSTGRES = False try: import psycopg2 import sqlalchemy psycopg2.connect( 'host=%s dbname=%s user=%s password=%s' % (host, database, user, password) ) except Exception as e: SKIP_POSTGRES = 'SKIP psycopg2 create tests: %s' % e finally: @pytest.mark.skipif(bool(SKIP_POSTGRES), reason=str(SKIP_POSTGRES)) def test_postgresql_create(): import psycopg2 import psycopg2.extensions psycopg2.extensions.register_type(psycopg2.extensions.UNICODE) psycopg2.extensions.register_type(psycopg2.extensions.UNICODEARRAY) # assume database already created dbapi_connection = psycopg2.connect( 'host=%s dbname=%s user=%s password=%s' % (host, database, user, password) ) dbapi_connection.autocommit = True # exercise using a dbapi_connection _setup_generic(dbapi_connection) _test_create(dbapi_connection) # exercise using a dbapi_cursor _setup_generic(dbapi_connection) dbapi_cursor = dbapi_connection.cursor() _test_create(dbapi_cursor) dbapi_cursor.close() # # ignore these for now, having trouble with autocommit # # # exercise sqlalchemy dbapi_connection # _setup_generic(dbapi_connection) # from sqlalchemy import create_engine # sqlalchemy_engine = create_engine( # 'postgresql+psycopg2://%s:%s@%s/%s' # % (user, password, host, database) # ) # sqlalchemy_connection = sqlalchemy_engine.connect() # _test_create(sqlalchemy_connection) # sqlalchemy_connection.close() # # # exercise sqlalchemy session # _setup_generic(dbapi_connection) # from sqlalchemy.orm import sessionmaker # Session = sessionmaker(bind=sqlalchemy_engine) # sqlalchemy_session = Session() # _test_create(sqlalchemy_session) # sqlalchemy_session.close() petl-1.7.15/petl/test/io/test_db_server.py000066400000000000000000000252421457414240700205040ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division import logging import pytest import petl as etl from petl.test.helpers import ieq logger = logging.getLogger(__name__) debug = logger.debug def _test_dbo(write_dbo, read_dbo=None): if read_dbo is None: read_dbo = write_dbo expect_empty = (('foo', 'bar'),) expect = (('foo', 'bar'), ('a', 1), ('b', 2)) expect_appended = (('foo', 'bar'), ('a', 1), ('b', 2), ('a', 1), ('b', 2)) actual = etl.fromdb(read_dbo, 'SELECT * FROM test') debug('verify empty to start with...') debug(etl.look(actual)) ieq(expect_empty, actual) debug('write some data and verify...') etl.todb(expect, write_dbo, 'test') debug(etl.look(actual)) ieq(expect, actual) debug('append some data and verify...') etl.appenddb(expect, write_dbo, 'test') debug(etl.look(actual)) ieq(expect_appended, actual) debug('overwrite and verify...') etl.todb(expect, write_dbo, 'test') debug(etl.look(actual)) ieq(expect, actual) 
debug('cut, overwrite and verify') etl.todb(etl.cut(expect, 'bar', 'foo'), write_dbo, 'test') debug(etl.look(actual)) ieq(expect, actual) debug('cut, append and verify') etl.appenddb(etl.cut(expect, 'bar', 'foo'), write_dbo, 'test') debug(etl.look(actual)) ieq(expect_appended, actual) debug('try a single row') etl.todb(etl.head(expect, 1), write_dbo, 'test') debug(etl.look(actual)) ieq(etl.head(expect, 1), actual) def _test_with_schema(dbo, schema): expect = (('foo', 'bar'), ('a', 1), ('b', 2)) expect_appended = (('foo', 'bar'), ('a', 1), ('b', 2), ('a', 1), ('b', 2)) actual = etl.fromdb(dbo, 'SELECT * FROM test') print('write some data and verify...') etl.todb(expect, dbo, 'test', schema=schema) ieq(expect, actual) print(etl.look(actual)) print('append some data and verify...') etl.appenddb(expect, dbo, 'test', schema=schema) ieq(expect_appended, actual) print(etl.look(actual)) def _test_unicode(dbo): expect = ((u'name', u'id'), (u'Արամ Խաչատրյան', 1), (u'Johann Strauß', 2), (u'Вагиф Сәмәдоғлу', 3), (u'章子怡', 4), ) actual = etl.fromdb(dbo, 'SELECT * FROM test_unicode') print('write some data and verify...') etl.todb(expect, dbo, 'test_unicode') ieq(expect, actual) print(etl.look(actual)) def _setup_mysql(dbapi_connection): # setup table cursor = dbapi_connection.cursor() # deal with quote compatibility cursor.execute('SET SQL_MODE=ANSI_QUOTES') cursor.execute('DROP TABLE IF EXISTS test') cursor.execute('CREATE TABLE test (foo TEXT, bar INT)') cursor.execute('DROP TABLE IF EXISTS test_unicode') cursor.execute('CREATE TABLE test_unicode (name TEXT, id INT) ' 'CHARACTER SET utf8') cursor.close() dbapi_connection.commit() def _setup_postgresql(dbapi_connection): # setup table cursor = dbapi_connection.cursor() cursor.execute('DROP TABLE IF EXISTS test') cursor.execute('CREATE TABLE test (foo TEXT, bar INT)') cursor.execute('DROP TABLE IF EXISTS test_unicode') # assume character encoding UTF-8 already set at database level cursor.execute('CREATE TABLE test_unicode (name TEXT, id INT)') cursor.close() dbapi_connection.commit() def _setup_sqlalchemy_quotes(dbapi_connection, connection_record): cursor = dbapi_connection.cursor() cursor.execute("SET sql_mode = 'ANSI_QUOTES'") host, user, password, database = '127.0.0.1', 'petl', 'test', 'petl' SKIP_PYMYSQL = False try: import pymysql import sqlalchemy pymysql.connect(host=host, user=user, password=password, database=database) except Exception as e: SKIP_PYMYSQL = 'SKIP pymysql tests: %s' % e finally: @pytest.mark.skipif(bool(SKIP_PYMYSQL), reason=str(SKIP_PYMYSQL)) def test_pymysql(): import pymysql connect = pymysql.connect # assume database already created dbapi_connection = connect(host=host, user=user, password=password, database=database) # exercise using a dbapi_connection _setup_mysql(dbapi_connection) _test_dbo(dbapi_connection) # exercise using a dbapi_cursor _setup_mysql(dbapi_connection) dbapi_cursor = dbapi_connection.cursor() _test_dbo(dbapi_cursor) dbapi_cursor.close() # exercise sqlalchemy dbapi_connection _setup_mysql(dbapi_connection) from sqlalchemy import create_engine sqlalchemy_engine = create_engine('mysql+pymysql://%s:%s@%s/%s' % (user, password, host, database)) from sqlalchemy.event import listen listen(sqlalchemy_engine, "connect", _setup_sqlalchemy_quotes) sqlalchemy_connection = sqlalchemy_engine.connect() _test_dbo(sqlalchemy_connection) sqlalchemy_connection.close() # exercise sqlalchemy session _setup_mysql(dbapi_connection) from sqlalchemy.orm import sessionmaker Session = sessionmaker(bind=sqlalchemy_engine) 
sqlalchemy_session = Session() _test_dbo(sqlalchemy_session) sqlalchemy_session.close() # exercise sqlalchemy engine _setup_mysql(dbapi_connection) sqlalchemy_engine2 = create_engine('mysql+pymysql://%s:%s@%s/%s' % (user, password, host, database), echo_pool='debug') listen(sqlalchemy_engine2, "connect", _setup_sqlalchemy_quotes) _test_dbo(sqlalchemy_engine2) sqlalchemy_engine2.dispose() # other exercises _test_with_schema(dbapi_connection, database) utf8_connection = connect(host=host, user=user, password=password, database=database, charset='utf8') utf8_connection.cursor().execute('SET SQL_MODE=ANSI_QUOTES') _test_unicode(utf8_connection) utf8_connection.close() SKIP_MYSQLDB = False try: import MySQLdb import sqlalchemy MySQLdb.connect(host=host, user=user, passwd=password, db=database) except Exception as e: SKIP_MYSQLDB = 'SKIP MySQLdb tests: %s' % e finally: @pytest.mark.skipif(bool(SKIP_MYSQLDB), reason=str(SKIP_MYSQLDB)) def test_mysqldb(): import MySQLdb connect = MySQLdb.connect # assume database already created dbapi_connection = connect(host=host, user=user, passwd=password, db=database) # exercise using a dbapi_connection _setup_mysql(dbapi_connection) _test_dbo(dbapi_connection) # exercise using a dbapi_cursor _setup_mysql(dbapi_connection) dbapi_cursor = dbapi_connection.cursor() _test_dbo(dbapi_cursor) dbapi_cursor.close() # exercise sqlalchemy dbapi_connection _setup_mysql(dbapi_connection) from sqlalchemy import create_engine sqlalchemy_engine = create_engine('mysql+mysqldb://%s:%s@%s/%s' % (user, password, host, database)) from sqlalchemy.event import listen listen(sqlalchemy_engine, "connect", _setup_sqlalchemy_quotes) sqlalchemy_connection = sqlalchemy_engine.connect() sqlalchemy_connection.execute('SET SQL_MODE=ANSI_QUOTES') _test_dbo(sqlalchemy_connection) sqlalchemy_connection.close() # exercise sqlalchemy session _setup_mysql(dbapi_connection) from sqlalchemy.orm import sessionmaker Session = sessionmaker(bind=sqlalchemy_engine) sqlalchemy_session = Session() _test_dbo(sqlalchemy_session) sqlalchemy_session.close() # other exercises _test_with_schema(dbapi_connection, database) utf8_connection = connect(host=host, user=user, passwd=password, db=database, charset='utf8') utf8_connection.cursor().execute('SET SQL_MODE=ANSI_QUOTES') _test_unicode(utf8_connection) SKIP_TEST_POSTGRES = False try: import psycopg2 import sqlalchemy psycopg2.connect( 'host=%s dbname=%s user=%s password=%s' % (host, database, user, password) ) except Exception as e: SKIP_TEST_POSTGRES = 'SKIP psycopg2 tests: %s' % e finally: @pytest.mark.skipif(bool(SKIP_TEST_POSTGRES), reason=str(SKIP_TEST_POSTGRES)) def test_postgresql(): import psycopg2 import psycopg2.extensions psycopg2.extensions.register_type(psycopg2.extensions.UNICODE) psycopg2.extensions.register_type(psycopg2.extensions.UNICODEARRAY) # assume database already created dbapi_connection = psycopg2.connect( 'host=%s dbname=%s user=%s password=%s' % (host, database, user, password) ) # exercise using a dbapi_connection _setup_postgresql(dbapi_connection) _test_dbo(dbapi_connection) # exercise using a dbapi_cursor _setup_postgresql(dbapi_connection) dbapi_cursor = dbapi_connection.cursor() _test_dbo(dbapi_cursor) dbapi_cursor.close() # exercise sqlalchemy dbapi_connection _setup_postgresql(dbapi_connection) from sqlalchemy import create_engine sqlalchemy_engine = create_engine('postgresql+psycopg2://%s:%s@%s/%s' % (user, password, host, database)) sqlalchemy_connection = sqlalchemy_engine.connect() _test_dbo(sqlalchemy_connection) 
sqlalchemy_connection.close() # exercise sqlalchemy session _setup_postgresql(dbapi_connection) from sqlalchemy.orm import sessionmaker Session = sessionmaker(bind=sqlalchemy_engine) sqlalchemy_session = Session() _test_dbo(sqlalchemy_session) sqlalchemy_session.close() # other exercises _test_dbo(dbapi_connection, lambda: dbapi_connection.cursor(name='arbitrary')) _test_with_schema(dbapi_connection, 'public') _test_unicode(dbapi_connection) petl-1.7.15/petl/test/io/test_gsheet.py000066400000000000000000000206431457414240700200100ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division import datetime import os import json import time import pytest from petl.compat import text_type from petl.io.gsheet import fromgsheet, togsheet, appendgsheet from petl.test.helpers import ieq, get_env_vars_named gspread = pytest.importorskip("gspread") uuid = pytest.importorskip("uuid") # region helpers def _get_gspread_credentials(): json_path = os.getenv("PETL_GCP_JSON_PATH", None) if json_path is not None and os.path.exists(json_path): return json_path json_props = get_env_vars_named("PETL_GCP_CREDS_") if json_props is not None: return json_props user_path = os.path.expanduser("~/.config/gspread/service_account.json") if os.path.isfile(user_path) and os.path.exists(user_path): return user_path return None found_gcp_credentials = pytest.mark.skipif( _get_gspread_credentials() is None, reason="""SKIPPED. to/from gspread needs json credentials for testing. In order to run google spreadsheet tests, follow the steps bellow: 1. Create a json authorization file, following the steps described at http://gspread.readthedocs.io/en/latest/oauth2.html, and save to a local path 2. Point the envvar `PETL_GCP_JSON_PATH` to the json authorization file path 2. Or fill the properties inside the json authorization file in envrionment variables named with prefix PETL_GCP_CREDS_: PETL_GCP_CREDS_project_id=petl 3. Or else save the file in one of the following paths: unix: ~/.config/gspread/service_account.json windows: %APPDATA%\\gspread\\service_account.json""" ) def _get_env_credentials(): creds = _get_gspread_credentials() if isinstance(creds, dict): return creds if isinstance(creds, text_type): with open(creds, encoding="utf-8") as json_file: creds = json.load(json_file) return creds return None def _get_gspread_client(): credentials = _get_env_credentials() try: if credentials is None: gspread_client = gspread.service_account() else: gspread_client = gspread.service_account_from_dict(credentials) except gspread.exceptions.APIError as ex: pytest.skip("SKIPPED. 
to/from gspread authentication error: %s" % ex) return None return gspread_client
def _get_env_sharing_emails(): emails = get_env_vars_named("PETL_GSHEET_EMAIL", remove_prefix=False) if emails is not None: return list(emails.values()) return []
def _get_gspread_test_params(): filename = "test-{}".format(str(uuid.uuid4())) gspread_client = _get_gspread_client() emails = _get_env_sharing_emails() return filename, gspread_client, emails
def _test_to_from_gsheet(table, sheetname, cell_range, expected): filename, gspread_client, emails = _get_gspread_test_params() # test to from gsheet spread_id = togsheet( table, gspread_client, filename, worksheet=sheetname, share_emails=emails ) try: result = fromgsheet( gspread_client, filename, worksheet=sheetname, cell_range=cell_range ) # make sure the expected_result matches the result ieq(expected, result) finally: # clean up created table gspread_client.del_spreadsheet(spread_id)
def _test_append_from_gsheet(table_list, expected, sheetname=None): filename, gspread_client, emails = _get_gspread_test_params() # append every table from the second one in the list onward table1 = table_list[0] other_tables = table_list[1:] # create the spreadsheet and the 1st sheet spread_id = togsheet( table1, gspread_client, filename, worksheet=sheetname, share_emails=emails ) try: for tableN in other_tables: appendgsheet( tableN, gspread_client, spread_id, worksheet=sheetname, open_by_key=True ) # read the result appended to the sheet result = fromgsheet( gspread_client, spread_id, worksheet=sheetname, open_by_key=True ) # make sure the expected_result matches the result ieq(expected, result) finally: # clean up created table gspread_client.del_spreadsheet(spread_id)
def teardown_function(): # try to avoid: User rate limit exceeded. time.sleep(3) # endregion # region test cases data
TEST_TABLE = [ ["foo", "bar"], ["A", "1"], ["B", "2"], ["C", "3"], ["D", "random_stuff-in+_名字"], ["é", "3/4/2012"], ["F", "6"], ] # endregion # region test cases execution
@found_gcp_credentials def test_tofromgsheet_01_basic(): _test_to_from_gsheet( TEST_TABLE[:], None, None, TEST_TABLE[:] )
@found_gcp_credentials def test_tofromgsheet_02_uneven_row(): test_table_t1 = [x + ["3"] if i in [2] else x for i, x in enumerate(TEST_TABLE[:])] test_table_f1 = [x + [""] if len(x) < 3 else x for x in test_table_t1[:]] _test_to_from_gsheet( test_table_t1, None, None, test_table_f1 )
@found_gcp_credentials def test_tofromgsheet_03_empty_table(): _test_to_from_gsheet( (), None, None, () )
@found_gcp_credentials def test_tofromgsheet_04_cell_range(): test_table_f2 = [[x[1]] for x in TEST_TABLE[0:4]] _test_to_from_gsheet( TEST_TABLE[:], None, "B1:B4", test_table_f2 )
@found_gcp_credentials def test_tofromgsheet_05_sheet_title(): _test_to_from_gsheet( TEST_TABLE[:], "random_stuff-in+_名字", None, TEST_TABLE[:] )
@found_gcp_credentials @pytest.mark.xfail( raises=TypeError, reason="When this stops failing, uncomment datetime.date in TEST1 and TEST2" ) def test_tofromgsheet_06_datetime_date(): test_table_dt = [[x[0], datetime.date(2012, 5, 6)] if i in [5] else x for i, x in enumerate(TEST_TABLE[:])] _test_to_from_gsheet( test_table_dt[:], None, "B1:B4", test_table_dt[:] )
@found_gcp_credentials def test_tofromgsheet_07_open_by_key(): filename, gspread_client, emails = _get_gspread_test_params() # test to from gsheet table = TEST_TABLE[:] # test to from gsheet spread_id = togsheet(table, gspread_client, filename, share_emails=emails) try: result = fromgsheet(gspread_client, spread_id, open_by_key=True) # make sure the
expected_result matches the result ieq(table, result) finally: # clean up created table gspread_client.del_spreadsheet(spread_id)
@found_gcp_credentials def test_tofromgsheet_08_recreate(): filename, gspread_client, emails = _get_gspread_test_params() # test to from gsheet table1 = TEST_TABLE[:] table2 = [[x[0], text_type(i)] if i > 0 else x for i, x in enumerate(table1)] # test to from gsheet spread_id = togsheet(table1, gspread_client, filename, share_emails=emails) try: result1 = fromgsheet(gspread_client, spread_id, open_by_key=True) ieq(table1, result1) spread_id2 = togsheet(table2, gspread_client, filename, share_emails=emails) try: result2 = fromgsheet(gspread_client, spread_id2, open_by_key=True) ieq(table2, result2) finally: gspread_client.del_spreadsheet(spread_id2) # make sure the expected_result matches the result finally: # clean up created table gspread_client.del_spreadsheet(spread_id)
def _get_testcase_for_append(): table_list = [TEST_TABLE[:], TEST_TABLE[:]] expected = TEST_TABLE[:] + TEST_TABLE[1:] return table_list, expected
@found_gcp_credentials def test_appendgsheet_10_double(): table_list, expected = _get_testcase_for_append() _test_append_from_gsheet(table_list, expected)
@found_gcp_credentials def test_appendgsheet_11_named_sheet(): table_list, expected = _get_testcase_for_append() _test_append_from_gsheet(table_list, expected, sheetname="petl_append")
@found_gcp_credentials def test_appendgsheet_12_other_sheet(): filename, gspread_client, emails = _get_gspread_test_params() # test to append gsheet table = TEST_TABLE[:] table2 = TEST_TABLE[1:] spread_id = togsheet(table, gspread_client, filename, share_emails=emails) try: appendgsheet(table, gspread_client, filename, worksheet="petl") # get the results from the 2 sheets result1 = fromgsheet(gspread_client, filename, worksheet=None) ieq(result1, table) result2 = fromgsheet(gspread_client, filename, worksheet="petl") ieq(result2, table2) finally: gspread_client.del_spreadsheet(spread_id) # endregion
petl-1.7.15/petl/test/io/test_html.py000066400000000000000000000070511457414240700174730ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division from tempfile import NamedTemporaryFile import io from petl.test.helpers import eq_ from petl.io.html import tohtml
def test_tohtml(): # exercise function table = (('foo', 'bar'), ('a', 1), ('b', (1, 2)), ('c', False)) f = NamedTemporaryFile(delete=False) tohtml(table, f.name, encoding='ascii', lineterminator='\n') # check what it did with io.open(f.name, mode='rt', encoding='ascii', newline='') as o: actual = o.read() expect = ( u"<table class='petl'>\n" u"<thead>\n" u"<tr>\n" u"<th>foo</th>\n" u"<th>bar</th>\n" u"</tr>\n" u"</thead>\n" u"<tbody>\n" u"<tr>\n" u"<td>a</td>\n" u"<td style='text-align: right'>1</td>\n" u"</tr>\n" u"<tr>\n" u"<td>b</td>\n" u"<td>(1, 2)</td>\n" u"</tr>\n" u"<tr>\n" u"<td>c</td>\n" u"<td>False</td>\n" u"</tr>\n" u"</tbody>\n" u"</table>\n" ) eq_(expect, actual)
def test_tohtml_caption(): # exercise function table = (('foo', 'bar'), ('a', 1), ('b', (1, 2))) f = NamedTemporaryFile(delete=False) tohtml(table, f.name, encoding='ascii', caption='my table', lineterminator='\n') # check what it did with io.open(f.name, mode='rt', encoding='ascii', newline='') as o: actual = o.read() expect = ( u"<table class='petl'>\n" u"<caption>my table</caption>\n" u"<thead>\n" u"<tr>\n" u"<th>foo</th>\n" u"<th>bar</th>\n" u"</tr>\n" u"</thead>\n" u"<tbody>\n" u"<tr>\n" u"<td>a</td>\n" u"<td style='text-align: right'>1</td>\n" u"</tr>\n" u"<tr>\n" u"<td>b</td>\n" u"<td>(1, 2)</td>\n" u"</tr>\n" u"</tbody>\n" u"</table>\n" ) eq_(expect, actual)
def test_tohtml_with_style(): # exercise function table = (('foo', 'bar'), ('a', 1)) f = NamedTemporaryFile(delete=False) tohtml(table, f.name, encoding='ascii', lineterminator='\n', tr_style='text-align: right', td_styles='text-align: center') # check what it did with io.open(f.name, mode='rt', encoding='ascii', newline='') as o: actual = o.read() expect = ( u"<table class='petl'>\n" u"<thead>\n" u"<tr>\n" u"<th>foo</th>\n" u"<th>bar</th>\n" u"</tr>\n" u"</thead>\n" u"<tbody>\n" u"<tr style='text-align: right'>\n" u"<td style='text-align: center'>a</td>\n" u"<td style='text-align: center'>1</td>\n" u"</tr>\n" u"</tbody>\n" u"</table>\n" ) eq_(expect, actual)
def test_tohtml_headerless(): table = [] f = NamedTemporaryFile(delete=False) tohtml(table, f.name, encoding='ascii', lineterminator='\n') # check what it did with io.open(f.name, mode='rt', encoding='ascii', newline='') as o: actual = o.read() expect = ( u"<table class='petl'>\n" u"<tbody>\n" u"</tbody>\n" u"</table>\n" ) eq_(expect, actual)
petl-1.7.15/petl/test/io/test_html_unicode.py000066400000000000000000000026441457414240700212040ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division import io from tempfile import NamedTemporaryFile from petl.test.helpers import eq_ from petl.io.html import tohtml
def test_tohtml(): # exercise function tbl = ((u'name', u'id'), (u'Արամ Խաչատրյան', 1), (u'Johann Strauß', 2), (u'Вагиф Сәмәдоғлу', 3), (u'章子怡', 4)) fn = NamedTemporaryFile().name tohtml(tbl, fn, encoding='utf-8', lineterminator='\n') # check what it did f = io.open(fn, mode='rt', encoding='utf-8', newline='') actual = f.read() expect = ( u"<table class='petl'>\n" u"<thead>\n" u"<tr>\n" u"<th>name</th>\n" u"<th>id</th>\n" u"</tr>\n" u"</thead>\n" u"<tbody>\n" u"<tr>\n" u"<td>Արամ Խաչատրյան</td>\n" u"<td style='text-align: right'>1</td>\n" u"</tr>\n" u"<tr>\n" u"<td>Johann Strauß</td>\n" u"<td style='text-align: right'>2</td>\n" u"</tr>\n" u"<tr>\n" u"<td>Вагиф Сәмәдоғлу</td>\n" u"<td style='text-align: right'>3</td>\n" u"</tr>\n" u"<tr>\n" u"<td>章子怡</td>\n" u"<td style='text-align: right'>4</td>\n" u"</tr>\n" u"</tbody>\n" u"</table>\n" ) eq_(expect, actual)
petl-1.7.15/petl/test/io/test_json.py000066400000000000000000000172431457414240700195040ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division from collections import OrderedDict from tempfile import NamedTemporaryFile import json import pytest from petl.test.helpers import ieq from petl import fromjson, fromdicts, tojson, tojsonarrays
def test_fromjson_1(): f = NamedTemporaryFile(delete=False, mode='w') data = '[{"foo": "a", "bar": 1}, ' \ '{"foo": "b", "bar": 2}, ' \ '{"foo": "c", "bar": 2}]' f.write(data) f.close() actual = fromjson(f.name, header=['foo', 'bar']) expect = (('foo', 'bar'), ('a', 1), ('b', 2), ('c', 2)) ieq(expect, actual) ieq(expect, actual) # verify can iterate twice
def test_fromjson_2(): f = NamedTemporaryFile(delete=False, mode='w') data = '[{"foo": "a", "bar": 1}, ' \ '{"foo": "b"}, ' \ '{"foo": "c", "bar": 2, "baz": true}]' f.write(data) f.close() actual = fromjson(f.name, header=['bar', 'baz', 'foo']) expect = (('bar', 'baz', 'foo'), (1, None, 'a'), (None, None, 'b'), (2, True, 'c')) ieq(expect, actual) ieq(expect, actual) # verify can iterate twice
def test_fromjson_3(): f = NamedTemporaryFile(delete=False, mode='w') data = '[{"foo": "a", "bar": 1}, ' \ '{"foo": "b"}, ' \ '{"foo": "c", "bar": 2, "baz": true}]' f.write(data) f.close() actual = fromjson(f.name, header=['foo', 'bar']) expect = (('foo', 'bar'), ('a', 1), ('b', None), ('c', 2)) ieq(expect, actual) ieq(expect, actual) # verify can iterate twice
def test_fromdicts_1(): data = [{'foo': 'a', 'bar': 1}, {'foo': 'b', 'bar': 2}, {'foo': 'c', 'bar': 2}] actual = fromdicts(data, header=['foo', 'bar']) expect = (('foo', 'bar'), ('a', 1), ('b', 2), ('c', 2)) ieq(expect, actual) ieq(expect, actual) # verify can iterate twice
def test_fromdicts_2(): data = [{'foo': 'a', 'bar': 1}, {'foo': 'b'}, {'foo': 'c', 'bar': 2, 'baz': True}] actual = fromdicts(data, header=['bar', 'baz', 'foo']) expect = (('bar', 'baz', 'foo'), (1, None, 'a'), (None, None, 'b'), (2, True, 'c')) ieq(expect, actual) ieq(expect, actual) # verify can iterate twice
def test_fromdicts_3(): data = [{'foo': 'a', 'bar': 1}, {'foo': 'b'}, {'foo': 'c', 'bar': 2, 'baz': True}] actual = fromdicts(data, header=['foo', 'bar']) expect = (('foo', 'bar'), ('a', 1), ('b', None), ('c', 2)) ieq(expect, actual) ieq(expect, actual) # verify can iterate twice
def test_fromdicts_onepass(): # check that fromdicts() only makes a single pass through the data data = iter([{'foo': 'a', 'bar': 1}, {'foo': 'b', 'bar': 2}, {'foo': 'c', 'bar': 2}]) actual = fromdicts(data, header=['foo', 'bar']) expect = (('foo', 'bar'), ('a', 1), ('b', 2), ('c', 2)) ieq(expect, actual)
def test_fromdicts_ordered(): data = [OrderedDict([('foo', 'a'), ('bar', 1)]), OrderedDict([('foo', 'b')]), OrderedDict([('foo', 'c'), ('bar', 2), ('baz', True)])] actual = fromdicts(data) # N.B., fields come out in original order expect = (('foo', 'bar', 'baz'), ('a', 1, None), ('b', None, None), ('c', 2, True)) ieq(expect, actual)
def test_fromdicts_missing(): data = [OrderedDict([('foo', 'a'), ('bar', 1)]), OrderedDict([('foo', 'b')]), OrderedDict([('foo', 'c'), ('bar', 2), ('baz', True)])] actual = fromdicts(data, missing="x") expect = (('foo', 'bar', 'baz'), ('a', 1, "x"), ('b', "x", "x"), ('c', 2, True)) ieq(expect, actual)
def test_tojson(): # exercise function table = (('foo', 'bar'), ('a', 1), ('b', 2), ('c', 2)) f = NamedTemporaryFile(delete=False, mode='r') tojson(table, f.name) result = json.load(f) assert
len(result) == 3 assert result[0]['foo'] == 'a' assert result[0]['bar'] == 1 assert result[1]['foo'] == 'b' assert result[1]['bar'] == 2 assert result[2]['foo'] == 'c' assert result[2]['bar'] == 2 def test_tojsonarrays(): # exercise function table = (('foo', 'bar'), ('a', 1), ('b', 2), ('c', 2)) f = NamedTemporaryFile(delete=False, mode='r') tojsonarrays(table, f.name) result = json.load(f) assert len(result) == 3 assert result[0][0] == 'a' assert result[0][1] == 1 assert result[1][0] == 'b' assert result[1][1] == 2 assert result[2][0] == 'c' assert result[2][1] == 2 def test_fromdicts_header_does_not_raise(): data = [{'foo': 'a', 'bar': 1}, {'foo': 'b', 'bar': 2}, {'foo': 'c', 'bar': 2}] actual = fromdicts(data) assert actual.header() def test_fromdicts_header_list(): data = [OrderedDict([('foo', 'a'), ('bar', 1)]), OrderedDict([('foo', 'b'), ('bar', 2)]), OrderedDict([('foo', 'c'), ('bar', 2)])] actual = fromdicts(data) header = actual.header() assert header == ('foo', 'bar') expect = (('foo', 'bar'), ('a', 1), ('b', 2), ('c', 2)) ieq(expect, actual) ieq(expect, actual) @pytest.fixture def dicts_generator(): def generator(): yield OrderedDict([('foo', 'a'), ('bar', 1)]) yield OrderedDict([('foo', 'b'), ('bar', 2)]) yield OrderedDict([('foo', 'c'), ('bar', 2)]) return generator() def test_fromdicts_generator_single(dicts_generator): actual = fromdicts(dicts_generator) expect = (('foo', 'bar'), ('a', 1), ('b', 2), ('c', 2)) ieq(expect, actual) def test_fromdicts_generator_twice(dicts_generator): actual = fromdicts(dicts_generator) expect = (('foo', 'bar'), ('a', 1), ('b', 2), ('c', 2)) ieq(expect, actual) ieq(expect, actual) def test_fromdicts_generator_header(dicts_generator): actual = fromdicts(dicts_generator) header = actual.header() assert header == ('foo', 'bar') expect = (('foo', 'bar'), ('a', 1), ('b', 2), ('c', 2)) ieq(expect, actual) ieq(expect, actual) def test_fromdicts_generator_random_access(): def generator(): for i in range(5): yield OrderedDict([('n', i), ('foo', 100*i), ('bar', 200*i)]) actual = fromdicts(generator(), sample=3) assert actual.header() == ('n', 'foo', 'bar') # first pass it1 = iter(actual) first_row1 = next(it1) first_row2 = next(it1) # second pass it2 = iter(actual) second_row1 = next(it2) second_row2 = next(it2) assert first_row1 == second_row1 assert first_row2 == second_row2 # reverse order second_row3 = next(it2) first_row3 = next(it1) assert second_row3 == first_row3 ieq(actual, actual) assert actual.header() == ('n', 'foo', 'bar') assert len(actual) == 6 def test_fromdicts_generator_missing(): def generator(): yield OrderedDict([('foo', 'a'), ('bar', 1)]) yield OrderedDict([('foo', 'b'), ('bar', 2)]) yield OrderedDict([('foo', 'c'), ('baz', 2)]) actual = fromdicts(generator(), missing="x") expect = (('foo', 'bar', 'baz'), ('a', 1, "x"), ('b', 2, "x"), ('c', "x", 2)) ieq(expect, actual) petl-1.7.15/petl/test/io/test_json_unicode.py000066400000000000000000000013501457414240700212020ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division import json from tempfile import NamedTemporaryFile from petl.test.helpers import ieq from petl.io.json import tojson, fromjson def test_json_unicode(): tbl = ((u'id', u'name'), (1, u'Արամ Խաչատրյան'), (2, u'Johann Strauß'), (3, u'Вагиф Сәмәдоғлу'), (4, u'章子怡'), ) fn = NamedTemporaryFile().name tojson(tbl, fn) result = json.load(open(fn)) assert len(result) == 4 for a, b in zip(tbl[1:], result): assert a[0] == b['id'] assert a[1] == b['name'] actual = 
fromjson(fn, header=['id', 'name']) ieq(tbl, actual) petl-1.7.15/petl/test/io/test_jsonl.py000066400000000000000000000056061457414240700176600ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division from tempfile import NamedTemporaryFile import json from petl import fromjson, tojson from petl.test.helpers import ieq def test_fromjson_1(): f = NamedTemporaryFile(delete=False, mode='w') data = '{"name": "Gilbert", "wins": [["straight", "7S"], ["one pair", "10H"]]}\n' \ '{"name": "Alexa", "wins": [["two pair", "4S"], ["two pair", "9S"]]}\n' \ '{"name": "May", "wins": []}\n' \ '{"name": "Deloise", "wins": [["three of a kind", "5S"]]}' f.write(data) f.close() actual = fromjson(f.name, header=['name', 'wins'], lines=True) expect = (('name', 'wins'), ('Gilbert', [["straight", "7S"], ["one pair", "10H"]]), ('Alexa', [["two pair", "4S"], ["two pair", "9S"]]), ('May', []), ('Deloise', [["three of a kind", "5S"]])) ieq(expect, actual) ieq(expect, actual) # verify can iterate twice def test_fromjson_2(): f = NamedTemporaryFile(delete=False, mode='w') data = '{"foo": "bar1", "baz": 1}\n' \ '{"foo": "bar2", "baz": 2}\n' \ '{"foo": "bar3", "baz": 3}\n' \ '{"foo": "bar4", "baz": 4}\n' f.write(data) f.close() actual = fromjson(f.name, header=['foo', 'baz'], lines=True) expect = (('foo', 'baz'), ('bar1', 1), ('bar2', 2), ('bar3', 3), ('bar4', 4)) ieq(expect, actual) ieq(expect, actual) # verify can iterate twice def test_tojson_1(): table = (('foo', 'bar'), ('a', 1), ('b', 2), ('c', 2)) f = NamedTemporaryFile(delete=False, mode='r') tojson(table, f.name, lines=True) result = [] for line in f: result.append(json.loads(line)) assert len(result) == 3 assert result[0]['foo'] == 'a' assert result[0]['bar'] == 1 assert result[1]['foo'] == 'b' assert result[1]['bar'] == 2 assert result[2]['foo'] == 'c' assert result[2]['bar'] == 2 def test_tojson_2(): table = [['name', 'wins'], ['Gilbert', [['straight', '7S'], ['one pair', '10H']]], ['Alexa', [['two pair', '4S'], ['two pair', '9S']]], ['May', []], ['Deloise', [['three of a kind', '5S']]]] f = NamedTemporaryFile(delete=False, mode='r') tojson(table, f.name, lines=True) result = [] for line in f: result.append(json.loads(line)) assert len(result) == 4 assert result[0]['name'] == 'Gilbert' assert result[0]['wins'] == [['straight', '7S'], ['one pair', '10H']] assert result[1]['name'] == 'Alexa' assert result[1]['wins'] == [['two pair', '4S'], ['two pair', '9S']] assert result[2]['name'] == 'May' assert result[2]['wins'] == [] assert result[3]['name'] == 'Deloise' assert result[3]['wins'] == [['three of a kind', '5S']] petl-1.7.15/petl/test/io/test_numpy.py000066400000000000000000000135721457414240700177040ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division import pytest import petl as etl from petl.test.helpers import ieq, eq_, assert_almost_equal from petl.io.numpy import toarray, fromarray, torecarray try: # noinspection PyUnresolvedReferences import numpy as np except ImportError as e: pytest.skip('SKIP numpy tests: %s' % e, allow_module_level=True) else: def test_toarray_nodtype(): t = [('foo', 'bar', 'baz'), ('apples', 1, 2.5), ('oranges', 3, 4.4), ('pears', 7, .1)] a = toarray(t) assert isinstance(a, np.ndarray) assert isinstance(a['foo'], np.ndarray) assert isinstance(a['bar'], np.ndarray) assert isinstance(a['baz'], np.ndarray) eq_('apples', a['foo'][0]) eq_('oranges', a['foo'][1]) eq_('pears', a['foo'][2]) eq_(1, a['bar'][0]) eq_(3, 
a['bar'][1]) eq_(7, a['bar'][2]) assert_almost_equal(2.5, a['baz'][0]) assert_almost_equal(4.4, a['baz'][1]) assert_almost_equal(.1, a['baz'][2]) def test_toarray_lists(): t = [['foo', 'bar', 'baz'], ['apples', 1, 2.5], ['oranges', 3, 4.4], ['pears', 7, .1]] a = toarray(t) assert isinstance(a, np.ndarray) assert isinstance(a['foo'], np.ndarray) assert isinstance(a['bar'], np.ndarray) assert isinstance(a['baz'], np.ndarray) eq_('apples', a['foo'][0]) eq_('oranges', a['foo'][1]) eq_('pears', a['foo'][2]) eq_(1, a['bar'][0]) eq_(3, a['bar'][1]) eq_(7, a['bar'][2]) assert_almost_equal(2.5, a['baz'][0], places=6) assert_almost_equal(4.4, a['baz'][1], places=6) assert_almost_equal(.1, a['baz'][2], places=6) def test_torecarray(): t = [('foo', 'bar', 'baz'), ('apples', 1, 2.5), ('oranges', 3, 4.4), ('pears', 7, .1)] a = torecarray(t) assert isinstance(a, np.ndarray) assert isinstance(a.foo, np.ndarray) assert isinstance(a.bar, np.ndarray) assert isinstance(a.baz, np.ndarray) eq_('apples', a.foo[0]) eq_('oranges', a.foo[1]) eq_('pears', a.foo[2]) eq_(1, a.bar[0]) eq_(3, a.bar[1]) eq_(7, a.bar[2]) assert_almost_equal(2.5, a.baz[0], places=6) assert_almost_equal(4.4, a.baz[1], places=6) assert_almost_equal(.1, a.baz[2], places=6) def test_toarray_stringdtype(): t = [('foo', 'bar', 'baz'), ('apples', 1, 2.5), ('oranges', 3, 4.4), ('pears', 7, .1)] a = toarray(t, dtype='U4, i2, f4') assert isinstance(a, np.ndarray) assert isinstance(a['foo'], np.ndarray) assert isinstance(a['bar'], np.ndarray) assert isinstance(a['baz'], np.ndarray) eq_('appl', a['foo'][0]) eq_('oran', a['foo'][1]) eq_('pear', a['foo'][2]) eq_(1, a['bar'][0]) eq_(3, a['bar'][1]) eq_(7, a['bar'][2]) assert_almost_equal(2.5, a['baz'][0], places=6) assert_almost_equal(4.4, a['baz'][1], places=6) assert_almost_equal(.1, a['baz'][2], places=6) def test_toarray_dictdtype(): t = [('foo', 'bar', 'baz'), ('apples', 1, 2.5), ('oranges', 3, 4.4), ('pears', 7, .1)] a = toarray(t, dtype={'foo': 'U4'}) # specify partial dtype assert isinstance(a, np.ndarray) assert isinstance(a['foo'], np.ndarray) assert isinstance(a['bar'], np.ndarray) assert isinstance(a['baz'], np.ndarray) eq_('appl', a['foo'][0]) eq_('oran', a['foo'][1]) eq_('pear', a['foo'][2]) eq_(1, a['bar'][0]) eq_(3, a['bar'][1]) eq_(7, a['bar'][2]) assert_almost_equal(2.5, a['baz'][0]) assert_almost_equal(4.4, a['baz'][1]) assert_almost_equal(.1, a['baz'][2]) def test_toarray_explicitdtype(): t = [('foo', 'bar', 'baz'), ('apples', 1, 2.5), ('oranges', 3, 4.4), ('pears', 7, .1)] a = toarray(t, dtype=[('A', 'U4'), ('B', 'i2'), ('C', 'f4')]) assert isinstance(a, np.ndarray) assert isinstance(a['A'], np.ndarray) assert isinstance(a['B'], np.ndarray) assert isinstance(a['C'], np.ndarray) eq_('appl', a['A'][0]) eq_('oran', a['A'][1]) eq_('pear', a['A'][2]) eq_(1, a['B'][0]) eq_(3, a['B'][1]) eq_(7, a['B'][2]) assert_almost_equal(2.5, a['C'][0], places=6) assert_almost_equal(4.4, a['C'][1], places=6) assert_almost_equal(.1, a['C'][2], places=6) def test_fromarray(): t = [('foo', 'bar', 'baz'), ('apples', 1, 2.5), ('oranges', 3, 4.4), ('pears', 7, .1)] a = toarray(t) u = fromarray(a) ieq(t, u) def test_integration(): t = etl.wrap([('foo', 'bar', 'baz'), ('apples', 1, 2.5), ('oranges', 3, 4.4), ('pears', 7, .1)]) a = t.toarray() u = etl.fromarray(a).convert('bar', int) ieq(t, u) def test_valuesarray_no_dtype(): t = [('foo', 'bar', 'baz'), ('apples', 1, 2.5), ('oranges', 3, 4.4), ('pears', 7, .1)] expect = np.array([1, 3, 7]) actual = etl.wrap(t).values('bar').array() eq_(expect.dtype, 
actual.dtype) assert np.all(expect == actual) def test_valuesarray_explicit_dtype(): t = [('foo', 'bar', 'baz'), ('apples', 1, 2.5), ('oranges', 3, 4.4), ('pears', 7, .1)] expect = np.array([1, 3, 7], dtype='i2') actual = etl.wrap(t).values('bar').array(dtype='i2') eq_(expect.dtype, actual.dtype) assert np.all(expect == actual) petl-1.7.15/petl/test/io/test_pandas.py000066400000000000000000000026521457414240700177770ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import division, print_function, absolute_import import pytest import petl as etl from petl.test.helpers import ieq from petl.io.pandas import todataframe, fromdataframe try: # noinspection PyUnresolvedReferences import pandas as pd except ImportError as e: pytest.skip('SKIP pandas tests: %s' % e, allow_module_level=True) else: def test_todataframe(): tbl = [('foo', 'bar', 'baz'), ('apples', 1, 2.5), ('oranges', 3, 4.4), ('pears', 7, .1)] expect = pd.DataFrame.from_records(tbl[1:], columns=tbl[0]) actual = todataframe(tbl) assert expect.equals(actual) def test_headerless(): tbl = [] expect = pd.DataFrame() actual = todataframe(tbl) assert expect.equals(actual) def test_fromdataframe(): tbl = [('foo', 'bar', 'baz'), ('apples', 1, 2.5), ('oranges', 3, 4.4), ('pears', 7, .1)] df = pd.DataFrame.from_records(tbl[1:], columns=tbl[0]) ieq(tbl, fromdataframe(df)) ieq(tbl, fromdataframe(df)) def test_integration(): tbl = [('foo', 'bar', 'baz'), ('apples', 1, 2.5), ('oranges', 3, 4.4), ('pears', 7, .1)] df = etl.wrap(tbl).todataframe() tbl2 = etl.fromdataframe(df) ieq(tbl, tbl2) ieq(tbl, tbl2) petl-1.7.15/petl/test/io/test_pickle.py000066400000000000000000000033251457414240700177760ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division from tempfile import NamedTemporaryFile from petl.compat import pickle from petl.test.helpers import ieq from petl.io.pickle import frompickle, topickle, appendpickle def picklereader(fl): try: while True: yield pickle.load(fl) except EOFError: pass def test_frompickle(): f = NamedTemporaryFile(delete=False) table = (('foo', 'bar'), ('a', 1), ('b', 2), ('c', 2)) for row in table: pickle.dump(row, f) f.close() actual = frompickle(f.name) ieq(table, actual) ieq(table, actual) # verify can iterate twice def test_topickle_appendpickle(): # exercise function table = (('foo', 'bar'), ('a', 1), ('b', 2), ('c', 2)) f = NamedTemporaryFile(delete=False) topickle(table, f.name) # check what it did with open(f.name, 'rb') as o: actual = picklereader(o) ieq(table, actual) # check appending table2 = (('foo', 'bar'), ('d', 7), ('e', 9), ('f', 1)) appendpickle(table2, f.name) # check what it did with open(f.name, 'rb') as o: actual = picklereader(o) expect = (('foo', 'bar'), ('a', 1), ('b', 2), ('c', 2), ('d', 7), ('e', 9), ('f', 1)) ieq(expect, actual) def test_topickle_headerless(): table = [] f = NamedTemporaryFile(delete=False) topickle(table, f.name) expect = [] with open(f.name, 'rb') as o: ieq(expect, picklereader(o)) petl-1.7.15/petl/test/io/test_pytables.py000066400000000000000000000153271457414240700203570ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import division, print_function, absolute_import from itertools import chain from tempfile import NamedTemporaryFile import pytest from petl.test.helpers import ieq from petl.transform.sorts import sort import petl as etl from petl.io.pytables import fromhdf5, fromhdf5sorted, tohdf5, appendhdf5 try: # noinspection PyUnresolvedReferences import tables except 
ImportError as e: pytest.skip('SKIP pytables tests: %s' % e, allow_module_level=True) else: class FooBar(tables.IsDescription): foo = tables.Int32Col(pos=0) bar = tables.StringCol(6, pos=2)
def test_fromhdf5(): f = NamedTemporaryFile() # set up a new hdf5 table to work with h5file = tables.open_file(f.name, mode='w', title='Test file') h5file.create_group('/', 'testgroup', 'Test Group') h5table = h5file.create_table('/testgroup', 'testtable', FooBar, 'Test Table') # load some data into the table table1 = (('foo', 'bar'), (1, b'asdfgh'), (2, b'qwerty'), (3, b'zxcvbn')) for row in table1[1:]: for i, fld in enumerate(table1[0]): h5table.row[fld] = row[i] h5table.row.append() h5file.flush() h5file.close() # verify we can get the data back out table2a = fromhdf5(f.name, '/testgroup', 'testtable') ieq(table1, table2a) ieq(table1, table2a) # verify we can get the data back out table2b = fromhdf5(f.name, '/testgroup/testtable') ieq(table1, table2b) ieq(table1, table2b) # verify using an existing tables.File object h5file = tables.open_file(f.name) table3 = fromhdf5(h5file, '/testgroup/testtable') ieq(table1, table3) # verify using an existing tables.Table object h5tbl = h5file.get_node('/testgroup/testtable') table4 = fromhdf5(h5tbl) ieq(table1, table4) # verify using a condition to filter data table5 = fromhdf5(h5tbl, condition="(foo < 3)") ieq(table1[:3], table5) # clean up h5file.close()
def test_fromhdf5sorted(): f = NamedTemporaryFile() # set up a new hdf5 table to work with h5file = tables.open_file(f.name, mode='w', title='Test file') h5file.create_group('/', 'testgroup', 'Test Group') h5table = h5file.create_table('/testgroup', 'testtable', FooBar, 'Test Table') # load some data into the table table1 = (('foo', 'bar'), (3, b'asdfgh'), (2, b'qwerty'), (1, b'zxcvbn')) for row in table1[1:]: # use a distinct loop name so the NamedTemporaryFile reference f is not clobbered for i, fld in enumerate(table1[0]): h5table.row[fld] = row[i] h5table.row.append() h5table.cols.foo.create_csindex() h5file.flush() # verify we can get the data back out table2 = fromhdf5sorted(h5table, sortby='foo') ieq(sort(table1, 'foo'), table2) ieq(sort(table1, 'foo'), table2) # clean up h5file.close()
def test_tohdf5(): f = NamedTemporaryFile() # set up a new hdf5 table to work with h5file = tables.open_file(f.name, mode="w", title="Test file") h5file.create_group('/', 'testgroup', 'Test Group') h5file.create_table('/testgroup', 'testtable', FooBar, 'Test Table') h5file.flush() h5file.close() # load some data via tohdf5 table1 = (('foo', 'bar'), (1, b'asdfgh'), (2, b'qwerty'), (3, b'zxcvbn')) tohdf5(table1, f.name, '/testgroup', 'testtable') ieq(table1, fromhdf5(f.name, '/testgroup', 'testtable')) tohdf5(table1, f.name, '/testgroup/testtable') ieq(table1, fromhdf5(f.name, '/testgroup/testtable')) h5file = tables.open_file(f.name, mode="a") tohdf5(table1, h5file, '/testgroup/testtable') ieq(table1, fromhdf5(h5file, '/testgroup/testtable')) h5table = h5file.get_node('/testgroup/testtable') tohdf5(table1, h5table) ieq(table1, fromhdf5(h5table)) # clean up h5file.close()
def test_tohdf5_create(): table1 = (('foo', 'bar'), (1, b'asdfgh'), (2, b'qwerty'), (3, b'zxcvbn')) f = NamedTemporaryFile() # test creation with defined datatype tohdf5(table1, f.name, '/testgroup', 'testtable', create=True, drop=True, description=FooBar, createparents=True) ieq(table1, fromhdf5(f.name, '/testgroup', 'testtable')) # test dynamically determined datatype tohdf5(table1, f.name, '/testgroup', 'testtable2', create=True, drop=True, createparents=True) ieq(table1, fromhdf5(f.name, '/testgroup', 'testtable2')) def
test_appendhdf5(): f = NamedTemporaryFile() # set up a new hdf5 table to work with h5file = tables.open_file(f.name, mode="w", title="Test file") h5file.create_group('/', 'testgroup', 'Test Group') h5file.create_table('/testgroup', 'testtable', FooBar, 'Test Table') h5file.flush() h5file.close() # load some initial data via tohdf5() table1 = (('foo', 'bar'), (1, b'asdfgh'), (2, b'qwerty'), (3, b'zxcvbn')) tohdf5(table1, f.name, '/testgroup', 'testtable') ieq(table1, fromhdf5(f.name, '/testgroup', 'testtable')) # append some more data appendhdf5(table1, f.name, '/testgroup', 'testtable') ieq(chain(table1, table1[1:]), fromhdf5(f.name, '/testgroup', 'testtable'))
def test_integration(): f = NamedTemporaryFile() # set up a new hdf5 table to work with h5file = tables.open_file(f.name, mode="w", title="Test file") h5file.create_group('/', 'testgroup', 'Test Group') h5file.create_table('/testgroup', 'testtable', FooBar, 'Test Table') h5file.flush() h5file.close() # load some initial data via tohdf5() table1 = etl.wrap((('foo', 'bar'), (1, b'asdfgh'), (2, b'qwerty'), (3, b'zxcvbn'))) table1.tohdf5(f.name, '/testgroup', 'testtable') ieq(table1, etl.fromhdf5(f.name, '/testgroup', 'testtable')) # append some more data table1.appendhdf5(f.name, '/testgroup', 'testtable') ieq(chain(table1, table1[1:]), etl.fromhdf5(f.name, '/testgroup', 'testtable'))
petl-1.7.15/petl/test/io/test_remotes.py000066400000000000000000000140441457414240700202050ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division import sys import os from importlib import import_module import pytest from petl.compat import PY3 from petl.test.helpers import ieq, eq_ from petl.io.avro import fromavro, toavro from petl.io.csv import fromcsv, tocsv from petl.io.json import fromjson, tojson from petl.io.xlsx import fromxlsx, toxlsx from petl.io.xls import fromxls, toxls from petl.util.vis import look # region Codec test cases
def test_helper_local(): if PY3: _ensure_dir("./tmp") _write_read_into_url("./tmp/example.")
def test_helper_fsspec(): try: # pylint: disable=unused-import import fsspec # noqa: F401 except ImportError as e: pytest.skip("SKIP FSSPEC helper tests: %s" % e) else: _write_read_from_env_matching("PETL_TEST_")
def test_helper_smb(): try: # pylint: disable=unused-import import smbclient # noqa: F401 except ImportError as e: pytest.skip("SKIP SMB helper tests: %s" % e) else: _write_read_from_env_url("PETL_SMB_URL")
def test_helper_smb_url_parse(): from petl.io.remotes import _parse_smb_url url = r"smb://workgroup;user:password@server:444/share/folder/file.csv" domain, host, port, user, passwd, server_path = _parse_smb_url(url) # print("Parsed:", domain, host, port, user, passwd, server_path) eq_(domain, r"workgroup") eq_(host, r"server") eq_(port, 444) eq_(user, r"user") eq_(passwd, r"password") eq_(server_path, "\\\\server\\share\\folder\\file.csv") # endregion # region Execution
def _ensure_dir(directory): if not os.path.exists(directory): os.makedirs(directory)
def _write_read_from_env_matching(prefix): q = 0 for variable, base_url in os.environ.items(): if variable.upper().startswith(prefix.upper()): fmsg = "\n {}: {} -> ".format(variable, base_url) print(fmsg, file=sys.stderr, end="") _write_read_into_url(base_url) print("DONE ", file=sys.stderr, end="") q += 1 if q < 1: msg = """SKIPPED For testing a remote source define an environment variable: $ export PETL_TEST_<protocol>='<protocol>://myuser:mypassword@host:port/path/to/folder'""" print(msg, file=sys.stderr) def
_write_read_from_env_url(env_var_name): base_url = os.getenv(env_var_name, "skip") if base_url == "skip": print("SKIPPED ", file=sys.stderr, end="") else: _write_read_into_url(base_url) print("DONE ", file=sys.stderr, end="") def _write_read_into_url(base_url): _write_read_file_into_url(base_url, "filename10.csv") _write_read_file_into_url(base_url, "filename11.csv", "gz") _write_read_file_into_url(base_url, "filename12.csv", "xz") _write_read_file_into_url(base_url, "filename13.csv", "zst") _write_read_file_into_url(base_url, "filename14.csv", "lz4") _write_read_file_into_url(base_url, "filename15.csv", "snappy") _write_read_file_into_url(base_url, "filename20.json") _write_read_file_into_url(base_url, "filename21.json", "gz") _write_read_file_into_url(base_url, "filename30.avro", pkg='fastavro') _write_read_file_into_url(base_url, "filename40.xlsx", pkg='openpyxl') _write_read_file_into_url(base_url, "filename50.xls", pkg='xlwt') def _build_source_url_from(base_url, filename, compression=None): is_local = base_url.startswith("./") if compression is not None: if is_local: return None filename = filename + "." + compression import fsspec codec = fsspec.utils.infer_compression(filename) if codec is None: print("\n - %s SKIPPED " % filename, file=sys.stderr, end="") return None print("\n - %s " % filename, file=sys.stderr, end="") if is_local: source_url = base_url + filename else: source_url = os.path.join(base_url, filename) return source_url def _write_read_file_into_url(base_url, filename, compression=None, pkg=None): if not _is_installed(pkg, filename): return source_url = _build_source_url_from(base_url, filename, compression) if source_url is None: return actual = None if ".avro" in filename: toavro(_table, source_url) actual = fromavro(source_url) elif ".xlsx" in filename: toxlsx(_table, source_url, 'test1', mode='overwrite') toxlsx(_table2, source_url, 'test2', mode='add') actual = fromxlsx(source_url, 'test1') elif ".xls" in filename: toxls(_table, source_url, 'test') actual = fromxls(source_url, 'test') elif ".json" in filename: tojson(_table, source_url) actual = fromjson(source_url) elif ".csv" in filename: tocsv(_table, source_url, encoding="ascii", lineterminator="\n") actual = fromcsv(source_url, encoding="ascii") if actual is not None: _show__rows_from("Expected:", _table) _show__rows_from("Actual:", actual) ieq(_table, actual) ieq(_table, actual) # verify can iterate twice else: print("\n - %s SKIPPED " % filename, file=sys.stderr, end="") def _show__rows_from(label, test_rows, limit=0): print(label) print(look(test_rows, limit=limit)) def _is_installed(package_name, message=None): if package_name is None: return True # Not required try: mod = import_module(package_name) found = mod is not None if not found: msg = message or package_name print("\n - %s SKIPPED " % msg, file=sys.stderr, end="") return found except Exception as exm: print(exm, file=sys.stderr) return False # endregion # region Mockup data _table = ( (u"name", u"friends", u"age"), (u"Bob", "42", "33"), (u"Jim", "13", "69"), (u"Joe", "86", "17"), (u"Ted", "23", "51"), ) _table2 = ( (u"name", u"friends", u"age"), (u"Giannis", "31", "12"), (u"James", "38", "8"), (u"Stephen", "28", "4"), (u"Jason", "23", "12"), ) # endregion petl-1.7.15/petl/test/io/test_sources.py000066400000000000000000000076711457414240700202220ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division import gzip import bz2 import zipfile from tempfile import NamedTemporaryFile from 
petl.compat import PY2 from petl.test.helpers import ieq, eq_ import petl as etl from petl.io.sources import MemorySource, PopenSource, ZipSource, \ StdoutSource, GzipSource, BZ2Source def test_memorysource(): tbl1 = (('foo', 'bar'), ('a', '1'), ('b', '2'), ('c', '2')) # test writing to a string buffer ss = MemorySource() etl.tocsv(tbl1, ss) expect = "foo,bar\r\na,1\r\nb,2\r\nc,2\r\n" if not PY2: expect = expect.encode('ascii') actual = ss.getvalue() eq_(expect, actual) # test reading from a string buffer tbl2 = etl.fromcsv(MemorySource(actual)) ieq(tbl1, tbl2) ieq(tbl1, tbl2) # test appending etl.appendcsv(tbl1, ss) actual = ss.getvalue() expect = "foo,bar\r\na,1\r\nb,2\r\nc,2\r\na,1\r\nb,2\r\nc,2\r\n" if not PY2: expect = expect.encode('ascii') eq_(expect, actual) def test_memorysource_2(): data = 'foo,bar\r\na,1\r\nb,2\r\nc,2\r\n' if not PY2: data = data.encode('ascii') actual = etl.fromcsv(MemorySource(data)) expect = (('foo', 'bar'), ('a', '1'), ('b', '2'), ('c', '2')) ieq(expect, actual) ieq(expect, actual) def test_popensource(): expect = (('foo', 'bar'),) delimiter = ' ' actual = etl.fromcsv(PopenSource(r'echo foo bar', shell=True), delimiter=delimiter) ieq(expect, actual) def test_zipsource(): # setup tbl = [('foo', 'bar'), ('a', '1'), ('b', '2')] fn_tsv = NamedTemporaryFile().name etl.totsv(tbl, fn_tsv) fn_zip = NamedTemporaryFile().name z = zipfile.ZipFile(fn_zip, mode='w') z.write(fn_tsv, 'data.tsv') z.close() # test actual = etl.fromtsv(ZipSource(fn_zip, 'data.tsv')) ieq(tbl, actual) def test_stdoutsource(): tbl = [('foo', 'bar'), ('a', 1), ('b', 2)] etl.tocsv(tbl, StdoutSource(), encoding='ascii') etl.tohtml(tbl, StdoutSource(), encoding='ascii') etl.topickle(tbl, StdoutSource()) def test_stdoutsource_none(capfd): tbl = [('foo', 'bar'), ('a', 1), ('b', 2)] etl.tocsv(tbl, encoding='ascii') captured = capfd.readouterr() outp = captured.out # TODO: capfd works on vscode but not in console/tox if outp: assert outp in ( 'foo,bar\r\na,1\r\nb,2\r\n' , 'foo,bar\na,1\nb,2\n' ) def test_stdoutsource_unicode(): tbl = [('foo', 'bar'), (u'Արամ Խաչատրյան', 1), (u'Johann Strauß', 2)] etl.tocsv(tbl, StdoutSource(), encoding='utf-8') etl.tohtml(tbl, StdoutSource(), encoding='utf-8') etl.topickle(tbl, StdoutSource()) def test_gzipsource(): # setup tbl = [('foo', 'bar'), ('a', '1'), ('b', '2')] fn = NamedTemporaryFile().name + '.gz' expect = b"foo,bar\na,1\nb,2\n" # write explicit etl.tocsv(tbl, GzipSource(fn), lineterminator='\n') actual = gzip.open(fn).read() eq_(expect, actual) # write implicit etl.tocsv(tbl, fn, lineterminator='\n') actual = gzip.open(fn).read() eq_(expect, actual) # read explicit tbl2 = etl.fromcsv(GzipSource(fn)) ieq(tbl, tbl2) # read implicit tbl2 = etl.fromcsv(fn) ieq(tbl, tbl2) def test_bzip2source(): # setup tbl = [('foo', 'bar'), ('a', '1'), ('b', '2')] fn = NamedTemporaryFile().name + '.bz2' expect = b"foo,bar\na,1\nb,2\n" # write explicit etl.tocsv(tbl, BZ2Source(fn), lineterminator='\n') actual = bz2.BZ2File(fn).read() eq_(expect, actual) # write implicit etl.tocsv(tbl, fn, lineterminator='\n') actual = bz2.BZ2File(fn).read() eq_(expect, actual) # read explicit tbl2 = etl.fromcsv(BZ2Source(fn)) ieq(tbl, tbl2) # read implicit tbl2 = etl.fromcsv(fn) ieq(tbl, tbl2) petl-1.7.15/petl/test/io/test_sqlite3.py000066400000000000000000000113041457414240700201070ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division from tempfile import NamedTemporaryFile import sqlite3 from petl.test.helpers import ieq from 
petl.io.db import fromdb, todb, appenddb def test_fromsqlite3(): # initial data f = NamedTemporaryFile(delete=False) f.close() data = (('a', 1), ('b', 2), ('c', 2.0)) connection = sqlite3.connect(f.name) c = connection.cursor() c.execute('CREATE TABLE foobar (foo, bar)') for row in data: c.execute('INSERT INTO foobar VALUES (?, ?)', row) connection.commit() c.close() connection.close() # test the function actual = fromdb(f.name, 'SELECT * FROM foobar') expect = (('foo', 'bar'), ('a', 1), ('b', 2), ('c', 2.0)) ieq(expect, actual, cast=tuple) ieq(expect, actual, cast=tuple) # verify can iterate twice def test_fromsqlite3_connection(): # initial data data = (('a', 1), ('b', 2), ('c', 2.0)) connection = sqlite3.connect(':memory:') c = connection.cursor() c.execute('CREATE TABLE foobar (foo, bar)') for row in data: c.execute('INSERT INTO foobar VALUES (?, ?)', row) connection.commit() c.close() # test the function actual = fromdb(connection, 'SELECT * FROM foobar') expect = (('foo', 'bar'), ('a', 1), ('b', 2), ('c', 2.0)) ieq(expect, actual, cast=tuple) ieq(expect, actual, cast=tuple) # verify can iterate twice def test_fromsqlite3_withargs(): # initial data data = (('a', 1), ('b', 2), ('c', 2.0)) connection = sqlite3.connect(':memory:') c = connection.cursor() c.execute('CREATE TABLE foobar (foo, bar)') for row in data: c.execute('INSERT INTO foobar VALUES (?, ?)', row) connection.commit() c.close() # test the function actual = fromdb( connection, 'SELECT * FROM foobar WHERE bar > ? AND bar < ?', (1, 3) ) expect = (('foo', 'bar'), ('b', 2), ('c', 2.0)) ieq(expect, actual) ieq(expect, actual) # verify can iterate twice def test_tosqlite3_appendsqlite3(): # exercise function table = (('foo', 'bar'), ('a', 1), ('b', 2), ('c', 2)) f = NamedTemporaryFile(delete=False) f.close() conn = sqlite3.connect(f.name) conn.execute('CREATE TABLE foobar (foo TEXT, bar INT)') conn.close() todb(table, f.name, 'foobar') # check what it did conn = sqlite3.connect(f.name) actual = conn.execute('SELECT * FROM foobar') expect = (('a', 1), ('b', 2), ('c', 2)) ieq(expect, actual) # check appending table2 = (('foo', 'bar'), ('d', 7), ('e', 9), ('f', 1)) appenddb(table2, f.name, 'foobar') # check what it did conn = sqlite3.connect(f.name) actual = conn.execute('SELECT * FROM foobar') expect = (('a', 1), ('b', 2), ('c', 2), ('d', 7), ('e', 9), ('f', 1)) ieq(expect, actual) def test_tosqlite3_appendsqlite3_connection(): conn = sqlite3.connect(':memory:') conn.execute('CREATE TABLE foobar (foo TEXT, bar INT)') # exercise function table = (('foo', 'bar'), ('a', 1), ('b', 2), ('c', 2)) todb(table, conn, 'foobar') # check what it did actual = conn.execute('SELECT * FROM foobar') expect = (('a', 1), ('b', 2), ('c', 2)) ieq(expect, actual) # check appending table2 = (('foo', 'bar'), ('d', 7), ('e', 9), ('f', 1)) appenddb(table2, conn, 'foobar') # check what it did actual = conn.execute('SELECT * FROM foobar') expect = (('a', 1), ('b', 2), ('c', 2), ('d', 7), ('e', 9), ('f', 1)) ieq(expect, actual) def test_tosqlite3_identifiers(): # exercise function table = (('foo foo', 'bar.baz.spong`'), ('a', 1), ('b', 2), ('c', 2)) f = NamedTemporaryFile(delete=False) f.close() conn = sqlite3.connect(f.name) conn.execute('CREATE TABLE "foo "" bar`" ' '("foo foo" TEXT, "bar.baz.spong`" INT)') conn.close() todb(table, f.name, 'foo " bar`') # check what it did conn = sqlite3.connect(f.name) actual = conn.execute('SELECT * FROM `foo " bar```') expect = (('a', 1), ('b', 2), ('c', 2)) ieq(expect, actual) # TODO test uneven rows 
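# Editorial note (not part of the original file): the tests above rely on the
# different load semantics of the two petl entry points -- todb() truncates
# the target table before loading, while appenddb() leaves existing rows in
# place. A minimal sketch against an in-memory database (the table name 't'
# is illustrative only):
#
#     import sqlite3
#     from petl.io.db import todb, appenddb
#     conn = sqlite3.connect(':memory:')
#     conn.execute('CREATE TABLE t (foo TEXT, bar INT)')
#     todb([('foo', 'bar'), ('a', 1)], conn, 't')      # table holds 1 row
#     appenddb([('foo', 'bar'), ('b', 2)], conn, 't')  # table holds 2 rows
#     todb([('foo', 'bar'), ('c', 3)], conn, 't')      # truncated: 1 row again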
petl-1.7.15/petl/test/io/test_tees.py000066400000000000000000000144371457414240700174750ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division from tempfile import NamedTemporaryFile from petl.test.helpers import ieq import petl as etl def test_teepickle(): t1 = (('foo', 'bar'), ('a', 2), ('b', 1), ('c', 3)) f1 = NamedTemporaryFile(delete=False) f2 = NamedTemporaryFile(delete=False) etl.wrap(t1).teepickle(f1.name).selectgt('bar', 1).topickle(f2.name) ieq(t1, etl.frompickle(f1.name)) ieq(etl.wrap(t1).selectgt('bar', 1), etl.frompickle(f2.name)) def test_teecsv(): t1 = (('foo', 'bar'), ('a', 2), ('b', 1), ('c', 3)) f1 = NamedTemporaryFile(delete=False) f2 = NamedTemporaryFile(delete=False) (etl .wrap(t1) .teecsv(f1.name, encoding='ascii') .selectgt('bar', 1) .tocsv(f2.name, encoding='ascii')) ieq(t1, etl.fromcsv(f1.name, encoding='ascii').convertnumbers()) ieq(etl.wrap(t1).selectgt('bar', 1), etl.fromcsv(f2.name, encoding='ascii').convertnumbers()) def test_teetsv(): t1 = (('foo', 'bar'), ('a', 2), ('b', 1), ('c', 3)) f1 = NamedTemporaryFile(delete=False) f2 = NamedTemporaryFile(delete=False) (etl .wrap(t1) .teetsv(f1.name, encoding='ascii') .selectgt('bar', 1) .totsv(f2.name, encoding='ascii')) ieq(t1, etl.fromtsv(f1.name, encoding='ascii').convertnumbers()) ieq(etl.wrap(t1).selectgt('bar', 1), etl.fromtsv(f2.name, encoding='ascii').convertnumbers()) def test_teecsv_write_header(): t1 = (('foo', 'bar'), ('a', '2'), ('b', '1'), ('c', '3')) f1 = NamedTemporaryFile(delete=False) f2 = NamedTemporaryFile(delete=False) (etl .wrap(t1) .convertnumbers() .teecsv(f1.name, write_header=False, encoding='ascii') .selectgt('bar', 1) .tocsv(f2.name, encoding='ascii')) ieq(t1[1:], etl.fromcsv(f1.name, encoding='ascii')) ieq(etl.wrap(t1).convertnumbers().selectgt('bar', 1), etl.fromcsv(f2.name, encoding='ascii').convertnumbers()) def test_teecsv_unicode(): t1 = ((u'name', u'id'), (u'Արամ Խաչատրյան', 1), (u'Johann Strauß', 2), (u'Вагиф Сәмәдоғлу', 3), (u'章子怡', 4)) f1 = NamedTemporaryFile(delete=False) f2 = NamedTemporaryFile(delete=False) (etl .wrap(t1) .teecsv(f1.name, encoding='utf-8') .selectgt('id', 1) .tocsv(f2.name, encoding='utf-8')) ieq(t1, etl.fromcsv(f1.name, encoding='utf-8').convertnumbers()) ieq(etl.wrap(t1).selectgt('id', 1), etl.fromcsv(f2.name, encoding='utf-8').convertnumbers()) def test_teecsv_unicode_write_header(): t1 = ((u'name', u'id'), (u'Արամ Խաչատրյան', u'1'), (u'Johann Strauß', u'2'), (u'Вагиф Сәмәдоғлу', u'3'), (u'章子怡', u'4')) f1 = NamedTemporaryFile(delete=False) f2 = NamedTemporaryFile(delete=False) (etl .wrap(t1) .convertnumbers() .teecsv(f1.name, write_header=False, encoding='utf-8') .selectgt('id', 1) .tocsv(f2.name, encoding='utf-8')) ieq(t1[1:], etl.fromcsv(f1.name, encoding='utf-8')) ieq(etl.wrap(t1).convertnumbers().selectgt('id', 1), etl.fromcsv(f2.name, encoding='utf-8').convertnumbers()) def test_teetsv_unicode(): t1 = ((u'name', u'id'), (u'Արամ Խաչատրյան', 1), (u'Johann Strauß', 2), (u'Вагиф Сәмәдоғлу', 3), (u'章子怡', 4),) f1 = NamedTemporaryFile(delete=False) f2 = NamedTemporaryFile(delete=False) (etl .wrap(t1) .teetsv(f1.name, encoding='utf-8') .selectgt('id', 1) .totsv(f2.name, encoding='utf-8')) ieq(t1, etl.fromtsv(f1.name, encoding='utf-8').convertnumbers()) ieq(etl.wrap(t1).selectgt('id', 1), etl.fromtsv(f2.name, encoding='utf-8').convertnumbers()) def test_teetext(): t1 = (('foo', 'bar'), ('a', 2), ('b', 1), ('c', 3)) f1 = NamedTemporaryFile(delete=False) f2 = NamedTemporaryFile(delete=False) prologue 
= 'foo,bar\n' template = '{foo},{bar}\n' epilogue = 'd,4' (etl .wrap(t1) .teetext(f1.name, template=template, prologue=prologue, epilogue=epilogue) .selectgt('bar', 1) .topickle(f2.name)) ieq(t1 + (('d', 4),), etl.fromcsv(f1.name).convertnumbers()) ieq(etl.wrap(t1).selectgt('bar', 1), etl.frompickle(f2.name)) def test_teetext_unicode(): t1 = ((u'foo', u'bar'), (u'Արամ Խաչատրյան', 2), (u'Johann Strauß', 1), (u'Вагиф Сәмәдоғлу', 3)) f1 = NamedTemporaryFile(delete=False) f2 = NamedTemporaryFile(delete=False) prologue = u'foo,bar\n' template = u'{foo},{bar}\n' epilogue = u'章子怡,4' (etl .wrap(t1) .teetext(f1.name, template=template, prologue=prologue, epilogue=epilogue, encoding='utf-8') .selectgt('bar', 1) .topickle(f2.name)) ieq(t1 + ((u'章子怡', 4),), etl.fromcsv(f1.name, encoding='utf-8').convertnumbers()) ieq(etl.wrap(t1).selectgt('bar', 1), etl.frompickle(f2.name)) def test_teehtml(): t1 = (('foo', 'bar'), ('a', 2), ('b', 1), ('c', 3)) f1 = NamedTemporaryFile(delete=False) f2 = NamedTemporaryFile(delete=False) etl.wrap(t1).teehtml(f1.name).selectgt('bar', 1).topickle(f2.name) ieq(t1, etl.fromxml(f1.name, './/tr', ('th', 'td')).convertnumbers()) ieq(etl.wrap(t1).selectgt('bar', 1), etl.frompickle(f2.name)) def test_teehtml_unicode(): t1 = ((u'foo', u'bar'), (u'Արամ Խաչատրյան', 2), (u'Johann Strauß', 1), (u'Вагиф Сәмәдоғлу', 3)) f1 = NamedTemporaryFile(delete=False) f2 = NamedTemporaryFile(delete=False) (etl .wrap(t1) .teehtml(f1.name, encoding='utf-8') .selectgt('bar', 1) .topickle(f2.name)) ieq(t1, (etl .fromxml(f1.name, './/tr', ('th', 'td'), encoding='utf-8') .convertnumbers())) ieq(etl.wrap(t1).selectgt('bar', 1), etl.frompickle(f2.name)) petl-1.7.15/petl/test/io/test_text.py000066400000000000000000000102351457414240700175110ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division from tempfile import NamedTemporaryFile import gzip import os import io from petl.test.helpers import ieq, eq_ from petl.io.text import fromtext, totext def test_fromtext(): # initial data f = NamedTemporaryFile(delete=False, mode='wb') f.write(b'foo\tbar\n') f.write(b'a\t1\n') f.write(b'b\t2\n') f.write(b'c\t3\n') f.close() actual = fromtext(f.name, encoding='ascii') expect = (('lines',), ('foo\tbar',), ('a\t1',), ('b\t2',), ('c\t3',)) ieq(expect, actual) ieq(expect, actual) # verify can iterate twice def test_fromtext_lineterminators(): data = [b'foo,bar', b'a,1', b'b,2', b'c,2'] expect = (('lines',), ('foo,bar',), ('a,1',), ('b,2',), ('c,2',)) for lt in b'\r', b'\n', b'\r\n': f = NamedTemporaryFile(mode='wb', delete=False) f.write(lt.join(data)) f.close() actual = fromtext(f.name, encoding='ascii') ieq(expect, actual) def test_totext(): # exercise function table = (('foo', 'bar'), ('a', 1), ('b', 2), ('c', 2)) f = NamedTemporaryFile(delete=False) f.close() prologue = ( "{| class='wikitable'\n" "|-\n" "! foo\n" "! bar\n" ) template = ( "|-\n" "| {foo}\n" "| {bar}\n" ) epilogue = "|}\n" totext(table, f.name, encoding='ascii', template=template, prologue=prologue, epilogue=epilogue) # check what it did with io.open(f.name, mode='rt', encoding='ascii', newline='') as o: actual = o.read() expect = ( "{| class='wikitable'\n" "|-\n" "! foo\n" "! 
bar\n" "|-\n" "| a\n" "| 1\n" "|-\n" "| b\n" "| 2\n" "|-\n" "| c\n" "| 2\n" "|}\n" ) eq_(expect, actual) def test_fromtext_gz(): # initial data f = NamedTemporaryFile(delete=False) f.close() fn = f.name + '.gz' os.rename(f.name, fn) f = gzip.open(fn, 'wb') try: f.write(b'foo\tbar\n') f.write(b'a\t1\n') f.write(b'b\t2\n') f.write(b'c\t3\n') finally: f.close() actual = fromtext(fn, encoding='ascii') expect = (('lines',), ('foo\tbar',), ('a\t1',), ('b\t2',), ('c\t3',)) ieq(expect, actual) ieq(expect, actual) # verify can iterate twice def test_totext_gz(): # exercise function table = (('foo', 'bar'), ('a', 1), ('b', 2), ('c', 2)) f = NamedTemporaryFile(delete=False) f.close() fn = f.name + '.gz' os.rename(f.name, fn) prologue = ( "{| class='wikitable'\n" "|-\n" "! foo\n" "! bar\n" ) template = ( "|-\n" "| {foo}\n" "| {bar}\n" ) epilogue = "|}\n" totext(table, fn, encoding='ascii', template=template, prologue=prologue, epilogue=epilogue) # check what it did o = gzip.open(fn, 'rb') try: actual = o.read() expect = ( b"{| class='wikitable'\n" b"|-\n" b"! foo\n" b"! bar\n" b"|-\n" b"| a\n" b"| 1\n" b"|-\n" b"| b\n" b"| 2\n" b"|-\n" b"| c\n" b"| 2\n" b"|}\n" ) eq_(expect, actual) finally: o.close() def test_totext_headerless(): table = [] f = NamedTemporaryFile(delete=False) prologue = "-- START\n" template = "+ {f1}\n" epilogue = "-- END\n" totext(table, f.name, encoding='ascii', template=template, prologue=prologue, epilogue=epilogue) with io.open(f.name, mode='rt', encoding='ascii', newline='') as o: actual = o.read() expect = prologue + epilogue eq_(expect, actual) petl-1.7.15/petl/test/io/test_text_unicode.py000066400000000000000000000037711457414240700212260ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division import io from tempfile import NamedTemporaryFile from petl.test.helpers import ieq, eq_ from petl.io.text import fromtext, totext def test_fromtext(): data = ( u"name,id\n" u"Արամ Խաչատրյան,1\n" u"Johann Strauß,2\n" u"Вагиф Сәмәдоғлу,3\n" u"章子怡,4\n" ) fn = NamedTemporaryFile().name f = io.open(fn, encoding='utf-8', mode='wt') f.write(data) f.close() actual = fromtext(fn, encoding='utf-8') expect = ((u'lines',), (u'name,id',), (u'Արամ Խաչատրյան,1',), (u'Johann Strauß,2',), (u'Вагиф Сәмәдоғлу,3',), (u'章子怡,4',), ) ieq(expect, actual) ieq(expect, actual) # verify can iterate twice def test_totext(): # exercise function tbl = ((u'name', u'id'), (u'Արամ Խաչատրյան', 1), (u'Johann Strauß', 2), (u'Вагиф Сәмәдоғлу', 3), (u'章子怡', 4), ) prologue = ( u"{| class='wikitable'\n" u"|-\n" u"! name\n" u"! id\n" ) template = ( u"|-\n" u"| {name}\n" u"| {id}\n" ) epilogue = u"|}\n" fn = NamedTemporaryFile().name totext(tbl, fn, template=template, prologue=prologue, epilogue=epilogue, encoding='utf-8') # check what it did f = io.open(fn, encoding='utf-8', mode='rt') actual = f.read() expect = ( u"{| class='wikitable'\n" u"|-\n" u"! name\n" u"! 
id\n" u"|-\n" u"| Արամ Խաչատրյան\n" u"| 1\n" u"|-\n" u"| Johann Strauß\n" u"| 2\n" u"|-\n" u"| Вагиф Сәмәдоғлу\n" u"| 3\n" u"|-\n" u"| 章子怡\n" u"| 4\n" u"|}\n" ) eq_(expect, actual) petl-1.7.15/petl/test/io/test_whoosh.py000066400000000000000000000202531457414240700200350ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division import os import tempfile import pytest from petl.test.helpers import ieq import petl as etl from petl.io.whoosh import fromtextindex, totextindex, appendtextindex, \ searchtextindex try: # noinspection PyUnresolvedReferences import whoosh except ImportError as e: pytest.skip('SKIP whoosh tests: %s' % e, allow_module_level=True) else: from whoosh.index import create_in from whoosh.fields import * import datetime def test_fromindex_dirname(): dirname = tempfile.mkdtemp() schema = Schema(title=TEXT(stored=True), path=ID(stored=True), content=TEXT) ix = create_in(dirname, schema) writer = ix.writer() writer.add_document(title=u"First document", path=u"/a", content=u"This is the first document we've added!") writer.add_document(title=u"Second document", path=u"/b", content=u"The second one is even more interesting!") writer.commit() # N.B., fields get sorted expect = ((u'path', u'title'), (u'/a', u'First document'), (u'/b', u'Second document')) actual = fromtextindex(dirname) ieq(expect, actual) def test_fromindex_index(): dirname = tempfile.mkdtemp() schema = Schema(title=TEXT(stored=True), path=ID(stored=True), content=TEXT) ix = create_in(dirname, schema) writer = ix.writer() writer.add_document(title=u"First document", path=u"/a", content=u"This is the first document we've added!") writer.add_document(title=u"Second document", path=u"/b", content=u"The second one is even more interesting!") writer.commit() # N.B., fields get sorted expect = ((u'path', u'title'), (u'/a', u'First document'), (u'/b', u'Second document')) actual = fromtextindex(ix) ieq(expect, actual) def test_fromindex_docnum_field(): dirname = tempfile.mkdtemp() schema = Schema(title=TEXT(stored=True), path=ID(stored=True), content=TEXT) ix = create_in(dirname, schema) writer = ix.writer() writer.add_document(title=u"First document", path=u"/a", content=u"This is the first document we've added!") writer.add_document(title=u"Second document", path=u"/b", content=u"The second one is even more interesting!") writer.commit() # N.B., fields get sorted expect = ((u'docnum', u'path', u'title'), (0, u'/a', u'First document'), (1, u'/b', u'Second document')) actual = fromtextindex(dirname, docnum_field='docnum') ieq(expect, actual) def test_toindex_dirname(): dirname = tempfile.mkdtemp() # name fields in ascending order as whoosh sorts fields on the way out tbl = (('f0', 'f1', 'f2', 'f3', 'f4'), (u'AAA', 12, 4.3, True, datetime.datetime.now()), (u'BBB', 6, 3.4, False, datetime.datetime(1900, 1, 31)), (u'CCC', 42, 7.8, True, datetime.datetime(2100, 12, 25))) schema = Schema(f0=TEXT(stored=True), f1=NUMERIC(int, stored=True), f2=NUMERIC(float, stored=True), f3=BOOLEAN(stored=True), f4=DATETIME(stored=True)) totextindex(tbl, dirname, schema=schema) actual = fromtextindex(dirname) ieq(tbl, actual) def test_toindex_index(): dirname = tempfile.mkdtemp() # name fields in ascending order as whoosh sorts fields on the way out tbl = (('f0', 'f1', 'f2', 'f3', 'f4'), (u'AAA', 12, 4.3, True, datetime.datetime.now()), (u'BBB', 6, 3.4, False, datetime.datetime(1900, 1, 31)), (u'CCC', 42, 7.8, True, datetime.datetime(2100, 12, 25))) schema = 
Schema(f0=TEXT(stored=True), f1=NUMERIC(int, stored=True), f2=NUMERIC(float, stored=True), f3=BOOLEAN(stored=True), f4=DATETIME(stored=True)) index = create_in(dirname, schema) totextindex(tbl, index) actual = fromtextindex(index) ieq(tbl, actual) def test_appendindex_dirname(): dirname = tempfile.mkdtemp() # name fields in ascending order as whoosh sorts fields on the way out tbl = (('f0', 'f1', 'f2', 'f3', 'f4'), (u'AAA', 12, 4.3, True, datetime.datetime.now()), (u'BBB', 6, 3.4, False, datetime.datetime(1900, 1, 31)), (u'CCC', 42, 7.8, True, datetime.datetime(2100, 12, 25))) schema = Schema(f0=TEXT(stored=True), f1=NUMERIC(int, stored=True), f2=NUMERIC(float, stored=True), f3=BOOLEAN(stored=True), f4=DATETIME(stored=True)) totextindex(tbl, dirname, schema=schema) appendtextindex(tbl, dirname) actual = fromtextindex(dirname) expect = tbl + tbl[1:] ieq(expect, actual) def test_appendindex_index(): dirname = tempfile.mkdtemp() # name fields in ascending order as whoosh sorts fields on the way out tbl = (('f0', 'f1', 'f2', 'f3', 'f4'), (u'AAA', 12, 4.3, True, datetime.datetime.now()), (u'BBB', 6, 3.4, False, datetime.datetime(1900, 1, 31)), (u'CCC', 42, 7.8, True, datetime.datetime(2100, 12, 25))) schema = Schema(f0=TEXT(stored=True), f1=NUMERIC(int, stored=True), f2=NUMERIC(float, stored=True), f3=BOOLEAN(stored=True), f4=DATETIME(stored=True)) index = create_in(dirname, schema) totextindex(tbl, index) appendtextindex(tbl, index) actual = fromtextindex(index) expect = tbl + tbl[1:] ieq(expect, actual) def test_searchindex(): dirname = tempfile.mkdtemp() schema = Schema(title=TEXT(stored=True), path=ID(stored=True), content=TEXT) ix = create_in(dirname, schema) writer = ix.writer() writer.add_document(title=u"Oranges", path=u"/a", content=u"This is the first document we've added!") writer.add_document(title=u"Apples", path=u"/b", content=u"The second document is even more " u"interesting!") writer.commit() # N.B., fields get sorted expect = ((u'path', u'title'), (u'/a', u'Oranges')) # N.B., by default whoosh does not do stemming actual = searchtextindex(dirname, 'oranges') ieq(expect, actual) actual = searchtextindex(dirname, 'add*') ieq(expect, actual) expect = ((u'path', u'title'), (u'/a', u'Oranges'), (u'/b', u'Apples')) actual = searchtextindex(dirname, 'doc*') ieq(expect, actual) def test_integration(): dirname = tempfile.mkdtemp() schema = Schema(title=TEXT(stored=True), path=ID(stored=True), content=TEXT) ix = create_in(dirname, schema) writer = ix.writer() writer.add_document(title=u"First document", path=u"/a", content=u"This is the first document we've added!") writer.add_document(title=u"Second document", path=u"/b", content=u"The second one is even more interesting!") writer.commit() # N.B., fields get sorted expect = ((u'path', u'title'), (u'/a', u'first document'), (u'/b', u'second document')) actual = etl.fromtextindex(dirname).convert('title', 'lower') ieq(expect, actual) # TODO test_searchindexpage # TODO test_searchindex_multifield_query # TODO test_searchindex_nontext_query petl-1.7.15/petl/test/io/test_xls.py000066400000000000000000000076471457414240700173500ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import division, print_function, absolute_import from datetime import datetime from tempfile import NamedTemporaryFile import pytest try: from unittest.mock import patch except ImportError: from mock import patch import petl as etl from petl.io.xls import fromxls, toxls from petl.test.helpers import ieq def _get_test_xls(): try: import pkg_resources 
return pkg_resources.resource_filename('petl', 'test/resources/test.xls') except: return None try: # noinspection PyUnresolvedReferences import xlrd # noinspection PyUnresolvedReferences import xlwt except ImportError as e: pytest.skip('SKIP xls tests: %s' % e, allow_module_level=True) else: def test_fromxls(): filename = _get_test_xls() if filename is None: return tbl = fromxls(filename, 'Sheet1') expect = (('foo', 'bar'), ('A', 1), ('B', 2), ('C', 2), (u'é', datetime(2012, 1, 1))) ieq(expect, tbl) ieq(expect, tbl) def test_fromxls_nosheet(): filename = _get_test_xls() if filename is None: return tbl = fromxls(filename) expect = (('foo', 'bar'), ('A', 1), ('B', 2), ('C', 2), (u'é', datetime(2012, 1, 1))) ieq(expect, tbl) ieq(expect, tbl) def test_fromxls_use_view(): filename = _get_test_xls() if filename is None: return tbl = fromxls(filename, 'Sheet1', use_view=False) expect = (('foo', 'bar'), ('A', 1), ('B', 2), ('C', 2), (u'é', 40909.0)) ieq(expect, tbl) ieq(expect, tbl) def test_toxls(): expect = (('foo', 'bar'), ('A', 1), ('B', 2), ('C', 2)) f = NamedTemporaryFile(delete=False) f.close() toxls(expect, f.name, 'Sheet1') actual = fromxls(f.name, 'Sheet1') ieq(expect, actual) ieq(expect, actual) def test_toxls_headerless(): expect = [] f = NamedTemporaryFile(delete=False) f.close() toxls(expect, f.name, 'Sheet1') actual = fromxls(f.name, 'Sheet1') ieq(expect, actual) ieq(expect, actual) def test_toxls_date(): expect = (('foo', 'bar'), (u'é', datetime(2012, 1, 1)), (u'éé', datetime(2013, 2, 22))) f = NamedTemporaryFile(delete=False) f.close() toxls(expect, f.name, 'Sheet1', styles={'bar': xlwt.easyxf(num_format_str='DD/MM/YYYY')}) actual = fromxls(f.name, 'Sheet1') ieq(expect, actual) def test_integration(): expect = (('foo', 'bar'), ('A', 1), ('B', 2), ('C', 2)) f = NamedTemporaryFile(delete=False) f.close() etl.wrap(expect).toxls(f.name, 'Sheet1') actual = etl.fromxls(f.name, 'Sheet1') ieq(expect, actual) ieq(expect, actual) def test_passing_kwargs_to_xlutils_view(): filename = _get_test_xls() if filename is None: return from petl.io.xlutils_view import View org_init = View.__init__ def wrapper(self, *args, **kwargs): assert "ignore_workbook_corruption" in kwargs return org_init(self, *args, **kwargs) with patch("petl.io.xlutils_view.View.__init__", wrapper): tbl = fromxls(filename, 'Sheet1', use_view=True, ignore_workbook_corruption=True) expect = (('foo', 'bar'), ('A', 1), ('B', 2), ('C', 2), (u'é', datetime(2012, 1, 1))) ieq(expect, tbl) ieq(expect, tbl) petl-1.7.15/petl/test/io/test_xlsx.py000066400000000000000000000150421457414240700175240ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division from datetime import datetime from tempfile import NamedTemporaryFile import pytest import petl as etl from petl.io.xlsx import fromxlsx, toxlsx, appendxlsx from petl.test.helpers import ieq, eq_ openpyxl = pytest.importorskip("openpyxl") @pytest.fixture() def xlsx_test_filename(): pkg_resources = pytest.importorskip("pkg_resources") # conda is missing pkg_resources return pkg_resources.resource_filename('petl', 'test/resources/test.xlsx') @pytest.fixture(scope="module") def xlsx_test_table(): return (('foo', 'bar'), ('A', 1), ('B', 2), ('C', 2), (u'é', datetime(2012, 1, 1))) @pytest.fixture(scope="module") def xlsx_table_with_non_str_header(): class Header: def __init__(self, name): self.__name = name def __str__(self): return self.__name def __eq__(self, other): return str(other) == str(self) return ((Header('foo'), 
Header('bar')), ('A', 1), ('B', 2), ('C', 2)) def test_fromxlsx(xlsx_test_table, xlsx_test_filename): tbl = fromxlsx(xlsx_test_filename, 'Sheet1') expect = xlsx_test_table ieq(expect, tbl) ieq(expect, tbl) def test_fromxlsx_read_only(xlsx_test_filename): tbl = fromxlsx(xlsx_test_filename, sheet='Sheet1', read_only=True) expect = (('foo', 'bar'), ('A', 1), ('B', 2), ('C', 2), (u'é', datetime(2012, 1, 1))) ieq(expect, tbl) ieq(expect, tbl) def test_fromxlsx_nosheet(xlsx_test_filename): tbl = fromxlsx(xlsx_test_filename) expect = (('foo', 'bar'), ('A', 1), ('B', 2), ('C', 2), (u'é', datetime(2012, 1, 1))) ieq(expect, tbl) ieq(expect, tbl) def test_fromxlsx_range(xlsx_test_filename): tbl = fromxlsx(xlsx_test_filename, 'Sheet2', range_string='B2:C6') expect = (('foo', 'bar'), ('A', 1), ('B', 2), ('C', 2), (u'é', datetime(2012, 1, 1))) ieq(expect, tbl) ieq(expect, tbl) def test_fromxlsx_offset(xlsx_test_filename): tbl = fromxlsx(xlsx_test_filename, 'Sheet1', min_row=2, min_col=2) expect = ((1,), (2,), (2,), (datetime(2012, 1, 1, 0, 0),)) ieq(expect, tbl) ieq(expect, tbl) def test_toxlsx_appendxlsx(xlsx_test_table): # setup f = NamedTemporaryFile(delete=True, suffix='.xlsx') f.close() # test toxlsx toxlsx(xlsx_test_table, f.name, 'Sheet1') actual = fromxlsx(f.name, 'Sheet1') ieq(xlsx_test_table, actual) # test appendxlsx appendxlsx(xlsx_test_table, f.name, 'Sheet1') expect = etl.cat(xlsx_test_table, xlsx_test_table) ieq(expect, actual) def test_toxlsx_nosheet(xlsx_test_table): f = NamedTemporaryFile(delete=True, suffix='.xlsx') f.close() toxlsx(xlsx_test_table, f.name) actual = fromxlsx(f.name) ieq(xlsx_test_table, actual) def test_integration(xlsx_test_table): f = NamedTemporaryFile(delete=True, suffix='.xlsx') f.close() tbl = etl.wrap(xlsx_test_table) tbl.toxlsx(f.name, 'Sheet1') actual = etl.fromxlsx(f.name, 'Sheet1') ieq(tbl, actual) tbl.appendxlsx(f.name, 'Sheet1') expect = tbl.cat(tbl) ieq(expect, actual) def test_toxlsx_overwrite(xlsx_test_table): f = NamedTemporaryFile(delete=False, suffix='.xlsx') f.close() toxlsx(xlsx_test_table, f.name, 'Sheet1', mode="overwrite") wb = openpyxl.load_workbook(f.name, read_only=True) eq_(1, len(wb.sheetnames)) def test_toxlsx_replace_file(xlsx_test_table): f = NamedTemporaryFile(delete=True, suffix='.xlsx') f.close() toxlsx(xlsx_test_table, f.name, 'Sheet1', mode="overwrite") toxlsx(xlsx_test_table, f.name, sheet=None, mode="replace") wb = openpyxl.load_workbook(f.name, read_only=True) eq_(1, len(wb.sheetnames)) def test_toxlsx_replace_sheet(xlsx_test_table): f = NamedTemporaryFile(delete=True, suffix='.xlsx') f.close() toxlsx(xlsx_test_table, f.name, 'Sheet1', mode="overwrite") toxlsx(xlsx_test_table, f.name, 'Sheet1', mode="replace") wb = openpyxl.load_workbook(f.name, read_only=True) eq_(1, len(wb.sheetnames)) def test_toxlsx_replace_sheet_nofile(xlsx_test_table): f = NamedTemporaryFile(delete=True, suffix='.xlsx') f.close() toxlsx(xlsx_test_table, f.name, 'Sheet1', mode="replace") wb = openpyxl.load_workbook(f.name, read_only=True) eq_(1, len(wb.sheetnames)) def test_toxlsx_add_nosheet(xlsx_test_table): f = NamedTemporaryFile(delete=True, suffix='.xlsx') f.close() toxlsx(xlsx_test_table, f.name, 'Sheet1', mode="overwrite") toxlsx(xlsx_test_table, f.name, None, mode="add") wb = openpyxl.load_workbook(f.name, read_only=True) eq_(2, len(wb.sheetnames)) def test_toxlsx_add_sheet_nomatch(xlsx_test_table): f = NamedTemporaryFile(delete=True, suffix='.xlsx') f.close() toxlsx(xlsx_test_table, f.name, 'Sheet1', mode="overwrite") toxlsx(xlsx_test_table, 
f.name, 'Sheet2', mode="add") wb = openpyxl.load_workbook(f.name, read_only=True) eq_(2, len(wb.sheetnames)) def test_toxlsx_add_sheet_match(xlsx_test_table): tbl = xlsx_test_table f = NamedTemporaryFile(delete=True, suffix='.xlsx') f.close() toxlsx(tbl, f.name, 'Sheet1', mode="overwrite") with pytest.raises(ValueError) as excinfo: toxlsx(tbl, f.name, 'Sheet1', mode="add") assert 'Sheet Sheet1 already exists in file' in str(excinfo.value) def test_toxlsx_with_non_str_header(xlsx_table_with_non_str_header): f = NamedTemporaryFile(delete=True, suffix='.xlsx') f.close() toxlsx(xlsx_table_with_non_str_header, f.name, 'Sheet1') actual = etl.fromxlsx(f.name, 'Sheet1') ieq(xlsx_table_with_non_str_header, actual) def test_appendxlsx_with_non_str_header(xlsx_table_with_non_str_header, xlsx_test_table): f = NamedTemporaryFile(delete=True, suffix='.xlsx') f.close() # write first table toxlsx(xlsx_test_table, f.name, 'Sheet1') actual = fromxlsx(f.name, 'Sheet1') ieq(xlsx_test_table, actual) # test appendxlsx appendxlsx(xlsx_table_with_non_str_header, f.name, 'Sheet1') expect = etl.cat(xlsx_test_table, xlsx_table_with_non_str_header) ieq(expect, actual) def test_toxlsx_headerless(): expect = [] f = NamedTemporaryFile(delete=False) f.close() toxlsx(expect, f.name) actual = fromxlsx(f.name) ieq(expect, actual) ieq(expect, actual) petl-1.7.15/petl/test/io/test_xml.py000066400000000000000000000341461457414240700173340ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division import sys from collections import OrderedDict from tempfile import NamedTemporaryFile import pytest from petl.test.helpers import ieq from petl.util import nrows, look from petl.io.xml import fromxml, toxml from petl.compat import urlopen def test_fromxml(): # initial data f = NamedTemporaryFile(delete=False, mode='wt') data = """
<table>
    <tr>
        <td>foo</td><td>bar</td>
    </tr>
    <tr>
        <td>a</td><td>1</td>
    </tr>
    <tr>
        <td>b</td><td>2</td>
    </tr>
    <tr>
        <td>c</td><td>2</td>
    </tr>
</table>
""" f.write(data) f.close() actual = fromxml(f.name, 'tr', 'td') expect = (('foo', 'bar'), ('a', '1'), ('b', '2'), ('c', '2')) ieq(expect, actual) ieq(expect, actual) # verify can iterate twice def test_fromxml_2(): # initial data f = NamedTemporaryFile(delete=False, mode='wt') data = """
""" f.write(data) f.close() actual = fromxml(f.name, 'tr', 'td', 'v') expect = (('foo', 'bar'), ('a', '1'), ('b', '2'), ('c', '2')) ieq(expect, actual) ieq(expect, actual) # verify can iterate twice def test_fromxml_3(): # initial data f = NamedTemporaryFile(delete=False, mode='wt') data = """ a b c
""" f.write(data) f.close() actual = fromxml(f.name, 'row', {'foo': 'foo', 'bar': ('baz/bar', 'v')}) # N.B., requested fields come out in name sorted order expect = (('bar', 'foo'), ('1', 'a'), ('2', 'b'), ('2', 'c')) ieq(expect, actual) ieq(expect, actual) # verify can iterate twice def test_fromxml_4(): # initial data f = NamedTemporaryFile(delete=False, mode='wt') data = """ a13 b2 c2
""" f.write(data) f.close() actual = fromxml(f.name, 'row', {'foo': 'foo', 'bar': './/bar'}) # N.B., requested fields come out in name sorted order expect = (('bar', 'foo'), (('1', '3'), 'a'), ('2', 'b'), ('2', 'c')) ieq(expect, actual) ieq(expect, actual) # verify can iterate twice def test_fromxml_5(): # initial data f = NamedTemporaryFile(delete=False, mode='wt') data = """ a b c
""" f.write(data) f.close() actual = fromxml(f.name, 'row', {'foo': 'foo', 'bar': ('baz/bar', 'v')}) # N.B., requested fields come out in name sorted order expect = (('bar', 'foo'), (('1', '3'), 'a'), ('2', 'b'), ('2', 'c')) ieq(expect, actual) ieq(expect, actual) # verify can iterate twice def test_fromxml_6(): data = """
<table>
    <thead>
        <tr>
            <th>foo</th>
            <th>bar</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>a</td>
            <td>2</td>
        </tr>
        <tr>
            <td>b</td>
            <td>1</td>
        </tr>
        <tr>
            <td>c</td>
            <td>3</td>
        </tr>
    </tbody>
</table>
""" f = NamedTemporaryFile(delete=False, mode='wt') f.write(data) f.close() actual = fromxml(f.name, './/tr', ('th', 'td')) print(look(actual)) expect = (('foo', 'bar'), ('a', '2'), ('b', '1'), ('c', '3')) ieq(expect, actual) ieq(expect, actual) # verify can iterate twice def test_fromxml_url(): # check internet connection try: url = 'http://raw.githubusercontent.com/petl-developers/petl/master/petl/test/resources/test.xml' urlopen(url) import pkg_resources filename = pkg_resources.resource_filename('petl', 'test/resources/test.xml') except Exception as e: pytest.skip('SKIP test_fromxml_url: %s' % e) else: actual = fromxml(url, 'pydev_property', {'name': ( '.', 'name'), 'prop': '.'}) assert nrows(actual) > 0 expect = fromxml(filename, 'pydev_property', {'name': ( '.', 'name'), 'prop': '.'}) ieq(expect, actual) def _write_temp_file(content, out=None): with NamedTemporaryFile(delete=False, mode='wt') as f: f.write(content) res = f.name f.close() if out is not None: outf = sys.stderr if out else sys.stdout print('TEST %s:\n%s' % (res, content), file=outf) return res def _dump_file(filename, out=None): if out is not None: outf = sys.stderr if out else sys.stdout print('FILE:\n%s' % open(filename).read(), file=outf) def _dump_both(expected, actual, out=None): if out is not None: outf = sys.stderr if out else sys.stdout print('EXPECTED:\n', look(expected), file=outf) print('ACTUAL:\n', look(actual), file=outf) def _compare(expected, actual, out=None): try: _dump_both(expected, actual, out) ieq(expected, actual) except Exception as ex: _dump_both(expected, actual, False) raise ex def _write_test_file(data, pre='', pos=''): content = pre + '' + data + pos + '
def test_fromxml_entity(): _DATA1 = """
        <tr><td>foo</td><td>bar</td></tr>
        <tr><td>a</td><td>1</td></tr>
        <tr><td>b</td><td>2</td></tr>
        <tr><td>c</td><td>3</td></tr>
    """ _DATA2 = '<tr><td>X</td><td>9</td></tr>' _DOCTYPE = """<!DOCTYPE table [ <!ENTITY inserted SYSTEM "%s"> ]> """ _INSERTED = '&inserted;' _TABLE1 = (('foo', 'bar'), ('a', '1'), ('b', '2'), ('c', '3')) temp_file1 = _write_test_file(_DATA1) actual11 = fromxml(temp_file1, 'tr', 'td') _compare(_TABLE1, actual11) try: from lxml import etree except: return data_file_tmp = _write_temp_file(_DATA2) doc_type_temp = _DOCTYPE % data_file_tmp doc_type_miss = _DOCTYPE % '/tmp/doesnotexist' _EXPECT_IT = (('X', '9'),) _EXPECT_NO = ((None, None),) temp_file2 = _write_test_file(_DATA1, pre=doc_type_temp, pos=_INSERTED) temp_file3 = _write_test_file(_DATA1, pre=doc_type_miss, pos=_INSERTED) parser_off = etree.XMLParser(resolve_entities=False) parser_onn = etree.XMLParser(resolve_entities=True) actual12 = fromxml(temp_file1, 'tr', 'td', parser=parser_off) _compare(_TABLE1, actual12) actual21 = fromxml(temp_file2, 'tr', 'td') _compare(_TABLE1 + _EXPECT_NO, actual21) actual22 = fromxml(temp_file2, 'tr', 'td', parser=parser_off) _compare(_TABLE1 + _EXPECT_NO, actual22) actual23 = fromxml(temp_file2, 'tr', 'td', parser=parser_onn) _compare(_TABLE1 + _EXPECT_IT, actual23) actual31 = fromxml(temp_file3, 'tr', 'td') _compare(_TABLE1 + _EXPECT_NO, actual31) actual32 = fromxml(temp_file3, 'tr', 'td', parser=parser_off) _compare(_TABLE1 + _EXPECT_NO, actual32) try: actual33 = fromxml(temp_file3, 'tr', 'td', parser=parser_onn) for _ in actual33: pass except etree.XMLSyntaxError: # print('XMLSyntaxError', ex, file=sys.stderr) pass else: assert False, 'Error testing XML' def _check_toxml(table, expected, check=(), dump=None, **kwargs): with NamedTemporaryFile(delete=True, suffix='.xml') as f: filename = f.name toxml(table, filename, **kwargs) _dump_file(filename, dump) if check: try: actual = fromxml(filename, *check) _compare(expected, actual, dump) except Exception as ex: _dump_file(filename, False) raise ex _HEAD1 = (('ABCD', 'N123'),) _BODY1 = (('a', '1'), ('b', '2'), ('c', '3')) _TABLE1 = _HEAD1 + _BODY1 def test_toxml00(): _check_toxml( _TABLE1, _TABLE1, check=('.//tr', ('th', 'td')) ) def test_toxml01(): _check_toxml( _TABLE1, _TABLE1, check=('//tr', ('th', 'td')), root='table', head='thead/tr/th', rows='tbody/tr/td' ) def test_toxml02(): _check_toxml( _TABLE1, _BODY1, check=('.//row', 'col'), root='matrix', rows='row/col' ) def test_toxml03(): _check_toxml( _TABLE1, _BODY1, check=('line', 'cell'), rows='plan/line/cell' ) def test_toxml04(): _check_toxml( _TABLE1, _BODY1, check=('.//line', 'cell'), rows='dir/file/book/plan/line/cell' ) def test_toxml05(): _check_toxml( _TABLE1, _TABLE1, check=('.//x', 'y'), root='a', head='h/k/x/y', rows='r/v/x/y' ) def test_toxml06(): _check_toxml( _TABLE1, _TABLE1, check=('.//row', 'col'), root='table', head='head/row/col', rows='row/col' ) def test_toxml07(): _check_toxml( _TABLE1, _TABLE1, check=('.//field-list', 'field-name'), root='root-tag', head='head-tag/field-list/field-name', rows='body-row/field-list/field-name' ) def test_toxml08(): _check_toxml( _TABLE1, _TABLE1, check=('.//field.list', 'field.name'), root='root.tag', head='head.tag/field.list/field.name', rows='body.row/field.list/field.name' ) def test_toxml09(): _check_toxml( _TABLE1, _BODY1, check=('.//tr/td', '*'), style='name', root='table', rows='tbody/tr/td' ) def test_toxml10(): _check_toxml( _TABLE1, _BODY1, check=('.//tr/td', '*'), style='name', root='table', head='thead/tr/th', rows='tbody/tr/td' ) _ATTRIB_COLS = {'ABCD': ('.', 'ABCD'), 'N123': ('.', 'N123')}
def test_toxml11(): _check_toxml( _TABLE1, _TABLE1, check=('.//tr/td', _ATTRIB_COLS), style='attribute', root='table', rows='tbody/tr/td' ) def test_toxml12(): _check_toxml( _TABLE1, _TABLE1, check=('.//tr/td', _ATTRIB_COLS), style='attribute', root='table', head='thead/tr/th', rows='tbody/tr/td' ) def test_toxml13(): _check_toxml( _TABLE1, _BODY1, check=('.//tr', ('td', 'th')), style='  <tr><td>{ABCD}</td><td>{N123}</td></tr>\n', root='table', rows='tbody' ) def test_toxml131(): _check_toxml( _TABLE1, _TABLE1, check=('.//tr', ('th', 'td')), style='  <tr><td>{ABCD}</td><td>{N123}</td></tr>\n', root='table', head='thead/tr/td', rows='tbody' ) def test_toxml14(): table1 = [['foo', 'bar'], ['a', 1], ['b', 2]] _check_toxml( table1, table1, style='attribute', rows='row/col' ) _check_toxml( table1, table1, style='name', rows='row/col' ) _ROW_A0 = (('A', '0'),) _ROW_Z9 = (('Z', '9'),) _TAB_ABZ = _ROW_A0 + _BODY1 + _ROW_Z9 _TAB_HAZ = _HEAD1 + _ROW_A0 + _BODY1 + _ROW_Z9 _TAG_A0 = '  <row><col>A</col><col>0</col></row>' _TAG_Z9 = '  <row><col>Z</col><col>9</col></row>' _TAG_TOP = '<table>\n' _TAG_END = '\n</table>'
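# NOTE: in the tests below `root`, `head` and `rows` are slash-separated
# element paths from which toxml() builds the output document, while
# `prologue` and `epilogue` are raw markup written verbatim near the start
# and end of the document (they supply the extra A/0 and Z/9 rows asserted
# in the expectations; when no root is given they can supply the root element
# itself). A minimal sketch, assuming a writable 'out.xml':
#
#     toxml(_TABLE1, 'out.xml', root='table',
#           head='thead/tr/th', rows='tbody/tr/td')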
def test_toxml15(): _check_toxml( _TABLE1, _TAB_ABZ, check=('row', 'col'), root='table', rows='row/col', prologue=_TAG_A0, epilogue=_TAG_Z9 ) def test_toxml16(): _check_toxml( _TABLE1, _TAB_HAZ, check=('.//row', 'col'), root='table', head='thead/row/col', rows='tbody/row/col', prologue=_TAG_A0, epilogue=_TAG_Z9 ) def test_toxml17(): _check_toxml( _TABLE1, _TAB_ABZ, check=('row', 'col'), rows='row/col', prologue=_TAG_TOP + _TAG_A0, epilogue=_TAG_Z9 + _TAG_END ) def test_toxml18(): _TAB_AHZ = _ROW_A0 + _HEAD1 + _BODY1 + _ROW_Z9 _check_toxml( _TABLE1, _TAB_AHZ, check=('.//row', 'col'), head='thead/row/col', rows='row/col', prologue=_TAG_TOP + _TAG_A0, epilogue=_TAG_Z9 + _TAG_END ) def test_toxml19(): _check_toxml( _TABLE1, _BODY1, check=('.//line', 'cell'), rows='tbody/line/cell', prologue=_TAG_TOP + _TAG_A0, epilogue=_TAG_Z9 + _TAG_END ) def test_toxml20(): _check_toxml( _TABLE1, _TABLE1, check=('.//line', 'cell'), root='book', head='thead/line/cell', rows='tbody/line/cell', prologue=_TAG_TOP + _TAG_A0, epilogue=_TAG_Z9 + _TAG_END ) def test_toxml21(): _check_toxml( _TABLE1, _TAB_HAZ, check=('//row', 'col'), root='book', head='thead/row/col', rows='tbody/row/col', prologue=_TAG_TOP + _TAG_A0, epilogue=_TAG_Z9 + _TAG_END ) def test_toxml22(): _check_toxml( _TABLE1, _TAB_HAZ, check=('.//tr/td', _ATTRIB_COLS), style='attribute', root='table', rows='tbody/tr/td', # dump=True, prologue='<tr><td ABCD="A" N123="0"/></tr>', epilogue='<tr><td ABCD="Z" N123="9"/></tr>' ) petl-1.7.15/petl/test/resources/000077500000000000000000000000001457414240700165165ustar00rootroot00000000000000petl-1.7.15/petl/test/resources/test.xls000077500000000000000000000520001457414240700202250ustar00rootroot00000000000000[binary legacy Excel workbook payload omitted from this text dump; the fixture provides sheet 'Sheet1' with columns foo/bar as exercised by test_xls.py above]
petl-1.7.15/petl/test/resources/test.xlsx000066400000000000000000000225741457414240700204270ustar00rootroot00000000000000[binary .xlsx workbook payload (zip archive) omitted from this text dump; the fixture provides sheets 'Sheet1' and 'Sheet2' as exercised by test_xlsx.py above]
petl-1.7.15/petl/test/resources/test.xml [tar header unrecoverable in this dump; pydev project XML resource whose markup was stripped, leaving only the property values: Default, python 2.7, /petl/src] petl-1.7.15/petl/test/test_comparison.py000066400000000000000000000110321457414240700202660ustar00rootroot00000000000000from __future__ import print_function, division, absolute_import from datetime import datetime from decimal import Decimal import pytest from petl.test.helpers import eq_, ieq from petl.comparison import Comparable def test_comparable(): # bools d = [True, False] a = sorted(d, key=Comparable) e = [False, True] eq_(e, a) # ints d = [3, 1, 2] a = sorted(d, key=Comparable) e = [1, 2, 3] eq_(e, a) # floats d = [3., 1.2, 2.5] a = sorted(d, key=Comparable) e = [1.2, 2.5, 3.] eq_(e, a) # mixed numeric d = [3., 1, 2.5, Decimal('1.5')] a = sorted(d, key=Comparable) e = [1, Decimal('1.5'), 2.5, 3.] eq_(e, a) # mixed numeric and bool d = [True, False, -1.2, 2, .5] a = sorted(d, key=Comparable) e = [-1.2, False, .5, True, 2] eq_(e, a) # mixed numeric and None d = [3, None, 2.5] a = sorted(d, key=Comparable) e = [None, 2.5, 3.] eq_(e, a) # bytes d = [b'b', b'ccc', b'aa'] a = sorted(d, key=Comparable) e = [b'aa', b'b', b'ccc'] eq_(e, a) # text d = [u'b', u'ccc', u'aa'] a = sorted(d, key=Comparable) e = [u'aa', u'b', u'ccc'] eq_(e, a) # mixed bytes and text d = [u'b', b'ccc', b'aa'] a = sorted(d, key=Comparable) # N.B., expect always bytes < unicode e = [b'aa', b'ccc', u'b'] eq_(e, a) # mixed bytes and None d = [b'b', b'ccc', None, b'aa'] a = sorted(d, key=Comparable) e = [None, b'aa', b'b', b'ccc'] eq_(e, a) # mixed text and None d = [u'b', u'ccc', None, u'aa'] a = sorted(d, key=Comparable) e = [None, u'aa', u'b', u'ccc'] eq_(e, a) # mixed bytes, text and None d = [u'b', b'ccc', None, b'aa'] a = sorted(d, key=Comparable) # N.B., expect always bytes < unicode e = [None, b'aa', b'ccc', u'b'] eq_(e, a) # mixed bytes, text, numbers and None d = [u'b', True, b'ccc', False, None, b'aa', -1, 3.4] a = sorted(d, key=Comparable) e = [None, -1, False, True, 3.4, b'aa', b'ccc', u'b'] eq_(e, a) def test_comparable_datetime(): dt = datetime.now().replace # datetimes d = [dt(hour=12), dt(hour=3), dt(hour=1)] a = sorted(d, key=Comparable) e = [dt(hour=1), dt(hour=3), dt(hour=12)] eq_(e, a) # mixed datetime and None d = [dt(hour=12), None, dt(hour=3), dt(hour=1)] a = sorted(d, key=Comparable) e = [None, dt(hour=1), dt(hour=3), dt(hour=12)] eq_(e, a) # mixed datetime, numbers, bytes, text and None d = [dt(hour=12), None, dt(hour=3), u'b', True, b'ccc', False, b'aa', -1, 3.4] a = sorted(d, key=Comparable) # N.B., because bytes and unicode type names have changed in PY3, # petl uses PY2 type names to try and achieve consistent behaviour across # versions, i.e., 'datetime' < 'str' < 'unicode' rather than 'bytes' < # 'datetime' < 'str' e = [None, -1, False, True, 3.4, dt(hour=3), dt(hour=12), b'aa', b'ccc', u'b'] eq_(e, a) def test_comparable_nested(): # lists d = [[3], [1], [2]] a = sorted(d, key=Comparable) e = [[1], [2], [3]] eq_(e, a)
# tuples d = [(3,), (1,), (2,)] a = sorted(d, key=Comparable) e = [(1,), (2,), (3,)] eq_(e, a) # mixed lists and numeric d = [3, 1, [2]] a = sorted(d, key=Comparable) e = [1, 3, [2]] eq_(e, a) # lists containing None d = [[3], [None], [2]] a = sorted(d, key=Comparable) e = [[None], [2], [3]] eq_(e, a) # mixed lists and tuples d = [[3], [1], (2,)] a = sorted(d, key=Comparable) e = [[1], (2,), [3]] eq_(e, a) # length 2 lists d = [[3, 2], [3, 1], [2]] a = sorted(d, key=Comparable) e = [[2], [3, 1], [3, 2]] eq_(e, a) dt = datetime.now().replace # mixed everything d = [dt(hour=12), None, (dt(hour=3), 'b'), True, [b'aa', False], (b'aa', -1), 3.4] a = sorted(d, key=Comparable) e = [None, True, 3.4, dt(hour=12), (dt(hour=3), 'b'), (b'aa', -1), [b'aa', False]] eq_(e, a) def test_comparable_ieq_table(): rows = [[u'Bob', 42, 33], [u'Jim', 13, 69], [u'Joe', 86, 17], [u'Ted', 23, 51]] ieq(rows, rows) def test_comparable_ieq_rows(): rows = [['a', 'b', 'c'], [1, 2]] ieq(rows, rows) def test_comparable_ieq_missing(): x = ['a', 'b', 'c'] y = ['a', 'b'] with pytest.raises(AssertionError): ieq(x, y) with pytest.raises(AssertionError): ieq(y, x) petl-1.7.15/petl/test/test_fluent.py000066400000000000000000000056271457414240700174240ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division from tempfile import NamedTemporaryFile import csv from petl.compat import PY2 import petl as etl from petl.test.helpers import ieq, eq_ def test_basics(): t1 = (('foo', 'bar'), ('A', 1), ('B', 2)) w1 = etl.wrap(t1) eq_(('foo', 'bar'), w1.header()) eq_(etl.header(w1), w1.header()) ieq((('A', 1), ('B', 2)), w1.data()) ieq(etl.data(w1), w1.data()) w2 = w1.cut('bar', 'foo') expect2 = (('bar', 'foo'), (1, 'A'), (2, 'B')) ieq(expect2, w2) ieq(etl.cut(w1, 'bar', 'foo'), w2) w3 = w1.cut('bar', 'foo').cut('foo', 'bar') ieq(t1, w3) def test_staticmethods(): data = [b'foo,bar', b'a,1', b'b,2', b'c,2'] f = NamedTemporaryFile(mode='wb', delete=False) f.write(b'\n'.join(data)) f.close() expect = (('foo', 'bar'), ('a', '1'), ('b', '2'), ('c', '2')) actual = etl.fromcsv(f.name, encoding='ascii') ieq(expect, actual) ieq(expect, actual) # verify can iterate twice def test_container(): table = (('foo', 'bar'), ('a', 1), ('b', 2), ('c', 2)) actual = etl.wrap(table)[0] expect = ('foo', 'bar') eq_(expect, actual) actual = etl.wrap(table)['bar'] expect = (1, 2, 2) ieq(expect, actual) actual = len(etl.wrap(table)) expect = 4 eq_(expect, actual) def test_values_container_convenience_methods(): table = etl.wrap((('foo', 'bar'), ('a', 1), ('b', 2), ('c', 2))) actual = table.values('foo').set() expect = {'a', 'b', 'c'} eq_(expect, actual) actual = table.values('foo').list() expect = ['a', 'b', 'c'] eq_(expect, actual) actual = table.values('foo').tuple() expect = ('a', 'b', 'c') eq_(expect, actual) actual = table.values('bar').sum() expect = 5 eq_(expect, actual) actual = table.data().dict() expect = {'a': 1, 'b': 2, 'c': 2} eq_(expect, actual) def test_empty(): actual = ( etl .empty() .addcolumn('foo', ['a', 'b', 'c']) .addcolumn('bar', [1, 2, 2]) ) expect = (('foo', 'bar'), ('a', 1), ('b', 2), ('c', 2)) ieq(expect, actual) ieq(expect, actual) def test_wrap_tuple_return(): tablea = etl.wrap((('foo', 'bar'), ('A', 1), ('C', 7))) tableb = etl.wrap((('foo', 'bar'), ('B', 5), ('C', 7))) added, removed = tablea.diff(tableb) eq_(('foo', 'bar'), added.header()) eq_(('foo', 'bar'), removed.header()) ieq(etl.data(added), added.data()) ieq(etl.data(removed), removed.data()) 
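# NOTE: the tests above exercise petl's fluent style: etl.wrap() returns a
# table wrapper whose transformation functions are available as chainable
# methods rather than module-level calls. A minimal sketch:
#
#     tbl = etl.wrap([('foo', 'bar'), ('A', 1), ('B', 2)])
#     result = tbl.cut('foo').head(1)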
petl-1.7.15/petl/test/test_helpers.py000066400000000000000000000023451457414240700175630ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division import pytest from petl.test.helpers import eq_, ieq, get_env_vars_named GET_ENV_PREFIX = "PETL_TEST_HELPER_ENVVAR_" def _testcase_get_env_vars_named(num_vals, prefix=""): res = {} for i in range(1, num_vals, 1): reskey = prefix + str(i) res[reskey] = str(i) return res @pytest.fixture() def setup_helpers_get_env_vars_named(monkeypatch): varlist = _testcase_get_env_vars_named(3, prefix=GET_ENV_PREFIX) for k, v in varlist.items(): monkeypatch.setenv(k, v) def test_helper_get_env_vars_named_prefixed(setup_helpers_get_env_vars_named): expected = _testcase_get_env_vars_named(3, GET_ENV_PREFIX) found = get_env_vars_named(GET_ENV_PREFIX, remove_prefix=False) ieq(found, expected) def test_helper_get_env_vars_named_unprefixed(setup_helpers_get_env_vars_named): expected = _testcase_get_env_vars_named(3) found = get_env_vars_named(GET_ENV_PREFIX, remove_prefix=True) ieq(found, expected) def test_helper_get_env_vars_named_not_found(setup_helpers_get_env_vars_named): expected = None found = get_env_vars_named("PETL_TEST_HELPER_ENVVAR_NOT_FOUND_") eq_(found, expected) petl-1.7.15/petl/test/test_interactive.py000066400000000000000000000032271457414240700204360ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division import petl as etl from petl.test.helpers import eq_ def test_repr(): table = (('foo', 'bar'), ('a', 1), ('b', 2), ('c', 2)) expect = str(etl.look(table)) actual = repr(etl.wrap(table)) eq_(expect, actual) def test_str(): table = (('foo', 'bar'), ('a', 1), ('b', 2), ('c', 2)) expect = str(etl.look(table, vrepr=str)) actual = str(etl.wrap(table)) eq_(expect, actual) def test_repr_html(): table = (('foo', 'bar'), ('a', 1), ('b', 2), ('c', 2)) expect = """
<table class='petl'>
<thead>
<tr>
<th>foo</th>
<th>bar</th>
</tr>
</thead>
<tbody>
<tr>
<td>a</td>
<td style='text-align: right'>1</td>
</tr>
<tr>
<td>b</td>
<td style='text-align: right'>2</td>
</tr>
<tr>
<td>c</td>
<td style='text-align: right'>2</td>
</tr>
</tbody>
</table>
""" actual = etl.wrap(table)._repr_html_() for l1, l2 in zip(expect.split('\n'), actual.split('\n')): eq_(l1, l2) def test_repr_html_limit(): table = (('foo', 'bar'), ('a', 1), ('b', 2), ('c', 2)) # lower limit etl.config.display_limit = 2 expect = """
<table class='petl'>
<thead>
<tr>
<th>foo</th>
<th>bar</th>
</tr>
</thead>
<tbody>
<tr>
<td>a</td>
<td style='text-align: right'>1</td>
</tr>
<tr>
<td>b</td>
<td style='text-align: right'>2</td>
</tr>
</tbody>
</table>
<p><strong>...</strong></p>
""" actual = etl.wrap(table)._repr_html_() print(actual) for l1, l2 in zip(expect.split('\n'), actual.split('\n')): eq_(l1, l2) petl-1.7.15/petl/test/transform/000077500000000000000000000000001457414240700165175ustar00rootroot00000000000000petl-1.7.15/petl/test/transform/__init__.py000066400000000000000000000001011457414240700206200ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division petl-1.7.15/petl/test/transform/test_basics.py000066400000000000000000000524151457414240700214030ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division import pytest from petl.errors import FieldSelectionError from petl.test.helpers import ieq from petl.util import expr, empty, coalesce from petl.transform.basics import cut, cat, addfield, rowslice, head, tail, \ cutout, skipcomments, annex, addrownumbers, addcolumn, \ addfieldusingcontext, movefield, stack, addfields def test_cut(): table = (('foo', 'bar', 'baz'), ('A', 1, 2), ('B', '2', '3.4'), (u'B', u'3', u'7.8', True), ('D', 'xyz', 9.0), ('E', None)) cut1 = cut(table, 'foo') expectation = (('foo',), ('A',), ('B',), (u'B',), ('D',), ('E',)) ieq(expectation, cut1) cut2 = cut(table, 'foo', 'baz') expectation = (('foo', 'baz'), ('A', 2), ('B', '3.4'), (u'B', u'7.8'), ('D', 9.0), ('E', None)) ieq(expectation, cut2) cut3 = cut(table, 0, 2) expectation = (('foo', 'baz'), ('A', 2), ('B', '3.4'), (u'B', u'7.8'), ('D', 9.0), ('E', None)) ieq(expectation, cut3) cut4 = cut(table, 'bar', 0) expectation = (('bar', 'foo'), (1, 'A'), ('2', 'B'), (u'3', u'B'), ('xyz', 'D'), (None, 'E')) ieq(expectation, cut4) cut5 = cut(table, ('foo', 'baz')) expectation = (('foo', 'baz'), ('A', 2), ('B', '3.4'), (u'B', u'7.8'), ('D', 9.0), ('E', None)) ieq(expectation, cut5) def test_cut_empty(): table = (('foo', 'bar'),) expect = (('bar',),) actual = cut(table, 'bar') ieq(expect, actual) def test_cut_headerless(): table = () with pytest.raises(FieldSelectionError): for i in cut(table, 'bar'): pass def test_cutout(): table = (('foo', 'bar', 'baz'), ('A', 1, 2), ('B', '2', '3.4'), (u'B', u'3', u'7.8', True), ('D', 'xyz', 9.0), ('E', None)) cut1 = cutout(table, 'bar', 'baz') expectation = (('foo',), ('A',), ('B',), (u'B',), ('D',), ('E',)) ieq(expectation, cut1) cut2 = cutout(table, 'bar') expectation = (('foo', 'baz'), ('A', 2), ('B', '3.4'), (u'B', u'7.8'), ('D', 9.0), ('E', None)) ieq(expectation, cut2) cut3 = cutout(table, 1) expectation = (('foo', 'baz'), ('A', 2), ('B', '3.4'), (u'B', u'7.8'), ('D', 9.0), ('E', None)) ieq(expectation, cut3) def test_cutout_headerless(): table = () with pytest.raises(FieldSelectionError): for i in cutout(table, 'bar'): pass def test_cat(): table1 = (('foo', 'bar'), (1, 'A'), (2, 'B')) table2 = (('bar', 'baz'), ('C', True), ('D', False)) cat1 = cat(table1, table2, missing=None) expectation = (('foo', 'bar', 'baz'), (1, 'A', None), (2, 'B', None), (None, 'C', True), (None, 'D', False)) ieq(expectation, cat1) # how does cat cope with uneven rows? table3 = (('foo', 'bar', 'baz'), ('A', 1, 2), ('B', '2', '3.4'), (u'B', u'3', u'7.8', True), ('D', 'xyz', 9.0), ('E', None)) cat3 = cat(table3, missing=None) expectation = (('foo', 'bar', 'baz'), ('A', 1, 2), ('B', '2', '3.4'), (u'B', u'3', u'7.8'), ('D', 'xyz', 9.0), ('E', None, None)) ieq(expectation, cat3) # cat more than two tables? 
cat4 = cat(table1, table2, table3) expectation = (('foo', 'bar', 'baz'), (1, 'A', None), (2, 'B', None), (None, 'C', True), (None, 'D', False), ('A', 1, 2), ('B', '2', '3.4'), (u'B', u'3', u'7.8'), ('D', 'xyz', 9.0), ('E', None, None)) ieq(expectation, cat4) def test_cat_with_header(): table1 = (('bar', 'foo'), ('A', 1), ('B', 2)) table2 = (('bar', 'baz'), ('C', True), ('D', False)) actual = cat(table1, header=['A', 'foo', 'B', 'bar', 'C']) expect = (('A', 'foo', 'B', 'bar', 'C'), (None, 1, None, 'A', None), (None, 2, None, 'B', None)) ieq(expect, actual) ieq(expect, actual) actual = cat(table1, table2, header=['A', 'foo', 'B', 'bar', 'C']) expect = (('A', 'foo', 'B', 'bar', 'C'), (None, 1, None, 'A', None), (None, 2, None, 'B', None), (None, None, None, 'C', None), (None, None, None, 'D', None)) ieq(expect, actual) ieq(expect, actual) def test_cat_empty(): table1 = (('foo', 'bar'), (1, 'A'), (2, 'B')) table2 = (('bar', 'baz'),) expect = (('foo', 'bar', 'baz'), (1, 'A', None), (2, 'B', None)) actual = cat(table1, table2) ieq(expect, actual) def test_cat_headerless(): table1 = (('foo', 'bar'), (1, 'A'), (2, 'B')) table2 = () expect = table1 # basically does nothing actual = cat(table1, table2) ieq(expect, actual) def test_cat_dupfields(): table1 = (('foo', 'foo'), (1, 'A'), (2,), (3, 'B', True)) # these cases are pathological, including to confirm expected behaviour, # but user needs to rename fields to get something sensible actual = cat(table1) expect = (('foo', 'foo'), (1, 1), (2, 2), (3, 3)) ieq(expect, actual) table2 = (('foo', 'foo', 'bar'), (4, 'C', True), (5, 'D', False)) actual = cat(table1, table2) expect = (('foo', 'foo', 'bar'), (1, 1, None), (2, 2, None), (3, 3, None), (4, 4, True), (5, 5, False)) ieq(expect, actual) def test_stack_dupfields(): table1 = (('foo', 'foo'), (1, 'A'), (2,), (3, 'B', True)) actual = stack(table1) expect = (('foo', 'foo'), (1, 'A'), (2, None), (3, 'B')) ieq(expect, actual) table2 = (('foo', 'foo', 'bar'), (4, 'C', True), (5, 'D', False)) actual = stack(table1, table2) expect = (('foo', 'foo'), (1, 'A'), (2, None), (3, 'B'), (4, 'C'), (5, 'D')) ieq(expect, actual) def test_stack_headerless(): table1 = (('foo', 'bar'), (1, 'A'), (2, 'B')) table2 = () expect = table1 # basically does nothing actual = stack(table1, table2) ieq(expect, actual) def test_addfield(): table = (('foo', 'bar'), ('M', 12), ('F', 34), ('-', 56)) result = addfield(table, 'baz', 42) expectation = (('foo', 'bar', 'baz'), ('M', 12, 42), ('F', 34, 42), ('-', 56, 42)) ieq(expectation, result) ieq(expectation, result) result = addfield(table, 'baz', lambda row: '%s,%s' % (row.foo, row.bar)) expectation = (('foo', 'bar', 'baz'), ('M', 12, 'M,12'), ('F', 34, 'F,34'), ('-', 56, '-,56')) ieq(expectation, result) ieq(expectation, result) result = addfield(table, 'baz', lambda rec: rec['bar'] * 2) expectation = (('foo', 'bar', 'baz'), ('M', 12, 24), ('F', 34, 68), ('-', 56, 112)) ieq(expectation, result) ieq(expectation, result) result = addfield(table, 'baz', expr('{bar} * 2')) expectation = (('foo', 'bar', 'baz'), ('M', 12, 24), ('F', 34, 68), ('-', 56, 112)) ieq(expectation, result) ieq(expectation, result) result = addfield(table, 'baz', 42, index=0) expectation = (('baz', 'foo', 'bar'), (42, 'M', 12), (42, 'F', 34), (42, '-', 56)) ieq(expectation, result) ieq(expectation, result) def test_addfield_empty(): table = (('foo', 'bar'),) expect = (('foo', 'bar', 'baz'),) actual = addfield(table, 'baz', 42) ieq(expect, actual) ieq(expect, actual) def test_addfield_headerless(): """When adding a 
field to a headerless table, implicitly add a header.""" table = () expect = (('foo',),) actual = addfield(table, 'foo', 1) ieq(expect, actual) ieq(expect, actual) def test_addfield_coalesce(): table = (('foo', 'bar', 'baz', 'quux'), ('M', 12, 23, 44), ('F', None, 23, 11), ('-', None, None, 42)) result = addfield(table, 'spong', coalesce('bar', 'baz', 'quux')) expect = (('foo', 'bar', 'baz', 'quux', 'spong'), ('M', 12, 23, 44, 12), ('F', None, 23, 11, 23), ('-', None, None, 42, 42)) ieq(expect, result) ieq(expect, result) result = addfield(table, 'spong', coalesce(1, 2, 3)) expect = (('foo', 'bar', 'baz', 'quux', 'spong'), ('M', 12, 23, 44, 12), ('F', None, 23, 11, 23), ('-', None, None, 42, 42)) ieq(expect, result) ieq(expect, result) def test_addfield_uneven_rows(): table = (('foo', 'bar'), ('M',), ('F', 34), ('-', 56, 'spong')) result = addfield(table, 'baz', 42) expectation = (('foo', 'bar', 'baz'), ('M', None, 42), ('F', 34, 42), ('-', 56, 42)) ieq(expectation, result) ieq(expectation, result) def test_addfield_dupfield(): table = (('foo', 'foo'), ('M', 12), ('F', 34), ('-', 56)) result = addfield(table, 'bar', 42) expectation = (('foo', 'foo', 'bar'), ('M', 12, 42), ('F', 34, 42), ('-', 56, 42)) ieq(expectation, result) ieq(expectation, result) def test_addfields(): table = (('foo', 'bar'), ('M', 12), ('F', 34), ('-', 56)) result = addfields(table, [('baz', 42), ('qux', lambda row: '%s,%s' % (row.foo, row.bar)), ('fiz', lambda rec: rec['bar'] * 2, 0)]) expectation = (('fiz', 'foo', 'bar', 'baz', 'qux'), (24, 'M', 12, 42, 'M,12'), (68, 'F', 34, 42, 'F,34'), (112, '-', 56, 42, '-,56')) ieq(expectation, result) ieq(expectation, result) def test_addfields_uneven_rows(): table = (('foo', 'bar'), ('M',), ('F', 34), ('-', 56, 'spong')) result = addfields(table, [('baz', 42), ('qux', 100), ('qux', 200)]) expectation = (('foo', 'bar', 'baz', 'qux', 'qux'), ('M', None, 42, 100, 200), ('F', 34, 42, 100, 200), ('-', 56, 42, 100, 200)) ieq(expectation, result) ieq(expectation, result) result = addfields(table, [('baz', 42), ('qux', 100, 0), ('qux', 200, 0)]) expectation = (('qux', 'qux', 'foo', 'bar', 'baz'), (200, 100, 'M', None, 42), (200, 100, 'F', 34, 42), (200, 100, '-', 56, 42)) ieq(expectation, result) ieq(expectation, result) def test_rowslice(): table = (('foo', 'bar', 'baz'), ('A', 1, 2), ('B', '2', '3.4'), (u'B', u'3', u'7.8', True), ('D', 'xyz', 9.0), ('E', None)) result = rowslice(table, 2) expectation = (('foo', 'bar', 'baz'), ('A', 1, 2), ('B', '2', '3.4')) ieq(expectation, result) result = rowslice(table, 1, 2) expectation = (('foo', 'bar', 'baz'), ('B', '2', '3.4')) ieq(expectation, result) result = rowslice(table, 1, 5, 2) expectation = (('foo', 'bar', 'baz'), ('B', '2', '3.4'), ('D', 'xyz', 9.0)) ieq(expectation, result) def test_rowslice_empty(): table = (('foo', 'bar'),) expect = (('foo', 'bar'),) actual = rowslice(table, 1, 2) ieq(expect, actual) def test_rowslice_headerless(): table = () expect = () actual = rowslice(table, 1, 2) ieq(expect, actual) def test_head(): table1 = (('foo', 'bar'), ('a', 1), ('b', 2), ('c', 5), ('d', 7), ('f', 42), ('f', 3), ('h', 90), ('k', 12), ('l', 77), ('q', 2)) table2 = head(table1, 4) expect = (('foo', 'bar'), ('a', 1), ('b', 2), ('c', 5), ('d', 7)) ieq(expect, table2) def test_head_raises_stop_iteration_for_empty_table(): table = iter(head([])) with pytest.raises(StopIteration): next(table) # header def test_head_raises_stop_iteration_for_header_only(): table1 = (('foo', 'bar', 'baz'),) table = iter(head(table1)) next(table) # header with 
pytest.raises(StopIteration): next(table) def test_tail(): table1 = (('foo', 'bar'), ('a', 1), ('b', 2), ('c', 5), ('d', 7), ('f', 42), ('f', 3), ('h', 90), ('k', 12), ('l', 77), ('q', 2)) table2 = tail(table1, 4) expect = (('foo', 'bar'), ('h', 90), ('k', 12), ('l', 77), ('q', 2)) ieq(expect, table2) def test_tail_empty(): table = (('foo', 'bar'),) expect = (('foo', 'bar'),) actual = tail(table) ieq(expect, actual) def test_tail_headerless(): table = () expect = () actual = tail(table) ieq(expect, actual) def test_skipcomments(): table1 = (('##aaa', 'bbb', 'ccc'), ('##mmm',), ('#foo', 'bar'), ('##nnn', 1), ('a', 1), ('b', 2)) table2 = skipcomments(table1, '##') expect2 = (('#foo', 'bar'), ('a', 1), ('b', 2)) ieq(expect2, table2) ieq(expect2, table2) # can iterate twice? def test_skipcomments_empty(): table1 = (('##aaa', 'bbb', 'ccc'), ('##mmm',), ('#foo', 'bar'), ('##nnn', 1)) table2 = skipcomments(table1, '##') expect2 = (('#foo', 'bar'),) ieq(expect2, table2) def test_annex(): table1 = (('foo', 'bar'), ('A', 9), ('C', 2), ('F', 1)) table2 = (('foo', 'baz'), ('B', 3), ('D', 10)) expect = (('foo', 'bar', 'foo', 'baz'), ('A', 9, 'B', 3), ('C', 2, 'D', 10), ('F', 1, None, None)) actual = annex(table1, table2) ieq(expect, actual) ieq(expect, actual) expect21 = (('foo', 'baz', 'foo', 'bar'), ('B', 3, 'A', 9), ('D', 10, 'C', 2), (None, None, 'F', 1)) actual21 = annex(table2, table1) ieq(expect21, actual21) ieq(expect21, actual21) def test_annex_uneven_rows(): table1 = (('foo', 'bar'), ('A', 9, True), ('C', 2), ('F',)) table2 = (('foo', 'baz'), ('B', 3), ('D', 10)) expect = (('foo', 'bar', 'foo', 'baz'), ('A', 9, 'B', 3), ('C', 2, 'D', 10), ('F', None, None, None)) actual = annex(table1, table2) ieq(expect, actual) ieq(expect, actual) def test_annex_headerless(): table1 = (('foo', 'bar'), ('C', 2)) table2 = () # does nothing expect = table1 actual = annex(table1, table2) ieq(expect, actual) ieq(expect, actual) def test_addrownumbers(): table1 = (('foo', 'bar'), ('A', 9), ('C', 2), ('F', 1)) expect = (('row', 'foo', 'bar'), (1, 'A', 9), (2, 'C', 2), (3, 'F', 1)) actual = addrownumbers(table1) ieq(expect, actual) ieq(expect, actual) def test_addrownumbers_field_name(): table1 = (('foo', 'bar'), ('A', 9), ('C', 2)) expect = (('id', 'foo', 'bar'), (1, 'A', 9), (2, 'C', 2)) actual = addrownumbers(table1, field='id') ieq(expect, actual) ieq(expect, actual) def test_addrownumbers_headerless(): """Adds a column row if there is none.""" table = () expect = (('id',),) actual = addrownumbers(table, field='id') ieq(expect, actual) ieq(expect, actual) def test_addcolumn(): table1 = (('foo', 'bar'), ('A', 1), ('B', 2)) col = [True, False] expect2 = (('foo', 'bar', 'baz'), ('A', 1, True), ('B', 2, False)) table2 = addcolumn(table1, 'baz', col) ieq(expect2, table2) ieq(expect2, table2) # test short column table3 = (('foo', 'bar'), ('A', 1), ('B', 2), ('C', 2)) expect4 = (('foo', 'bar', 'baz'), ('A', 1, True), ('B', 2, False), ('C', 2, None)) table4 = addcolumn(table3, 'baz', col) ieq(expect4, table4) # test short table col = [True, False, False] expect5 = (('foo', 'bar', 'baz'), ('A', 1, True), ('B', 2, False), (None, None, False)) table5 = addcolumn(table1, 'baz', col) ieq(expect5, table5) def test_empty_addcolumn(): table1 = empty() table2 = addcolumn(table1, 'foo', ['A', 'B']) table3 = addcolumn(table2, 'bar', [1, 2]) expect = (('foo', 'bar'), ('A', 1), ('B', 2)) ieq(expect, table3) ieq(expect, table3) def test_addcolumn_headerless(): """Adds a header row if none exists.""" table1 = () expect = (('foo',), 
('A',), ('B',)) actual = addcolumn(table1, 'foo', ['A', 'B']) ieq(expect, actual) ieq(expect, actual) def test_addfieldusingcontext(): table1 = (('foo', 'bar'), ('A', 1), ('B', 4), ('C', 5), ('D', 9)) expect = (('foo', 'bar', 'baz', 'quux'), ('A', 1, None, 3), ('B', 4, 3, 1), ('C', 5, 1, 4), ('D', 9, 4, None)) def upstream(prv, cur, nxt): if prv is None: return None else: return cur.bar - prv.bar def downstream(prv, cur, nxt): if nxt is None: return None else: return nxt.bar - cur.bar table2 = addfieldusingcontext(table1, 'baz', upstream) table3 = addfieldusingcontext(table2, 'quux', downstream) ieq(expect, table3) ieq(expect, table3) def test_addfieldusingcontext_stateful(): table1 = (('foo', 'bar'), ('A', 1), ('B', 4), ('C', 5), ('D', 9)) expect = (('foo', 'bar', 'baz', 'quux'), ('A', 1, 1, 5), ('B', 4, 5, 10), ('C', 5, 10, 19), ('D', 9, 19, 19)) def upstream(prv, cur, nxt): if prv is None: return cur.bar else: return cur.bar + prv.baz def downstream(prv, cur, nxt): if nxt is None: return prv.quux elif prv is None: return nxt.bar + cur.bar else: return nxt.bar + prv.quux table2 = addfieldusingcontext(table1, 'baz', upstream) table3 = addfieldusingcontext(table2, 'quux', downstream) ieq(expect, table3) ieq(expect, table3) def test_addfieldusingcontext_empty(): table = empty() expect = (('foo',),) def query(prv, cur, nxt): return 0 actual = addfieldusingcontext(table, 'foo', query) ieq(expect, actual) ieq(expect, actual) def test_addfieldusingcontext_headerless(): table = () expect = (('foo',),) def query(prv, cur, nxt): return 0 actual = addfieldusingcontext(table, 'foo', query) ieq(expect, actual) ieq(expect, actual) def test_movefield(): table1 = (('foo', 'bar', 'baz'), (1, 'A', True), (2, 'B', False)) expect = (('bar', 'foo', 'baz'), ('A', 1, True), ('B', 2, False)) actual = movefield(table1, 'bar', 0) ieq(expect, actual) ieq(expect, actual) actual = movefield(table1, 'foo', 1) ieq(expect, actual) ieq(expect, actual) petl-1.7.15/petl/test/transform/test_conversions.py000066400000000000000000000256601457414240700225110ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division import pytest from petl.errors import FieldSelectionError from petl.test.failonerror import assert_failonerror from petl.test.helpers import ieq from petl.transform.conversions import convert, convertall, convertnumbers, \ replace, update, format, interpolate from functools import partial def test_convert(): table1 = (('foo', 'bar', 'baz'), ('A', 1, 2), ('B', '2', '3.4'), (u'B', u'3', u'7.8', True), ('D', 'xyz', 9.0), ('E', None)) # test the simplest style - single field, lambda function table2 = convert(table1, 'foo', lambda s: s.lower()) expect2 = (('foo', 'bar', 'baz'), ('a', 1, 2), ('b', '2', '3.4'), (u'b', u'3', u'7.8', True), ('d', 'xyz', 9.0), ('e', None)) ieq(expect2, table2) ieq(expect2, table2) # test single field with method call table3 = convert(table1, 'foo', 'lower') expect3 = expect2 ieq(expect3, table3) # test single field with method call with arguments table4 = convert(table1, 'foo', 'replace', 'B', 'BB') expect4 = (('foo', 'bar', 'baz'), ('A', 1, 2), ('BB', '2', '3.4'), (u'BB', u'3', u'7.8', True), ('D', 'xyz', 9.0), ('E', None)) ieq(expect4, table4) # test multiple fields with the same conversion table5 = convert(table1, ('bar', 'baz'), str) expect5 = (('foo', 'bar', 'baz'), ('A', '1', '2'), ('B', '2', '3.4'), (u'B', u'3', u'7.8', True), ('D', 'xyz', '9.0'), ('E', 'None')) ieq(expect5, table5) # test convert with dictionary table6 = convert(table1, 'foo', {'A': 
'Z', 'B': 'Y'}) expect6 = (('foo', 'bar', 'baz'), ('Z', 1, 2), ('Y', '2', '3.4'), (u'Y', u'3', u'7.8', True), ('D', 'xyz', 9.0), ('E', None)) ieq(expect6, table6) def test_convert_empty(): table = (('foo', 'bar'),) expect = (('foo', 'bar'),) actual = convert(table, 'foo', int) ieq(expect, actual) def test_convert_headerless(): table = () with pytest.raises(FieldSelectionError): for i in convert(table, 'foo', int): pass def test_convert_headerless_no_conversions(): table = expect = () actual = convert(table) ieq(expect, actual) def test_convert_indexes(): table1 = (('foo', 'bar', 'baz'), ('A', 1, 2), ('B', '2', '3.4'), (u'B', u'3', u'7.8', True), ('D', 'xyz', 9.0), ('E', None)) # test the simplest style - single field, lambda function table2 = convert(table1, 0, lambda s: s.lower()) expect2 = (('foo', 'bar', 'baz'), ('a', 1, 2), ('b', '2', '3.4'), (u'b', u'3', u'7.8', True), ('d', 'xyz', 9.0), ('e', None)) ieq(expect2, table2) ieq(expect2, table2) # test single field with method call table3 = convert(table1, 0, 'lower') expect3 = expect2 ieq(expect3, table3) # test single field with method call with arguments table4 = convert(table1, 0, 'replace', 'B', 'BB') expect4 = (('foo', 'bar', 'baz'), ('A', 1, 2), ('BB', '2', '3.4'), (u'BB', u'3', u'7.8', True), ('D', 'xyz', 9.0), ('E', None)) ieq(expect4, table4) # test multiple fields with the same conversion table5a = convert(table1, (1, 2), str) table5b = convert(table1, (1, 'baz'), str) table5c = convert(table1, ('bar', 2), str) table5d = convert(table1, list(range(1, 3)), str) expect5 = (('foo', 'bar', 'baz'), ('A', '1', '2'), ('B', '2', '3.4'), (u'B', u'3', u'7.8', True), ('D', 'xyz', '9.0'), ('E', 'None')) ieq(expect5, table5a) ieq(expect5, table5b) ieq(expect5, table5c) ieq(expect5, table5d) # test convert with dictionary table6 = convert(table1, 0, {'A': 'Z', 'B': 'Y'}) expect6 = (('foo', 'bar', 'baz'), ('Z', 1, 2), ('Y', '2', '3.4'), (u'Y', u'3', u'7.8', True), ('D', 'xyz', 9.0), ('E', None)) ieq(expect6, table6) def test_fieldconvert(): table1 = (('foo', 'bar', 'baz'), ('A', 1, 2), ('B', '2', '3.4'), (u'B', u'3', u'7.8', True), ('D', 'xyz', 9.0), ('E', None)) # test the style where the converters functions are passed in as a # dictionary converters = {'foo': str, 'bar': int, 'baz': float} table5 = convert(table1, converters, errorvalue='error') expect5 = (('foo', 'bar', 'baz'), ('A', 1, 2.0), ('B', 2, 3.4), ('B', 3, 7.8, True), # N.B., long rows are preserved ('D', 'error', 9.0), ('E', 'error')) # N.B., short rows are preserved ieq(expect5, table5) # test the style where the converters functions are added one at a time table6 = convert(table1, errorvalue='err') table6['foo'] = str table6['bar'] = int table6['baz'] = float expect6 = (('foo', 'bar', 'baz'), ('A', 1, 2.0), ('B', 2, 3.4), ('B', 3, 7.8, True), ('D', 'err', 9.0), ('E', 'err')) ieq(expect6, table6) # test some different converters table7 = convert(table1) table7['foo'] = 'replace', 'B', 'BB' expect7 = (('foo', 'bar', 'baz'), ('A', 1, 2), ('BB', '2', '3.4'), (u'BB', u'3', u'7.8', True), ('D', 'xyz', 9.0), ('E', None)) ieq(expect7, table7) # test the style where the converters functions are passed in as a list converters = [str, int, float] table8 = convert(table1, converters, errorvalue='error') expect8 = (('foo', 'bar', 'baz'), ('A', 1, 2.0), ('B', 2, 3.4), ('B', 3, 7.8, True), # N.B., long rows are preserved ('D', 'error', 9.0), ('E', 'error')) # N.B., short rows are preserved ieq(expect8, table8) # test the style where the converters functions are passed in as a list 
converters = [str, None, float] table9 = convert(table1, converters, errorvalue='error') expect9 = (('foo', 'bar', 'baz'), ('A', 1, 2.0), ('B', '2', 3.4), ('B', u'3', 7.8, True), # N.B., long rows are preserved ('D', 'xyz', 9.0), ('E', None)) # N.B., short rows are preserved ieq(expect9, table9) def test_convertall(): table1 = (('foo', 'bar', 'baz'), ('1', '3', '9'), ('2', '1', '7')) table2 = convertall(table1, int) expect2 = (('foo', 'bar', 'baz'), (1, 3, 9), (2, 1, 7)) ieq(expect2, table2) ieq(expect2, table2) # test with non-string field names table1 = (('foo', 3, 4), (2, 2, 2)) table2 = convertall(table1, lambda x: x**2) expect = (('foo', 3, 4), (4, 4, 4)) ieq(expect, table2) def test_convertnumbers(): table1 = (('foo', 'bar', 'baz', 'quux'), ('1', '3.0', '9+3j', 'aaa'), ('2', '1.3', '7+2j', None)) table2 = convertnumbers(table1) expect2 = (('foo', 'bar', 'baz', 'quux'), (1, 3.0, 9+3j, 'aaa'), (2, 1.3, 7+2j, None)) ieq(expect2, table2) ieq(expect2, table2) def test_convert_translate(): table = (('foo', 'bar'), ('M', 12), ('F', 34), ('-', 56)) trans = {'M': 'male', 'F': 'female'} result = convert(table, 'foo', trans) expectation = (('foo', 'bar'), ('male', 12), ('female', 34), ('-', 56)) ieq(expectation, result) def test_convert_with_row(): table = (('foo', 'bar'), ('a', 1), ('b', 2)) expect = (('foo', 'bar'), ('a', 'A'), ('b', 'B')) actual = convert(table, 'bar', lambda v, row: row.foo.upper(), pass_row=True) ieq(expect, actual) def test_convert_with_row_backwards_compat(): table = (('foo', 'bar'), (' a ', 1), (' b ', 2)) expect = (('foo', 'bar'), ('a', 1), ('b', 2)) actual = convert(table, 'foo', 'strip') ieq(expect, actual) def test_convert_where(): tbl1 = (('foo', 'bar'), ('a', 1), ('b', 2)) expect = (('foo', 'bar'), ('a', 1), ('b', 4)) actual = convert(tbl1, 'bar', lambda v: v*2, where=lambda r: r.foo == 'b') ieq(expect, actual) ieq(expect, actual) actual = convert(tbl1, 'bar', lambda v: v*2, where="{foo} == 'b'") ieq(expect, actual) ieq(expect, actual) def test_convert_failonerror(): input_ = (('foo',), ('A',), (1,)) cvt_ = {'foo': 'lower'} expect_ = (('foo',), ('a',), (None,)) assert_failonerror( input_fn=partial(convert, input_, cvt_), expected_output=expect_) def test_replace_where(): tbl1 = (('foo', 'bar'), ('a', 1), ('b', 2)) expect = (('foo', 'bar'), ('a', 1), ('b', 4)) actual = replace(tbl1, 'bar', 2, 4, where=lambda r: r.foo == 'b') ieq(expect, actual) ieq(expect, actual) actual = replace(tbl1, 'bar', 2, 4, where="{foo} == 'b'") ieq(expect, actual) ieq(expect, actual) def test_update(): table1 = (('foo', 'bar', 'baz'), ('A', 1, 2), ('B', '2', '3.4'), (u'B', u'3', u'7.8', True), ('D', 'xyz', 9.0), ('E', None)) table2 = update(table1, 'foo', 'X') expect2 = (('foo', 'bar', 'baz'), ('X', 1, 2), ('X', '2', '3.4'), ('X', u'3', u'7.8', True), ('X', 'xyz', 9.0), ('X', None)) ieq(expect2, table2) ieq(expect2, table2) def test_replace_unhashable(): table1 = (('foo', 'bar'), ('a', ['b']), ('c', None)) expect = (('foo', 'bar'), ('a', ['b']), ('c', [])) actual = replace(table1, 'bar', None, []) ieq(expect, actual) def test_format(): table = (('foo', 'bar'), ('a', 1), ('b', 2)) expect = (('foo', 'bar'), ('a', '01'), ('b', '02')) actual = format(table, 'bar', '{0:02d}') ieq(expect, actual) ieq(expect, actual) def test_interpolate(): table = (('foo', 'bar'), ('a', 1), ('b', 2)) expect = (('foo', 'bar'), ('a', '01'), ('b', '02')) actual = interpolate(table, 'bar', '%02d') ieq(expect, actual) ieq(expect, actual) 
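# A small additional sketch, not part of the original suite, assuming only the
# convert() behaviour exercised above: convert() calls can be chained, since
# each call returns a new table view rather than modifying its input.
def test_convert_chained_sketch():
    table = (('foo', 'bar'),
             ('a', '1'),
             ('b', '2'))
    # first parse 'bar' as int, then upper-case 'foo' via the method-call style
    actual = convert(convert(table, 'bar', int), 'foo', 'upper')
    expect = (('foo', 'bar'),
              ('A', 1),
              ('B', 2))
    ieq(expect, actual)
    ieq(expect, actual)  # check can iterate twice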
petl-1.7.15/petl/test/transform/test_dedup.py000066400000000000000000000165451457414240700212400ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division import pytest from petl.errors import FieldSelectionError from petl.test.helpers import ieq from petl.transform.dedup import duplicates, unique, conflicts, distinct, \ isunique def test_duplicates(): table = (('foo', 'bar', 'baz'), ('A', 1, 2), ('B', '2', '3.4'), ('D', 'xyz', 9.0), ('B', u'3', u'7.8', True), ('B', '2', 42), ('E', None), ('D', 4, 12.3)) result = duplicates(table, 'foo') expectation = (('foo', 'bar', 'baz'), ('B', '2', '3.4'), ('B', u'3', u'7.8', True), ('B', '2', 42), ('D', 'xyz', 9.0), ('D', 4, 12.3)) ieq(expectation, result) # test with compound key result = duplicates(table, key=('foo', 'bar')) expectation = (('foo', 'bar', 'baz'), ('B', '2', '3.4'), ('B', '2', 42)) ieq(expectation, result) def test_duplicates_headerless_no_keys(): """Finding duplicates in a headerless table without specifying a key shouldn't be a problem. """ table = [] actual = duplicates(table) expect = [] ieq(expect, actual) def test_duplicates_headerless_explicit(): table = [] with pytest.raises(FieldSelectionError): for i in duplicates(table, 'foo'): pass def test_duplicates_empty(): table = (('foo', 'bar'),) expect = (('foo', 'bar'),) actual = duplicates(table, key='foo') ieq(expect, actual) def test_duplicates_wholerow(): table = (('foo', 'bar', 'baz'), ('A', 1, 2), ('B', '2', '3.4'), ('B', '2', '3.4'), ('D', 4, 12.3)) result = duplicates(table) expectation = (('foo', 'bar', 'baz'), ('B', '2', '3.4'), ('B', '2', '3.4')) ieq(expectation, result) def test_unique(): table = (('foo', 'bar', 'baz'), ('A', 1, 2), ('B', '2', '3.4'), ('D', 'xyz', 9.0), ('B', u'3', u'7.8', True), ('B', '2', 42), ('E', None), ('D', 4, 12.3), ('F', 7, 2.3)) result = unique(table, 'foo') expectation = (('foo', 'bar', 'baz'), ('A', 1, 2), ('E', None), ('F', 7, 2.3)) ieq(expectation, result) ieq(expectation, result) # test with compound key result = unique(table, key=('foo', 'bar')) expectation = (('foo', 'bar', 'baz'), ('A', 1, 2), ('B', u'3', u'7.8', True), ('D', 4, 12.3), ('D', 'xyz', 9.0), ('E', None), ('F', 7, 2.3)) ieq(expectation, result) ieq(expectation, result) def test_unique_empty(): table = (('foo', 'bar'),) expect = (('foo', 'bar'),) actual = unique(table, key='foo') ieq(expect, actual) def test_unique_wholerow(): table = (('foo', 'bar', 'baz'), ('A', 1, 2), ('B', '2', '3.4'), ('B', '2', '3.4'), ('D', 4, 12.3)) result = unique(table) expectation = (('foo', 'bar', 'baz'), ('A', 1, 2), ('D', 4, 12.3)) ieq(expectation, result) def test_conflicts(): table = (('foo', 'bar', 'baz'), ('A', 1, 2), ('B', '2', None), ('D', 'xyz', 9.4), ('B', None, u'7.8', True), ('E', None), ('D', 'xyz', 12.3), ('A', 2, None)) result = conflicts(table, 'foo', missing=None) expectation = (('foo', 'bar', 'baz'), ('A', 1, 2), ('A', 2, None), ('D', 'xyz', 9.4), ('D', 'xyz', 12.3)) ieq(expectation, result) ieq(expectation, result) result = conflicts(table, 'foo', missing=None, exclude='baz') expectation = (('foo', 'bar', 'baz'), ('A', 1, 2), ('A', 2, None)) ieq(expectation, result) ieq(expectation, result) result = conflicts(table, 'foo', missing=None, exclude=('bar', 'baz')) expectation = (('foo', 'bar', 'baz'),) ieq(expectation, result) ieq(expectation, result) result = conflicts(table, 'foo', missing=None, include='bar') expectation = (('foo', 'bar', 'baz'), ('A', 1, 2), ('A', 2, None)) ieq(expectation, 
result) ieq(expectation, result) result = conflicts(table, 'foo', missing=None, include=('bar', 'baz')) expectation = (('foo', 'bar', 'baz'), ('A', 1, 2), ('A', 2, None), ('D', 'xyz', 9.4), ('D', 'xyz', 12.3)) ieq(expectation, result) ieq(expectation, result) def test_conflicts_empty(): table = (('foo', 'bar'),) expect = (('foo', 'bar'),) actual = conflicts(table, key='foo') ieq(expect, actual) def test_distinct(): table = (('foo', 'bar', 'baz'), ('A', 1, 2), ('B', '2', '3.4'), ('B', '2', '3.4'), ('D', 4, 12.3), (None, None, None)) result = distinct(table) expect = (('foo', 'bar', 'baz'), (None, None, None), ('A', 1, 2), ('B', '2', '3.4'), ('D', 4, 12.3)) ieq(expect, result) def test_distinct_count(): table = (('foo', 'bar', 'baz'), (None, None, None), ('A', 1, 2), ('B', '2', '3.4'), ('B', '2', '3.4'), ('D', 4, 12.3)) result = distinct(table, count='count') expect = (('foo', 'bar', 'baz', 'count'), (None, None, None, 1), ('A', 1, 2, 1), ('B', '2', '3.4', 2), ('D', 4, 12.3, 1)) ieq(expect, result) def test_key_distinct(): table = (('foo', 'bar', 'baz'), (None, None, None), ('A', 1, 2), ('B', '2', '3.4'), ('B', '2', '5'), ('D', 4, 12.3)) result = distinct(table, key='foo') expect = (('foo', 'bar', 'baz'), (None, None, None), ('A', 1, 2), ('B', '2', '3.4'), ('D', 4, 12.3)) ieq(expect, result) def test_key_distinct_2(): # test for https://github.com/alimanfoo/petl/issues/318 tbl = (('a', 'b'), ('x', '1'), ('x', '3'), ('y', '1'), (None, None)) result = distinct(tbl, key='b') expect = (('a', 'b'), (None, None), ('x', '1'), ('x', '3')) ieq(expect, result) def test_key_distinct_count(): table = (('foo', 'bar', 'baz'), ('A', 1, 2), ('B', '2', '3.4'), ('B', '2', '5'), ('D', 4, 12.3), (None, None, None)) result = distinct(table, key='foo', count='count') expect = (('foo', 'bar', 'baz', 'count'), (None, None, None, 1), ('A', 1, 2, 1), ('B', '2', '3.4', 2), ('D', 4, 12.3, 1)) ieq(expect, result) def test_isunique(): table = (('foo', 'bar'), ('a', 1), ('b',), ('b', 2), ('c', 3, True)) assert not isunique(table, 'foo') assert isunique(table, 'bar') petl-1.7.15/petl/test/transform/test_fills.py000066400000000000000000000056331457414240700212500ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division from petl.test.helpers import ieq from petl.transform.fills import filldown, fillleft, fillright def test_filldown(): table = (('foo', 'bar', 'baz'), (1, 'a', None), (1, None, .23), (1, 'b', None), (2, None, None), (2, None, .56), (2, 'c', None), (None, 'c', .72)) actual = filldown(table) expect = (('foo', 'bar', 'baz'), (1, 'a', None), (1, 'a', .23), (1, 'b', .23), (2, 'b', .23), (2, 'b', .56), (2, 'c', .56), (2, 'c', .72)) ieq(expect, actual) ieq(expect, actual) actual = filldown(table, 'bar') expect = (('foo', 'bar', 'baz'), (1, 'a', None), (1, 'a', .23), (1, 'b', None), (2, 'b', None), (2, 'b', .56), (2, 'c', None), (None, 'c', .72)) ieq(expect, actual) ieq(expect, actual) actual = filldown(table, 'foo', 'bar') expect = (('foo', 'bar', 'baz'), (1, 'a', None), (1, 'a', .23), (1, 'b', None), (2, 'b', None), (2, 'b', .56), (2, 'c', None), (2, 'c', .72)) ieq(expect, actual) ieq(expect, actual) def test_filldown_headerless(): table = [] actual = filldown(table, 'foo') expect = [] ieq(expect, actual) def test_fillright(): table = (('foo', 'bar', 'baz'), (1, 'a', None), (1, None, .23), (1, 'b', None), (2, None, None), (2, None, .56), (2, 'c', None), (None, 'c', .72)) actual = fillright(table) expect = (('foo', 'bar', 'baz'), (1, 'a', 'a'), (1, 1, .23), (1, 'b', 'b'), (2, 2, 
2), (2, 2, .56), (2, 'c', 'c'), (None, 'c', .72)) ieq(expect, actual) ieq(expect, actual) def test_fillright_headerless(): table = [] actual = fillright(table, 'foo') expect = [] ieq(expect, actual) def test_fillleft(): table = (('foo', 'bar', 'baz'), (1, 'a', None), (1, None, .23), (1, 'b', None), (2, None, None), (None, None, .56), (2, 'c', None), (None, 'c', .72)) actual = fillleft(table) expect = (('foo', 'bar', 'baz'), (1, 'a', None), (1, .23, .23), (1, 'b', None), (2, None, None), (.56, .56, .56), (2, 'c', None), ('c', 'c', .72)) ieq(expect, actual) ieq(expect, actual) def test_fillleft_headerless(): table = [] actual = fillleft(table, 'foo') expect = [] ieq(expect, actual) petl-1.7.15/petl/test/transform/test_headers.py000066400000000000000000000177151457414240700215560ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division import pytest from petl.test.helpers import ieq from petl.errors import FieldSelectionError from petl.util import fieldnames from petl.transform.headers import setheader, extendheader, pushheader, skip, \ rename, prefixheader, suffixheader, sortheader def test_setheader(): table1 = (('foo', 'bar'), ('a', 1), ('b', 2)) table2 = setheader(table1, ['foofoo', 'barbar']) expect2 = (('foofoo', 'barbar'), ('a', 1), ('b', 2)) ieq(expect2, table2) ieq(expect2, table2) # can iterate twice? def test_setheader_empty(): table1 = (('foo', 'bar'),) table2 = setheader(table1, ['foofoo', 'barbar']) expect2 = (('foofoo', 'barbar'),) ieq(expect2, table2) def test_setheader_headerless(): table = [] actual = setheader(table, ['foo', 'bar']) expect = [('foo', 'bar')] ieq(expect, actual) def test_extendheader(): table1 = (('foo',), ('a', 1, True), ('b', 2, False)) table2 = extendheader(table1, ['bar', 'baz']) expect2 = (('foo', 'bar', 'baz'), ('a', 1, True), ('b', 2, False)) ieq(expect2, table2) ieq(expect2, table2) # can iterate twice? def test_extendheader_empty(): table1 = (('foo',),) table2 = extendheader(table1, ['bar', 'baz']) expect2 = (('foo', 'bar', 'baz'),) ieq(expect2, table2) def test_extendheader_headerless(): table = [] actual = extendheader(table, ['foo', 'bar']) expect = [('foo', 'bar')] ieq(expect, actual) ieq(expect, actual) def test_pushheader(): table1 = (('a', 1), ('b', 2)) table2 = pushheader(table1, ['foo', 'bar']) expect2 = (('foo', 'bar'), ('a', 1), ('b', 2)) ieq(expect2, table2) ieq(expect2, table2) # can iterate twice? def test_pushheader_empty(): table1 = (('a', 1),) table2 = pushheader(table1, ['foo', 'bar']) expect2 = (('foo', 'bar'), ('a', 1)) ieq(expect2, table2) table1 = tuple() table2 = pushheader(table1, ['foo', 'bar']) expect2 = (('foo', 'bar'),) ieq(expect2, table2) table1 = tuple() table2 = pushheader(table1, 'foo', 'bar') expect2 = (('foo', 'bar'),) ieq(expect2, table2) def test_pushheader_headerless(): table = [] actual = pushheader(table, ['foo', 'bar']) expect = [('foo', 'bar')] ieq(expect, actual) ieq(expect, actual) def test_pushheader_positional(): table1 = (('a', 1), ('b', 2)) # positional arguments instead of list table2 = pushheader(table1, 'foo', 'bar') expect2 = (('foo', 'bar'), ('a', 1), ('b', 2)) ieq(expect2, table2) ieq(expect2, table2) # can iterate twice? 
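# N.B., in the cases below the header is prepended exactly as given; data rows keep their original length even when the header has fewer or more fields than the rows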
# test with many fields table1 = (('a', 1, 11, 111, 1111), ('b', 2, 22, 222, 2222)) # positional arguments instead of list table2 = pushheader(table1, 'foo', 'bar', 'foo1', 'foo2', 'foo3') expect2 = (('foo', 'bar', 'foo1', 'foo2', 'foo3'), ('a', 1, 11, 111, 1111), ('b', 2, 22, 222, 2222)) ieq(expect2, table2) ieq(expect2, table2) # can iterate twice? # test with too few fields in header table1 = (('a', 1, 11, 111, 1111), ('b', 2, 22, 222, 2222)) # positional arguments instead of list table2 = pushheader(table1, 'foo', 'bar', 'foo1', 'foo2') expect2 = (('foo', 'bar', 'foo1', 'foo2'), ('a', 1, 11, 111, 1111), ('b', 2, 22, 222, 2222)) ieq(expect2, table2) ieq(expect2, table2) # can iterate twice? # test with too many fields in header table1 = (('a', 1, 11, 111, 1111), ('b', 2, 22, 222, 2222)) # positional arguments instead of list table2 = pushheader(table1, 'foo', 'bar', 'foo1', 'foo2', 'foo3', 'foo4') expect2 = (('foo', 'bar', 'foo1', 'foo2', 'foo3', 'foo4'), ('a', 1, 11, 111, 1111), ('b', 2, 22, 222, 2222)) ieq(expect2, table2) ieq(expect2, table2) # can iterate twice? def test_skip(): table1 = (('#aaa', 'bbb', 'ccc'), ('#mmm',), ('foo', 'bar'), ('a', 1), ('b', 2)) table2 = skip(table1, 2) expect2 = (('foo', 'bar'), ('a', 1), ('b', 2)) ieq(expect2, table2) ieq(expect2, table2) # can iterate twice? def test_skip_empty(): table1 = (('#aaa', 'bbb', 'ccc'), ('#mmm',), ('foo', 'bar')) table2 = skip(table1, 2) expect2 = (('foo', 'bar'),) ieq(expect2, table2) def test_skip_headerless(): table = [] actual = skip(table, 2) expect = [] ieq(expect, actual) def test_rename(): table = (('foo', 'bar'), ('M', 12), ('F', 34), ('-', 56)) result = rename(table, 'foo', 'foofoo') assert fieldnames(result) == ('foofoo', 'bar') result = rename(table, 0, 'foofoo') assert fieldnames(result) == ('foofoo', 'bar') result = rename(table, {'foo': 'foofoo', 'bar': 'barbar'}) assert fieldnames(result) == ('foofoo', 'barbar') result = rename(table, {0: 'baz', 1: 'quux'}) assert fieldnames(result) == ('baz', 'quux') result = rename(table) result['foo'] = 'spong' assert fieldnames(result) == ('spong', 'bar') def test_rename_strict(): table = (('foo', 'bar'), ('M', 12), ('F', 34), ('-', 56)) result = rename(table, 'baz', 'quux') try: fieldnames(result) except FieldSelectionError: pass else: assert False, 'exception expected' result = rename(table, 2, 'quux') try: fieldnames(result) except FieldSelectionError: pass else: assert False, 'exception expected' result = rename(table, 'baz', 'quux', strict=False) assert fieldnames(result) == ('foo', 'bar') result = rename(table, 2, 'quux', strict=False) assert fieldnames(result) == ('foo', 'bar') def test_rename_empty(): table = (('foo', 'bar'),) expect = (('foofoo', 'bar'),) actual = rename(table, 'foo', 'foofoo') ieq(expect, actual) def test_rename_headerless(): table = [] with pytest.raises(FieldSelectionError): for i in rename(table, 'foo', 'foofoo'): pass def test_prefixheader(): table1 = (('foo', 'bar'), (1, 'A'), (2, 'B')) expect = (('pre_foo', 'pre_bar'), (1, 'A'), (2, 'B')) actual = prefixheader(table1, 'pre_') ieq(expect, actual) ieq(expect, actual) def test_prefixheader_headerless(): table = [] actual = prefixheader(table, 'pre_') expect = [] ieq(expect, actual) def test_suffixheader(): table1 = (('foo', 'bar'), (1, 'A'), (2, 'B')) expect = (('foo_suf', 'bar_suf'), (1, 'A'), (2, 'B')) actual = suffixheader(table1, '_suf') ieq(expect, actual) ieq(expect, actual) def test_suffixheader_headerless(): table = [] actual = suffixheader(table, '_suf') expect = [] ieq(expect, 
actual) def test_sortheaders(): table1 = ( ('id', 'foo', 'bar', 'baz'), ('a', 1, 2, 3), ('b', 4, 5, 6)) expect = ( ('bar', 'baz', 'foo', 'id'), (2, 3, 1, 'a'), (5, 6, 4, 'b'), ) actual = sortheader(table1) ieq(expect, actual) def test_sortheaders_duplicate_headers(): """ Failing test case provided in sortheader() with duplicate column names overlays values #392 """ table1 = ( ('id', 'foo', 'foo', 'foo'), ('a', 1, 2, 3), ('b', 4, 5, 6)) expect = ( ('foo', 'foo', 'foo', 'id'), (1, 2, 3, 'a'), (4, 5, 6, 'b'), ) actual = sortheader(table1) ieq(expect, actual) def test_sortheader_headerless(): table = [] actual = sortheader(table) expect = [] ieq(expect, actual) petl-1.7.15/petl/test/transform/test_intervals.py000066400000000000000000000777171457414240700221620ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division import logging import pytest import petl as etl from petl.test.helpers import ieq, eq_ from petl.util.vis import lookall from petl.errors import DuplicateKeyError from petl.transform.intervals import intervallookup, intervallookupone, \ facetintervallookup, facetintervallookupone, intervaljoin, \ intervalleftjoin, intervaljoinvalues, intervalsubtract, \ collapsedintervals, _Interval, intervalantijoin logger = logging.getLogger(__name__) debug = logger.debug try: # noinspection PyUnresolvedReferences import intervaltree except ImportError as e: pytest.skip('SKIP interval tests: %s' % e, allow_module_level=True) else: def test_intervallookup(): table = (('start', 'stop', 'value'), (1, 4, 'foo'), (3, 7, 'bar'), (4, 9, 'baz')) lkp = intervallookup(table, 'start', 'stop') actual = lkp.search(0, 1) expect = [] eq_(expect, actual) actual = lkp.search(1, 2) expect = [(1, 4, 'foo')] eq_(expect, actual) actual = lkp.search(2, 4) expect = [(1, 4, 'foo'), (3, 7, 'bar')] eq_(expect, actual) actual = lkp.search(2, 5) expect = [(1, 4, 'foo'), (3, 7, 'bar'), (4, 9, 'baz')] eq_(expect, actual) actual = lkp.search(9, 14) expect = [] eq_(expect, actual) actual = lkp.search(19, 140) expect = [] eq_(expect, actual) actual = lkp.search(1) expect = [(1, 4, 'foo')] eq_(expect, actual) actual = lkp.search(2) expect = [(1, 4, 'foo')] eq_(expect, actual) actual = lkp.search(4) expect = [(3, 7, 'bar'), (4, 9, 'baz')] eq_(expect, actual) actual = lkp.search(5) expect = [(3, 7, 'bar'), (4, 9, 'baz')] eq_(expect, actual) def test_intervallookup_include_stop(): table = (('start', 'stop', 'value'), (1, 4, 'foo'), (3, 7, 'bar'), (4, 9, None)) lkp = intervallookup(table, 'start', 'stop', value='value', include_stop=True) actual = lkp.search(0, 1) expect = ['foo'] eq_(expect, actual) actual = lkp.search(1, 2) expect = ['foo'] eq_(expect, actual) actual = lkp.search(2, 4) expect = ['foo', 'bar', None] eq_(expect, actual) actual = lkp.search(2, 5) expect = ['foo', 'bar', None] eq_(expect, actual) actual = lkp.search(9, 14) expect = [None] eq_(expect, actual) actual = lkp.search(19, 140) expect = [] eq_(expect, actual) actual = lkp.search(1) expect = ['foo'] eq_(expect, actual) actual = lkp.search(2) expect = ['foo'] eq_(expect, actual) actual = lkp.search(4) expect = ['foo', 'bar', None] eq_(expect, actual) actual = lkp.search(5) expect = ['bar', None] eq_(expect, actual) def test_intervallookupone(): table = (('start', 'stop', 'value'), (1, 4, 'foo'), (3, 7, 'bar'), (4, 9, 'baz')) lkp = intervallookupone(table, 'start', 'stop', value='value') actual = lkp.search(0, 1) expect = None eq_(expect, actual) actual = lkp.search(1, 2) expect = 'foo' eq_(expect, actual) try: lkp.search(2, 4) 
except DuplicateKeyError: pass else: assert False, 'expected error' try: lkp.search(2, 5) except DuplicateKeyError: pass else: assert False, 'expected error' try: lkp.search(4, 5) except DuplicateKeyError: pass else: assert False, 'expected error' try: lkp.search(5, 7) except DuplicateKeyError: pass else: assert False, 'expected error' actual = lkp.search(8, 9) expect = 'baz' eq_(expect, actual) actual = lkp.search(9, 14) expect = None eq_(expect, actual) actual = lkp.search(19, 140) expect = None eq_(expect, actual) actual = lkp.search(0) expect = None eq_(expect, actual) actual = lkp.search(1) expect = 'foo' eq_(expect, actual) actual = lkp.search(2) expect = 'foo' eq_(expect, actual) try: lkp.search(4) except DuplicateKeyError: pass else: assert False, 'expected error' try: lkp.search(5) except DuplicateKeyError: pass else: assert False, 'expected error' actual = lkp.search(8) expect = 'baz' eq_(expect, actual) def test_intervallookupone_not_strict(): table = (('start', 'stop', 'value'), (1, 4, 'foo'), (3, 7, 'bar'), (4, 9, 'baz')) lkp = intervallookupone(table, 'start', 'stop', value='value', strict=False) actual = lkp.search(0, 1) expect = None eq_(expect, actual) actual = lkp.search(1, 2) expect = 'foo' eq_(expect, actual) actual = lkp.search(2, 4) expect = 'foo' eq_(expect, actual) actual = lkp.search(2, 5) expect = 'foo' eq_(expect, actual) actual = lkp.search(4, 5) expect = 'bar' eq_(expect, actual) actual = lkp.search(5, 7) expect = 'bar' eq_(expect, actual) actual = lkp.search(8, 9) expect = 'baz' eq_(expect, actual) actual = lkp.search(9, 14) expect = None eq_(expect, actual) actual = lkp.search(19, 140) expect = None eq_(expect, actual) actual = lkp.search(0) expect = None eq_(expect, actual) actual = lkp.search(1) expect = 'foo' eq_(expect, actual) actual = lkp.search(2) expect = 'foo' eq_(expect, actual) actual = lkp.search(4) expect = 'bar' eq_(expect, actual) actual = lkp.search(5) expect = 'bar' eq_(expect, actual) actual = lkp.search(8) expect = 'baz' eq_(expect, actual) def test_facetintervallookup(): table = (('type', 'start', 'stop', 'value'), ('apple', 1, 4, 'foo'), ('apple', 3, 7, 'bar'), ('orange', 4, 9, 'baz')) lkp = facetintervallookup(table, key='type', start='start', stop='stop') actual = lkp['apple'].search(0, 1) expect = [] eq_(expect, actual) actual = lkp['apple'].search(1, 2) expect = [('apple', 1, 4, 'foo')] eq_(expect, actual) actual = lkp['apple'].search(2, 4) expect = [('apple', 1, 4, 'foo'), ('apple', 3, 7, 'bar')] eq_(expect, actual) actual = lkp['apple'].search(2, 5) expect = [('apple', 1, 4, 'foo'), ('apple', 3, 7, 'bar')] eq_(expect, actual) actual = lkp['orange'].search(2, 5) expect = [('orange', 4, 9, 'baz')] eq_(expect, actual) actual = lkp['orange'].search(9, 14) expect = [] eq_(expect, actual) actual = lkp['orange'].search(19, 140) expect = [] eq_(expect, actual) actual = lkp['apple'].search(0) expect = [] eq_(expect, actual) actual = lkp['apple'].search(1) expect = [('apple', 1, 4, 'foo')] eq_(expect, actual) actual = lkp['apple'].search(2) expect = [('apple', 1, 4, 'foo')] eq_(expect, actual) actual = lkp['apple'].search(4) expect = [('apple', 3, 7, 'bar')] eq_(expect, actual) actual = lkp['apple'].search(5) expect = [('apple', 3, 7, 'bar')] eq_(expect, actual) actual = lkp['orange'].search(5) expect = [('orange', 4, 9, 'baz')] eq_(expect, actual) def test_facetintervallookupone(): table = (('type', 'start', 'stop', 'value'), ('apple', 1, 4, 'foo'), ('apple', 3, 7, 'bar'), ('orange', 4, 9, 'baz')) lkp = facetintervallookupone(table, 
key='type', start='start', stop='stop', value='value') actual = lkp['apple'].search(0, 1) expect = None eq_(expect, actual) actual = lkp['apple'].search(1, 2) expect = 'foo' eq_(expect, actual) try: lkp['apple'].search(2, 4) except DuplicateKeyError: pass else: assert False, 'expected error' try: lkp['apple'].search(2, 5) except DuplicateKeyError: pass else: assert False, 'expected error' actual = lkp['apple'].search(4, 5) expect = 'bar' eq_(expect, actual) actual = lkp['orange'].search(4, 5) expect = 'baz' eq_(expect, actual) actual = lkp['apple'].search(5, 7) expect = 'bar' eq_(expect, actual) actual = lkp['orange'].search(5, 7) expect = 'baz' eq_(expect, actual) actual = lkp['apple'].search(8, 9) expect = None eq_(expect, actual) actual = lkp['orange'].search(8, 9) expect = 'baz' eq_(expect, actual) actual = lkp['orange'].search(9, 14) expect = None eq_(expect, actual) actual = lkp['orange'].search(19, 140) expect = None eq_(expect, actual) actual = lkp['apple'].search(0) expect = None eq_(expect, actual) actual = lkp['apple'].search(1) expect = 'foo' eq_(expect, actual) actual = lkp['apple'].search(2) expect = 'foo' eq_(expect, actual) actual = lkp['apple'].search(4) expect = 'bar' eq_(expect, actual) actual = lkp['apple'].search(5) expect = 'bar' eq_(expect, actual) actual = lkp['orange'].search(5) expect = 'baz' eq_(expect, actual) actual = lkp['apple'].search(8) expect = None eq_(expect, actual) actual = lkp['orange'].search(8) expect = 'baz' eq_(expect, actual) def test_facetintervallookup_compound(): table = (('type', 'variety', 'start', 'stop', 'value'), ('apple', 'cox', 1, 4, 'foo'), ('apple', 'fuji', 3, 7, 'bar'), ('orange', 'mandarin', 4, 9, 'baz')) lkp = facetintervallookup(table, key=('type', 'variety'), start='start', stop='stop') actual = lkp['apple', 'cox'].search(1, 2) expect = [('apple', 'cox', 1, 4, 'foo')] eq_(expect, actual) actual = lkp['apple', 'cox'].search(2, 4) expect = [('apple', 'cox', 1, 4, 'foo')] eq_(expect, actual) def test_intervaljoin(): left = (('begin', 'end', 'quux'), (1, 2, 'a'), (2, 4, 'b'), (2, 5, 'c'), (9, 14, 'd'), (9, 140, 'e'), (1, 1, 'f'), (2, 2, 'g'), (4, 4, 'h'), (5, 5, 'i'), (1, 8, 'j')) right = (('start', 'stop', 'value'), (1, 4, 'foo'), (3, 7, 'bar'), (4, 9, 'baz')) actual = intervaljoin(left, right, lstart='begin', lstop='end', rstart='start', rstop='stop') expect = (('begin', 'end', 'quux', 'start', 'stop', 'value'), (1, 2, 'a', 1, 4, 'foo'), (2, 4, 'b', 1, 4, 'foo'), (2, 4, 'b', 3, 7, 'bar'), (2, 5, 'c', 1, 4, 'foo'), (2, 5, 'c', 3, 7, 'bar'), (2, 5, 'c', 4, 9, 'baz'), (1, 8, 'j', 1, 4, 'foo'), (1, 8, 'j', 3, 7, 'bar'), (1, 8, 'j', 4, 9, 'baz')) debug(lookall(actual)) ieq(expect, actual) ieq(expect, actual) def test_intervaljoin_include_stop(): left = (('begin', 'end', 'quux'), (1, 2, 'a'), (2, 4, 'b'), (2, 5, 'c'), (9, 14, 'd'), (9, 140, 'e'), (1, 1, 'f'), (2, 2, 'g'), (4, 4, 'h'), (5, 5, 'i'), (1, 8, 'j')) right = (('start', 'stop', 'value'), (1, 4, 'foo'), (3, 7, 'bar'), (4, 9, 'baz')) actual = intervaljoin(left, right, lstart='begin', lstop='end', rstart='start', rstop='stop', include_stop=True) expect = (('begin', 'end', 'quux', 'start', 'stop', 'value'), (1, 2, 'a', 1, 4, 'foo'), (2, 4, 'b', 1, 4, 'foo'), (2, 4, 'b', 3, 7, 'bar'), (2, 4, 'b', 4, 9, 'baz'), (2, 5, 'c', 1, 4, 'foo'), (2, 5, 'c', 3, 7, 'bar'), (2, 5, 'c', 4, 9, 'baz'), (9, 14, 'd', 4, 9, 'baz'), (9, 140, 'e', 4, 9, 'baz'), (1, 1, 'f', 1, 4, 'foo'), (2, 2, 'g', 1, 4, 'foo'), (4, 4, 'h', 1, 4, 'foo'), (4, 4, 'h', 3, 7, 'bar'), (4, 4, 'h', 4, 9, 'baz'), (5, 5, 'i', 3, 
7, 'bar'), (5, 5, 'i', 4, 9, 'baz'), (1, 8, 'j', 1, 4, 'foo'), (1, 8, 'j', 3, 7, 'bar'), (1, 8, 'j', 4, 9, 'baz')) ieq(expect, actual) ieq(expect, actual) def test_intervaljoin_prefixes(): left = (('begin', 'end', 'quux'), (1, 2, 'a'), (2, 4, 'b'), (2, 5, 'c'), (9, 14, 'd'), (9, 140, 'e'), (1, 1, 'f'), (2, 2, 'g'), (4, 4, 'h'), (5, 5, 'i'), (1, 8, 'j')) right = (('start', 'stop', 'value'), (1, 4, 'foo'), (3, 7, 'bar'), (4, 9, 'baz')) actual = intervaljoin(left, right, lstart='begin', lstop='end', rstart='start', rstop='stop', lprefix='l_', rprefix='r_') expect = (('l_begin', 'l_end', 'l_quux', 'r_start', 'r_stop', 'r_value'), (1, 2, 'a', 1, 4, 'foo'), (2, 4, 'b', 1, 4, 'foo'), (2, 4, 'b', 3, 7, 'bar'), (2, 5, 'c', 1, 4, 'foo'), (2, 5, 'c', 3, 7, 'bar'), (2, 5, 'c', 4, 9, 'baz'), (1, 8, 'j', 1, 4, 'foo'), (1, 8, 'j', 3, 7, 'bar'), (1, 8, 'j', 4, 9, 'baz')) ieq(expect, actual) ieq(expect, actual) def test_intervalleftjoin(): left = (('begin', 'end', 'quux'), (1, 2, 'a'), (2, 4, 'b'), (2, 5, 'c'), (9, 14, 'd'), (9, 140, 'e'), (1, 1, 'f'), (2, 2, 'g'), (4, 4, 'h'), (5, 5, 'i'), (1, 8, 'j')) right = (('start', 'stop', 'value'), (1, 4, 'foo'), (3, 7, 'bar'), (4, 9, 'baz')) actual = intervalleftjoin(left, right, lstart='begin', lstop='end', rstart='start', rstop='stop') expect = (('begin', 'end', 'quux', 'start', 'stop', 'value'), (1, 2, 'a', 1, 4, 'foo'), (2, 4, 'b', 1, 4, 'foo'), (2, 4, 'b', 3, 7, 'bar'), (2, 5, 'c', 1, 4, 'foo'), (2, 5, 'c', 3, 7, 'bar'), (2, 5, 'c', 4, 9, 'baz'), (9, 14, 'd', None, None, None), (9, 140, 'e', None, None, None), (1, 1, 'f', None, None, None), (2, 2, 'g', None, None, None), (4, 4, 'h', None, None, None), (5, 5, 'i', None, None, None), (1, 8, 'j', 1, 4, 'foo'), (1, 8, 'j', 3, 7, 'bar'), (1, 8, 'j', 4, 9, 'baz')) ieq(expect, actual) ieq(expect, actual) def test_intervaljoin_faceted(): left = (('fruit', 'begin', 'end'), ('apple', 1, 2), ('apple', 2, 4), ('apple', 2, 5), ('orange', 2, 5), ('orange', 9, 14), ('orange', 19, 140), ('apple', 1, 1), ('apple', 2, 2), ('apple', 4, 4), ('apple', 5, 5), ('orange', 5, 5)) right = (('type', 'start', 'stop', 'value'), ('apple', 1, 4, 'foo'), ('apple', 3, 7, 'bar'), ('orange', 4, 9, 'baz')) expect = (('fruit', 'begin', 'end', 'type', 'start', 'stop', 'value'), ('apple', 1, 2, 'apple', 1, 4, 'foo'), ('apple', 2, 4, 'apple', 1, 4, 'foo'), ('apple', 2, 4, 'apple', 3, 7, 'bar'), ('apple', 2, 5, 'apple', 1, 4, 'foo'), ('apple', 2, 5, 'apple', 3, 7, 'bar'), ('orange', 2, 5, 'orange', 4, 9, 'baz')) actual = intervaljoin(left, right, lstart='begin', lstop='end', rstart='start', rstop='stop', lkey='fruit', rkey='type') ieq(expect, actual) ieq(expect, actual) def test_intervalleftjoin_faceted(): left = (('fruit', 'begin', 'end'), ('apple', 1, 2), ('apple', 2, 4), ('apple', 2, 5), ('orange', 2, 5), ('orange', 9, 14), ('orange', 19, 140), ('apple', 1, 1), ('apple', 2, 2), ('apple', 4, 4), ('apple', 5, 5), ('orange', 5, 5)) right = (('type', 'start', 'stop', 'value'), ('apple', 1, 4, 'foo'), ('apple', 3, 7, 'bar'), ('orange', 4, 9, 'baz')) expect = (('fruit', 'begin', 'end', 'type', 'start', 'stop', 'value'), ('apple', 1, 2, 'apple', 1, 4, 'foo'), ('apple', 2, 4, 'apple', 1, 4, 'foo'), ('apple', 2, 4, 'apple', 3, 7, 'bar'), ('apple', 2, 5, 'apple', 1, 4, 'foo'), ('apple', 2, 5, 'apple', 3, 7, 'bar'), ('orange', 2, 5, 'orange', 4, 9, 'baz'), ('orange', 9, 14, None, None, None, None), ('orange', 19, 140, None, None, None, None), ('apple', 1, 1, None, None, None, None), ('apple', 2, 2, None, None, None, None), ('apple', 4, 4, None, None, None, 
None), ('apple', 5, 5, None, None, None, None), ('orange', 5, 5, None, None, None, None)) actual = intervalleftjoin(left, right, lstart='begin', lstop='end', rstart='start', rstop='stop', lkey='fruit', rkey='type') ieq(expect, actual) ieq(expect, actual) def test_intervalleftjoin_faceted_rkeymissing(): left = (('fruit', 'begin', 'end'), ('apple', 1, 2), ('orange', 5, 5)) right = (('type', 'start', 'stop', 'value'), ('apple', 1, 4, 'foo')) expect = (('fruit', 'begin', 'end', 'type', 'start', 'stop', 'value'), ('apple', 1, 2, 'apple', 1, 4, 'foo'), ('orange', 5, 5, None, None, None, None)) actual = intervalleftjoin(left, right, lstart='begin', lstop='end', rstart='start', rstop='stop', lkey='fruit', rkey='type') ieq(expect, actual) ieq(expect, actual) def test_intervaljoins_faceted_compound(): left = (('fruit', 'sort', 'begin', 'end'), ('apple', 'cox', 1, 2), ('apple', 'fuji', 2, 4)) right = (('type', 'variety', 'start', 'stop', 'value'), ('apple', 'cox', 1, 4, 'foo'), ('apple', 'fuji', 3, 7, 'bar'), ('orange', 'mandarin', 4, 9, 'baz')) expect = (('fruit', 'sort', 'begin', 'end', 'type', 'variety', 'start', 'stop', 'value'), ('apple', 'cox', 1, 2, 'apple', 'cox', 1, 4, 'foo'), ('apple', 'fuji', 2, 4, 'apple', 'fuji', 3, 7, 'bar')) actual = intervaljoin(left, right, lstart='begin', lstop='end', rstart='start', rstop='stop', lkey=('fruit', 'sort'), rkey=('type', 'variety')) ieq(expect, actual) ieq(expect, actual) actual = intervalleftjoin(left, right, lstart='begin', lstop='end', rstart='start', rstop='stop', lkey=('fruit', 'sort'), rkey=('type', 'variety')) ieq(expect, actual) ieq(expect, actual) def test_intervalleftjoin_prefixes(): left = (('begin', 'end', 'quux'), (1, 2, 'a'), (2, 4, 'b'), (2, 5, 'c'), (9, 14, 'd'), (9, 140, 'e'), (1, 1, 'f'), (2, 2, 'g'), (4, 4, 'h'), (5, 5, 'i'), (1, 8, 'j')) right = (('start', 'stop', 'value'), (1, 4, 'foo'), (3, 7, 'bar'), (4, 9, 'baz')) actual = intervalleftjoin(left, right, lstart='begin', lstop='end', rstart='start', rstop='stop', lprefix='l_', rprefix='r_') expect = (('l_begin', 'l_end', 'l_quux', 'r_start', 'r_stop', 'r_value'), (1, 2, 'a', 1, 4, 'foo'), (2, 4, 'b', 1, 4, 'foo'), (2, 4, 'b', 3, 7, 'bar'), (2, 5, 'c', 1, 4, 'foo'), (2, 5, 'c', 3, 7, 'bar'), (2, 5, 'c', 4, 9, 'baz'), (9, 14, 'd', None, None, None), (9, 140, 'e', None, None, None), (1, 1, 'f', None, None, None), (2, 2, 'g', None, None, None), (4, 4, 'h', None, None, None), (5, 5, 'i', None, None, None), (1, 8, 'j', 1, 4, 'foo'), (1, 8, 'j', 3, 7, 'bar'), (1, 8, 'j', 4, 9, 'baz')) ieq(expect, actual) ieq(expect, actual) def test_intervalantijoin(): left = (('begin', 'end', 'quux'), (1, 2, 'a'), (2, 4, 'b'), (2, 5, 'c'), (9, 14, 'd'), (9, 140, 'e'), (1, 1, 'f'), (2, 2, 'g'), (4, 4, 'h'), (5, 5, 'i'), (1, 8, 'j')) right = (('start', 'stop', 'value'), (1, 4, 'foo'), (3, 7, 'bar'), (4, 9, 'baz')) actual = intervalantijoin(left, right, lstart='begin', lstop='end', rstart='start', rstop='stop') expect = (('begin', 'end', 'quux'), (9, 14, 'd'), (9, 140, 'e'), (1, 1, 'f'), (2, 2, 'g'), (4, 4, 'h'), (5, 5, 'i')) debug(lookall(actual)) ieq(expect, actual) ieq(expect, actual) def test_intervalantijoin_include_stop(): left = (('begin', 'end', 'quux'), (1, 2, 'a'), (2, 4, 'b'), (2, 5, 'c'), (9, 14, 'd'), (9, 140, 'e'), (10, 140, 'e'), (1, 1, 'f'), (2, 2, 'g'), (4, 4, 'h'), (5, 5, 'i'), (1, 8, 'j')) right = (('start', 'stop', 'value'), (1, 4, 'foo'), (3, 7, 'bar'), (4, 9, 'baz')) actual = intervalantijoin(left, right, lstart='begin', lstop='end', rstart='start', rstop='stop', include_stop=True) 
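# with include_stop=True the stop bounds are treated as inclusive, so left rows that merely touch a right interval at an endpoint (e.g. (9, 14) against stop 9) are joined and therefore excluded from the antijoin; only (10, 140) starts clear of all right intervals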
expect = (('begin', 'end', 'quux'), (10, 140, 'e')) debug(lookall(actual)) ieq(expect, actual) ieq(expect, actual) def test_intervalantijoin_faceted(): left = (('fruit', 'begin', 'end'), ('apple', 1, 2), ('apple', 2, 4), ('apple', 2, 5), ('orange', 2, 5), ('orange', 9, 14), ('orange', 19, 140), ('apple', 1, 1), ('apple', 2, 2), ('apple', 4, 4), ('apple', 5, 5), ('orange', 5, 5)) right = (('type', 'start', 'stop', 'value'), ('apple', 1, 4, 'foo'), ('apple', 3, 7, 'bar'), ('orange', 4, 9, 'baz')) expect = (('fruit', 'begin', 'end'), ('orange', 9, 14), ('orange', 19, 140), ('apple', 1, 1), ('apple', 2, 2), ('apple', 4, 4), ('apple', 5, 5), ('orange', 5, 5)) actual = intervalantijoin(left, right, lstart='begin', lstop='end', rstart='start', rstop='stop', lkey='fruit', rkey='type') ieq(expect, actual) ieq(expect, actual) def test_intervaljoinvalues_faceted(): left = (('fruit', 'begin', 'end'), ('apple', 1, 2), ('apple', 2, 4), ('apple', 2, 5), ('orange', 2, 5), ('orange', 9, 14), ('orange', 19, 140), ('apple', 1, 1), ('apple', 2, 2), ('apple', 4, 4), ('apple', 5, 5), ('orange', 5, 5)) right = (('type', 'start', 'stop', 'value'), ('apple', 1, 4, 'foo'), ('apple', 3, 7, 'bar'), ('orange', 4, 9, 'baz')) expect = (('fruit', 'begin', 'end', 'value'), ('apple', 1, 2, ['foo']), ('apple', 2, 4, ['foo', 'bar']), ('apple', 2, 5, ['foo', 'bar']), ('orange', 2, 5, ['baz']), ('orange', 9, 14, []), ('orange', 19, 140, []), ('apple', 1, 1, []), ('apple', 2, 2, []), ('apple', 4, 4, []), ('apple', 5, 5, []), ('orange', 5, 5, [])) actual = intervaljoinvalues(left, right, lstart='begin', lstop='end', rstart='start', rstop='stop', lkey='fruit', rkey='type', value='value') ieq(expect, actual) ieq(expect, actual) def test_subtract_1(): left = (('begin', 'end', 'label'), (1, 6, 'apple'), (3, 6, 'orange'), (5, 9, 'banana')) right = (('start', 'stop', 'foo'), (3, 4, True)) expect = (('begin', 'end', 'label'), (1, 3, 'apple'), (4, 6, 'apple'), (4, 6, 'orange'), (5, 9, 'banana')) actual = intervalsubtract(left, right, lstart='begin', lstop='end', rstart='start', rstop='stop') ieq(expect, actual) ieq(expect, actual) def test_subtract_2(): left = (('begin', 'end', 'label'), (1, 6, 'apple'), (3, 6, 'orange'), (5, 9, 'banana')) right = (('start', 'stop', 'foo'), (3, 4, True), (5, 6, True)) expect = (('begin', 'end', 'label'), (1, 3, 'apple'), (4, 5, 'apple'), (4, 5, 'orange'), (6, 9, 'banana')) actual = intervalsubtract(left, right, lstart='begin', lstop='end', rstart='start', rstop='stop') ieq(expect, actual) ieq(expect, actual) def test_subtract_faceted(): left = (('region', 'begin', 'end', 'label'), ('north', 1, 6, 'apple'), ('south', 3, 6, 'orange'), ('west', 5, 9, 'banana')) right = (('place', 'start', 'stop', 'foo'), ('south', 3, 4, True), ('north', 5, 6, True)) expect = (('region', 'begin', 'end', 'label'), ('north', 1, 5, 'apple'), ('south', 4, 6, 'orange'), ('west', 5, 9, 'banana')) actual = intervalsubtract(left, right, lkey='region', rkey='place', lstart='begin', lstop='end', rstart='start', rstop='stop') ieq(expect, actual) ieq(expect, actual) def test_collapse(): # no facet key tbl = (('begin', 'end', 'label'), (1, 6, 'apple'), (3, 6, 'orange'), (5, 9, 'banana'), (12, 14, 'banana'), (13, 17, 'kiwi')) expect = [_Interval(1, 9), _Interval(12, 17)] actual = collapsedintervals(tbl, start='begin', stop='end') ieq(expect, actual) # faceted tbl = (('region', 'begin', 'end', 'label'), ('north', 1, 6, 'apple'), ('north', 3, 6, 'orange'), ('north', 5, 9, 'banana'), ('south', 12, 14, 'banana'), ('south', 13, 17, 'kiwi')) 
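# overlapping intervals are merged within each facet key: north 1-6, 3-6, 5-9 collapse to 1-9; south 12-14, 13-17 collapse to 12-17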
expect = [('north', 1, 9), ('south', 12, 17)] actual = collapsedintervals(tbl, start='begin', stop='end', key='region') ieq(expect, actual) def test_integration(): left = etl.wrap((('begin', 'end', 'quux'), (1, 2, 'a'), (2, 4, 'b'), (2, 5, 'c'), (9, 14, 'd'), (9, 140, 'e'), (1, 1, 'f'), (2, 2, 'g'), (4, 4, 'h'), (5, 5, 'i'), (1, 8, 'j'))) right = etl.wrap((('start', 'stop', 'value'), (1, 4, 'foo'), (3, 7, 'bar'), (4, 9, 'baz'))) actual = left.intervaljoin(right, lstart='begin', lstop='end', rstart='start', rstop='stop') expect = (('begin', 'end', 'quux', 'start', 'stop', 'value'), (1, 2, 'a', 1, 4, 'foo'), (2, 4, 'b', 1, 4, 'foo'), (2, 4, 'b', 3, 7, 'bar'), (2, 5, 'c', 1, 4, 'foo'), (2, 5, 'c', 3, 7, 'bar'), (2, 5, 'c', 4, 9, 'baz'), (1, 8, 'j', 1, 4, 'foo'), (1, 8, 'j', 3, 7, 'bar'), (1, 8, 'j', 4, 9, 'baz')) ieq(expect, actual) ieq(expect, actual) petl-1.7.15/petl/test/transform/test_joins.py000066400000000000000000001247221457414240700212620ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division from petl.test.helpers import ieq from petl import join, leftjoin, rightjoin, outerjoin, crossjoin, antijoin, \ lookupjoin, hashjoin, hashleftjoin, hashrightjoin, hashantijoin, \ hashlookupjoin, unjoin, sort, cut def _test_join_basic(join_impl): table1 = (('id', 'colour'), (1, 'blue'), (2, 'red'), (3, 'purple')) table2 = (('id', 'shape'), (1, 'circle'), (3, 'square'), (4, 'ellipse')) # normal inner join table3 = join_impl(table1, table2, key='id') expect3 = (('id', 'colour', 'shape'), (1, 'blue', 'circle'), (3, 'purple', 'square')) ieq(expect3, table3) ieq(expect3, table3) # check twice # natural join table4 = join_impl(table1, table2) expect4 = expect3 ieq(expect4, table4) ieq(expect4, table4) # check twice # multiple rows for each key table5 = (('id', 'colour'), (1, 'blue'), (1, 'red'), (2, 'purple')) table6 = (('id', 'shape'), (1, 'circle'), (1, 'square'), (2, 'ellipse')) table7 = join_impl(table5, table6, key='id') expect7 = (('id', 'colour', 'shape'), (1, 'blue', 'circle'), (1, 'blue', 'square'), (1, 'red', 'circle'), (1, 'red', 'square'), (2, 'purple', 'ellipse')) ieq(expect7, table7) def _test_join_compound_keys(join_impl): # compound keys table8 = (('id', 'time', 'height'), (1, 1, 12.3), (1, 2, 34.5), (2, 1, 56.7)) table9 = (('id', 'time', 'weight'), (1, 2, 4.5), (2, 1, 6.7), (2, 2, 8.9)) table10 = join_impl(table8, table9, key=['id', 'time']) expect10 = (('id', 'time', 'height', 'weight'), (1, 2, 34.5, 4.5), (2, 1, 56.7, 6.7)) ieq(expect10, table10) # natural join on compound key table11 = join_impl(table8, table9) expect11 = expect10 ieq(expect11, table11) def _test_join_string_key(join_impl): table1 = (('id', 'colour'), ('aa', 'blue'), ('bb', 'red'), ('cc', 'purple')) table2 = (('id', 'shape'), ('aa', 'circle'), ('cc', 'square'), ('dd', 'ellipse')) # normal inner join table3 = join_impl(table1, table2, key='id') expect3 = (('id', 'colour', 'shape'), ('aa', 'blue', 'circle'), ('cc', 'purple', 'square')) ieq(expect3, table3) ieq(expect3, table3) # check twice def _test_join_empty(join_impl): table1 = (('id', 'colour'), (1, 'blue'), (2, 'red'), (3, 'purple')) table2 = (('id', 'shape'),) table3 = join_impl(table1, table2, key='id') expect3 = (('id', 'colour', 'shape'),) ieq(expect3, table3) table1 = (('id', 'colour'),) table2 = (('id', 'shape'), (1, 'circle'), (3, 'square'), (4, 'ellipse')) table3 = join_impl(table1, table2, key='id') expect3 = (('id', 'colour', 'shape'),) ieq(expect3, table3) def _test_join_novaluefield(join_impl): table1 = (('id', 
'colour'), (1, 'blue'), (2, 'red'), (3, 'purple')) table2 = (('id', 'shape'), (1, 'circle'), (3, 'square'), (4, 'ellipse')) expect = (('id', 'colour', 'shape'), (1, 'blue', 'circle'), (3, 'purple', 'square')) actual = join_impl(table1, table2, key='id') ieq(expect, actual) actual = join_impl(cut(table1, 'id'), table2, key='id') ieq(cut(expect, 'id', 'shape'), actual) actual = join_impl(table1, cut(table2, 'id'), key='id') ieq(cut(expect, 'id', 'colour'), actual) actual = join_impl(cut(table1, 'id'), cut(table2, 'id'), key='id') ieq(cut(expect, 'id'), actual) def _test_join_prefix(join_impl): table1 = (('id', 'colour'), ('aa', 'blue'), ('bb', 'red'), ('cc', 'purple')) table2 = (('id', 'shape'), ('aa', 'circle'), ('cc', 'square'), ('dd', 'ellipse')) table3 = join_impl(table1, table2, key='id', lprefix='l_', rprefix='r_') expect3 = (('l_id', 'l_colour', 'r_shape'), ('aa', 'blue', 'circle'), ('cc', 'purple', 'square')) ieq(expect3, table3) def _test_join_lrkey(join_impl): table1 = (('id', 'colour'), ('aa', 'blue'), ('bb', 'red'), ('cc', 'purple')) table2 = (('identifier', 'shape'), ('aa', 'circle'), ('cc', 'square'), ('dd', 'ellipse')) table3 = join_impl(table1, table2, lkey='id', rkey='identifier') expect3 = (('id', 'colour', 'shape'), ('aa', 'blue', 'circle'), ('cc', 'purple', 'square')) ieq(expect3, table3) def _test_join_multiple(join_impl): table1 = (('id', 'color', 'cost'), (1, 'blue', 12), (1, 'red', 8), (2, 'yellow', 15), (2, 'orange', 5), (3, 'purple', 4), (4, 'chartreuse', 42)) table2 = (('id', 'shape', 'size'), (1, 'circle', 'big'), (2, 'square', 'tiny'), (2, 'square', 'big'), (3, 'ellipse', 'small'), (3, 'ellipse', 'tiny'), (5, 'didodecahedron', 3.14159265)) actual = join_impl(table1, table2, key='id') expect = (('id', 'color', 'cost', 'shape', 'size'), (1, 'blue', 12, 'circle', 'big'), (1, 'red', 8, 'circle', 'big'), (2, 'yellow', 15, 'square', 'tiny'), (2, 'yellow', 15, 'square', 'big'), (2, 'orange', 5, 'square', 'tiny'), (2, 'orange', 5, 'square', 'big'), (3, 'purple', 4, 'ellipse', 'small'), (3, 'purple', 4, 'ellipse', 'tiny')) ieq(expect, actual) def _test_join(join_impl): _test_join_basic(join_impl) _test_join_compound_keys(join_impl) _test_join_string_key(join_impl) _test_join_empty(join_impl) _test_join_novaluefield(join_impl) _test_join_prefix(join_impl) _test_join_lrkey(join_impl) _test_join_multiple(join_impl) def test_join(): _test_join(join) def _test_leftjoin_1(leftjoin_impl): table1 = (('id', 'colour'), (1, 'blue'), (2, 'red'), (3, 'purple'), (5, 'yellow'), (7, 'orange')) table2 = (('id', 'shape'), (1, 'circle'), (3, 'square'), (4, 'ellipse')) table3 = leftjoin_impl(table1, table2, key='id') expect3 = (('id', 'colour', 'shape'), (1, 'blue', 'circle'), (2, 'red', None), (3, 'purple', 'square'), (5, 'yellow', None,), (7, 'orange', None)) ieq(expect3, table3) ieq(expect3, table3) # check twice # natural join table4 = leftjoin_impl(table1, table2) expect4 = expect3 ieq(expect4, table4) def _test_leftjoin_2(leftjoin_impl): table1 = (('id', 'colour'), (1, 'blue'), (2, 'red'), (3, 'purple'), (5, 'yellow'), (7, 'orange')) table2 = (('id', 'shape'), (1, 'circle'), (3, 'square')) table3 = leftjoin_impl(table1, table2, key='id') expect3 = (('id', 'colour', 'shape'), (1, 'blue', 'circle'), (2, 'red', None), (3, 'purple', 'square'), (5, 'yellow', None,), (7, 'orange', None)) ieq(expect3, table3) ieq(expect3, table3) # check twice # natural join table4 = leftjoin_impl(table1, table2) expect4 = expect3 ieq(expect4, table4) def _test_leftjoin_3(leftjoin_impl): table1 = (('id', 
'colour'), (1, 'blue'), (2, 'red'), (3, 'purple')) table2 = (('id', 'shape'), (1, 'circle'), (3, 'square'), (4, 'ellipse'), (5, 'triangle')) table3 = leftjoin_impl(table1, table2, key='id') expect3 = (('id', 'colour', 'shape'), (1, 'blue', 'circle'), (2, 'red', None), (3, 'purple', 'square')) ieq(expect3, table3) ieq(expect3, table3) # check twice # natural join table4 = leftjoin_impl(table1, table2) expect4 = expect3 ieq(expect4, table4) def _test_leftjoin_compound_keys(leftjoin_impl): # compound keys table5 = (('id', 'time', 'height'), (1, 1, 12.3), (1, 2, 34.5), (2, 1, 56.7)) table6 = (('id', 'time', 'weight', 'bp'), (1, 2, 4.5, 120), (2, 1, 6.7, 110), (2, 2, 8.9, 100)) table7 = leftjoin_impl(table5, table6, key=['id', 'time']) expect7 = (('id', 'time', 'height', 'weight', 'bp'), (1, 1, 12.3, None, None), (1, 2, 34.5, 4.5, 120), (2, 1, 56.7, 6.7, 110)) ieq(expect7, table7) def _test_leftjoin_empty(leftjoin_impl): table1 = (('id', 'colour'), (1, 'blue'), (2, 'red'), (3, 'purple'), (5, 'yellow'), (7, 'orange')) table2 = (('id', 'shape'),) table3 = leftjoin_impl(table1, table2, key='id') expect3 = (('id', 'colour', 'shape'), (1, 'blue', None), (2, 'red', None), (3, 'purple', None), (5, 'yellow', None,), (7, 'orange', None)) ieq(expect3, table3) def _test_leftjoin_novaluefield(leftjoin_impl): table1 = (('id', 'colour'), (1, 'blue'), (2, 'red'), (3, 'purple'), (5, 'yellow'), (7, 'orange')) table2 = (('id', 'shape'), (1, 'circle'), (3, 'square'), (4, 'ellipse')) expect = (('id', 'colour', 'shape'), (1, 'blue', 'circle'), (2, 'red', None), (3, 'purple', 'square'), (5, 'yellow', None,), (7, 'orange', None)) actual = leftjoin_impl(table1, table2, key='id') ieq(expect, actual) actual = leftjoin_impl(cut(table1, 'id'), table2, key='id') ieq(cut(expect, 'id', 'shape'), actual) actual = leftjoin_impl(table1, cut(table2, 'id'), key='id') ieq(cut(expect, 'id', 'colour'), actual) actual = leftjoin_impl(cut(table1, 'id'), cut(table2, 'id'), key='id') ieq(cut(expect, 'id'), actual) def _test_leftjoin_multiple(leftjoin_impl): table1 = (('id', 'color', 'cost'), (1, 'blue', 12), (1, 'red', 8), (2, 'yellow', 15), (2, 'orange', 5), (3, 'purple', 4), (4, 'chartreuse', 42)) table2 = (('id', 'shape', 'size'), (1, 'circle', 'big'), (2, 'square', 'tiny'), (2, 'square', 'big'), (3, 'ellipse', 'small'), (3, 'ellipse', 'tiny'), (5, 'didodecahedron', 3.14159265)) actual = leftjoin_impl(table1, table2, key='id') expect = (('id', 'color', 'cost', 'shape', 'size'), (1, 'blue', 12, 'circle', 'big'), (1, 'red', 8, 'circle', 'big'), (2, 'yellow', 15, 'square', 'tiny'), (2, 'yellow', 15, 'square', 'big'), (2, 'orange', 5, 'square', 'tiny'), (2, 'orange', 5, 'square', 'big'), (3, 'purple', 4, 'ellipse', 'small'), (3, 'purple', 4, 'ellipse', 'tiny'), (4, 'chartreuse', 42, None, None)) ieq(expect, actual) def _test_leftjoin_prefix(leftjoin_impl): table1 = (('id', 'colour'), (1, 'blue'), (2, 'red'), (3, 'purple'), (5, 'yellow'), (7, 'orange')) table2 = (('id', 'shape'), (1, 'circle'), (3, 'square'), (4, 'ellipse')) table3 = leftjoin_impl(table1, table2, key='id', lprefix='l_', rprefix='r_') expect3 = (('l_id', 'l_colour', 'r_shape'), (1, 'blue', 'circle'), (2, 'red', None), (3, 'purple', 'square'), (5, 'yellow', None,), (7, 'orange', None)) ieq(expect3, table3) def _test_leftjoin_lrkey(leftjoin_impl): table1 = (('id', 'colour'), (1, 'blue'), (2, 'red'), (3, 'purple'), (5, 'yellow'), (7, 'orange')) table2 = (('identifier', 'shape'), (1, 'circle'), (3, 'square'), (4, 'ellipse')) table3 = leftjoin_impl(table1, table2, lkey='id', 
rkey='identifier') expect3 = (('id', 'colour', 'shape'), (1, 'blue', 'circle'), (2, 'red', None), (3, 'purple', 'square'), (5, 'yellow', None,), (7, 'orange', None)) ieq(expect3, table3) def _test_leftjoin(leftjoin_impl): _test_leftjoin_1(leftjoin_impl) _test_leftjoin_2(leftjoin_impl) _test_leftjoin_3(leftjoin_impl) _test_leftjoin_compound_keys(leftjoin_impl) _test_leftjoin_empty(leftjoin_impl) _test_leftjoin_novaluefield(leftjoin_impl) _test_leftjoin_multiple(leftjoin_impl) _test_leftjoin_prefix(leftjoin_impl) _test_leftjoin_lrkey(leftjoin_impl) def test_leftjoin(): _test_leftjoin(leftjoin) def _test_rightjoin_1(rightjoin_impl): table1 = (('id', 'colour'), (1, 'blue'), (2, 'red'), (3, 'purple')) table2 = (('id', 'shape'), (0, 'triangle'), (1, 'circle'), (3, 'square'), (4, 'ellipse'), (5, 'pentagon')) table3 = rightjoin_impl(table1, table2, key='id') expect3 = (('id', 'colour', 'shape'), (0, None, 'triangle'), (1, 'blue', 'circle'), (3, 'purple', 'square'), (4, None, 'ellipse'), (5, None, 'pentagon')) ieq(expect3, table3) ieq(expect3, table3) # check twice # natural join table4 = rightjoin_impl(table1, table2) expect4 = expect3 ieq(expect4, table4) def _test_rightjoin_2(rightjoin_impl): table1 = (('id', 'colour'), (0, 'black'), (1, 'blue'), (2, 'red'), (3, 'purple'), (5, 'yellow'), (7, 'white')) table2 = (('id', 'shape'), (1, 'circle'), (3, 'square'), (4, 'ellipse')) table3 = rightjoin_impl(table1, table2, key='id') expect3 = (('id', 'colour', 'shape'), (1, 'blue', 'circle'), (3, 'purple', 'square'), (4, None, 'ellipse')) ieq(expect3, table3) ieq(expect3, table3) # check twice # natural join table4 = rightjoin_impl(table1, table2) expect4 = expect3 ieq(expect4, table4) def _test_rightjoin_3(rightjoin_impl): table1 = (('id', 'colour'), (1, 'blue'), (2, 'red'), (3, 'purple'), (4, 'orange')) table2 = (('id', 'shape'), (0, 'triangle'), (1, 'circle'), (3, 'square'), (5, 'ellipse'), (7, 'pentagon')) table3 = rightjoin_impl(table1, table2, key='id') expect3 = (('id', 'colour', 'shape'), (0, None, 'triangle'), (1, 'blue', 'circle'), (3, 'purple', 'square'), (5, None, 'ellipse'), (7, None, 'pentagon')) ieq(expect3, table3) ieq(expect3, table3) # check twice # natural join table4 = rightjoin_impl(table1, table2) expect4 = expect3 ieq(expect4, table4) def _test_rightjoin_empty(rightjoin_impl): table1 = (('id', 'colour'),) table2 = (('id', 'shape'), (0, 'triangle'), (1, 'circle'), (3, 'square'), (4, 'ellipse'), (5, 'pentagon')) table3 = rightjoin_impl(table1, table2, key='id') expect3 = (('id', 'colour', 'shape'), (0, None, 'triangle'), (1, None, 'circle'), (3, None, 'square'), (4, None, 'ellipse'), (5, None, 'pentagon')) ieq(expect3, table3) def _test_rightjoin_novaluefield(rightjoin_impl): table1 = (('id', 'colour'), (1, 'blue'), (2, 'red'), (3, 'purple')) table2 = (('id', 'shape'), (0, 'triangle'), (1, 'circle'), (3, 'square'), (4, 'ellipse'), (5, 'pentagon')) expect = (('id', 'colour', 'shape'), (0, None, 'triangle'), (1, 'blue', 'circle'), (3, 'purple', 'square'), (4, None, 'ellipse'), (5, None, 'pentagon')) actual = rightjoin_impl(table1, table2, key='id') ieq(expect, actual) actual = rightjoin_impl(cut(table1, 'id'), table2, key='id') ieq(cut(expect, 'id', 'shape'), actual) actual = rightjoin_impl(table1, cut(table2, 'id'), key='id') ieq(cut(expect, 'id', 'colour'), actual) actual = rightjoin_impl(cut(table1, 'id'), cut(table2, 'id'), key='id') ieq(cut(expect, 'id'), actual) def _test_rightjoin_prefix(rightjoin_impl): table1 = (('id', 'colour'), (1, 'blue'), (2, 'red'), (3, 'purple')) table2 = 
(('id', 'shape'), (0, 'triangle'), (1, 'circle'), (3, 'square'), (4, 'ellipse'), (5, 'pentagon')) table3 = rightjoin_impl(table1, table2, key='id', lprefix='l_', rprefix='r_') expect3 = (('l_id', 'l_colour', 'r_shape'), (0, None, 'triangle'), (1, 'blue', 'circle'), (3, 'purple', 'square'), (4, None, 'ellipse'), (5, None, 'pentagon')) ieq(expect3, table3) def _test_rightjoin_lrkey(rightjoin_impl): table1 = (('id', 'colour'), (1, 'blue'), (2, 'red'), (3, 'purple')) table2 = (('identifier', 'shape'), (0, 'triangle'), (1, 'circle'), (3, 'square'), (4, 'ellipse'), (5, 'pentagon')) table3 = rightjoin_impl(table1, table2, lkey='id', rkey='identifier') expect3 = (('id', 'colour', 'shape'), (0, None, 'triangle'), (1, 'blue', 'circle'), (3, 'purple', 'square'), (4, None, 'ellipse'), (5, None, 'pentagon')) ieq(expect3, table3) def _test_rightjoin_multiple(rightjoin_impl): table1 = (('id', 'color', 'cost'), (1, 'blue', 12), (1, 'red', 8), (2, 'yellow', 15), (2, 'orange', 5), (3, 'purple', 4), (4, 'chartreuse', 42)) table2 = (('id', 'shape', 'size'), (1, 'circle', 'big'), (2, 'square', 'tiny'), (2, 'square', 'big'), (3, 'ellipse', 'small'), (3, 'ellipse', 'tiny'), (5, 'didodecahedron', 3.14159265)) actual = rightjoin_impl(table1, table2, key='id') expect = (('id', 'color', 'cost', 'shape', 'size'), (1, 'blue', 12, 'circle', 'big'), (1, 'red', 8, 'circle', 'big'), (2, 'yellow', 15, 'square', 'tiny'), (2, 'yellow', 15, 'square', 'big'), (2, 'orange', 5, 'square', 'tiny'), (2, 'orange', 5, 'square', 'big'), (3, 'purple', 4, 'ellipse', 'small'), (3, 'purple', 4, 'ellipse', 'tiny'), (5, None, None, 'didodecahedron', 3.14159265)) # N.B., need to sort because hash and sort implementations will return # rows in a different order ieq(sort(expect), sort(actual)) def _test_rightjoin(rightjoin_impl): _test_rightjoin_1(rightjoin_impl) _test_rightjoin_2(rightjoin_impl) _test_rightjoin_3(rightjoin_impl) _test_rightjoin_empty(rightjoin_impl) _test_rightjoin_novaluefield(rightjoin_impl) _test_rightjoin_prefix(rightjoin_impl) _test_rightjoin_lrkey(rightjoin_impl) _test_rightjoin_multiple(rightjoin_impl) def test_rightjoin(): _test_rightjoin(rightjoin) def test_outerjoin(): table1 = (('id', 'colour'), (0, 'black'), (1, 'blue'), (2, 'red'), (3, 'purple'), (5, 'yellow'), (7, 'white')) table2 = (('id', 'shape'), (1, 'circle'), (3, 'square'), (4, 'ellipse')) table3 = outerjoin(table1, table2, key='id') expect3 = (('id', 'colour', 'shape'), (0, 'black', None), (1, 'blue', 'circle'), (2, 'red', None), (3, 'purple', 'square'), (4, None, 'ellipse'), (5, 'yellow', None), (7, 'white', None)) ieq(expect3, table3) ieq(expect3, table3) # check twice # natural join table4 = outerjoin(table1, table2) expect4 = expect3 ieq(expect4, table4) def test_outerjoin_2(): table1 = (('id', 'colour'), (1, 'blue'), (2, 'red'), (3, 'purple')) table2 = (('id', 'shape'), (0, 'pentagon'), (1, 'circle'), (3, 'square'), (4, 'ellipse'), (5, 'triangle')) table3 = outerjoin(table1, table2, key='id') expect3 = (('id', 'colour', 'shape'), (0, None, 'pentagon'), (1, 'blue', 'circle'), (2, 'red', None), (3, 'purple', 'square'), (4, None, 'ellipse'), (5, None, 'triangle')) ieq(expect3, table3) ieq(expect3, table3) # check twice # natural join table4 = outerjoin(table1, table2) expect4 = expect3 ieq(expect4, table4) def test_outerjoin_fieldorder(): table1 = (('colour', 'id'), ('blue', 1), ('red', 2), ('purple', 3)) table2 = (('id', 'shape'), (1, 'circle'), (3, 'square'), (4, 'ellipse')) table3 = outerjoin(table1, table2, key='id') expect3 = (('colour', 'id', 
'shape'), ('blue', 1, 'circle'), ('red', 2, None), ('purple', 3, 'square'), (None, 4, 'ellipse')) ieq(expect3, table3) ieq(expect3, table3) # check twice def test_outerjoin_empty(): table1 = (('id', 'colour'), (0, 'black'), (1, 'blue'), (2, 'red'), (3, 'purple'), (5, 'yellow'), (7, 'white')) table2 = (('id', 'shape'),) table3 = outerjoin(table1, table2, key='id') expect3 = (('id', 'colour', 'shape'), (0, 'black', None), (1, 'blue', None), (2, 'red', None), (3, 'purple', None), (5, 'yellow', None), (7, 'white', None)) ieq(expect3, table3) def test_outerjoin_novaluefield(): table1 = (('id', 'colour'), (0, 'black'), (1, 'blue'), (2, 'red'), (3, 'purple'), (5, 'yellow'), (7, 'white')) table2 = (('id', 'shape'), (1, 'circle'), (3, 'square'), (4, 'ellipse')) expect = (('id', 'colour', 'shape'), (0, 'black', None), (1, 'blue', 'circle'), (2, 'red', None), (3, 'purple', 'square'), (4, None, 'ellipse'), (5, 'yellow', None), (7, 'white', None)) actual = outerjoin(table1, table2, key='id') ieq(expect, actual) actual = outerjoin(cut(table1, 'id'), table2, key='id') ieq(cut(expect, 'id', 'shape'), actual) actual = outerjoin(table1, cut(table2, 'id'), key='id') ieq(cut(expect, 'id', 'colour'), actual) actual = outerjoin(cut(table1, 'id'), cut(table2, 'id'), key='id') ieq(cut(expect, 'id'), actual) def test_outerjoin_prefix(): table1 = (('id', 'colour'), (0, 'black'), (1, 'blue'), (2, 'red'), (3, 'purple'), (5, 'yellow'), (7, 'white')) table2 = (('id', 'shape'), (1, 'circle'), (3, 'square'), (4, 'ellipse')) table3 = outerjoin(table1, table2, key='id', lprefix='l_', rprefix='r_') expect3 = (('l_id', 'l_colour', 'r_shape'), (0, 'black', None), (1, 'blue', 'circle'), (2, 'red', None), (3, 'purple', 'square'), (4, None, 'ellipse'), (5, 'yellow', None), (7, 'white', None)) ieq(expect3, table3) ieq(expect3, table3) # check twice def test_outerjoin_lrkey(): table1 = (('id', 'colour'), (0, 'black'), (1, 'blue'), (2, 'red'), (3, 'purple'), (5, 'yellow'), (7, 'white')) table2 = (('identifier', 'shape'), (1, 'circle'), (3, 'square'), (4, 'ellipse')) table3 = outerjoin(table1, table2, lkey='id', rkey='identifier') expect3 = (('id', 'colour', 'shape'), (0, 'black', None), (1, 'blue', 'circle'), (2, 'red', None), (3, 'purple', 'square'), (4, None, 'ellipse'), (5, 'yellow', None), (7, 'white', None)) ieq(expect3, table3) ieq(expect3, table3) # check twice def test_outerjoin_multiple(): table1 = (('id', 'color', 'cost'), (1, 'blue', 12), (1, 'red', 8), (2, 'yellow', 15), (2, 'orange', 5), (3, 'purple', 4), (4, 'chartreuse', 42)) table2 = (('id', 'shape', 'size'), (1, 'circle', 'big'), (2, 'square', 'tiny'), (2, 'square', 'big'), (3, 'ellipse', 'small'), (3, 'ellipse', 'tiny'), (5, 'didodecahedron', 3.14159265)) actual = outerjoin(table1, table2, key='id') expect = (('id', 'color', 'cost', 'shape', 'size'), (1, 'blue', 12, 'circle', 'big'), (1, 'red', 8, 'circle', 'big'), (2, 'yellow', 15, 'square', 'tiny'), (2, 'yellow', 15, 'square', 'big'), (2, 'orange', 5, 'square', 'tiny'), (2, 'orange', 5, 'square', 'big'), (3, 'purple', 4, 'ellipse', 'small'), (3, 'purple', 4, 'ellipse', 'tiny'), (4, 'chartreuse', 42, None, None), (5, None, None, 'didodecahedron', 3.14159265)) ieq(expect, actual) def test_crossjoin(): table1 = (('id', 'colour'), (1, 'blue'), (2, 'red')) table2 = (('id', 'shape'), (1, 'circle'), (3, 'square')) table3 = crossjoin(table1, table2) expect3 = (('id', 'colour', 'id', 'shape'), (1, 'blue', 1, 'circle'), (1, 'blue', 3, 'square'), (2, 'red', 1, 'circle'), (2, 'red', 3, 'square')) ieq(expect3, table3) def 
test_crossjoin_empty(): table1 = (('id', 'colour'), (1, 'blue'), (2, 'red')) table2 = (('id', 'shape'),) table3 = crossjoin(table1, table2) expect3 = (('id', 'colour', 'id', 'shape'),) ieq(expect3, table3) def test_crossjoin_novaluefield(): table1 = (('id', 'colour'), (1, 'blue'), (2, 'red')) table2 = (('id', 'shape'), (1, 'circle'), (3, 'square')) expect = (('id', 'colour', 'id', 'shape'), (1, 'blue', 1, 'circle'), (1, 'blue', 3, 'square'), (2, 'red', 1, 'circle'), (2, 'red', 3, 'square')) actual = crossjoin(table1, table2, key='id') ieq(expect, actual) actual = crossjoin(cut(table1, 'id'), table2, key='id') ieq(cut(expect, 0, 2, 'shape'), actual) actual = crossjoin(table1, cut(table2, 'id'), key='id') ieq(cut(expect, 0, 'colour', 2), actual) actual = crossjoin(cut(table1, 'id'), cut(table2, 'id'), key='id') ieq(cut(expect, 0, 2), actual) def test_crossjoin_prefix(): table1 = (('id', 'colour'), (1, 'blue'), (2, 'red')) table2 = (('id', 'shape'), (1, 'circle'), (3, 'square')) table3 = crossjoin(table1, table2, prefix=True) expect3 = (('1_id', '1_colour', '2_id', '2_shape'), (1, 'blue', 1, 'circle'), (1, 'blue', 3, 'square'), (2, 'red', 1, 'circle'), (2, 'red', 3, 'square')) ieq(expect3, table3) def _test_antijoin_basics(antijoin_impl): table1 = (('id', 'colour'), (0, 'black'), (1, 'blue'), (2, 'red'), (4, 'yellow'), (5, 'white')) table2 = (('id', 'shape'), (1, 'circle'), (3, 'square')) table3 = antijoin_impl(table1, table2, key='id') expect3 = (('id', 'colour'), (0, 'black'), (2, 'red'), (4, 'yellow'), (5, 'white')) ieq(expect3, table3) table4 = antijoin_impl(table1, table2) expect4 = expect3 ieq(expect4, table4) def _test_antijoin_empty(antijoin_impl): table1 = (('id', 'colour'), (0, 'black'), (1, 'blue'), (2, 'red'), (4, 'yellow'), (5, 'white')) table2 = (('id', 'shape'),) actual = antijoin_impl(table1, table2, key='id') expect = table1 ieq(expect, actual) def _test_antijoin_novaluefield(antijoin_impl): table1 = (('id', 'colour'), (0, 'black'), (1, 'blue'), (2, 'red'), (4, 'yellow'), (5, 'white')) table2 = (('id', 'shape'), (1, 'circle'), (3, 'square')) expect = (('id', 'colour'), (0, 'black'), (2, 'red'), (4, 'yellow'), (5, 'white')) actual = antijoin_impl(table1, table2, key='id') ieq(expect, actual) actual = antijoin_impl(cut(table1, 'id'), table2, key='id') ieq(cut(expect, 'id'), actual) actual = antijoin_impl(table1, cut(table2, 'id'), key='id') ieq(expect, actual) actual = antijoin_impl(cut(table1, 'id'), cut(table2, 'id'), key='id') ieq(cut(expect, 'id'), actual) def _test_antijoin_lrkey(antijoin_impl): table1 = (('id', 'colour'), (0, 'black'), (1, 'blue'), (2, 'red'), (4, 'yellow'), (5, 'white')) table2 = (('identifier', 'shape'), (1, 'circle'), (3, 'square')) table3 = antijoin_impl(table1, table2, lkey='id', rkey='identifier') expect3 = (('id', 'colour'), (0, 'black'), (2, 'red'), (4, 'yellow'), (5, 'white')) ieq(expect3, table3) def _test_antijoin(antijoin_impl): _test_antijoin_basics(antijoin_impl) _test_antijoin_empty(antijoin_impl) _test_antijoin_novaluefield(antijoin_impl) _test_antijoin_lrkey(antijoin_impl) def test_antijoin(): _test_antijoin(antijoin) def _test_lookupjoin_1(lookupjoin_impl): table1 = (('id', 'color', 'cost'), (1, 'blue', 12), (2, 'red', 8), (3, 'purple', 4)) table2 = (('id', 'shape', 'size'), (1, 'circle', 'big'), (2, 'square', 'tiny'), (3, 'ellipse', 'small')) actual = lookupjoin_impl(table1, table2, key='id') expect = (('id', 'color', 'cost', 'shape', 'size'), (1, 'blue', 12, 'circle', 'big'), (2, 'red', 8, 'square', 'tiny'), (3, 'purple', 4, 'ellipse', 
'small')) ieq(expect, actual) ieq(expect, actual) # natural join actual = lookupjoin_impl(table1, table2) expect = (('id', 'color', 'cost', 'shape', 'size'), (1, 'blue', 12, 'circle', 'big'), (2, 'red', 8, 'square', 'tiny'), (3, 'purple', 4, 'ellipse', 'small')) ieq(expect, actual) ieq(expect, actual) def _test_lookupjoin_2(lookupjoin_impl): table1 = (('id', 'color', 'cost'), (1, 'blue', 12), (2, 'red', 8), (3, 'purple', 4)) table2 = (('id', 'shape', 'size'), (1, 'circle', 'big'), (1, 'circle', 'small'), (2, 'square', 'tiny'), (2, 'square', 'big'), (3, 'ellipse', 'small'), (3, 'ellipse', 'tiny')) actual = lookupjoin_impl(table1, table2, key='id') expect = (('id', 'color', 'cost', 'shape', 'size'), (1, 'blue', 12, 'circle', 'big'), (2, 'red', 8, 'square', 'tiny'), (3, 'purple', 4, 'ellipse', 'small')) ieq(expect, actual) ieq(expect, actual) def _test_lookupjoin_prefix(lookupjoin_impl): table1 = (('id', 'color', 'cost'), (1, 'blue', 12), (2, 'red', 8), (3, 'purple', 4)) table2 = (('id', 'shape', 'size'), (1, 'circle', 'big'), (2, 'square', 'tiny'), (3, 'ellipse', 'small')) actual = lookupjoin_impl(table1, table2, key='id', lprefix='l_', rprefix='r_') expect = (('l_id', 'l_color', 'l_cost', 'r_shape', 'r_size'), (1, 'blue', 12, 'circle', 'big'), (2, 'red', 8, 'square', 'tiny'), (3, 'purple', 4, 'ellipse', 'small')) ieq(expect, actual) def _test_lookupjoin_lrkey(lookupjoin_impl): table1 = (('id', 'color', 'cost'), (1, 'blue', 12), (2, 'red', 8), (3, 'purple', 4)) table2 = (('identifier', 'shape', 'size'), (1, 'circle', 'big'), (2, 'square', 'tiny'), (3, 'ellipse', 'small')) actual = lookupjoin_impl(table1, table2, lkey='id', rkey='identifier') expect = (('id', 'color', 'cost', 'shape', 'size'), (1, 'blue', 12, 'circle', 'big'), (2, 'red', 8, 'square', 'tiny'), (3, 'purple', 4, 'ellipse', 'small')) ieq(expect, actual) def _test_lookupjoin_novaluefield(lookupjoin_impl): table1 = (('id', 'color', 'cost'), (1, 'blue', 12), (2, 'red', 8), (3, 'purple', 4)) table2 = (('id', 'shape', 'size'), (1, 'circle', 'big'), (2, 'square', 'tiny'), (3, 'ellipse', 'small')) expect = (('id', 'color', 'cost', 'shape', 'size'), (1, 'blue', 12, 'circle', 'big'), (2, 'red', 8, 'square', 'tiny'), (3, 'purple', 4, 'ellipse', 'small')) actual = lookupjoin_impl(table1, table2, key='id') ieq(expect, actual) actual = lookupjoin_impl(cut(table1, 'id'), table2, key='id') ieq(cut(expect, 'id', 'shape', 'size'), actual) actual = lookupjoin_impl(table1, cut(table2, 'id'), key='id') ieq(cut(expect, 'id', 'color', 'cost'), actual) actual = lookupjoin_impl(cut(table1, 'id'), cut(table2, 'id'), key='id') ieq(cut(expect, 'id'), actual) def _test_lookupjoin(lookupjoin_impl): _test_lookupjoin_1(lookupjoin_impl) _test_lookupjoin_2(lookupjoin_impl) _test_lookupjoin_prefix(lookupjoin_impl) _test_lookupjoin_lrkey(lookupjoin_impl) _test_lookupjoin_novaluefield(lookupjoin_impl) def test_lookupjoin(): _test_lookupjoin(lookupjoin) def test_hashjoin(): _test_join(hashjoin) def test_hashleftjoin(): _test_leftjoin(hashleftjoin) def test_hashrightjoin(): _test_rightjoin(hashrightjoin) def test_hashantijoin(): _test_antijoin(hashantijoin) def test_hashlookupjoin(): _test_lookupjoin(hashlookupjoin) def test_unjoin_implicit_key(): # test the case where the join key needs to be reconstructed table1 = (('foo', 'bar'), (1, 'apple'), (2, 'apple'), (3, 'orange')) expect_left = (('foo', 'bar_id'), (1, 1), (2, 1), (3, 2)) expect_right = (('id', 'bar'), (1, 'apple'), (2, 'orange')) left, right = unjoin(table1, 'bar') ieq(expect_left, left) ieq(expect_left, 
left) ieq(expect_right, right) ieq(expect_right, right) def test_unjoin_explicit_key(): # test the case where the join key is still present table2 = (('Customer ID', 'First Name', 'Surname', 'Telephone Number'), (123, 'Robert', 'Ingram', '555-861-2025'), (456, 'Jane', 'Wright', '555-403-1659'), (456, 'Jane', 'Wright', '555-776-4100'), (789, 'Maria', 'Fernandez', '555-808-9633')) expect_left = (('Customer ID', 'First Name', 'Surname'), (123, 'Robert', 'Ingram'), (456, 'Jane', 'Wright'), (789, 'Maria', 'Fernandez')) expect_right = (('Customer ID', 'Telephone Number'), (123, '555-861-2025'), (456, '555-403-1659'), (456, '555-776-4100'), (789, '555-808-9633')) left, right = unjoin(table2, 'Telephone Number', key='Customer ID') ieq(expect_left, left) ieq(expect_left, left) ieq(expect_right, right) ieq(expect_right, right) def test_unjoin_explicit_key_2(): table3 = (('Employee', 'Skill', 'Current Work Location'), ('Jones', 'Typing', '114 Main Street'), ('Jones', 'Shorthand', '114 Main Street'), ('Jones', 'Whittling', '114 Main Street'), ('Bravo', 'Light Cleaning', '73 Industrial Way'), ('Ellis', 'Alchemy', '73 Industrial Way'), ('Ellis', 'Flying', '73 Industrial Way'), ('Harrison', 'Light Cleaning', '73 Industrial Way')) # N.B., we do expect rows will get sorted expect_left = (('Employee', 'Current Work Location'), ('Bravo', '73 Industrial Way'), ('Ellis', '73 Industrial Way'), ('Harrison', '73 Industrial Way'), ('Jones', '114 Main Street')) expect_right = (('Employee', 'Skill'), ('Bravo', 'Light Cleaning'), ('Ellis', 'Alchemy'), ('Ellis', 'Flying'), ('Harrison', 'Light Cleaning'), ('Jones', 'Shorthand'), ('Jones', 'Typing'), ('Jones', 'Whittling')) left, right = unjoin(table3, 'Skill', key='Employee') ieq(expect_left, left) ieq(expect_left, left) ieq(expect_right, right) ieq(expect_right, right) def test_unjoin_explicit_key_3(): table4 = (('Tournament', 'Year', 'Winner', 'Date of Birth'), ('Indiana Invitational', 1998, 'Al Fredrickson', '21 July 1975'), ('Cleveland Open', 1999, 'Bob Albertson', '28 September 1968'), ('Des Moines Masters', 1999, 'Al Fredrickson', '21 July 1975'), ('Indiana Invitational', 1999, 'Chip Masterson', '14 March 1977')) # N.B., we do expect rows will get sorted expect_left = (('Tournament', 'Year', 'Winner'), ('Cleveland Open', 1999, 'Bob Albertson'), ('Des Moines Masters', 1999, 'Al Fredrickson'), ('Indiana Invitational', 1998, 'Al Fredrickson'), ('Indiana Invitational', 1999, 'Chip Masterson')) expect_right = (('Winner', 'Date of Birth'), ('Al Fredrickson', '21 July 1975'), ('Bob Albertson', '28 September 1968'), ('Chip Masterson', '14 March 1977')) left, right = unjoin(table4, 'Date of Birth', key='Winner') ieq(expect_left, left) ieq(expect_left, left) ieq(expect_right, right) ieq(expect_right, right) def test_unjoin_explicit_key_4(): table5 = (('Restaurant', 'Pizza Variety', 'Delivery Area'), ('A1 Pizza', 'Thick Crust', 'Springfield'), ('A1 Pizza', 'Thick Crust', 'Shelbyville'), ('A1 Pizza', 'Thick Crust', 'Capital City'), ('A1 Pizza', 'Stuffed Crust', 'Springfield'), ('A1 Pizza', 'Stuffed Crust', 'Shelbyville'), ('A1 Pizza', 'Stuffed Crust', 'Capital City'), ('Elite Pizza', 'Thin Crust', 'Capital City'), ('Elite Pizza', 'Stuffed Crust', 'Capital City'), ("Vincenzo's Pizza", "Thick Crust", "Springfield"), ("Vincenzo's Pizza", "Thick Crust", "Shelbyville"), ("Vincenzo's Pizza", "Thin Crust", "Springfield"), ("Vincenzo's Pizza", "Thin Crust", "Shelbyville")) # N.B., we do expect rows will get sorted expect_left = (('Restaurant', 'Pizza Variety'), ('A1 Pizza', 'Stuffed 
Crust'), ('A1 Pizza', 'Thick Crust'), ('Elite Pizza', 'Stuffed Crust'), ('Elite Pizza', 'Thin Crust'), ("Vincenzo's Pizza", "Thick Crust"), ("Vincenzo's Pizza", "Thin Crust")) expect_right = (('Restaurant', 'Delivery Area'), ('A1 Pizza', 'Capital City'), ('A1 Pizza', 'Shelbyville'), ('A1 Pizza', 'Springfield'), ('Elite Pizza', 'Capital City'), ("Vincenzo's Pizza", "Shelbyville"), ("Vincenzo's Pizza", "Springfield")) left, right = unjoin(table5, 'Delivery Area', key='Restaurant') ieq(expect_left, left) ieq(expect_left, left) ieq(expect_right, right) ieq(expect_right, right) def test_unjoin_explicit_key_5(): table6 = (('ColA', 'ColB', 'ColC'), ('A', 1, 'apple'), ('B', 1, 'apple'), ('C', 2, 'orange'), ('D', 3, 'lemon'), ('E', 3, 'lemon')) # N.B., we do expect rows will get sorted expect_left = (('ColA', 'ColB'), ('A', 1), ('B', 1), ('C', 2), ('D', 3), ('E', 3)) expect_right = (('ColB', 'ColC'), (1, 'apple'), (2, 'orange'), (3, 'lemon')) left, right = unjoin(table6, 'ColC', key='ColB') ieq(expect_left, left) ieq(expect_left, left) ieq(expect_right, right) ieq(expect_right, right) petl-1.7.15/petl/test/transform/test_maps.py000066400000000000000000000265471457414240700211040ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division from collections import OrderedDict from petl.test.failonerror import assert_failonerror from petl.test.helpers import ieq from petl.transform.maps import fieldmap, rowmap, rowmapmany from functools import partial def test_fieldmap(): table = (('id', 'sex', 'age', 'height', 'weight'), (1, 'male', 16, 1.45, 62.0), (2, 'female', 19, 1.34, 55.4), (3, 'female', 17, 1.78, 74.4), (4, 'male', 21, 1.33, 45.2), (5, '-', 25, 1.65, 51.9)) mappings = OrderedDict() mappings['subject_id'] = 'id' mappings['gender'] = 'sex', {'male': 'M', 'female': 'F'} mappings['age_months'] = 'age', lambda v: v * 12 mappings['bmi'] = lambda rec: rec['weight'] / rec['height'] ** 2 actual = fieldmap(table, mappings) expect = (('subject_id', 'gender', 'age_months', 'bmi'), (1, 'M', 16 * 12, 62.0 / 1.45 ** 2), (2, 'F', 19 * 12, 55.4 / 1.34 ** 2), (3, 'F', 17 * 12, 74.4 / 1.78 ** 2), (4, 'M', 21 * 12, 45.2 / 1.33 ** 2), (5, '-', 25 * 12, 51.9 / 1.65 ** 2)) ieq(expect, actual) ieq(expect, actual) # check can iterate twice
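# Illustrative note (added in review, not part of the original test): the
# mappings above exercise the main forms fieldmap accepts for an output field:
#     mappings['new'] = 'old'                  -> copy/rename a field
#     mappings['new'] = 'old', {...}           -> translate values via a dict
#     mappings['new'] = 'old', lambda v: ...   -> transform each value
#     mappings['new'] = lambda rec: ...        -> compute from the whole record
# A string expression form ('{weight} / {height}**2') appears in the
# suffix-notation block below. As an extra sanity check, the output header
# follows the mapping insertion order:
assert tuple(next(iter(actual))) == ('subject_id', 'gender', 'age_months', 'bmi')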
# do it with suffix notation actual = fieldmap(table) actual['subject_id'] = 'id' actual['gender'] = 'sex', {'male': 'M', 'female': 'F'} actual['age_months'] = 'age', lambda v: v * 12 actual['bmi'] = '{weight} / {height}**2' ieq(expect, actual) # test short rows table2 = (('id', 'sex', 'age', 'height', 'weight'), (1, 'male', 16, 1.45, 62.0), (2, 'female', 19, 1.34, 55.4), (3, 'female', 17, 1.78, 74.4), (4, 'male', 21, 1.33, 45.2), (5, '-', 25, 1.65)) expect = (('subject_id', 'gender', 'age_months', 'bmi'), (1, 'M', 16 * 12, 62.0 / 1.45 ** 2), (2, 'F', 19 * 12, 55.4 / 1.34 ** 2), (3, 'F', 17 * 12, 74.4 / 1.78 ** 2), (4, 'M', 21 * 12, 45.2 / 1.33 ** 2), (5, '-', 25 * 12, None)) actual = fieldmap(table2, mappings) ieq(expect, actual) def test_fieldmap_record_access(): table = (('id', 'sex', 'age', 'height', 'weight'), (1, 'male', 16, 1.45, 62.0), (2, 'female', 19, 1.34, 55.4), (3, 'female', 17, 1.78, 74.4), (4, 'male', 21, 1.33, 45.2), (5, '-', 25, 1.65, 51.9)) mappings = OrderedDict() mappings['subject_id'] = 'id' mappings['gender'] = 'sex', {'male': 'M', 'female': 'F'} mappings['age_months'] = 'age', lambda v: v * 12 mappings['bmi'] = lambda rec: rec.weight / rec.height ** 2 actual = fieldmap(table, mappings) expect = (('subject_id', 'gender', 'age_months', 'bmi'), (1, 'M', 16 * 12, 62.0 / 1.45 ** 2), (2, 'F', 19 * 12, 55.4 / 1.34 ** 2), (3, 'F', 17 * 12, 74.4 / 1.78 ** 2), (4, 'M', 21 * 12, 45.2 / 1.33 ** 2), (5, '-', 25 * 12, 51.9 / 1.65 ** 2)) ieq(expect, actual) ieq(expect, actual) # check can iterate twice def test_fieldmap_empty(): table = (('foo', 'bar'),) expect = (('foo', 'baz'),) mappings = OrderedDict() mappings['foo'] = 'foo' mappings['baz'] = 'bar', lambda v: v * 2 actual = fieldmap(table, mappings) ieq(expect, actual) def test_fieldmap_headerless(): table = [] expect = [] mappings = OrderedDict() mappings['foo'] = 'foo' mappings['baz'] = 'bar', lambda v: v * 2 actual = fieldmap(table, mappings) ieq(expect, actual) def test_fieldmap_failonerror(): input_ = (('foo',), ('A',), (1,)) mapper_ = {'bar': ('foo', lambda v: v.lower())} expect_ = (('bar',), ('a',), (None,)) assert_failonerror( input_fn=partial(fieldmap, input_, mapper_), expected_output=expect_) def test_rowmap(): table = (('id', 'sex', 'age', 'height', 'weight'), (1, 'male', 16, 1.45, 62.0), (2, 'female', 19, 1.34, 55.4), (3, 'female', 17, 1.78, 74.4), (4, 'male', 21, 1.33, 45.2), (5, '-', 25, 1.65, 51.9)) def rowmapper(row): transmf = {'male': 'M', 'female': 'F'} return [row[0], transmf[row[1]] if row[1] in transmf else row[1], row[2] * 12, row[4] / row[3] ** 2] actual = rowmap(table, rowmapper, header=['subject_id', 'gender', 'age_months', 'bmi']) expect = (('subject_id', 'gender', 'age_months', 'bmi'), (1, 'M', 16 * 12, 62.0 / 1.45 ** 2), (2, 'F', 19 * 12, 55.4 / 1.34 ** 2), (3, 'F', 17 * 12, 74.4 / 1.78 ** 2), (4, 'M', 21 * 12, 45.2 / 1.33 ** 2), (5, '-', 25 * 12, 51.9 / 1.65 ** 2)) ieq(expect, actual) ieq(expect, actual) # check can iterate twice
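# Reviewer's note (not part of the original test): rowmap differs from
# fieldmap in that the mapper receives one complete row at a time and the
# returned list becomes the whole output row, with header= naming the output
# fields. The result is a lazy, re-iterable view (hence the doubled ieq
# above); materialise it if a stable snapshot is needed:
snapshot = list(actual)
assert len(snapshot) == 6  # header row plus five mapped data rows
# As the short-row test below shows, a row for which the mapper raises
# (e.g. IndexError on a truncated row) is simply omitted from the output.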
# test short rows table2 = (('id', 'sex', 'age', 'height', 'weight'), (1, 'male', 16, 1.45, 62.0), (2, 'female', 19, 1.34, 55.4), (3, 'female', 17, 1.78, 74.4), (4, 'male', 21, 1.33, 45.2), (5, '-', 25, 1.65)) expect = (('subject_id', 'gender', 'age_months', 'bmi'), (1, 'M', 16 * 12, 62.0 / 1.45 ** 2), (2, 'F', 19 * 12, 55.4 / 1.34 ** 2), (3, 'F', 17 * 12, 74.4 / 1.78 ** 2), (4, 'M', 21 * 12, 45.2 / 1.33 ** 2)) actual = rowmap(table2, rowmapper, header=['subject_id', 'gender', 'age_months', 'bmi']) ieq(expect, actual) def test_rowmap_empty(): table = (('id', 'sex', 'age', 'height', 'weight'),) def rowmapper(row): transmf = {'male': 'M', 'female': 'F'} return [row[0], transmf[row[1]] if row[1] in transmf else row[1], row[2] * 12, row[4] / row[3] ** 2] actual = rowmap(table, rowmapper, header=['subject_id', 'gender', 'age_months', 'bmi']) expect = (('subject_id', 'gender', 'age_months', 'bmi'),) ieq(expect, actual) def test_rowmap_headerless(): table = [] def rowmapper(row): return row actual = rowmap(table, rowmapper, header=['subject_id', 'gender']) expect = [] ieq(expect, actual) def test_rowmap_failonerror(): input_ = (('foo',), ('A',), (1,), ('B',)) mapper = lambda r: [r[0].lower()] # exceptions in rowmappers do not generate an output row expect_ = (('foo',), ('a',), ('b',)) assert_failonerror( input_fn=partial(rowmap, input_, mapper, header=('foo',)), expected_output=expect_) def test_recordmap(): table = (('id', 'sex', 'age', 'height', 'weight'), (1, 'male', 16, 1.45, 62.0), (2, 'female', 19, 1.34, 55.4), (3, 'female', 17, 1.78, 74.4), (4, 'male', 21, 1.33, 45.2), (5, '-', 25, 1.65, 51.9)) def recmapper(rec): transmf = {'male': 'M', 'female': 'F'} return [rec['id'], transmf[rec['sex']] if rec['sex'] in transmf else rec['sex'], rec['age'] * 12, rec['weight'] / rec['height'] ** 2] actual = rowmap(table, recmapper, header=['subject_id', 'gender', 'age_months', 'bmi']) expect = (('subject_id', 'gender', 'age_months', 'bmi'), (1, 'M', 16 * 12, 62.0 / 1.45 ** 2), (2, 'F', 19 * 12, 55.4 / 1.34 ** 2), (3, 'F', 17 * 12, 74.4 / 1.78 ** 2), (4, 'M', 21 * 12, 45.2 / 1.33 ** 2), (5, '-', 25 * 12, 51.9 / 1.65 ** 2)) ieq(expect, actual) ieq(expect, actual) # check can iterate twice
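# Reviewer's sketch (assumption flagged, not part of the original test):
# this "recordmap" test reuses rowmap directly because the rows petl passes
# to the mapper support both positional access (row[0], as in test_rowmap
# above) and named access (rec['id'], as used by recmapper here), so a
# mapper may be written in whichever style reads best, e.g.:
#     rowmap(table, lambda rec: [rec['id'], rec['age'] * 12],
#            header=['subject_id', 'age_months'])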
# test short rows table2 = (('id', 'sex', 'age', 'height', 'weight'), (1, 'male', 16, 1.45, 62.0), (2, 'female', 19, 1.34, 55.4), (3, 'female', 17, 1.78, 74.4), (4, 'male', 21, 1.33, 45.2), (5, '-', 25, 1.65)) expect = (('subject_id', 'gender', 'age_months', 'bmi'), (1, 'M', 16 * 12, 62.0 / 1.45 ** 2), (2, 'F', 19 * 12, 55.4 / 1.34 ** 2), (3, 'F', 17 * 12, 74.4 / 1.78 ** 2), (4, 'M', 21 * 12, 45.2 / 1.33 ** 2)) actual = rowmap(table2, recmapper, header=['subject_id', 'gender', 'age_months', 'bmi']) ieq(expect, actual) def test_rowmapmany(): table = (('id', 'sex', 'age', 'height', 'weight'), (1, 'male', 16, 1.45, 62.0), (2, 'female', 19, 1.34, 55.4), (3, '-', 17, 1.78, 74.4), (4, 'male', 21, 1.33)) def rowgenerator(row): transmf = {'male': 'M', 'female': 'F'} yield [row[0], 'gender', transmf[row[1]] if row[1] in transmf else row[1]] yield [row[0], 'age_months', row[2] * 12] yield [row[0], 'bmi', row[4] / row[3] ** 2] actual = rowmapmany(table, rowgenerator, header=['subject_id', 'variable', 'value']) expect = (('subject_id', 'variable', 'value'), (1, 'gender', 'M'), (1, 'age_months', 16 * 12), (1, 'bmi', 62.0 / 1.45 ** 2), (2, 'gender', 'F'), (2, 'age_months', 19 * 12), (2, 'bmi', 55.4 / 1.34 ** 2), (3, 'gender', '-'), (3, 'age_months', 17 * 12), (3, 'bmi', 74.4 / 1.78 ** 2), (4, 'gender', 'M'), (4, 'age_months', 21 * 12)) ieq(expect, actual) ieq(expect, actual) # check can iterate twice def test_rowmapmany_failonerror(): input_ = (('foo',), ('A',), (1,), ('B',)) mapper = lambda r: [r[0].lower()] expect_ = (('foo',), ('a',), ('b',),) assert_failonerror( input_fn=partial(rowmapmany, input_, mapper, header=('foo',)), expected_output=expect_) def test_recordmapmany(): table = (('id', 'sex', 'age', 'height', 'weight'), (1, 'male', 16, 1.45, 62.0), (2, 'female', 19, 1.34, 55.4), (3, '-', 17, 1.78, 74.4), (4, 'male', 21, 1.33)) def rowgenerator(rec): transmf = {'male': 'M', 'female': 'F'} yield [rec['id'], 'gender', transmf[rec['sex']] if rec['sex'] in transmf else rec['sex']] yield [rec['id'], 'age_months', rec['age'] * 12] yield [rec['id'], 'bmi', rec['weight'] / rec['height'] ** 2] actual = rowmapmany(table, rowgenerator, header=['subject_id', 'variable', 'value']) expect = (('subject_id', 'variable', 'value'), (1, 'gender', 'M'), (1, 'age_months', 16 * 12), (1, 'bmi', 62.0 / 1.45 ** 2), (2, 'gender', 'F'), (2, 'age_months', 19 * 12), (2, 'bmi', 55.4 / 1.34 ** 2), (3, 'gender', '-'), (3, 'age_months', 17 * 12), (3, 'bmi', 74.4 / 1.78 ** 2), (4, 'gender', 'M'), (4, 'age_months', 21 * 12)) ieq(expect, actual) ieq(expect, actual) # check can iterate twice def test_recordmapmany_headerless(): table = [] def duplicate(rec): yield rec yield rec actual = rowmapmany(table, duplicate, header=['subject_id', 'variable']) expect = [] ieq(expect, actual) ieq(expect, actual) # check can iterate twice
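# ---------------------------------------------------------------------
# Editor's addition (illustrative demo, not part of the original suite):
# a minimal end-to-end sketch contrasting fieldmap and rowmap on one tiny
# table; it relies only on the imports already present in this module.
def test_fieldmap_rowmap_equivalence_demo():
    table = (('id', 'age'),
             (1, 16),
             (2, 19))
    # fieldmap: declare each output field independently
    mappings = OrderedDict()
    mappings['id'] = 'id'
    mappings['age_months'] = 'age', lambda v: v * 12
    via_fieldmap = fieldmap(table, mappings)
    # rowmap: build each complete output row in a single function
    via_rowmap = rowmap(table, lambda row: [row[0], row[1] * 12],
                        header=['id', 'age_months'])
    expect = (('id', 'age_months'),
              (1, 192),
              (2, 228))
    ieq(expect, via_fieldmap)
    ieq(expect, via_rowmap)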
petl-1.7.15/petl/test/transform/test_reductions.py000066400000000000000000000343041457414240700223130ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division import operator from collections import OrderedDict from petl.test.helpers import ieq from petl.util import strjoin from petl.transform.reductions import rowreduce, aggregate, \ mergeduplicates, Conflict, fold def test_rowreduce(): table1 = (('foo', 'bar'), ('a', 3), ('a', 7), ('b', 2), ('b', 1), ('b', 9), ('c', 4)) def sumbar(key, rows): return [key, sum(row[1] for row in rows)] table2 = rowreduce(table1, key='foo', reducer=sumbar, header=['foo', 'barsum']) expect2 = (('foo', 'barsum'), ('a', 10), ('b', 12), ('c', 4)) ieq(expect2, table2) def test_rowreduce_fieldnameaccess(): table1 = (('foo', 'bar'), ('a', 3), ('a', 7), ('b', 2), ('b', 1), ('b', 9), ('c', 4)) def sumbar(key, records): return [key, sum([rec['bar'] for rec in records])] table2 = rowreduce(table1, key='foo', reducer=sumbar, header=['foo', 'barsum']) expect2 = (('foo', 'barsum'), ('a', 10), ('b', 12), ('c', 4)) ieq(expect2, table2) def test_rowreduce_more(): table1 = (('foo', 'bar'), ('aa', 3), ('aa', 7), ('bb', 2), ('bb', 1), ('bb', 9), ('cc', 4)) def sumbar(key, records): return [key, sum(rec['bar'] for rec in records)] table2 = rowreduce(table1, key='foo', reducer=sumbar, header=['foo', 'barsum']) expect2 = (('foo', 'barsum'), ('aa', 10), ('bb', 12), ('cc', 4)) ieq(expect2, table2) def test_rowreduce_empty(): table = (('foo', 'bar'),) expect = (('foo', 'bar'),) reducer = lambda key, rows: (key, [r[0] for r in rows]) actual = rowreduce(table, key='foo', reducer=reducer, header=('foo', 'bar')) ieq(expect, actual) def test_aggregate_simple(): table1 = (('foo', 'bar', 'baz'), ('a', 3, True), ('a', 7, False), ('b', 2, True), ('b', 2, False), ('b', 9, False), ('c', 4, True)) # simplest signature - aggregate whole rows table2 = aggregate(table1, 'foo', len) expect2 = (('foo', 'value'), ('a', 2), ('b', 3), ('c', 1)) ieq(expect2, table2) ieq(expect2, table2) # next simplest signature - aggregate single field table3 = aggregate(table1, 'foo', sum, 'bar') expect3 = (('foo', 'value'), ('a', 10), ('b', 13), ('c', 4)) ieq(expect3, table3) ieq(expect3, table3) # alternative signature for simple aggregation table4 = aggregate(table1, key=('foo', 'bar'), aggregation=list, value=('bar', 'baz')) expect4 = (('foo', 'bar', 'value'), ('a', 3, [(3, True)]), ('a', 7, [(7, False)]), ('b', 2, [(2, True), (2, False)]), ('b', 9, [(9, False)]), ('c', 4, [(4, True)])) ieq(expect4, table4) ieq(expect4, table4) table5 = aggregate(table1, 'foo', len, field='nrows') expect5 = (('foo', 'nrows'), ('a', 2), ('b', 3), ('c', 1)) ieq(expect5, table5) ieq(expect5, table5) def test_aggregate_simple_key_is_None(): table1 = (('foo', 'bar', 'baz'), ('a', 3, True), ('a', 7, False), ('b', 2, True), ('b', 2, False), ('b', 9, False), ('c', 4, True)) # simplest signature - aggregate whole rows table2 = aggregate(table1, None, len) expect2 = (('value',), (6,)) ieq(expect2, table2) ieq(expect2, table2) # next simplest signature - aggregate single field table3 = aggregate(table1, None, sum, 'bar') expect3 = (('value',), (27,)) ieq(expect3, table3) ieq(expect3, table3) # alternative signature for simple aggregation table4 = aggregate(table1, key=None, aggregation=list, value=('bar', 'baz')) expect4 = (('value',), ( [(3, True), (7, False), (2, True), (2, False), (9, False), (4, True)],),) ieq(expect4, table4) ieq(expect4, table4) table5 = aggregate(table1, None, len, field='nrows') 
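# Note added in review (illustrative): with key=None the whole table
# collapses to a single aggregate row, and field= renames the default
# 'value' output column, i.e.
#     aggregate(table1, None, len, field='nrows')  ->  (('nrows',), (6,))
# which is exactly what the expectation below asserts.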
expect5 = (('nrows',), (6,)) ieq(expect5, table5) ieq(expect5, table5) def test_aggregate_multifield(): table1 = (('foo', 'bar'), ('a', 3), ('a', 7), ('b', 2), ('b', 1), ('b', 9), ('c', 4)) # dict arg aggregators = OrderedDict() aggregators['count'] = len aggregators['minbar'] = 'bar', min aggregators['maxbar'] = 'bar', max aggregators['sumbar'] = 'bar', sum aggregators['listbar'] = 'bar', list aggregators['bars'] = 'bar', strjoin(', ') table2 = aggregate(table1, 'foo', aggregators) expect2 = (('foo', 'count', 'minbar', 'maxbar', 'sumbar', 'listbar', 'bars'), ('a', 2, 3, 7, 10, [3, 7], '3, 7'), ('b', 3, 1, 9, 12, [2, 1, 9], '2, 1, 9'), ('c', 1, 4, 4, 4, [4], '4')) ieq(expect2, table2) ieq(expect2, table2) # check can iterate twice # use suffix notation table3 = aggregate(table1, 'foo') table3['count'] = len table3['minbar'] = 'bar', min table3['maxbar'] = 'bar', max table3['sumbar'] = 'bar', sum table3['listbar'] = 'bar' # default aggregation is list table3['bars'] = 'bar', strjoin(', ') ieq(expect2, table3) # list arg aggregators = [('count', len), ('minbar', 'bar', min), ('maxbar', 'bar', max), ('sumbar', 'bar', sum), ('listbar', 'bar', list), ('bars', 'bar', strjoin(', '))] table4 = aggregate(table1, 'foo', aggregators) ieq(expect2, table4) ieq(expect2, table4) # check can iterate twice def test_aggregate_multifield_key_is_None(): table1 = (('foo', 'bar'), ('a', 3), ('a', 7), ('b', 2), ('b', 1), ('b', 9), ('c', 4)) # dict arg aggregators = OrderedDict() aggregators['count'] = len aggregators['minbar'] = 'bar', min aggregators['maxbar'] = 'bar', max aggregators['sumbar'] = 'bar', sum aggregators['listbar'] = 'bar', list aggregators['bars'] = 'bar', strjoin(', ') table2 = aggregate(table1, None, aggregators) expect2 = (('count', 'minbar', 'maxbar', 'sumbar', 'listbar', 'bars'), (6, 1, 9, 26, [3, 7, 2, 1, 9, 4], '3, 7, 2, 1, 9, 4')) ieq(expect2, table2) ieq(expect2, table2) # check can iterate twice # use suffix notation table3 = aggregate(table1, None) table3['count'] = len table3['minbar'] = 'bar', min table3['maxbar'] = 'bar', max table3['sumbar'] = 'bar', sum table3['listbar'] = 'bar' # default aggregation is list table3['bars'] = 'bar', strjoin(', ') ieq(expect2, table3) # list arg aggregators = [('count', len), ('minbar', 'bar', min), ('maxbar', 'bar', max), ('sumbar', 'bar', sum), ('listbar', 'bar', list), ('bars', 'bar', strjoin(', '))] table4 = aggregate(table1, None, aggregators) ieq(expect2, table4) ieq(expect2, table4) # check can iterate twice def test_aggregate_more(): table1 = (('foo', 'bar'), ('aa', 3), ('aa', 7), ('bb', 2), ('bb', 1), ('bb', 9), ('cc', 4), ('dd', 3)) aggregators = OrderedDict() aggregators['minbar'] = 'bar', min aggregators['maxbar'] = 'bar', max aggregators['sumbar'] = 'bar', sum aggregators['listbar'] = 'bar' # default aggregation is list aggregators['bars'] = 'bar', strjoin(', ') table2 = aggregate(table1, 'foo', aggregators) expect2 = (('foo', 'minbar', 'maxbar', 'sumbar', 'listbar', 'bars'), ('aa', 3, 7, 10, [3, 7], '3, 7'), ('bb', 1, 9, 12, [2, 1, 9], '2, 1, 9'), ('cc', 4, 4, 4, [4], '4'), ('dd', 3, 3, 3, [3], '3')) ieq(expect2, table2) ieq(expect2, table2) # check can iterate twice table3 = aggregate(table1, 'foo') table3['minbar'] = 'bar', min table3['maxbar'] = 'bar', max table3['sumbar'] = 'bar', sum table3['listbar'] = 'bar' # default aggregation is list table3['bars'] = 'bar', strjoin(', ') ieq(expect2, table3) def test_aggregate_more_key_is_None(): table1 = (('foo', 'bar'), ('aa', 3), ('aa', 7), ('bb', 2), ('bb', 1), ('bb', 9), ('cc', 4), 
('dd', 3)) aggregators = OrderedDict() aggregators['minbar'] = 'bar', min aggregators['maxbar'] = 'bar', max aggregators['sumbar'] = 'bar', sum aggregators['listbar'] = 'bar' # default aggregation is list aggregators['bars'] = 'bar', strjoin(', ') table2 = aggregate(table1, None, aggregators) expect2 = (('minbar', 'maxbar', 'sumbar', 'listbar', 'bars'), (1, 9, 29, [3, 7, 2, 1, 9, 4, 3], '3, 7, 2, 1, 9, 4, 3')) ieq(expect2, table2) ieq(expect2, table2) # check can iterate twice table3 = aggregate(table1, None) table3['minbar'] = 'bar', min table3['maxbar'] = 'bar', max table3['sumbar'] = 'bar', sum table3['listbar'] = 'bar' # default aggregation is list table3['bars'] = 'bar', strjoin(', ') ieq(expect2, table3) def test_aggregate_multiple_source_fields(): table = (('foo', 'bar', 'baz'), ('a', 3, True), ('a', 7, False), ('b', 2, True), ('b', 2, False), ('b', 9, False), ('c', 4, True)) expect = (('foo', 'bar', 'value'), ('a', 3, [(3, True)]), ('a', 7, [(7, False)]), ('b', 2, [(2, True), (2, False)]), ('b', 9, [(9, False)]), ('c', 4, [(4, True)])) actual = aggregate(table, ('foo', 'bar'), list, ('bar', 'baz')) ieq(expect, actual) ieq(expect, actual) actual = aggregate(table, key=('foo', 'bar'), aggregation=list, value=('bar', 'baz')) ieq(expect, actual) ieq(expect, actual) actual = aggregate(table, key=('foo', 'bar')) actual['value'] = ('bar', 'baz'), list ieq(expect, actual) ieq(expect, actual) def test_aggregate_multiple_source_fields_key_is_None(): table = (('foo', 'bar', 'baz'), ('a', 3, True), ('a', 7, False), ('b', 2, True), ('b', 2, False), ('b', 9, False), ('c', 4, True)) expect = (('value',), ( [(3, True), (7, False), (2, True), (2, False), (9, False), (4, True)],),) actual = aggregate(table, None, list, ('bar', 'baz')) ieq(expect, actual) ieq(expect, actual) actual = aggregate(table, key=None, aggregation=list, value=('bar', 'baz')) ieq(expect, actual) ieq(expect, actual) actual = aggregate(table, key=None) actual['value'] = ('bar', 'baz'), list ieq(expect, actual) ieq(expect, actual) def test_aggregate_empty(): table = (('foo', 'bar'),) aggregators = OrderedDict() aggregators['minbar'] = 'bar', min aggregators['maxbar'] = 'bar', max aggregators['sumbar'] = 'bar', sum actual = aggregate(table, 'foo', aggregators) expect = (('foo', 'minbar', 'maxbar', 'sumbar'),) ieq(expect, actual) def test_aggregate_empty_key_is_None(): table = (('foo', 'bar'),) aggregators = OrderedDict() aggregators['minbar'] = 'bar', min aggregators['maxbar'] = 'bar', max aggregators['sumbar'] = 'bar', sum actual = aggregate(table, None, aggregators) expect = (('minbar', 'maxbar', 'sumbar'),) ieq(expect, actual) def test_mergeduplicates(): table = (('foo', 'bar', 'baz'), ('A', 1, 2), ('B', '2', None), ('D', 'xyz', 9.4), ('B', None, u'7.8', True), ('E', None, 42.), ('D', 'xyz', 12.3), ('A', 2, None)) # value overrides missing result = mergeduplicates(table, 'foo', missing=None) expectation = (('foo', 'bar', 'baz'), ('A', Conflict([1, 2]), 2), ('B', '2', u'7.8'), ('D', 'xyz', Conflict([9.4, 12.3])), ('E', None, 42.)) ieq(expectation, result) def test_mergeduplicates_empty(): table = (('foo', 'bar'),) expect = (('foo', 'bar'),) actual = mergeduplicates(table, key='foo') ieq(expect, actual) def test_mergeduplicates_shortrows(): table = [['foo', 'bar', 'baz'], ['a', 1, True], ['b', 2, True], ['b', 3]] actual = mergeduplicates(table, 'foo') expect = [('foo', 'bar', 'baz'), ('a', 1, True), ('b', Conflict([2, 3]), True)] ieq(expect, actual) def test_mergeduplicates_compoundkey(): table = [['foo', 'bar', 'baz'], ['a', 1, 
True], ['a', 1, True], ['a', 2, False], ['a', 2, None], ['c', 3, True], ['c', 3, False], ] actual = mergeduplicates(table, key=('foo', 'bar')) expect = [('foo', 'bar', 'baz'), ('a', 1, True), ('a', 2, False), ('c', 3, Conflict([True, False]))] ieq(expect, actual) def test_fold(): t1 = (('id', 'count'), (1, 3), (1, 5), (2, 4), (2, 8)) t2 = fold(t1, 'id', operator.add, 'count', presorted=True) expect = (('key', 'value'), (1, 8), (2, 12)) ieq(expect, t2) ieq(expect, t2) petl-1.7.15/petl/test/transform/test_regex.py000066400000000000000000000220651457414240700212470ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division import pytest from petl.compat import next from petl.errors import ArgumentError from petl.test.helpers import ieq, eq_ from petl.transform.regex import capture, split, search, searchcomplement, splitdown from petl.transform.basics import TransformError def test_capture(): table = (('id', 'variable', 'value'), ('1', 'A1', '12'), ('2', 'A2', '15'), ('3', 'B1', '18'), ('4', 'C12', '19')) expectation = (('id', 'value', 'treat', 'time'), ('1', '12', 'A', '1'), ('2', '15', 'A', '2'), ('3', '18', 'B', '1'), ('4', '19', 'C', '12')) result = capture(table, 'variable', '(\\w)(\\d+)', ('treat', 'time')) ieq(expectation, result) result = capture(table, 'variable', '(\\w)(\\d+)', ('treat', 'time'), include_original=False) ieq(expectation, result) # what about including the original field? expectation = (('id', 'variable', 'value', 'treat', 'time'), ('1', 'A1', '12', 'A', '1'), ('2', 'A2', '15', 'A', '2'), ('3', 'B1', '18', 'B', '1'), ('4', 'C12', '19', 'C', '12')) result = capture(table, 'variable', '(\\w)(\\d+)', ('treat', 'time'), include_original=True) ieq(expectation, result) # what about if number of captured groups is different from new fields? 
expectation = (('id', 'value'), ('1', '12', 'A', '1'), ('2', '15', 'A', '2'), ('3', '18', 'B', '1'), ('4', '19', 'C', '12')) result = capture(table, 'variable', '(\\w)(\\d+)') ieq(expectation, result) def test_capture_empty(): table = (('foo', 'bar'),) expect = (('foo', 'baz', 'qux'),) actual = capture(table, 'bar', r'(\w)(\d)', ('baz', 'qux')) ieq(expect, actual) def test_capture_headerless(): table = [] with pytest.raises(ArgumentError): for i in capture(table, 'bar', r'(\w)(\d)', ('baz', 'qux')): pass def test_capture_nonmatching(): table = (('id', 'variable', 'value'), ('1', 'A1', '12'), ('2', 'A2', '15'), ('3', 'B1', '18'), ('4', 'C12', '19')) expectation = (('id', 'value', 'treat', 'time'), ('1', '12', 'A', '1'), ('2', '15', 'A', '2'), ('3', '18', 'B', '1')) # default behaviour, raise exception result = capture(table, 'variable', r'([A-B])(\d+)', ('treat', 'time')) it = iter(result) eq_(expectation[0], next(it)) # header eq_(expectation[1], next(it)) eq_(expectation[2], next(it)) eq_(expectation[3], next(it)) try: next(it) # doesn't match except TransformError: pass # expected else: assert False, 'expected exception' # explicit fill result = capture(table, 'variable', r'([A-B])(\d+)', newfields=('treat', 'time'), fill=['', 0]) it = iter(result) eq_(expectation[0], next(it)) # header eq_(expectation[1], next(it)) eq_(expectation[2], next(it)) eq_(expectation[3], next(it)) eq_(('4', '19', '', 0), next(it)) def test_split(): table = (('id', 'variable', 'value'), ('1', 'parad1', '12'), ('2', 'parad2', '15'), ('3', 'tempd1', '18'), ('4', 'tempd2', '19')) expectation = (('id', 'value', 'variable', 'day'), ('1', '12', 'para', '1'), ('2', '15', 'para', '2'), ('3', '18', 'temp', '1'), ('4', '19', 'temp', '2')) result = split(table, 'variable', 'd', ('variable', 'day')) ieq(expectation, result) ieq(expectation, result) # proper regex result = split(table, 'variable', '[Dd]', ('variable', 'day')) ieq(expectation, result) # integer field reference result = split(table, 1, 'd', ('variable', 'day')) ieq(expectation, result) expectation = (('id', 'variable', 'value', 'variable', 'day'), ('1', 'parad1', '12', 'para', '1'), ('2', 'parad2', '15', 'para', '2'), ('3', 'tempd1', '18', 'temp', '1'), ('4', 'tempd2', '19', 'temp', '2')) result = split(table, 'variable', 'd', ('variable', 'day'), include_original=True) ieq(expectation, result) # what about if no new fields? 
expectation = (('id', 'value'), ('1', '12', 'para', '1'), ('2', '15', 'para', '2'), ('3', '18', 'temp', '1'), ('4', '19', 'temp', '2')) result = split(table, 'variable', 'd') ieq(expectation, result) def test_split_empty(): table = (('foo', 'bar'),) expect = (('foo', 'baz', 'qux'),) actual = split(table, 'bar', 'd', ('baz', 'qux')) ieq(expect, actual) def test_split_headerless(): table = [] with pytest.raises(ArgumentError): for i in split(table, 'bar', 'd', ('baz', 'qux')): pass def test_search(): table1 = (('foo', 'bar', 'baz'), ('orange', 12, 'oranges are nice fruit'), ('mango', 42, 'I like them'), ('banana', 74, 'lovely too'), ('cucumber', 41, 'better than mango')) # search any field table2 = search(table1, '.g.') expect2 = (('foo', 'bar', 'baz'), ('orange', 12, 'oranges are nice fruit'), ('mango', 42, 'I like them'), ('cucumber', 41, 'better than mango')) ieq(expect2, table2) ieq(expect2, table2) # search a specific field table3 = search(table1, 'foo', '.g.') expect3 = (('foo', 'bar', 'baz'), ('orange', 12, 'oranges are nice fruit'), ('mango', 42, 'I like them')) ieq(expect3, table3) ieq(expect3, table3) def test_search_2(): # test ported from selectre table = (('foo', 'bar', 'baz'), ('aa', 4, 9.3), ('aaa', 2, 88.2), ('b', 1, 23.3), ('ccc', 8, 42.0), ('bb', 7, 100.9), ('c', 2)) actual = search(table, 'foo', '[ab]{2}') expect = (('foo', 'bar', 'baz'), ('aa', 4, 9.3), ('aaa', 2, 88.2), ('bb', 7, 100.9)) ieq(expect, actual) ieq(expect, actual) def test_search_headerless(): table = [] actual = search(table, 'foo', '[ab]{2}') expect = [] ieq(expect, actual) ieq(expect, actual) def test_searchcomplement(): table1 = (('foo', 'bar', 'baz'), ('orange', 12, 'oranges are nice fruit'), ('mango', 42, 'I like them'), ('banana', 74, 'lovely too'), ('cucumber', 41, 'better than mango')) # search any field table2 = searchcomplement(table1, '.g.') expect2 = (('foo', 'bar', 'baz'), ('banana', 74, 'lovely too')) ieq(expect2, table2) ieq(expect2, table2) # search a specific field table3 = searchcomplement(table1, 'foo', '.g.') expect3 = (('foo', 'bar', 'baz'), ('banana', 74, 'lovely too'), ('cucumber', 41, 'better than mango')) ieq(expect3, table3) ieq(expect3, table3) # search any field, using complement table2 = search(table1, '.g.', complement=True) expect2 = (('foo', 'bar', 'baz'), ('banana', 74, 'lovely too')) ieq(expect2, table2) ieq(expect2, table2) # search a specific field, using complement table3 = search(table1, 'foo', '.g.', complement=True) expect3 = (('foo', 'bar', 'baz'), ('banana', 74, 'lovely too'), ('cucumber', 41, 'better than mango')) ieq(expect3, table3) ieq(expect3, table3) def test_search_unicode(): tbl = ((u'name', u'id'), (u'Արամ Խաչատրյան', 1), (u'Johann Strauß', 2), (u'Вагиф Сәмәдоғлу', 3), (u'章子怡', 4)) actual = search(tbl, u'.Խա.') expect = ((u'name', u'id'), (u'Արամ Խաչատրյան', 1)) ieq(expect, actual) ieq(expect, actual) def test_splitdown(): tbl = ((u'name', u'roles'), (u'Jane Doe', u'president,engineer,tailor,lawyer'), (u'John Doe', u'rocket scientist,optometrist,chef,knight,sailor')) actual = splitdown(tbl, 'roles', ',') expect = ((u'name', u'roles'), (u'Jane Doe', u'president'), (u'Jane Doe', u'engineer'), (u'Jane Doe', u'tailor'), (u'Jane Doe', u'lawyer'), (u'John Doe', u'rocket scientist'), (u'John Doe', u'optometrist'), (u'John Doe', u'chef'), (u'John Doe', u'knight'), (u'John Doe', u'sailor')) ieq(expect,
actual) ieq(expect, actual) # TODO test sub() petl-1.7.15/petl/test/transform/test_reshape.py000066400000000000000000000321221457414240700215570ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division from datetime import datetime import pytest from petl.errors import FieldSelectionError from petl.test.helpers import ieq from petl.transform.reshape import melt, recast, transpose, pivot, flatten, \ unflatten from petl.transform.regex import split, capture def test_melt_1(): table = (('id', 'gender', 'age'), (1, 'F', 12), (2, 'M', 17), (3, 'M', 16)) expectation = (('id', 'variable', 'value'), (1, 'gender', 'F'), (1, 'age', 12), (2, 'gender', 'M'), (2, 'age', 17), (3, 'gender', 'M'), (3, 'age', 16)) result = melt(table, key='id') ieq(expectation, result) # use field index as key result = melt(table, key=0) ieq(expectation, result) result = melt(table, key='id', variablefield='variable', valuefield='value') ieq(expectation, result) def test_melt_2(): table = (('id', 'time', 'height', 'weight'), (1, 11, 66.4, 12.2), (2, 16, 53.2, 17.3), (3, 12, 34.5, 9.4)) expectation = (('id', 'time', 'variable', 'value'), (1, 11, 'height', 66.4), (1, 11, 'weight', 12.2), (2, 16, 'height', 53.2), (2, 16, 'weight', 17.3), (3, 12, 'height', 34.5), (3, 12, 'weight', 9.4)) result = melt(table, key=('id', 'time')) ieq(expectation, result) expectation = (('id', 'time', 'variable', 'value'), (1, 11, 'height', 66.4), (2, 16, 'height', 53.2), (3, 12, 'height', 34.5)) result = melt(table, key=('id', 'time'), variables='height') print(result) ieq(expectation, result) def test_melt_empty(): table = (('foo', 'bar', 'baz'),) expect = (('foo', 'variable', 'value'),) actual = melt(table, key='foo') ieq(expect, actual) def test_melt_headerless(): table = [] expect = [] actual = melt(table, key='foo') ieq(expect, actual) def test_melt_1_shortrow(): table = (('id', 'gender', 'age'), (1, 'F', 12), (2, 'M', 17), (3, 'M'), (4,)) expectation = (('id', 'variable', 'value'), (1, 'gender', 'F'), (1, 'age', 12), (2, 'gender', 'M'), (2, 'age', 17), (3, 'gender', 'M')) result = melt(table, key='id') ieq(expectation, result) result = melt(table, key='id', variablefield='variable', valuefield='value') ieq(expectation, result) def test_melt_2_shortrow(): table = (('id', 'time', 'height', 'weight'), (1, 11, 66.4, 12.2), (2, 16, 53.2, 17.3), (3, 12, 34.5), (4, 14)) expectation = (('id', 'time', 'variable', 'value'), (1, 11, 'height', 66.4), (1, 11, 'weight', 12.2), (2, 16, 'height', 53.2), (2, 16, 'weight', 17.3), (3, 12, 'height', 34.5)) result = melt(table, key=('id', 'time')) ieq(expectation, result) expectation = (('id', 'time', 'variable', 'value'), (1, 11, 'height', 66.4), (2, 16, 'height', 53.2), (3, 12, 'height', 34.5)) result = melt(table, key=('id', 'time'), variables='height') ieq(expectation, result) def test_recast_1(): table = (('id', 'variable', 'value'), (3, 'age', 16), (1, 'gender', 'F'), (2, 'gender', 'M'), (2, 'age', 17), (1, 'age', 12), (3, 'gender', 'M')) expectation = (('id', 'age', 'gender'), (1, 12, 'F'), (2, 17, 'M'), (3, 16, 'M')) # by default lift 'variable' field, hold everything else result = recast(table) ieq(expectation, result) result = recast(table, variablefield='variable') ieq(expectation, result) result = recast(table, key='id', variablefield='variable') ieq(expectation, result) result = recast(table, key='id', variablefield='variable', valuefield='value') ieq(expectation, result) def test_recast_2(): table = (('id', 'variable', 'value'), (3, 'age', 16), (1, 'gender', 
'F'), (2, 'gender', 'M'), (2, 'age', 17), (1, 'age', 12), (3, 'gender', 'M')) expectation = (('id', 'gender'), (1, 'F'), (2, 'M'), (3, 'M')) # can manually pick which variables you want to recast as fields # TODO this is awkward result = recast(table, key='id', variablefield={'variable': ['gender']}) ieq(expectation, result) def test_recast_3(): table = (('id', 'time', 'variable', 'value'), (1, 11, 'weight', 66.4), (1, 14, 'weight', 55.2), (2, 12, 'weight', 53.2), (2, 16, 'weight', 43.3), (3, 12, 'weight', 34.5), (3, 17, 'weight', 49.4)) expectation = (('id', 'time', 'weight'), (1, 11, 66.4), (1, 14, 55.2), (2, 12, 53.2), (2, 16, 43.3), (3, 12, 34.5), (3, 17, 49.4)) result = recast(table) ieq(expectation, result) # in the absence of an aggregation function, list all values expectation = (('id', 'weight'), (1, [66.4, 55.2]), (2, [53.2, 43.3]), (3, [34.5, 49.4])) result = recast(table, key='id') ieq(expectation, result) # max aggregation expectation = (('id', 'weight'), (1, 66.4), (2, 53.2), (3, 49.4)) result = recast(table, key='id', reducers={'weight': max}) ieq(expectation, result) # min aggregation expectation = (('id', 'weight'), (1, 55.2), (2, 43.3), (3, 34.5)) result = recast(table, key='id', reducers={'weight': min}) ieq(expectation, result) # mean aggregation expectation = (('id', 'weight'), (1, 60.80), (2, 48.25), (3, 41.95)) def mean(values): return float(sum(values)) / len(values) def meanf(precision): def f(values): v = mean(values) v = round(v, precision) return v return f result = recast(table, key='id', reducers={'weight': meanf(precision=2)}) ieq(expectation, result) def test_recast4(): # deal with missing data table = (('id', 'variable', 'value'), (1, 'gender', 'F'), (2, 'age', 17), (1, 'age', 12), (3, 'gender', 'M')) result = recast(table, key='id') expect = (('id', 'age', 'gender'), (1, 12, 'F'), (2, 17, None), (3, None, 'M')) ieq(expect, result) def test_recast_empty(): table = (('foo', 'variable', 'value'),) expect = (('foo',),) actual = recast(table) ieq(expect, actual) def test_recast_headerless(): table = [] expect = [] actual = recast(table) ieq(expect, actual) def test_recast_date(): dt = datetime.now().replace table = (('id', 'variable', 'value'), (dt(hour=3), 'age', 16), (dt(hour=1), 'gender', 'F'), (dt(hour=2), 'gender', 'M'), (dt(hour=2), 'age', 17), (dt(hour=1), 'age', 12), (dt(hour=3), 'gender', 'M')) expectation = (('id', 'age', 'gender'), (dt(hour=1), 12, 'F'), (dt(hour=2), 17, 'M'), (dt(hour=3), 16, 'M')) # by default lift 'variable' field, hold everything else result = recast(table) ieq(expectation, result) result = recast(table, variablefield='variable') ieq(expectation, result) result = recast(table, key='id', variablefield='variable') ieq(expectation, result) result = recast(table, key='id', variablefield='variable', valuefield='value') ieq(expectation, result) def test_melt_and_capture(): table = (('id', 'parad0', 'parad1', 'parad2'), ('1', '12', '34', '56'), ('2', '23', '45', '67')) expectation = (('id', 'parasitaemia', 'day'), ('1', '12', '0'), ('1', '34', '1'), ('1', '56', '2'), ('2', '23', '0'), ('2', '45', '1'), ('2', '67', '2')) step1 = melt(table, key='id', valuefield='parasitaemia') step2 = capture(step1, 'variable', 'parad(\\d+)', ('day',)) ieq(expectation, step2) def test_melt_and_split(): table = (('id', 'parad0', 'parad1', 'parad2', 'tempd0', 'tempd1', 'tempd2'), ('1', '12', '34', '56', '37.2', '37.4', '37.9'), ('2', '23', '45', '67', '37.1', '37.8', '36.9')) expectation = (('id', 'value', 'variable', 'day'), ('1', '12', 'para', '0'), 
('1', '34', 'para', '1'), ('1', '56', 'para', '2'), ('1', '37.2', 'temp', '0'), ('1', '37.4', 'temp', '1'), ('1', '37.9', 'temp', '2'), ('2', '23', 'para', '0'), ('2', '45', 'para', '1'), ('2', '67', 'para', '2'), ('2', '37.1', 'temp', '0'), ('2', '37.8', 'temp', '1'), ('2', '36.9', 'temp', '2')) step1 = melt(table, key='id') step2 = split(step1, 'variable', 'd', ('variable', 'day')) ieq(expectation, step2) def test_transpose(): table1 = (('id', 'colour'), (1, 'blue'), (2, 'red'), (3, 'purple'), (5, 'yellow'), (7, 'orange')) table2 = transpose(table1) expect2 = (('id', 1, 2, 3, 5, 7), ('colour', 'blue', 'red', 'purple', 'yellow', 'orange')) ieq(expect2, table2) ieq(expect2, table2) def test_transpose_empty(): table1 = (('id', 'colour'),) table2 = transpose(table1) expect2 = (('id',), ('colour',)) ieq(expect2, table2) def test_pivot(): table1 = (('region', 'gender', 'style', 'units'), ('east', 'boy', 'tee', 12), ('east', 'boy', 'golf', 14), ('east', 'boy', 'fancy', 7), ('east', 'girl', 'tee', 3), ('east', 'girl', 'golf', 8), ('east', 'girl', 'fancy', 18), ('west', 'boy', 'tee', 12), ('west', 'boy', 'golf', 15), ('west', 'boy', 'fancy', 8), ('west', 'girl', 'tee', 6), ('west', 'girl', 'golf', 16), ('west', 'girl', 'fancy', 1)) table2 = pivot(table1, 'region', 'gender', 'units', sum) expect2 = (('region', 'boy', 'girl'), ('east', 33, 29), ('west', 35, 23)) ieq(expect2, table2) ieq(expect2, table2) def test_pivot_empty(): table1 = (('region', 'gender', 'style', 'units'),) table2 = pivot(table1, 'region', 'gender', 'units', sum) expect2 = (('region',),) ieq(expect2, table2) def test_pivot_headerless(): table1 = [] with pytest.raises(FieldSelectionError): for i in pivot(table1, 'region', 'gender', 'units', sum): pass def test_flatten(): table1 = (('foo', 'bar', 'baz'), ('A', 1, True), ('C', 7, False), ('B', 2, False), ('C', 9, True)) expect1 = ('A', 1, True, 'C', 7, False, 'B', 2, False, 'C', 9, True) actual1 = flatten(table1) ieq(expect1, actual1) ieq(expect1, actual1) def test_flatten_empty(): table1 = (('foo', 'bar', 'baz'),) expect1 = [] actual1 = flatten(table1) ieq(expect1, actual1) def test_flatten_headerless(): table1 = [] expect1 = [] actual1 = flatten(table1) ieq(expect1, actual1) def test_unflatten(): table1 = (('lines',), ('A',), (1,), (True,), ('C',), (7,), (False,), ('B',), (2,), (False,), ('C',), (9,)) expect1 = (('f0', 'f1', 'f2'), ('A', 1, True), ('C', 7, False), ('B', 2, False), ('C', 9, None)) actual1 = unflatten(table1, 'lines', 3) ieq(expect1, actual1) ieq(expect1, actual1) def test_unflatten_2(): inpt = ('A', 1, True, 'C', 7, False, 'B', 2, False, 'C', 9) expect1 = (('f0', 'f1', 'f2'), ('A', 1, True), ('C', 7, False), ('B', 2, False), ('C', 9, None)) actual1 = unflatten(inpt, 3) ieq(expect1, actual1) ieq(expect1, actual1) def test_unflatten_empty(): table1 = (('lines',),) expect1 = (('f0', 'f1', 'f2'),) actual1 = unflatten(table1, 'lines', 3) ieq(expect1, actual1) petl-1.7.15/petl/test/transform/test_selects.py000066400000000000000000000221071457414240700215740ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division import pytest from petl.errors import FieldSelectionError from petl.test.helpers import ieq, eq_ from petl.comparison import Comparable from petl.transform.selects import select, selectin, selectcontains, \ rowlenselect, selectusingcontext, facet, selectgt, selectlt def test_select(): table = (('foo', 'bar', 'baz'), ('a', 4, 9.3), ('a', 2, 88.2), ('b', 1, 23.3), ('c', 8, 42.0), ('d', 7, 100.9), ('c', 2)) actual = 
select(table, lambda rec: rec[0] == 'a') expect = (('foo', 'bar', 'baz'), ('a', 4, 9.3), ('a', 2, 88.2)) ieq(expect, actual) ieq(expect, actual) # check can iterate twice table = (('foo', 'bar', 'baz'), ('a', 4, 9.3), ('a', 2, 88.2), ('b', 1, 23.3), ('c', 8, 42.0), ('d', 7, 100.9), ('c', 2)) actual = select(table, lambda rec: rec['foo'] == 'a') expect = (('foo', 'bar', 'baz'), ('a', 4, 9.3), ('a', 2, 88.2)) ieq(expect, actual) ieq(expect, actual) # check can iterate twice table = (('foo', 'bar', 'baz'), ('a', 4, 9.3), ('a', 2, 88.2), ('b', 1, 23.3), ('c', 8, 42.0), ('d', 7, 100.9), ('c', 2)) actual = select(table, lambda rec: rec.foo == 'a') expect = (('foo', 'bar', 'baz'), ('a', 4, 9.3), ('a', 2, 88.2)) ieq(expect, actual) ieq(expect, actual) # check can iterate twice # check select complement actual = select(table, lambda rec: rec['foo'] == 'a', complement=True) expect = (('foo', 'bar', 'baz'), ('b', 1, 23.3), ('c', 8, 42.0), ('d', 7, 100.9), ('c', 2)) ieq(expect, actual) ieq(expect, actual) # check can iterate twice actual = select(table, lambda rec: rec['foo'] == 'a' and rec['bar'] > 3) expect = (('foo', 'bar', 'baz'), ('a', 4, 9.3)) ieq(expect, actual) actual = select(table, "{foo} == 'a'") expect = (('foo', 'bar', 'baz'), ('a', 4, 9.3), ('a', 2, 88.2)) ieq(expect, actual) actual = select(table, "{foo} == 'a' and {bar} > 3") expect = (('foo', 'bar', 'baz'), ('a', 4, 9.3)) ieq(expect, actual) # check error handling on short rows actual = select(table, lambda rec: Comparable(rec['baz']) > 88.1) expect = (('foo', 'bar', 'baz'), ('a', 2, 88.2), ('d', 7, 100.9)) ieq(expect, actual) # check single field tests actual = select(table, 'foo', lambda v: v == 'a') expect = (('foo', 'bar', 'baz'), ('a', 4, 9.3), ('a', 2, 88.2)) ieq(expect, actual) ieq(expect, actual) # check can iterate twice # check select complement actual = select(table, 'foo', lambda v: v == 'a', complement=True) expect = (('foo', 'bar', 'baz'), ('b', 1, 23.3), ('c', 8, 42.0), ('d', 7, 100.9), ('c', 2)) ieq(expect, actual) ieq(expect, actual) # check can iterate twice def test_select_empty(): table = (('foo', 'bar'),) expect = (('foo', 'bar'),) actual = select(table, lambda r: r['foo'] == r['bar']) ieq(expect, actual) def test_rowselect_headerless(): table = [] expect = [] actual = select(table, 'True') ieq(expect, actual) def test_fieldselect_headerless(): table = [] with pytest.raises(FieldSelectionError): for i in select(table, 'foo', lambda v: v == 'a'): pass def test_select_falsey(): table = (('foo',), ([],), ('',)) expect = (('foo',),) actual = select(table, '{foo}') ieq(expect, actual) def test_selectgt(): table = (('foo', 'bar', 'baz'), ('a', 4, 9.3), ('a', 2, 88.2), ('b', 1, None), ('c', 8, 42.0), ('d', 7, 100.9), ('c', 2)) actual = selectgt(table, 'baz', 50) expect = (('foo', 'bar', 'baz'), ('a', 2, 88.2), ('d', 7, 100.9)) ieq(expect, actual) ieq(expect, actual) def test_selectlt(): table = (('foo', 'bar', 'baz'), ('a', 4, 9.3), ('a', 2, 88.2), ('b', 1, None), ('c', 8, 42.0), ('d', 7, 100.9), ('c', 2)) actual = selectlt(table, 'baz', 50) expect = (('foo', 'bar', 'baz'), ('a', 4, 9.3), ('b', 1, None), ('c', 8, 42.0), ('c', 2)) ieq(expect, actual) ieq(expect, actual) def test_selectin(): table = (('foo', 'bar', 'baz'), ('a', 4, 9.3), ('a', 2, 88.2), ('b', 1, 23.3), ('c', 8, 42.0), ('d', 7, 100.9), ('c', 2)) actual = selectin(table, 'foo', ['a', 'x', 'y']) expect = (('foo', 'bar', 'baz'), ('a', 4, 9.3), ('a', 2, 88.2)) ieq(expect, actual) ieq(expect, actual) # check can iterate twice def test_selectcontains(): table = 
(('foo', 'bar', 'baz'), ('aaa', 4, 9.3), ('aa', 2, 88.2), ('bab', 1, 23.3), ('c', 8, 42.0), ('d', 7, 100.9), ('c', 2)) actual = selectcontains(table, 'foo', 'a') expect = (('foo', 'bar', 'baz'), ('aaa', 4, 9.3), ('aa', 2, 88.2), ('bab', 1, 23.3)) ieq(expect, actual) ieq(expect, actual) # check can iterate twice def test_rowselect(): table = (('foo', 'bar', 'baz'), ('a', 4, 9.3), ('a', 2, 88.2), ('b', 1, 23.3), ('c', 8, 42.0), ('d', 7, 100.9), ('c', 2)) actual = select(table, lambda row: row[0] == 'a') expect = (('foo', 'bar', 'baz'), ('a', 4, 9.3), ('a', 2, 88.2)) ieq(expect, actual) ieq(expect, actual) # check can iterate twice def test_rowlenselect(): table = (('foo', 'bar', 'baz'), ('a', 4, 9.3), ('a', 2, 88.2), ('b', 1, 23.3), ('c', 8, 42.0), ('d', 7, 100.9), ('c', 2)) actual = rowlenselect(table, 3) expect = (('foo', 'bar', 'baz'), ('a', 4, 9.3), ('a', 2, 88.2), ('b', 1, 23.3), ('c', 8, 42.0), ('d', 7, 100.9)) ieq(expect, actual) ieq(expect, actual) # check can iterate twice def test_recordselect(): table = (('foo', 'bar', 'baz'), ('a', 4, 9.3), ('a', 2, 88.2), ('b', 1, 23.3), ('c', 8, 42.0), ('d', 7, 100.9), ('c', 2)) actual = select(table, lambda rec: rec['foo'] == 'a') expect = (('foo', 'bar', 'baz'), ('a', 4, 9.3), ('a', 2, 88.2)) ieq(expect, actual) ieq(expect, actual) # check can iterate twice def test_selectusingcontext(): table1 = (('foo', 'bar'), ('A', 1), ('B', 4), ('C', 5), ('D', 9)) expect = (('foo', 'bar'), ('B', 4), ('C', 5)) def query(prv, cur, nxt): return ((prv is not None and (cur.bar - prv.bar) < 2) or (nxt is not None and (nxt.bar - cur.bar) < 2)) actual = selectusingcontext(table1, query) ieq(expect, actual) ieq(expect, actual) def test_facet(): table = (('foo', 'bar', 'baz'), ('a', 4, 9.3), ('a', 2, 88.2), ('b', 1, 23.3), ('c', 8, 42.0), ('d', 7, 100.9), ('c', 2)) fct = facet(table, 'foo') assert set(fct.keys()) == {'a', 'b', 'c', 'd'} expect_fcta = (('foo', 'bar', 'baz'), ('a', 4, 9.3), ('a', 2, 88.2)) ieq(fct['a'], expect_fcta) ieq(fct['a'], expect_fcta) # check can iterate twice expect_fctc = (('foo', 'bar', 'baz'), ('c', 8, 42.0), ('c', 2)) ieq(fct['c'], expect_fctc) ieq(fct['c'], expect_fctc) # check can iterate twice def test_facet_2(): table = (('foo', 'bar', 'baz'), ('aa', 4, 9.3), ('aa', 2, 88.2), ('bb', 1, 23.3), ('cc', 8, 42.0), ('dd', 7, 100.9), ('cc', 2)) fct = facet(table, 'foo') assert set(fct.keys()) == {'aa', 'bb', 'cc', 'dd'} expect_fcta = (('foo', 'bar', 'baz'), ('aa', 4, 9.3), ('aa', 2, 88.2)) ieq(fct['aa'], expect_fcta) ieq(fct['aa'], expect_fcta) # check can iterate twice expect_fctc = (('foo', 'bar', 'baz'), ('cc', 8, 42.0), ('cc', 2)) ieq(fct['cc'], expect_fctc) ieq(fct['cc'], expect_fctc) # check can iterate twice def test_facet_empty(): table = (('foo', 'bar'),) actual = facet(table, 'foo') eq_(list(), list(actual.keys())) petl-1.7.15/petl/test/transform/test_setops.py000066400000000000000000000235671457414240700214620ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division from datetime import datetime from petl.test.helpers import ieq from petl.transform.setops import complement, intersection, diff, \ recordcomplement, recorddiff, hashcomplement, hashintersection def _test_complement_1(complement_impl): table1 = (('foo', 'bar'), ('A', 1), ('B', 2), ('C', 7)) table2 = (('foo', 'bar'), ('A', 9), ('B', 2), ('B', 3)) expectation = (('foo', 'bar'), ('A', 1), ('C', 7)) result = complement_impl(table1, table2) ieq(expectation, result) def _test_complement_2(complement_impl): tablea = (('foo', 'bar', 
def _test_complement_1(complement_impl):
    table1 = (('foo', 'bar'), ('A', 1), ('B', 2), ('C', 7))
    table2 = (('foo', 'bar'), ('A', 9), ('B', 2), ('B', 3))
    expectation = (('foo', 'bar'), ('A', 1), ('C', 7))
    result = complement_impl(table1, table2)
    ieq(expectation, result)


def _test_complement_2(complement_impl):
    tablea = (('foo', 'bar', 'baz'), ('A', 1, True), ('C', 7, False),
              ('B', 2, False), ('C', 9, True))
    tableb = (('x', 'y', 'z'), ('B', 2, False), ('A', 9, False),
              ('B', 3, True), ('C', 9, True))
    aminusb = (('foo', 'bar', 'baz'), ('A', 1, True), ('C', 7, False))
    result = complement_impl(tablea, tableb)
    ieq(aminusb, result)
    bminusa = (('x', 'y', 'z'), ('A', 9, False), ('B', 3, True))
    result = complement_impl(tableb, tablea)
    ieq(bminusa, result)


def _test_complement_3(complement_impl):
    # make sure we deal with empty tables
    table1 = (('foo', 'bar'), ('A', 1), ('B', 2))
    table2 = (('foo', 'bar'),)
    expectation = (('foo', 'bar'), ('A', 1), ('B', 2))
    result = complement_impl(table1, table2)
    ieq(expectation, result)
    ieq(expectation, result)
    expectation = (('foo', 'bar'),)
    result = complement_impl(table2, table1)
    ieq(expectation, result)
    ieq(expectation, result)


def _test_complement_4(complement_impl):
    # test behaviour with duplicate rows
    table1 = (('foo', 'bar'), ('A', 1), ('B', 2), ('B', 2), ('C', 7))
    table2 = (('foo', 'bar'), ('B', 2))
    result = complement_impl(table1, table2)
    expectation = (('foo', 'bar'), ('A', 1), ('B', 2), ('C', 7))
    ieq(expectation, result)
    ieq(expectation, result)
    # strict behaviour
    result = complement_impl(table1, table2, strict=True)
    expectation = (('foo', 'bar'), ('A', 1), ('C', 7))
    ieq(expectation, result)
    ieq(expectation, result)


def _test_complement_none(complement_impl):
    # test behaviour with unsortable types
    now = datetime.now()
    ta = [['a', 'b'], [None, None]]
    tb = [['a', 'b'], [None, now]]
    expectation = (('a', 'b'), (None, None))
    result = complement_impl(ta, tb)
    ieq(expectation, result)
    ta = [['a'], [now], [None]]
    tb = [['a'], [None], [None]]
    expectation = (('a',), (now,))
    result = complement_impl(ta, tb)
    ieq(expectation, result)


def _test_complement(f):
    _test_complement_1(f)
    _test_complement_2(f)
    _test_complement_3(f)
    _test_complement_4(f)
    _test_complement_none(f)


def test_complement():
    _test_complement(complement)


def test_complement_seqtypes():
    # test complement isn't confused by list vs tuple
    ta = [['a', 'b'], ['A', 1], ['B', 2]]
    tb = [('a', 'b'), ('A', 1), ('B', 2)]
    expectation = (('a', 'b'),)
    actual = complement(ta, tb, presorted=True)
    ieq(expectation, actual)


def test_hashcomplement_seqtypes():
    # test complement isn't confused by list vs tuple
    ta = [['a', 'b'], ['A', 1], ['B', 2]]
    tb = [('a', 'b'), ('A', 1), ('B', 2)]
    expectation = (('a', 'b'),)
    actual = hashcomplement(ta, tb)
    ieq(expectation, actual)


def test_diff():
    tablea = (('foo', 'bar', 'baz'), ('A', 1, True), ('C', 7, False),
              ('B', 2, False), ('C', 9, True))
    tableb = (('x', 'y', 'z'), ('B', 2, False), ('A', 9, False),
              ('B', 3, True), ('C', 9, True))
    aminusb = (('foo', 'bar', 'baz'), ('A', 1, True), ('C', 7, False))
    bminusa = (('x', 'y', 'z'), ('A', 9, False), ('B', 3, True))
    added, subtracted = diff(tablea, tableb)
    ieq(bminusa, added)
    ieq(aminusb, subtracted)


def test_recordcomplement_1():
    table1 = (('foo', 'bar'), ('A', 1), ('B', 2), ('C', 7))
    table2 = (('bar', 'foo'), (9, 'A'), (2, 'B'), (3, 'B'))
    expectation = (('foo', 'bar'), ('A', 1), ('C', 7))
    result = recordcomplement(table1, table2)
    ieq(expectation, result)
def test_recordcomplement_2():
    tablea = (('foo', 'bar', 'baz'), ('A', 1, True), ('C', 7, False),
              ('B', 2, False), ('C', 9, True))
    tableb = (('bar', 'foo', 'baz'), (2, 'B', False), (9, 'A', False),
              (3, 'B', True), (9, 'C', True))
    aminusb = (('foo', 'bar', 'baz'), ('A', 1, True), ('C', 7, False))
    result = recordcomplement(tablea, tableb)
    ieq(aminusb, result)
    bminusa = (('bar', 'foo', 'baz'), (3, 'B', True), (9, 'A', False))
    result = recordcomplement(tableb, tablea)
    ieq(bminusa, result)


def test_recordcomplement_3():
    # make sure we deal with empty tables
    table1 = (('foo', 'bar'), ('A', 1), ('B', 2))
    table2 = (('bar', 'foo'),)
    expectation = (('foo', 'bar'), ('A', 1), ('B', 2))
    result = recordcomplement(table1, table2)
    ieq(expectation, result)
    ieq(expectation, result)
    expectation = (('bar', 'foo'),)
    result = recordcomplement(table2, table1)
    ieq(expectation, result)
    ieq(expectation, result)


def test_recordcomplement_4():
    # test behaviour with duplicate rows
    table1 = (('foo', 'bar'), ('A', 1), ('B', 2), ('B', 2), ('C', 7))
    table2 = (('bar', 'foo'), (2, 'B'))
    result = recordcomplement(table1, table2)
    expectation = (('foo', 'bar'), ('A', 1), ('B', 2), ('C', 7))
    ieq(expectation, result)
    ieq(expectation, result)
    # strict behaviour
    result = recordcomplement(table1, table2, strict=True)
    expectation = (('foo', 'bar'), ('A', 1), ('C', 7))
    ieq(expectation, result)
    ieq(expectation, result)


def test_recorddiff():
    tablea = (('foo', 'bar', 'baz'), ('A', 1, True), ('C', 7, False),
              ('B', 2, False), ('C', 9, True))
    tableb = (('bar', 'foo', 'baz'), (2, 'B', False), (9, 'A', False),
              (3, 'B', True), (9, 'C', True))
    aminusb = (('foo', 'bar', 'baz'), ('A', 1, True), ('C', 7, False))
    bminusa = (('bar', 'foo', 'baz'), (3, 'B', True), (9, 'A', False))
    added, subtracted = recorddiff(tablea, tableb)
    ieq(aminusb, subtracted)
    ieq(bminusa, added)


def _test_intersection_1(intersection_impl):
    table1 = (('foo', 'bar'), ('A', 1), ('B', 2), ('C', 7))
    table2 = (('foo', 'bar'), ('A', 9), ('B', 2), ('B', 3))
    expectation = (('foo', 'bar'), ('B', 2))
    result = intersection_impl(table1, table2)
    ieq(expectation, result)


def _test_intersection_2(intersection_impl):
    table1 = (('foo', 'bar', 'baz'), ('A', 1, True), ('C', 7, False),
              ('B', 2, False), ('C', 9, True))
    table2 = (('x', 'y', 'z'), ('B', 2, False), ('A', 9, False),
              ('B', 3, True), ('C', 9, True))
    expect = (('foo', 'bar', 'baz'), ('B', 2, False), ('C', 9, True))
    table3 = intersection_impl(table1, table2)
    ieq(expect, table3)


def _test_intersection_3(intersection_impl):
    # empty table
    table1 = (('foo', 'bar'), ('A', 1), ('B', 2), ('C', 7))
    table2 = (('foo', 'bar'),)
    expectation = (('foo', 'bar'),)
    result = intersection_impl(table1, table2)
    ieq(expectation, result)
    ieq(expectation, result)


def _test_intersection_4(intersection_impl):
    # duplicate rows
    table1 = (('foo', 'bar'), ('A', 1), ('B', 2), ('B', 2), ('B', 2),
              ('C', 7))
    table2 = (('foo', 'bar'), ('A', 9), ('B', 2), ('B', 2), ('B', 3))
    expectation = (('foo', 'bar'), ('B', 2), ('B', 2))
    result = intersection_impl(table1, table2)
    ieq(expectation, result)
    ieq(expectation, result)


def _test_intersection_empty(intersection_impl):
    table1 = (('foo', 'bar'), ('A', 1), ('B', 2), ('C', 7))
    table2 = (('foo', 'bar'),)
    expectation = (('foo', 'bar'),)
    result = intersection_impl(table1, table2)
    ieq(expectation, result)


def _test_intersection(intersection_impl):
    _test_intersection_1(intersection_impl)
    _test_intersection_2(intersection_impl)
    _test_intersection_3(intersection_impl)
    _test_intersection_4(intersection_impl)
    _test_intersection_empty(intersection_impl)


def test_intersection():
    _test_intersection(intersection)


def test_hashcomplement():
    _test_complement(hashcomplement)


def test_hashintersection():
    _test_intersection(hashintersection)


petl-1.7.15/petl/test/transform/test_sorts.py
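# A minimal sketch of the buffered code path exercised in the tests below;
# buffersize forces rows to spill to temporary files (the value 2 here is
# arbitrary, chosen only to force spilling on a tiny table):
#
#     from petl.transform.sorts import sort
#     tbl = [['foo', 'bar'], ['C', 2], ['A', 9], ['F', 1]]
#     list(sort(tbl, 'bar', buffersize=2))  # rows sorted via temp files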
from __future__ import absolute_import, print_function, division


import os
import gc
import logging
from datetime import datetime
import platform

import pytest

from petl.compat import next
from petl.errors import FieldSelectionError
from petl.test.helpers import ieq, eq_
from petl.util import nrows
from petl.transform.basics import cat
from petl.transform.sorts import sort, mergesort, issorted


logger = logging.getLogger(__name__)
debug = logger.debug


def test_sort_1():
    table = (('foo', 'bar'), ('C', '2'), ('A', '9'), ('A', '6'),
             ('F', '1'), ('D', '10'))
    result = sort(table, 'foo')
    expectation = (('foo', 'bar'), ('A', '9'), ('A', '6'), ('C', '2'),
                   ('D', '10'), ('F', '1'))
    ieq(expectation, result)


def test_sort_2():
    table = (('foo', 'bar'), ('C', '2'), ('A', '9'), ('A', '6'),
             ('F', '1'), ('D', '10'))
    result = sort(table, key=('foo', 'bar'))
    expectation = (('foo', 'bar'), ('A', '6'), ('A', '9'), ('C', '2'),
                   ('D', '10'), ('F', '1'))
    ieq(expectation, result)
    result = sort(table)  # default is lexical sort
    expectation = (('foo', 'bar'), ('A', '6'), ('A', '9'), ('C', '2'),
                   ('D', '10'), ('F', '1'))
    ieq(expectation, result)


def test_sort_3():
    table = (('foo', 'bar'), ('C', '2'), ('A', '9'), ('A', '6'),
             ('F', '1'), ('D', '10'))
    result = sort(table, 'bar')
    expectation = (('foo', 'bar'), ('F', '1'), ('D', '10'), ('C', '2'),
                   ('A', '6'), ('A', '9'))
    ieq(expectation, result)


def test_sort_4():
    table = (('foo', 'bar'), ('C', 2), ('A', 9), ('A', 6),
             ('F', 1), ('D', 10))
    result = sort(table, 'bar')
    expectation = (('foo', 'bar'), ('F', 1), ('C', 2), ('A', 6),
                   ('A', 9), ('D', 10))
    ieq(expectation, result)


def test_sort_5():
    table = (('foo', 'bar'), (2.3, 2), (1.2, 9), (2.3, 6),
             (3.2, 1), (1.2, 10))
    expectation = (('foo', 'bar'), (1.2, 9), (1.2, 10), (2.3, 2),
                   (2.3, 6), (3.2, 1))
    # can use either field names or indices (from 0) to specify sort key
    result = sort(table, key=('foo', 'bar'))
    ieq(expectation, result)
    result = sort(table, key=(0, 1))
    ieq(expectation, result)
    result = sort(table, key=('foo', 1))
    ieq(expectation, result)
    result = sort(table, key=(0, 'bar'))
    ieq(expectation, result)


def test_sort_6():
    table = (('foo', 'bar'), (2.3, 2), (1.2, 9), (2.3, 6),
             (3.2, 1), (1.2, 10))
    expectation = (('foo', 'bar'), (3.2, 1), (2.3, 6), (2.3, 2),
                   (1.2, 10), (1.2, 9))
    result = sort(table, key=('foo', 'bar'), reverse=True)
    ieq(expectation, result)


def test_sort_buffered():
    table = (('foo', 'bar'), ('C', 2), ('A', 9), ('A', 6),
             ('F', 1), ('D', 10))

    # test sort forwards
    expectation = (('foo', 'bar'), ('F', 1), ('C', 2), ('A', 6),
                   ('A', 9), ('D', 10))
    result = sort(table, 'bar')
    ieq(expectation, result)
    result = sort(table, 'bar', buffersize=2)
    ieq(expectation, result)

    # sort in reverse
    expectation = (('foo', 'bar'), ('D', 10), ('A', 9), ('A', 6),
                   ('C', 2), ('F', 1))
    result = sort(table, 'bar', reverse=True)
    ieq(expectation, result)
    result = sort(table, 'bar', reverse=True, buffersize=2)
    ieq(expectation, result)

    # no key
    expectation = (('foo', 'bar'), ('F', 1), ('D', 10), ('C', 2),
                   ('A', 9), ('A', 6))
    result = sort(table, reverse=True)
    ieq(expectation, result)
    result = sort(table, reverse=True, buffersize=2)
    ieq(expectation, result)


def test_sort_buffered_tempdir():
    table = (('foo', 'bar'), ('C', 2), ('A', 9), ('A', 6),
             ('F', 1), ('D', 10))

    # test sort forwards
    expectation = (('foo', 'bar'), ('F', 1), ('C', 2), ('A', 6),
                   ('A', 9), ('D', 10))
    result = sort(table, 'bar')
    ieq(expectation, result)
    tempdir = 'tmp'
    if not os.path.exists(tempdir):
        os.mkdir(tempdir)
    result = sort(table, 'bar', buffersize=2, tempdir=tempdir)
    ieq(expectation, result)
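# The buffered sort is expected to cache its sorted runs once consumed, so
# two iterators over the same result can advance independently, as the next
# test checks.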
def test_sort_buffered_independent():
    table = (('foo', 'bar'), ('C', 2), ('A', 9), ('A', 6),
             ('F', 1), ('D', 10))
    expectation = (('foo', 'bar'), ('F', 1), ('C', 2), ('A', 6),
                   ('A', 9), ('D', 10))
    result = sort(table, 'bar', buffersize=4)
    nrows(result)  # cause data to be cached
    # check that two row iterators are independent, i.e., consuming rows
    # from one does not affect the other
    it1 = iter(result)
    it2 = iter(result)
    eq_(expectation[0], next(it1))
    eq_(expectation[1], next(it1))
    eq_(expectation[0], next(it2))
    eq_(expectation[1], next(it2))
    eq_(expectation[2], next(it2))
    eq_(expectation[2], next(it1))


def _get_names(l):
    return [x.name for x in l]


def test_sort_buffered_cleanup():
    table = (('foo', 'bar'), ('C', 2), ('A', 9), ('A', 6),
             ('F', 1), ('D', 10))
    result = sort(table, 'bar', buffersize=2)
    debug('initially filecache should be empty')
    eq_(None, result._filecache)
    debug('pull rows through, should populate file cache')
    eq_(5, nrows(result))
    eq_(3, len(result._filecache))
    debug('check all files exist')
    filenames = _get_names(result._filecache)
    for fn in filenames:
        assert os.path.exists(fn), fn
    debug('delete object and garbage collect')
    del result
    gc.collect()
    debug('check all files have been deleted')
    for fn in filenames:
        assert not os.path.exists(fn), fn


@pytest.mark.skipif(platform.python_implementation() == 'PyPy',
                    reason='SKIP sort cleanup test (PyPy)')
def test_sort_buffered_cleanup_open_iterator():
    table = (('foo', 'bar'), ('C', 2), ('A', 9), ('A', 6),
             ('F', 1), ('D', 10))
    # check if cleanup is robust against open iterators
    result = sort(table, 'bar', buffersize=2)
    debug('pull rows through, should populate file cache')
    eq_(5, nrows(result))
    eq_(3, len(result._filecache))
    debug('check all files exist')
    filenames = _get_names(result._filecache)
    for fn in filenames:
        assert os.path.exists(fn), fn
    debug(filenames)
    debug('open an iterator')
    it = iter(result)
    next(it)
    next(it)
    debug('delete objects and garbage collect')
    del result
    del it
    gc.collect()
    for fn in filenames:
        assert not os.path.exists(fn), fn


def test_sort_empty():
    table = (('foo', 'bar'),)
    expect = (('foo', 'bar'),)
    actual = sort(table)
    ieq(expect, actual)


def test_sort_none():
    table = (('foo', 'bar'), ('C', 2), ('A', 9), ('A', None),
             ('F', 1), ('D', 10))
    result = sort(table, 'bar')
    print(list(result))
    expectation = (('foo', 'bar'), ('A', None), ('F', 1), ('C', 2),
                   ('A', 9), ('D', 10))
    ieq(expectation, result)
    dt = datetime.now().replace
    table = (('foo', 'bar'), ('C', dt(hour=5)), ('A', dt(hour=1)),
             ('A', None), ('F', dt(hour=9)), ('D', dt(hour=17)))
    result = sort(table, 'bar')
    expectation = (('foo', 'bar'), ('A', None), ('A', dt(hour=1)),
                   ('C', dt(hour=5)), ('F', dt(hour=9)), ('D', dt(hour=17)))
    ieq(expectation, result)


def test_sort_headerless_no_keys():
    """
    Sorting a headerless table without specifying cols should be a no-op.
    """
    table = []
    result = sort(table)
    expectation = []
    ieq(expectation, result)


def test_sort_headerless_explicit():
    """
    But if you specify keys, they must exist.
    """
    table = []
    with pytest.raises(FieldSelectionError):
        for i in sort(table, 'foo'):
            pass


# TODO test sort with native comparison
def test_mergesort_1():
    table1 = (('foo', 'bar'), ('A', 6), ('C', 2), ('D', 10),
              ('A', 9), ('F', 1))
    table2 = (('foo', 'bar'), ('B', 3), ('D', 10), ('A', 10), ('F', 4))

    # should be same as concatenate then sort (but more efficient,
    # esp. when presorted)
    expect = sort(cat(table1, table2))

    actual = mergesort(table1, table2)
    ieq(expect, actual)
    ieq(expect, actual)

    actual = mergesort(sort(table1), sort(table2), presorted=True)
    ieq(expect, actual)
    ieq(expect, actual)


def test_mergesort_2():
    table1 = (('foo', 'bar'), ('A', 9), ('C', 2), ('D', 10),
              ('A', 6), ('F', 1))
    table2 = (('foo', 'baz'), ('B', 3), ('D', 10), ('A', 10), ('F', 4))

    # should be same as concatenate then sort (but more efficient,
    # esp. when presorted)
    expect = sort(cat(table1, table2), key='foo')

    actual = mergesort(table1, table2, key='foo')
    ieq(expect, actual)
    ieq(expect, actual)

    actual = mergesort(sort(table1, key='foo'), sort(table2, key='foo'),
                       key='foo', presorted=True)
    ieq(expect, actual)
    ieq(expect, actual)


def test_mergesort_3():
    table1 = (('foo', 'bar'), ('A', 9), ('C', 2), ('D', 10),
              ('A', 6), ('F', 1))
    table2 = (('foo', 'baz'), ('B', 3), ('D', 10), ('A', 10), ('F', 4))

    # should be same as concatenate then sort (but more efficient,
    # esp. when presorted)
    expect = sort(cat(table1, table2), key='foo', reverse=True)

    actual = mergesort(table1, table2, key='foo', reverse=True)
    ieq(expect, actual)
    ieq(expect, actual)

    actual = mergesort(sort(table1, key='foo', reverse=True),
                       sort(table2, key='foo', reverse=True),
                       key='foo', reverse=True, presorted=True)
    ieq(expect, actual)
    ieq(expect, actual)


def test_mergesort_4():
    table1 = (('foo', 'bar', 'baz'), (1, 'A', True), (2, 'B', None),
              (4, 'C', True))
    table2 = (('bar', 'baz', 'quux'), ('A', True, 42.0),
              ('B', False, 79.3), ('C', False, 12.4))
    expect = sort(cat(table1, table2), key='bar')
    actual = mergesort(table1, table2, key='bar')
    ieq(expect, actual)
    ieq(expect, actual)


def test_mergesort_empty():
    table1 = (('foo', 'bar'), ('A', 9), ('C', 2), ('D', 10), ('F', 1))
    table2 = (('foo', 'bar'),)
    expect = table1
    actual = mergesort(table1, table2, key='foo')
    ieq(expect, actual)
    ieq(expect, actual)


def test_issorted():
    table1 = (('foo', 'bar', 'baz'), ('a', 1, True), ('b', 3, True),
              ('b', 2))
    assert issorted(table1, key='foo')
    assert not issorted(table1, key='foo', reverse=True)
    assert not issorted(table1, key='foo', strict=True)
    table2 = (('foo', 'bar', 'baz'), ('b', 2, True), ('a', 1, True),
              ('b', 3))
    assert not issorted(table2, key='foo')
    table3 = (('foo', 'bar', 'baz'), ('a', 1, True), ('b', 2, True),
              ('b', 3))
    assert issorted(table3, key=('foo', 'bar'))
    assert issorted(table3)
    table4 = (('foo', 'bar', 'baz'), ('a', 1, True), ('b', 3, True),
              ('b', 2))
    assert not issorted(table4, key=('foo', 'bar'))
    assert not issorted(table4)
    table5 = (('foo', 'bar', 'baz'), ('b', 3, True), ('b', 2),
              ('a', 1, True))
    assert not issorted(table5, key='foo')
    assert issorted(table5, key='foo', reverse=True)
    assert not issorted(table5, key='foo', reverse=True, strict=True)


def test_sort_missing_cell_numeric():
    """ Sorting table with missing values raises IndexError #385 """
    tbl = (('a', 'b'), ('4',), ('2', '1'), ('1',))
    expect = (('a', 'b'), ('1',), ('2', '1'), ('4',))
    tbl_sorted = sort(tbl)
    ieq(expect, tbl_sorted)


def test_sort_missing_cell_text():
    """ Sorting table with missing values raises IndexError #385 """
    tbl = (('a', 'b', 'c'), ('C',), ('A', '4', '5'))
    expect = (('a', 'b', 'c'), ('A', '4', '5'), ('C',))
    tbl_sorted = sort(tbl)
    ieq(expect, tbl_sorted)


petl-1.7.15/petl/test/transform/test_unpacks.py
from __future__ import absolute_import, print_function, division


import pytest

from petl.errors import ArgumentError
from petl.test.helpers import ieq
from petl.transform.unpacks import unpack, unpackdict


def test_unpack():
    table1 = (('foo', 'bar'), (1, ['a', 'b']), (2, ['c', 'd']),
              (3, ['e', 'f']))
    table2 = unpack(table1, 'bar', ['baz', 'quux'])
    expect2 = (('foo', 'baz', 'quux'), (1, 'a', 'b'), (2, 'c', 'd'),
               (3, 'e', 'f'))
    ieq(expect2, table2)
    ieq(expect2, table2)  # check twice

    # check no new fields
    table3 = unpack(table1, 'bar')
    expect3 = (('foo',), (1,), (2,), (3,))
    ieq(expect3, table3)

    # check more values than new fields
    table4 = unpack(table1, 'bar', ['baz'])
    expect4 = (('foo', 'baz'), (1, 'a'), (2, 'c'), (3, 'e'))
    ieq(expect4, table4)

    # check include original
    table5 = unpack(table1, 'bar', ['baz'], include_original=True)
    expect5 = (('foo', 'bar', 'baz'),
               (1, ['a', 'b'], 'a'),
               (2, ['c', 'd'], 'c'),
               (3, ['e', 'f'], 'e'))
    ieq(expect5, table5)

    # check specify number to unpack
    table6 = unpack(table1, 'bar', 3)
    expect6 = (('foo', 'bar1', 'bar2', 'bar3'),
               (1, 'a', 'b', None),
               (2, 'c', 'd', None),
               (3, 'e', 'f', None))
    ieq(expect6, table6)

    # check specify number to unpack, non-default missing value
    table7 = unpack(table1, 'bar', 3, missing='NA')
    expect7 = (('foo', 'bar1', 'bar2', 'bar3'),
               (1, 'a', 'b', 'NA'),
               (2, 'c', 'd', 'NA'),
               (3, 'e', 'f', 'NA'))
    ieq(expect7, table7)

    # check can use field index
    table8 = unpack(table1, 1, 3)
    expect8 = (('foo', 'bar1', 'bar2', 'bar3'),
               (1, 'a', 'b', None),
               (2, 'c', 'd', None),
               (3, 'e', 'f', None))
    ieq(expect8, table8)


def test_unpack_empty():
    table1 = (('foo', 'bar'),)
    table2 = unpack(table1, 'bar', ['baz', 'quux'])
    expect2 = (('foo', 'baz', 'quux'),)
    ieq(expect2, table2)


def test_unpack_headerless():
    table = []
    with pytest.raises(ArgumentError):
        for i in unpack(table, 'bar', ['baz', 'quux']):
            pass


def test_unpackdict():
    table1 = (('foo', 'bar'),
              (1, {'baz': 'a', 'quux': 'b'}),
              (2, {'baz': 'c', 'quux': 'd'}),
              (3, {'baz': 'e', 'quux': 'f'}))
    table2 = unpackdict(table1, 'bar')
    expect2 = (('foo', 'baz', 'quux'), (1, 'a', 'b'), (2, 'c', 'd'),
               (3, 'e', 'f'))
    ieq(expect2, table2)
    ieq(expect2, table2)  # check twice

    # test include original
    table1 = (('foo', 'bar'),
              (1, {'baz': 'a', 'quux': 'b'}),
              (2, {'baz': 'c', 'quux': 'd'}),
              (3, {'baz': 'e', 'quux': 'f'}))
    table2 = unpackdict(table1, 'bar', includeoriginal=True)
    expect2 = (('foo', 'bar', 'baz', 'quux'),
               (1, {'baz': 'a', 'quux': 'b'}, 'a', 'b'),
               (2, {'baz': 'c', 'quux': 'd'}, 'c', 'd'),
               (3, {'baz': 'e', 'quux': 'f'}, 'e', 'f'))
    ieq(expect2, table2)
    ieq(expect2, table2)  # check twice

    # test specify keys
    table1 = (('foo', 'bar'),
              (1, {'baz': 'a', 'quux': 'b'}),
              (2, {'baz': 'c', 'quux': 'd'}),
              (3, {'baz': 'e', 'quux': 'f'}))
    table2 = unpackdict(table1, 'bar', keys=['quux'])
    expect2 = (('foo', 'quux'), (1, 'b'), (2, 'd'), (3, 'f'))
    ieq(expect2, table2)
    ieq(expect2, table2)  # check twice

    # test dodgy data
    table1 = (('foo', 'bar'),
              (1, {'baz': 'a', 'quux': 'b'}),
              (2, 'foobar'),
              (3, {'baz': 'e', 'quux': 'f'}))
    table2 = unpackdict(table1, 'bar')
    expect2 = (('foo', 'baz', 'quux'), (1, 'a', 'b'), (2, None, None),
               (3, 'e', 'f'))
    ieq(expect2, table2)
    ieq(expect2, table2)  # check twice


petl-1.7.15/petl/test/transform/test_validation.py

# -*- coding: utf-8 -*-
from __future__ import absolute_import, print_function, division


import logging

import pytest

import petl as etl
from petl.transform.validation import validate
from petl.test.helpers import ieq
from petl.errors import FieldSelectionError


logger = logging.getLogger(__name__)
debug = logger.debug
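# The constraints below follow validate()'s convention: each is a dict with
# a 'name' plus either a field-level 'test' (a callable that should not
# raise, e.g. int) or an 'assertion' (a predicate on the field value, or on
# the whole row when no 'field' is given); validate() reports one output
# row per failure.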
def test_constraints():
    constraints = [
        dict(name='C1', field='foo', test=int),
        dict(name='C2', field='bar', test=etl.dateparser('%Y-%m-%d')),
        dict(name='C3', field='baz', assertion=lambda v: v in ['Y', 'N']),
        dict(name='C4', assertion=lambda row: None not in row)
    ]
    table = (('foo', 'bar', 'baz'),
             (1, '2000-01-01', 'Y'),
             ('x', '2010-10-10', 'N'),
             (2, '2000/01/01', 'Y'),
             (3, '2015-12-12', 'x'),
             (4, None, 'N'),
             ('y', '1999-99-99', 'z'))
    expect = (('name', 'row', 'field', 'value', 'error'),
              ('C1', 2, 'foo', 'x', 'ValueError'),
              ('C2', 3, 'bar', '2000/01/01', 'ValueError'),
              ('C3', 4, 'baz', 'x', 'AssertionError'),
              ('C2', 5, 'bar', None, 'AttributeError'),
              ('C4', 5, None, None, 'AssertionError'),
              ('C1', 6, 'foo', 'y', 'ValueError'),
              ('C2', 6, 'bar', '1999-99-99', 'ValueError'),
              ('C3', 6, 'baz', 'z', 'AssertionError'))
    actual = validate(table, constraints)
    debug(actual)
    ieq(expect, actual)
    ieq(expect, actual)


def test_non_optional_constraint_with_missing_field():
    constraints = [
        dict(name='C1', field='foo', test=int),
    ]
    table = (('bar', 'baz'), ('1999-99-99', 'z'))
    actual = validate(table, constraints)
    with pytest.raises(FieldSelectionError):
        debug(actual)


def test_optional_constraint_with_missing_field():
    constraints = [
        dict(name='C1', field='foo', test=int, optional=True),
    ]
    table = (('bar', 'baz'), ('1999-99-99', 'z'))
    expect = (('name', 'row', 'field', 'value', 'error'),)
    actual = validate(table, constraints)
    debug(actual)
    ieq(expect, actual)


def test_row_length():
    table = (('foo', 'bar', 'baz'),
             (1, '2000-01-01', 'Y'),
             ('x', '2010-10-10'),
             (2, '2000/01/01', 'Y', True))
    expect = (('name', 'row', 'field', 'value', 'error'),
              ('__len__', 2, None, 2, 'AssertionError'),
              ('__len__', 3, None, 4, 'AssertionError'))
    actual = validate(table)
    debug(actual)
    ieq(expect, actual)
    ieq(expect, actual)


def test_header():
    header = ('foo', 'bar', 'baz')
    table = (('foo', 'bar', 'bazzz'),
             (1, '2000-01-01', 'Y'),
             ('x', '2010-10-10', 'N'))
    expect = (('name', 'row', 'field', 'value', 'error'),
              ('__header__', 0, None, None, 'AssertionError'))
    actual = validate(table, header=header)
    debug(actual)
    ieq(expect, actual)
    ieq(expect, actual)

    header = ('foo', 'bar', 'baz', 'quux')
    table = (('foo', 'bar', 'baz'),
             (1, '2000-01-01', 'Y'),
             ('x', '2010-10-10', 'N'))
    expect = (('name', 'row', 'field', 'value', 'error'),
              ('__header__', 0, None, None, 'AssertionError'),
              ('__len__', 1, None, 3, 'AssertionError'),
              ('__len__', 2, None, 3, 'AssertionError'))
    actual = validate(table, header=header)
    debug(actual)
    ieq(expect, actual)
    ieq(expect, actual)


def test_validation_headerless():
    header = ('foo', 'bar', 'baz')
    table = []
    # Expect only a missing header - no exceptions please
    expect = (('name', 'row', 'field', 'value', 'error'),
              ('__header__', 0, None, None, 'AssertionError'))
    actual = validate(table, header=header)
    ieq(expect, actual)
    ieq(expect, actual)


petl-1.7.15/petl/test/util/

petl-1.7.15/petl/test/util/__init__.py

from __future__ import absolute_import, print_function, division

petl-1.7.15/petl/test/util/test_base.py

from __future__ import absolute_import, print_function, division


import pytest

from petl.errors import FieldSelectionError
from petl.test.helpers import ieq, eq_
from petl.compat import next
from petl.util.base import header, fieldnames, data, dicts, records, \
    namedtuples, itervalues, values, rowgroupby
def test_header():
    table = (('foo', 'bar'), ('a', 1), ('b', 2))
    actual = header(table)
    expect = ('foo', 'bar')
    eq_(expect, actual)
    table = (['foo', 'bar'], ['a', 1], ['b', 2])
    actual = header(table)
    eq_(expect, actual)


def test_fieldnames():
    table = (('foo', 'bar'), ('a', 1), ('b', 2))
    actual = fieldnames(table)
    expect = ('foo', 'bar')
    eq_(expect, actual)

    class CustomField(object):

        def __init__(self, key, description):
            self.key = key
            self.description = description

        def __str__(self):
            return self.key

        def __repr__(self):
            return 'CustomField(%r, %r)' % (self.key, self.description)

    table = ((CustomField('foo', 'Get some foo.'),
              CustomField('bar', 'A lot of bar.')),
             ('a', 1), ('b', 2))
    actual = fieldnames(table)
    expect = ('foo', 'bar')
    eq_(expect, actual)


def test_data():
    table = (('foo', 'bar'), ('a', 1), ('b', 2))
    actual = data(table)
    expect = (('a', 1), ('b', 2))
    ieq(expect, actual)


def test_data_headerless():
    table = []
    actual = data(table)
    expect = []
    ieq(expect, actual)


def test_dicts():
    table = (('foo', 'bar'), ('a', 1), ('b', 2))
    actual = dicts(table)
    expect = ({'foo': 'a', 'bar': 1}, {'foo': 'b', 'bar': 2})
    ieq(expect, actual)


def test_dicts_headerless():
    table = []
    actual = dicts(table)
    expect = []
    ieq(expect, actual)


def test_dicts_shortrows():
    table = (('foo', 'bar'), ('a', 1), ('b',))
    actual = dicts(table)
    expect = ({'foo': 'a', 'bar': 1}, {'foo': 'b', 'bar': None})
    ieq(expect, actual)


def test_records():
    table = (('foo', 'bar'), ('a', 1), ('b', 2), ('c', 3))
    actual = records(table)

    # access items
    it = iter(actual)
    o = next(it)
    eq_('a', o['foo'])
    eq_(1, o['bar'])
    o = next(it)
    eq_('b', o['foo'])
    eq_(2, o['bar'])

    # access attributes
    it = iter(actual)
    o = next(it)
    eq_('a', o.foo)
    eq_(1, o.bar)
    o = next(it)
    eq_('b', o.foo)
    eq_(2, o.bar)

    # access with get() method
    it = iter(actual)
    o = next(it)
    eq_('a', o.get('foo'))
    eq_(1, o.get('bar'))
    eq_(None, o.get('baz'))
    eq_('qux', o.get('baz', default='qux'))


def test_records_headerless():
    table = []
    actual = records(table)
    expect = []
    ieq(expect, actual)


def test_records_errors():
    table = (('foo', 'bar'), ('a', 1), ('b', 2))
    actual = records(table)

    # access items
    it = iter(actual)
    o = next(it)
    try:
        o['baz']
    except KeyError:
        pass
    else:
        raise Exception('expected exception not raised')
    try:
        o.baz
    except AttributeError:
        pass
    else:
        raise Exception('expected exception not raised')


def test_records_unevenrows():
    table = (('foo', 'bar'), ('a', 1, True), ('b',))
    actual = records(table)

    # access items
    it = iter(actual)
    o = next(it)
    eq_('a', o['foo'])
    eq_(1, o['bar'])
    o = next(it)
    eq_('b', o['foo'])
    eq_(None, o['bar'])

    # access attributes
    it = iter(actual)
    o = next(it)
    eq_('a', o.foo)
    eq_(1, o.bar)
    o = next(it)
    eq_('b', o.foo)
    eq_(None, o.bar)


def test_namedtuples():
    table = (('foo', 'bar'), ('a', 1), ('b', 2))
    actual = namedtuples(table)
    it = iter(actual)
    o = next(it)
    eq_('a', o.foo)
    eq_(1, o.bar)
    o = next(it)
    eq_('b', o.foo)
    eq_(2, o.bar)


def test_namedtuples_headerless():
    table = []
    actual = namedtuples(table)
    expect = []
    ieq(expect, actual)


def test_namedtuples_unevenrows():
    table = (('foo', 'bar'), ('a', 1, True), ('b',))
    actual = namedtuples(table)
    it = iter(actual)
    o = next(it)
    eq_('a', o.foo)
    eq_(1, o.bar)
    o = next(it)
    eq_('b', o.foo)
    eq_(None, o.bar)
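# itervalues/values below yield bare values when given a single field and
# tuples when given several fields; short rows are padded with None.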
def test_itervalues():
    table = (('foo', 'bar', 'baz'), ('a', 1, True), ('b', 2),
             ('b', 7, False))
    actual = itervalues(table, 'foo')
    expect = ('a', 'b', 'b')
    ieq(expect, actual)
    actual = itervalues(table, 'bar')
    expect = (1, 2, 7)
    ieq(expect, actual)
    actual = itervalues(table, ('foo', 'bar'))
    expect = (('a', 1), ('b', 2), ('b', 7))
    ieq(expect, actual)
    actual = itervalues(table, 'baz')
    expect = (True, None, False)
    ieq(expect, actual)
    actual = itervalues(table, ('foo', 'baz'))
    expect = (('a', True), ('b', None), ('b', False))
    ieq(expect, actual)


def test_itervalues_headerless():
    table = []
    actual = itervalues(table, 'foo')
    with pytest.raises(FieldSelectionError):
        for i in actual:
            pass


def test_values():
    table = (('foo', 'bar', 'baz'), ('a', 1, True), ('b', 2),
             ('b', 7, False))
    actual = values(table, 'foo')
    expect = ('a', 'b', 'b')
    ieq(expect, actual)
    ieq(expect, actual)
    actual = values(table, 'bar')
    expect = (1, 2, 7)
    ieq(expect, actual)
    ieq(expect, actual)
    # old style signature for multiple fields, still supported
    actual = values(table, ('foo', 'bar'))
    expect = (('a', 1), ('b', 2), ('b', 7))
    ieq(expect, actual)
    ieq(expect, actual)
    # as of 0.24 new style signature for multiple fields
    actual = values(table, 'foo', 'bar')
    expect = (('a', 1), ('b', 2), ('b', 7))
    ieq(expect, actual)
    ieq(expect, actual)
    actual = values(table, 'baz')
    expect = (True, None, False)
    ieq(expect, actual)
    ieq(expect, actual)


def test_values_headerless():
    table = []
    actual = values(table, 'foo')
    with pytest.raises(FieldSelectionError):
        for i in actual:
            pass


def test_rowgroupby():
    table = (('foo', 'bar', 'baz'), ('a', 1, True), ('b', 2, True),
             ('b', 3))

    # simplest form
    g = rowgroupby(table, 'foo')
    key, vals = next(g)
    vals = list(vals)
    eq_('a', key)
    eq_(1, len(vals))
    eq_(('a', 1, True), vals[0])
    key, vals = next(g)
    vals = list(vals)
    eq_('b', key)
    eq_(2, len(vals))
    eq_(('b', 2, True), vals[0])
    eq_(('b', 3), vals[1])

    # specify value
    g = rowgroupby(table, 'foo', 'bar')
    key, vals = next(g)
    vals = list(vals)
    eq_('a', key)
    eq_(1, len(vals))
    eq_(1, vals[0])
    key, vals = next(g)
    vals = list(vals)
    eq_('b', key)
    eq_(2, len(vals))
    eq_(2, vals[0])
    eq_(3, vals[1])

    # callable key
    g = rowgroupby(table, lambda r: r['foo'], lambda r: r['baz'])
    key, vals = next(g)
    vals = list(vals)
    eq_('a', key)
    eq_(1, len(vals))
    eq_(True, vals[0])
    key, vals = next(g)
    vals = list(vals)
    eq_('b', key)
    eq_(2, len(vals))
    eq_(True, vals[0])
    eq_(None, vals[1])  # gets padded


def test_rowgroupby_headerless():
    table = []
    with pytest.raises(FieldSelectionError):
        rowgroupby(table, 'foo')


petl-1.7.15/petl/test/util/test_counting.py

from __future__ import absolute_import, print_function, division


from petl.compat import PY2
from petl.test.helpers import ieq, eq_
from petl.util.counting import valuecount, valuecounter, valuecounts, \
    rowlengths, typecounts, parsecounts, stringpatterns, nrows


def test_nrows():
    table = (('foo', 'bar'), ('a', 1), ('b',))
    actual = nrows(table)
    expect = 2
    eq_(expect, actual)


def test_valuecount():
    table = (('foo', 'bar'), ('a', 1), ('b', 2), ('b', 7))
    n, f = valuecount(table, 'foo', 'b')
    eq_(2, n)
    eq_(2./3, f)


def test_valuecounter():
    table = (('foo', 'bar'), ('a', 1), ('b', 2), ('b', 7))
    actual = valuecounter(table, 'foo')
    expect = {'b': 2, 'a': 1}
    eq_(expect, actual)


def test_valuecounter_shortrows():
    table = (('foo', 'bar'), ('a', 7), ('b',), ('b', 7))
    actual = valuecounter(table, 'foo')
    expect = {'b': 2, 'a': 1}
    eq_(expect, actual)
    actual = valuecounter(table, 'bar')
    expect = {7: 2, None: 1}
    eq_(expect, actual)
    actual = valuecounter(table, 'foo', 'bar')
    expect = {('a', 7): 1, ('b', None): 1, ('b', 7): 1}
    eq_(expect, actual)
def test_valuecounts():
    table = (('foo', 'bar'), ('a', 1), ('b', 2), ('b', 7))
    actual = valuecounts(table, 'foo')
    expect = (('foo', 'count', 'frequency'),
              ('b', 2, 2./3),
              ('a', 1, 1./3))
    ieq(expect, actual)
    ieq(expect, actual)


def test_valuecounts_shortrows():
    table = (('foo', 'bar'), ('a', True), ('x', True), ('b',),
             ('b', True), ('c', False), ('z', False))
    actual = valuecounts(table, 'bar')
    expect = (('bar', 'count', 'frequency'),
              (True, 3, 3./6),
              (False, 2, 2./6),
              (None, 1, 1./6))
    ieq(expect, actual)
    ieq(expect, actual)


def test_valuecounts_multifields():
    table = (('foo', 'bar', 'baz'),
             ('a', True, .12),
             ('a', True, .17),
             ('b', False, .34),
             ('b', False, .44),
             ('b',),
             ('b', False, .56))
    actual = valuecounts(table, 'foo', 'bar')
    expect = (('foo', 'bar', 'count', 'frequency'),
              ('b', False, 3, 3./6),
              ('a', True, 2, 2./6),
              ('b', None, 1, 1./6))
    ieq(expect, actual)
    ieq(expect, actual)


def test_rowlengths():
    table = (('foo', 'bar', 'baz'),
             ('A', 1, 2),
             ('B', '2', '3.4'),
             ('B', '3', '7.8', True),
             ('D', 'xyz', 9.0),
             ('E', None),
             ('F', 9))
    actual = rowlengths(table)
    expect = (('length', 'count'), (3, 3), (2, 2), (4, 1))
    ieq(expect, actual)


def test_typecounts():
    table = (('foo', 'bar', 'baz'),
             (b'A', 1, 2.),
             (b'B', u'2', 3.4),
             (u'B', u'3', 7.8, True),
             (b'D', u'xyz', 9.0),
             (b'E', 42))
    actual = typecounts(table, 'foo')
    if PY2:
        expect = (('type', 'count', 'frequency'),
                  ('str', 4, 4./5),
                  ('unicode', 1, 1./5))
    else:
        expect = (('type', 'count', 'frequency'),
                  ('bytes', 4, 4./5),
                  ('str', 1, 1./5))
    ieq(expect, actual)
    actual = typecounts(table, 'bar')
    if PY2:
        expect = (('type', 'count', 'frequency'),
                  ('unicode', 3, 3./5),
                  ('int', 2, 2./5))
    else:
        expect = (('type', 'count', 'frequency'),
                  ('str', 3, 3./5),
                  ('int', 2, 2./5))
    ieq(expect, actual)
    actual = typecounts(table, 'baz')
    expect = (('type', 'count', 'frequency'),
              ('float', 4, 4./5),
              ('NoneType', 1, 1./5))
    ieq(expect, actual)


def test_parsecounts():
    table = (('foo', 'bar', 'baz'),
             ('A', 'aaa', 2),
             ('B', '2', '3.4'),
             ('B', '3', '7.8', True),
             ('D', '3.7', 9.0),
             ('E', 42))
    actual = parsecounts(table, 'bar')
    expect = (('type', 'count', 'errors'), ('float', 3, 1), ('int', 2, 2))
    ieq(expect, actual)
def test_stringpatterns():
    table = (('foo', 'bar'),
             ('Mr. Foo', '123-1254'),
             ('Mrs. Bar', '234-1123'),
             ('Mr. Spo', '123-1254'),
             ('Mr. Baz', '321 1434'),
             ('Mrs. Baz', '321 1434'),
             ('Mr. Quux', '123-1254-XX'))
    actual = stringpatterns(table, 'foo')
    expect = (('pattern', 'count', 'frequency'),
              ('Aa. Aaa', 3, 3./6),
              ('Aaa. Aaa', 2, 2./6),
              ('Aa. Aaaa', 1, 1./6))
    ieq(expect, actual)
    actual = stringpatterns(table, 'bar')
    expect = (('pattern', 'count', 'frequency'),
              ('999-9999', 3, 3./6),
              ('999 9999', 2, 2./6),
              ('999-9999-AA', 1, 1./6))
    ieq(expect, actual)


petl-1.7.15/petl/test/util/test_lookups.py

from __future__ import absolute_import, print_function, division


import pytest

from petl.errors import DuplicateKeyError, FieldSelectionError
from petl.test.helpers import eq_
from petl import cut, lookup, lookupone, dictlookup, dictlookupone, \
    recordlookup, recordlookupone


def test_lookup():
    t1 = (('foo', 'bar'), ('a', 1), ('b', 2), ('b', 3))

    # lookup one column on another
    actual = lookup(t1, 'foo', 'bar')
    expect = {'a': [1], 'b': [2, 3]}
    eq_(expect, actual)

    # test default value - tuple of whole row
    actual = lookup(t1, 'foo')  # no value selector
    expect = {'a': [('a', 1)], 'b': [('b', 2), ('b', 3)]}
    eq_(expect, actual)

    # test default value - key only
    actual = lookup(cut(t1, 'foo'), 'foo')
    expect = {'a': [('a',)], 'b': [('b',), ('b',)]}
    eq_(expect, actual)

    t2 = (('foo', 'bar', 'baz'),
          ('a', 1, True),
          ('b', 2, False),
          ('b', 3, True),
          ('b', 3, False))

    # test value selection
    actual = lookup(t2, 'foo', ('bar', 'baz'))
    expect = {'a': [(1, True)], 'b': [(2, False), (3, True), (3, False)]}
    eq_(expect, actual)

    # test compound key
    actual = lookup(t2, ('foo', 'bar'), 'baz')
    expect = {('a', 1): [True], ('b', 2): [False], ('b', 3): [True, False]}
    eq_(expect, actual)


def test_lookup_headerless():
    table = []
    with pytest.raises(FieldSelectionError):
        lookup(table, 'foo', 'bar')


def test_lookupone():
    t1 = (('foo', 'bar'), ('a', 1), ('b', 2), ('b', 3))

    # lookup one column on another under strict mode
    try:
        lookupone(t1, 'foo', 'bar', strict=True)
    except DuplicateKeyError:
        pass  # expected
    else:
        assert False, 'expected error'

    # lookup one column on another under, not strict
    actual = lookupone(t1, 'foo', 'bar', strict=False)
    expect = {'a': 1, 'b': 2}  # first value wins
    eq_(expect, actual)

    # test default value - tuple of whole row
    actual = lookupone(t1, 'foo', strict=False)  # no value selector
    expect = {'a': ('a', 1), 'b': ('b', 2)}  # first wins
    eq_(expect, actual)

    # test default value - key only
    actual = lookupone(cut(t1, 'foo'), 'foo')
    expect = {'a': ('a',), 'b': ('b',)}
    eq_(expect, actual)

    t2 = (('foo', 'bar', 'baz'),
          ('a', 1, True),
          ('b', 2, False),
          ('b', 3, True),
          ('b', 3, False))

    # test value selection
    actual = lookupone(t2, 'foo', ('bar', 'baz'), strict=False)
    expect = {'a': (1, True), 'b': (2, False)}
    eq_(expect, actual)

    # test compound key
    actual = lookupone(t2, ('foo', 'bar'), 'baz', strict=False)
    expect = {('a', 1): True, ('b', 2): False, ('b', 3): True}  # first wins
    eq_(expect, actual)


def test_lookupone_headerless():
    table = []
    with pytest.raises(FieldSelectionError):
        lookupone(table, 'foo', 'bar')
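# The dict/record lookup variants below follow the same pattern as lookup:
# a mapping from key to rows, where the *one variants keep only the first
# row per key unless strict=True, in which case duplicate keys raise
# DuplicateKeyError.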
def test_dictlookup():
    t1 = (('foo', 'bar'), ('a', 1), ('b', 2), ('b', 3))
    actual = dictlookup(t1, 'foo')
    expect = {'a': [{'foo': 'a', 'bar': 1}],
              'b': [{'foo': 'b', 'bar': 2}, {'foo': 'b', 'bar': 3}]}
    eq_(expect, actual)

    # key only
    actual = dictlookup(cut(t1, 'foo'), 'foo')
    expect = {'a': [{'foo': 'a'}], 'b': [{'foo': 'b'}, {'foo': 'b'}]}
    eq_(expect, actual)

    t2 = (('foo', 'bar', 'baz'),
          ('a', 1, True),
          ('b', 2, False),
          ('b', 3, True),
          ('b', 3, False))

    # test compound key
    actual = dictlookup(t2, ('foo', 'bar'))
    expect = {('a', 1): [{'foo': 'a', 'bar': 1, 'baz': True}],
              ('b', 2): [{'foo': 'b', 'bar': 2, 'baz': False}],
              ('b', 3): [{'foo': 'b', 'bar': 3, 'baz': True},
                         {'foo': 'b', 'bar': 3, 'baz': False}]}
    eq_(expect, actual)


def test_dictlookupone():
    t1 = (('foo', 'bar'), ('a', 1), ('b', 2), ('b', 3))
    try:
        dictlookupone(t1, 'foo', strict=True)
    except DuplicateKeyError:
        pass  # expected
    else:
        assert False, 'expected error'

    # relax
    actual = dictlookupone(t1, 'foo', strict=False)
    # first wins
    expect = {'a': {'foo': 'a', 'bar': 1}, 'b': {'foo': 'b', 'bar': 2}}
    eq_(expect, actual)

    # key only
    actual = dictlookupone(cut(t1, 'foo'), 'foo')
    expect = {'a': {'foo': 'a'}, 'b': {'foo': 'b'}}
    eq_(expect, actual)

    t2 = (('foo', 'bar', 'baz'),
          ('a', 1, True),
          ('b', 2, False),
          ('b', 3, True),
          ('b', 3, False))

    # test compound key
    actual = dictlookupone(t2, ('foo', 'bar'), strict=False)
    expect = {('a', 1): {'foo': 'a', 'bar': 1, 'baz': True},
              ('b', 2): {'foo': 'b', 'bar': 2, 'baz': False},
              ('b', 3): {'foo': 'b', 'bar': 3, 'baz': True}}  # first wins
    eq_(expect, actual)


def test_recordlookup():
    t1 = (('foo', 'bar'), ('a', 1), ('b', 2), ('b', 3))
    lkp = recordlookup(t1, 'foo')
    eq_(['a'], [r.foo for r in lkp['a']])
    eq_(['b', 'b'], [r.foo for r in lkp['b']])
    eq_([1], [r.bar for r in lkp['a']])
    eq_([2, 3], [r.bar for r in lkp['b']])

    # key only
    lkp = recordlookup(cut(t1, 'foo'), 'foo')
    eq_(['a'], [r.foo for r in lkp['a']])
    eq_(['b', 'b'], [r.foo for r in lkp['b']])


def test_recordlookupone():
    t1 = (('foo', 'bar'), ('a', 1), ('b', 2), ('b', 3))
    try:
        recordlookupone(t1, 'foo', strict=True)
    except DuplicateKeyError:
        pass  # expected
    else:
        assert False, 'expected error'

    # relax
    lkp = recordlookupone(t1, 'foo', strict=False)
    eq_('a', lkp['a'].foo)
    eq_('b', lkp['b'].foo)
    eq_(1, lkp['a'].bar)
    eq_(2, lkp['b'].bar)  # first wins

    # key only
    lkp = recordlookupone(cut(t1, 'foo'), 'foo', strict=False)
    eq_('a', lkp['a'].foo)
    eq_('b', lkp['b'].foo)


petl-1.7.15/petl/test/util/test_materialise.py

from __future__ import absolute_import, print_function, division


import pytest

from petl.errors import FieldSelectionError
from petl.test.helpers import eq_
from petl.util.materialise import columns, facetcolumns


def test_columns():
    table = [['foo', 'bar'], ['a', 1], ['b', 2], ['b', 3]]
    cols = columns(table)
    eq_(['a', 'b', 'b'], cols['foo'])
    eq_([1, 2, 3], cols['bar'])


def test_columns_empty():
    table = [('foo', 'bar')]
    cols = columns(table)
    eq_([], cols['foo'])
    eq_([], cols['bar'])


def test_columns_headerless():
    table = []
    cols = columns(table)
    eq_({}, cols)


def test_facetcolumns():
    table = [['foo', 'bar', 'baz'],
             ['a', 1, True],
             ['b', 2, True],
             ['b', 3]]
    fc = facetcolumns(table, 'foo')
    eq_(['a'], fc['a']['foo'])
    eq_([1], fc['a']['bar'])
    eq_([True], fc['a']['baz'])
    eq_(['b', 'b'], fc['b']['foo'])
    eq_([2, 3], fc['b']['bar'])
    eq_([True, None], fc['b']['baz'])


def test_facetcolumns_headerless():
    table = []
    with pytest.raises(FieldSelectionError):
        facetcolumns(table, 'foo')


petl-1.7.15/petl/test/util/test_misc.py

from __future__ import absolute_import, print_function, division


from petl.test.helpers import eq_
from petl.compat import PY2
from petl.util.misc import typeset, diffvalues, diffheaders


def test_typeset():
    table = (('foo', 'bar', 'baz'),
             (b'A', 1, u'2'),
             (b'B', '2', u'3.4'),
             (b'B', '3', u'7.8', True),
             (u'D', u'xyz', 9.0),
             (b'E', 42))
    actual = typeset(table, 'foo')
    if PY2:
        expect = {'str', 'unicode'}
    else:
        expect = {'bytes', 'str'}
    eq_(expect, actual)
def test_diffheaders():
    table1 = (('foo', 'bar', 'baz'), ('a', 1, .3))
    table2 = (('baz', 'bar', 'quux'), ('a', 1, .3))
    add, sub = diffheaders(table1, table2)
    eq_({'quux'}, add)
    eq_({'foo'}, sub)


def test_diffvalues():
    table1 = (('foo', 'bar'), ('a', 1), ('b', 3))
    table2 = (('bar', 'foo'), (1, 'a'), (3, 'c'))
    add, sub = diffvalues(table1, table2, 'foo')
    eq_({'c'}, add)
    eq_({'b'}, sub)


petl-1.7.15/petl/test/util/test_parsers.py

from __future__ import absolute_import, print_function, division


from petl.compat import maxint
from petl.test.helpers import eq_
from petl.util.parsers import numparser, datetimeparser


def test_numparser():
    parsenumber = numparser()
    assert parsenumber('1') == 1
    assert parsenumber('1.0') == 1.0
    assert parsenumber(str(maxint + 1)) == maxint + 1
    assert parsenumber('3+4j') == 3 + 4j
    assert parsenumber('aaa') == 'aaa'
    assert parsenumber(None) is None


def test_numparser_strict():
    parsenumber = numparser(strict=True)
    assert parsenumber('1') == 1
    assert parsenumber('1.0') == 1.0
    assert parsenumber(str(maxint + 1)) == maxint + 1
    assert parsenumber('3+4j') == 3 + 4j
    try:
        parsenumber('aaa')
    except ValueError:
        pass  # expected
    else:
        assert False, 'expected exception'
    try:
        parsenumber(None)
    except TypeError:
        pass  # expected
    else:
        assert False, 'expected exception'


def test_laxparsers():
    p1 = datetimeparser('%Y-%m-%dT%H:%M:%S')
    try:
        p1('2002-12-25 00:00:00')
    except ValueError:
        pass
    else:
        assert False, 'expected exception'

    p2 = datetimeparser('%Y-%m-%dT%H:%M:%S', strict=False)
    try:
        v = p2('2002-12-25 00:00:00')
    except ValueError:
        assert False, 'did not expect exception'
    else:
        eq_('2002-12-25 00:00:00', v)


petl-1.7.15/petl/test/util/test_random.py

import random as pyrandom
import time
from functools import partial

from petl.util.random import randomseed, randomtable, RandomTable, \
    dummytable, DummyTable


def test_randomseed():
    """
    Ensure that randomseed provides a non-empty string that changes.
    """
    seed_1 = randomseed()
    time.sleep(1)
    seed_2 = randomseed()
    assert isinstance(seed_1, str)
    assert seed_1 != ""
    assert seed_1 != seed_2


def test_randomtable():
    """
    Ensure that randomtable provides a table with the right number of
    rows and columns.
    """
    columns, rows = 3, 10
    table = randomtable(columns, rows)
    assert len(table[0]) == columns
    assert len(table) == rows + 1


def test_randomtable_class():
    """
    Ensure that RandomTable provides a table with the right number of
    rows and columns.
    """
    columns, rows = 4, 60
    table = RandomTable(numflds=columns, numrows=rows)
    assert len(table[0]) == columns
    assert len(table) == rows + 1


def test_dummytable_custom_fields():
    """
    Ensure that dummytable provides a table with the right number of rows
    and that it accepts and uses custom column names provided.
    """
    columns = (
        ('count', partial(pyrandom.randint, 0, 100)),
        ('pet', partial(pyrandom.choice, ['dog', 'cat', 'cow', ])),
        ('color', partial(pyrandom.choice, ['yellow', 'orange', 'brown'])),
        ('value', pyrandom.random),
    )
    rows = 35
    table = dummytable(numrows=rows, fields=columns)
    assert table[0] == ('count', 'pet', 'color', 'value')
    assert len(table) == rows + 1


def test_dummytable_no_seed():
    """
    Ensure that dummytable provides a table with the right number of
    rows and columns when not provided with a seed.
    """
    rows = 35
    table = dummytable(numrows=rows)
    assert len(table[0]) == 3
    assert len(table) == rows + 1
""" rows = 35 seed = 42 table = dummytable(numrows=rows, seed=seed) assert len(table[0]) == 3 assert len(table) == rows + 1 def test_dummytable_class(): """ Ensure that DummyTable provides a table with the right number of rows and columns. """ rows = 70 table = DummyTable(numrows=rows) assert len(table) == rows + 1 petl-1.7.15/petl/test/util/test_statistics.py000066400000000000000000000011321457414240700212610ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division from petl.test.helpers import eq_ from petl.util.statistics import stats def test_stats(): table = (('foo', 'bar', 'baz'), ('A', 1, 2), ('B', '2', '3.4'), ('B', '3', '7.8', True), ('D', 'xyz', 9.0), ('E', None)) result = stats(table, 'bar') eq_(1.0, result.min) eq_(3.0, result.max) eq_(6.0, result.sum) eq_(3, result.count) eq_(2, result.errors) eq_(2.0, result.mean) eq_(2/3, result.pvariance) eq_((2/3)**.5, result.pstdev) petl-1.7.15/petl/test/util/test_timing.py000066400000000000000000000011271457414240700203620ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division from petl.util.counting import nrows from petl.util.timing import progress, log_progress def test_progress(): # make sure progress doesn't raise exception table = (('foo', 'bar', 'baz'), ('a', 1, True), ('b', 2, True), ('b', 3)) nrows(progress(table)) def test_log_progress(): # make sure log_progress doesn't raise exception table = (('foo', 'bar', 'baz'), ('a', 1, True), ('b', 2, True), ('b', 3)) nrows(log_progress(table)) petl-1.7.15/petl/test/util/test_vis.py000066400000000000000000000073701457414240700177020ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division import logging from petl.test.helpers import eq_ import petl as etl from petl.util.vis import look, see, lookstr logger = logging.getLogger(__name__) debug = logger.debug def test_look(): table = (('foo', 'bar'), ('a', 1), ('b', 2)) actual = repr(look(table)) expect = """+-----+-----+ | foo | bar | +=====+=====+ | 'a' | 1 | +-----+-----+ | 'b' | 2 | +-----+-----+ """ eq_(expect, actual) def test_look_irregular_rows(): table = (('foo', 'bar'), ('a',), ('b', 2, True)) actual = repr(look(table)) expect = """+-----+-----+------+ | foo | bar | | +=====+=====+======+ | 'a' | | | +-----+-----+------+ | 'b' | 2 | True | +-----+-----+------+ """ eq_(expect, actual) def test_look_index_header(): table = (('foo', 'bar'), ('a', 1), ('b', 2)) actual = repr(look(table, index_header=True)) expect = """+-------+-------+ | 0|foo | 1|bar | +=======+=======+ | 'a' | 1 | +-------+-------+ | 'b' | 2 | +-------+-------+ """ eq_(expect, actual) def test_look_bool(): table = (('foo', 'bar'), ('a', True), ('b', False)) actual = repr(look(table)) expect = """+-----+-------+ | foo | bar | +=====+=======+ | 'a' | True | +-----+-------+ | 'b' | False | +-----+-------+ """ eq_(expect, actual) def test_look_truncate(): table = (('foo', 'bar'), ('abcd', 1234), ('bcde', 2345)) actual = repr(look(table, truncate=3)) expect = """+-----+-----+ | foo | bar | +=====+=====+ | 'ab | 123 | +-----+-----+ | 'bc | 234 | +-----+-----+ """ eq_(expect, actual) actual = repr(look(table, truncate=3, vrepr=str)) expect = """+-----+-----+ | foo | bar | +=====+=====+ | abc | 123 | +-----+-----+ | bcd | 234 | +-----+-----+ """ eq_(expect, actual) def test_look_width(): table = (('foo', 'bar'), ('a', 1), ('b', 2)) actual = repr(look(table, width=10)) expect = ("+-----+---\n" "| foo | ba\n" "+=====+===\n" "| 'a' | \n" "+-----+---\n" "| 'b' | \n" 
"+-----+---\n") eq_(expect, actual) def test_look_style_simple(): table = (('foo', 'bar'), ('a', 1), ('b', 2)) actual = repr(look(table, style='simple')) expect = """=== === foo bar === === 'a' 1 'b' 2 === === """ eq_(expect, actual) etl.config.look_style = 'simple' actual = repr(look(table)) eq_(expect, actual) etl.config.look_style = 'grid' def test_look_style_minimal(): table = (('foo', 'bar'), ('a', 1), ('b', 2)) actual = repr(look(table, style='minimal')) expect = """foo bar 'a' 1 'b' 2 """ eq_(expect, actual) etl.config.look_style = 'minimal' actual = repr(look(table)) eq_(expect, actual) etl.config.look_style = 'grid' def test_see(): table = (('foo', 'bar'), ('a', 1), ('b', 2)) actual = repr(see(table)) expect = """foo: 'a', 'b' bar: 1, 2 """ eq_(expect, actual) def test_see_index_header(): table = (('foo', 'bar'), ('a', 1), ('b', 2)) actual = repr(see(table, index_header=True)) expect = """0|foo: 'a', 'b' 1|bar: 1, 2 """ eq_(expect, actual) def test_see_duplicateheader(): table = (('foo', 'bar', 'foo'), ('a', 1, 'a_prime'), ('b', 2, 'b_prime')) actual = repr(see(table)) expect = """foo: 'a', 'b' bar: 1, 2 foo: 'a_prime', 'b_prime' """ eq_(expect, actual) def test_lookstr(): table = (('foo', 'bar'), ('a', 1), ('b', 2)) actual = repr(lookstr(table)) expect = """+-----+-----+ | foo | bar | +=====+=====+ | a | 1 | +-----+-----+ | b | 2 | +-----+-----+ """ eq_(expect, actual) def test_look_headerless(): table = [] actual = repr(look(table)) expect = "" eq_(expect, actual) petl-1.7.15/petl/transform/000077500000000000000000000000001457414240700155405ustar00rootroot00000000000000petl-1.7.15/petl/transform/__init__.py000066400000000000000000000046421457414240700176570ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division from petl.transform.basics import cut, cutout, movefield, cat, annex, \ addfield, addfieldusingcontext, addrownumbers, addcolumn, rowslice, head, \ tail, skipcomments, stack, addfields from petl.transform.headers import rename, setheader, extendheader, \ pushheader, skip, prefixheader, suffixheader, sortheader from petl.transform.conversions import convert, convertall, replace, \ replaceall, update, convertnumbers, format, formatall, interpolate, \ interpolateall from petl.transform.sorts import sort, mergesort, issorted from petl.transform.selects import select, selectop, selectcontains, \ selecteq, selectfalse, selectge, selectgt, selectin, selectis, \ selectisinstance, selectisnot, selectle, selectlt, selectne, selectnone, \ selectnotin, selectnotnone, selectrangeclosed, selectrangeopen, \ selectrangeopenleft, selectrangeopenright, selecttrue, \ selectusingcontext, rowlenselect, facet, biselect from petl.transform.joins import join, leftjoin, rightjoin, outerjoin, \ crossjoin, antijoin, lookupjoin, unjoin from petl.transform.hashjoins import hashjoin, hashleftjoin, hashrightjoin, \ hashantijoin, hashlookupjoin from petl.transform.reductions import rowreduce, mergeduplicates,\ aggregate, groupcountdistinctvalues, groupselectfirst, groupselectmax, \ groupselectmin, merge, fold, Conflict, groupselectlast from petl.transform.fills import filldown, fillright, fillleft from petl.transform.regex import capture, split, search, searchcomplement, \ sub, splitdown from petl.transform.reshape import melt, recast, transpose, pivot, flatten, \ unflatten from petl.transform.maps import fieldmap, rowmap, rowmapmany, rowgroupmap from petl.transform.unpacks import unpack, unpackdict from petl.transform.dedup import duplicates, unique, distinct, 
from __future__ import absolute_import, print_function, division


from petl.transform.basics import cut, cutout, movefield, cat, annex, \
    addfield, addfieldusingcontext, addrownumbers, addcolumn, rowslice, \
    head, tail, skipcomments, stack, addfields
from petl.transform.headers import rename, setheader, extendheader, \
    pushheader, skip, prefixheader, suffixheader, sortheader
from petl.transform.conversions import convert, convertall, replace, \
    replaceall, update, convertnumbers, format, formatall, interpolate, \
    interpolateall
from petl.transform.sorts import sort, mergesort, issorted
from petl.transform.selects import select, selectop, selectcontains, \
    selecteq, selectfalse, selectge, selectgt, selectin, selectis, \
    selectisinstance, selectisnot, selectle, selectlt, selectne, \
    selectnone, selectnotin, selectnotnone, selectrangeclosed, \
    selectrangeopen, selectrangeopenleft, selectrangeopenright, \
    selecttrue, selectusingcontext, rowlenselect, facet, biselect
from petl.transform.joins import join, leftjoin, rightjoin, outerjoin, \
    crossjoin, antijoin, lookupjoin, unjoin
from petl.transform.hashjoins import hashjoin, hashleftjoin, \
    hashrightjoin, hashantijoin, hashlookupjoin
from petl.transform.reductions import rowreduce, mergeduplicates, \
    aggregate, groupcountdistinctvalues, groupselectfirst, \
    groupselectmax, groupselectmin, merge, fold, Conflict, groupselectlast
from petl.transform.fills import filldown, fillright, fillleft
from petl.transform.regex import capture, split, search, \
    searchcomplement, sub, splitdown
from petl.transform.reshape import melt, recast, transpose, pivot, \
    flatten, unflatten
from petl.transform.maps import fieldmap, rowmap, rowmapmany, rowgroupmap
from petl.transform.unpacks import unpack, unpackdict
from petl.transform.dedup import duplicates, unique, distinct, \
    conflicts, isunique
from petl.transform.setops import complement, intersection, \
    recordcomplement, diff, recorddiff, hashintersection, hashcomplement
from petl.transform.intervals import intervaljoin, intervalleftjoin, \
    intervaljoinvalues, intervalantijoin, intervallookup, \
    intervallookupone, intervalrecordlookup, intervalrecordlookupone, \
    intervalsubtract, facetintervallookup, facetintervallookupone, \
    facetintervalrecordlookup, facetintervalrecordlookupone, \
    collapsedintervals
from petl.transform.validation import validate


petl-1.7.15/petl/transform/basics.py

from __future__ import absolute_import, print_function, division


# standard library dependencies
from itertools import islice, chain
from collections import deque
from itertools import count

from petl.compat import izip, izip_longest, next, string_types, text_type


# internal dependencies
from petl.util.base import asindices, rowgetter, Record, Table


import logging
logger = logging.getLogger(__name__)
warning = logger.warning
info = logger.info
debug = logger.debug


def cut(table, *args, **kwargs):
    """
    Choose and/or re-order fields. E.g.::

        >>> import petl as etl
        >>> table1 = [['foo', 'bar', 'baz'],
        ...           ['A', 1, 2.7],
        ...           ['B', 2, 3.4],
        ...           ['B', 3, 7.8],
        ...           ['D', 42, 9.0],
        ...           ['E', 12]]
        >>> table2 = etl.cut(table1, 'foo', 'baz')
        >>> table2
        +-----+------+
        | foo | baz  |
        +=====+======+
        | 'A' | 2.7  |
        +-----+------+
        | 'B' | 3.4  |
        +-----+------+
        | 'B' | 7.8  |
        +-----+------+
        | 'D' | 9.0  |
        +-----+------+
        | 'E' | None |
        +-----+------+

        >>> # fields can also be specified by index, starting from zero
        ... table3 = etl.cut(table1, 0, 2)
        >>> table3
        +-----+------+
        | foo | baz  |
        +=====+======+
        | 'A' | 2.7  |
        +-----+------+
        | 'B' | 3.4  |
        +-----+------+
        | 'B' | 7.8  |
        +-----+------+
        | 'D' | 9.0  |
        +-----+------+
        | 'E' | None |
        +-----+------+

        >>> # field names and indices can be mixed
        ... table4 = etl.cut(table1, 'bar', 0)
        >>> table4
        +-----+-----+
        | bar | foo |
        +=====+=====+
        |   1 | 'A' |
        +-----+-----+
        |   2 | 'B' |
        +-----+-----+
        |   3 | 'B' |
        +-----+-----+
        |  42 | 'D' |
        +-----+-----+
        |  12 | 'E' |
        +-----+-----+

        >>> # select a range of fields
        ... table5 = etl.cut(table1, *range(0, 2))
        >>> table5
        +-----+-----+
        | foo | bar |
        +=====+=====+
        | 'A' |   1 |
        +-----+-----+
        | 'B' |   2 |
        +-----+-----+
        | 'B' |   3 |
        +-----+-----+
        | 'D' |  42 |
        +-----+-----+
        | 'E' |  12 |
        +-----+-----+

    Note that any short rows will be padded with `None` values (or whatever
    is provided via the `missing` keyword argument).

    See also :func:`petl.transform.basics.cutout`.
""" # support passing a single list or tuple of fields if len(args) == 1 and isinstance(args[0], (list, tuple)): args = args[0] return CutView(table, args, **kwargs) Table.cut = cut class CutView(Table): def __init__(self, source, spec, missing=None): self.source = source self.spec = spec self.missing = missing def __iter__(self): return itercut(self.source, self.spec, self.missing) def itercut(source, spec, missing=None): it = iter(source) spec = tuple(spec) # make sure no-one can change midstream # convert field selection into field indices try: hdr = next(it) except StopIteration: hdr = [] indices = asindices(hdr, spec) # define a function to transform each row in the source data # according to the field selection transform = rowgetter(*indices) # yield the transformed header yield transform(hdr) # construct the transformed data for row in it: try: yield transform(row) except IndexError: # row is short, let's be kind and fill in any missing fields yield tuple(row[i] if i < len(row) else missing for i in indices) def cutout(table, *args, **kwargs): """ Remove fields. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar', 'baz'], ... ['A', 1, 2.7], ... ['B', 2, 3.4], ... ['B', 3, 7.8], ... ['D', 42, 9.0], ... ['E', 12]] >>> table2 = etl.cutout(table1, 'bar') >>> table2 +-----+------+ | foo | baz | +=====+======+ | 'A' | 2.7 | +-----+------+ | 'B' | 3.4 | +-----+------+ | 'B' | 7.8 | +-----+------+ | 'D' | 9.0 | +-----+------+ | 'E' | None | +-----+------+ See also :func:`petl.transform.basics.cut`. """ return CutOutView(table, args, **kwargs) Table.cutout = cutout class CutOutView(Table): def __init__(self, source, spec, missing=None): self.source = source self.spec = spec self.missing = missing def __iter__(self): return itercutout(self.source, self.spec, self.missing) def itercutout(source, spec, missing=None): it = iter(source) spec = tuple(spec) # make sure no-one can change midstream # convert field selection into field indices try: hdr = next(it) except StopIteration: hdr = [] indicesout = asindices(hdr, spec) indices = [i for i in range(len(hdr)) if i not in indicesout] # define a function to transform each row in the source data # according to the field selection transform = rowgetter(*indices) # yield the transformed header yield transform(hdr) # construct the transformed data for row in it: try: yield transform(row) except IndexError: # row is short, let's be kind and fill in any missing fields yield tuple(row[i] if i < len(row) else missing for i in indices) def cat(*tables, **kwargs): """ Concatenate tables. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar'], ... [1, 'A'], ... [2, 'B']] >>> table2 = [['bar', 'baz'], ... ['C', True], ... ['D', False]] >>> table3 = etl.cat(table1, table2) >>> table3 +------+-----+-------+ | foo | bar | baz | +======+=====+=======+ | 1 | 'A' | None | +------+-----+-------+ | 2 | 'B' | None | +------+-----+-------+ | None | 'C' | True | +------+-----+-------+ | None | 'D' | False | +------+-----+-------+ >>> # can also be used to square up a single table with uneven rows ... table4 = [['foo', 'bar', 'baz'], ... ['A', 1, 2], ... ['B', '2', '3.4'], ... [u'B', u'3', u'7.8', True], ... ['D', 'xyz', 9.0], ... 
['E', None]] >>> table5 = etl.cat(table4) >>> table5 +-----+-------+-------+ | foo | bar | baz | +=====+=======+=======+ | 'A' | 1 | 2 | +-----+-------+-------+ | 'B' | '2' | '3.4' | +-----+-------+-------+ | 'B' | '3' | '7.8' | +-----+-------+-------+ | 'D' | 'xyz' | 9.0 | +-----+-------+-------+ | 'E' | None | None | +-----+-------+-------+ >>> # use the header keyword argument to specify a fixed set of fields ... table6 = [['bar', 'foo'], ... ['A', 1], ... ['B', 2]] >>> table7 = etl.cat(table6, header=['A', 'foo', 'B', 'bar', 'C']) >>> table7 +------+-----+------+-----+------+ | A | foo | B | bar | C | +======+=====+======+=====+======+ | None | 1 | None | 'A' | None | +------+-----+------+-----+------+ | None | 2 | None | 'B' | None | +------+-----+------+-----+------+ >>> # using the header keyword argument with two input tables ... table8 = [['bar', 'foo'], ... ['A', 1], ... ['B', 2]] >>> table9 = [['bar', 'baz'], ... ['C', True], ... ['D', False]] >>> table10 = etl.cat(table8, table9, header=['A', 'foo', 'B', 'bar', 'C']) >>> table10 +------+------+------+-----+------+ | A | foo | B | bar | C | +======+======+======+=====+======+ | None | 1 | None | 'A' | None | +------+------+------+-----+------+ | None | 2 | None | 'B' | None | +------+------+------+-----+------+ | None | None | None | 'C' | None | +------+------+------+-----+------+ | None | None | None | 'D' | None | +------+------+------+-----+------+ Note that the tables do not need to share exactly the same fields, any missing fields will be padded with `None` or whatever is provided via the `missing` keyword argument. Note that this function can be used with a single table argument, in which case it has the effect of ensuring all data rows are the same length as the header row, truncating any long rows and padding any short rows with the value of the `missing` keyword argument. By default, the fields for the output table will be determined as the union of all fields found in the input tables. Use the `header` keyword argument to override this behaviour and specify a fixed set of fields for the output table. """ return CatView(tables, **kwargs) Table.cat = cat class CatView(Table): def __init__(self, sources, missing=None, header=None): self.sources = sources self.missing = missing self.header = header def __iter__(self): return itercat(self.sources, self.missing, self.header) def itercat(sources, missing, header): its = [iter(t) for t in sources] hdrs = [] for it in its: try: hdrs.append(list(next(it))) except StopIteration: hdrs.append([]) if header is None: # determine output fields by gathering all fields found in the sources outhdr = list(hdrs[0]) for hdr in hdrs[1:]: for h in hdr: if h not in outhdr: # add any new fields as we find them outhdr.append(h) else: # predetermined output fields outhdr = header yield tuple(outhdr) # output data rows for hdr, it in zip(hdrs, its): # now construct and yield the data rows for row in it: outrow = list() for h in outhdr: val = missing try: val = row[hdr.index(h)] except IndexError: # short row pass except ValueError: # field not in table pass outrow.append(val) yield tuple(outrow) def stack(*tables, **kwargs): """Concatenate tables, without trying to match headers. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar'], ... [1, 'A'], ... [2, 'B']] >>> table2 = [['bar', 'baz'], ... ['C', True], ... 
['D', False]] >>> table3 = etl.stack(table1, table2) >>> table3 +-----+-------+ | foo | bar | +=====+=======+ | 1 | 'A' | +-----+-------+ | 2 | 'B' | +-----+-------+ | 'C' | True | +-----+-------+ | 'D' | False | +-----+-------+ >>> # can also be used to square up a single table with uneven rows ... table4 = [['foo', 'bar', 'baz'], ... ['A', 1, 2], ... ['B', '2', '3.4'], ... [u'B', u'3', u'7.8', True], ... ['D', 'xyz', 9.0], ... ['E', None]] >>> table5 = etl.stack(table4) >>> table5 +-----+-------+-------+ | foo | bar | baz | +=====+=======+=======+ | 'A' | 1 | 2 | +-----+-------+-------+ | 'B' | '2' | '3.4' | +-----+-------+-------+ | 'B' | '3' | '7.8' | +-----+-------+-------+ | 'D' | 'xyz' | 9.0 | +-----+-------+-------+ | 'E' | None | None | +-----+-------+-------+ Similar to :func:`petl.transform.basics.cat` except that no attempt is made to align fields from different tables. Data rows are simply emitted in order, trimmed or padded to the length of the header row from the first table. .. versionadded:: 1.1.0 """ return StackView(tables, **kwargs) Table.stack = stack class StackView(Table): def __init__(self, sources, missing=None, trim=True, pad=True): self.sources = sources self.missing = missing self.trim = trim self.pad = pad def __iter__(self): return iterstack(self.sources, self.missing, self.trim, self.pad) def iterstack(sources, missing, trim, pad): its = [iter(t) for t in sources] hdrs = [] for it in its: try: hdrs.append(next(it)) except StopIteration: hdrs.append([]) hdr = hdrs[0] n = len(hdr) yield tuple(hdr) for it in its: for row in it: outrow = tuple(row) if trim: outrow = outrow[:n] if pad and len(outrow) < n: outrow += (missing,) * (n - len(outrow)) yield outrow def addfield(table, field, value=None, index=None, missing=None): """ Add a field with a fixed or calculated value. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar'], ... ['M', 12], ... ['F', 34], ... ['-', 56]] >>> # using a fixed value ... table2 = etl.addfield(table1, 'baz', 42) >>> table2 +-----+-----+-----+ | foo | bar | baz | +=====+=====+=====+ | 'M' | 12 | 42 | +-----+-----+-----+ | 'F' | 34 | 42 | +-----+-----+-----+ | '-' | 56 | 42 | +-----+-----+-----+ >>> # calculating the value ... table2 = etl.addfield(table1, 'baz', lambda rec: rec['bar'] * 2) >>> table2 +-----+-----+-----+ | foo | bar | baz | +=====+=====+=====+ | 'M' | 12 | 24 | +-----+-----+-----+ | 'F' | 34 | 68 | +-----+-----+-----+ | '-' | 56 | 112 | +-----+-----+-----+ Use the `index` parameter to control the position of the inserted field. 
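    For example, a minimal sketch placing the new field at the start of each
    row (``table1`` as above; the new field name ``quux`` is illustrative and
    rendered output is omitted)::

        >>> table3 = etl.addfield(table1, 'quux', 99, index=0)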
""" return AddFieldView(table, field, value=value, index=index, missing=missing) Table.addfield = addfield class AddFieldView(Table): def __init__(self, source, field, value=None, index=None, missing=None): # ensure rows are all the same length self.source = stack(source, missing=missing) self.field = field self.value = value self.index = index def __iter__(self): return iteraddfield(self.source, self.field, self.value, self.index) def iteraddfield(source, field, value, index): it = iter(source) try: hdr = next(it) except StopIteration: hdr = [] flds = list(map(text_type, hdr)) # determine index of new field if index is None: index = len(hdr) # construct output fields outhdr = list(hdr) outhdr.insert(index, field) yield tuple(outhdr) if callable(value): # wrap rows as records if using calculated value it = (Record(row, flds) for row in it) for row in it: outrow = list(row) v = value(row) outrow.insert(index, v) yield tuple(outrow) else: for row in it: outrow = list(row) outrow.insert(index, value) yield tuple(outrow) def addfields(table, field_defs, missing=None): """ Add fields with fixed or calculated values. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar'], ... ['M', 12], ... ['F', 34], ... ['-', 56]] >>> # using a fixed value or a calculation ... table2 = etl.addfields(table1, ... [('baz', 42), ... ('luhrmann', lambda rec: rec['bar'] * 2)]) >>> table2 +-----+-----+-----+----------+ | foo | bar | baz | luhrmann | +=====+=====+=====+==========+ | 'M' | 12 | 42 | 24 | +-----+-----+-----+----------+ | 'F' | 34 | 42 | 68 | +-----+-----+-----+----------+ | '-' | 56 | 42 | 112 | +-----+-----+-----+----------+ >>> # you can specify an index as a 3rd item in each tuple -- indicies ... # are evaluated in order. ... table2 = etl.addfields(table1, ... [('baz', 42, 0), ... ('luhrmann', lambda rec: rec['bar'] * 2, 0)]) >>> table2 +----------+-----+-----+-----+ | luhrmann | baz | foo | bar | +==========+=====+=====+=====+ | 24 | 42 | 'M' | 12 | +----------+-----+-----+-----+ | 68 | 42 | 'F' | 34 | +----------+-----+-----+-----+ | 112 | 42 | '-' | 56 | +----------+-----+-----+-----+ """ return AddFieldsView(table, field_defs, missing=missing) Table.addfields = addfields class AddFieldsView(Table): def __init__(self, source, field_defs, missing=None): # ensure rows are all the same length self.source = stack(source, missing=missing) # convert tuples to FieldDefinitions, if necessary self.field_defs = field_defs def __iter__(self): return iteraddfields(self.source, self.field_defs) def iteraddfields(source, field_defs): it = iter(source) try: hdr = next(it) except StopIteration: hdr = [] flds = list(map(text_type, hdr)) # initialize output fields and indices outhdr = list(hdr) value_indexes = [] for fdef in field_defs: # determine the defined field index if len(fdef) == 2: name, value = fdef index = len(outhdr) else: name, value, index = fdef # insert the name into the header at the appropriate index outhdr.insert(index, name) # remember the value/index pairs for later value_indexes.append((value, index)) yield tuple(outhdr) for row in it: outrow = list(row) # add each defined field into the row at the appropriate index for value, index in value_indexes: if callable(value): # wrap row as record if using calculated value row = Record(row, flds) v = value(row) outrow.insert(index, v) else: outrow.insert(index, value) yield tuple(outrow) def rowslice(table, *sliceargs): """ Choose a subsequence of data rows. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar'], ... ['a', 1], ... 
['b', 2], ... ['c', 5], ... ['d', 7], ... ['f', 42]] >>> table2 = etl.rowslice(table1, 2) >>> table2 +-----+-----+ | foo | bar | +=====+=====+ | 'a' | 1 | +-----+-----+ | 'b' | 2 | +-----+-----+ >>> table3 = etl.rowslice(table1, 1, 4) >>> table3 +-----+-----+ | foo | bar | +=====+=====+ | 'b' | 2 | +-----+-----+ | 'c' | 5 | +-----+-----+ | 'd' | 7 | +-----+-----+ >>> table4 = etl.rowslice(table1, 0, 5, 2) >>> table4 +-----+-----+ | foo | bar | +=====+=====+ | 'a' | 1 | +-----+-----+ | 'c' | 5 | +-----+-----+ | 'f' | 42 | +-----+-----+ Positional arguments are used to slice the data rows. The `sliceargs` are passed through to :func:`itertools.islice`. See also :func:`petl.transform.basics.head`, :func:`petl.transform.basics.tail`. """ return RowSliceView(table, *sliceargs) Table.rowslice = rowslice class RowSliceView(Table): def __init__(self, source, *sliceargs): self.source = source if not sliceargs: self.sliceargs = (None,) else: self.sliceargs = sliceargs def __iter__(self): return iterrowslice(self.source, self.sliceargs) def iterrowslice(source, sliceargs): it = iter(source) try: yield tuple(next(it)) # fields except StopIteration: return for row in islice(it, *sliceargs): yield tuple(row) def head(table, n=5): """ Select the first `n` data rows. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar'], ... ['a', 1], ... ['b', 2], ... ['c', 5], ... ['d', 7], ... ['f', 42], ... ['f', 3], ... ['h', 90]] >>> table2 = etl.head(table1, 4) >>> table2 +-----+-----+ | foo | bar | +=====+=====+ | 'a' | 1 | +-----+-----+ | 'b' | 2 | +-----+-----+ | 'c' | 5 | +-----+-----+ | 'd' | 7 | +-----+-----+ See also :func:`petl.transform.basics.tail`, :func:`petl.transform.basics.rowslice`. """ return rowslice(table, n) Table.head = head def tail(table, n=5): """ Select the last `n` data rows. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar'], ... ['a', 1], ... ['b', 2], ... ['c', 5], ... ['d', 7], ... ['f', 42], ... ['f', 3], ... ['h', 90], ... ['k', 12], ... ['l', 77], ... ['q', 2]] >>> table2 = etl.tail(table1, 4) >>> table2 +-----+-----+ | foo | bar | +=====+=====+ | 'h' | 90 | +-----+-----+ | 'k' | 12 | +-----+-----+ | 'l' | 77 | +-----+-----+ | 'q' | 2 | +-----+-----+ See also :func:`petl.transform.basics.head`, :func:`petl.transform.basics.rowslice`. """ return TailView(table, n) Table.tail = tail class TailView(Table): def __init__(self, source, n): self.source = source self.n = n def __iter__(self): return itertail(self.source, self.n) def itertail(source, n): it = iter(source) try: yield tuple(next(it)) # fields except StopIteration: return # stop generating cache = deque() for row in it: cache.append(row) if len(cache) > n: cache.popleft() for row in cache: yield tuple(row) def skipcomments(table, prefix): """ Skip any row where the first value is a string and starts with `prefix`. E.g.:: >>> import petl as etl >>> table1 = [['##aaa', 'bbb', 'ccc'], ... ['##mmm',], ... ['#foo', 'bar'], ... ['##nnn', 1], ... ['a', 1], ... ['b', 2]] >>> table2 = etl.skipcomments(table1, '##') >>> table2 +------+-----+ | #foo | bar | +======+=====+ | 'a' | 1 | +------+-----+ | 'b' | 2 | +------+-----+ Use the `prefix` parameter to determine which string to consider as indicating a comment. 
""" return SkipCommentsView(table, prefix) Table.skipcomments = skipcomments class SkipCommentsView(Table): def __init__(self, source, prefix): self.source = source self.prefix = prefix def __iter__(self): return iterskipcomments(self.source, self.prefix) def iterskipcomments(source, prefix): return (row for row in source if (len(row) > 0 and not(isinstance(row[0], string_types) and row[0].startswith(prefix)))) def movefield(table, field, index): """ Move a field to a new position. """ return MoveFieldView(table, field, index) Table.movefield = movefield class MoveFieldView(Table): def __init__(self, table, field, index, missing=None): self.table = table self.field = field self.index = index self.missing = missing def __iter__(self): it = iter(self.table) # determine output fields try: hdr = next(it) except StopIteration: hdr = [] outhdr = [f for f in hdr if f != self.field] outhdr.insert(self.index, self.field) yield tuple(outhdr) # define a function to transform each row in the source data # according to the field selection outflds = list(map(str, outhdr)) indices = asindices(hdr, outflds) transform = rowgetter(*indices) # construct the transformed data for row in it: try: yield transform(row) except IndexError: # row is short, let's be kind and fill in any missing fields yield tuple(row[i] if i < len(row) else self.missing for i in indices) def annex(*tables, **kwargs): """ Join two or more tables by row order. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar'], ... ['A', 9], ... ['C', 2], ... ['F', 1]] >>> table2 = [['foo', 'baz'], ... ['B', 3], ... ['D', 10]] >>> table3 = etl.annex(table1, table2) >>> table3 +-----+-----+------+------+ | foo | bar | foo | baz | +=====+=====+======+======+ | 'A' | 9 | 'B' | 3 | +-----+-----+------+------+ | 'C' | 2 | 'D' | 10 | +-----+-----+------+------+ | 'F' | 1 | None | None | +-----+-----+------+------+ See also :func:`petl.transform.joins.join`. """ return AnnexView(tables, **kwargs) Table.annex = annex class AnnexView(Table): def __init__(self, tables, missing=None): self.tables = tables self.missing = missing def __iter__(self): return iterannex(self.tables, self.missing) def iterannex(tables, missing): its = [iter(t) for t in tables] hdrs = [] for it in its: try: hdrs.append(next(it)) except StopIteration: hdrs.append([]) outhdr = tuple(chain(*hdrs)) yield outhdr for rows in izip_longest(*its): outrow = list() for i, row in enumerate(rows): lh = len(hdrs[i]) if row is None: # handle uneven length tables row = [missing] * len(hdrs[i]) else: lr = len(row) if lr < lh: # handle short rows row = list(row) row.extend([missing] * (lh-lr)) elif lr > lh: # handle long rows row = row[:lh] outrow.extend(row) yield tuple(outrow) def addrownumbers(table, start=1, step=1, field='row'): """ Add a field of row numbers. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar'], ... ['A', 9], ... ['C', 2], ... ['F', 1]] >>> table2 = etl.addrownumbers(table1) >>> table2 +-----+-----+-----+ | row | foo | bar | +=====+=====+=====+ | 1 | 'A' | 9 | +-----+-----+-----+ | 2 | 'C' | 2 | +-----+-----+-----+ | 3 | 'F' | 1 | +-----+-----+-----+ Parameters `start` and `step` control the numbering. 
""" return AddRowNumbersView(table, start, step, field) Table.addrownumbers = addrownumbers class AddRowNumbersView(Table): def __init__(self, table, start=1, step=1, field='row'): self.table = table self.start = start self.step = step self.field = field def __iter__(self): return iteraddrownumbers(self.table, self.start, self.step, self.field) def iteraddrownumbers(table, start, step, field): it = iter(table) try: hdr = next(it) except StopIteration: hdr = [] outhdr = [field] outhdr.extend(hdr) yield tuple(outhdr) for row, n in izip(it, count(start, step)): outrow = [n] outrow.extend(row) yield tuple(outrow) def addcolumn(table, field, col, index=None, missing=None): """ Add a column of data to the table. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar'], ... ['A', 1], ... ['B', 2]] >>> col = [True, False] >>> table2 = etl.addcolumn(table1, 'baz', col) >>> table2 +-----+-----+-------+ | foo | bar | baz | +=====+=====+=======+ | 'A' | 1 | True | +-----+-----+-------+ | 'B' | 2 | False | +-----+-----+-------+ Use the `index` parameter to control the position of the new column. """ return AddColumnView(table, field, col, index=index, missing=missing) Table.addcolumn = addcolumn class AddColumnView(Table): def __init__(self, table, field, col, index=None, missing=None): self._table = table self._field = field self._col = col self._index = index self._missing = missing def __iter__(self): return iteraddcolumn(self._table, self._field, self._col, self._index, self._missing) def iteraddcolumn(table, field, col, index, missing): it = iter(table) try: hdr = next(it) except StopIteration: hdr = [] # determine position of new column if index is None: index = len(hdr) # construct output header outhdr = list(hdr) outhdr.insert(index, field) yield tuple(outhdr) # construct output data for row, val in izip_longest(it, col, fillvalue=missing): # run out of rows? if row == missing: row = [missing] * len(hdr) outrow = list(row) outrow.insert(index, val) yield tuple(outrow) class TransformError(Exception): pass def addfieldusingcontext(table, field, query): """ Like :func:`petl.transform.basics.addfield` but the `query` function is passed the previous, current and next rows, so values may be calculated based on data in adjacent rows. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar'], ... ['A', 1], ... ['B', 4], ... ['C', 5], ... ['D', 9]] >>> def upstream(prv, cur, nxt): ... if prv is None: ... return None ... else: ... return cur.bar - prv.bar ... >>> def downstream(prv, cur, nxt): ... if nxt is None: ... return None ... else: ... return nxt.bar - cur.bar ... >>> table2 = etl.addfieldusingcontext(table1, 'baz', upstream) >>> table3 = etl.addfieldusingcontext(table2, 'quux', downstream) >>> table3 +-----+-----+------+------+ | foo | bar | baz | quux | +=====+=====+======+======+ | 'A' | 1 | None | 3 | +-----+-----+------+------+ | 'B' | 4 | 3 | 1 | +-----+-----+------+------+ | 'C' | 5 | 1 | 4 | +-----+-----+------+------+ | 'D' | 9 | 4 | None | +-----+-----+------+------+ The `field` parameter is the name of the field to be added. The `query` parameter is a function operating on the current, previous and next rows and returning the value. 
""" return AddFieldUsingContextView(table, field, query) Table.addfieldusingcontext = addfieldusingcontext class AddFieldUsingContextView(Table): def __init__(self, table, field, query): self.table = table self.field = field self.query = query def __iter__(self): return iteraddfieldusingcontext(self.table, self.field, self.query) def iteraddfieldusingcontext(table, field, query): it = iter(table) try: hdr = tuple(next(it)) except StopIteration: hdr = () flds = list(map(text_type, hdr)) yield hdr + (field,) flds.append(field) it = (Record(row, flds) for row in it) prv = None try: cur = next(it) except StopIteration: return # no more items for nxt in it: v = query(prv, cur, nxt) yield tuple(cur) + (v,) prv = Record(tuple(cur) + (v,), flds) cur = nxt # handle last row v = query(prv, cur, None) yield tuple(cur) + (v,) petl-1.7.15/petl/transform/conversions.py000066400000000000000000000407151457414240700204710ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division from petl.compat import next, integer_types, string_types, text_type import petl.config as config from petl.errors import ArgumentError, FieldSelectionError from petl.util.base import Table, expr, fieldnames, Record from petl.util.parsers import numparser def convert(table, *args, **kwargs): """Transform values under one or more fields via arbitrary functions, method invocations or dictionary translations. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar', 'baz'], ... ['A', '2.4', 12], ... ['B', '5.7', 34], ... ['C', '1.2', 56]] >>> # using a built-in function: ... table2 = etl.convert(table1, 'bar', float) >>> table2 +-----+-----+-----+ | foo | bar | baz | +=====+=====+=====+ | 'A' | 2.4 | 12 | +-----+-----+-----+ | 'B' | 5.7 | 34 | +-----+-----+-----+ | 'C' | 1.2 | 56 | +-----+-----+-----+ >>> # using a lambda function:: ... table3 = etl.convert(table1, 'baz', lambda v: v*2) >>> table3 +-----+-------+-----+ | foo | bar | baz | +=====+=======+=====+ | 'A' | '2.4' | 24 | +-----+-------+-----+ | 'B' | '5.7' | 68 | +-----+-------+-----+ | 'C' | '1.2' | 112 | +-----+-------+-----+ >>> # a method of the data value can also be invoked by passing ... # the method name ... table4 = etl.convert(table1, 'foo', 'lower') >>> table4 +-----+-------+-----+ | foo | bar | baz | +=====+=======+=====+ | 'a' | '2.4' | 12 | +-----+-------+-----+ | 'b' | '5.7' | 34 | +-----+-------+-----+ | 'c' | '1.2' | 56 | +-----+-------+-----+ >>> # arguments to the method invocation can also be given ... table5 = etl.convert(table1, 'foo', 'replace', 'A', 'AA') >>> table5 +------+-------+-----+ | foo | bar | baz | +======+=======+=====+ | 'AA' | '2.4' | 12 | +------+-------+-----+ | 'B' | '5.7' | 34 | +------+-------+-----+ | 'C' | '1.2' | 56 | +------+-------+-----+ >>> # values can also be translated via a dictionary ... table7 = etl.convert(table1, 'foo', {'A': 'Z', 'B': 'Y'}) >>> table7 +-----+-------+-----+ | foo | bar | baz | +=====+=======+=====+ | 'Z' | '2.4' | 12 | +-----+-------+-----+ | 'Y' | '5.7' | 34 | +-----+-------+-----+ | 'C' | '1.2' | 56 | +-----+-------+-----+ >>> # the same conversion can be applied to multiple fields ... table8 = etl.convert(table1, ('foo', 'bar', 'baz'), str) >>> table8 +-----+-------+------+ | foo | bar | baz | +=====+=======+======+ | 'A' | '2.4' | '12' | +-----+-------+------+ | 'B' | '5.7' | '34' | +-----+-------+------+ | 'C' | '1.2' | '56' | +-----+-------+------+ >>> # multiple conversions can be specified at the same time ... table9 = etl.convert(table1, {'foo': 'lower', ... 
'bar': float, ... 'baz': lambda v: v * 2}) >>> table9 +-----+-----+-----+ | foo | bar | baz | +=====+=====+=====+ | 'a' | 2.4 | 24 | +-----+-----+-----+ | 'b' | 5.7 | 68 | +-----+-----+-----+ | 'c' | 1.2 | 112 | +-----+-----+-----+ >>> # ...or alternatively via a list ... table10 = etl.convert(table1, ['lower', float, lambda v: v*2]) >>> table10 +-----+-----+-----+ | foo | bar | baz | +=====+=====+=====+ | 'a' | 2.4 | 24 | +-----+-----+-----+ | 'b' | 5.7 | 68 | +-----+-----+-----+ | 'c' | 1.2 | 112 | +-----+-----+-----+ >>> # conversion can be conditional ... table11 = etl.convert(table1, 'baz', lambda v: v * 2, ... where=lambda r: r.foo == 'B') >>> table11 +-----+-------+-----+ | foo | bar | baz | +=====+=======+=====+ | 'A' | '2.4' | 12 | +-----+-------+-----+ | 'B' | '5.7' | 68 | +-----+-------+-----+ | 'C' | '1.2' | 56 | +-----+-------+-----+ >>> # conversion can access other values from the same row ... table12 = etl.convert(table1, 'baz', ... lambda v, row: v * float(row.bar), ... pass_row=True) >>> table12 +-----+-------+--------------------+ | foo | bar | baz | +=====+=======+====================+ | 'A' | '2.4' | 28.799999999999997 | +-----+-------+--------------------+ | 'B' | '5.7' | 193.8 | +-----+-------+--------------------+ | 'C' | '1.2' | 67.2 | +-----+-------+--------------------+ >>> # conversion can use a custom function >>> def my_func(val, row): ... return float(row.bar) + row.baz ... >>> table13 = etl.convert(table1, 'foo', my_func, pass_row=True) >>> table13 +------+-------+-----+ | foo | bar | baz | +======+=======+=====+ | 14.4 | '2.4' | 12 | +------+-------+-----+ | 39.7 | '5.7' | 34 | +------+-------+-----+ | 57.2 | '1.2' | 56 | +------+-------+-----+ Note that either field names or indexes can be given. The ``where`` keyword argument can be given with a callable or expression which is evaluated on each row and which should return True if the conversion should be applied on that row, else False. The ``pass_row`` keyword argument can be given, which if True will mean that both the value and the containing row will be passed as arguments to the conversion function (so, i.e., the conversion function should accept two arguments). When multiple fields are converted in a single call, the conversions are independent of each other. Each conversion sees the original row:: >>> # multiple conversions do not affect each other ... table13 = etl.convert(table1, { ... "foo": lambda foo, row: row.bar, ... "bar": lambda bar, row: row.foo, ... }, pass_row=True) >>> table13 +-------+-----+-----+ | foo | bar | baz | +=======+=====+=====+ | '2.4' | 'A' | 12 | +-------+-----+-----+ | '5.7' | 'B' | 34 | +-------+-----+-----+ | '1.2' | 'C' | 56 | +-------+-----+-----+ Also accepts `failonerror` and `errorvalue` keyword arguments, documented under :func:`petl.config.failonerror` """ converters = None if len(args) == 0: # no conversion specified, can be set afterwards via suffix notation pass elif len(args) == 1: converters = args[0] elif len(args) > 1: converters = dict() # assume first arg is field name or spec field = args[0] if len(args) == 2: conv = args[1] else: conv = args[1:] if isinstance(field, (list, tuple)): # allow for multiple fields for f in field: converters[f] = conv else: converters[field] = conv return FieldConvertView(table, converters, **kwargs) Table.convert = convert def convertall(table, *args, **kwargs): """ Convenience function to convert all fields in the table using a common function or mapping. See also :func:`convert`. 
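    E.g., a minimal sketch converting every value to an integer (illustrative
    table; rendered output omitted)::

        >>> import petl as etl
        >>> table1 = [['foo', 'bar'], ['1', '2'], ['3', '4']]
        >>> table2 = etl.convertall(table1, int)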
The ``where`` keyword argument can be given with a callable or expression which is evaluated on each row and which should return True if the conversion should be applied on that row, else False. """ # TODO don't read the data twice! return convert(table, fieldnames(table), *args, **kwargs) Table.convertall = convertall def replace(table, field, a, b, **kwargs): """ Convenience function to replace all occurrences of `a` with `b` under the given field. See also :func:`convert`. The ``where`` keyword argument can be given with a callable or expression which is evaluated on each row and which should return True if the conversion should be applied on that row, else False. """ return convert(table, field, {a: b}, **kwargs) Table.replace = replace def replaceall(table, a, b, **kwargs): """ Convenience function to replace all instances of `a` with `b` under all fields. See also :func:`convertall`. The ``where`` keyword argument can be given with a callable or expression which is evaluated on each row and which should return True if the conversion should be applied on that row, else False. """ return convertall(table, {a: b}, **kwargs) Table.replaceall = replaceall def update(table, field, value, **kwargs): """ Convenience function to convert a field to a fixed value. Accepts the ``where`` keyword argument. See also :func:`convert`. """ return convert(table, field, lambda v: value, **kwargs) Table.update = update def convertnumbers(table, strict=False, **kwargs): """ Convenience function to convert all field values to numbers where possible. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar', 'baz', 'quux'], ... ['1', '3.0', '9+3j', 'aaa'], ... ['2', '1.3', '7+2j', None]] >>> table2 = etl.convertnumbers(table1) >>> table2 +-----+-----+--------+-------+ | foo | bar | baz | quux | +=====+=====+========+=======+ | 1 | 3.0 | (9+3j) | 'aaa' | +-----+-----+--------+-------+ | 2 | 1.3 | (7+2j) | None | +-----+-----+--------+-------+ """ return convertall(table, numparser(strict), **kwargs) Table.convertnumbers = convertnumbers class FieldConvertView(Table): def __init__(self, source, converters=None, failonerror=None, errorvalue=None, where=None, pass_row=False): self.source = source if converters is None: self.converters = dict() elif isinstance(converters, dict): self.converters = converters elif isinstance(converters, (tuple, list)): self.converters = dict([(i, v) for i, v in enumerate(converters)]) else: raise ArgumentError('unexpected converters: %r' % converters) self.failonerror = (config.failonerror if failonerror is None else failonerror) self.errorvalue = errorvalue self.where = where self.pass_row = pass_row def __iter__(self): return iterfieldconvert(self.source, self.converters, self.failonerror, self.errorvalue, self.where, self.pass_row) def __setitem__(self, key, value): self.converters[key] = value def iterfieldconvert(source, converters, failonerror, errorvalue, where, pass_row): # grab the fields in the source table it = iter(source) try: hdr = next(it) flds = list(map(text_type, hdr)) yield tuple(hdr) # these are not modified except StopIteration: hdr = flds = [] # converters will fail selecting a field # build converter functions converter_functions = dict() for k, c in converters.items(): # turn field names into row indices if not isinstance(k, integer_types): try: k = flds.index(k) except ValueError: # not in list raise FieldSelectionError(k) assert isinstance(k, int), 'expected integer, found %r' % k # is converter a function? 
if callable(c): converter_functions[k] = c # is converter a method name? elif isinstance(c, string_types): converter_functions[k] = methodcaller(c) # is converter a method name with arguments? elif isinstance(c, (tuple, list)) and isinstance(c[0], string_types): methnm = c[0] methargs = c[1:] converter_functions[k] = methodcaller(methnm, *methargs) # is converter a dictionary? elif isinstance(c, dict): converter_functions[k] = dictconverter(c) # is it something else? elif c is None: pass # ignore else: raise ArgumentError( 'unexpected converter specification on field %r: %r' % (k, c) ) # define a function to transform a value def transform_value(i, v, *args): if i not in converter_functions: # no converter defined on this field, return value as-is return v else: try: return converter_functions[i](v, *args) except Exception as e: if failonerror == 'inline': return e elif failonerror: raise e else: return errorvalue # define a function to transform a row if pass_row: def transform_row(_row): return tuple(transform_value(i, v, _row) for i, v in enumerate(_row)) else: def transform_row(_row): return tuple(transform_value(i, v) for i, v in enumerate(_row)) # prepare where function if isinstance(where, string_types): where = expr(where) elif where is not None: assert callable(where), 'expected callable for "where" argument, ' \ 'found %r' % where # prepare iterator if pass_row or where: # wrap rows as records it = (Record(row, flds) for row in it) # construct the data rows if where is None: # simple case, transform all rows for row in it: yield transform_row(row) else: # conditionally transform rows for row in it: if where(row): yield transform_row(row) else: yield row def methodcaller(nm, *args): return lambda v: getattr(v, nm)(*args) def dictconverter(d): def conv(v): try: if v in d: return d[v] else: return v except TypeError: # value is not hashable return v return conv def format(table, field, fmt, **kwargs): """ Convenience function to format all values in the given `field` using the `fmt` format string. The ``where`` keyword argument can be given with a callable or expression which is evaluated on each row and which should return True if the conversion should be applied on that row, else False. """ conv = lambda v: fmt.format(v) return convert(table, field, conv, **kwargs) Table.format = format def formatall(table, fmt, **kwargs): """ Convenience function to format all values in all fields using the `fmt` format string. The ``where`` keyword argument can be given with a callable or expression which is evaluated on each row and which should return True if the conversion should be applied on that row, else False. """ conv = lambda v: fmt.format(v) return convertall(table, conv, **kwargs) Table.formatall = formatall def interpolate(table, field, fmt, **kwargs): """ Convenience function to interpolate all values in the given `field` using the `fmt` string. The ``where`` keyword argument can be given with a callable or expression which is evaluated on each row and which should return True if the conversion should be applied on that row, else False. """ conv = lambda v: fmt % v return convert(table, field, conv, **kwargs) Table.interpolate = interpolate def interpolateall(table, fmt, **kwargs): """ Convenience function to interpolate all values in all fields using the `fmt` string. The ``where`` keyword argument can be given with a callable or expression which is evaluated on each row and which should return True if the conversion should be applied on that row, else False. 
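    E.g., a minimal sketch formatting every value to one decimal place (an
    illustrative table of floats, so the ``%`` interpolation is valid for all
    values; rendered output omitted)::

        >>> import petl as etl
        >>> table1 = [['foo', 'bar'], [1.25, 2.5], [3.755, 4.0]]
        >>> table2 = etl.interpolateall(table1, '%.1f')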
""" conv = lambda v: fmt % v return convertall(table, conv, **kwargs) Table.interpolateall = interpolateall petl-1.7.15/petl/transform/dedup.py000066400000000000000000000362561457414240700172270ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division import operator from petl.compat import text_type from petl.util.base import Table, asindices, itervalues from petl.transform.sorts import sort def duplicates(table, key=None, presorted=False, buffersize=None, tempdir=None, cache=True): """ Select rows with duplicate values under a given key (or duplicate rows where no key is given). E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar', 'baz'], ... ['A', 1, 2.0], ... ['B', 2, 3.4], ... ['D', 6, 9.3], ... ['B', 3, 7.8], ... ['B', 2, 12.3], ... ['E', None, 1.3], ... ['D', 4, 14.5]] >>> table2 = etl.duplicates(table1, 'foo') >>> table2 +-----+-----+------+ | foo | bar | baz | +=====+=====+======+ | 'B' | 2 | 3.4 | +-----+-----+------+ | 'B' | 3 | 7.8 | +-----+-----+------+ | 'B' | 2 | 12.3 | +-----+-----+------+ | 'D' | 6 | 9.3 | +-----+-----+------+ | 'D' | 4 | 14.5 | +-----+-----+------+ >>> # compound keys are supported ... table3 = etl.duplicates(table1, key=['foo', 'bar']) >>> table3 +-----+-----+------+ | foo | bar | baz | +=====+=====+======+ | 'B' | 2 | 3.4 | +-----+-----+------+ | 'B' | 2 | 12.3 | +-----+-----+------+ If `presorted` is True, it is assumed that the data are already sorted by the given key, and the `buffersize`, `tempdir` and `cache` arguments are ignored. Otherwise, the data are sorted, see also the discussion of the `buffersize`, `tempdir` and `cache` arguments under the :func:`petl.transform.sorts.sort` function. See also :func:`petl.transform.dedup.unique` and :func:`petl.transform.dedup.distinct`. """ return DuplicatesView(table, key=key, presorted=presorted, buffersize=buffersize, tempdir=tempdir, cache=cache) Table.duplicates = duplicates class DuplicatesView(Table): def __init__(self, source, key=None, presorted=False, buffersize=None, tempdir=None, cache=True): if presorted: self.source = source else: self.source = sort(source, key, buffersize=buffersize, tempdir=tempdir, cache=cache) self.key = key def __iter__(self): return iterduplicates(self.source, self.key) def iterduplicates(source, key): # assume source is sorted # first need to sort the data it = iter(source) try: hdr = next(it) except StopIteration: if key is None: return # nothing to do on a table without headers hdr = [] yield tuple(hdr) # convert field selection into field indices if key is None: indices = range(len(hdr)) else: indices = asindices(hdr, key) # now use field indices to construct a _getkey function # N.B., this may raise an exception on short rows, depending on # the field selection getkey = operator.itemgetter(*indices) previous = None previous_yielded = False for row in it: if previous is None: previous = row else: kprev = getkey(previous) kcurr = getkey(row) if kprev == kcurr: if not previous_yielded: yield tuple(previous) previous_yielded = True yield tuple(row) else: # reset previous_yielded = False previous = row def unique(table, key=None, presorted=False, buffersize=None, tempdir=None, cache=True): """ Select rows with unique values under a given key (or unique rows if no key is given). E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar', 'baz'], ... ['A', 1, 2], ... ['B', '2', '3.4'], ... ['D', 'xyz', 9.0], ... ['B', u'3', u'7.8'], ... ['B', '2', 42], ... ['E', None, None], ... ['D', 4, 12.3], ... 
['F', 7, 2.3]] >>> table2 = etl.unique(table1, 'foo') >>> table2 +-----+------+------+ | foo | bar | baz | +=====+======+======+ | 'A' | 1 | 2 | +-----+------+------+ | 'E' | None | None | +-----+------+------+ | 'F' | 7 | 2.3 | +-----+------+------+ If `presorted` is True, it is assumed that the data are already sorted by the given key, and the `buffersize`, `tempdir` and `cache` arguments are ignored. Otherwise, the data are sorted, see also the discussion of the `buffersize`, `tempdir` and `cache` arguments under the :func:`petl.transform.sorts.sort` function. See also :func:`petl.transform.dedup.duplicates` and :func:`petl.transform.dedup.distinct`. """ return UniqueView(table, key=key, presorted=presorted, buffersize=buffersize, tempdir=tempdir, cache=cache) Table.unique = unique class UniqueView(Table): def __init__(self, source, key=None, presorted=False, buffersize=None, tempdir=None, cache=True): if presorted: self.source = source else: self.source = sort(source, key, buffersize=buffersize, tempdir=tempdir, cache=cache) self.key = key def __iter__(self): return iterunique(self.source, self.key) def iterunique(source, key): # assume source is sorted # first need to sort the data it = iter(source) try: hdr = next(it) except StopIteration: return yield tuple(hdr) # convert field selection into field indices if key is None: indices = range(len(hdr)) else: indices = asindices(hdr, key) # now use field indices to construct a _getkey function # N.B., this may raise an exception on short rows, depending on # the field selection getkey = operator.itemgetter(*indices) try: prev = next(it) except StopIteration: return prev_key = getkey(prev) prev_comp_ne = True for curr in it: curr_key = getkey(curr) curr_comp_ne = (curr_key != prev_key) if prev_comp_ne and curr_comp_ne: yield tuple(prev) prev = curr prev_key = curr_key prev_comp_ne = curr_comp_ne # last one? if prev_comp_ne: yield prev def conflicts(table, key, missing=None, include=None, exclude=None, presorted=False, buffersize=None, tempdir=None, cache=True): """ Select rows with the same key value but differing in some other field. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar', 'baz'], ... ['A', 1, 2.7], ... ['B', 2, None], ... ['D', 3, 9.4], ... ['B', None, 7.8], ... ['E', None], ... ['D', 3, 12.3], ... ['A', 2, None]] >>> table2 = etl.conflicts(table1, 'foo') >>> table2 +-----+-----+------+ | foo | bar | baz | +=====+=====+======+ | 'A' | 1 | 2.7 | +-----+-----+------+ | 'A' | 2 | None | +-----+-----+------+ | 'D' | 3 | 9.4 | +-----+-----+------+ | 'D' | 3 | 12.3 | +-----+-----+------+ Missing values are not considered conflicts. By default, `None` is treated as the missing value, this can be changed via the `missing` keyword argument. One or more fields can be ignored when determining conflicts by providing the `exclude` keyword argument. Alternatively, fields to use when determining conflicts can be specified explicitly with the `include` keyword argument. This provides a simple mechanism for analysing the source of conflicting rows from multiple tables, e.g.:: >>> table1 = [['foo', 'bar'], [1, 'a'], [2, 'b']] >>> table2 = [['foo', 'bar'], [1, 'a'], [2, 'c']] >>> table3 = etl.cat(etl.addfield(table1, 'source', 1), ... 
etl.addfield(table2, 'source', 2)) >>> table4 = etl.conflicts(table3, key='foo', exclude='source') >>> table4 +-----+-----+--------+ | foo | bar | source | +=====+=====+========+ | 2 | 'b' | 1 | +-----+-----+--------+ | 2 | 'c' | 2 | +-----+-----+--------+ If `presorted` is True, it is assumed that the data are already sorted by the given key, and the `buffersize`, `tempdir` and `cache` arguments are ignored. Otherwise, the data are sorted, see also the discussion of the `buffersize`, `tempdir` and `cache` arguments under the :func:`petl.transform.sorts.sort` function. """ return ConflictsView(table, key, missing=missing, exclude=exclude, include=include, presorted=presorted, buffersize=buffersize, tempdir=tempdir, cache=cache) Table.conflicts = conflicts class ConflictsView(Table): def __init__(self, source, key, missing=None, exclude=None, include=None, presorted=False, buffersize=None, tempdir=None, cache=True): if presorted: self.source = source else: self.source = sort(source, key, buffersize=buffersize, tempdir=tempdir, cache=cache) self.key = key self.missing = missing self.exclude = exclude self.include = include def __iter__(self): return iterconflicts(self.source, self.key, self.missing, self.exclude, self.include) def iterconflicts(source, key, missing, exclude, include): # normalise arguments if exclude and not isinstance(exclude, (list, tuple)): exclude = (exclude,) if include and not isinstance(include, (list, tuple)): include = (include,) # exclude overrides include if include and exclude: include = None it = iter(source) try: hdr = next(it) except StopIteration: return flds = list(map(text_type, hdr)) yield tuple(hdr) # convert field selection into field indices indices = asindices(hdr, key) # now use field indices to construct a _getkey function # N.B., this may raise an exception on short rows, depending on # the field selection getkey = operator.itemgetter(*indices) previous = None previous_yielded = False for row in it: if previous is None: previous = row else: kprev = getkey(previous) kcurr = getkey(row) if kprev == kcurr: # is there a conflict? conflict = False for x, y, f in zip(previous, row, flds): if (exclude and f not in exclude) \ or (include and f in include) \ or (not exclude and not include): if missing not in (x, y) and x != y: conflict = True break if conflict: if not previous_yielded: yield tuple(previous) previous_yielded = True yield tuple(row) else: # reset previous_yielded = False previous = row def distinct(table, key=None, count=None, presorted=False, buffersize=None, tempdir=None, cache=True): """ Return only distinct rows in the table. If the `count` argument is not None, it will be used as the name for an additional field, and the values of the field will be the number of duplicate rows. If the `key` keyword argument is passed, the comparison is done on the given key instead of the full row. See also :func:`petl.transform.dedup.duplicates`, :func:`petl.transform.dedup.unique`, :func:`petl.transform.reductions.groupselectfirst`, :func:`petl.transform.reductions.groupselectlast`. 
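    E.g., a minimal sketch (illustrative table; rendered output omitted)::

        >>> import petl as etl
        >>> table1 = [['foo', 'bar'], ['A', 1], ['B', 2], ['B', 2], ['B', 3]]
        >>> table2 = etl.distinct(table1)
        >>> # ...or add a field counting how many duplicates there were
        ... table3 = etl.distinct(table1, count='n')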
""" return DistinctView(table, key=key, count=count, presorted=presorted, buffersize=buffersize, tempdir=tempdir, cache=cache) Table.distinct = distinct class DistinctView(Table): def __init__(self, table, key=None, count=None, presorted=False, buffersize=None, tempdir=None, cache=True): if presorted: self.table = table else: self.table = sort(table, key=key, buffersize=buffersize, tempdir=tempdir, cache=cache) self.key = key self.count = count def __iter__(self): it = iter(self.table) try: hdr = next(it) except StopIteration: return # convert field selection into field indices if self.key is None: indices = range(len(hdr)) else: indices = asindices(hdr, self.key) # now use field indices to construct a _getkey function # N.B., this may raise an exception on short rows, depending on # the field selection getkey = operator.itemgetter(*indices) INIT = object() if self.count: hdr = tuple(hdr) + (self.count,) yield hdr previous = INIT n_dup = 1 for row in it: if previous is INIT: previous = row else: kprev = getkey(previous) kcurr = getkey(row) if kprev == kcurr: n_dup += 1 else: yield tuple(previous) + (n_dup,) n_dup = 1 previous = row # deal with last row yield tuple(previous) + (n_dup,) else: yield tuple(hdr) previous_keys = INIT for row in it: keys = getkey(row) if keys != previous_keys: yield tuple(row) previous_keys = keys def isunique(table, field): """ Return True if there are no duplicate values for the given field(s), otherwise False. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar'], ... ['a', 1], ... ['b'], ... ['b', 2], ... ['c', 3, True]] >>> etl.isunique(table1, 'foo') False >>> etl.isunique(table1, 'bar') True The `field` argument can be a single field name or index (starting from zero) or a tuple of field names and/or indexes. """ vals = set() for v in itervalues(table, field): if v in vals: return False else: vals.add(v) return True Table.isunique = isunique petl-1.7.15/petl/transform/fills.py000066400000000000000000000157151457414240700172340ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division from petl.compat import next from petl.util.base import Table, asindices def filldown(table, *fields, **kwargs): """ Replace missing values with non-missing values from the row above. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar', 'baz'], ... [1, 'a', None], ... [1, None, .23], ... [1, 'b', None], ... [2, None, None], ... [2, None, .56], ... [2, 'c', None], ... 
[None, 'c', .72]] >>> table2 = etl.filldown(table1) >>> table2.lookall() +-----+-----+------+ | foo | bar | baz | +=====+=====+======+ | 1 | 'a' | None | +-----+-----+------+ | 1 | 'a' | 0.23 | +-----+-----+------+ | 1 | 'b' | 0.23 | +-----+-----+------+ | 2 | 'b' | 0.23 | +-----+-----+------+ | 2 | 'b' | 0.56 | +-----+-----+------+ | 2 | 'c' | 0.56 | +-----+-----+------+ | 2 | 'c' | 0.72 | +-----+-----+------+ >>> table3 = etl.filldown(table1, 'bar') >>> table3.lookall() +------+-----+------+ | foo | bar | baz | +======+=====+======+ | 1 | 'a' | None | +------+-----+------+ | 1 | 'a' | 0.23 | +------+-----+------+ | 1 | 'b' | None | +------+-----+------+ | 2 | 'b' | None | +------+-----+------+ | 2 | 'b' | 0.56 | +------+-----+------+ | 2 | 'c' | None | +------+-----+------+ | None | 'c' | 0.72 | +------+-----+------+ >>> table4 = etl.filldown(table1, 'bar', 'baz') >>> table4.lookall() +------+-----+------+ | foo | bar | baz | +======+=====+======+ | 1 | 'a' | None | +------+-----+------+ | 1 | 'a' | 0.23 | +------+-----+------+ | 1 | 'b' | 0.23 | +------+-----+------+ | 2 | 'b' | 0.23 | +------+-----+------+ | 2 | 'b' | 0.56 | +------+-----+------+ | 2 | 'c' | 0.56 | +------+-----+------+ | None | 'c' | 0.72 | +------+-----+------+ Use the `missing` keyword argument to control which value is treated as missing (`None` by default). """ return FillDownView(table, fields, **kwargs) Table.filldown = filldown class FillDownView(Table): def __init__(self, table, fields, missing=None): self.table = table self.fields = fields self.missing = missing def __iter__(self): return iterfilldown(self.table, self.fields, self.missing) def iterfilldown(table, fillfields, missing): it = iter(table) try: hdr = next(it) except StopIteration: return yield tuple(hdr) if not fillfields: # fill down all fields fillfields = hdr fillindices = asindices(hdr, fillfields) fill = list(next(it)) # fill values yield tuple(fill) for row in it: outrow = list(row) for idx in fillindices: if row[idx] == missing: outrow[idx] = fill[idx] # fill down else: fill[idx] = row[idx] # new fill value yield tuple(outrow) def fillright(table, missing=None): """ Replace missing values with preceding non-missing values. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar', 'baz'], ... [1, 'a', None], ... [1, None, .23], ... [1, 'b', None], ... [2, None, None], ... [2, None, .56], ... [2, 'c', None], ... [None, 'c', .72]] >>> table2 = etl.fillright(table1) >>> table2.lookall() +------+-----+------+ | foo | bar | baz | +======+=====+======+ | 1 | 'a' | 'a' | +------+-----+------+ | 1 | 1 | 0.23 | +------+-----+------+ | 1 | 'b' | 'b' | +------+-----+------+ | 2 | 2 | 2 | +------+-----+------+ | 2 | 2 | 0.56 | +------+-----+------+ | 2 | 'c' | 'c' | +------+-----+------+ | None | 'c' | 0.72 | +------+-----+------+ Use the `missing` keyword argument to control which value is treated as missing (`None` by default). 
""" return FillRightView(table, missing=missing) Table.fillright = fillright class FillRightView(Table): def __init__(self, table, missing=None): self.table = table self.missing = missing def __iter__(self): return iterfillright(self.table, self.missing) def iterfillright(table, missing): it = iter(table) try: hdr = next(it) except StopIteration: return yield tuple(hdr) for row in it: outrow = list(row) for i, _ in enumerate(outrow): if i > 0 and outrow[i] == missing and outrow[i-1] != missing: outrow[i] = outrow[i-1] yield tuple(outrow) def fillleft(table, missing=None): """ Replace missing values with following non-missing values. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar', 'baz'], ... [1, 'a', None], ... [1, None, .23], ... [1, 'b', None], ... [2, None, None], ... [2, None, .56], ... [2, 'c', None], ... [None, 'c', .72]] >>> table2 = etl.fillleft(table1) >>> table2.lookall() +-----+------+------+ | foo | bar | baz | +=====+======+======+ | 1 | 'a' | None | +-----+------+------+ | 1 | 0.23 | 0.23 | +-----+------+------+ | 1 | 'b' | None | +-----+------+------+ | 2 | None | None | +-----+------+------+ | 2 | 0.56 | 0.56 | +-----+------+------+ | 2 | 'c' | None | +-----+------+------+ | 'c' | 'c' | 0.72 | +-----+------+------+ Use the `missing` keyword argument to control which value is treated as missing (`None` by default). """ return FillLeftView(table, missing=missing) Table.fillleft = fillleft class FillLeftView(Table): def __init__(self, table, missing=None): self.table = table self.missing = missing def __iter__(self): return iterfillleft(self.table, self.missing) def iterfillleft(table, missing): it = iter(table) try: hdr = next(it) except StopIteration: return yield tuple(hdr) for row in it: outrow = list(reversed(row)) for i, _ in enumerate(outrow): if i > 0 and outrow[i] == missing and outrow[i-1] != missing: outrow[i] = outrow[i-1] yield tuple(reversed(outrow)) petl-1.7.15/petl/transform/hashjoins.py000066400000000000000000000357121457414240700201100ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division import operator from petl.compat import next, text_type from petl.util.base import Table, asindices, rowgetter, iterpeek from petl.util.lookups import lookup, lookupone from petl.transform.joins import keys_from_args def hashjoin(left, right, key=None, lkey=None, rkey=None, cache=True, lprefix=None, rprefix=None): """Alternative implementation of :func:`petl.transform.joins.join`, where the join is executed by constructing an in-memory lookup for the right hand table, then iterating over rows from the left hand table. May be faster and/or more resource efficient where the right table is small and the left table is large. By default data from right hand table is cached to improve performance (only available when `key` is given). Left and right tables with different key fields can be handled via the `lkey` and `rkey` arguments. 
""" lkey, rkey = keys_from_args(left, right, key, lkey, rkey) return HashJoinView(left, right, lkey=lkey, rkey=rkey, cache=cache, lprefix=lprefix, rprefix=rprefix) Table.hashjoin = hashjoin class HashJoinView(Table): def __init__(self, left, right, lkey, rkey, cache=True, lprefix=None, rprefix=None): self.left = left self.right = right self.lkey = lkey self.rkey = rkey self.cache = cache self.rlookup = None self.lprefix = lprefix self.rprefix = rprefix def __iter__(self): if not self.cache or self.rlookup is None: self.rlookup = lookup(self.right, self.rkey) return iterhashjoin(self.left, self.right, self.lkey, self.rkey, self.rlookup, self.lprefix, self.rprefix) def iterhashjoin(left, right, lkey, rkey, rlookup, lprefix, rprefix): lit = iter(left) rit = iter(right) lhdr = next(lit) rhdr = next(rit) # determine indices of the key fields in left and right tables lkind = asindices(lhdr, lkey) rkind = asindices(rhdr, rkey) # construct functions to extract key values from left table lgetk = operator.itemgetter(*lkind) # determine indices of non-key fields in the right table # (in the output, we only include key fields from the left table - we # don't want to duplicate fields) rvind = [i for i in range(len(rhdr)) if i not in rkind] rgetv = rowgetter(*rvind) # determine the output fields if lprefix is None: outhdr = list(lhdr) else: outhdr = [(text_type(lprefix) + text_type(f)) for f in lhdr] if rprefix is None: outhdr.extend(rgetv(rhdr)) else: outhdr.extend([(text_type(rprefix) + text_type(f)) for f in rgetv(rhdr)]) yield tuple(outhdr) # define a function to join rows def joinrows(_lrow, _rrows): for rrow in _rrows: # start with the left row _outrow = list(_lrow) # extend with non-key values from the right row _outrow.extend(rgetv(rrow)) yield tuple(_outrow) for lrow in lit: k = lgetk(lrow) if k in rlookup: rrows = rlookup[k] for outrow in joinrows(lrow, rrows): yield outrow def hashleftjoin(left, right, key=None, lkey=None, rkey=None, missing=None, cache=True, lprefix=None, rprefix=None): """Alternative implementation of :func:`petl.transform.joins.leftjoin`, where the join is executed by constructing an in-memory lookup for the right hand table, then iterating over rows from the left hand table. May be faster and/or more resource efficient where the right table is small and the left table is large. By default data from right hand table is cached to improve performance (only available when `key` is given). Left and right tables with different key fields can be handled via the `lkey` and `rkey` arguments. 
""" lkey, rkey = keys_from_args(left, right, key, lkey, rkey) return HashLeftJoinView(left, right, lkey, rkey, missing=missing, cache=cache, lprefix=lprefix, rprefix=rprefix) Table.hashleftjoin = hashleftjoin class HashLeftJoinView(Table): def __init__(self, left, right, lkey, rkey, missing=None, cache=True, lprefix=None, rprefix=None): self.left = left self.right = right self.lkey = lkey self.rkey = rkey self.missing = missing self.cache = cache self.rlookup = None self.lprefix = lprefix self.rprefix = rprefix def __iter__(self): if not self.cache or self.rlookup is None: self.rlookup = lookup(self.right, self.rkey) return iterhashleftjoin(self.left, self.right, self.lkey, self.rkey, self.missing, self.rlookup, self.lprefix, self.rprefix) def iterhashleftjoin(left, right, lkey, rkey, missing, rlookup, lprefix, rprefix): lit = iter(left) rit = iter(right) lhdr = next(lit) rhdr = next(rit) # determine indices of the key fields in left and right tables lkind = asindices(lhdr, lkey) rkind = asindices(rhdr, rkey) # construct functions to extract key values from left table lgetk = operator.itemgetter(*lkind) # determine indices of non-key fields in the right table # (in the output, we only include key fields from the left table - we # don't want to duplicate fields) rvind = [i for i in range(len(rhdr)) if i not in rkind] rgetv = rowgetter(*rvind) # determine the output fields if lprefix is None: outhdr = list(lhdr) else: outhdr = [(text_type(lprefix) + text_type(f)) for f in lhdr] if rprefix is None: outhdr.extend(rgetv(rhdr)) else: outhdr.extend([(text_type(rprefix) + text_type(f)) for f in rgetv(rhdr)]) yield tuple(outhdr) # define a function to join rows def joinrows(_lrow, _rrows): for rrow in _rrows: # start with the left row _outrow = list(_lrow) # extend with non-key values from the right row _outrow.extend(rgetv(rrow)) yield tuple(_outrow) for lrow in lit: k = lgetk(lrow) if k in rlookup: rrows = rlookup[k] for outrow in joinrows(lrow, rrows): yield outrow else: outrow = list(lrow) # start with the left row # extend with missing values in place of the right row outrow.extend([missing] * len(rvind)) yield tuple(outrow) def hashrightjoin(left, right, key=None, lkey=None, rkey=None, missing=None, cache=True, lprefix=None, rprefix=None): """Alternative implementation of :func:`petl.transform.joins.rightjoin`, where the join is executed by constructing an in-memory lookup for the left hand table, then iterating over rows from the right hand table. May be faster and/or more resource efficient where the left table is small and the right table is large. By default data from right hand table is cached to improve performance (only available when `key` is given). Left and right tables with different key fields can be handled via the `lkey` and `rkey` arguments. 
""" lkey, rkey = keys_from_args(left, right, key, lkey, rkey) return HashRightJoinView(left, right, lkey, rkey, missing=missing, cache=cache, lprefix=lprefix, rprefix=rprefix) Table.hashrightjoin = hashrightjoin class HashRightJoinView(Table): def __init__(self, left, right, lkey, rkey, missing=None, cache=True, lprefix=None, rprefix=None): self.left = left self.right = right self.lkey = lkey self.rkey = rkey self.missing = missing self.cache = cache self.llookup = None self.lprefix = lprefix self.rprefix = rprefix def __iter__(self): if not self.cache or self.llookup is None: self.llookup = lookup(self.left, self.lkey) return iterhashrightjoin(self.left, self.right, self.lkey, self.rkey, self.missing, self.llookup, self.lprefix, self.rprefix) def iterhashrightjoin(left, right, lkey, rkey, missing, llookup, lprefix, rprefix): lit = iter(left) rit = iter(right) lhdr = next(lit) rhdr = next(rit) # determine indices of the key fields in left and right tables lkind = asindices(lhdr, lkey) rkind = asindices(rhdr, rkey) # construct functions to extract key values from left table rgetk = operator.itemgetter(*rkind) # determine indices of non-key fields in the right table # (in the output, we only include key fields from the left table - we # don't want to duplicate fields) rvind = [i for i in range(len(rhdr)) if i not in rkind] rgetv = rowgetter(*rvind) # determine the output fields if lprefix is None: outhdr = list(lhdr) else: outhdr = [(text_type(lprefix) + text_type(f)) for f in lhdr] if rprefix is None: outhdr.extend(rgetv(rhdr)) else: outhdr.extend([(text_type(rprefix) + text_type(f)) for f in rgetv(rhdr)]) yield tuple(outhdr) # define a function to join rows def joinrows(_rrow, _lrows): for lrow in _lrows: # start with the left row _outrow = list(lrow) # extend with non-key values from the right row _outrow.extend(rgetv(_rrow)) yield tuple(_outrow) for rrow in rit: k = rgetk(rrow) if k in llookup: lrows = llookup[k] for outrow in joinrows(rrow, lrows): yield outrow else: # start with missing values in place of the left row outrow = [missing] * len(lhdr) # set key values for li, ri in zip(lkind, rkind): outrow[li] = rrow[ri] # extend with non-key values from the right row outrow.extend(rgetv(rrow)) yield tuple(outrow) def hashantijoin(left, right, key=None, lkey=None, rkey=None): """Alternative implementation of :func:`petl.transform.joins.antijoin`, where the join is executed by constructing an in-memory set for all keys found in the right hand table, then iterating over rows from the left hand table. May be faster and/or more resource efficient where the right table is small and the left table is large. Left and right tables with different key fields can be handled via the `lkey` and `rkey` arguments. 
""" lkey, rkey = keys_from_args(left, right, key, lkey, rkey) return HashAntiJoinView(left, right, lkey, rkey) Table.hashantijoin = hashantijoin class HashAntiJoinView(Table): def __init__(self, left, right, lkey, rkey): self.left = left self.right = right self.lkey = lkey self.rkey = rkey def __iter__(self): return iterhashantijoin(self.left, self.right, self.lkey, self.rkey) def iterhashantijoin(left, right, lkey, rkey): lit = iter(left) rit = iter(right) lhdr = next(lit) rhdr = next(rit) yield tuple(lhdr) # determine indices of the key fields in left and right tables lkind = asindices(lhdr, lkey) rkind = asindices(rhdr, rkey) # construct functions to extract key values from both tables lgetk = operator.itemgetter(*lkind) rgetk = operator.itemgetter(*rkind) rkeys = set() for rrow in rit: rk = rgetk(rrow) rkeys.add(rk) for lrow in lit: lk = lgetk(lrow) if lk not in rkeys: yield tuple(lrow) def hashlookupjoin(left, right, key=None, lkey=None, rkey=None, missing=None, lprefix=None, rprefix=None): """Alternative implementation of :func:`petl.transform.joins.lookupjoin`, where the join is executed by constructing an in-memory lookup for the right hand table, then iterating over rows from the left hand table. May be faster and/or more resource efficient where the right table is small and the left table is large. Left and right tables with different key fields can be handled via the `lkey` and `rkey` arguments. """ lkey, rkey = keys_from_args(left, right, key, lkey, rkey) return HashLookupJoinView(left, right, lkey, rkey, missing=missing, lprefix=lprefix, rprefix=rprefix) Table.hashlookupjoin = hashlookupjoin class HashLookupJoinView(Table): def __init__(self, left, right, lkey, rkey, missing=None, lprefix=None, rprefix=None): self.left = left self.right = right self.lkey = lkey self.rkey = rkey self.missing = missing self.lprefix = lprefix self.rprefix = rprefix def __iter__(self): return iterhashlookupjoin(self.left, self.right, self.lkey, self.rkey, self.missing, self.lprefix, self.rprefix) def iterhashlookupjoin(left, right, lkey, rkey, missing, lprefix, rprefix): lit = iter(left) lhdr = next(lit) rhdr, rit = iterpeek(right) # need the whole lot to pass to lookup rlookup = lookupone(rit, rkey, strict=False) # determine indices of the key fields in left and right tables lkind = asindices(lhdr, lkey) rkind = asindices(rhdr, rkey) # construct functions to extract key values from left table lgetk = operator.itemgetter(*lkind) # determine indices of non-key fields in the right table # (in the output, we only include key fields from the left table - we # don't want to duplicate fields) rvind = [i for i in range(len(rhdr)) if i not in rkind] rgetv = rowgetter(*rvind) # determine the output fields if lprefix is None: outhdr = list(lhdr) else: outhdr = [(text_type(lprefix) + text_type(f)) for f in lhdr] if rprefix is None: outhdr.extend(rgetv(rhdr)) else: outhdr.extend([(text_type(rprefix) + text_type(f)) for f in rgetv(rhdr)]) yield tuple(outhdr) # define a function to join rows def joinrows(_lrow, _rrow): # start with the left row _outrow = list(_lrow) # extend with non-key values from the right row _outrow.extend(rgetv(_rrow)) return tuple(_outrow) for lrow in lit: k = lgetk(lrow) if k in rlookup: rrow = rlookup[k] yield joinrows(lrow, rrow) else: outrow = list(lrow) # start with the left row # extend with missing values in place of the right row outrow.extend([missing] * len(rvind)) yield tuple(outrow) 
petl-1.7.15/petl/transform/headers.py000066400000000000000000000226341457414240700175340ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division import itertools from petl.compat import next, text_type from petl.errors import FieldSelectionError from petl.util.base import Table, asindices, rowgetter def rename(table, *args, **kwargs): """ Replace one or more values in the table's header row. E.g.:: >>> import petl as etl >>> table1 = [['sex', 'age'], ... ['m', 12], ... ['f', 34], ... ['-', 56]] >>> # rename a single field ... table2 = etl.rename(table1, 'sex', 'gender') >>> table2 +--------+-----+ | gender | age | +========+=====+ | 'm' | 12 | +--------+-----+ | 'f' | 34 | +--------+-----+ | '-' | 56 | +--------+-----+ >>> # rename multiple fields by passing dictionary as second argument ... table3 = etl.rename(table1, {'sex': 'gender', 'age': 'age_years'}) >>> table3 +--------+-----------+ | gender | age_years | +========+===========+ | 'm' | 12 | +--------+-----------+ | 'f' | 34 | +--------+-----------+ | '-' | 56 | +--------+-----------+ The field to rename can be specified as an index (i.e., integer representing field position). If any nonexistent fields are specified, the default behaviour is to raise a `FieldSelectionError`. However, if `strict` keyword argument is `False`, any nonexistent fields specified will be silently ignored. """ return RenameView(table, *args, **kwargs) Table.rename = rename class RenameView(Table): def __init__(self, table, *args, **kwargs): self.source = table if len(args) == 0: self.spec = dict() elif len(args) == 1: self.spec = args[0] elif len(args) == 2: self.spec = {args[0]: args[1]} self.strict = kwargs.get('strict', True) def __iter__(self): return iterrename(self.source, self.spec, self.strict) def __setitem__(self, key, value): self.spec[key] = value def iterrename(source, spec, strict): it = iter(source) try: hdr = next(it) except StopIteration: hdr = [] flds = list(map(text_type, hdr)) if strict: for x in spec: if isinstance(x, int): if x < 0 or x >= len(hdr): raise FieldSelectionError(x) elif x not in flds: raise FieldSelectionError(x) outhdr = [spec[i] if i in spec else spec[f] if f in spec else f for i, f in enumerate(flds)] yield tuple(outhdr) for row in it: yield tuple(row) def setheader(table, header): """ Replace header row in the given table. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar'], ... ['a', 1], ... ['b', 2]] >>> table2 = etl.setheader(table1, ['foofoo', 'barbar']) >>> table2 +--------+--------+ | foofoo | barbar | +========+========+ | 'a' | 1 | +--------+--------+ | 'b' | 2 | +--------+--------+ See also :func:`petl.transform.headers.extendheader`, :func:`petl.transform.headers.pushheader`. """ return SetHeaderView(table, header) Table.setheader = setheader class SetHeaderView(Table): def __init__(self, source, header): self.source = source self.header = header def __iter__(self): return itersetheader(self.source, self.header) def itersetheader(source, header): it = iter(source) try: next(it) # discard source header except StopIteration: pass # no previous header yield tuple(header) for row in it: yield tuple(row) def extendheader(table, fields): """ Extend header row in the given table. E.g.:: >>> import petl as etl >>> table1 = [['foo'], ... ['a', 1, True], ... 
['b', 2, False]] >>> table2 = etl.extendheader(table1, ['bar', 'baz']) >>> table2 +-----+-----+-------+ | foo | bar | baz | +=====+=====+=======+ | 'a' | 1 | True | +-----+-----+-------+ | 'b' | 2 | False | +-----+-----+-------+ See also :func:`petl.transform.headers.setheader`, :func:`petl.transform.headers.pushheader`. """ return ExtendHeaderView(table, fields) Table.extendheader = extendheader class ExtendHeaderView(Table): def __init__(self, source, fields): self.source = source self.fields = fields def __iter__(self): return iterextendheader(self.source, self.fields) def iterextendheader(source, fields): it = iter(source) try: hdr = next(it) except StopIteration: hdr = [] outhdr = list(hdr) outhdr.extend(fields) yield tuple(outhdr) for row in it: yield tuple(row) def pushheader(table, header, *args): """ Push rows down and prepend a header row. E.g.:: >>> import petl as etl >>> table1 = [['a', 1], ... ['b', 2]] >>> table2 = etl.pushheader(table1, ['foo', 'bar']) >>> table2 +-----+-----+ | foo | bar | +=====+=====+ | 'a' | 1 | +-----+-----+ | 'b' | 2 | +-----+-----+ The header row can either be a list or positional arguments. """ return PushHeaderView(table, header, *args) Table.pushheader = pushheader class PushHeaderView(Table): def __init__(self, source, header, *args): self.source = source self.args = args # if user passes header as a list, just use this and ignore args if isinstance(header, (list, tuple)): self.header = header # otherwise, elif len(args) > 0: self.header = [] self.header.append(header) # first argument is named header self.header.extend(args) # add the other positional arguments else: assert False, 'bad parameters' def __iter__(self): return iterpushheader(self.source, self.header) def iterpushheader(source, header): it = iter(source) yield tuple(header) for row in it: yield tuple(row) def skip(table, n): """ Skip `n` rows, including the header row. E.g.:: >>> import petl as etl >>> table1 = [['#aaa', 'bbb', 'ccc'], ... ['#mmm'], ... ['foo', 'bar'], ... ['a', 1], ... ['b', 2]] >>> table2 = etl.skip(table1, 2) >>> table2 +-----+-----+ | foo | bar | +=====+=====+ | 'a' | 1 | +-----+-----+ | 'b' | 2 | +-----+-----+ See also :func:`petl.transform.basics.skipcomments`. """ return SkipView(table, n) Table.skip = skip class SkipView(Table): def __init__(self, source, n): self.source = source self.n = n def __iter__(self): return iterskip(self.source, self.n) def iterskip(source, n): return itertools.islice(source, n, None) def prefixheader(table, prefix): """Prefix all fields in the table header.""" return PrefixHeaderView(table, prefix) Table.prefixheader = prefixheader class PrefixHeaderView(Table): def __init__(self, table, prefix): self.table = table self.prefix = prefix def __iter__(self): it = iter(self.table) try: hdr = next(it) except StopIteration: return outhdr = tuple((text_type(self.prefix) + text_type(f)) for f in hdr) yield outhdr for row in it: yield row def suffixheader(table, suffix): """Suffix all fields in the table header.""" return SuffixHeaderView(table, suffix) Table.suffixheader = suffixheader class SuffixHeaderView(Table): def __init__(self, table, suffix): self.table = table self.suffix = suffix def __iter__(self): it = iter(self.table) try: hdr = next(it) except StopIteration: return outhdr = tuple((text_type(f) + text_type(self.suffix)) for f in hdr) yield outhdr for row in it: yield row def sortheader(table, reverse=False, missing=None): """Re-order columns so the header is sorted. .. 
versionadded:: 1.1.0 """ return SortHeaderView(table, reverse, missing) Table.sortheader = sortheader class SortHeaderView(Table): def __init__(self, table, reverse, missing): self.table = table self.reverse = reverse self.missing = missing def __iter__(self): it = iter(self.table) try: hdr = next(it) except StopIteration: return shdr = sorted(hdr, reverse=self.reverse) indices = asindices(hdr, shdr) transform = rowgetter(*indices) # yield the transformed header yield tuple(shdr) # construct the transformed data missing = self.missing for row in it: try: yield transform(row) except IndexError: # row is short, let's be kind and fill in any missing fields yield tuple(row[i] if i < len(row) else missing for i in indices) petl-1.7.15/petl/transform/intervals.py000066400000000000000000001050021457414240700201220ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division from operator import itemgetter, attrgetter from petl.compat import text_type from petl.util.base import asindices, records, Table, values, rowgroupby from petl.errors import DuplicateKeyError from petl.transform.basics import addfield from petl.transform.sorts import sort def tupletree(table, start='start', stop='stop', value=None): """ Construct an interval tree for the given table, where each node in the tree is a row of the table. """ import intervaltree tree = intervaltree.IntervalTree() it = iter(table) hdr = next(it) flds = list(map(text_type, hdr)) assert start in flds, 'start field not recognised' assert stop in flds, 'stop field not recognised' getstart = itemgetter(flds.index(start)) getstop = itemgetter(flds.index(stop)) if value is None: getvalue = tuple else: valueindices = asindices(hdr, value) assert len(valueindices) > 0, 'invalid value field specification' getvalue = itemgetter(*valueindices) for row in it: tree.addi(getstart(row), getstop(row), getvalue(row)) return tree def facettupletrees(table, key, start='start', stop='stop', value=None): """ Construct faceted interval trees for the given table, where each node in the tree is a row of the table. """ import intervaltree it = iter(table) hdr = next(it) flds = list(map(text_type, hdr)) assert start in flds, 'start field not recognised' assert stop in flds, 'stop field not recognised' getstart = itemgetter(flds.index(start)) getstop = itemgetter(flds.index(stop)) if value is None: getvalue = tuple else: valueindices = asindices(hdr, value) assert len(valueindices) > 0, 'invalid value field specification' getvalue = itemgetter(*valueindices) keyindices = asindices(hdr, key) assert len(keyindices) > 0, 'invalid key' getkey = itemgetter(*keyindices) trees = dict() for row in it: k = getkey(row) if k not in trees: trees[k] = intervaltree.IntervalTree() trees[k].addi(getstart(row), getstop(row), getvalue(row)) return trees def recordtree(table, start='start', stop='stop'): """ Construct an interval tree for the given table, where each node in the tree is a row of the table represented as a record object. """ import intervaltree getstart = attrgetter(start) getstop = attrgetter(stop) tree = intervaltree.IntervalTree() for rec in records(table): tree.addi(getstart(rec), getstop(rec), rec) return tree def facetrecordtrees(table, key, start='start', stop='stop'): """ Construct faceted interval trees for the given table, where each node in the tree is a record.
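Returns a `dict` mapping each distinct key value to an `intervaltree.IntervalTree` whose interval data payloads are the corresponding records.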
""" import intervaltree getstart = attrgetter(start) getstop = attrgetter(stop) getkey = attrgetter(key) trees = dict() for rec in records(table): k = getkey(rec) if k not in trees: trees[k] = intervaltree.IntervalTree() trees[k].addi(getstart(rec), getstop(rec), rec) return trees def intervallookup(table, start='start', stop='stop', value=None, include_stop=False): """ Construct an interval lookup for the given table. E.g.:: >>> import petl as etl >>> table = [['start', 'stop', 'value'], ... [1, 4, 'foo'], ... [3, 7, 'bar'], ... [4, 9, 'baz']] >>> lkp = etl.intervallookup(table, 'start', 'stop') >>> lkp.search(0, 1) [] >>> lkp.search(1, 2) [(1, 4, 'foo')] >>> lkp.search(2, 4) [(1, 4, 'foo'), (3, 7, 'bar')] >>> lkp.search(2, 5) [(1, 4, 'foo'), (3, 7, 'bar'), (4, 9, 'baz')] >>> lkp.search(9, 14) [] >>> lkp.search(19, 140) [] >>> lkp.search(0) [] >>> lkp.search(1) [(1, 4, 'foo')] >>> lkp.search(2) [(1, 4, 'foo')] >>> lkp.search(4) [(3, 7, 'bar'), (4, 9, 'baz')] >>> lkp.search(5) [(3, 7, 'bar'), (4, 9, 'baz')] Note start coordinates are included and stop coordinates are excluded from the interval. Use the `include_stop` keyword argument to include the upper bound of the interval when finding overlaps. Some examples using the `include_stop` and `value` keyword arguments:: >>> import petl as etl >>> table = [['start', 'stop', 'value'], ... [1, 4, 'foo'], ... [3, 7, 'bar'], ... [4, 9, 'baz']] >>> lkp = etl.intervallookup(table, 'start', 'stop', include_stop=True, ... value='value') >>> lkp.search(0, 1) ['foo'] >>> lkp.search(1, 2) ['foo'] >>> lkp.search(2, 4) ['foo', 'bar', 'baz'] >>> lkp.search(2, 5) ['foo', 'bar', 'baz'] >>> lkp.search(9, 14) ['baz'] >>> lkp.search(19, 140) [] >>> lkp.search(0) [] >>> lkp.search(1) ['foo'] >>> lkp.search(2) ['foo'] >>> lkp.search(4) ['foo', 'bar', 'baz'] >>> lkp.search(5) ['bar', 'baz'] """ tree = tupletree(table, start=start, stop=stop, value=value) return IntervalTreeLookup(tree, include_stop=include_stop) Table.intervallookup = intervallookup def _search_tree(tree, start, stop, include_stop): if stop is None: if include_stop: stop = start + 1 start -= 1 args = (start, stop) else: args = (start,) else: if include_stop: stop += 1 start -= 1 args = (start, stop) if len(args) == 2: results = sorted(tree.overlap(*args)) else: results = sorted(tree.at(*args)) return results class IntervalTreeLookup(object): def __init__(self, tree, include_stop=False): self.tree = tree self.include_stop = include_stop def search(self, start, stop=None): results = _search_tree(self.tree, start, stop, self.include_stop) return [r.data for r in results] find = search def intervallookupone(table, start='start', stop='stop', value=None, include_stop=False, strict=True): """ Construct an interval lookup for the given table, returning at most one result for each query. E.g.:: >>> import petl as etl >>> table = [['start', 'stop', 'value'], ... [1, 4, 'foo'], ... [3, 7, 'bar'], ... [4, 9, 'baz']] >>> lkp = etl.intervallookupone(table, 'start', 'stop', strict=False) >>> lkp.search(0, 1) >>> lkp.search(1, 2) (1, 4, 'foo') >>> lkp.search(2, 4) (1, 4, 'foo') >>> lkp.search(2, 5) (1, 4, 'foo') >>> lkp.search(9, 14) >>> lkp.search(19, 140) >>> lkp.search(0) >>> lkp.search(1) (1, 4, 'foo') >>> lkp.search(2) (1, 4, 'foo') >>> lkp.search(4) (3, 7, 'bar') >>> lkp.search(5) (3, 7, 'bar') If ``strict=True``, queries returning more than one result will raise a `DuplicateKeyError`. If ``strict=False`` and there is more than one result, the first result is returned. 
Note start coordinates are included and stop coordinates are excluded from the interval. Use the `include_stop` keyword argument to include the upper bound of the interval when finding overlaps. """ tree = tupletree(table, start=start, stop=stop, value=value) return IntervalTreeLookupOne(tree, strict=strict, include_stop=include_stop) Table.intervallookupone = intervallookupone class IntervalTreeLookupOne(object): def __init__(self, tree, strict=True, include_stop=False): self.tree = tree self.strict = strict self.include_stop = include_stop def search(self, start, stop=None): results = _search_tree(self.tree, start, stop, self.include_stop) if len(results) == 0: return None elif len(results) > 1 and self.strict: raise DuplicateKeyError((start, stop)) else: return results[0].data find = search def intervalrecordlookup(table, start='start', stop='stop', include_stop=False): """ As :func:`petl.transform.intervals.intervallookup` but return records instead of tuples. """ tree = recordtree(table, start=start, stop=stop) return IntervalTreeLookup(tree, include_stop=include_stop) Table.intervalrecordlookup = intervalrecordlookup def intervalrecordlookupone(table, start='start', stop='stop', include_stop=False, strict=True): """ As :func:`petl.transform.intervals.intervallookupone` but return records instead of tuples. """ tree = recordtree(table, start=start, stop=stop) return IntervalTreeLookupOne(tree, include_stop=include_stop, strict=strict) Table.intervalrecordlookupone = intervalrecordlookupone def facetintervallookup(table, key, start='start', stop='stop', value=None, include_stop=False): """ Construct a faceted interval lookup for the given table. E.g.:: >>> import petl as etl >>> table = (('type', 'start', 'stop', 'value'), ... ('apple', 1, 4, 'foo'), ... ('apple', 3, 7, 'bar'), ... ('orange', 4, 9, 'baz')) >>> lkp = etl.facetintervallookup(table, key='type', start='start', stop='stop') >>> lkp['apple'].search(1, 2) [('apple', 1, 4, 'foo')] >>> lkp['apple'].search(2, 4) [('apple', 1, 4, 'foo'), ('apple', 3, 7, 'bar')] >>> lkp['apple'].search(2, 5) [('apple', 1, 4, 'foo'), ('apple', 3, 7, 'bar')] >>> lkp['orange'].search(2, 5) [('orange', 4, 9, 'baz')] >>> lkp['orange'].search(9, 14) [] >>> lkp['orange'].search(19, 140) [] >>> lkp['apple'].search(1) [('apple', 1, 4, 'foo')] >>> lkp['apple'].search(2) [('apple', 1, 4, 'foo')] >>> lkp['apple'].search(4) [('apple', 3, 7, 'bar')] >>> lkp['apple'].search(5) [('apple', 3, 7, 'bar')] >>> lkp['orange'].search(5) [('orange', 4, 9, 'baz')] """ trees = facettupletrees(table, key, start=start, stop=stop, value=value) out = dict() for k in trees: out[k] = IntervalTreeLookup(trees[k], include_stop=include_stop) return out Table.facetintervallookup = facetintervallookup def facetintervallookupone(table, key, start='start', stop='stop', value=None, include_stop=False, strict=True): """ Construct a faceted interval lookup for the given table, returning at most one result for each query. If ``strict=True``, queries returning more than one result will raise a `DuplicateKeyError`. If ``strict=False`` and there is more than one result, the first result is returned. 
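E.g. (an illustrative sketch, reusing the fruit table from :func:`petl.transform.intervals.facetintervallookup`):: >>> import petl as etl >>> table = (('type', 'start', 'stop', 'value'), ... ('apple', 1, 4, 'foo'), ... ('apple', 3, 7, 'bar'), ... ('orange', 4, 9, 'baz')) >>> lkp = etl.facetintervallookupone(table, key='type', start='start', ... stop='stop', strict=False) >>> lkp['apple'].search(1, 2) ('apple', 1, 4, 'foo') >>> lkp['apple'].search(2, 4) ('apple', 1, 4, 'foo') >>> lkp['orange'].search(9, 14)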
""" trees = facettupletrees(table, key, start=start, stop=stop, value=value) out = dict() for k in trees: out[k] = IntervalTreeLookupOne(trees[k], include_stop=include_stop, strict=strict) return out Table.facetintervallookupone = facetintervallookupone def facetintervalrecordlookup(table, key, start='start', stop='stop', include_stop=False): """ As :func:`petl.transform.intervals.facetintervallookup` but return records. """ trees = facetrecordtrees(table, key, start=start, stop=stop) out = dict() for k in trees: out[k] = IntervalTreeLookup(trees[k], include_stop=include_stop) return out Table.facetintervalrecordlookup = facetintervalrecordlookup def facetintervalrecordlookupone(table, key, start, stop, include_stop=False, strict=True): """ As :func:`petl.transform.intervals.facetintervallookupone` but return records. """ trees = facetrecordtrees(table, key, start=start, stop=stop) out = dict() for k in trees: out[k] = IntervalTreeLookupOne(trees[k], include_stop=include_stop, strict=strict) return out Table.facetintervalrecordlookupone = facetintervalrecordlookupone def intervaljoin(left, right, lstart='start', lstop='stop', rstart='start', rstop='stop', lkey=None, rkey=None, include_stop=False, lprefix=None, rprefix=None): """ Join two tables by overlapping intervals. E.g.:: >>> import petl as etl >>> left = [['begin', 'end', 'quux'], ... [1, 2, 'a'], ... [2, 4, 'b'], ... [2, 5, 'c'], ... [9, 14, 'd'], ... [1, 1, 'e'], ... [10, 10, 'f']] >>> right = [['start', 'stop', 'value'], ... [1, 4, 'foo'], ... [3, 7, 'bar'], ... [4, 9, 'baz']] >>> table1 = etl.intervaljoin(left, right, ... lstart='begin', lstop='end', ... rstart='start', rstop='stop') >>> table1.lookall() +-------+-----+------+-------+------+-------+ | begin | end | quux | start | stop | value | +=======+=====+======+=======+======+=======+ | 1 | 2 | 'a' | 1 | 4 | 'foo' | +-------+-----+------+-------+------+-------+ | 2 | 4 | 'b' | 1 | 4 | 'foo' | +-------+-----+------+-------+------+-------+ | 2 | 4 | 'b' | 3 | 7 | 'bar' | +-------+-----+------+-------+------+-------+ | 2 | 5 | 'c' | 1 | 4 | 'foo' | +-------+-----+------+-------+------+-------+ | 2 | 5 | 'c' | 3 | 7 | 'bar' | +-------+-----+------+-------+------+-------+ | 2 | 5 | 'c' | 4 | 9 | 'baz' | +-------+-----+------+-------+------+-------+ >>> # include stop coordinate in intervals ... table2 = etl.intervaljoin(left, right, ... lstart='begin', lstop='end', ... rstart='start', rstop='stop', ... include_stop=True) >>> table2.lookall() +-------+-----+------+-------+------+-------+ | begin | end | quux | start | stop | value | +=======+=====+======+=======+======+=======+ | 1 | 2 | 'a' | 1 | 4 | 'foo' | +-------+-----+------+-------+------+-------+ | 2 | 4 | 'b' | 1 | 4 | 'foo' | +-------+-----+------+-------+------+-------+ | 2 | 4 | 'b' | 3 | 7 | 'bar' | +-------+-----+------+-------+------+-------+ | 2 | 4 | 'b' | 4 | 9 | 'baz' | +-------+-----+------+-------+------+-------+ | 2 | 5 | 'c' | 1 | 4 | 'foo' | +-------+-----+------+-------+------+-------+ | 2 | 5 | 'c' | 3 | 7 | 'bar' | +-------+-----+------+-------+------+-------+ | 2 | 5 | 'c' | 4 | 9 | 'baz' | +-------+-----+------+-------+------+-------+ | 9 | 14 | 'd' | 4 | 9 | 'baz' | +-------+-----+------+-------+------+-------+ | 1 | 1 | 'e' | 1 | 4 | 'foo' | +-------+-----+------+-------+------+-------+ Note start coordinates are included and stop coordinates are excluded from the interval. Use the `include_stop` keyword argument to include the upper bound of the interval when finding overlaps. 
An additional key comparison can be made, e.g.:: >>> import petl as etl >>> left = (('fruit', 'begin', 'end'), ... ('apple', 1, 2), ... ('apple', 2, 4), ... ('apple', 2, 5), ... ('orange', 2, 5), ... ('orange', 9, 14), ... ('orange', 19, 140), ... ('apple', 1, 1)) >>> right = (('type', 'start', 'stop', 'value'), ... ('apple', 1, 4, 'foo'), ... ('apple', 3, 7, 'bar'), ... ('orange', 4, 9, 'baz')) >>> table3 = etl.intervaljoin(left, right, ... lstart='begin', lstop='end', lkey='fruit', ... rstart='start', rstop='stop', rkey='type') >>> table3.lookall() +----------+-------+-----+----------+-------+------+-------+ | fruit | begin | end | type | start | stop | value | +==========+=======+=====+==========+=======+======+=======+ | 'apple' | 1 | 2 | 'apple' | 1 | 4 | 'foo' | +----------+-------+-----+----------+-------+------+-------+ | 'apple' | 2 | 4 | 'apple' | 1 | 4 | 'foo' | +----------+-------+-----+----------+-------+------+-------+ | 'apple' | 2 | 4 | 'apple' | 3 | 7 | 'bar' | +----------+-------+-----+----------+-------+------+-------+ | 'apple' | 2 | 5 | 'apple' | 1 | 4 | 'foo' | +----------+-------+-----+----------+-------+------+-------+ | 'apple' | 2 | 5 | 'apple' | 3 | 7 | 'bar' | +----------+-------+-----+----------+-------+------+-------+ | 'orange' | 2 | 5 | 'orange' | 4 | 9 | 'baz' | +----------+-------+-----+----------+-------+------+-------+ """ assert (lkey is None) == (rkey is None), \ 'facet key field must be provided for both or neither table' return IntervalJoinView(left, right, lstart=lstart, lstop=lstop, rstart=rstart, rstop=rstop, lkey=lkey, rkey=rkey, include_stop=include_stop, lprefix=lprefix, rprefix=rprefix) Table.intervaljoin = intervaljoin class IntervalJoinView(Table): def __init__(self, left, right, lstart='start', lstop='stop', rstart='start', rstop='stop', lkey=None, rkey=None, include_stop=False, lprefix=None, rprefix=None): self.left = left self.lstart = lstart self.lstop = lstop self.lkey = lkey self.right = right self.rstart = rstart self.rstop = rstop self.rkey = rkey self.include_stop = include_stop self.lprefix = lprefix self.rprefix = rprefix def __iter__(self): return iterintervaljoin( left=self.left, right=self.right, lstart=self.lstart, lstop=self.lstop, rstart=self.rstart, rstop=self.rstop, lkey=self.lkey, rkey=self.rkey, include_stop=self.include_stop, missing=None, lprefix=self.lprefix, rprefix=self.rprefix, leftouter=False ) def intervalleftjoin(left, right, lstart='start', lstop='stop', rstart='start', rstop='stop', lkey=None, rkey=None, include_stop=False, missing=None, lprefix=None, rprefix=None): """ Like :func:`petl.transform.intervals.intervaljoin` but rows from the left table without a match in the right table are also included. E.g.:: >>> import petl as etl >>> left = [['begin', 'end', 'quux'], ... [1, 2, 'a'], ... [2, 4, 'b'], ... [2, 5, 'c'], ... [9, 14, 'd'], ... [1, 1, 'e'], ... [10, 10, 'f']] >>> right = [['start', 'stop', 'value'], ... [1, 4, 'foo'], ... [3, 7, 'bar'], ... [4, 9, 'baz']] >>> table1 = etl.intervalleftjoin(left, right, ... lstart='begin', lstop='end', ... 
rstart='start', rstop='stop') >>> table1.lookall() +-------+-----+------+-------+------+-------+ | begin | end | quux | start | stop | value | +=======+=====+======+=======+======+=======+ | 1 | 2 | 'a' | 1 | 4 | 'foo' | +-------+-----+------+-------+------+-------+ | 2 | 4 | 'b' | 1 | 4 | 'foo' | +-------+-----+------+-------+------+-------+ | 2 | 4 | 'b' | 3 | 7 | 'bar' | +-------+-----+------+-------+------+-------+ | 2 | 5 | 'c' | 1 | 4 | 'foo' | +-------+-----+------+-------+------+-------+ | 2 | 5 | 'c' | 3 | 7 | 'bar' | +-------+-----+------+-------+------+-------+ | 2 | 5 | 'c' | 4 | 9 | 'baz' | +-------+-----+------+-------+------+-------+ | 9 | 14 | 'd' | None | None | None | +-------+-----+------+-------+------+-------+ | 1 | 1 | 'e' | None | None | None | +-------+-----+------+-------+------+-------+ | 10 | 10 | 'f' | None | None | None | +-------+-----+------+-------+------+-------+ Note start coordinates are included and stop coordinates are excluded from the interval. Use the `include_stop` keyword argument to include the upper bound of the interval when finding overlaps. """ assert (lkey is None) == (rkey is None), \ 'facet key field must be provided for both or neither table' return IntervalLeftJoinView(left, right, lstart=lstart, lstop=lstop, rstart=rstart, rstop=rstop, lkey=lkey, rkey=rkey, include_stop=include_stop, missing=missing, lprefix=lprefix, rprefix=rprefix) Table.intervalleftjoin = intervalleftjoin class IntervalLeftJoinView(Table): def __init__(self, left, right, lstart='start', lstop='stop', rstart='start', rstop='stop', lkey=None, rkey=None, missing=None, include_stop=False, lprefix=None, rprefix=None): self.left = left self.lstart = lstart self.lstop = lstop self.lkey = lkey self.right = right self.rstart = rstart self.rstop = rstop self.rkey = rkey self.missing = missing self.include_stop = include_stop self.lprefix = lprefix self.rprefix = rprefix def __iter__(self): return iterintervaljoin( left=self.left, right=self.right, lstart=self.lstart, lstop=self.lstop, rstart=self.rstart, rstop=self.rstop, lkey=self.lkey, rkey=self.rkey, include_stop=self.include_stop, missing=self.missing, lprefix=self.lprefix, rprefix=self.rprefix, leftouter=True ) def intervalantijoin(left, right, lstart='start', lstop='stop', rstart='start', rstop='stop', lkey=None, rkey=None, include_stop=False, missing=None): """ Return rows from the `left` table with no overlapping rows from the `right` table. Note start coordinates are included and stop coordinates are excluded from the interval. Use the `include_stop` keyword argument to include the upper bound of the interval when finding overlaps. 
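E.g. (an illustrative sketch, reusing the example tables from :func:`petl.transform.intervals.intervalleftjoin`; only left rows with no overlap should remain):: >>> import petl as etl >>> left = [['begin', 'end', 'quux'], ... [1, 2, 'a'], ... [2, 4, 'b'], ... [2, 5, 'c'], ... [9, 14, 'd'], ... [1, 1, 'e'], ... [10, 10, 'f']] >>> right = [['start', 'stop', 'value'], ... [1, 4, 'foo'], ... [3, 7, 'bar'], ... [4, 9, 'baz']] >>> table1 = etl.intervalantijoin(left, right, ... lstart='begin', lstop='end', ... rstart='start', rstop='stop') >>> table1.lookall() +-------+-----+------+ | begin | end | quux | +=======+=====+======+ | 9 | 14 | 'd' | +-------+-----+------+ | 1 | 1 | 'e' | +-------+-----+------+ | 10 | 10 | 'f' | +-------+-----+------+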
""" assert (lkey is None) == (rkey is None), \ 'facet key field must be provided for both or neither table' return IntervalAntiJoinView(left, right, lstart=lstart, lstop=lstop, rstart=rstart, rstop=rstop, lkey=lkey, rkey=rkey, include_stop=include_stop, missing=missing) Table.intervalantijoin = intervalantijoin class IntervalAntiJoinView(Table): def __init__(self, left, right, lstart='start', lstop='stop', rstart='start', rstop='stop', lkey=None, rkey=None, missing=None, include_stop=False): self.left = left self.lstart = lstart self.lstop = lstop self.lkey = lkey self.right = right self.rstart = rstart self.rstop = rstop self.rkey = rkey self.missing = missing self.include_stop = include_stop def __iter__(self): return iterintervaljoin( left=self.left, right=self.right, lstart=self.lstart, lstop=self.lstop, rstart=self.rstart, rstop=self.rstop, lkey=self.lkey, rkey=self.rkey, include_stop=self.include_stop, missing=self.missing, lprefix=None, rprefix=None, leftouter=True, anti=True ) def iterintervaljoin(left, right, lstart, lstop, rstart, rstop, lkey, rkey, include_stop, missing, lprefix, rprefix, leftouter, anti=False): # create iterators and obtain fields lit = iter(left) lhdr = next(lit) lflds = list(map(text_type, lhdr)) rit = iter(right) rhdr = next(rit) rflds = list(map(text_type, rhdr)) # check fields via petl.util.asindices (raises FieldSelectionError if spec # is not valid) asindices(lhdr, lstart) asindices(lhdr, lstop) if lkey is not None: asindices(lhdr, lkey) asindices(rhdr, rstart) asindices(rhdr, rstop) if rkey is not None: asindices(rhdr, rkey) # determine output fields if lprefix is None: outhdr = list(lflds) if not anti: outhdr.extend(rflds) else: outhdr = list(lprefix + f for f in lflds) if not anti: outhdr.extend(rprefix + f for f in rflds) yield tuple(outhdr) # create getters for start and stop positions getlstart = itemgetter(lflds.index(lstart)) getlstop = itemgetter(lflds.index(lstop)) if rkey is None: # build interval lookup for right table lookup = intervallookup(right, rstart, rstop, include_stop=include_stop) search = lookup.search # main loop for lrow in lit: start = getlstart(lrow) stop = getlstop(lrow) rrows = search(start, stop) if rrows: if not anti: for rrow in rrows: outrow = list(lrow) outrow.extend(rrow) yield tuple(outrow) elif leftouter: outrow = list(lrow) if not anti: outrow.extend([missing] * len(rflds)) yield tuple(outrow) else: # build interval lookup for right table lookup = facetintervallookup(right, key=rkey, start=rstart, stop=rstop, include_stop=include_stop) search = dict() for f in lookup: search[f] = lookup[f].search # getter for facet key values in left table getlkey = itemgetter(*asindices(lflds, lkey)) # main loop for lrow in lit: lkey = getlkey(lrow) start = getlstart(lrow) stop = getlstop(lrow) try: rrows = search[lkey](start, stop) except KeyError: rrows = None except AttributeError: rrows = None if rrows: if not anti: for rrow in rrows: outrow = list(lrow) outrow.extend(rrow) yield tuple(outrow) elif leftouter: outrow = list(lrow) if not anti: outrow.extend([missing] * len(rflds)) yield tuple(outrow) def intervaljoinvalues(left, right, value, lstart='start', lstop='stop', rstart='start', rstop='stop', lkey=None, rkey=None, include_stop=False): """ Convenience function to join the left table with values from a specific field in the right hand table. Note start coordinates are included and stop coordinates are excluded from the interval. 
Use the `include_stop` keyword argument to include the upper bound of the interval when finding overlaps. """ assert (lkey is None) == (rkey is None), \ 'facet key field must be provided for both or neither table' if lkey is None: lkp = intervallookup(right, start=rstart, stop=rstop, value=value, include_stop=include_stop) f = lambda row: lkp.search(row[lstart], row[lstop]) else: lkp = facetintervallookup(right, rkey, start=rstart, stop=rstop, value=value, include_stop=include_stop) f = lambda row: lkp[row[lkey]].search(row[lstart], row[lstop]) return addfield(left, value, f) Table.intervaljoinvalues = intervaljoinvalues def intervalsubtract(left, right, lstart='start', lstop='stop', rstart='start', rstop='stop', lkey=None, rkey=None, include_stop=False): """ Subtract intervals in the right hand table from intervals in the left hand table. """ assert (lkey is None) == (rkey is None), \ 'facet key field must be provided for both or neither table' return IntervalSubtractView(left, right, lstart=lstart, lstop=lstop, rstart=rstart, rstop=rstop, lkey=lkey, rkey=rkey, include_stop=include_stop) Table.intervalsubtract = intervalsubtract class IntervalSubtractView(Table): def __init__(self, left, right, lstart='start', lstop='stop', rstart='start', rstop='stop', lkey=None, rkey=None, include_stop=False): self.left = left self.lstart = lstart self.lstop = lstop self.lkey = lkey self.right = right self.rstart = rstart self.rstop = rstop self.rkey = rkey self.include_stop = include_stop def __iter__(self): return iterintervalsubtract(self.left, self.right, self.lstart, self.lstop, self.rstart, self.rstop, self.lkey, self.rkey, self.include_stop) def iterintervalsubtract(left, right, lstart, lstop, rstart, rstop, lkey, rkey, include_stop): # create iterators and obtain fields lit = iter(left) lhdr = next(lit) lflds = list(map(text_type, lhdr)) rit = iter(right) rhdr = next(rit) # check fields via petl.util.asindices (raises FieldSelectionError if spec # is not valid) asindices(lhdr, lstart) asindices(lhdr, lstop) if lkey is not None: asindices(lhdr, lkey) asindices(rhdr, rstart) asindices(rhdr, rstop) if rkey is not None: asindices(rhdr, rkey) # determine output fields outhdr = list(lflds) yield tuple(outhdr) # create getters for start and stop positions lstartidx, lstopidx = asindices(lhdr, (lstart, lstop)) getlcoords = itemgetter(lstartidx, lstopidx) getrcoords = itemgetter(*asindices(rhdr, (rstart, rstop))) if rkey is None: # build interval lookup for right table lookup = intervallookup(right, rstart, rstop, include_stop=include_stop) search = lookup.search # main loop for lrow in lit: start, stop = getlcoords(lrow) rrows = search(start, stop) if not rrows: yield tuple(lrow) else: rivs = sorted([getrcoords(rrow) for rrow in rrows], key=itemgetter(0)) # sort by start for x, y in _subtract(start, stop, rivs): out = list(lrow) out[lstartidx] = x out[lstopidx] = y yield tuple(out) else: # build interval lookup for right table lookup = facetintervallookup(right, key=rkey, start=rstart, stop=rstop, include_stop=include_stop) # getter for facet key values in left table getlkey = itemgetter(*asindices(lhdr, lkey)) # main loop for lrow in lit: lkey = getlkey(lrow) start, stop = getlcoords(lrow) try: rrows = lookup[lkey].search(start, stop) except KeyError: rrows = None except AttributeError: rrows = None if not rrows: yield tuple(lrow) else: rivs = sorted([getrcoords(rrow) for rrow in rrows], key=itemgetter(0)) # sort by start for x, y in _subtract(start, stop, rivs): out = list(lrow) out[lstartidx] = x 
out[lstopidx] = y yield tuple(out) from collections import namedtuple _Interval = namedtuple('Interval', 'start stop') def collapsedintervals(table, start='start', stop='stop', key=None): """ Utility function to collapse intervals in a table. If no facet `key` is given, returns an iterator over `(start, stop)` tuples. If facet `key` is given, returns an iterator over `(key, start, stop)` tuples. """ if key is None: table = sort(table, key=start) for iv in _collapse(values(table, (start, stop))): yield iv else: table = sort(table, key=(key, start)) for k, g in rowgroupby(table, key=key, value=(start, stop)): for iv in _collapse(g): yield (k,) + iv Table.collapsedintervals = collapsedintervals def _collapse(intervals): """ Collapse an iterable of intervals sorted by start coord. """ span = None for start, stop in intervals: if span is None: span = _Interval(start, stop) elif start <= span.stop < stop: span = _Interval(span.start, stop) elif start > span.stop: yield span span = _Interval(start, stop) if span is not None: yield span def _subtract(start, stop, intervals): """ Subtract intervals from a spanning interval. """ remainder_start = start sub_stop = None for sub_start, sub_stop in _collapse(intervals): if remainder_start < sub_start: yield _Interval(remainder_start, sub_start) remainder_start = sub_stop if sub_stop is not None and sub_stop < stop: yield _Interval(sub_stop, stop) petl-1.7.15/petl/transform/joins.py000066400000000000000000000764421457414240700172510ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division import itertools import operator from petl.compat import next, text_type from petl.errors import ArgumentError from petl.comparison import comparable_itemgetter, Comparable from petl.util.base import Table, asindices, rowgetter, rowgroupby, \ header, data from petl.transform.sorts import sort from petl.transform.basics import cut, cutout from petl.transform.dedup import distinct def natural_key(left, right): # determine key field or fields lhdr = header(left) lflds = list(map(str, lhdr)) rhdr = header(right) rflds = list(map(str, rhdr)) key = [f for f in lflds if f in rflds] assert len(key) > 0, 'no fields in common' if len(key) == 1: key = key[0] # deal with singletons return key def keys_from_args(left, right, key, lkey, rkey): if key is lkey is rkey is None: # no keys specified, attempt natural join lkey = rkey = natural_key(left, right) elif key is not None and lkey is rkey is None: # common key specified lkey = rkey = key elif key is None and lkey is not None and rkey is not None: # left and right keys specified pass else: raise ArgumentError( 'bad key arguments: either specify key, or specify both lkey and ' 'rkey, or provide no key/lkey/rkey arguments at all (natural join)' ) return lkey, rkey def join(left, right, key=None, lkey=None, rkey=None, presorted=False, buffersize=None, tempdir=None, cache=True, lprefix=None, rprefix=None): """ Perform an equi-join on the given tables. E.g.:: >>> import petl as etl >>> table1 = [['id', 'colour'], ... [1, 'blue'], ... [2, 'red'], ... [3, 'purple']] >>> table2 = [['id', 'shape'], ... [1, 'circle'], ... [3, 'square'], ... [4, 'ellipse']] >>> table3 = etl.join(table1, table2, key='id') >>> table3 +----+----------+----------+ | id | colour | shape | +====+==========+==========+ | 1 | 'blue' | 'circle' | +----+----------+----------+ | 3 | 'purple' | 'square' | +----+----------+----------+ >>> # if no key is given, a natural join is tried ... 
table4 = etl.join(table1, table2) >>> table4 +----+----------+----------+ | id | colour | shape | +====+==========+==========+ | 1 | 'blue' | 'circle' | +----+----------+----------+ | 3 | 'purple' | 'square' | +----+----------+----------+ >>> # note behaviour if the key is not unique in either or both tables ... table5 = [['id', 'colour'], ... [1, 'blue'], ... [1, 'red'], ... [2, 'purple']] >>> table6 = [['id', 'shape'], ... [1, 'circle'], ... [1, 'square'], ... [2, 'ellipse']] >>> table7 = etl.join(table5, table6, key='id') >>> table7 +----+----------+-----------+ | id | colour | shape | +====+==========+===========+ | 1 | 'blue' | 'circle' | +----+----------+-----------+ | 1 | 'blue' | 'square' | +----+----------+-----------+ | 1 | 'red' | 'circle' | +----+----------+-----------+ | 1 | 'red' | 'square' | +----+----------+-----------+ | 2 | 'purple' | 'ellipse' | +----+----------+-----------+ >>> # compound keys are supported ... table8 = [['id', 'time', 'height'], ... [1, 1, 12.3], ... [1, 2, 34.5], ... [2, 1, 56.7]] >>> table9 = [['id', 'time', 'weight'], ... [1, 2, 4.5], ... [2, 1, 6.7], ... [2, 2, 8.9]] >>> table10 = etl.join(table8, table9, key=['id', 'time']) >>> table10 +----+------+--------+--------+ | id | time | height | weight | +====+======+========+========+ | 1 | 2 | 34.5 | 4.5 | +----+------+--------+--------+ | 2 | 1 | 56.7 | 6.7 | +----+------+--------+--------+ If `presorted` is True, it is assumed that the data are already sorted by the given key, and the `buffersize`, `tempdir` and `cache` arguments are ignored. Otherwise, the data are sorted, see also the discussion of the `buffersize`, `tempdir` and `cache` arguments under the :func:`petl.transform.sorts.sort` function. Left and right tables with different key fields can be handled via the `lkey` and `rkey` arguments. """ # TODO don't read data twice (occurs if using natural key) lkey, rkey = keys_from_args(left, right, key, lkey, rkey) return JoinView(left, right, lkey=lkey, rkey=rkey, presorted=presorted, buffersize=buffersize, tempdir=tempdir, cache=cache, lprefix=lprefix, rprefix=rprefix) Table.join = join class JoinView(Table): def __init__(self, left, right, lkey, rkey, presorted=False, leftouter=False, rightouter=False, missing=None, buffersize=None, tempdir=None, cache=True, lprefix=None, rprefix=None): self.lkey = lkey self.rkey = rkey if presorted: self.left = left self.right = right else: self.left = sort(left, lkey, buffersize=buffersize, tempdir=tempdir, cache=cache) self.right = sort(right, rkey, buffersize=buffersize, tempdir=tempdir, cache=cache) self.leftouter = leftouter self.rightouter = rightouter self.missing = missing self.lprefix = lprefix self.rprefix = rprefix def __iter__(self): return iterjoin(self.left, self.right, self.lkey, self.rkey, leftouter=self.leftouter, rightouter=self.rightouter, missing=self.missing, lprefix=self.lprefix, rprefix=self.rprefix) def leftjoin(left, right, key=None, lkey=None, rkey=None, missing=None, presorted=False, buffersize=None, tempdir=None, cache=True, lprefix=None, rprefix=None): """ Perform a left outer join on the given tables. E.g.:: >>> import petl as etl >>> table1 = [['id', 'colour'], ... [1, 'blue'], ... [2, 'red'], ... [3, 'purple']] >>> table2 = [['id', 'shape'], ... [1, 'circle'], ... [3, 'square'], ... 
[4, 'ellipse']] >>> table3 = etl.leftjoin(table1, table2, key='id') >>> table3 +----+----------+----------+ | id | colour | shape | +====+==========+==========+ | 1 | 'blue' | 'circle' | +----+----------+----------+ | 2 | 'red' | None | +----+----------+----------+ | 3 | 'purple' | 'square' | +----+----------+----------+ If `presorted` is True, it is assumed that the data are already sorted by the given key, and the `buffersize`, `tempdir` and `cache` arguments are ignored. Otherwise, the data are sorted, see also the discussion of the `buffersize`, `tempdir` and `cache` arguments under the :func:`petl.transform.sorts.sort` function. Left and right tables with different key fields can be handled via the `lkey` and `rkey` arguments. """ # TODO don't read data twice (occurs if using natural key) lkey, rkey = keys_from_args(left, right, key, lkey, rkey) return JoinView(left, right, lkey=lkey, rkey=rkey, presorted=presorted, leftouter=True, rightouter=False, missing=missing, buffersize=buffersize, tempdir=tempdir, cache=cache, lprefix=lprefix, rprefix=rprefix) Table.leftjoin = leftjoin def rightjoin(left, right, key=None, lkey=None, rkey=None, missing=None, presorted=False, buffersize=None, tempdir=None, cache=True, lprefix=None, rprefix=None): """ Perform a right outer join on the given tables. E.g.:: >>> import petl as etl >>> table1 = [['id', 'colour'], ... [1, 'blue'], ... [2, 'red'], ... [3, 'purple']] >>> table2 = [['id', 'shape'], ... [1, 'circle'], ... [3, 'square'], ... [4, 'ellipse']] >>> table3 = etl.rightjoin(table1, table2, key='id') >>> table3 +----+----------+-----------+ | id | colour | shape | +====+==========+===========+ | 1 | 'blue' | 'circle' | +----+----------+-----------+ | 3 | 'purple' | 'square' | +----+----------+-----------+ | 4 | None | 'ellipse' | +----+----------+-----------+ If `presorted` is True, it is assumed that the data are already sorted by the given key, and the `buffersize`, `tempdir` and `cache` arguments are ignored. Otherwise, the data are sorted, see also the discussion of the `buffersize`, `tempdir` and `cache` arguments under the :func:`petl.transform.sorts.sort` function. Left and right tables with different key fields can be handled via the `lkey` and `rkey` arguments. """ # TODO don't read data twice (occurs if using natural key) lkey, rkey = keys_from_args(left, right, key, lkey, rkey) return JoinView(left, right, lkey=lkey, rkey=rkey, presorted=presorted, leftouter=False, rightouter=True, missing=missing, buffersize=buffersize, tempdir=tempdir, cache=cache, lprefix=lprefix, rprefix=rprefix) Table.rightjoin = rightjoin def outerjoin(left, right, key=None, lkey=None, rkey=None, missing=None, presorted=False, buffersize=None, tempdir=None, cache=True, lprefix=None, rprefix=None): """ Perform a full outer join on the given tables. E.g.:: >>> import petl as etl >>> table1 = [['id', 'colour'], ... [1, 'blue'], ... [2, 'red'], ... [3, 'purple']] >>> table2 = [['id', 'shape'], ... [1, 'circle'], ... [3, 'square'], ... [4, 'ellipse']] >>> table3 = etl.outerjoin(table1, table2, key='id') >>> table3 +----+----------+-----------+ | id | colour | shape | +====+==========+===========+ | 1 | 'blue' | 'circle' | +----+----------+-----------+ | 2 | 'red' | None | +----+----------+-----------+ | 3 | 'purple' | 'square' | +----+----------+-----------+ | 4 | None | 'ellipse' | +----+----------+-----------+ If `presorted` is True, it is assumed that the data are already sorted by the given key, and the `buffersize`, `tempdir` and `cache` arguments are ignored. 
Otherwise, the data are sorted, see also the discussion of the `buffersize`, `tempdir` and `cache` arguments under the :func:`petl.transform.sorts.sort` function. Left and right tables with different key fields can be handled via the `lkey` and `rkey` arguments. """ # TODO don't read data twice (occurs if using natural key) lkey, rkey = keys_from_args(left, right, key, lkey, rkey) return JoinView(left, right, lkey=lkey, rkey=rkey, presorted=presorted, leftouter=True, rightouter=True, missing=missing, buffersize=buffersize, tempdir=tempdir, cache=cache, lprefix=lprefix, rprefix=rprefix) Table.outerjoin = outerjoin def iterjoin(left, right, lkey, rkey, leftouter=False, rightouter=False, missing=None, lprefix=None, rprefix=None): lit = iter(left) rit = iter(right) lhdr = next(lit) rhdr = next(rit) # determine indices of the key fields in left and right tables lkind = asindices(lhdr, lkey) rkind = asindices(rhdr, rkey) # construct functions to extract key values from both tables lgetk = comparable_itemgetter(*lkind) rgetk = comparable_itemgetter(*rkind) # determine indices of non-key fields in the right table # (in the output, we only include key fields from the left table - we # don't want to duplicate fields) rvind = [i for i in range(len(rhdr)) if i not in rkind] rgetv = rowgetter(*rvind) # determine the output fields if lprefix is None: outhdr = list(lhdr) else: outhdr = [(text_type(lprefix) + text_type(f)) for f in lhdr] if rprefix is None: outhdr.extend(rgetv(rhdr)) else: outhdr.extend([(text_type(rprefix) + text_type(f)) for f in rgetv(rhdr)]) yield tuple(outhdr) # define a function to join two groups of rows def joinrows(_lrowgrp, _rrowgrp): if _rrowgrp is None: for lrow in _lrowgrp: outrow = list(lrow) # start with the left row # extend with missing values in place of the right row outrow.extend([missing] * len(rvind)) yield tuple(outrow) elif _lrowgrp is None: for rrow in _rrowgrp: # start with missing values in place of the left row outrow = [missing] * len(lhdr) # set key values for li, ri in zip(lkind, rkind): outrow[li] = rrow[ri] # extend with non-key values from the right row outrow.extend(rgetv(rrow)) yield tuple(outrow) else: _rrowgrp = list(_rrowgrp) # may need to iterate more than once for lrow in _lrowgrp: for rrow in _rrowgrp: # start with the left row outrow = list(lrow) # extend with non-key values from the right row outrow.extend(rgetv(rrow)) yield tuple(outrow) # construct group iterators for both tables lgit = itertools.groupby(lit, key=lgetk) rgit = itertools.groupby(rit, key=rgetk) lrowgrp = [] rrowgrp = [] # loop until *either* of the iterators is exhausted # initialise here to handle empty tables lkval, rkval = Comparable(None), Comparable(None) try: # pick off initial row groups lkval, lrowgrp = next(lgit) rkval, rrowgrp = next(rgit) while True: if lkval < rkval: if leftouter: for row in joinrows(lrowgrp, None): yield tuple(row) # advance left lkval, lrowgrp = next(lgit) elif lkval > rkval: if rightouter: for row in joinrows(None, rrowgrp): yield tuple(row) # advance right rkval, rrowgrp = next(rgit) else: for row in joinrows(lrowgrp, rrowgrp): yield tuple(row) # advance both lkval, lrowgrp = next(lgit) rkval, rrowgrp = next(rgit) except StopIteration: pass # make sure any left rows remaining are yielded if leftouter: if lkval > rkval: # yield anything that got left hanging for row in joinrows(lrowgrp, None): yield tuple(row) # yield the rest for lkval, lrowgrp in lgit: for row in joinrows(lrowgrp, None): yield tuple(row) # make sure any right rows remaining are 
yielded if rightouter: if lkval < rkval: # yield anything that got left hanging for row in joinrows(None, rrowgrp): yield tuple(row) # yield the rest for rkval, rrowgrp in rgit: for row in joinrows(None, rrowgrp): yield tuple(row) def crossjoin(*tables, **kwargs): """ Form the cartesian product of the given tables. E.g.:: >>> import petl as etl >>> table1 = [['id', 'colour'], ... [1, 'blue'], ... [2, 'red']] >>> table2 = [['id', 'shape'], ... [1, 'circle'], ... [3, 'square']] >>> table3 = etl.crossjoin(table1, table2) >>> table3 +----+--------+----+----------+ | id | colour | id | shape | +====+========+====+==========+ | 1 | 'blue' | 1 | 'circle' | +----+--------+----+----------+ | 1 | 'blue' | 3 | 'square' | +----+--------+----+----------+ | 2 | 'red' | 1 | 'circle' | +----+--------+----+----------+ | 2 | 'red' | 3 | 'square' | +----+--------+----+----------+ If `prefix` is `True` then field names in the output table header will be prefixed by the index of the input table. """ return CrossJoinView(*tables, **kwargs) Table.crossjoin = crossjoin class CrossJoinView(Table): def __init__(self, *sources, **kwargs): self.sources = sources self.prefix = kwargs.get('prefix', False) def __iter__(self): return itercrossjoin(self.sources, self.prefix) def itercrossjoin(sources, prefix): # construct fields outhdr = list() for i, s in enumerate(sources): if prefix: # use one-based numbering outhdr.extend([text_type(i+1) + '_' + text_type(f) for f in header(s)]) else: outhdr.extend(header(s)) yield tuple(outhdr) datasrcs = [data(src) for src in sources] for prod in itertools.product(*datasrcs): outrow = list() for row in prod: outrow.extend(row) yield tuple(outrow) def antijoin(left, right, key=None, lkey=None, rkey=None, presorted=False, buffersize=None, tempdir=None, cache=True): """ Return rows from the `left` table where the key value does not occur in the `right` table. E.g.:: >>> import petl as etl >>> table1 = [['id', 'colour'], ... [0, 'black'], ... [1, 'blue'], ... [2, 'red'], ... [4, 'yellow'], ... [5, 'white']] >>> table2 = [['id', 'shape'], ... [1, 'circle'], ... [3, 'square']] >>> table3 = etl.antijoin(table1, table2, key='id') >>> table3 +----+----------+ | id | colour | +====+==========+ | 0 | 'black' | +----+----------+ | 2 | 'red' | +----+----------+ | 4 | 'yellow' | +----+----------+ | 5 | 'white' | +----+----------+ If `presorted` is True, it is assumed that the data are already sorted by the given key, and the `buffersize`, `tempdir` and `cache` arguments are ignored. Otherwise, the data are sorted, see also the discussion of the `buffersize`, `tempdir` and `cache` arguments under the :func:`petl.transform.sorts.sort` function. Left and right tables with different key fields can be handled via the `lkey` and `rkey` arguments. 
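See also :func:`petl.transform.hashjoins.hashantijoin`, an alternative implementation that avoids sorting by building an in-memory set of keys from the right table.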
""" lkey, rkey = keys_from_args(left, right, key, lkey, rkey) return AntiJoinView(left=left, right=right, lkey=lkey, rkey=rkey, presorted=presorted, buffersize=buffersize, tempdir=tempdir, cache=cache) Table.antijoin = antijoin class AntiJoinView(Table): def __init__(self, left, right, lkey, rkey, presorted=False, buffersize=None, tempdir=None, cache=True): if presorted: self.left = left self.right = right else: self.left = sort(left, lkey, buffersize=buffersize, tempdir=tempdir, cache=cache) self.right = sort(right, rkey, buffersize=buffersize, tempdir=tempdir, cache=cache) self.lkey = lkey self.rkey = rkey def __iter__(self): return iterantijoin(self.left, self.right, self.lkey, self.rkey) def iterantijoin(left, right, lkey, rkey): lit = iter(left) rit = iter(right) lhdr = next(lit) rhdr = next(rit) yield tuple(lhdr) # determine indices of the key fields in left and right tables lkind = asindices(lhdr, lkey) rkind = asindices(rhdr, rkey) # construct functions to extract key values from both tables lgetk = comparable_itemgetter(*lkind) rgetk = comparable_itemgetter(*rkind) # construct group iterators for both tables lgit = itertools.groupby(lit, key=lgetk) rgit = itertools.groupby(rit, key=rgetk) lrowgrp = [] # loop until *either* of the iterators is exhausted lkval, rkval = Comparable(None), Comparable(None) try: # pick off initial row groups lkval, lrowgrp = next(lgit) rkval, _ = next(rgit) while True: if lkval < rkval: for row in lrowgrp: yield tuple(row) # advance left lkval, lrowgrp = next(lgit) elif lkval > rkval: # advance right rkval, _ = next(rgit) else: # advance both lkval, lrowgrp = next(lgit) rkval, _ = next(rgit) except StopIteration: pass # any left over? if lkval > rkval: # yield anything that got left hanging for row in lrowgrp: yield tuple(row) # and the rest... for lkval, lrowgrp in lgit: for row in lrowgrp: yield tuple(row) def lookupjoin(left, right, key=None, lkey=None, rkey=None, missing=None, presorted=False, buffersize=None, tempdir=None, cache=True, lprefix=None, rprefix=None): """ Perform a left join, but where the key is not unique in the right-hand table, arbitrarily choose the first row and ignore others. E.g.:: >>> import petl as etl >>> table1 = [['id', 'color', 'cost'], ... [1, 'blue', 12], ... [2, 'red', 8], ... [3, 'purple', 4]] >>> table2 = [['id', 'shape', 'size'], ... [1, 'circle', 'big'], ... [1, 'circle', 'small'], ... [2, 'square', 'tiny'], ... [2, 'square', 'big'], ... [3, 'ellipse', 'small'], ... [3, 'ellipse', 'tiny']] >>> table3 = etl.lookupjoin(table1, table2, key='id') >>> table3 +----+----------+------+-----------+---------+ | id | color | cost | shape | size | +====+==========+======+===========+=========+ | 1 | 'blue' | 12 | 'circle' | 'big' | +----+----------+------+-----------+---------+ | 2 | 'red' | 8 | 'square' | 'tiny' | +----+----------+------+-----------+---------+ | 3 | 'purple' | 4 | 'ellipse' | 'small' | +----+----------+------+-----------+---------+ See also :func:`petl.transform.joins.leftjoin`. 
""" lkey, rkey = keys_from_args(left, right, key, lkey, rkey) return LookupJoinView(left, right, lkey, rkey, presorted=presorted, missing=missing, buffersize=buffersize, tempdir=tempdir, cache=cache, lprefix=lprefix, rprefix=rprefix) Table.lookupjoin = lookupjoin class LookupJoinView(Table): def __init__(self, left, right, lkey, rkey, presorted=False, missing=None, buffersize=None, tempdir=None, cache=True, lprefix=None, rprefix=None): if presorted: self.left = left self.right = right else: self.left = sort(left, lkey, buffersize=buffersize, tempdir=tempdir, cache=cache) self.right = sort(right, rkey, buffersize=buffersize, tempdir=tempdir, cache=cache) self.lkey = lkey self.rkey = rkey self.missing = missing self.lprefix = lprefix self.rprefix = rprefix def __iter__(self): return iterlookupjoin(self.left, self.right, self.lkey, self.rkey, missing=self.missing, lprefix=self.lprefix, rprefix=self.rprefix) def iterlookupjoin(left, right, lkey, rkey, missing=None, lprefix=None, rprefix=None): lit = iter(left) rit = iter(right) lhdr = next(lit) rhdr = next(rit) # determine indices of the key fields in left and right tables lkind = asindices(lhdr, lkey) rkind = asindices(rhdr, rkey) # construct functions to extract key values from both tables lgetk = operator.itemgetter(*lkind) rgetk = operator.itemgetter(*rkind) # determine indices of non-key fields in the right table # (in the output, we only include key fields from the left table - we # don't want to duplicate fields) rvind = [i for i in range(len(rhdr)) if i not in rkind] rgetv = rowgetter(*rvind) # determine the output fields if lprefix is None: outhdr = list(lhdr) else: outhdr = [(text_type(lprefix) + text_type(f)) for f in lhdr] if rprefix is None: outhdr.extend(rgetv(rhdr)) else: outhdr.extend([(text_type(rprefix) + text_type(f)) for f in rgetv(rhdr)]) yield tuple(outhdr) # define a function to join two groups of rows def joinrows(_lrowgrp, _rrowgrp): if _rrowgrp is None: for lrow in _lrowgrp: outrow = list(lrow) # start with the left row # extend with missing values in place of the right row outrow.extend([missing] * len(rvind)) yield tuple(outrow) else: rrow = next(iter(_rrowgrp)) # pick first arbitrarily for lrow in _lrowgrp: # start with the left row outrow = list(lrow) # extend with non-key values from the right row outrow.extend(rgetv(rrow)) yield tuple(outrow) # construct group iterators for both tables lgit = itertools.groupby(lit, key=lgetk) rgit = itertools.groupby(rit, key=rgetk) lrowgrp = [] # loop until *either* of the iterators is exhausted lkval, rkval = None, None # initialise here to handle empty tables try: # pick off initial row groups lkval, lrowgrp = next(lgit) rkval, rrowgrp = next(rgit) while True: if lkval < rkval: for row in joinrows(lrowgrp, None): yield tuple(row) # advance left lkval, lrowgrp = next(lgit) elif lkval > rkval: # advance right rkval, rrowgrp = next(rgit) else: for row in joinrows(lrowgrp, rrowgrp): yield tuple(row) # advance both lkval, lrowgrp = next(lgit) rkval, rrowgrp = next(rgit) except StopIteration: pass # make sure any left rows remaining are yielded if lkval > rkval: # yield anything that got left hanging for row in joinrows(lrowgrp, None): yield tuple(row) # yield the rest for lkval, lrowgrp in lgit: for row in joinrows(lrowgrp, None): yield tuple(row) def unjoin(table, value, key=None, autoincrement=(1, 1), presorted=False, buffersize=None, tempdir=None, cache=True): """ Split a table into two tables by reversing an inner join. 
E.g.:: >>> import petl as etl >>> # join key is present in the table ... table1 = (('foo', 'bar', 'baz'), ... ('A', 1, 'apple'), ... ('B', 1, 'apple'), ... ('C', 2, 'orange')) >>> table2, table3 = etl.unjoin(table1, 'baz', key='bar') >>> table2 +-----+-----+ | foo | bar | +=====+=====+ | 'A' | 1 | +-----+-----+ | 'B' | 1 | +-----+-----+ | 'C' | 2 | +-----+-----+ >>> table3 +-----+----------+ | bar | baz | +=====+==========+ | 1 | 'apple' | +-----+----------+ | 2 | 'orange' | +-----+----------+ >>> # an integer join key can also be reconstructed ... table4 = (('foo', 'bar'), ... ('A', 'apple'), ... ('B', 'apple'), ... ('C', 'orange')) >>> table5, table6 = etl.unjoin(table4, 'bar') >>> table5 +-----+--------+ | foo | bar_id | +=====+========+ | 'A' | 1 | +-----+--------+ | 'B' | 1 | +-----+--------+ | 'C' | 2 | +-----+--------+ >>> table6 +----+----------+ | id | bar | +====+==========+ | 1 | 'apple' | +----+----------+ | 2 | 'orange' | +----+----------+ The `autoincrement` parameter controls how an integer join key is reconstructed, and should be a tuple of (`start`, `step`). """ if key is None: # first sort the table by the value field if presorted: tbl_sorted = table else: tbl_sorted = sort(table, value, buffersize=buffersize, tempdir=tempdir, cache=cache) # on the left, return the original table but with the value field # replaced by an incrementing integer left = ConvertToIncrementingCounterView(tbl_sorted, value, autoincrement) # on the right, return a new table with distinct values from the # given field right = EnumerateDistinctView(tbl_sorted, value, autoincrement) else: # on the left, return distinct rows from the original table # with the value field cut out left = distinct(cutout(table, value)) # on the right, return distinct rows from the original table # with all fields but the key and value cut out right = distinct(cut(table, key, value)) return left, right class ConvertToIncrementingCounterView(Table): def __init__(self, tbl, value, autoincrement): self.table = tbl self.value = value self.autoincrement = autoincrement def __iter__(self): it = iter(self.table) hdr = next(it) table = itertools.chain([hdr], it) value = self.value vidx = hdr.index(value) outhdr = list(hdr) outhdr[vidx] = '%s_id' % value yield tuple(outhdr) offset, multiplier = self.autoincrement for n, (_, group) in enumerate(rowgroupby(table, value)): for row in group: outrow = list(row) outrow[vidx] = (n * multiplier) + offset yield tuple(outrow) Table.unjoin = unjoin class EnumerateDistinctView(Table): def __init__(self, tbl, value, autoincrement): self.table = tbl self.value = value self.autoincrement = autoincrement def __iter__(self): offset, multiplier = self.autoincrement yield ('id', self.value) for n, (v, _) in enumerate(rowgroupby(self.table, self.value)): yield ((n * multiplier) + offset, v) petl-1.7.15/petl/transform/maps.py000066400000000000000000000323631457414240700170610ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division import operator from collections import OrderedDict from petl.compat import next, string_types, text_type import petl.config as config from petl.errors import ArgumentError from petl.util.base import Table, expr, rowgroupby, Record from petl.transform.sorts import sort def fieldmap(table, mappings=None, failonerror=None, errorvalue=None): """ Transform a table, mapping fields arbitrarily between input and output. 
E.g.:: >>> import petl as etl >>> from collections import OrderedDict >>> table1 = [['id', 'sex', 'age', 'height', 'weight'], ... [1, 'male', 16, 1.45, 62.0], ... [2, 'female', 19, 1.34, 55.4], ... [3, 'female', 17, 1.78, 74.4], ... [4, 'male', 21, 1.33, 45.2], ... [5, '-', 25, 1.65, 51.9]] >>> mappings = OrderedDict() >>> # rename a field ... mappings['subject_id'] = 'id' >>> # translate a field ... mappings['gender'] = 'sex', {'male': 'M', 'female': 'F'} >>> # apply a calculation to a field ... mappings['age_months'] = 'age', lambda v: v * 12 >>> # apply a calculation to a combination of fields ... mappings['bmi'] = lambda rec: rec['weight'] / rec['height']**2 >>> # transform and inspect the output ... table2 = etl.fieldmap(table1, mappings) >>> table2 +------------+--------+------------+--------------------+ | subject_id | gender | age_months | bmi | +============+========+============+====================+ | 1 | 'M' | 192 | 29.48870392390012 | +------------+--------+------------+--------------------+ | 2 | 'F' | 228 | 30.8531967030519 | +------------+--------+------------+--------------------+ | 3 | 'F' | 204 | 23.481883600555488 | +------------+--------+------------+--------------------+ | 4 | 'M' | 252 | 25.55260331279326 | +------------+--------+------------+--------------------+ | 5 | '-' | 300 | 19.0633608815427 | +------------+--------+------------+--------------------+ Note also that the mapping value can be an expression string, which will be converted to a lambda function via :func:`petl.util.base.expr`. The `failonerror` and `errorvalue` keyword arguments are documented under :func:`petl.config.failonerror` """ return FieldMapView(table, mappings=mappings, failonerror=failonerror, errorvalue=errorvalue) Table.fieldmap = fieldmap class FieldMapView(Table): def __init__(self, source, mappings=None, failonerror=None, errorvalue=None): self.source = source if mappings is None: self.mappings = OrderedDict() else: self.mappings = mappings self.failonerror = (config.failonerror if failonerror is None else failonerror) self.errorvalue = errorvalue def __setitem__(self, key, value): self.mappings[key] = value def __iter__(self): return iterfieldmap(self.source, self.mappings, self.failonerror, self.errorvalue) def iterfieldmap(source, mappings, failonerror, errorvalue): it = iter(source) try: hdr = next(it) except StopIteration: return flds = list(map(text_type, hdr)) outhdr = mappings.keys() yield tuple(outhdr) mapfuns = dict() for outfld, m in mappings.items(): if m in hdr: mapfuns[outfld] = operator.itemgetter(m) elif isinstance(m, int) and m < len(hdr): mapfuns[outfld] = operator.itemgetter(m) elif isinstance(m, string_types): mapfuns[outfld] = expr(m) elif callable(m): mapfuns[outfld] = m elif isinstance(m, (tuple, list)) and len(m) == 2: srcfld = m[0] fm = m[1] if callable(fm): mapfuns[outfld] = composefun(fm, srcfld) elif isinstance(fm, dict): mapfuns[outfld] = composedict(fm, srcfld) else: raise ArgumentError('expected callable or dict') else: raise ArgumentError('invalid mapping %r: %r' % (outfld, m)) # wrap rows as records it = (Record(row, flds) for row in it) for row in it: outrow = list() for outfld in outhdr: try: val = mapfuns[outfld](row) except Exception as e: if failonerror == 'inline': val = e elif failonerror: raise e else: val = errorvalue outrow.append(val) yield tuple(outrow) def composefun(f, srcfld): def g(rec): return f(rec[srcfld]) return g def composedict(d, srcfld): def g(rec): k = rec[srcfld] if k in d: return d[k] else: return k return g def 
rowmap(table, rowmapper, header, failonerror=None): """ Transform rows via an arbitrary function. E.g.:: >>> import petl as etl >>> table1 = [['id', 'sex', 'age', 'height', 'weight'], ... [1, 'male', 16, 1.45, 62.0], ... [2, 'female', 19, 1.34, 55.4], ... [3, 'female', 17, 1.78, 74.4], ... [4, 'male', 21, 1.33, 45.2], ... [5, '-', 25, 1.65, 51.9]] >>> def rowmapper(row): ... transmf = {'male': 'M', 'female': 'F'} ... return [row[0], ... transmf[row['sex']] if row['sex'] in transmf else None, ... row.age * 12, ... row.height / row.weight ** 2] ... >>> table2 = etl.rowmap(table1, rowmapper, ... header=['subject_id', 'gender', 'age_months', ... 'bmi']) >>> table2 +------------+--------+------------+-----------------------+ | subject_id | gender | age_months | bmi | +============+========+============+=======================+ | 1 | 'M' | 192 | 0.0003772112382934443 | +------------+--------+------------+-----------------------+ | 2 | 'F' | 228 | 0.0004366015456998006 | +------------+--------+------------+-----------------------+ | 3 | 'F' | 204 | 0.0003215689675106949 | +------------+--------+------------+-----------------------+ | 4 | 'M' | 252 | 0.0006509906805544679 | +------------+--------+------------+-----------------------+ | 5 | None | 300 | 0.0006125608384287258 | +------------+--------+------------+-----------------------+ The `rowmapper` function should accept a single row and return a single row (list or tuple). The `failonerror` keyword argument is documented under :func:`petl.config.failonerror` """ return RowMapView(table, rowmapper, header, failonerror=failonerror) Table.rowmap = rowmap class RowMapView(Table): def __init__(self, source, rowmapper, header, failonerror=None): self.source = source self.rowmapper = rowmapper self.header = header self.failonerror = (config.failonerror if failonerror is None else failonerror) def __iter__(self): return iterrowmap(self.source, self.rowmapper, self.header, self.failonerror) def iterrowmap(source, rowmapper, header, failonerror): it = iter(source) try: hdr = next(it) except StopIteration: return flds = list(map(text_type, hdr)) yield tuple(header) it = (Record(row, flds) for row in it) for row in it: try: outrow = rowmapper(row) yield tuple(outrow) except Exception as e: if failonerror == 'inline': yield tuple([e]) elif failonerror: raise e def rowmapmany(table, rowgenerator, header, failonerror=None): """ Map each input row to any number of output rows via an arbitrary function. E.g.:: >>> import petl as etl >>> table1 = [['id', 'sex', 'age', 'height', 'weight'], ... [1, 'male', 16, 1.45, 62.0], ... [2, 'female', 19, 1.34, 55.4], ... [3, '-', 17, 1.78, 74.4], ... [4, 'male', 21, 1.33]] >>> def rowgenerator(row): ... transmf = {'male': 'M', 'female': 'F'} ... yield [row[0], 'gender', ... transmf[row['sex']] if row['sex'] in transmf else None] ... yield [row[0], 'age_months', row.age * 12] ... yield [row[0], 'bmi', row.height / row.weight ** 2] ... >>> table2 = etl.rowmapmany(table1, rowgenerator, ... 
header=['subject_id', 'variable', 'value']) >>> table2.lookall() +------------+--------------+-----------------------+ | subject_id | variable | value | +============+==============+=======================+ | 1 | 'gender' | 'M' | +------------+--------------+-----------------------+ | 1 | 'age_months' | 192 | +------------+--------------+-----------------------+ | 1 | 'bmi' | 0.0003772112382934443 | +------------+--------------+-----------------------+ | 2 | 'gender' | 'F' | +------------+--------------+-----------------------+ | 2 | 'age_months' | 228 | +------------+--------------+-----------------------+ | 2 | 'bmi' | 0.0004366015456998006 | +------------+--------------+-----------------------+ | 3 | 'gender' | None | +------------+--------------+-----------------------+ | 3 | 'age_months' | 204 | +------------+--------------+-----------------------+ | 3 | 'bmi' | 0.0003215689675106949 | +------------+--------------+-----------------------+ | 4 | 'gender' | 'M' | +------------+--------------+-----------------------+ | 4 | 'age_months' | 252 | +------------+--------------+-----------------------+ The `rowgenerator` function should accept a single row and yield zero or more rows (lists or tuples). The `failonerror` keyword argument is documented under :func:`petl.config.failonerror` See also the :func:`petl.transform.reshape.melt` function. """ return RowMapManyView(table, rowgenerator, header, failonerror=failonerror) Table.rowmapmany = rowmapmany class RowMapManyView(Table): def __init__(self, source, rowgenerator, header, failonerror=None): self.source = source self.rowgenerator = rowgenerator self.header = header self.failonerror = (config.failonerror if failonerror is None else failonerror) def __iter__(self): return iterrowmapmany(self.source, self.rowgenerator, self.header, self.failonerror) def iterrowmapmany(source, rowgenerator, header, failonerror): it = iter(source) try: hdr = next(it) except StopIteration: return flds = list(map(text_type, hdr)) yield tuple(header) it = (Record(row, flds) for row in it) for row in it: try: for outrow in rowgenerator(row): yield tuple(outrow) except Exception as e: if failonerror == 'inline': yield tuple([e]) elif failonerror: raise e else: pass def rowgroupmap(table, key, mapper, header=None, presorted=False, buffersize=None, tempdir=None, cache=True): """ Group rows under the given key then apply `mapper` to yield zero or more output rows for each input group of rows. 
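    The `mapper` function should accept two arguments - the group key and an
    iterator over the rows in the group - and should yield zero or more
    output rows. A minimal sketch (hypothetical data, for illustration
    only)::

        import petl as etl
        table1 = [['foo', 'bar'],
                  ['a', 1],
                  ['a', 2],
                  ['b', 3]]
        def mapper(key, rows):
            # emit one output row per group, counting the rows in it
            yield [key, sum(1 for _ in rows)]
        table2 = etl.rowgroupmap(table1, 'foo', mapper,
                                 header=['foo', 'count'])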
""" return RowGroupMapView(table, key, mapper, header=header, presorted=presorted, buffersize=buffersize, tempdir=tempdir, cache=cache) Table.rowgroupmap = rowgroupmap class RowGroupMapView(Table): def __init__(self, source, key, mapper, header=None, presorted=False, buffersize=None, tempdir=None, cache=True): if presorted: self.source = source else: self.source = sort(source, key, buffersize=buffersize, tempdir=tempdir, cache=cache) self.key = key self.header = header self.mapper = mapper def __iter__(self): return iterrowgroupmap(self.source, self.key, self.mapper, self.header) def iterrowgroupmap(source, key, mapper, header): yield tuple(header) for key, rows in rowgroupby(source, key): for row in mapper(key, rows): yield row petl-1.7.15/petl/transform/reductions.py000066400000000000000000000622731457414240700203030ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division import itertools import operator from collections import OrderedDict from petl.compat import next, string_types, reduce, text_type from petl.errors import ArgumentError from petl.util.base import Table, iterpeek, rowgroupby from petl.util.base import values from petl.util.counting import nrows from petl.transform.sorts import sort, mergesort from petl.transform.basics import cut from petl.transform.dedup import distinct def rowreduce(table, key, reducer, header=None, presorted=False, buffersize=None, tempdir=None, cache=True): """ Group rows under the given key then apply `reducer` to produce a single output row for each input group of rows. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar'], ... ['a', 3], ... ['a', 7], ... ['b', 2], ... ['b', 1], ... ['b', 9], ... ['c', 4]] >>> def sumbar(key, rows): ... return [key, sum(row[1] for row in rows)] ... >>> table2 = etl.rowreduce(table1, key='foo', reducer=sumbar, ... header=['foo', 'barsum']) >>> table2 +-----+--------+ | foo | barsum | +=====+========+ | 'a' | 10 | +-----+--------+ | 'b' | 12 | +-----+--------+ | 'c' | 4 | +-----+--------+ N.B., this is not strictly a "reduce" in the sense of the standard Python :func:`reduce` function, i.e., the `reducer` function is *not* applied recursively to values within a group, rather it is applied once to each row group as a whole. See also :func:`petl.transform.reductions.aggregate` and :func:`petl.transform.reductions.fold`. """ return RowReduceView(table, key, reducer, header=header, presorted=presorted, buffersize=buffersize, tempdir=tempdir, cache=cache) Table.rowreduce = rowreduce class RowReduceView(Table): def __init__(self, source, key, reducer, header=None, presorted=False, buffersize=None, tempdir=None, cache=True): if presorted: self.source = source else: self.source = sort(source, key, buffersize=buffersize, tempdir=tempdir, cache=cache) self.key = key self.header = header self.reducer = reducer def __iter__(self): return iterrowreduce(self.source, self.key, self.reducer, self.header) def iterrowreduce(source, key, reducer, header): if header is None: # output header from source header, source = iterpeek(source) yield tuple(header) for key, rows in rowgroupby(source, key): yield tuple(reducer(key, rows)) def aggregate(table, key, aggregation=None, value=None, presorted=False, buffersize=None, tempdir=None, cache=True, field='value'): """Apply aggregation functions. E.g.:: >>> import petl as etl >>> >>> table1 = [['foo', 'bar', 'baz'], ... ['a', 3, True], ... ['a', 7, False], ... ['b', 2, True], ... ['b', 2, False], ... ['b', 9, False], ... 
['c', 4, True]] >>> # aggregate whole rows ... table2 = etl.aggregate(table1, 'foo', len) >>> table2 +-----+-------+ | foo | value | +=====+=======+ | 'a' | 2 | +-----+-------+ | 'b' | 3 | +-----+-------+ | 'c' | 1 | +-----+-------+ >>> # aggregate whole rows without a key >>> etl.aggregate(table1, None, len) +-------+ | value | +=======+ | 6 | +-------+ >>> # aggregate single field ... table3 = etl.aggregate(table1, 'foo', sum, 'bar') >>> table3 +-----+-------+ | foo | value | +=====+=======+ | 'a' | 10 | +-----+-------+ | 'b' | 13 | +-----+-------+ | 'c' | 4 | +-----+-------+ >>> # aggregate single field without a key >>> etl.aggregate(table1, None, sum, 'bar') +-------+ | value | +=======+ | 27 | +-------+ >>> # alternative signature using keyword args ... table4 = etl.aggregate(table1, key=('foo', 'bar'), ... aggregation=list, value=('bar', 'baz')) >>> table4 +-----+-----+-------------------------+ | foo | bar | value | +=====+=====+=========================+ | 'a' | 3 | [(3, True)] | +-----+-----+-------------------------+ | 'a' | 7 | [(7, False)] | +-----+-----+-------------------------+ | 'b' | 2 | [(2, True), (2, False)] | +-----+-----+-------------------------+ | 'b' | 9 | [(9, False)] | +-----+-----+-------------------------+ | 'c' | 4 | [(4, True)] | +-----+-----+-------------------------+ >>> # alternative signature using keyword args without a key >>> etl.aggregate(table1, key=None, ... aggregation=list, value=('bar', 'baz')) +-----------------------------------------------------------------------+ | value | +=======================================================================+ | [(3, True), (7, False), (2, True), (2, False), (9, False), (4, True)] | +-----------------------------------------------------------------------+ >>> # aggregate multiple fields ... from collections import OrderedDict >>> import petl as etl >>> >>> aggregation = OrderedDict() >>> aggregation['count'] = len >>> aggregation['minbar'] = 'bar', min >>> aggregation['maxbar'] = 'bar', max >>> aggregation['sumbar'] = 'bar', sum >>> # default aggregation function is list ... 
aggregation['listbar'] = 'bar' >>> aggregation['listbarbaz'] = ('bar', 'baz'), list >>> aggregation['bars'] = 'bar', etl.strjoin(', ') >>> table5 = etl.aggregate(table1, 'foo', aggregation) >>> table5 +-----+-------+--------+--------+--------+-----------+-------------------------------------+-----------+ | foo | count | minbar | maxbar | sumbar | listbar | listbarbaz | bars | +=====+=======+========+========+========+===========+=====================================+===========+ | 'a' | 2 | 3 | 7 | 10 | [3, 7] | [(3, True), (7, False)] | '3, 7' | +-----+-------+--------+--------+--------+-----------+-------------------------------------+-----------+ | 'b' | 3 | 2 | 9 | 13 | [2, 2, 9] | [(2, True), (2, False), (9, False)] | '2, 2, 9' | +-----+-------+--------+--------+--------+-----------+-------------------------------------+-----------+ | 'c' | 1 | 4 | 4 | 4 | [4] | [(4, True)] | '4' | +-----+-------+--------+--------+--------+-----------+-------------------------------------+-----------+ >>> # aggregate multiple fields without a key >>> etl.aggregate(table1, None, aggregation) +-------+--------+--------+--------+--------------------+-----------------------------------------------------------------------+--------------------+ | count | minbar | maxbar | sumbar | listbar | listbarbaz | bars | +=======+========+========+========+====================+=======================================================================+====================+ | 6 | 2 | 9 | 27 | [3, 7, 2, 2, 9, 4] | [(3, True), (7, False), (2, True), (2, False), (9, False), (4, True)] | '3, 7, 2, 2, 9, 4' | +-------+--------+--------+--------+--------------------+-----------------------------------------------------------------------+--------------------+ If `presorted` is True, it is assumed that the data are already sorted by the given key, and the `buffersize`, `tempdir` and `cache` arguments are ignored. Otherwise, the data are sorted, see also the discussion of the `buffersize`, `tempdir` and `cache` arguments under the :func:`petl.transform.sorts.sort` function. If `key` is None, sorting is not necessary. 
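    In the dict-style form, each aggregation can be given as a bare callable
    (applied to the whole group of rows), a bare field name (values are
    collected into a list by default), or a ``(field, function)`` pair. As a
    minimal sketch (hypothetical data, for illustration only)::

        from collections import OrderedDict
        import petl as etl
        table = [['foo', 'bar'],
                 ['a', 3],
                 ['a', 7],
                 ['b', 2]]
        aggregation = OrderedDict()
        aggregation['nrows'] = len          # callable, applied to row groups
        aggregation['sumbar'] = 'bar', sum  # (field, function) pair
        result = etl.aggregate(table, 'foo', aggregation)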
""" if callable(aggregation): return SimpleAggregateView(table, key, aggregation=aggregation, value=value, presorted=presorted, buffersize=buffersize, tempdir=tempdir, cache=cache, field=field) elif aggregation is None or isinstance(aggregation, (list, tuple, dict)): # ignore value arg return MultiAggregateView(table, key, aggregation=aggregation, presorted=presorted, buffersize=buffersize, tempdir=tempdir, cache=cache) else: raise ArgumentError('expected aggregation is callable, list, tuple, dict ' 'or None') Table.aggregate = aggregate class SimpleAggregateView(Table): def __init__(self, table, key, aggregation=list, value=None, presorted=False, buffersize=None, tempdir=None, cache=True, field='value'): if presorted or key is None: self.table = table else: self.table = sort(table, key, buffersize=buffersize, tempdir=tempdir, cache=cache) self.key = key self.aggregation = aggregation self.value = value self.field = field def __iter__(self): return itersimpleaggregate(self.table, self.key, self.aggregation, self.value, self.field) def itersimpleaggregate(table, key, aggregation, value, field): # special case counting if aggregation == len and key is not None: aggregation = lambda g: sum(1 for _ in g) # count length of iterable # special case where length of key is 1 if isinstance(key, (list, tuple)) and len(key) == 1: key = key[0] # determine output header if isinstance(key, (list, tuple)): outhdr = tuple(key) + (field,) elif callable(key): outhdr = ('key', field) elif key is None: outhdr = field, else: outhdr = (key, field) yield outhdr # generate data if isinstance(key, (list, tuple)): for k, grp in rowgroupby(table, key, value): yield tuple(k) + (aggregation(grp),) elif key is None: # special case counting if aggregation == len: yield nrows(table), else: yield aggregation(values(table, value)), else: for k, grp in rowgroupby(table, key, value): yield k, aggregation(grp) class MultiAggregateView(Table): def __init__(self, source, key, aggregation=None, presorted=False, buffersize=None, tempdir=None, cache=True): if presorted or key is None: self.source = source else: self.source = sort(source, key, buffersize=buffersize, tempdir=tempdir, cache=cache) self.key = key if aggregation is None: self.aggregation = OrderedDict() elif isinstance(aggregation, (list, tuple)): self.aggregation = OrderedDict() for t in aggregation: self.aggregation[t[0]] = t[1:] elif isinstance(aggregation, dict): self.aggregation = aggregation else: raise ArgumentError( 'expected aggregation is None, list, tuple or dict, found %r' % aggregation ) def __iter__(self): return itermultiaggregate(self.source, self.key, self.aggregation) def __setitem__(self, key, value): self.aggregation[key] = value def itermultiaggregate(source, key, aggregation): aggregation = OrderedDict(aggregation.items()) # take a copy it = iter(source) hdr = next(it) # push back header to ensure we iterate only once it = itertools.chain([hdr], it) # normalise aggregators for outfld in aggregation: agg = aggregation[outfld] if callable(agg): aggregation[outfld] = None, agg elif isinstance(agg, string_types): aggregation[outfld] = agg, list # list is default elif len(agg) == 1 and isinstance(agg[0], string_types): aggregation[outfld] = agg[0], list # list is default elif len(agg) == 1 and callable(agg[0]): aggregation[outfld] = None, agg[0] # aggregate whole rows elif len(agg) == 2: pass # no need to normalise else: raise ArgumentError('invalid aggregation: %r, %r' % (outfld, agg)) # determine output header if isinstance(key, (list, tuple)): outhdr 
= list(key) elif callable(key): outhdr = ['key'] elif key is None: outhdr = [] else: outhdr = [key] for outfld in aggregation: outhdr.append(outfld) yield tuple(outhdr) if key is None: grouped = rowgroupby(it, lambda x: None) else: grouped = rowgroupby(it, key) # generate data for k, rows in grouped: rows = list(rows) # may need to iterate over these more than once # handle compound key if isinstance(key, (list, tuple)): outrow = list(k) elif key is None: outrow = [] else: outrow = [k] for outfld in aggregation: srcfld, aggfun = aggregation[outfld] if srcfld is None: aggval = aggfun(rows) outrow.append(aggval) elif isinstance(srcfld, (list, tuple)): idxs = [hdr.index(f) for f in srcfld] valgetter = operator.itemgetter(*idxs) vals = (valgetter(row) for row in rows) aggval = aggfun(vals) outrow.append(aggval) else: idx = hdr.index(srcfld) # try using generator comprehension vals = (row[idx] for row in rows) aggval = aggfun(vals) outrow.append(aggval) yield tuple(outrow) def groupcountdistinctvalues(table, key, value): """Group by the `key` field then count the number of distinct values in the `value` field.""" s1 = cut(table, key, value) s2 = distinct(s1) s3 = aggregate(s2, key, len) return s3 Table.groupcountdistinctvalues = groupcountdistinctvalues def groupselectfirst(table, key, presorted=False, buffersize=None, tempdir=None, cache=True): """Group by the `key` field then return the first row within each group. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar', 'baz'], ... ['A', 1, True], ... ['C', 7, False], ... ['B', 2, False], ... ['C', 9, True]] >>> table2 = etl.groupselectfirst(table1, key='foo') >>> table2 +-----+-----+-------+ | foo | bar | baz | +=====+=====+=======+ | 'A' | 1 | True | +-----+-----+-------+ | 'B' | 2 | False | +-----+-----+-------+ | 'C' | 7 | False | +-----+-----+-------+ See also :func:`petl.transform.reductions.groupselectlast`, :func:`petl.transform.dedup.distinct`. """ def _reducer(k, rows): return next(rows) return rowreduce(table, key, reducer=_reducer, presorted=presorted, buffersize=buffersize, tempdir=tempdir, cache=cache) Table.groupselectfirst = groupselectfirst def groupselectlast(table, key, presorted=False, buffersize=None, tempdir=None, cache=True): """Group by the `key` field then return the last row within each group. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar', 'baz'], ... ['A', 1, True], ... ['C', 7, False], ... ['B', 2, False], ... ['C', 9, True]] >>> table2 = etl.groupselectlast(table1, key='foo') >>> table2 +-----+-----+-------+ | foo | bar | baz | +=====+=====+=======+ | 'A' | 1 | True | +-----+-----+-------+ | 'B' | 2 | False | +-----+-----+-------+ | 'C' | 9 | True | +-----+-----+-------+ See also :func:`petl.transform.reductions.groupselectfirst`, :func:`petl.transform.dedup.distinct`. .. versionadded:: 1.1.0 """ def _reducer(k, rows): row = None for row in rows: pass return row return rowreduce(table, key, reducer=_reducer, presorted=presorted, buffersize=buffersize, tempdir=tempdir, cache=cache) Table.groupselectlast = groupselectlast def groupselectmin(table, key, value, presorted=False, buffersize=None, tempdir=None, cache=True): """Group by the `key` field then return the row with the minimum of the `value` field within each group. 
N.B., will only return one row for each group, even if multiple rows have the same (minimum) value.""" return groupselectfirst(sort(table, value, reverse=False), key, presorted=presorted, buffersize=buffersize, tempdir=tempdir, cache=cache) Table.groupselectmin = groupselectmin def groupselectmax(table, key, value, presorted=False, buffersize=None, tempdir=None, cache=True): """Group by the `key` field then return the row with the maximum of the `value` field within each group. N.B., will only return one row for each group, even if multiple rows have the same (maximum) value.""" return groupselectfirst(sort(table, value, reverse=True), key, presorted=presorted, buffersize=buffersize, tempdir=tempdir, cache=cache) Table.groupselectmax = groupselectmax def mergeduplicates(table, key, missing=None, presorted=False, buffersize=None, tempdir=None, cache=True): """ Merge duplicate rows under the given key. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar', 'baz'], ... ['A', 1, 2.7], ... ['B', 2, None], ... ['D', 3, 9.4], ... ['B', None, 7.8], ... ['E', None, 42.], ... ['D', 3, 12.3], ... ['A', 2, None]] >>> table2 = etl.mergeduplicates(table1, 'foo') >>> table2 +-----+------------------+-----------------------+ | foo | bar | baz | +=====+==================+=======================+ | 'A' | Conflict({1, 2}) | 2.7 | +-----+------------------+-----------------------+ | 'B' | 2 | 7.8 | +-----+------------------+-----------------------+ | 'D' | 3 | Conflict({9.4, 12.3}) | +-----+------------------+-----------------------+ | 'E' | None | 42.0 | +-----+------------------+-----------------------+ Missing values are overridden by non-missing values. Conflicting values are reported as an instance of the Conflict class (sub-class of frozenset). If `presorted` is True, it is assumed that the data are already sorted by the given key, and the `buffersize`, `tempdir` and `cache` arguments are ignored. Otherwise, the data are sorted, see also the discussion of the `buffersize`, `tempdir` and `cache` arguments under the :func:`petl.transform.sorts.sort` function. See also :func:`petl.transform.dedup.conflicts`. 
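    Conflicts can be picked out downstream, e.g. by testing whether a value
    is a ``frozenset`` (``Conflict`` is a ``frozenset`` subclass). A sketch
    (for illustration only, reusing ``table1`` from the example above)::

        import petl as etl
        table3 = etl.mergeduplicates(table1, 'foo')
        # keep only rows where the 'bar' field holds conflicting values
        table4 = etl.select(table3, 'bar',
                            lambda v: isinstance(v, frozenset))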
""" return MergeDuplicatesView(table, key, missing=missing, presorted=presorted, buffersize=buffersize, tempdir=tempdir, cache=cache) Table.mergeduplicates = mergeduplicates class MergeDuplicatesView(Table): def __init__(self, table, key, missing=None, presorted=False, buffersize=None, tempdir=None, cache=True): if presorted: self.table = table else: self.table = sort(table, key, buffersize=buffersize, tempdir=tempdir, cache=cache) self.key = key self.missing = missing def __iter__(self): return itermergeduplicates(self.table, self.key, self.missing) def itermergeduplicates(table, key, missing): it = iter(table) hdr, it = iterpeek(it) flds = list(map(text_type, hdr)) # determine output fields if isinstance(key, string_types): outhdr = [key] keyflds = {key} else: outhdr = list(key) keyflds = set(key) valflds = [f for f in flds if f not in keyflds] valfldidxs = [flds.index(f) for f in valflds] outhdr.extend(valflds) yield tuple(outhdr) # do the work for k, grp in rowgroupby(it, key): grp = list(grp) if isinstance(key, string_types): outrow = [k] else: outrow = list(k) mergedvals = [set(row[i] for row in grp if len(row) > i and row[i] != missing) for i in valfldidxs] normedvals = [vals.pop() if len(vals) == 1 else missing if len(vals) == 0 else Conflict(vals) for vals in mergedvals] outrow.extend(normedvals) yield tuple(outrow) def merge(*tables, **kwargs): """ Convenience function to combine multiple tables (via :func:`petl.transform.sorts.mergesort`) then combine duplicate rows by merging under the given key (via :func:`petl.transform.reductions.mergeduplicates`). E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar', 'baz'], ... [1, 'A', True], ... [2, 'B', None], ... [4, 'C', True]] >>> table2 = [['bar', 'baz', 'quux'], ... ['A', True, 42.0], ... ['B', False, 79.3], ... ['C', False, 12.4]] >>> table3 = etl.merge(table1, table2, key='bar') >>> table3 +-----+-----+-------------------------+------+ | bar | foo | baz | quux | +=====+=====+=========================+======+ | 'A' | 1 | True | 42.0 | +-----+-----+-------------------------+------+ | 'B' | 2 | False | 79.3 | +-----+-----+-------------------------+------+ | 'C' | 4 | Conflict({False, True}) | 12.4 | +-----+-----+-------------------------+------+ Keyword arguments are the same as for :func:`petl.transform.sorts.mergesort`, except `key` is required. """ assert 'key' in kwargs, 'keyword argument "key" is required' key = kwargs['key'] t1 = mergesort(*tables, **kwargs) t2 = mergeduplicates(t1, key=key, presorted=True) return t2 Table.merge = merge class Conflict(frozenset): def __new__(cls, items): s = super(Conflict, cls).__new__(cls, items) return s def fold(table, key, f, value=None, presorted=False, buffersize=None, tempdir=None, cache=True): """ Reduce rows recursively via the Python standard :func:`reduce` function. E.g.:: >>> import petl as etl >>> table1 = [['id', 'count'], ... [1, 3], ... [1, 5], ... [2, 4], ... [2, 8]] >>> import operator >>> table2 = etl.fold(table1, 'id', operator.add, 'count', ... presorted=True) >>> table2 +-----+-------+ | key | value | +=====+=======+ | 1 | 8 | +-----+-------+ | 2 | 12 | +-----+-------+ See also :func:`petl.transform.reductions.aggregate`, :func:`petl.transform.reductions.rowreduce`. 
""" return FoldView(table, key, f, value=value, presorted=presorted, buffersize=buffersize, tempdir=tempdir, cache=cache) Table.fold = fold class FoldView(Table): def __init__(self, table, key, f, value=None, presorted=False, buffersize=None, tempdir=None, cache=True): if presorted: self.table = table else: self.table = sort(table, key, buffersize=buffersize, tempdir=tempdir, cache=cache) self.key = key self.f = f self.value = value def __iter__(self): return iterfold(self.table, self.key, self.f, self.value) def iterfold(table, key, f, value): yield ('key', 'value') for k, grp in rowgroupby(table, key, value): yield k, reduce(f, grp) petl-1.7.15/petl/transform/regex.py000066400000000000000000000364021457414240700172310ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division import re import operator from petl.compat import next, text_type from petl.errors import ArgumentError from petl.util.base import Table, asindices from petl.transform.basics import TransformError from petl.transform.conversions import convert def capture(table, field, pattern, newfields=None, include_original=False, flags=0, fill=None): """ Add one or more new fields with values captured from an existing field searched via a regular expression. E.g.:: >>> import petl as etl >>> table1 = [['id', 'variable', 'value'], ... ['1', 'A1', '12'], ... ['2', 'A2', '15'], ... ['3', 'B1', '18'], ... ['4', 'C12', '19']] >>> table2 = etl.capture(table1, 'variable', '([A-Z,a-z]+)([0-9]+)', ... ['treat', 'time']) >>> table2 +-----+-------+-------+------+ | id | value | treat | time | +=====+=======+=======+======+ | '1' | '12' | 'A' | '1' | +-----+-------+-------+------+ | '2' | '15' | 'A' | '2' | +-----+-------+-------+------+ | '3' | '18' | 'B' | '1' | +-----+-------+-------+------+ | '4' | '19' | 'C' | '12' | +-----+-------+-------+------+ >>> # using the include_original argument ... table3 = etl.capture(table1, 'variable', '([A-Z,a-z]+)([0-9]+)', ... ['treat', 'time'], ... include_original=True) >>> table3 +-----+----------+-------+-------+------+ | id | variable | value | treat | time | +=====+==========+=======+=======+======+ | '1' | 'A1' | '12' | 'A' | '1' | +-----+----------+-------+-------+------+ | '2' | 'A2' | '15' | 'A' | '2' | +-----+----------+-------+-------+------+ | '3' | 'B1' | '18' | 'B' | '1' | +-----+----------+-------+-------+------+ | '4' | 'C12' | '19' | 'C' | '12' | +-----+----------+-------+-------+------+ By default the field on which the capture is performed is omitted. It can be included using the `include_original` argument. The ``fill`` parameter can be used to provide a list or tuple of values to use if the regular expression does not match. The ``fill`` parameter should contain as many values as there are capturing groups in the regular expression. If ``fill`` is ``None`` (default) then a ``petl.transform.TransformError`` will be raised on the first non-matching value. 
""" return CaptureView(table, field, pattern, newfields=newfields, include_original=include_original, flags=flags, fill=fill) Table.capture = capture class CaptureView(Table): def __init__(self, source, field, pattern, newfields=None, include_original=False, flags=0, fill=None): self.source = source self.field = field self.pattern = pattern self.newfields = newfields self.include_original = include_original self.flags = flags self.fill = fill def __iter__(self): return itercapture(self.source, self.field, self.pattern, self.newfields, self.include_original, self.flags, self.fill) def itercapture(source, field, pattern, newfields, include_original, flags, fill): it = iter(source) prog = re.compile(pattern, flags) try: hdr = next(it) except StopIteration: hdr = [] flds = list(map(text_type, hdr)) if isinstance(field, int) and field < len(hdr): field_index = field elif field in flds: field_index = flds.index(field) else: raise ArgumentError('field invalid: must be either field name or index') # determine output fields outhdr = list(flds) if not include_original: outhdr.remove(field) if newfields: outhdr.extend(newfields) yield tuple(outhdr) # construct the output data for row in it: value = row[field_index] if include_original: out_row = list(row) else: out_row = [v for i, v in enumerate(row) if i != field_index] match = prog.search(value) if match is None: if fill is not None: out_row.extend(fill) else: raise TransformError('value %r did not match pattern %r' % (value, pattern)) else: out_row.extend(match.groups()) yield tuple(out_row) def split(table, field, pattern, newfields=None, include_original=False, maxsplit=0, flags=0): """ Add one or more new fields with values generated by splitting an existing value around occurrences of a regular expression. E.g.:: >>> import petl as etl >>> table1 = [['id', 'variable', 'value'], ... ['1', 'parad1', '12'], ... ['2', 'parad2', '15'], ... ['3', 'tempd1', '18'], ... ['4', 'tempd2', '19']] >>> table2 = etl.split(table1, 'variable', 'd', ['variable', 'day']) >>> table2 +-----+-------+----------+-----+ | id | value | variable | day | +=====+=======+==========+=====+ | '1' | '12' | 'para' | '1' | +-----+-------+----------+-----+ | '2' | '15' | 'para' | '2' | +-----+-------+----------+-----+ | '3' | '18' | 'temp' | '1' | +-----+-------+----------+-----+ | '4' | '19' | 'temp' | '2' | +-----+-------+----------+-----+ By default the field on which the split is performed is omitted. It can be included using the `include_original` argument. 
""" return SplitView(table, field, pattern, newfields, include_original, maxsplit, flags) Table.split = split class SplitView(Table): def __init__(self, source, field, pattern, newfields=None, include_original=False, maxsplit=0, flags=0): self.source = source self.field = field self.pattern = pattern self.newfields = newfields self.include_original = include_original self.maxsplit = maxsplit self.flags = flags def __iter__(self): return itersplit(self.source, self.field, self.pattern, self.newfields, self.include_original, self.maxsplit, self.flags) def itersplit(source, field, pattern, newfields, include_original, maxsplit, flags): it = iter(source) prog = re.compile(pattern, flags) try: hdr = next(it) except StopIteration: hdr = [] flds = list(map(text_type, hdr)) if isinstance(field, int) and field < len(hdr): field_index = field field = hdr[field_index] elif field in flds: field_index = flds.index(field) else: raise ArgumentError('field invalid: must be either field name or index') # determine output fields outhdr = list(flds) if not include_original: outhdr.remove(field) if newfields: outhdr.extend(newfields) yield tuple(outhdr) # construct the output data for row in it: value = row[field_index] if include_original: out_row = list(row) else: out_row = [v for i, v in enumerate(row) if i != field_index] out_row.extend(prog.split(value, maxsplit)) yield tuple(out_row) def sub(table, field, pattern, repl, count=0, flags=0): """ Convenience function to convert values under the given field using a regular expression substitution. See also :func:`re.sub`. """ prog = re.compile(pattern, flags) conv = lambda v: prog.sub(repl, v, count=count) return convert(table, field, conv) Table.sub = sub def search(table, *args, **kwargs): """ Perform a regular expression search, returning rows that match a given pattern, either anywhere in the row or within a specific field. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar', 'baz'], ... ['orange', 12, 'oranges are nice fruit'], ... ['mango', 42, 'I like them'], ... ['banana', 74, 'lovely too'], ... ['cucumber', 41, 'better than mango']] >>> # search any field ... table2 = etl.search(table1, '.g.') >>> table2 +------------+-----+--------------------------+ | foo | bar | baz | +============+=====+==========================+ | 'orange' | 12 | 'oranges are nice fruit' | +------------+-----+--------------------------+ | 'mango' | 42 | 'I like them' | +------------+-----+--------------------------+ | 'cucumber' | 41 | 'better than mango' | +------------+-----+--------------------------+ >>> # search a specific field ... table3 = etl.search(table1, 'foo', '.g.') >>> table3 +----------+-----+--------------------------+ | foo | bar | baz | +==========+=====+==========================+ | 'orange' | 12 | 'oranges are nice fruit' | +----------+-----+--------------------------+ | 'mango' | 42 | 'I like them' | +----------+-----+--------------------------+ The complement can be found via :func:`petl.transform.regex.searchcomplement`. 
""" if len(args) == 1: field = None pattern = args[0] elif len(args) == 2: field = args[0] pattern = args[1] else: raise ArgumentError('expected 1 or 2 positional arguments') return SearchView(table, pattern, field=field, **kwargs) Table.search = search class SearchView(Table): def __init__(self, table, pattern, field=None, flags=0, complement=False): self.table = table self.pattern = pattern self.field = field self.flags = flags self.complement = complement def __iter__(self): return itersearch(self.table, self.pattern, self.field, self.flags, self.complement) def itersearch(table, pattern, field, flags, complement): prog = re.compile(pattern, flags) it = iter(table) try: hdr = next(it) except StopIteration: return flds = list(map(text_type, hdr)) yield tuple(hdr) if field is None: # search whole row test = lambda r: any(prog.search(text_type(v)) for v in r) else: indices = asindices(hdr, field) if len(indices) == 1: index = indices[0] test = lambda r: prog.search(text_type(r[index])) else: getvals = operator.itemgetter(*indices) test = lambda r: any(prog.search(text_type(v)) for v in getvals(r)) # complement==False, return rows that match if not complement: for row in it: if test(row): yield tuple(row) # complement==True, return rows that do not match else: for row in it: if not test(row): yield tuple(row) def searchcomplement(table, *args, **kwargs): """ Perform a regular expression search, returning rows that **do not** match a given pattern, either anywhere in the row or within a specific field. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar', 'baz'], ... ['orange', 12, 'oranges are nice fruit'], ... ['mango', 42, 'I like them'], ... ['banana', 74, 'lovely too'], ... ['cucumber', 41, 'better than mango']] >>> # search any field ... table2 = etl.searchcomplement(table1, '.g.') >>> table2 +----------+-----+--------------+ | foo | bar | baz | +==========+=====+==============+ | 'banana' | 74 | 'lovely too' | +----------+-----+--------------+ >>> # search a specific field ... table3 = etl.searchcomplement(table1, 'foo', '.g.') >>> table3 +------------+-----+---------------------+ | foo | bar | baz | +============+=====+=====================+ | 'banana' | 74 | 'lovely too' | +------------+-----+---------------------+ | 'cucumber' | 41 | 'better than mango' | +------------+-----+---------------------+ This returns the complement of :func:`petl.transform.regex.search`. """ return search(table, *args, complement=True, **kwargs) Table.searchcomplement = searchcomplement def splitdown(table, field, pattern, maxsplit=0, flags=0): """ Split a field into multiple rows using a regular expression. E.g.: >>> import petl as etl >>> table1 = [['name', 'roles'], ... ['Jane Doe', 'president,engineer,tailor,lawyer'], ... 
['John Doe', 'rocket scientist,optometrist,chef,knight,sailor']] >>> table2 = etl.splitdown(table1, 'roles', ',') >>> table2.lookall() +------------+--------------------+ | name | roles | +============+====================+ | 'Jane Doe' | 'president' | +------------+--------------------+ | 'Jane Doe' | 'engineer' | +------------+--------------------+ | 'Jane Doe' | 'tailor' | +------------+--------------------+ | 'Jane Doe' | 'lawyer' | +------------+--------------------+ | 'John Doe' | 'rocket scientist' | +------------+--------------------+ | 'John Doe' | 'optometrist' | +------------+--------------------+ | 'John Doe' | 'chef' | +------------+--------------------+ | 'John Doe' | 'knight' | +------------+--------------------+ | 'John Doe' | 'sailor' | +------------+--------------------+ """ return SplitDownView(table, field, pattern, maxsplit, flags) Table.splitdown = splitdown class SplitDownView(Table): def __init__(self, table, field, pattern, maxsplit=0, flags=0): self.table = table self.field = field self.pattern = pattern self.maxsplit = maxsplit self.flags = flags def __iter__(self): return itersplitdown(self.table, self.field, self.pattern, self.maxsplit, self.flags) def itersplitdown(table, field, pattern, maxsplit, flags): prog = re.compile(pattern, flags) it = iter(table) try: hdr = next(it) except StopIteration: return flds = list(map(text_type, hdr)) if isinstance(field, int) and field < len(hdr): field_index = field field = hdr[field_index] elif field in flds: field_index = flds.index(field) else: raise ArgumentError('field invalid: must be either field name or index') yield tuple(hdr) for row in it: value = row[field_index] for v in prog.split(value, maxsplit): yield tuple(v if i == field_index else row[i] for i in range(len(hdr))) petl-1.7.15/petl/transform/reshape.py000066400000000000000000000534171457414240700175530ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division import itertools import collections import operator from petl.compat import next, text_type from petl.comparison import comparable_itemgetter from petl.util.base import Table, rowgetter, values, itervalues, \ header, data, asindices from petl.transform.sorts import sort def melt(table, key=None, variables=None, variablefield='variable', valuefield='value'): """ Reshape a table, melting fields into data. E.g.:: >>> import petl as etl >>> table1 = [['id', 'gender', 'age'], ... [1, 'F', 12], ... [2, 'M', 17], ... [3, 'M', 16]] >>> table2 = etl.melt(table1, 'id') >>> table2.lookall() +----+----------+-------+ | id | variable | value | +====+==========+=======+ | 1 | 'gender' | 'F' | +----+----------+-------+ | 1 | 'age' | 12 | +----+----------+-------+ | 2 | 'gender' | 'M' | +----+----------+-------+ | 2 | 'age' | 17 | +----+----------+-------+ | 3 | 'gender' | 'M' | +----+----------+-------+ | 3 | 'age' | 16 | +----+----------+-------+ >>> # compound keys are supported ... table3 = [['id', 'time', 'height', 'weight'], ... [1, 11, 66.4, 12.2], ... [2, 16, 53.2, 17.3], ... 
[3, 12, 34.5, 9.4]] >>> table4 = etl.melt(table3, key=['id', 'time']) >>> table4.lookall() +----+------+----------+-------+ | id | time | variable | value | +====+======+==========+=======+ | 1 | 11 | 'height' | 66.4 | +----+------+----------+-------+ | 1 | 11 | 'weight' | 12.2 | +----+------+----------+-------+ | 2 | 16 | 'height' | 53.2 | +----+------+----------+-------+ | 2 | 16 | 'weight' | 17.3 | +----+------+----------+-------+ | 3 | 12 | 'height' | 34.5 | +----+------+----------+-------+ | 3 | 12 | 'weight' | 9.4 | +----+------+----------+-------+ >>> # a subset of variable fields can be selected ... table5 = etl.melt(table3, key=['id', 'time'], ... variables=['height']) >>> table5.lookall() +----+------+----------+-------+ | id | time | variable | value | +====+======+==========+=======+ | 1 | 11 | 'height' | 66.4 | +----+------+----------+-------+ | 2 | 16 | 'height' | 53.2 | +----+------+----------+-------+ | 3 | 12 | 'height' | 34.5 | +----+------+----------+-------+ See also :func:`petl.transform.reshape.recast`. """ return MeltView(table, key=key, variables=variables, variablefield=variablefield, valuefield=valuefield) Table.melt = melt class MeltView(Table): def __init__(self, source, key=None, variables=None, variablefield='variable', valuefield='value'): self.source = source self.key = key self.variables = variables self.variablefield = variablefield self.valuefield = valuefield def __iter__(self): return itermelt(self.source, self.key, self.variables, self.variablefield, self.valuefield) def itermelt(source, key, variables, variablefield, valuefield): if key is None and variables is None: raise ValueError('either key or variables must be specified') it = iter(source) try: hdr = next(it) except StopIteration: return # determine key and variable field indices key_indices = variables_indices = None if key is not None: key_indices = asindices(hdr, key) if variables is not None: if not isinstance(variables, (list, tuple)): variables = (variables,) variables_indices = asindices(hdr, variables) if key is None: # assume key is fields not in variables key_indices = [i for i in range(len(hdr)) if i not in variables_indices] if variables is None: # assume variables are fields not in key variables_indices = [i for i in range(len(hdr)) if i not in key_indices] variables = [hdr[i] for i in variables_indices] getkey = rowgetter(*key_indices) # determine the output fields outhdr = [hdr[i] for i in key_indices] outhdr.append(variablefield) outhdr.append(valuefield) yield tuple(outhdr) # construct the output data for row in it: k = getkey(row) for v, i in zip(variables, variables_indices): try: o = list(k) # populate with key values initially o.append(v) # add variable o.append(row[i]) # add value yield tuple(o) except IndexError: # row is missing this value, and melt() should yield no row pass def recast(table, key=None, variablefield='variable', valuefield='value', samplesize=1000, reducers=None, missing=None): """ Recast molten data. E.g.:: >>> import petl as etl >>> table1 = [['id', 'variable', 'value'], ... [3, 'age', 16], ... [1, 'gender', 'F'], ... [2, 'gender', 'M'], ... [2, 'age', 17], ... [1, 'age', 12], ... [3, 'gender', 'M']] >>> table2 = etl.recast(table1) >>> table2 +----+-----+--------+ | id | age | gender | +====+=====+========+ | 1 | 12 | 'F' | +----+-----+--------+ | 2 | 17 | 'M' | +----+-----+--------+ | 3 | 16 | 'M' | +----+-----+--------+ >>> # specifying variable and value fields ... table3 = [['id', 'vars', 'vals'], ... [3, 'age', 16], ... [1, 'gender', 'F'], ... 
[2, 'gender', 'M'], ... [2, 'age', 17], ... [1, 'age', 12], ... [3, 'gender', 'M']] >>> table4 = etl.recast(table3, variablefield='vars', valuefield='vals') >>> table4 +----+-----+--------+ | id | age | gender | +====+=====+========+ | 1 | 12 | 'F' | +----+-----+--------+ | 2 | 17 | 'M' | +----+-----+--------+ | 3 | 16 | 'M' | +----+-----+--------+ >>> # if there are multiple values for each key/variable pair, and no ... # reducers function is provided, then all values will be listed ... table6 = [['id', 'time', 'variable', 'value'], ... [1, 11, 'weight', 66.4], ... [1, 14, 'weight', 55.2], ... [2, 12, 'weight', 53.2], ... [2, 16, 'weight', 43.3], ... [3, 12, 'weight', 34.5], ... [3, 17, 'weight', 49.4]] >>> table7 = etl.recast(table6, key='id') >>> table7 +----+--------------+ | id | weight | +====+==============+ | 1 | [66.4, 55.2] | +----+--------------+ | 2 | [53.2, 43.3] | +----+--------------+ | 3 | [34.5, 49.4] | +----+--------------+ >>> # multiple values can be reduced via an aggregation function ... def mean(values): ... return float(sum(values)) / len(values) ... >>> table8 = etl.recast(table6, key='id', reducers={'weight': mean}) >>> table8 +----+--------------------+ | id | weight | +====+====================+ | 1 | 60.800000000000004 | +----+--------------------+ | 2 | 48.25 | +----+--------------------+ | 3 | 41.95 | +----+--------------------+ >>> # missing values are padded with whatever is provided via the ... # missing keyword argument (None by default) ... table9 = [['id', 'variable', 'value'], ... [1, 'gender', 'F'], ... [2, 'age', 17], ... [1, 'age', 12], ... [3, 'gender', 'M']] >>> table10 = etl.recast(table9, key='id') >>> table10 +----+------+--------+ | id | age | gender | +====+======+========+ | 1 | 12 | 'F' | +----+------+--------+ | 2 | 17 | None | +----+------+--------+ | 3 | None | 'M' | +----+------+--------+ Note that the table is scanned once to discover variables, then a second time to reshape the data and recast variables as fields. How many rows are scanned in the first pass is determined by the `samplesize` argument. See also :func:`petl.transform.reshape.melt`. 
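    Note that any variable values appearing only after the first
    `samplesize` rows will not be discovered in the first pass, and so will
    be silently omitted from the output. If in doubt, increase `samplesize`
    (a sketch, for illustration only, reusing ``table9`` from the example
    above)::

        import petl as etl
        table11 = etl.recast(table9, key='id', samplesize=1000000)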
""" return RecastView(table, key=key, variablefield=variablefield, valuefield=valuefield, samplesize=samplesize, reducers=reducers, missing=missing) Table.recast = recast class RecastView(Table): def __init__(self, source, key=None, variablefield='variable', valuefield='value', samplesize=1000, reducers=None, missing=None): self.source = source self.key = key self.variablefield = variablefield self.valuefield = valuefield self.samplesize = samplesize if reducers is None: self.reducers = dict() else: self.reducers = reducers self.missing = missing def __iter__(self): return iterrecast(self.source, self.key, self.variablefield, self.valuefield, self.samplesize, self.reducers, self.missing) def iterrecast(source, key, variablefield, valuefield, samplesize, reducers, missing): # TODO only make one pass through the data it = iter(source) try: hdr = next(it) except StopIteration: return flds = list(map(text_type, hdr)) # normalise some stuff keyfields = key variablefields = variablefield # N.B., could be more than one # normalise key fields if keyfields and not isinstance(keyfields, (list, tuple)): keyfields = (keyfields,) # normalise variable fields if variablefields: if isinstance(variablefields, dict): pass # handle this later elif not isinstance(variablefields, (list, tuple)): variablefields = (variablefields,) # infer key fields if not keyfields: # assume keyfields is fields not in variables keyfields = [f for f in flds if f not in variablefields and f != valuefield] # infer key fields if not variablefields: # assume variables are fields not in keyfields variablefields = [f for f in flds if f not in keyfields and f != valuefield] # sanity checks assert valuefield in flds, 'invalid value field: %s' % valuefield assert valuefield not in keyfields, 'value field cannot be keyfields' assert valuefield not in variablefields, \ 'value field cannot be variable field' for f in keyfields: assert f in flds, 'invalid keyfields field: %s' % f for f in variablefields: assert f in flds, 'invalid variable field: %s' % f # we'll need these later valueindex = flds.index(valuefield) keyindices = [flds.index(f) for f in keyfields] variableindices = [flds.index(f) for f in variablefields] # determine the actual variable names to be cast as fields if isinstance(variablefields, dict): # user supplied dictionary variables = variablefields else: variables = collections.defaultdict(set) # sample the data to discover variables to be cast as fields for row in itertools.islice(it, 0, samplesize): for i, f in zip(variableindices, variablefields): variables[f].add(row[i]) for f in variables: # turn from sets to sorted lists variables[f] = sorted(variables[f]) # finished the first pass # determine the output fields outhdr = list(keyfields) for f in variablefields: outhdr.extend(variables[f]) yield tuple(outhdr) # output data source = sort(source, key=keyfields) it = itertools.islice(source, 1, None) # skip header row getsortablekey = comparable_itemgetter(*keyindices) getactualkey = operator.itemgetter(*keyindices) # process sorted data in newfields groups = itertools.groupby(it, key=getsortablekey) for _, group in groups: # may need to iterate over the group more than once group = list(group) # N.B., key returned by groupby may be wrapped as SortableItem, we want # to output the actual key value, get it from the first row in the group key_value = getactualkey(group[0]) if len(keyfields) > 1: out_row = list(key_value) else: out_row = [key_value] for f, i in zip(variablefields, variableindices): for variable in 
variables[f]: # collect all values for the current variable vals = [r[valueindex] for r in group if r[i] == variable] if len(vals) == 0: val = missing elif len(vals) == 1: val = vals[0] else: if variable in reducers: redu = reducers[variable] else: redu = list # list all values val = redu(vals) out_row.append(val) yield tuple(out_row) def transpose(table): """ Transpose rows into columns. E.g.:: >>> import petl as etl >>> table1 = [['id', 'colour'], ... [1, 'blue'], ... [2, 'red'], ... [3, 'purple'], ... [5, 'yellow'], ... [7, 'orange']] >>> table2 = etl.transpose(table1) >>> table2 +----------+--------+-------+----------+----------+----------+ | id | 1 | 2 | 3 | 5 | 7 | +==========+========+=======+==========+==========+==========+ | 'colour' | 'blue' | 'red' | 'purple' | 'yellow' | 'orange' | +----------+--------+-------+----------+----------+----------+ See also :func:`petl.transform.reshape.recast`. """ return TransposeView(table) Table.transpose = transpose class TransposeView(Table): def __init__(self, source): self.source = source def __iter__(self): return itertranspose(self.source) def itertranspose(source): hdr = header(source) its = [iter(source) for _ in hdr] for i in range(len(hdr)): yield tuple(row[i] for row in its[i]) def pivot(table, f1, f2, f3, aggfun, missing=None, presorted=False, buffersize=None, tempdir=None, cache=True): """ Construct a pivot table. E.g.:: >>> import petl as etl >>> table1 = [['region', 'gender', 'style', 'units'], ... ['east', 'boy', 'tee', 12], ... ['east', 'boy', 'golf', 14], ... ['east', 'boy', 'fancy', 7], ... ['east', 'girl', 'tee', 3], ... ['east', 'girl', 'golf', 8], ... ['east', 'girl', 'fancy', 18], ... ['west', 'boy', 'tee', 12], ... ['west', 'boy', 'golf', 15], ... ['west', 'boy', 'fancy', 8], ... ['west', 'girl', 'tee', 6], ... ['west', 'girl', 'golf', 16], ... ['west', 'girl', 'fancy', 1]] >>> table2 = etl.pivot(table1, 'region', 'gender', 'units', sum) >>> table2 +--------+-----+------+ | region | boy | girl | +========+=====+======+ | 'east' | 33 | 29 | +--------+-----+------+ | 'west' | 35 | 23 | +--------+-----+------+ >>> table3 = etl.pivot(table1, 'region', 'style', 'units', sum) >>> table3 +--------+-------+------+-----+ | region | fancy | golf | tee | +========+=======+======+=====+ | 'east' | 25 | 22 | 15 | +--------+-------+------+-----+ | 'west' | 9 | 31 | 18 | +--------+-------+------+-----+ >>> table4 = etl.pivot(table1, 'gender', 'style', 'units', sum) >>> table4 +--------+-------+------+-----+ | gender | fancy | golf | tee | +========+=======+======+=====+ | 'boy' | 15 | 29 | 24 | +--------+-------+------+-----+ | 'girl' | 19 | 24 | 9 | +--------+-------+------+-----+ See also :func:`petl.transform.reshape.recast`. 
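    Any function accepting a sequence of values can be used as the
    aggregation function, and the `missing` argument provides the fill value
    for empty cells (a sketch, for illustration only, reusing ``table1``
    from the example above)::

        import petl as etl
        # count rows per cell instead of summing, filling empty cells with 0
        table5 = etl.pivot(table1, 'region', 'style', 'units', len,
                           missing=0)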
""" return PivotView(table, f1, f2, f3, aggfun, missing=missing, presorted=presorted, buffersize=buffersize, tempdir=tempdir, cache=cache) Table.pivot = pivot class PivotView(Table): def __init__(self, source, f1, f2, f3, aggfun, missing=None, presorted=False, buffersize=None, tempdir=None, cache=True): if presorted: self.source = source else: self.source = sort(source, key=(f1, f2), buffersize=buffersize, tempdir=tempdir, cache=cache) self.f1, self.f2, self.f3 = f1, f2, f3 self.aggfun = aggfun self.missing = missing def __iter__(self): return iterpivot(self.source, self.f1, self.f2, self.f3, self.aggfun, self.missing) def iterpivot(source, f1, f2, f3, aggfun, missing): # first pass - collect fields f2vals = set(itervalues(source, f2)) # TODO only make one pass f2vals = list(f2vals) f2vals.sort() outhdr = [f1] outhdr.extend(f2vals) yield tuple(outhdr) # second pass - generate output it = iter(source) try: hdr = next(it) except StopIteration: hdr = [] flds = list(map(text_type, hdr)) f1i = flds.index(f1) f2i = flds.index(f2) f3i = flds.index(f3) for v1, v1rows in itertools.groupby(it, key=operator.itemgetter(f1i)): outrow = [v1] + [missing] * len(f2vals) for v2, v12rows in itertools.groupby(v1rows, key=operator.itemgetter(f2i)): aggval = aggfun([row[f3i] for row in v12rows]) outrow[1 + f2vals.index(v2)] = aggval yield tuple(outrow) def flatten(table): """ Convert a table to a sequence of values in row-major order. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar', 'baz'], ... ['A', 1, True], ... ['C', 7, False], ... ['B', 2, False], ... ['C', 9, True]] >>> list(etl.flatten(table1)) ['A', 1, True, 'C', 7, False, 'B', 2, False, 'C', 9, True] See also :func:`petl.transform.reshape.unflatten`. """ return FlattenView(table) Table.flatten = flatten class FlattenView(Table): def __init__(self, table): self.table = table def __iter__(self): for row in data(self.table): for value in row: yield value def unflatten(*args, **kwargs): """ Convert a sequence of values in row-major order into a table. E.g.:: >>> import petl as etl >>> a = ['A', 1, True, 'C', 7, False, 'B', 2, False, 'C', 9] >>> table1 = etl.unflatten(a, 3) >>> table1 +-----+----+-------+ | f0 | f1 | f2 | +=====+====+=======+ | 'A' | 1 | True | +-----+----+-------+ | 'C' | 7 | False | +-----+----+-------+ | 'B' | 2 | False | +-----+----+-------+ | 'C' | 9 | None | +-----+----+-------+ >>> # a table and field name can also be provided as arguments ... table2 = [['lines'], ... ['A'], ... [1], ... [True], ... ['C'], ... [7], ... [False], ... ['B'], ... [2], ... [False], ... ['C'], ... [9]] >>> table3 = etl.unflatten(table2, 'lines', 3) >>> table3 +-----+----+-------+ | f0 | f1 | f2 | +=====+====+=======+ | 'A' | 1 | True | +-----+----+-------+ | 'C' | 7 | False | +-----+----+-------+ | 'B' | 2 | False | +-----+----+-------+ | 'C' | 9 | None | +-----+----+-------+ See also :func:`petl.transform.reshape.flatten`. 
""" return UnflattenView(*args, **kwargs) Table.unflatten = unflatten class UnflattenView(Table): def __init__(self, *args, **kwargs): if len(args) == 2: self.input = args[0] self.period = args[1] elif len(args) == 3: self.input = values(args[0], args[1]) self.period = args[2] else: assert False, 'invalid arguments' self.missing = kwargs.get('missing', None) def __iter__(self): inpt = self.input period = self.period missing = self.missing # generate header row outhdr = tuple('f%s' % i for i in range(period)) yield outhdr # generate data rows row = list() for v in inpt: if len(row) < period: row.append(v) else: yield tuple(row) row = [v] # deal with last row if len(row) > 0: if len(row) < period: row.extend([missing] * (period - len(row))) yield tuple(row) petl-1.7.15/petl/transform/selects.py000066400000000000000000000362001457414240700175550ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division import operator from petl.compat import next, string_types, callable, text_type from petl.comparison import Comparable from petl.errors import ArgumentError from petl.util.base import asindices, expr, Table, values, Record def select(table, *args, **kwargs): """ Select rows meeting a condition. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar', 'baz'], ... ['a', 4, 9.3], ... ['a', 2, 88.2], ... ['b', 1, 23.3], ... ['c', 8, 42.0], ... ['d', 7, 100.9], ... ['c', 2]] >>> # the second positional argument can be a function accepting ... # a row ... table2 = etl.select(table1, ... lambda rec: rec.foo == 'a' and rec.baz > 88.1) >>> table2 +-----+-----+------+ | foo | bar | baz | +=====+=====+======+ | 'a' | 2 | 88.2 | +-----+-----+------+ >>> # the second positional argument can also be an expression ... # string, which will be converted to a function using petl.expr() ... table3 = etl.select(table1, "{foo} == 'a' and {baz} > 88.1") >>> table3 +-----+-----+------+ | foo | bar | baz | +=====+=====+======+ | 'a' | 2 | 88.2 | +-----+-----+------+ >>> # the condition can also be applied to a single field ... table4 = etl.select(table1, 'foo', lambda v: v == 'a') >>> table4 +-----+-----+------+ | foo | bar | baz | +=====+=====+======+ | 'a' | 4 | 9.3 | +-----+-----+------+ | 'a' | 2 | 88.2 | +-----+-----+------+ The complement of the selection can be returned (i.e., the query can be inverted) by providing `complement=True` as a keyword argument. 
""" missing = kwargs.get('missing', None) complement = kwargs.get('complement', False) if len(args) == 0: raise ArgumentError('missing positional argument') elif len(args) == 1: where = args[0] if isinstance(where, string_types): where = expr(where) else: assert callable(where), 'second argument must be string or callable' return RowSelectView(table, where, missing=missing, complement=complement) else: field = args[0] where = args[1] assert callable(where), 'third argument must be callable' return FieldSelectView(table, field, where, complement=complement, missing=missing) Table.select = select class RowSelectView(Table): def __init__(self, source, where, missing=None, complement=False): self.source = source self.where = where self.missing = missing self.complement = complement def __iter__(self): return iterrowselect(self.source, self.where, self.missing, self.complement) class FieldSelectView(Table): def __init__(self, source, field, where, complement=False, missing=None): self.source = source self.field = field self.where = where self.complement = complement self.missing = missing def __iter__(self): return iterfieldselect(self.source, self.field, self.where, self.complement, self.missing) def iterfieldselect(source, field, where, complement, missing): it = iter(source) try: hdr = next(it) yield tuple(hdr) except StopIteration: hdr = [] # will raise FieldSelectionError below indices = asindices(hdr, field) getv = operator.itemgetter(*indices) for row in it: try: v = getv(row) except IndexError: v = missing if bool(where(v)) != complement: # XOR yield tuple(row) def iterrowselect(source, where, missing, complement): it = iter(source) try: hdr = next(it) except StopIteration: return # will yield nothing flds = list(map(text_type, hdr)) yield tuple(hdr) it = (Record(row, flds, missing=missing) for row in it) for row in it: if bool(where(row)) != complement: # XOR yield tuple(row) # need to convert back to tuple? 
def rowlenselect(table, n, complement=False): """Select rows of length `n`.""" where = lambda row: len(row) == n return select(table, where, complement=complement) Table.rowlenselect = rowlenselect def selectop(table, field, value, op, complement=False): """Select rows where the function `op` applied to the given field and the given value returns `True`.""" return select(table, field, lambda v: op(v, value), complement=complement) Table.selectop = selectop def selecteq(table, field, value, complement=False): """Select rows where the given field equals the given value.""" return selectop(table, field, value, operator.eq, complement=complement) Table.selecteq = selecteq Table.eq = selecteq def selectne(table, field, value, complement=False): """Select rows where the given field does not equal the given value.""" return selectop(table, field, value, operator.ne, complement=complement) Table.selectne = selectne Table.ne = selectne def selectlt(table, field, value, complement=False): """Select rows where the given field is less than the given value.""" value = Comparable(value) return selectop(table, field, value, operator.lt, complement=complement) Table.selectlt = selectlt Table.lt = selectlt def selectle(table, field, value, complement=False): """Select rows where the given field is less than or equal to the given value.""" value = Comparable(value) return selectop(table, field, value, operator.le, complement=complement) Table.selectle = selectle Table.le = selectle def selectgt(table, field, value, complement=False): """Select rows where the given field is greater than the given value.""" value = Comparable(value) return selectop(table, field, value, operator.gt, complement=complement) Table.selectgt = selectgt Table.gt = selectgt def selectge(table, field, value, complement=False): """Select rows where the given field is greater than or equal to the given value.""" value = Comparable(value) return selectop(table, field, value, operator.ge, complement=complement) Table.selectge = selectge Table.ge = selectge def selectcontains(table, field, value, complement=False): """Select rows where the given field contains the given value.""" return selectop(table, field, value, operator.contains, complement=complement) Table.selectcontains = selectcontains def selectin(table, field, value, complement=False): """Select rows where the given field is a member of the given value.""" return select(table, field, lambda v: v in value, complement=complement) Table.selectin = selectin def selectnotin(table, field, value, complement=False): """Select rows where the given field is not a member of the given value.""" return select(table, field, lambda v: v not in value, complement=complement) Table.selectnotin = selectnotin def selectis(table, field, value, complement=False): """Select rows where the given field `is` the given value.""" return selectop(table, field, value, operator.is_, complement=complement) Table.selectis = selectis def selectisnot(table, field, value, complement=False): """Select rows where the given field `is not` the given value.""" return selectop(table, field, value, operator.is_not, complement=complement) Table.selectisnot = selectisnot def selectisinstance(table, field, value, complement=False): """Select rows where the given field is an instance of the given type.""" return selectop(table, field, value, isinstance, complement=complement) Table.selectisinstance = selectisinstance def selectrangeopenleft(table, field, minv, maxv, complement=False): """Select rows where the given field is 
greater than or equal to `minv` and less than `maxv`.""" minv = Comparable(minv) maxv = Comparable(maxv) return select(table, field, lambda v: minv <= v < maxv, complement=complement) Table.selectrangeopenleft = selectrangeopenleft def selectrangeopenright(table, field, minv, maxv, complement=False): """Select rows where the given field is greater than `minv` and less than or equal to `maxv`.""" minv = Comparable(minv) maxv = Comparable(maxv) return select(table, field, lambda v: minv < v <= maxv, complement=complement) Table.selectrangeopenright = selectrangeopenright def selectrangeopen(table, field, minv, maxv, complement=False): """Select rows where the given field is greater than or equal to `minv` and less than or equal to `maxv`.""" minv = Comparable(minv) maxv = Comparable(maxv) return select(table, field, lambda v: minv <= v <= maxv, complement=complement) Table.selectrangeopen = selectrangeopen def selectrangeclosed(table, field, minv, maxv, complement=False): """Select rows where the given field is greater than `minv` and less than `maxv`.""" minv = Comparable(minv) maxv = Comparable(maxv) return select(table, field, lambda v: minv < Comparable(v) < maxv, complement=complement) Table.selectrangeclosed = selectrangeclosed def selecttrue(table, field, complement=False): """Select rows where the given field evaluates `True`.""" return select(table, field, lambda v: bool(v), complement=complement) Table.selecttrue = selecttrue Table.true = selecttrue def selectfalse(table, field, complement=False): """Select rows where the given field evaluates `False`.""" return select(table, field, lambda v: not bool(v), complement=complement) Table.selectfalse = selectfalse Table.false = selectfalse def selectnone(table, field, complement=False): """Select rows where the given field is `None`.""" return select(table, field, lambda v: v is None, complement=complement) Table.selectnone = selectnone Table.none = selectnone def selectnotnone(table, field, complement=False): """Select rows where the given field is not `None`.""" return select(table, field, lambda v: v is not None, complement=complement) Table.selectnotnone = selectnotnone Table.notnone = selectnotnone def selectusingcontext(table, query): """ Select rows based on data in the current row and/or previous and next row. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar'], ... ['A', 1], ... ['B', 4], ... ['C', 5], ... ['D', 9]] >>> def query(prv, cur, nxt): ... return ((prv is not None and (cur.bar - prv.bar) < 2) ... or (nxt is not None and (nxt.bar - cur.bar) < 2)) ... >>> table2 = etl.selectusingcontext(table1, query) >>> table2 +-----+-----+ | foo | bar | +=====+=====+ | 'B' | 4 | +-----+-----+ | 'C' | 5 | +-----+-----+ The `query` function should accept three rows and return a boolean value. """ return SelectUsingContextView(table, query) Table.selectusingcontext = selectusingcontext class SelectUsingContextView(Table): def __init__(self, table, query): self.table = table self.query = query def __iter__(self): return iterselectusingcontext(self.table, self.query) def iterselectusingcontext(table, query): it = iter(table) try: hdr = tuple(next(it)) except StopIteration: return # will yield nothing flds = list(map(text_type, hdr)) yield hdr it = (Record(row, flds) for row in it) prv = None cur = next(it) for nxt in it: if query(prv, cur, nxt): yield cur prv = cur cur = nxt # handle last row if query(prv, cur, None): yield cur def facet(table, key): """ Return a dictionary mapping field values to tables. 
E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar', 'baz'], ... ['a', 4, 9.3], ... ['a', 2, 88.2], ... ['b', 1, 23.3], ... ['c', 8, 42.0], ... ['d', 7, 100.9], ... ['c', 2]] >>> foo = etl.facet(table1, 'foo') >>> sorted(foo.keys()) ['a', 'b', 'c', 'd'] >>> foo['a'] +-----+-----+------+ | foo | bar | baz | +=====+=====+======+ | 'a' | 4 | 9.3 | +-----+-----+------+ | 'a' | 2 | 88.2 | +-----+-----+------+ >>> foo['c'] +-----+-----+------+ | foo | bar | baz | +=====+=====+======+ | 'c' | 8 | 42.0 | +-----+-----+------+ | 'c' | 2 | | +-----+-----+------+ >>> # works with compound keys too >>> table2 = [['foo', 'bar', 'baz'], ... ['a', 1, True], ... ['b', 2, False], ... ['b', 3, True], ... ['b', 3, False]] >>> foobar = etl.facet(table2, ('foo', 'bar')) >>> sorted(foobar.keys()) [('a', 1), ('b', 2), ('b', 3)] >>> foobar[('b', 3)] +-----+-----+-------+ | foo | bar | baz | +=====+=====+=======+ | 'b' | 3 | True | +-----+-----+-------+ | 'b' | 3 | False | +-----+-----+-------+ See also :func:`petl.util.materialise.facetcolumns`. """ fct = dict() for v in set(values(table, key)): fct[v] = selecteq(table, key, v) return fct Table.facet = facet def biselect(table, *args, **kwargs): """Return two tables, the first containing selected rows, the second containing remaining rows. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar', 'baz'], ... ['a', 4, 9.3], ... ['a', 2, 88.2], ... ['b', 1, 23.3], ... ['c', 8, 42.0], ... ['d', 7, 100.9], ... ['c', 2]] >>> table2, table3 = etl.biselect(table1, lambda rec: rec.foo == 'a') >>> table2 +-----+-----+------+ | foo | bar | baz | +=====+=====+======+ | 'a' | 4 | 9.3 | +-----+-----+------+ | 'a' | 2 | 88.2 | +-----+-----+------+ >>> table3 +-----+-----+-------+ | foo | bar | baz | +=====+=====+=======+ | 'b' | 1 | 23.3 | +-----+-----+-------+ | 'c' | 8 | 42.0 | +-----+-----+-------+ | 'd' | 7 | 100.9 | +-----+-----+-------+ | 'c' | 2 | | +-----+-----+-------+ .. versionadded:: 1.1.0 """ # override complement kwarg kwargs['complement'] = False t1 = select(table, *args, **kwargs) kwargs['complement'] = True t2 = select(table, *args, **kwargs) return t1, t2 Table.biselect = biselect petl-1.7.15/petl/transform/setops.py000066400000000000000000000363261457414240700174410ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division from collections import Counter from petl.compat import next from petl.comparison import Comparable from petl.util.base import header, Table from petl.transform.sorts import sort from petl.transform.basics import cut def complement(a, b, presorted=False, buffersize=None, tempdir=None, cache=True, strict=False): """ Return rows in `a` that are not in `b`. E.g.:: >>> import petl as etl >>> a = [['foo', 'bar', 'baz'], ... ['A', 1, True], ... ['C', 7, False], ... ['B', 2, False], ... ['C', 9, True]] >>> b = [['x', 'y', 'z'], ... ['B', 2, False], ... ['A', 9, False], ... ['B', 3, True], ... ['C', 9, True]] >>> aminusb = etl.complement(a, b) >>> aminusb +-----+-----+-------+ | foo | bar | baz | +=====+=====+=======+ | 'A' | 1 | True | +-----+-----+-------+ | 'C' | 7 | False | +-----+-----+-------+ >>> bminusa = etl.complement(b, a) >>> bminusa +-----+---+-------+ | x | y | z | +=====+===+=======+ | 'A' | 9 | False | +-----+---+-------+ | 'B' | 3 | True | +-----+---+-------+ Note that the field names of each table are ignored - rows are simply compared following a lexical sort. See also the :func:`petl.transform.setops.recordcomplement` function. 
If `presorted` is True, it is assumed that the data are already sorted by the given key, and the `buffersize`, `tempdir` and `cache` arguments are ignored. Otherwise, the data are sorted, see also the discussion of the `buffersize`, `tempdir` and `cache` arguments under the :func:`petl.transform.sorts.sort` function. Note that the default behaviour is not strictly set-like, because duplicate rows are counted separately, e.g.:: >>> a = [['foo', 'bar'], ... ['A', 1], ... ['B', 2], ... ['B', 2], ... ['C', 7]] >>> b = [['foo', 'bar'], ... ['B', 2]] >>> aminusb = etl.complement(a, b) >>> aminusb +-----+-----+ | foo | bar | +=====+=====+ | 'A' | 1 | +-----+-----+ | 'B' | 2 | +-----+-----+ | 'C' | 7 | +-----+-----+ This behaviour can be changed with the `strict` keyword argument, e.g.:: >>> aminusb = etl.complement(a, b, strict=True) >>> aminusb +-----+-----+ | foo | bar | +=====+=====+ | 'A' | 1 | +-----+-----+ | 'C' | 7 | +-----+-----+ .. versionchanged:: 1.1.0 If `strict` is `True` then strict set-like behaviour is used, i.e., only rows in `a` not found in `b` are returned. """ return ComplementView(a, b, presorted=presorted, buffersize=buffersize, tempdir=tempdir, cache=cache, strict=strict) Table.complement = complement class ComplementView(Table): def __init__(self, a, b, presorted=False, buffersize=None, tempdir=None, cache=True, strict=False): if presorted: self.a = a self.b = b else: self.a = sort(a, buffersize=buffersize, tempdir=tempdir, cache=cache) self.b = sort(b, buffersize=buffersize, tempdir=tempdir, cache=cache) self.strict = strict def __iter__(self): return itercomplement(self.a, self.b, self.strict) def itercomplement(ta, tb, strict): # coerce rows to tuples to ensure hashable and comparable ita = (tuple(row) for row in iter(ta)) itb = (tuple(row) for row in iter(tb)) ahdr = tuple(next(ita)) next(itb) # ignore b fields yield ahdr try: a = next(ita) except StopIteration: pass else: try: b = next(itb) except StopIteration: yield a for row in ita: yield row else: # we want the elements in a that are not in b while True: if b is None or Comparable(a) < Comparable(b): yield a try: a = next(ita) except StopIteration: break elif a == b: try: a = next(ita) except StopIteration: break if not strict: try: b = next(itb) except StopIteration: b = None else: try: b = next(itb) except StopIteration: b = None def recordcomplement(a, b, buffersize=None, tempdir=None, cache=True, strict=False): """ Find records in `a` that are not in `b`. E.g.:: >>> import petl as etl >>> a = [['foo', 'bar', 'baz'], ... ['A', 1, True], ... ['C', 7, False], ... ['B', 2, False], ... ['C', 9, True]] >>> b = [['bar', 'foo', 'baz'], ... [2, 'B', False], ... [9, 'A', False], ... [3, 'B', True], ... [9, 'C', True]] >>> aminusb = etl.recordcomplement(a, b) >>> aminusb +-----+-----+-------+ | foo | bar | baz | +=====+=====+=======+ | 'A' | 1 | True | +-----+-----+-------+ | 'C' | 7 | False | +-----+-----+-------+ >>> bminusa = etl.recordcomplement(b, a) >>> bminusa +-----+-----+-------+ | bar | foo | baz | +=====+=====+=======+ | 3 | 'B' | True | +-----+-----+-------+ | 9 | 'A' | False | +-----+-----+-------+ Note that both tables must have the same set of fields, but that the order of the fields does not matter. See also the :func:`petl.transform.setops.complement` function. See also the discussion of the `buffersize`, `tempdir` and `cache` arguments under the :func:`petl.transform.sorts.sort` function. """ # TODO possible with only one pass? 
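    # Editorial sketch of the approach below: align the fields of `b` to the
    # order of `a`'s header using ``cut``, so that plain row comparison in
    # ``complement`` is meaningful; e.g. ``cut([['y', 'x'], [2, 1]], 'x', 'y')``
    # yields ``[('x', 'y'), (1, 2)]``.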
ha = header(a) hb = header(b) assert set(ha) == set(hb), 'both tables must have the same set of fields' # make sure fields are in the same order bv = cut(b, *ha) return complement(a, bv, buffersize=buffersize, tempdir=tempdir, cache=cache, strict=strict) Table.recordcomplement = recordcomplement def diff(a, b, presorted=False, buffersize=None, tempdir=None, cache=True, strict=False): """ Find the difference between rows in two tables. Returns a pair of tables. E.g.:: >>> import petl as etl >>> a = [['foo', 'bar', 'baz'], ... ['A', 1, True], ... ['C', 7, False], ... ['B', 2, False], ... ['C', 9, True]] >>> b = [['x', 'y', 'z'], ... ['B', 2, False], ... ['A', 9, False], ... ['B', 3, True], ... ['C', 9, True]] >>> added, subtracted = etl.diff(a, b) >>> # rows in b not in a ... added +-----+---+-------+ | x | y | z | +=====+===+=======+ | 'A' | 9 | False | +-----+---+-------+ | 'B' | 3 | True | +-----+---+-------+ >>> # rows in a not in b ... subtracted +-----+-----+-------+ | foo | bar | baz | +=====+=====+=======+ | 'A' | 1 | True | +-----+-----+-------+ | 'C' | 7 | False | +-----+-----+-------+ Convenient shorthand for ``(complement(b, a), complement(a, b))``. See also :func:`petl.transform.setops.complement`. If `presorted` is True, it is assumed that the data are already sorted by the given key, and the `buffersize`, `tempdir` and `cache` arguments are ignored. Otherwise, the data are sorted, see also the discussion of the `buffersize`, `tempdir` and `cache` arguments under the :func:`petl.transform.sorts.sort` function. .. versionchanged:: 1.1.0 If `strict` is `True` then strict set-like behaviour is used. """ if not presorted: a = sort(a) b = sort(b) added = complement(b, a, presorted=True, buffersize=buffersize, tempdir=tempdir, cache=cache, strict=strict) subtracted = complement(a, b, presorted=True, buffersize=buffersize, tempdir=tempdir, cache=cache, strict=strict) return added, subtracted Table.diff = diff def recorddiff(a, b, buffersize=None, tempdir=None, cache=True, strict=False): """ Find the difference between records in two tables. E.g.:: >>> import petl as etl >>> a = [['foo', 'bar', 'baz'], ... ['A', 1, True], ... ['C', 7, False], ... ['B', 2, False], ... ['C', 9, True]] >>> b = [['bar', 'foo', 'baz'], ... [2, 'B', False], ... [9, 'A', False], ... [3, 'B', True], ... [9, 'C', True]] >>> added, subtracted = etl.recorddiff(a, b) >>> added +-----+-----+-------+ | bar | foo | baz | +=====+=====+=======+ | 3 | 'B' | True | +-----+-----+-------+ | 9 | 'A' | False | +-----+-----+-------+ >>> subtracted +-----+-----+-------+ | foo | bar | baz | +=====+=====+=======+ | 'A' | 1 | True | +-----+-----+-------+ | 'C' | 7 | False | +-----+-----+-------+ Convenient shorthand for ``(recordcomplement(b, a), recordcomplement(a, b))``. See also :func:`petl.transform.setops.recordcomplement`. See also the discussion of the `buffersize`, `tempdir` and `cache` arguments under the :func:`petl.transform.sorts.sort` function. .. versionchanged:: 1.1.0 If `strict` is `True` then strict set-like behaviour is used. """ added = recordcomplement(b, a, buffersize=buffersize, tempdir=tempdir, cache=cache, strict=strict) subtracted = recordcomplement(a, b, buffersize=buffersize, tempdir=tempdir, cache=cache, strict=strict) return added, subtracted Table.recorddiff = recorddiff def intersection(a, b, presorted=False, buffersize=None, tempdir=None, cache=True): """ Return rows in `a` that are also in `b`. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar', 'baz'], ... ['A', 1, True], ... 
['C', 7, False], ... ['B', 2, False], ... ['C', 9, True]] >>> table2 = [['x', 'y', 'z'], ... ['B', 2, False], ... ['A', 9, False], ... ['B', 3, True], ... ['C', 9, True]] >>> table3 = etl.intersection(table1, table2) >>> table3 +-----+-----+-------+ | foo | bar | baz | +=====+=====+=======+ | 'B' | 2 | False | +-----+-----+-------+ | 'C' | 9 | True | +-----+-----+-------+ If `presorted` is True, it is assumed that the data are already sorted by the given key, and the `buffersize`, `tempdir` and `cache` arguments are ignored. Otherwise, the data are sorted, see also the discussion of the `buffersize`, `tempdir` and `cache` arguments under the :func:`petl.transform.sorts.sort` function. """ return IntersectionView(a, b, presorted=presorted, buffersize=buffersize, tempdir=tempdir, cache=cache) Table.intersection = intersection class IntersectionView(Table): def __init__(self, a, b, presorted=False, buffersize=None, tempdir=None, cache=True): if presorted: self.a = a self.b = b else: self.a = sort(a, buffersize=buffersize, tempdir=tempdir, cache=cache) self.b = sort(b, buffersize=buffersize, tempdir=tempdir, cache=cache) def __iter__(self): return iterintersection(self.a, self.b) def iterintersection(a, b): ita = iter(a) itb = iter(b) ahdr = next(ita) next(itb) # ignore b header yield tuple(ahdr) try: a = tuple(next(ita)) b = tuple(next(itb)) while True: if Comparable(a) < Comparable(b): a = tuple(next(ita)) elif a == b: yield a a = tuple(next(ita)) b = tuple(next(itb)) else: b = tuple(next(itb)) except StopIteration: pass def hashcomplement(a, b, strict=False): """ Alternative implementation of :func:`petl.transform.setops.complement`, where the complement is executed by constructing an in-memory set for all rows found in the right hand table, then iterating over rows from the left hand table. May be faster and/or more resource efficient where the right table is small and the left table is large. .. versionchanged:: 1.1.0 If `strict` is `True` then strict set-like behaviour is used, i.e., only rows in `a` not found in `b` are returned. """ return HashComplementView(a, b, strict=strict) Table.hashcomplement = hashcomplement class HashComplementView(Table): def __init__(self, a, b, strict=False): self.a = a self.b = b self.strict = strict def __iter__(self): return iterhashcomplement(self.a, self.b, self.strict) def iterhashcomplement(a, b, strict): ita = iter(a) ahdr = next(ita) yield tuple(ahdr) itb = iter(b) next(itb) # discard b header, assume same as a # N.B., need to account for possibility of duplicate rows bcnt = Counter(tuple(row) for row in itb) for ar in ita: t = tuple(ar) if bcnt[t] > 0: if not strict: bcnt[t] -= 1 else: yield t def hashintersection(a, b): """ Alternative implementation of :func:`petl.transform.setops.intersection`, where the intersection is executed by constructing an in-memory set for all rows found in the right hand table, then iterating over rows from the left hand table. May be faster and/or more resource efficient where the right table is small and the left table is large. 
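    E.g., an illustrative sketch with small in-memory tables (the names `a`
    and `b` here are examples only)::

        >>> import petl as etl
        >>> a = [['foo', 'bar'], ['A', 1], ['B', 2], ['C', 7]]
        >>> b = [['foo', 'bar'], ['B', 2], ['D', 9]]
        >>> etl.hashintersection(a, b)
        +-----+-----+
        | foo | bar |
        +=====+=====+
        | 'B' | 2   |
        +-----+-----+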
""" return HashIntersectionView(a, b) Table.hashintersection = hashintersection class HashIntersectionView(Table): def __init__(self, a, b): self.a = a self.b = b def __iter__(self): return iterhashintersection(self.a, self.b) def iterhashintersection(a, b): ita = iter(a) ahdr = next(ita) yield tuple(ahdr) itb = iter(b) next(itb) # discard b header, assume same as a # N.B., need to account for possibility of duplicate rows bcnt = Counter(tuple(row) for row in itb) for ar in ita: t = tuple(ar) if bcnt[t] > 0: yield t bcnt[t] -= 1 petl-1.7.15/petl/transform/sorts.py000077500000000000000000000443021457414240700172720ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division import os import heapq from tempfile import NamedTemporaryFile import itertools import logging from collections import namedtuple import operator from petl.compat import pickle, next, text_type import petl.config as config from petl.comparison import comparable_itemgetter from petl.util.base import Table, asindices logger = logging.getLogger(__name__) warning = logger.warning info = logger.info debug = logger.debug def sort(table, key=None, reverse=False, buffersize=None, tempdir=None, cache=True): """ Sort the table. Field names or indices (from zero) can be used to specify the key. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar'], ... ['C', 2], ... ['A', 9], ... ['A', 6], ... ['F', 1], ... ['D', 10]] >>> table2 = etl.sort(table1, 'foo') >>> table2 +-----+-----+ | foo | bar | +=====+=====+ | 'A' | 9 | +-----+-----+ | 'A' | 6 | +-----+-----+ | 'C' | 2 | +-----+-----+ | 'D' | 10 | +-----+-----+ | 'F' | 1 | +-----+-----+ >>> # sorting by compound key is supported ... table3 = etl.sort(table1, key=['foo', 'bar']) >>> table3 +-----+-----+ | foo | bar | +=====+=====+ | 'A' | 6 | +-----+-----+ | 'A' | 9 | +-----+-----+ | 'C' | 2 | +-----+-----+ | 'D' | 10 | +-----+-----+ | 'F' | 1 | +-----+-----+ >>> # if no key is specified, the default is a lexical sort ... table4 = etl.sort(table1) >>> table4 +-----+-----+ | foo | bar | +=====+=====+ | 'A' | 6 | +-----+-----+ | 'A' | 9 | +-----+-----+ | 'C' | 2 | +-----+-----+ | 'D' | 10 | +-----+-----+ | 'F' | 1 | +-----+-----+ The `buffersize` argument should be an `int` or `None`. If the number of rows in the table is less than `buffersize`, the table will be sorted in memory. Otherwise, the table is sorted in chunks of no more than `buffersize` rows, each chunk is written to a temporary file, and then a merge sort is performed on the temporary files. If `buffersize` is `None`, the value of `petl.config.sort_buffersize` will be used. By default this is set to 100000 rows, but can be changed, e.g.:: >>> import petl.config >>> petl.config.sort_buffersize = 500000 If `petl.config.sort_buffersize` is set to `None`, this forces all sorting to be done entirely in memory. By default the results of the sort will be cached, and so a second pass over the sorted table will yield rows from the cache and will not repeat the sort operation. To turn off caching, set the `cache` argument to `False`. 
""" return SortView(table, key=key, reverse=reverse, buffersize=buffersize, tempdir=tempdir, cache=cache) Table.sort = sort def _iterchunk(fn): # reopen so iterators from file cache are independent debug('iterchunk, opening %s' % fn) with open(fn, 'rb') as f: try: while True: yield pickle.load(f) except EOFError: pass debug('end of iterchunk, closed %s' % fn) class _Keyed(namedtuple('Keyed', ['key', 'obj'])): # Override default behavior of namedtuple comparisons, only keys need to be compared for heapmerge def __eq__(self, other): return self.key == other.key def __lt__(self, other): return self.key < other.key def __le__(self, other): return self.key <= other.key def __ne__(self, other): return self.key != other.key def __gt__(self, other): return self.key > other.key def __ge__(self, other): return self.key >= other.key def _heapqmergesorted(key=None, *iterables): """Return a single iterator over the given iterables, sorted by the given `key` function, assuming the input iterables are already sorted by the same function. (I.e., the merge part of a general merge sort.) Uses :func:`heapq.merge` for the underlying implementation.""" if key is None: keyed_iterables = iterables for element in heapq.merge(*keyed_iterables): yield element else: keyed_iterables = [(_Keyed(key(obj), obj) for obj in iterable) for iterable in iterables] for element in heapq.merge(*keyed_iterables): yield element.obj def _shortlistmergesorted(key=None, reverse=False, *iterables): """Return a single iterator over the given iterables, sorted by the given `key` function, assuming the input iterables are already sorted by the same function. (I.e., the merge part of a general merge sort.) Uses :func:`min` (or :func:`max` if ``reverse=True``) for the underlying implementation.""" if reverse: op = max else: op = min if key is not None: opkwargs = {'key': key} else: opkwargs = dict() # populate initial shortlist # (remember some iterables might be empty) iterators = list() shortlist = list() for iterable in iterables: it = iter(iterable) try: first = next(it) iterators.append(it) shortlist.append(first) except StopIteration: pass # do the mergesort while iterators: nxt = op(shortlist, **opkwargs) yield nxt nextidx = shortlist.index(nxt) try: shortlist[nextidx] = next(iterators[nextidx]) except StopIteration: del shortlist[nextidx] del iterators[nextidx] def _mergesorted(key=None, reverse=False, *iterables): # N.B., I've used heapq for normal merge sort and shortlist merge sort for # reverse merge sort because I've assumed that heapq.merge is faster and # so is preferable but it doesn't support reverse sorting so the shortlist # merge sort has to be used for reverse sorting. 
Some casual profiling # suggests there isn't much between the two in terms of speed, but might be # worth profiling more carefully if reverse: return _shortlistmergesorted(key, True, *iterables) else: return _heapqmergesorted(key, *iterables) class SortView(Table): def __init__(self, source, key=None, reverse=False, buffersize=None, tempdir=None, cache=True): self.source = source self.key = key self.reverse = reverse if buffersize is None: self.buffersize = config.sort_buffersize else: self.buffersize = buffersize self.tempdir = tempdir self.cache = cache self._hdrcache = None self._memcache = None self._filecache = None self._getkey = None def clearcache(self): debug('clear cache') self._hdrcache = None self._memcache = None self._filecache = None self._getkey = None def __iter__(self): source = self.source key = self.key reverse = self.reverse if self.cache and self._memcache is not None: return self._iterfrommemcache() elif self.cache and self._filecache is not None: return self._iterfromfilecache() else: return self._iternocache(source, key, reverse) def _iterfrommemcache(self): debug('iterate from memory cache') yield tuple(self._hdrcache) for row in self._memcache: yield tuple(row) def _iterfromfilecache(self): # create a reference to the filecache here, so cleanup happens in the # correct order filecache = self._filecache filenames = list(map(operator.attrgetter('name'), filecache)) debug('iterate from file cache: %r', filenames) yield tuple(self._hdrcache) chunkiters = [_iterchunk(fn) for fn in filenames] rows = _mergesorted(self._getkey, self.reverse, *chunkiters) try: for row in rows: yield tuple(row) finally: debug('attempt cleanup from generator') # N.B., need to ensure that any open files are closed **before** # temporary files are deleted, as deletion will fail on Windows # if file is in use (i.e., still open) del chunkiters del rows del filecache debug('exiting generator') def _iternocache(self, source, key, reverse): debug('iterate without cache') self.clearcache() it = iter(source) try: hdr = next(it) except StopIteration: if key is None: return # nothing to do on a table without headers hdr = [] yield tuple(hdr) if key is not None: # convert field selection into field indices indices = asindices(hdr, key) else: indices = range(len(hdr)) # now use field indices to construct a _getkey function getkey = comparable_itemgetter(*indices) # TODO support native comparison # initialise the first chunk rows = list(itertools.islice(it, 0, self.buffersize)) rows.sort(key=getkey, reverse=reverse) # have we exhausted the source iterator? if self.buffersize is None or len(rows) < self.buffersize: # yes, table fits within sort buffer if self.cache: debug('caching mem') self._hdrcache = hdr self._memcache = rows # actually not needed to iterate from memcache self._getkey = getkey for row in rows: yield tuple(row) else: # no, table is too big, need to sort in chunks chunkfiles = [] while rows: # dump the chunk with NamedTemporaryFile(dir=self.tempdir, delete=False, mode='wb') as f: # N.B., we **don't** want the file to be deleted on close, # but we **do** want the file to be deleted when self # is garbage collected, or when the program exits. When # all references to the wrapper are gone, the file should # get deleted. 
wrapper = _NamedTempFileDeleteOnGC(f.name) debug('created temporary chunk file %s' % f.name) for row in rows: pickle.dump(row, f, protocol=-1) f.flush() chunkfiles.append(wrapper) # grab the next chunk rows = list(itertools.islice(it, 0, self.buffersize)) rows.sort(key=getkey, reverse=reverse) if self.cache: debug('caching files') self._hdrcache = hdr self._filecache = chunkfiles self._getkey = getkey chunkiters = [_iterchunk(f.name) for f in chunkfiles] for row in _mergesorted(getkey, reverse, *chunkiters): yield tuple(row) class _NamedTempFileDeleteOnGC(object): def __init__(self, name): self.name = name def delete(self, unlink=os.unlink, log=logger.debug): name = self.name try: log('deleting %s' % name) unlink(name) except Exception as e: log('exception deleting %s: %s' % (name, e)) raise else: log('deleted %s' % name) def __del__(self): self.delete() def __str__(self): return self.name def __repr__(self): return self.name def mergesort(*tables, **kwargs): """ Combine multiple input tables into one sorted output table. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar'], ... ['A', 9], ... ['C', 2], ... ['D', 10], ... ['A', 6], ... ['F', 1]] >>> table2 = [['foo', 'bar'], ... ['B', 3], ... ['D', 10], ... ['A', 10], ... ['F', 4]] >>> table3 = etl.mergesort(table1, table2, key='foo') >>> table3.lookall() +-----+-----+ | foo | bar | +=====+=====+ | 'A' | 9 | +-----+-----+ | 'A' | 6 | +-----+-----+ | 'A' | 10 | +-----+-----+ | 'B' | 3 | +-----+-----+ | 'C' | 2 | +-----+-----+ | 'D' | 10 | +-----+-----+ | 'D' | 10 | +-----+-----+ | 'F' | 1 | +-----+-----+ | 'F' | 4 | +-----+-----+ If the input tables are already sorted by the given key, give ``presorted=True`` as a keyword argument. This function is equivalent to concatenating the input tables using :func:`cat` then sorting, however this function will typically be more efficient, especially if the input tables are presorted. 
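    For example, an illustrative sketch continuing the example above, where
    both inputs have already been sorted by the key (``t1``, ``t2`` and
    ``table4`` are example names only)::

        >>> t1 = etl.sort(table1, key='foo')
        >>> t2 = etl.sort(table2, key='foo')
        >>> table4 = etl.mergesort(t1, t2, key='foo', presorted=True)
        >>> etl.issorted(table4, key='foo')
        True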
Keyword arguments: key : string or tuple of strings, optional Field name or tuple of fields to sort by (defaults to `None` lexical sort) reverse : bool, optional `True` if sort in reverse (descending) order (defaults to `False`) presorted : bool, optional `True` if inputs are already sorted by the given key (defaults to `False`) missing : object Value to fill with when input tables have different fields (defaults to `None`) header : sequence of strings, optional Specify a fixed header for the output table buffersize : int, optional Limit the number of rows in memory per input table when inputs are not presorted """ return MergeSortView(tables, **kwargs) Table.mergesort = mergesort class MergeSortView(Table): def __init__(self, tables, key=None, reverse=False, presorted=False, missing=None, header=None, buffersize=None, tempdir=None, cache=True): self.key = key if presorted: self.tables = tables else: self.tables = [sort(t, key=key, reverse=reverse, buffersize=buffersize, tempdir=tempdir, cache=cache) for t in tables] self.missing = missing self.header = header self.reverse = reverse def __iter__(self): return itermergesort(self.tables, self.key, self.header, self.missing, self.reverse) def itermergesort(sources, key, header, missing, reverse): # first need to standardise headers of all input tables # borrow this from itercat - TODO remove code smells its = [iter(t) for t in sources] src_hdrs = [] for it in its: try: src_hdrs.append(next(it)) except StopIteration: src_hdrs.append([]) if header is None: # determine output fields by gathering all fields found in the sources outhdr = list() for hdr in src_hdrs: for f in list(map(text_type, hdr)): if f not in outhdr: # add any new fields as we find them outhdr.append(f) else: # predetermined output fields outhdr = header yield tuple(outhdr) def _standardisedata(it, hdr, ofs): flds = list(map(text_type, hdr)) # now construct and yield the data rows for _row in it: try: # should be quickest to do this way yield tuple(_row[flds.index(fo)] if fo in flds else missing for fo in ofs) except IndexError: # handle short rows outrow = [missing] * len(ofs) for i, fi in enumerate(flds): try: outrow[ofs.index(fi)] = _row[i] except IndexError: pass # be relaxed about short rows yield tuple(outrow) # wrap all iterators to standardise fields sits = [_standardisedata(it, hdr, outhdr) for hdr, it in zip(src_hdrs, its)] # now determine key function getkey = None if key is not None: # convert field selection into field indices indices = asindices(outhdr, key) # now use field indices to construct a _getkey function # N.B., this will probably raise an exception on short rows getkey = comparable_itemgetter(*indices) # OK, do the merge sort for row in _shortlistmergesorted(getkey, reverse, *sits): yield row def issorted(table, key=None, reverse=False, strict=False): """ Return True if the table is ordered (i.e., sorted) by the given key. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar', 'baz'], ... ['a', 1, True], ... ['b', 3, True], ... 
['b', 2]] >>> etl.issorted(table1, key='foo') True >>> etl.issorted(table1, key='bar') False >>> etl.issorted(table1, key='foo', strict=True) False >>> etl.issorted(table1, key='foo', reverse=True) False """ # determine the operator to use when comparing rows if reverse and strict: op = operator.lt elif reverse and not strict: op = operator.le elif strict: op = operator.gt else: op = operator.ge it = iter(table) try: flds = [text_type(f) for f in next(it)] except StopIteration: flds = [] if key is None: prev = next(it) for curr in it: if not op(curr, prev): return False prev = curr else: getkey = comparable_itemgetter(*asindices(flds, key)) prev = next(it) prevkey = getkey(prev) for curr in it: currkey = getkey(curr) if not op(currkey, prevkey): return False prevkey = currkey return True Table.issorted = issorted petl-1.7.15/petl/transform/unpacks.py000066400000000000000000000137731457414240700175710ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division import itertools from petl.compat import next, text_type from petl.errors import ArgumentError from petl.util.base import Table def unpack(table, field, newfields=None, include_original=False, missing=None): """ Unpack data values that are lists or tuples. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar'], ... [1, ['a', 'b']], ... [2, ['c', 'd']], ... [3, ['e', 'f']]] >>> table2 = etl.unpack(table1, 'bar', ['baz', 'quux']) >>> table2 +-----+-----+------+ | foo | baz | quux | +=====+=====+======+ | 1 | 'a' | 'b' | +-----+-----+------+ | 2 | 'c' | 'd' | +-----+-----+------+ | 3 | 'e' | 'f' | +-----+-----+------+ This function will attempt to unpack exactly the number of values as given by the number of new fields specified. If there are more values than new fields, remaining values will not be unpacked. If there are less values than new fields, `missing` values will be added. See also :func:`petl.transform.unpacks.unpackdict`. 
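    E.g., an illustrative sketch where a value is too short and `missing`
    padding is applied (``table3`` is just an example name)::

        >>> table3 = etl.unpack([['foo', 'bar'], [1, ['a']]], 'bar',
        ...                     ['baz', 'quux'], missing='-')
        >>> table3
        +-----+-----+------+
        | foo | baz | quux |
        +=====+=====+======+
        | 1   | 'a' | '-'  |
        +-----+-----+------+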
""" return UnpackView(table, field, newfields=newfields, include_original=include_original, missing=missing) Table.unpack = unpack class UnpackView(Table): def __init__(self, source, field, newfields=None, include_original=False, missing=None): self.source = source self.field = field self.newfields = newfields self.include_original = include_original self.missing = missing def __iter__(self): return iterunpack(self.source, self.field, self.newfields, self.include_original, self.missing) def iterunpack(source, field, newfields, include_original, missing): it = iter(source) try: hdr = next(it) except StopIteration: hdr = [] flds = list(map(text_type, hdr)) if field in flds: field_index = flds.index(field) elif isinstance(field, int) and field < len(flds): field_index = field field = flds[field_index] else: raise ArgumentError('field invalid: must be either field name or index') # determine output fields outhdr = list(flds) if not include_original: outhdr.remove(field) if isinstance(newfields, (list, tuple)): outhdr.extend(newfields) nunpack = len(newfields) elif isinstance(newfields, int): nunpack = newfields newfields = [text_type(field) + text_type(i+1) for i in range(newfields)] outhdr.extend(newfields) elif newfields is None: nunpack = 0 else: raise ArgumentError('newfields argument must be list or tuple of field ' 'names, or int (number of values to unpack)') yield tuple(outhdr) # construct the output data for row in it: value = row[field_index] if include_original: out_row = list(row) else: out_row = [v for i, v in enumerate(row) if i != field_index] nvals = len(value) if nunpack > 0: if nvals >= nunpack: newvals = value[:nunpack] else: newvals = list(value) + ([missing] * (nunpack - nvals)) out_row.extend(newvals) yield tuple(out_row) def unpackdict(table, field, keys=None, includeoriginal=False, samplesize=1000, missing=None): """ Unpack dictionary values into separate fields. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar'], ... [1, {'baz': 'a', 'quux': 'b'}], ... [2, {'baz': 'c', 'quux': 'd'}], ... [3, {'baz': 'e', 'quux': 'f'}]] >>> table2 = etl.unpackdict(table1, 'bar') >>> table2 +-----+-----+------+ | foo | baz | quux | +=====+=====+======+ | 1 | 'a' | 'b' | +-----+-----+------+ | 2 | 'c' | 'd' | +-----+-----+------+ | 3 | 'e' | 'f' | +-----+-----+------+ See also :func:`petl.transform.unpacks.unpack`. """ return UnpackDictView(table, field, keys=keys, includeoriginal=includeoriginal, samplesize=samplesize, missing=missing) Table.unpackdict = unpackdict class UnpackDictView(Table): def __init__(self, table, field, keys=None, includeoriginal=False, samplesize=1000, missing=None): self.table = table self.field = field self.keys = keys self.includeoriginal = includeoriginal self.samplesize = samplesize self.missing = missing def __iter__(self): return iterunpackdict(self.table, self.field, self.keys, self.includeoriginal, self.samplesize, self.missing) def iterunpackdict(table, field, keys, includeoriginal, samplesize, missing): # set up it = iter(table) try: hdr = next(it) except StopIteration: hdr = [] flds = list(map(text_type, hdr)) fidx = flds.index(field) outhdr = list(flds) if not includeoriginal: del outhdr[fidx] # are keys specified? 
if not keys: # need to sample to find keys sample = list(itertools.islice(it, samplesize)) keys = set() for row in sample: try: keys |= set(row[fidx].keys()) except AttributeError: pass it = itertools.chain(sample, it) keys = sorted(keys) outhdr.extend(keys) yield tuple(outhdr) # generate the data rows for row in it: outrow = list(row) if not includeoriginal: del outrow[fidx] for key in keys: try: outrow.append(row[fidx][key]) except (IndexError, KeyError, TypeError): outrow.append(missing) yield tuple(outrow) petl-1.7.15/petl/transform/validation.py000066400000000000000000000161501457414240700202470ustar00rootroot00000000000000# -*- coding: utf-8 -*- from __future__ import absolute_import, print_function, division import operator from petl.compat import text_type from petl.util.base import Table, asindices, Record def validate(table, constraints=None, header=None): """ Validate a `table` against a set of `constraints` and/or an expected `header`, e.g.:: >>> import petl as etl >>> # define some validation constraints ... header = ('foo', 'bar', 'baz') >>> constraints = [ ... dict(name='foo_int', field='foo', test=int), ... dict(name='bar_date', field='bar', test=etl.dateparser('%Y-%m-%d')), ... dict(name='baz_enum', field='baz', assertion=lambda v: v in ['Y', 'N']), ... dict(name='not_none', assertion=lambda row: None not in row), ... dict(name='qux_int', field='qux', test=int, optional=True), ... ] >>> # now validate a table ... table = (('foo', 'bar', 'bazzz'), ... (1, '2000-01-01', 'Y'), ... ('x', '2010-10-10', 'N'), ... (2, '2000/01/01', 'Y'), ... (3, '2015-12-12', 'x'), ... (4, None, 'N'), ... ('y', '1999-99-99', 'z'), ... (6, '2000-01-01'), ... (7, '2001-02-02', 'N', True)) >>> problems = etl.validate(table, constraints=constraints, header=header) >>> problems.lookall() +--------------+-----+-------+--------------+------------------+ | name | row | field | value | error | +==============+=====+=======+==============+==================+ | '__header__' | 0 | None | None | 'AssertionError' | +--------------+-----+-------+--------------+------------------+ | 'foo_int' | 2 | 'foo' | 'x' | 'ValueError' | +--------------+-----+-------+--------------+------------------+ | 'bar_date' | 3 | 'bar' | '2000/01/01' | 'ValueError' | +--------------+-----+-------+--------------+------------------+ | 'baz_enum' | 4 | 'baz' | 'x' | 'AssertionError' | +--------------+-----+-------+--------------+------------------+ | 'bar_date' | 5 | 'bar' | None | 'AttributeError' | +--------------+-----+-------+--------------+------------------+ | 'not_none' | 5 | None | None | 'AssertionError' | +--------------+-----+-------+--------------+------------------+ | 'foo_int' | 6 | 'foo' | 'y' | 'ValueError' | +--------------+-----+-------+--------------+------------------+ | 'bar_date' | 6 | 'bar' | '1999-99-99' | 'ValueError' | +--------------+-----+-------+--------------+------------------+ | 'baz_enum' | 6 | 'baz' | 'z' | 'AssertionError' | +--------------+-----+-------+--------------+------------------+ | '__len__' | 7 | None | 2 | 'AssertionError' | +--------------+-----+-------+--------------+------------------+ | 'baz_enum' | 7 | 'baz' | None | 'AssertionError' | +--------------+-----+-------+--------------+------------------+ | '__len__' | 8 | None | 4 | 'AssertionError' | +--------------+-----+-------+--------------+------------------+ Returns a table of validation problems. 
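    Each constraint is a dictionary which may define any of the keys 'name',
    'field', 'test' (a callable applied to the value, where a raised
    exception is reported as a problem), 'assertion' (a callable whose falsy
    return value is reported as a problem), 'getter' (a callable used to
    extract the target value from the row) and 'optional' (optional
    constraints referring to fields not present in the header are skipped).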
""" # noqa return ProblemsView(table, constraints=constraints, header=header) Table.validate = validate class ProblemsView(Table): def __init__(self, table, constraints, header): self.table = table self.constraints = constraints self.header = header def __iter__(self): return iterproblems(self.table, self.constraints, self.header) def normalize_constraints(constraints, flds): """ This method renders local constraints such that return value is: * a list, not None * a list of dicts * a list of non-optional constraints or optional with defined field .. note:: We use a new variable 'local_constraints' because the constraints parameter may be a mutable collection, and we do not wish to cause side-effects by modifying it locally """ local_constraints = constraints or [] local_constraints = [dict(**c) for c in local_constraints] local_constraints = [ c for c in local_constraints if c.get('field') in flds or not c.get('optional') ] return local_constraints def iterproblems(table, constraints, expected_header): outhdr = ('name', 'row', 'field', 'value', 'error') yield outhdr it = iter(table) try: actual_header = next(it) except StopIteration: actual_header = [] if expected_header is None: flds = list(map(text_type, actual_header)) else: expected_flds = list(map(text_type, expected_header)) actual_flds = list(map(text_type, actual_header)) try: assert expected_flds == actual_flds except Exception as e: yield ('__header__', 0, None, None, type(e).__name__) flds = expected_flds local_constraints = normalize_constraints(constraints, flds) # setup getters for constraint in local_constraints: if 'getter' not in constraint: if 'field' in constraint: # should ensure FieldSelectionError if bad field in constraint indices = asindices(flds, constraint['field']) getter = operator.itemgetter(*indices) constraint['getter'] = getter # generate problems expected_len = len(flds) for i, row in enumerate(it): row = tuple(row) # row length constraint l = None try: l = len(row) assert l == expected_len except Exception as e: yield ('__len__', i+1, None, l, type(e).__name__) # user defined constraints row = Record(row, flds) for constraint in local_constraints: name = constraint.get('name', None) field = constraint.get('field', None) assertion = constraint.get('assertion', None) test = constraint.get('test', None) getter = constraint.get('getter', lambda x: x) try: target = getter(row) except Exception as e: # getting target value failed, report problem yield (name, i+1, field, None, type(e).__name__) else: value = target if field else None if test is not None: try: test(target) except Exception as e: # test raised exception, report problem yield (name, i+1, field, value, type(e).__name__) if assertion is not None: try: assert assertion(target) except Exception as e: # assertion raised exception, report problem yield (name, i+1, field, value, type(e).__name__) petl-1.7.15/petl/util/000077500000000000000000000000001457414240700145025ustar00rootroot00000000000000petl-1.7.15/petl/util/__init__.py000066400000000000000000000020361457414240700166140ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division from petl.util.base import Table, Record, values, header, data, \ fieldnames, records, dicts, namedtuples, expr, rowgroupby, empty, wrap from petl.util.lookups import lookup, lookupone, dictlookup, dictlookupone, \ recordlookup, recordlookupone from petl.util.parsers import dateparser, timeparser, datetimeparser, \ numparser, boolparser from petl.util.vis import look, lookall, lookstr, 
lookallstr, see from petl.util.random import randomtable, dummytable from petl.util.counting import parsecounter, parsecounts, typecounter, \ typecounts, valuecount, valuecounter, valuecounts, stringpatterncounter, \ stringpatterns, rowlengths, nrows from petl.util.materialise import listoflists, listoftuples, tupleoflists, \ tupleoftuples, columns, facetcolumns from petl.util.timing import progress, log_progress, clock from petl.util.statistics import limits, stats from petl.util.misc import typeset, diffheaders, diffvalues, nthword, strjoin, \ coalesce petl-1.7.15/petl/util/base.py000066400000000000000000000476271457414240700160060ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division import re from itertools import islice, chain, cycle, product,\ permutations, combinations, takewhile, dropwhile, \ starmap, groupby, tee import operator from collections import Counter, namedtuple, OrderedDict from itertools import compress, combinations_with_replacement from petl.compat import imap, izip, izip_longest, ifilter, ifilterfalse, \ reduce, next, string_types, text_type from petl.errors import FieldSelectionError from petl.comparison import comparable_itemgetter class IterContainer(object): def __contains__(self, item): for o in self: if o == item: return True return False def __len__(self): return sum(1 for _ in self) def __getitem__(self, item): if isinstance(item, int): try: return next(islice(self, item, item+1)) except StopIteration: raise IndexError('index out of range') elif isinstance(item, slice): return islice(self, item.start, item.stop, item.step) def __iter__(self): raise NotImplementedError def index(self, item): for i, o in enumerate(self): if o == item: return i raise ValueError('%s is not in container' % item) def min(self, **kwargs): return min(self, **kwargs) def max(self, **kwargs): return max(self, **kwargs) def len(self): return len(self) def set(self): return set(self) def frozenset(self): return frozenset(self) def list(self): # avoid iterating twice return list(iter(self)) def tuple(self): # avoid iterating twice return tuple(iter(self)) def dict(self, **kwargs): return dict(self, **kwargs) def enumerate(self, start=0): return enumerate(self, start) def filter(self, function): return filter(function, self) def map(self, function): return map(function, self) def reduce(self, function, **kwargs): return reduce(function, self, **kwargs) def sum(self, *args, **kwargs): return sum(self, *args, **kwargs) def all(self): return all(self) def any(self): return any(self) def apply(self, function): for item in self: function(item) def counter(self): return Counter(self) def ordereddict(self): return OrderedDict(self) def cycle(self): return cycle(self) def chain(self, *others): return chain(self, *others) def dropwhile(self, predicate): return dropwhile(predicate, self) def takewhile(self, predicate): return takewhile(predicate, self) def ifilter(self, predicate): return ifilter(predicate, self) def ifilterfalse(self, predicate): return ifilterfalse(predicate, self) def imap(self, function): return imap(function, self) def starmap(self, function): return starmap(function, self) def islice(self, *args): return islice(self, *args) def compress(self, selectors): return compress(self, selectors) def groupby(self, *args, **kwargs): return groupby(self, *args, **kwargs) def tee(self, *args, **kwargs): return tee(self, *args, **kwargs) def permutations(self, *args, **kwargs): return permutations(self, *args, **kwargs) def combinations(self, *args, 
**kwargs): return combinations(self, *args, **kwargs) def combinations_with_replacement(self, *args, **kwargs): return combinations_with_replacement(self, *args, **kwargs) def izip(self, *args, **kwargs): return izip(self, *args, **kwargs) def izip_longest(self, *args, **kwargs): return izip_longest(self, *args, **kwargs) def product(self, *args, **kwargs): return product(self, *args, **kwargs) def __add__(self, other): return chain(self, other) def __iadd__(self, other): return chain(self, other) class Table(IterContainer): def __getitem__(self, item): if isinstance(item, string_types): return ValuesView(self, item) else: return super(Table, self).__getitem__(item) def values(table, *field, **kwargs): """ Return a container supporting iteration over values in a given field or fields. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar'], ... ['a', True], ... ['b'], ... ['b', True], ... ['c', False]] >>> foo = etl.values(table1, 'foo') >>> foo foo: 'a', 'b', 'b', 'c' >>> list(foo) ['a', 'b', 'b', 'c'] >>> bar = etl.values(table1, 'bar') >>> bar bar: True, None, True, False >>> list(bar) [True, None, True, False] >>> # values from multiple fields ... table2 = [['foo', 'bar', 'baz'], ... [1, 'a', True], ... [2, 'bb', True], ... [3, 'd', False]] >>> foobaz = etl.values(table2, 'foo', 'baz') >>> foobaz ('foo', 'baz'): (1, True), (2, True), (3, False) >>> list(foobaz) [(1, True), (2, True), (3, False)] The field argument can be a single field name or index (starting from zero) or a tuple of field names and/or indexes. Multiple fields can also be provided as positional arguments. If rows are uneven, the value of the keyword argument `missing` is returned. """ return ValuesView(table, *field, **kwargs) Table.values = values class ValuesView(IterContainer): def __init__(self, table, *field, **kwargs): self.table = table # deal with field arg in a backwards-compatible way if len(field) == 1: field = field[0] self.field = field self.kwargs = kwargs def __iter__(self): return itervalues(self.table, self.field, **self.kwargs) def __repr__(self): vreprs = list(map(repr, islice(self, 6))) r = text_type(self.field) + ': ' r += ', '.join(vreprs[:5]) if len(vreprs) > 5: r += ', ...' 
return r def itervalues(table, field, **kwargs): missing = kwargs.get('missing', None) it = iter(table) try: hdr = next(it) except StopIteration: hdr = [] indices = asindices(hdr, field) assert len(indices) > 0, 'no field selected' getvalue = operator.itemgetter(*indices) for row in it: try: value = getvalue(row) yield value except IndexError: if len(indices) > 1: # try one at a time value = list() for i in indices: if i < len(row): value.append(row[i]) else: value.append(missing) yield tuple(value) else: yield missing class TableWrapper(Table): def __init__(self, inner): self.inner = inner def __iter__(self): return iter(self.inner) wrap = TableWrapper def asindices(hdr, spec): """Convert the given field `spec` into a list of field indices.""" flds = list(map(text_type, hdr)) indices = list() if not isinstance(spec, (list, tuple)): spec = (spec,) for s in spec: # spec could be a field index (takes priority) if isinstance(s, int) and s < len(hdr): indices.append(s) # index fields from 0 # spec could be a field elif s in flds: idx = flds.index(s) indices.append(idx) flds[idx] = None # replace with None to mark as used else: raise FieldSelectionError(s) return indices def rowitemgetter(hdr, spec): indices = asindices(hdr, spec) getter = comparable_itemgetter(*indices) return getter def rowgetter(*indices): if len(indices) == 0: return lambda row: tuple() elif len(indices) == 1: # if only one index, we cannot use itemgetter, because we want a # singleton sequence to be returned, but itemgetter with a single # argument returns the value itself, so let's define a function index = indices[0] return lambda row: (row[index],) # note comma - singleton tuple # if more than one index, use itemgetter, it should be the most efficient else: return operator.itemgetter(*indices) def header(table): """ Return the header row for the given table. E.g.:: >>> import petl as etl >>> table = [['foo', 'bar'], ['a', 1], ['b', 2]] >>> etl.header(table) ('foo', 'bar') Note that the header row will always be returned as a tuple, regardless of what the underlying data are. """ it = iter(table) return tuple(next(it)) Table.header = header def fieldnames(table): """ Return the string values of the header row. If the header row contains only strings, then this function is equivalent to header(), i.e.:: >>> import petl as etl >>> table = [['foo', 'bar'], ['a', 1], ['b', 2]] >>> etl.fieldnames(table) ('foo', 'bar') >>> etl.header(table) ('foo', 'bar') """ return tuple(text_type(f) for f in header(table)) Table.fieldnames = fieldnames def data(table, *sliceargs): """ Return a container supporting iteration over data rows in a given table (i.e., without the header). E.g.:: >>> import petl as etl >>> table = [['foo', 'bar'], ['a', 1], ['b', 2]] >>> d = etl.data(table) >>> list(d) [['a', 1], ['b', 2]] Positional arguments can be used to slice the data rows. The sliceargs are passed to :func:`itertools.islice`. """ return DataView(table, *sliceargs) Table.data = data class DataView(Table): def __init__(self, table, *sliceargs): self.table = table self.sliceargs = sliceargs def __iter__(self): return iterdata(self.table, *self.sliceargs) def iterdata(table, *sliceargs): it = islice(table, 1, None) # skip header row if sliceargs: it = islice(it, *sliceargs) return it def dicts(table, *sliceargs, **kwargs): """ Return a container supporting iteration over rows as dicts. 
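Conceptually, each data row is zipped with the header to build a plain :class:`dict`; a rough sketch of the equivalent hand-rolled loop (illustrative only, and ignoring the padding of short rows that this function also performs) is::

    >>> tbl = [['foo', 'bar'], ['a', 1], ['b', 2]]
    >>> hdr, rows = tbl[0], tbl[1:]
    >>> [dict(zip(hdr, row)) for row in rows]
    [{'foo': 'a', 'bar': 1}, {'foo': 'b', 'bar': 2}]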
E.g.:: >>> import petl as etl >>> table = [['foo', 'bar'], ['a', 1], ['b', 2]] >>> d = etl.dicts(table) >>> d {'foo': 'a', 'bar': 1} {'foo': 'b', 'bar': 2} >>> list(d) [{'foo': 'a', 'bar': 1}, {'foo': 'b', 'bar': 2}] Short rows are padded with the value of the `missing` keyword argument. """ return DictsView(table, *sliceargs, **kwargs) Table.dicts = dicts class DictsView(IterContainer): def __init__(self, table, *sliceargs, **kwargs): self.table = table self.sliceargs = sliceargs self.kwargs = kwargs def __iter__(self): return iterdicts(self.table, *self.sliceargs, **self.kwargs) def __repr__(self): vreprs = list(map(repr, islice(self, 6))) r = '\n'.join(vreprs[:5]) if len(vreprs) > 5: r += '\n...' return r def iterdicts(table, *sliceargs, **kwargs): missing = kwargs.get('missing', None) it = iter(table) try: hdr = next(it) except StopIteration: return if sliceargs: it = islice(it, *sliceargs) for row in it: yield asdict(hdr, row, missing) def asdict(hdr, row, missing=None): flds = [text_type(f) for f in hdr] try: # list comprehension should be faster items = [(flds[i], row[i]) for i in range(len(flds))] except IndexError: # short row, fall back to slower for loop items = list() for i, f in enumerate(flds): try: v = row[i] except IndexError: v = missing items.append((f, v)) return dict(items) def namedtuples(table, *sliceargs, **kwargs): """ View the table as a container of named tuples. E.g.:: >>> import petl as etl >>> table = [['foo', 'bar'], ['a', 1], ['b', 2]] >>> d = etl.namedtuples(table) >>> d row(foo='a', bar=1) row(foo='b', bar=2) >>> list(d) [row(foo='a', bar=1), row(foo='b', bar=2)] Short rows are padded with the value of the `missing` keyword argument. The `name` keyword argument can be given to override the name of the named tuple class (defaults to 'row'). """ return NamedTuplesView(table, *sliceargs, **kwargs) Table.namedtuples = namedtuples class NamedTuplesView(IterContainer): def __init__(self, table, *sliceargs, **kwargs): self.table = table self.sliceargs = sliceargs self.kwargs = kwargs def __iter__(self): return iternamedtuples(self.table, *self.sliceargs, **self.kwargs) def __repr__(self): vreprs = list(map(repr, islice(self, 6))) r = '\n'.join(vreprs[:5]) if len(vreprs) > 5: r += '\n...' 
return r def iternamedtuples(table, *sliceargs, **kwargs): missing = kwargs.get('missing', None) name = kwargs.get('name', 'row') it = iter(table) try: hdr = next(it) except StopIteration: return flds = list(map(text_type, hdr)) nt = namedtuple(name, tuple(flds)) if sliceargs: it = islice(it, *sliceargs) for row in it: yield asnamedtuple(nt, row, missing) def asnamedtuple(nt, row, missing=None): try: return nt(*row) except TypeError: # row may be long or short # expected number of fields ne = len(nt._fields) # actual number of values na = len(row) if ne > na: # pad short rows padded = tuple(row) + (missing,) * (ne-na) return nt(*padded) elif ne < na: # truncate long rows return nt(*row[:ne]) else: raise class Record(tuple): def __new__(cls, row, flds, missing=None): t = super(Record, cls).__new__(cls, row) return t def __init__(self, row, flds, missing=None): self.flds = flds self.missing = missing def __getitem__(self, f): if isinstance(f, int): idx = f elif f in self.flds: idx = self.flds.index(f) else: raise KeyError('item ' + repr(f) + ' not in fields ' + repr(self.flds)) try: return super(Record, self).__getitem__(idx) except IndexError: # handle short rows return self.missing def __getattr__(self, f): if f in self.flds: try: return super(Record, self).__getitem__(self.flds.index(f)) except IndexError: # handle short rows return self.missing else: raise AttributeError('item ' + repr(f) + ' not in fields ' + repr(self.flds)) def get(self, key, default=None): try: return self[key] except KeyError: return default def records(table, *sliceargs, **kwargs): """ Return a container supporting iteration over rows as records, where a record is a hybrid object supporting all possible ways of accessing values. E.g.:: >>> import petl as etl >>> table = [['foo', 'bar'], ['a', 1], ['b', 2]] >>> d = etl.records(table) >>> d ('a', 1) ('b', 2) >>> list(d) [('a', 1), ('b', 2)] >>> [r[0] for r in d] ['a', 'b'] >>> [r['foo'] for r in d] ['a', 'b'] >>> [r.foo for r in d] ['a', 'b'] Short rows are padded with the value of the `missing` keyword argument. """ return RecordsView(table, *sliceargs, **kwargs) Table.records = records class RecordsView(IterContainer): def __init__(self, table, *sliceargs, **kwargs): self.table = table self.sliceargs = sliceargs self.kwargs = kwargs def __iter__(self): return iterrecords(self.table, *self.sliceargs, **self.kwargs) def __repr__(self): vreprs = list(map(repr, islice(self, 6))) r = '\n'.join(vreprs[:5]) if len(vreprs) > 5: r += '\n...' return r def iterrecords(table, *sliceargs, **kwargs): missing = kwargs.get('missing', None) it = iter(table) try: hdr = next(it) except StopIteration: return flds = list(map(text_type, hdr)) if sliceargs: it = islice(it, *sliceargs) for row in it: yield Record(row, flds, missing=missing) def expr(s): """ Construct a function operating on a table record. The expression string is converted into a lambda function by prepending the string with ``'lambda rec: '``, then replacing anything enclosed in curly braces (e.g., ``"{foo}"``) with a lookup on the record (e.g., ``"rec['foo']"``), then finally calling :func:`eval`. So, e.g., the expression string ``"{foo} * {bar}"`` is converted to the function ``lambda rec: rec['foo'] * rec['bar']`` """ prog = re.compile(r'\{([^}]+)\}') def repl(matchobj): return "rec['%s']" % matchobj.group(1) return eval("lambda rec: " + prog.sub(repl, s)) def rowgroupby(table, key, value=None): """Convenient adapter for :func:`itertools.groupby`. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar', 'baz'], ... 
['a', 1, True], ... ['b', 3, True], ... ['b', 2]] >>> # group entire rows ... for key, group in etl.rowgroupby(table1, 'foo'): ... print(key, list(group)) ... a [('a', 1, True)] b [('b', 3, True), ('b', 2)] >>> # group specific values ... for key, group in etl.rowgroupby(table1, 'foo', 'bar'): ... print(key, list(group)) ... a [1] b [3, 2] N.B., assumes the input table is already sorted by the given key. """ it = iter(table) try: hdr = next(it) except StopIteration: hdr = [] flds = list(map(text_type, hdr)) # wrap rows as records it = (Record(row, flds) for row in it) # determine key function if callable(key): getkey = key native_key = True else: kindices = asindices(hdr, key) getkey = comparable_itemgetter(*kindices) native_key = False git = groupby(it, key=getkey) if value is None: if native_key: return git else: return ((k.inner, vals) for (k, vals) in git) else: if callable(value): getval = value else: vindices = asindices(hdr, value) getval = operator.itemgetter(*vindices) if native_key: return ((k, (getval(v) for v in vals)) for (k, vals) in git) else: return ((k.inner, (getval(v) for v in vals)) for (k, vals) in git) Table.rowgroupby = rowgroupby def iterpeek(it, n=1): it = iter(it) # make sure it's an iterator if n == 1: peek = next(it) return peek, chain([peek], it) else: peek = list(islice(it, n)) return peek, chain(peek, it) def empty(): """ Return an empty table. Can be useful when building up a table from a set of columns, e.g.:: >>> import petl as etl >>> table = ( ... etl ... .empty() ... .addcolumn('foo', ['A', 'B']) ... .addcolumn('bar', [1, 2]) ... ) >>> table +-----+-----+ | foo | bar | +=====+=====+ | 'A' | 1 | +-----+-----+ | 'B' | 2 | +-----+-----+ """ return EmptyTable() class EmptyTable(Table): def __iter__(self): # empty header row yield tuple() petl-1.7.15/petl/util/counting.py000066400000000000000000000337271457414240700167160ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division from collections import Counter from petl.compat import string_types, maketrans from petl.util.base import values, Table, data, wrap def nrows(table): """ Count the number of data rows in a table. E.g.:: >>> import petl as etl >>> table = [['foo', 'bar'], ['a', 1], ['b', 2]] >>> etl.nrows(table) 2 """ return sum(1 for _ in data(table)) Table.nrows = nrows def valuecount(table, field, value, missing=None): """ Count the number of occurrences of `value` under the given field. Returns the absolute count and relative frequency as a pair. E.g.:: >>> import petl as etl >>> table = [['foo', 'bar'], ... ['a', 1], ... ['b', 2], ... ['b', 7]] >>> etl.valuecount(table, 'foo', 'b') (2, 0.6666666666666666) The `field` argument can be a single field name or index (starting from zero) or a tuple of field names and/or indexes. """ total = 0 vs = 0 for v in values(table, field, missing=missing): total += 1 if v == value: vs += 1 return vs, float(vs)/total Table.valuecount = valuecount def valuecounter(table, *field, **kwargs): """ Find distinct values for the given field and count the number of occurrences. Returns a :class:`dict` mapping values to counts. E.g.:: >>> import petl as etl >>> table = [['foo', 'bar'], ... ['a', True], ... ['b'], ... ['b', True], ... ['c', False]] >>> etl.valuecounter(table, 'foo') Counter({'b': 2, 'a': 1, 'c': 1}) The `field` argument can be a single field name or index (starting from zero) or a tuple of field names and/or indexes. 
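Since the return value is an ordinary :class:`collections.Counter`, the usual counter methods can be applied to it directly, e.g.::

    >>> etl.valuecounter(table, 'foo').most_common(1)
    [('b', 2)]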
""" missing = kwargs.get('missing', None) counter = Counter() for v in values(table, field, missing=missing): try: counter[v] += 1 except IndexError: pass # short row return counter Table.valuecounter = valuecounter def valuecounts(table, *field, **kwargs): """ Find distinct values for the given field and count the number and relative frequency of occurrences. Returns a table mapping values to counts, with most common values first. E.g.:: >>> import petl as etl >>> table = [['foo', 'bar', 'baz'], ... ['a', True, 0.12], ... ['a', True, 0.17], ... ['b', False, 0.34], ... ['b', False, 0.44], ... ['b']] >>> etl.valuecounts(table, 'foo') +-----+-------+-----------+ | foo | count | frequency | +=====+=======+===========+ | 'b' | 3 | 0.6 | +-----+-------+-----------+ | 'a' | 2 | 0.4 | +-----+-------+-----------+ >>> etl.valuecounts(table, 'foo', 'bar') +-----+-------+-------+-----------+ | foo | bar | count | frequency | +=====+=======+=======+===========+ | 'a' | True | 2 | 0.4 | +-----+-------+-------+-----------+ | 'b' | False | 2 | 0.4 | +-----+-------+-------+-----------+ | 'b' | None | 1 | 0.2 | +-----+-------+-------+-----------+ If rows are short, the value of the keyword argument `missing` is counted. Multiple fields can be given as positional arguments. If multiple fields are given, these are treated as a compound key. """ return ValueCountsView(table, field, **kwargs) Table.valuecounts = valuecounts class ValueCountsView(Table): def __init__(self, table, field, missing=None): self.table = table self.field = field self.missing = missing def __iter__(self): # construct output header if isinstance(self.field, (tuple, list)): outhdr = tuple(self.field) + ('count', 'frequency') else: outhdr = (self.field, 'count', 'frequency') yield outhdr # count values counter = valuecounter(self.table, *self.field, missing=self.missing) counts = counter.most_common() # sort descending total = sum(c[1] for c in counts) if len(self.field) > 1: for c in counts: yield tuple(c[0]) + (c[1], float(c[1])/total) else: for c in counts: yield (c[0], c[1], float(c[1])/total) def parsecounter(table, field, parsers=(('int', int), ('float', float))): """ Count the number of `str` or `unicode` values under the given fields that can be parsed as ints, floats or via custom parser functions. Return a pair of `Counter` objects, the first mapping parser names to the number of strings successfully parsed, the second mapping parser names to the number of errors. E.g.:: >>> import petl as etl >>> table = [['foo', 'bar', 'baz'], ... ['A', 'aaa', 2], ... ['B', u'2', '3.4'], ... [u'B', u'3', u'7.8', True], ... ['D', '3.7', 9.0], ... ['E', 42]] >>> counter, errors = etl.parsecounter(table, 'bar') >>> counter Counter({'float': 3, 'int': 2}) >>> errors Counter({'int': 2, 'float': 1}) The `field` argument can be a field name or index (starting from zero). """ if isinstance(parsers, (list, tuple)): parsers = dict(parsers) counter, errors = Counter(), Counter() # need to initialise for n in parsers.keys(): counter[n] = 0 errors[n] = 0 for v in values(table, field): if isinstance(v, string_types): for name, parser in parsers.items(): try: parser(v) except: errors[name] += 1 else: counter[name] += 1 return counter, errors Table.parsecounter = parsecounter def parsecounts(table, field, parsers=(('int', int), ('float', float))): """ Count the number of `str` or `unicode` values that can be parsed as ints, floats or via custom parser functions. 
Return a table mapping parser names to the number of values successfully parsed and the number of errors. E.g.:: >>> import petl as etl >>> table = [['foo', 'bar', 'baz'], ... ['A', 'aaa', 2], ... ['B', u'2', '3.4'], ... [u'B', u'3', u'7.8', True], ... ['D', '3.7', 9.0], ... ['E', 42]] >>> etl.parsecounts(table, 'bar') +---------+-------+--------+ | type | count | errors | +=========+=======+========+ | 'float' | 3 | 1 | +---------+-------+--------+ | 'int' | 2 | 2 | +---------+-------+--------+ The `field` argument can be a field name or index (starting from zero). """ return ParseCountsView(table, field, parsers=parsers) Table.parsecounts = parsecounts class ParseCountsView(Table): def __init__(self, table, field, parsers=(('int', int), ('float', float))): self.table = table self.field = field if isinstance(parsers, (list, tuple)): parsers = dict(parsers) self.parsers = parsers def __iter__(self): counter, errors = parsecounter(self.table, self.field, self.parsers) yield ('type', 'count', 'errors') for (item, n) in counter.most_common(): yield (item, n, errors[item]) def typecounter(table, field): """ Count the number of values found for each Python type. >>> import petl as etl >>> table = [['foo', 'bar', 'baz'], ... ['A', 1, 2], ... ['B', u'2', '3.4'], ... [u'B', u'3', u'7.8', True], ... ['D', u'xyz', 9.0], ... ['E', 42]] >>> etl.typecounter(table, 'foo') Counter({'str': 5}) >>> etl.typecounter(table, 'bar') Counter({'str': 3, 'int': 2}) >>> etl.typecounter(table, 'baz') Counter({'str': 2, 'int': 1, 'float': 1, 'NoneType': 1}) The `field` argument can be a field name or index (starting from zero). """ counter = Counter() for v in values(table, field): try: counter[v.__class__.__name__] += 1 except IndexError: pass # ignore short rows return counter Table.typecounter = typecounter def typecounts(table, field): """ Count the number of values found for each Python type and return a table mapping class names to counts and frequencies. E.g.:: >>> import petl as etl >>> table = [['foo', 'bar', 'baz'], ... [b'A', 1, 2], ... [b'B', '2', b'3.4'], ... ['B', '3', '7.8', True], ... ['D', u'xyz', 9.0], ... ['E', 42]] >>> etl.typecounts(table, 'foo') +---------+-------+-----------+ | type | count | frequency | +=========+=======+===========+ | 'str' | 3 | 0.6 | +---------+-------+-----------+ | 'bytes' | 2 | 0.4 | +---------+-------+-----------+ >>> etl.typecounts(table, 'bar') +-------+-------+-----------+ | type | count | frequency | +=======+=======+===========+ | 'str' | 3 | 0.6 | +-------+-------+-----------+ | 'int' | 2 | 0.4 | +-------+-------+-----------+ >>> etl.typecounts(table, 'baz') +------------+-------+-----------+ | type | count | frequency | +============+=======+===========+ | 'int' | 1 | 0.2 | +------------+-------+-----------+ | 'bytes' | 1 | 0.2 | +------------+-------+-----------+ | 'str' | 1 | 0.2 | +------------+-------+-----------+ | 'float' | 1 | 0.2 | +------------+-------+-----------+ | 'NoneType' | 1 | 0.2 | +------------+-------+-----------+ The `field` argument can be a field name or index (starting from zero). 
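The result is itself a petl table, so it can be materialised or transformed like any other table; a small sketch::

    >>> etl.typecounts(table, 'foo').lot()
    [('type', 'count', 'frequency'), ('str', 3, 0.6), ('bytes', 2, 0.4)]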
""" return TypeCountsView(table, field) Table.typecounts = typecounts class TypeCountsView(Table): def __init__(self, table, field): self.table = table self.field = field def __iter__(self): counter = typecounter(self.table, self.field) yield ('type', 'count', 'frequency') counts = counter.most_common() total = sum(c[1] for c in counts) for c in counts: yield (c[0], c[1], float(c[1])/total) def stringpatterncounter(table, field): """ Profile string patterns in the given field, returning a :class:`dict` mapping patterns to counts. """ trans = maketrans( 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789', 'AAAAAAAAAAAAAAAAAAAAAAAAAAaaaaaaaaaaaaaaaaaaaaaaaaaa9999999999' ) counter = Counter() for v in values(table, field): p = str(v).translate(trans) counter[p] += 1 return counter Table.stringpatterncounter = stringpatterncounter def stringpatterns(table, field): """ Profile string patterns in the given field, returning a table of patterns, counts and frequencies. E.g.:: >>> import petl as etl >>> table = [['foo', 'bar'], ... ['Mr. Foo', '123-1254'], ... ['Mrs. Bar', '234-1123'], ... ['Mr. Spo', '123-1254'], ... [u'Mr. Baz', u'321 1434'], ... [u'Mrs. Baz', u'321 1434'], ... ['Mr. Quux', '123-1254-XX']] >>> etl.stringpatterns(table, 'foo') +------------+-------+---------------------+ | pattern | count | frequency | +============+=======+=====================+ | 'Aa. Aaa' | 3 | 0.5 | +------------+-------+---------------------+ | 'Aaa. Aaa' | 2 | 0.3333333333333333 | +------------+-------+---------------------+ | 'Aa. Aaaa' | 1 | 0.16666666666666666 | +------------+-------+---------------------+ >>> etl.stringpatterns(table, 'bar') +---------------+-------+---------------------+ | pattern | count | frequency | +===============+=======+=====================+ | '999-9999' | 3 | 0.5 | +---------------+-------+---------------------+ | '999 9999' | 2 | 0.3333333333333333 | +---------------+-------+---------------------+ | '999-9999-AA' | 1 | 0.16666666666666666 | +---------------+-------+---------------------+ """ counter = stringpatterncounter(table, field) output = [('pattern', 'count', 'frequency')] counter = counter.most_common() total = sum(c[1] for c in counter) cnts = [(c[0], c[1], float(c[1])/total) for c in counter] output.extend(cnts) return wrap(output) Table.stringpatterns = stringpatterns def rowlengths(table): """ Report on row lengths found in the table. E.g.:: >>> import petl as etl >>> table = [['foo', 'bar', 'baz'], ... ['A', 1, 2], ... ['B', '2', '3.4'], ... [u'B', u'3', u'7.8', True], ... ['D', 'xyz', 9.0], ... ['E', None], ... ['F', 9]] >>> etl.rowlengths(table) +--------+-------+ | length | count | +========+=======+ | 3 | 3 | +--------+-------+ | 2 | 2 | +--------+-------+ | 4 | 1 | +--------+-------+ Useful for finding potential problems in data files. 
""" counter = Counter() for row in data(table): counter[len(row)] += 1 output = [('length', 'count')] output.extend(counter.most_common()) return wrap(output) Table.rowlengths = rowlengths petl-1.7.15/petl/util/lookups.py000066400000000000000000000263601457414240700165570ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division import operator from petl.compat import text_type from petl.errors import DuplicateKeyError from petl.util.base import Table, asindices, asdict, Record, rowgetter def _setup_lookup(table, key, value): # obtain iterator and header row it = iter(table) try: hdr = next(it) except StopIteration: hdr = [] # prepare key getter keyindices = asindices(hdr, key) assert len(keyindices) > 0, 'no key selected' getkey = operator.itemgetter(*keyindices) # prepare value getter if value is None: # default value is complete row getvalue = rowgetter(*range(len(hdr))) else: valueindices = asindices(hdr, value) assert len(valueindices) > 0, 'no value selected' getvalue = operator.itemgetter(*valueindices) return it, getkey, getvalue def lookup(table, key, value=None, dictionary=None): """ Load a dictionary with data from the given table. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar'], ... ['a', 1], ... ['b', 2], ... ['b', 3]] >>> lkp = etl.lookup(table1, 'foo', 'bar') >>> lkp['a'] [1] >>> lkp['b'] [2, 3] >>> # if no value argument is given, defaults to the whole ... # row (as a tuple) ... lkp = etl.lookup(table1, 'foo') >>> lkp['a'] [('a', 1)] >>> lkp['b'] [('b', 2), ('b', 3)] >>> # compound keys are supported ... table2 = [['foo', 'bar', 'baz'], ... ['a', 1, True], ... ['b', 2, False], ... ['b', 3, True], ... ['b', 3, False]] >>> lkp = etl.lookup(table2, ('foo', 'bar'), 'baz') >>> lkp[('a', 1)] [True] >>> lkp[('b', 2)] [False] >>> lkp[('b', 3)] [True, False] >>> # data can be loaded into an existing dictionary-like ... # object, including persistent dictionaries created via the ... # shelve module ... import shelve >>> lkp = shelve.open('example.dat', flag='n') >>> lkp = etl.lookup(table1, 'foo', 'bar', lkp) >>> lkp.close() >>> lkp = shelve.open('example.dat', flag='r') >>> lkp['a'] [1] >>> lkp['b'] [2, 3] """ if dictionary is None: dictionary = dict() # setup it, getkey, getvalue = _setup_lookup(table, key, value) # build lookup dictionary for row in it: k = getkey(row) v = getvalue(row) if k in dictionary: # work properly with shelve l = dictionary[k] l.append(v) dictionary[k] = l else: dictionary[k] = [v] return dictionary Table.lookup = lookup def lookupone(table, key, value=None, dictionary=None, strict=False): """ Load a dictionary with data from the given table, assuming there is at most one value for each key. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar'], ... ['a', 1], ... ['b', 2], ... ['b', 3]] >>> # if the specified key is not unique and strict=False (default), ... # the first value wins ... lkp = etl.lookupone(table1, 'foo', 'bar') >>> lkp['a'] 1 >>> lkp['b'] 2 >>> # if the specified key is not unique and strict=True, will raise ... # DuplicateKeyError ... try: ... lkp = etl.lookupone(table1, 'foo', strict=True) ... except etl.errors.DuplicateKeyError as e: ... print(e) ... duplicate key: 'b' >>> # compound keys are supported ... table2 = [['foo', 'bar', 'baz'], ... ['a', 1, True], ... ['b', 2, False], ... ['b', 3, True], ... 
['b', 3, False]] >>> lkp = etl.lookupone(table2, ('foo', 'bar'), 'baz') >>> lkp[('a', 1)] True >>> lkp[('b', 2)] False >>> lkp[('b', 3)] True >>> # data can be loaded into an existing dictionary-like ... # object, including persistent dictionaries created via the ... # shelve module ... import shelve >>> lkp = shelve.open('example.dat', flag='n') >>> lkp = etl.lookupone(table1, 'foo', 'bar', lkp) >>> lkp.close() >>> lkp = shelve.open('example.dat', flag='r') >>> lkp['a'] 1 >>> lkp['b'] 2 """ if dictionary is None: dictionary = dict() # setup it, getkey, getvalue = _setup_lookup(table, key, value) # build lookup dictionary for row in it: k = getkey(row) if strict and k in dictionary: raise DuplicateKeyError(k) elif k not in dictionary: v = getvalue(row) dictionary[k] = v return dictionary Table.lookupone = lookupone def dictlookup(table, key, dictionary=None): """ Load a dictionary with data from the given table, mapping to dicts. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar'], ... ['a', 1], ... ['b', 2], ... ['b', 3]] >>> lkp = etl.dictlookup(table1, 'foo') >>> lkp['a'] [{'foo': 'a', 'bar': 1}] >>> lkp['b'] [{'foo': 'b', 'bar': 2}, {'foo': 'b', 'bar': 3}] >>> # compound keys are supported ... table2 = [['foo', 'bar', 'baz'], ... ['a', 1, True], ... ['b', 2, False], ... ['b', 3, True], ... ['b', 3, False]] >>> lkp = etl.dictlookup(table2, ('foo', 'bar')) >>> lkp[('a', 1)] [{'foo': 'a', 'bar': 1, 'baz': True}] >>> lkp[('b', 2)] [{'foo': 'b', 'bar': 2, 'baz': False}] >>> lkp[('b', 3)] [{'foo': 'b', 'bar': 3, 'baz': True}, {'foo': 'b', 'bar': 3, 'baz': False}] >>> # data can be loaded into an existing dictionary-like ... # object, including persistent dictionaries created via the ... # shelve module ... import shelve >>> lkp = shelve.open('example.dat', flag='n') >>> lkp = etl.dictlookup(table1, 'foo', lkp) >>> lkp.close() >>> lkp = shelve.open('example.dat', flag='r') >>> lkp['a'] [{'foo': 'a', 'bar': 1}] >>> lkp['b'] [{'foo': 'b', 'bar': 2}, {'foo': 'b', 'bar': 3}] """ if dictionary is None: dictionary = dict() it = iter(table) try: hdr = next(it) except StopIteration: hdr = [] flds = list(map(text_type, hdr)) keyindices = asindices(hdr, key) assert len(keyindices) > 0, 'no key selected' getkey = operator.itemgetter(*keyindices) for row in it: k = getkey(row) rec = asdict(flds, row) if k in dictionary: # work properly with shelve l = dictionary[k] l.append(rec) dictionary[k] = l else: dictionary[k] = [rec] return dictionary Table.dictlookup = dictlookup def dictlookupone(table, key, dictionary=None, strict=False): """ Load a dictionary with data from the given table, mapping to dicts, assuming there is at most one row for each key. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar'], ... ['a', 1], ... ['b', 2], ... ['b', 3]] >>> # if the specified key is not unique and strict=False (default), ... # the first value wins ... lkp = etl.dictlookupone(table1, 'foo') >>> lkp['a'] {'foo': 'a', 'bar': 1} >>> lkp['b'] {'foo': 'b', 'bar': 2} >>> # if the specified key is not unique and strict=True, will raise ... # DuplicateKeyError ... try: ... lkp = etl.dictlookupone(table1, 'foo', strict=True) ... except etl.errors.DuplicateKeyError as e: ... print(e) ... duplicate key: 'b' >>> # compound keys are supported ... table2 = [['foo', 'bar', 'baz'], ... ['a', 1, True], ... ['b', 2, False], ... ['b', 3, True], ... 
['b', 3, False]] >>> lkp = etl.dictlookupone(table2, ('foo', 'bar')) >>> lkp[('a', 1)] {'foo': 'a', 'bar': 1, 'baz': True} >>> lkp[('b', 2)] {'foo': 'b', 'bar': 2, 'baz': False} >>> lkp[('b', 3)] {'foo': 'b', 'bar': 3, 'baz': True} >>> # data can be loaded into an existing dictionary-like ... # object, including persistent dictionaries created via the ... # shelve module ... import shelve >>> lkp = shelve.open('example.dat', flag='n') >>> lkp = etl.dictlookupone(table1, 'foo', lkp) >>> lkp.close() >>> lkp = shelve.open('example.dat', flag='r') >>> lkp['a'] {'foo': 'a', 'bar': 1} >>> lkp['b'] {'foo': 'b', 'bar': 2} """ if dictionary is None: dictionary = dict() it = iter(table) try: hdr = next(it) except StopIteration: hdr = [] flds = list(map(text_type, hdr)) keyindices = asindices(hdr, key) assert len(keyindices) > 0, 'no key selected' getkey = operator.itemgetter(*keyindices) for row in it: k = getkey(row) if strict and k in dictionary: raise DuplicateKeyError(k) elif k not in dictionary: d = asdict(flds, row) dictionary[k] = d return dictionary Table.dictlookupone = dictlookupone def recordlookup(table, key, dictionary=None): """ Load a dictionary with data from the given table, mapping to record objects. """ if dictionary is None: dictionary = dict() it = iter(table) try: hdr = next(it) except StopIteration: hdr = [] flds = list(map(text_type, hdr)) keyindices = asindices(hdr, key) assert len(keyindices) > 0, 'no key selected' getkey = operator.itemgetter(*keyindices) for row in it: k = getkey(row) rec = Record(row, flds) if k in dictionary: # work properly with shelve l = dictionary[k] l.append(rec) dictionary[k] = l else: dictionary[k] = [rec] return dictionary Table.recordlookup = recordlookup def recordlookupone(table, key, dictionary=None, strict=False): """ Load a dictionary with data from the given table, mapping to record objects, assuming there is at most one row for each key. """ if dictionary is None: dictionary = dict() it = iter(table) try: hdr = next(it) except StopIteration: hdr = [] flds = list(map(text_type, hdr)) keyindices = asindices(hdr, key) assert len(keyindices) > 0, 'no key selected' getkey = operator.itemgetter(*keyindices) for row in it: k = getkey(row) if strict and k in dictionary: raise DuplicateKeyError(k) elif k not in dictionary: d = Record(row, flds) dictionary[k] = d return dictionary Table.recordlookupone = recordlookupone petl-1.7.15/petl/util/materialise.py000066400000000000000000000075761457414240700173720ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division import operator from collections import OrderedDict from itertools import islice from petl.compat import izip_longest, text_type, next from petl.util.base import asindices, Table def listoflists(tbl): return [list(row) for row in tbl] Table.listoflists = listoflists Table.lol = listoflists def tupleoftuples(tbl): return tuple(tuple(row) for row in tbl) Table.tupleoftuples = tupleoftuples Table.tot = tupleoftuples def listoftuples(tbl): return [tuple(row) for row in tbl] Table.listoftuples = listoftuples Table.lot = listoftuples def tupleoflists(tbl): return tuple(list(row) for row in tbl) Table.tupleoflists = tupleoflists Table.tol = tupleoflists def columns(table, missing=None): """ Construct a :class:`dict` mapping field names to lists of values. 
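Note that all values are loaded into memory. Each resulting column is a plain :class:`list`, so standard Python functions apply; a small sketch::

    >>> import petl as etl
    >>> cols = etl.columns([['foo', 'bar'], ['a', 1], ['b', 2], ['b', 3]])
    >>> sum(cols['bar'])
    6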
E.g.:: >>> import petl as etl >>> table = [['foo', 'bar'], ['a', 1], ['b', 2], ['b', 3]] >>> cols = etl.columns(table) >>> cols['foo'] ['a', 'b', 'b'] >>> cols['bar'] [1, 2, 3] See also :func:`petl.util.materialise.facetcolumns`. """ cols = OrderedDict() it = iter(table) try: hdr = next(it) except StopIteration: hdr = [] flds = list(map(text_type, hdr)) for f in flds: cols[f] = list() for row in it: for f, v in izip_longest(flds, row, fillvalue=missing): if f in cols: cols[f].append(v) return cols Table.columns = columns def facetcolumns(table, key, missing=None): """ Like :func:`petl.util.materialise.columns` but stratified by values of the given key field. E.g.:: >>> import petl as etl >>> table = [['foo', 'bar', 'baz'], ... ['a', 1, True], ... ['b', 2, True], ... ['b', 3]] >>> fc = etl.facetcolumns(table, 'foo') >>> fc['a'] {'foo': ['a'], 'bar': [1], 'baz': [True]} >>> fc['b'] {'foo': ['b', 'b'], 'bar': [2, 3], 'baz': [True, None]} """ fct = dict() it = iter(table) try: hdr = next(it) except StopIteration: hdr = [] flds = list(map(text_type, hdr)) indices = asindices(hdr, key) assert len(indices) > 0, 'no key field selected' getkey = operator.itemgetter(*indices) for row in it: kv = getkey(row) if kv not in fct: cols = dict() for f in flds: cols[f] = list() fct[kv] = cols else: cols = fct[kv] for f, v in izip_longest(flds, row, fillvalue=missing): if f in cols: cols[f].append(v) return fct Table.facetcolumns = facetcolumns def cache(table, n=None): """ Wrap the table with a cache that caches up to `n` rows as they are initially requested via iteration (cache all rows by default). """ return CacheView(table, n=n) Table.cache = cache class CacheView(Table): def __init__(self, inner, n=None): self.inner = inner self.n = n self.cache = list() self.cachecomplete = False def clearcache(self): self.cache = list() self.cachecomplete = False def __iter__(self): # serve whatever is in the cache first for row in self.cache: yield row if not self.cachecomplete: # serve the remainder from the inner iterator it = iter(self.inner) for row in islice(it, len(self.cache), None): # maybe there's more room in the cache? if not self.n or len(self.cache) < self.n: self.cache.append(row) yield row # does the cache contain a complete copy of the inner table? if not self.n or len(self.cache) < self.n: self.cachecomplete = True petl-1.7.15/petl/util/misc.py000066400000000000000000000064001457414240700160070ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division from petl.util.base import values, header, Table def typeset(table, field): """ Return a set containing all Python types found for values in the given field. E.g.:: >>> import petl as etl >>> table = [['foo', 'bar', 'baz'], ... ['A', 1, '2'], ... ['B', u'2', '3.4'], ... [u'B', u'3', '7.8', True], ... ['D', u'xyz', 9.0], ... ['E', 42]] >>> sorted(etl.typeset(table, 'foo')) ['str'] >>> sorted(etl.typeset(table, 'bar')) ['int', 'str'] >>> sorted(etl.typeset(table, 'baz')) ['NoneType', 'float', 'str'] The `field` argument can be a field name or index (starting from zero). """ s = set() for v in values(table, field): try: s.add(type(v).__name__) except IndexError: pass # ignore short rows return s Table.typeset = typeset def diffheaders(t1, t2): """ Return the difference between the headers of the two tables as a pair of sets. E.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar', 'baz'], ... ['a', 1, .3]] >>> table2 = [['baz', 'bar', 'quux'], ...
['a', 1, .3]] >>> add, sub = etl.diffheaders(table1, table2) >>> add {'quux'} >>> sub {'foo'} """ t1h = set(header(t1)) t2h = set(header(t2)) return t2h - t1h, t1h - t2h Table.diffheaders = diffheaders def diffvalues(t1, t2, f): """ Return the difference between the values under the given field in the two tables, e.g.:: >>> import petl as etl >>> table1 = [['foo', 'bar'], ... ['a', 1], ... ['b', 3]] >>> table2 = [['bar', 'foo'], ... [1, 'a'], ... [3, 'c']] >>> add, sub = etl.diffvalues(table1, table2, 'foo') >>> add {'c'} >>> sub {'b'} """ t1v = set(values(t1, f)) t2v = set(values(t2, f)) return t2v - t1v, t1v - t2v Table.diffvalues = diffvalues def strjoin(s): """ Return a function to join sequences using `s` as the separator. Intended for use with :func:`petl.transform.conversions.convert`. """ return lambda l: s.join(map(str, l)) def nthword(n, sep=None): """ Construct a function to return the nth word in a string. E.g.:: >>> import petl as etl >>> s = 'foo bar' >>> f = etl.nthword(0) >>> f(s) 'foo' >>> g = etl.nthword(1) >>> g(s) 'bar' Intended for use with :func:`petl.transform.conversions.convert`. """ return lambda s: s.split(sep)[n] def coalesce(*fields, **kwargs): """ Return a function which accepts a row and returns the first non-missing value from the specified fields. Intended for use with :func:`petl.transform.basics.addfield`. """ missing = kwargs.get('missing', None) default = kwargs.get('default', None) def _coalesce(row): for f in fields: v = row[f] if v is not missing: return v return default return _coalesce petl-1.7.15/petl/util/parsers.py000066400000000000000000000126051457414240700165370ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division import datetime from petl.compat import long def datetimeparser(fmt, strict=True): """Return a function to parse strings as :class:`datetime.datetime` objects using a given format. E.g.:: >>> from petl import datetimeparser >>> isodatetime = datetimeparser('%Y-%m-%dT%H:%M:%S') >>> isodatetime('2002-12-25T00:00:00') datetime.datetime(2002, 12, 25, 0, 0) >>> try: ... isodatetime('2002-12-25T00:00:99') ... except ValueError as e: ... print(e) ... unconverted data remains: 9 If ``strict=False`` then if an error occurs when parsing, the original value will be returned as-is, and no error will be raised. """ def parser(value): try: return datetime.datetime.strptime(value.strip(), fmt) except Exception as e: if strict: raise e else: return value return parser def dateparser(fmt, strict=True): """Return a function to parse strings as :class:`datetime.date` objects using a given format. E.g.:: >>> from petl import dateparser >>> isodate = dateparser('%Y-%m-%d') >>> isodate('2002-12-25') datetime.date(2002, 12, 25) >>> try: ... isodate('2002-02-30') ... except ValueError as e: ... print(e) ... day is out of range for month If ``strict=False`` then if an error occurs when parsing, the original value will be returned as-is, and no error will be raised. """ def parser(value): try: return datetime.datetime.strptime(value.strip(), fmt).date() except Exception as e: if strict: raise e else: return value return parser def timeparser(fmt, strict=True): """Return a function to parse strings as :class:`datetime.time` objects using a given format. E.g.:: >>> from petl import timeparser >>> isotime = timeparser('%H:%M:%S') >>> isotime('00:00:00') datetime.time(0, 0) >>> isotime('13:00:00') datetime.time(13, 0) >>> try: ... isotime('12:00:99') ... except ValueError as e: ... print(e) ... 
unconverted data remains: 9 >>> try: ... isotime('25:00:00') ... except ValueError as e: ... print(e) ... time data '25:00:00' does not match format '%H:%M:%S' If ``strict=False`` then if an error occurs when parsing, the original value will be returned as-is, and no error will be raised. """ def parser(value): try: return datetime.datetime.strptime(value.strip(), fmt).time() except Exception as e: if strict: raise e else: return value return parser def boolparser(true_strings=('true', 't', 'yes', 'y', '1'), false_strings=('false', 'f', 'no', 'n', '0'), case_sensitive=False, strict=True): """Return a function to parse strings as :class:`bool` objects using a given set of string representations for `True` and `False`. E.g.:: >>> from petl import boolparser >>> mybool = boolparser(true_strings=['yes', 'y'], false_strings=['no', 'n']) >>> mybool('y') True >>> mybool('yes') True >>> mybool('Y') True >>> mybool('No') False >>> try: ... mybool('foo') ... except ValueError as e: ... print(e) ... value is not one of recognised boolean strings: 'foo' >>> try: ... mybool('True') ... except ValueError as e: ... print(e) ... value is not one of recognised boolean strings: 'true' If ``strict=False`` then if an error occurs when parsing, the original value will be returned as-is, and no error will be raised. """ if not case_sensitive: true_strings = [s.lower() for s in true_strings] false_strings = [s.lower() for s in false_strings] def parser(value): value = value.strip() if not case_sensitive: value = value.lower() if value in true_strings: return True elif value in false_strings: return False elif strict: raise ValueError('value is not one of recognised boolean strings: ' '%r' % value) else: return value return parser def numparser(strict=False): """Return a function that will attempt to parse the value as a number, trying :func:`int`, :func:`long`, :func:`float` and :func:`complex` in that order. If all fail, return the value as-is, unless ``strict=True``, in which case raise the underlying exception. """ def f(v): try: return int(v) except (ValueError, TypeError): pass try: return long(v) except (ValueError, TypeError): pass try: return float(v) except (ValueError, TypeError): pass try: return complex(v) except (ValueError, TypeError) as e: if strict: raise e return v return f petl-1.7.15/petl/util/random.py000066400000000000000000000157201457414240700163410ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division import hashlib import random as pyrandom import time from collections import OrderedDict from functools import partial from petl.compat import xrange, text_type from petl.util.base import Table def randomseed(): """ Obtain the hex digest of a sha256 hash of the current epoch time in nanoseconds. """ time_ns = str(time.time()).encode() hash_time = hashlib.sha256(time_ns).hexdigest() return hash_time def randomtable(numflds=5, numrows=100, wait=0, seed=None): """ Construct a table with random numerical data. Use `numflds` and `numrows` to specify the number of fields and rows respectively. Set `wait` to a float greater than zero to simulate a delay on each row generation (number of seconds per row). 
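Supplying the same `seed` makes the generated data repeatable across calls; a quick sketch (the field and row counts here are arbitrary)::

    >>> import petl as etl
    >>> t1 = etl.randomtable(3, 2, seed=42)
    >>> t2 = etl.randomtable(3, 2, seed=42)
    >>> t1.lot() == t2.lot()
    True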
E.g.:: >>> import petl as etl >>> table = etl.randomtable(3, 100, seed=42) >>> table +----------------------+----------------------+---------------------+ | f0 | f1 | f2 | +======================+======================+=====================+ | 0.6394267984578837 | 0.025010755222666936 | 0.27502931836911926 | +----------------------+----------------------+---------------------+ | 0.22321073814882275 | 0.7364712141640124 | 0.6766994874229113 | +----------------------+----------------------+---------------------+ | 0.8921795677048454 | 0.08693883262941615 | 0.4219218196852704 | +----------------------+----------------------+---------------------+ | 0.029797219438070344 | 0.21863797480360336 | 0.5053552881033624 | +----------------------+----------------------+---------------------+ | 0.026535969683863625 | 0.1988376506866485 | 0.6498844377795232 | +----------------------+----------------------+---------------------+ ... Note that the data are generated on the fly and are not stored in memory, so this function can be used to simulate very large tables. The only supported seed types are: None, int, float, str, bytes, and bytearray. """ return RandomTable(numflds, numrows, wait=wait, seed=seed) class RandomTable(Table): def __init__(self, numflds=5, numrows=100, wait=0, seed=None): self.numflds = numflds self.numrows = numrows self.wait = wait if seed is None: self.seed = randomseed() else: self.seed = seed def __iter__(self): nf = self.numflds nr = self.numrows seed = self.seed # N.B., we want this to be stable, i.e., same data each time pyrandom.seed(seed) # construct fields flds = ["f%s" % n for n in range(nf)] yield tuple(flds) # construct data rows for _ in xrange(nr): # artificial delay if self.wait: time.sleep(self.wait) yield tuple(pyrandom.random() for n in range(nf)) def reseed(self): self.seed = randomseed() def dummytable( numrows=100, fields=( ('foo', partial(pyrandom.randint, 0, 100)), ('bar', partial(pyrandom.choice, ('apples', 'pears', 'bananas', 'oranges'))), ('baz', pyrandom.random), ), wait=0, seed=None, ): """ Construct a table with dummy data. Use `numrows` to specify the number of rows. Set `wait` to a float greater than zero to simulate a delay on each row generation (number of seconds per row). E.g.:: >>> import petl as etl >>> table1 = etl.dummytable(100, seed=42) >>> table1 +-----+----------+----------------------+ | foo | bar | baz | +=====+==========+======================+ | 81 | 'apples' | 0.025010755222666936 | +-----+----------+----------------------+ | 35 | 'pears' | 0.22321073814882275 | +-----+----------+----------------------+ | 94 | 'apples' | 0.6766994874229113 | +-----+----------+----------------------+ | 69 | 'apples' | 0.5904925124490397 | +-----+----------+----------------------+ | 4 | 'apples' | 0.09369523986159245 | +-----+----------+----------------------+ ... >>> import random as pyrandom >>> from functools import partial >>> fields = [('foo', pyrandom.random), ... ('bar', partial(pyrandom.randint, 0, 500)), ... 
('baz', partial(pyrandom.choice, ['chocolate', 'strawberry', 'vanilla']))] >>> table2 = etl.dummytable(100, fields=fields, seed=42) >>> table2 +---------------------+-----+-------------+ | foo | bar | baz | +=====================+=====+=============+ | 0.6394267984578837 | 12 | 'vanilla' | +---------------------+-----+-------------+ | 0.27502931836911926 | 114 | 'chocolate' | +---------------------+-----+-------------+ | 0.7364712141640124 | 346 | 'vanilla' | +---------------------+-----+-------------+ | 0.8921795677048454 | 44 | 'vanilla' | +---------------------+-----+-------------+ | 0.4219218196852704 | 15 | 'chocolate' | +---------------------+-----+-------------+ ... >>> table3_1 = etl.dummytable(50) >>> table3_2 = etl.dummytable(100) >>> table3_1[5] == table3_2[5] False Data generation functions can be specified via the `fields` keyword argument. Note that the data are generated on the fly and are not stored in memory, so this function can be used to simulate very large tables. The only supported seed types are: None, int, float, str, bytes, and bytearray. """ return DummyTable(numrows=numrows, fields=fields, wait=wait, seed=seed) class DummyTable(Table): def __init__(self, numrows=100, fields=None, wait=0, seed=None): self.numrows = numrows self.wait = wait if fields is None: self.fields = OrderedDict() else: self.fields = OrderedDict(fields) if seed is None: self.seed = randomseed() else: self.seed = seed def __setitem__(self, item, value): self.fields[text_type(item)] = value def __iter__(self): nr = self.numrows seed = self.seed fields = self.fields.copy() # N.B., we want this to be stable, i.e., same data each time pyrandom.seed(seed) # construct header row hdr = tuple(text_type(f) for f in fields.keys()) yield hdr # construct data rows for _ in xrange(nr): # artificial delay if self.wait: time.sleep(self.wait) yield tuple(fields[f]() for f in fields) def reseed(self): self.seed = randomseed() petl-1.7.15/petl/util/statistics.py000066400000000000000000000047231457414240700172540ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division from collections import namedtuple from petl.util.base import values, Table def limits(table, field): """ Find minimum and maximum values under the given field. E.g.:: >>> import petl as etl >>> table = [['foo', 'bar'], ['a', 1], ['b', 2], ['b', 3]] >>> minv, maxv = etl.limits(table, 'bar') >>> minv 1 >>> maxv 3 The `field` argument can be a field name or index (starting from zero). """ vals = iter(values(table, field)) try: minv = maxv = next(vals) except StopIteration: return None, None else: for v in vals: if v < minv: minv = v if v > maxv: maxv = v return minv, maxv Table.limits = limits _stats = namedtuple('stats', ('count', 'errors', 'sum', 'min', 'max', 'mean', 'pvariance', 'pstdev')) def stats(table, field): """ Calculate basic descriptive statistics on a given field. E.g.:: >>> import petl as etl >>> table = [['foo', 'bar', 'baz'], ... ['A', 1, 2], ... ['B', '2', '3.4'], ... [u'B', u'3', u'7.8', True], ... ['D', 'xyz', 9.0], ... ['E', None]] >>> etl.stats(table, 'bar') stats(count=3, errors=2, sum=6.0, min=1.0, max=3.0, mean=2.0, pvariance=0.6666666666666666, pstdev=0.816496580927726) The `field` argument can be a field name or index (starting from zero). 
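The return value is a :class:`collections.namedtuple`, so individual statistics can be read off by attribute, e.g.::

    >>> s = etl.stats(table, 'bar')
    >>> s.mean, s.errors
    (2.0, 2)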
""" _min = None _max = None _sum = 0 _mean = 0 _var = 0 _count = 0 _errors = 0 for v in values(table, field): try: v = float(v) except (ValueError, TypeError): _errors += 1 else: _count += 1 if _min is None or v < _min: _min = v if _max is None or v > _max: _max = v _sum += v _mean, _var = onlinestats(v, _count, mean=_mean, variance=_var) _std = _var**.5 return _stats(_count, _errors, _sum, _min, _max, _mean, _var, _std) Table.stats = stats def onlinestats(xi, n, mean=0, variance=0): # function to calculate online mean and variance meanprv = mean varianceprv = variance mean = (((n - 1)*meanprv) + xi)/n variance = (((n - 1)*varianceprv) + ((xi - meanprv)*(xi - mean)))/n return mean, variance petl-1.7.15/petl/util/timing.py000066400000000000000000000227141457414240700163510ustar00rootroot00000000000000from __future__ import absolute_import, print_function, division import abc import logging import sys import time from petl.compat import PY3 from petl.util.base import Table from petl.util.statistics import onlinestats def progress(table, batchsize=1000, prefix="", out=None): """ Report progress on rows passing through to a file or file-like object (defaults to sys.stderr). E.g.:: >>> import petl as etl >>> table = etl.dummytable(100000) >>> table.progress(10000).tocsv('example.csv') # doctest: +SKIP 10000 rows in 0.13s (78363 row/s); batch in 0.13s (78363 row/s) 20000 rows in 0.22s (91679 row/s); batch in 0.09s (110448 row/s) 30000 rows in 0.31s (96573 row/s); batch in 0.09s (108114 row/s) 40000 rows in 0.40s (99535 row/s); batch in 0.09s (109625 row/s) 50000 rows in 0.49s (101396 row/s); batch in 0.09s (109591 row/s) 60000 rows in 0.59s (102245 row/s); batch in 0.09s (106709 row/s) 70000 rows in 0.68s (103221 row/s); batch in 0.09s (109498 row/s) 80000 rows in 0.77s (103810 row/s); batch in 0.09s (108126 row/s) 90000 rows in 0.90s (99465 row/s); batch in 0.13s (74516 row/s) 100000 rows in 1.02s (98409 row/s); batch in 0.11s (89821 row/s) 100000 rows in 1.02s (98402 row/s); batches in 0.10 +/- 0.02s [0.09-0.13] (100481 +/- 13340 rows/s [74516-110448]) See also :func:`petl.util.timing.clock`. """ return ProgressView(table, batchsize, prefix, out) def log_progress(table, batchsize=1000, prefix="", logger=None, level=logging.INFO): """ Report progress on rows passing through to a python logger. If logger is none, a new logger will be created that, by default, streams to stdout. E.g.:: >>> import petl as etl >>> table = etl.dummytable(100000) >>> table.log_progress(10000).tocsv('example.csv') # doctest: +SKIP 10000 rows in 0.13s (78363 row/s); batch in 0.13s (78363 row/s) 20000 rows in 0.22s (91679 row/s); batch in 0.09s (110448 row/s) 30000 rows in 0.31s (96573 row/s); batch in 0.09s (108114 row/s) 40000 rows in 0.40s (99535 row/s); batch in 0.09s (109625 row/s) 50000 rows in 0.49s (101396 row/s); batch in 0.09s (109591 row/s) 60000 rows in 0.59s (102245 row/s); batch in 0.09s (106709 row/s) 70000 rows in 0.68s (103221 row/s); batch in 0.09s (109498 row/s) 80000 rows in 0.77s (103810 row/s); batch in 0.09s (108126 row/s) 90000 rows in 0.90s (99465 row/s); batch in 0.13s (74516 row/s) 100000 rows in 1.02s (98409 row/s); batch in 0.11s (89821 row/s) 100000 rows in 1.02s (98402 row/s); batches in 0.10 +/- 0.02s [0.09-0.13] (100481 +/- 13340 rows/s [74516-110448]) See also :func:`petl.util.timing.clock`. 
""" return LoggingProgressView(table, batchsize, prefix, logger, level=level) Table.progress = progress Table.log_progress = log_progress class ProgressViewBase(Table): """ Abstract base class for reporting on proecessing status """ def __init__(self, inner, batchsize, prefix): self.inner = inner self.batchsize = batchsize self.prefix = prefix @abc.abstractmethod def print_message(self, message): pass def __iter__(self): start = time.time() batchstart = start batchn = 0 batchtimemin, batchtimemax = None, None batchtimemean, batchtimevar = 0, 0 batchratemean, batchratevar = 0, 0 for n, r in enumerate(self.inner): if n % self.batchsize == 0 and n > 0: batchn += 1 batchend = time.time() batchtime = batchend - batchstart if batchtimemin is None or batchtime < batchtimemin: batchtimemin = batchtime if batchtimemax is None or batchtime > batchtimemax: batchtimemax = batchtime elapsedtime = batchend - start try: rate = int(n / elapsedtime) except ZeroDivisionError: rate = 0 try: batchrate = int(self.batchsize / batchtime) except ZeroDivisionError: batchrate = 0 v = (n, elapsedtime, rate, batchtime, batchrate) message = self.prefix + \ '%s rows in %.2fs (%s row/s); ' \ 'batch in %.2fs (%s row/s)' % v self.print_message(message) batchstart = batchend batchtimemean, batchtimevar = \ onlinestats(batchtime, batchn, mean=batchtimemean, variance=batchtimevar) batchratemean, batchratevar = \ onlinestats(batchrate, batchn, mean=batchratemean, variance=batchratevar) yield r # compute total elapsed time and rate end = time.time() elapsedtime = end - start try: rate = int(n / elapsedtime) except ZeroDivisionError: rate = 0 # construct the final message if batchn > 1: if batchtimemin is None: batchtimemin = 0 if batchtimemax is None: batchtimemax = 0 try: batchratemin = int(self.batchsize / batchtimemax) except ZeroDivisionError: batchratemin = 0 try: batchratemax = int(self.batchsize / batchtimemin) except ZeroDivisionError: batchratemax = 0 v = (n, elapsedtime, rate, batchtimemean, batchtimevar**.5, batchtimemin, batchtimemax, int(batchratemean), int(batchratevar**.5), int(batchratemin), int(batchratemax)) message = self.prefix + '%s rows in %.2fs (%s row/s); batches in ' \ '%.2f +/- %.2fs [%.2f-%.2f] ' \ '(%s +/- %s rows/s [%s-%s])' % v else: v = (n, elapsedtime, rate) message = self.prefix + '%s rows in %.2fs (%s row/s)' % v self.print_message(message) class ProgressView(ProgressViewBase): """ Reports progress to a file_object like sys.stdout or a file handler """ def __init__(self, inner, batchsize, prefix, out): if out is None: self.file_object = sys.stderr else: self.file_object = out super(ProgressView, self).__init__(inner, batchsize, prefix) def print_message(self, message): print(message, file=self.file_object) if hasattr(self.file_object, 'flush'): self.file_object.flush() class LoggingProgressView(ProgressViewBase): """ Reports progress to a logger, log handler, or log adapter """ def __init__(self, inner, batchsize, prefix, logger, level=logging.INFO): if logger is None: self.logger = logging.getLogger(__name__) self.logger.setLevel(level) else: self.logger = logger self.level = level super(LoggingProgressView, self).__init__(inner, batchsize, prefix) def print_message(self, message): self.logger.log(self.level, message) def clock(table): """ Time how long is spent retrieving rows from the wrapped container. Enables diagnosis of which steps in a pipeline are taking the most time. 
    E.g.::

        >>> import petl as etl
        >>> t1 = etl.dummytable(100000)
        >>> c1 = etl.clock(t1)
        >>> t2 = etl.convert(c1, 'foo', lambda v: v**2)
        >>> c2 = etl.clock(t2)
        >>> p = etl.progress(c2, 10000)
        >>> etl.tocsv(p, 'example.csv')  # doctest: +SKIP
        10000 rows in 0.23s (44036 row/s); batch in 0.23s (44036 row/s)
        20000 rows in 0.38s (52167 row/s); batch in 0.16s (63979 row/s)
        30000 rows in 0.54s (55749 row/s); batch in 0.15s (64624 row/s)
        40000 rows in 0.69s (57765 row/s); batch in 0.15s (64793 row/s)
        50000 rows in 0.85s (59031 row/s); batch in 0.15s (64707 row/s)
        60000 rows in 1.00s (59927 row/s); batch in 0.15s (64847 row/s)
        70000 rows in 1.16s (60483 row/s); batch in 0.16s (64051 row/s)
        80000 rows in 1.31s (61008 row/s); batch in 0.15s (64953 row/s)
        90000 rows in 1.47s (61356 row/s); batch in 0.16s (64285 row/s)
        100000 rows in 1.62s (61703 row/s); batch in 0.15s (65012 row/s)
        100000 rows in 1.62s (61700 row/s); batches in 0.16 +/- 0.02s [0.15-0.23] (62528 +/- 6173 rows/s [44036-65012])
        >>> # time consumed retrieving rows from t1
        ... c1.time  # doctest: +SKIP
        0.7243089999999492
        >>> # time consumed retrieving rows from t2
        ... c2.time  # doctest: +SKIP
        1.1704209999999766
        >>> # actual time consumed by the convert step
        ... c2.time - c1.time  # doctest: +SKIP
        0.4461120000000274

    See also :func:`petl.util.timing.progress`.

    """
    return ClockView(table)


Table.clock = clock


class ClockView(Table):

    def __init__(self, wrapped):
        self.wrapped = wrapped

    def __iter__(self):
        self.time = 0
        it = iter(self.wrapped)
        while True:
            before = time.perf_counter() if PY3 else time.clock()
            try:
                row = next(it)
            except StopIteration:
                return
            after = time.perf_counter() if PY3 else time.clock()
            self.time += (after - before)
            yield row
petl-1.7.15/petl/util/vis.py000066400000000000000000000405741457414240700156660ustar00rootroot00000000000000
from __future__ import absolute_import, print_function, division


import locale
from itertools import islice
from collections import defaultdict


from petl.compat import numeric_types, text_type
from petl import config
from petl.util.base import Table
from petl.io.sources import MemorySource
from petl.io.html import tohtml


def look(table, limit=0, vrepr=None, index_header=None, style=None,
         truncate=None, width=None):
    """
    Format a portion of the table as text for inspection in an interactive
    session. E.g.::

        >>> import petl as etl
        >>> table1 = [['foo', 'bar'],
        ...           ['a', 1],
        ...           ['b', 2]]
        >>> etl.look(table1)
        +-----+-----+
        | foo | bar |
        +=====+=====+
        | 'a' |   1 |
        +-----+-----+
        | 'b' |   2 |
        +-----+-----+

        >>> # alternative formatting styles
        ... etl.look(table1, style='simple')
        ===  ===
        foo  bar
        ===  ===
        'a'    1
        'b'    2
        ===  ===

        >>> etl.look(table1, style='minimal')
        foo  bar
        'a'    1
        'b'    2

        >>> # any irregularities in the length of header and/or data
        ... # rows will appear as blank cells
        ... table2 = [['foo', 'bar'],
        ...           ['a'],
        ...           ['b', 2, True]]
        >>> etl.look(table2)
        +-----+-----+------+
        | foo | bar |      |
        +=====+=====+======+
        | 'a' |     |      |
        +-----+-----+------+
        | 'b' |   2 | True |
        +-----+-----+------+

    Three alternative presentation styles are available: 'grid', 'simple'
    and 'minimal', where 'grid' is the default. A different style can be
    specified using the `style` keyword argument. The default style can also
    be changed by setting ``petl.config.look_style``.
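
    For example, a minimal sketch of changing the default for an interactive
    session (here 'minimal' is just an illustrative choice)::

        >>> import petl.config  # doctest: +SKIP
        >>> petl.config.look_style = 'minimal'  # doctest: +SKIP
        >>> etl.look(table1)  # doctest: +SKIP
        foo  bar
        'a'    1
        'b'    2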
""" # determine defaults if limit == 0: limit = config.look_limit if vrepr is None: vrepr = config.look_vrepr if index_header is None: index_header = config.look_index_header if style is None: style = config.look_style if width is None: width = config.look_width return Look(table, limit=limit, vrepr=vrepr, index_header=index_header, style=style, truncate=truncate, width=width) Table.look = look class Look(object): def __init__(self, table, limit, vrepr, index_header, style, truncate, width): self.table = table self.limit = limit self.vrepr = vrepr self.index_header = index_header self.style = style self.truncate = truncate self.width = width def __repr__(self): # determine if table overflows limit table, overflow = _vis_overflow(self.table, self.limit) # construct output style = self.style vrepr = self.vrepr index_header = self.index_header truncate = self.truncate width = self.width if style == 'simple': output = _look_simple(table, vrepr=vrepr, index_header=index_header, truncate=truncate, width=width) elif style == 'minimal': output = _look_minimal(table, vrepr=vrepr, index_header=index_header, truncate=truncate, width=width) else: output = _look_grid(table, vrepr=vrepr, index_header=index_header, truncate=truncate, width=width) # add overflow indicator if overflow: output += '...\n' return output __str__ = __repr__ __unicode__ = __repr__ def _table_repr(table): return str(look(table)) Table.__repr__ = _table_repr def lookall(table, **kwargs): """ Format the entire table as text for inspection in an interactive session. N.B., this will load the entire table into memory. See also :func:`petl.util.vis.look` and :func:`petl.util.vis.see`. """ kwargs['limit'] = None return look(table, **kwargs) def lookstr(table, limit=0, **kwargs): """Like :func:`petl.util.vis.look` but use str() rather than repr() for data values. """ kwargs['vrepr'] = str return look(table, limit=limit, **kwargs) Table.lookstr = lookstr def _table_str(table): return str(lookstr(table)) Table.__str__ = _table_str Table.__unicode__ = _table_str def lookallstr(table, **kwargs): """ Like :func:`petl.util.vis.lookall` but use str() rather than repr() for data values. 
""" kwargs['vrepr'] = str return lookall(table, **kwargs) Table.lookallstr = lookallstr Table.lookall = lookall def _look_grid(table, vrepr, index_header, truncate, width): it = iter(table) # fields representation try: hdr = next(it) except StopIteration: return '' flds = list(map(text_type, hdr)) if index_header: fldsrepr = ['%s|%s' % (i, r) for (i, r) in enumerate(flds)] else: fldsrepr = flds # rows representations rows = list(it) rowsrepr = [[vrepr(v) for v in row] for row in rows] # find maximum row length - may be uneven rowlens = [len(hdr)] rowlens.extend([len(row) for row in rows]) maxrowlen = max(rowlens) # pad short fields and rows if len(hdr) < maxrowlen: fldsrepr.extend([''] * (maxrowlen - len(hdr))) for valsrepr in rowsrepr: if len(valsrepr) < maxrowlen: valsrepr.extend([''] * (maxrowlen - len(valsrepr))) # truncate if truncate: fldsrepr = [x[:truncate] for x in fldsrepr] rowsrepr = [[x[:truncate] for x in valsrepr] for valsrepr in rowsrepr] # find longest representations so we know how wide to make cells colwidths = [0] * maxrowlen # initialise to 0 for i, fr in enumerate(fldsrepr): colwidths[i] = len(fr) for valsrepr in rowsrepr: for i, vr in enumerate(valsrepr): if len(vr) > colwidths[i]: colwidths[i] = len(vr) # construct a line separator sep = '+' for w in colwidths: sep += '-' * (w + 2) sep += '+' if width: sep = sep[:width] sep += '\n' # construct a header separator hedsep = '+' for w in colwidths: hedsep += '=' * (w + 2) hedsep += '+' if width: hedsep = hedsep[:width] hedsep += '\n' # construct a line for the header row fldsline = '|' for i, w in enumerate(colwidths): f = fldsrepr[i] fldsline += ' ' + f fldsline += ' ' * (w - len(f)) # padding fldsline += ' |' if width: fldsline = fldsline[:width] fldsline += '\n' # construct a line for each data row rowlines = list() for vals, valsrepr in zip(rows, rowsrepr): rowline = '|' for i, w in enumerate(colwidths): vr = valsrepr[i] if i < len(vals) and isinstance(vals[i], numeric_types) \ and not isinstance(vals[i], bool): # left pad numbers rowline += ' ' * (w + 1 - len(vr)) # padding rowline += vr + ' |' else: # right pad everything else rowline += ' ' + vr rowline += ' ' * (w - len(vr)) # padding rowline += ' |' if width: rowline = rowline[:width] rowline += '\n' rowlines.append(rowline) # put it all together output = sep + fldsline + hedsep for line in rowlines: output += line + sep return output def _look_simple(table, vrepr, index_header, truncate, width): it = iter(table) # fields representation try: hdr = next(it) except StopIteration: return '' flds = list(map(text_type, hdr)) if index_header: fldsrepr = ['%s|%s' % (i, r) for (i, r) in enumerate(flds)] else: fldsrepr = flds # rows representations rows = list(it) rowsrepr = [[vrepr(v) for v in row] for row in rows] # find maximum row length - may be uneven rowlens = [len(hdr)] rowlens.extend([len(row) for row in rows]) maxrowlen = max(rowlens) # pad short fields and rows if len(hdr) < maxrowlen: fldsrepr.extend([''] * (maxrowlen - len(hdr))) for valsrepr in rowsrepr: if len(valsrepr) < maxrowlen: valsrepr.extend([''] * (maxrowlen - len(valsrepr))) # truncate if truncate: fldsrepr = [x[:truncate] for x in fldsrepr] rowsrepr = [[x[:truncate] for x in valsrepr] for valsrepr in rowsrepr] # find longest representations so we know how wide to make cells colwidths = [0] * maxrowlen # initialise to 0 for i, fr in enumerate(fldsrepr): colwidths[i] = len(fr) for valsrepr in rowsrepr: for i, vr in enumerate(valsrepr): if len(vr) > colwidths[i]: colwidths[i] = len(vr) # construct a 
header separator hedsep = ' '.join('=' * w for w in colwidths) if width: hedsep = hedsep[:width] hedsep += '\n' # construct a line for the header row fldsline = ' '.join(f.ljust(w) for f, w in zip(fldsrepr, colwidths)) if width: fldsline = fldsline[:width] fldsline += '\n' # construct a line for each data row rowlines = list() for vals, valsrepr in zip(rows, rowsrepr): rowline = '' for i, w in enumerate(colwidths): vr = valsrepr[i] if i < len(vals) and isinstance(vals[i], numeric_types) \ and not isinstance(vals[i], bool): # left pad numbers rowline += vr.rjust(w) else: # right pad everything else rowline += vr.ljust(w) if i < len(colwidths) - 1: rowline += ' ' if width: rowline = rowline[:width] rowline += '\n' rowlines.append(rowline) # put it all together output = hedsep + fldsline + hedsep for line in rowlines: output += line output += hedsep return output def _look_minimal(table, vrepr, index_header, truncate, width): it = iter(table) # fields representation try: hdr = next(it) except StopIteration: return '' flds = list(map(text_type, hdr)) if index_header: fldsrepr = ['%s|%s' % (i, r) for (i, r) in enumerate(flds)] else: fldsrepr = flds # rows representations rows = list(it) rowsrepr = [[vrepr(v) for v in row] for row in rows] # find maximum row length - may be uneven rowlens = [len(hdr)] rowlens.extend([len(row) for row in rows]) maxrowlen = max(rowlens) # pad short fields and rows if len(hdr) < maxrowlen: fldsrepr.extend([''] * (maxrowlen - len(hdr))) for valsrepr in rowsrepr: if len(valsrepr) < maxrowlen: valsrepr.extend([''] * (maxrowlen - len(valsrepr))) # truncate if truncate: fldsrepr = [x[:truncate] for x in fldsrepr] rowsrepr = [[x[:truncate] for x in valsrepr] for valsrepr in rowsrepr] # find longest representations so we know how wide to make cells colwidths = [0] * maxrowlen # initialise to 0 for i, fr in enumerate(fldsrepr): colwidths[i] = len(fr) for valsrepr in rowsrepr: for i, vr in enumerate(valsrepr): if len(vr) > colwidths[i]: colwidths[i] = len(vr) # construct a line for the header row fldsline = ' '.join(f.ljust(w) for f, w in zip(fldsrepr, colwidths)) if width: fldsline = fldsline[:width] fldsline += '\n' # construct a line for each data row rowlines = list() for vals, valsrepr in zip(rows, rowsrepr): rowline = '' for i, w in enumerate(colwidths): vr = valsrepr[i] if i < len(vals) and isinstance(vals[i], numeric_types) \ and not isinstance(vals[i], bool): # left pad numbers rowline += vr.rjust(w) else: # right pad everything else rowline += vr.ljust(w) if i < len(colwidths) - 1: rowline += ' ' if width: rowline = rowline[:width] rowline += '\n' rowlines.append(rowline) # put it all together output = fldsline for line in rowlines: output += line return output def see(table, limit=0, vrepr=None, index_header=None): """ Format a portion of a table as text in a column-oriented layout for inspection in an interactive session. E.g.:: >>> import petl as etl >>> table = [['foo', 'bar'], ['a', 1], ['b', 2]] >>> etl.see(table) foo: 'a', 'b' bar: 1, 2 Useful for tables with a larger number of fields. 
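
    The number of rows consulted is controlled by the `limit` keyword
    argument (default taken from ``petl.config.see_limit``); a minimal
    sketch of overriding it per call, where the trailing '...' indicates
    further values were not shown::

        >>> etl.see(table, limit=1)  # doctest: +SKIP
        foo: 'a'...
        bar: 1...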
""" # determine defaults if limit == 0: limit = config.see_limit if vrepr is None: vrepr = config.see_vrepr if index_header is None: index_header = config.see_index_header return See(table, limit=limit, vrepr=vrepr, index_header=index_header) class See(object): def __init__(self, table, limit, vrepr, index_header): self.table = table self.limit = limit self.vrepr = vrepr self.index_header = index_header def __repr__(self): # determine if table overflows limit table, overflow = _vis_overflow(self.table, self.limit) vrepr = self.vrepr index_header = self.index_header # construct output output = '' it = iter(table) try: flds = next(it) except StopIteration: return '' cols = defaultdict(list) for row in it: for i, f in enumerate(flds): try: cols[str(i)].append(vrepr(row[i])) except IndexError: cols[str(f)].append('') for i, f in enumerate(flds): if index_header: f = '%s|%s' % (i, f) output += '%s: %s' % (f, ', '.join(cols[str(i)])) if overflow: output += '...\n' else: output += '\n' return output __str__ = __repr__ __unicode__ = __repr__ Table.see = see def _vis_overflow(table, limit): overflow = False if limit: # try reading one more than the limit, to see if there are more rows table = list(islice(table, 0, limit+2)) if len(table) > limit+1: overflow = True table = table[:-1] return table, overflow def _display_html(table, limit=0, vrepr=None, index_header=None, caption=None, tr_style=None, td_styles=None, encoding=None, truncate=None, epilogue=None): # determine defaults if limit == 0: limit = config.display_limit if vrepr is None: vrepr = config.display_vrepr if index_header is None: index_header = config.display_index_header if encoding is None: encoding = locale.getpreferredencoding() table, overflow = _vis_overflow(table, limit) buf = MemorySource() tohtml(table, buf, encoding=encoding, index_header=index_header, vrepr=vrepr, caption=caption, tr_style=tr_style, td_styles=td_styles, truncate=truncate) output = text_type(buf.getvalue(), encoding) if epilogue: output += '
<p>%s</p>' % epilogue elif overflow: output += '<p><strong>...</strong></p>
' return output Table._repr_html_ = _display_html def display(table, limit=0, vrepr=None, index_header=None, caption=None, tr_style=None, td_styles=None, encoding=None, truncate=None, epilogue=None): """ Display a table inline within an IPython notebook. """ from IPython.core.display import display_html html = _display_html(table, limit=limit, vrepr=vrepr, index_header=index_header, caption=caption, tr_style=tr_style, td_styles=td_styles, encoding=encoding, truncate=truncate, epilogue=epilogue) display_html(html, raw=True) Table.display = display def displayall(table, **kwargs): """ Display **all rows** from a table inline within an IPython notebook (use with caution, big tables will kill your browser). """ kwargs['limit'] = None display(table, **kwargs) Table.displayall = displayall petl-1.7.15/pyproject.toml000066400000000000000000000002641457414240700154770ustar00rootroot00000000000000[build-system] requires = ["setuptools", "setuptools-scm", "wheel"] [tool.bandit] exclude_dirs = ["bin", "docs"] [tool.bandit.assert_used] skips = ["*/*_test.py", "*/test_*.py"] petl-1.7.15/pytest.ini000066400000000000000000000002171457414240700146120ustar00rootroot00000000000000[pytest] log_level=DEBUG doctest_optionflags = NORMALIZE_WHITESPACE ALLOW_UNICODE addopts = --ignore-glob=*_py2.py --ignore-glob=petl/io/db.py petl-1.7.15/repr_html.ipynb000066400000000000000000000526621457414240700156330ustar00rootroot00000000000000{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "sys.version_info(major=3, minor=4, micro=3, releaselevel='final', serial=0)" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import sys\n", "sys.version_info" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'1.1.0.dev0'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import petl as etl\n", "etl.__version__" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
foobarbaz
81apples0.025010755222666936
35pears0.22321073814882275
94apples0.6766994874229113
69apples0.5904925124490397
4apples0.09369523986159245
\n", "

...

" ], "text/plain": [ "+-----+----------+----------------------+\n", "| foo | bar | baz |\n", "+=====+==========+======================+\n", "| 81 | 'apples' | 0.025010755222666936 |\n", "+-----+----------+----------------------+\n", "| 35 | 'pears' | 0.22321073814882275 |\n", "+-----+----------+----------------------+\n", "| 94 | 'apples' | 0.6766994874229113 |\n", "+-----+----------+----------------------+\n", "| 69 | 'apples' | 0.5904925124490397 |\n", "+-----+----------+----------------------+\n", "| 4 | 'apples' | 0.09369523986159245 |\n", "+-----+----------+----------------------+\n", "..." ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tbl = etl.dummytable(10, seed=42)\n", "tbl" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
foobarbaz
81apples0.025010755222666936
35pears0.22321073814882275
\n", "

...

" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "tbl.display(2)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
foobarbaz
81apples0.025010755222666936
35pears0.22321073814882275
94apples0.6766994874229113
69apples0.5904925124490397
4apples0.09369523986159245
29apples0.561245062938613
91oranges0.2204406220406967
75bananas0.8094304566778266
0pears0.6981393949882269
43bananas0.15547949981178155
\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "tbl.displayall()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
foobarbaz
81apples0.025010755222666936
35pears0.22321073814882275
94apples0.6766994874229113
69apples0.5904925124490397
4apples0.09369523986159245
\n", "

...

" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
0|foo1|bar2|baz
81apples0.025010755222666936
35pears0.22321073814882275
94apples0.6766994874229113
69apples0.5904925124490397
4apples0.09369523986159245
\n", "

...

" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "tbl.display()\n", "tbl.display(index_header=True)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
example data
foobarbaz
81apples0.025010755222666936
35pears0.22321073814882275
94apples0.6766994874229113
69apples0.5904925124490397
4apples0.09369523986159245
\n", "

...

" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "tbl.display(caption='example data')" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
foobarbaz
81apples0.025010755222666936
35pears0.22321073814882275
94apples0.6766994874229113
69apples0.5904925124490397
4apples0.09369523986159245
\n", "

example data

" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "tbl.display(epilogue='example data')" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
foobarbaz
81appl0.02
35pear0.22
94appl0.67
69appl0.59
4appl0.09
\n", "

...

" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "tbl.display(truncate=4)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
foobarbaz
81apples0.025010755222666936
35pears0.22321073814882275
94apples0.6766994874229113
69apples0.5904925124490397
4apples0.09369523986159245
\n", "

...

" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "tbl.display(tr_style=lambda row: 'background-color: %s' % ('#faa' if row.foo > 50 else 'white'))" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
foobarbaz
81apples0.025010755222666936
35pears0.22321073814882275
94apples0.6766994874229113
69apples0.5904925124490397
4apples0.09369523986159245
\n", "

...

" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "tbl.display(tr_style='font-size: .8em',\n", " td_styles={'bar': 'background-color: yellow',\n", " 'baz': lambda v: 'background-color: %s' % ('#faa' if v > .5 else '#aaf')})" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(('name', 'id'),\n", " ('Արամ Խաչատրյան', 1),\n", " ('Johann Strauß', 2),\n", " ('Вагиф Сәмәдоғлу', 3),\n", " ('章子怡', 4),\n", " ('Արամ Խաչատրյան', 1),\n", " ('Johann Strauß', 2),\n", " ('Вагиф Сәмәдоғлу', 3),\n", " ('章子怡', 4))" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t = ((u'name', u'id'),\n", " (u'Արամ Խաչատրյան', 1),\n", " (u'Johann Strauß', 2),\n", " (u'Вагиф Сәмәдоғлу', 3),\n", " (u'章子怡', 4),\n", " (u'Արամ Խաչատրյան', 1),\n", " (u'Johann Strauß', 2),\n", " (u'Вагиф Сәмәдоғлу', 3),\n", " (u'章子怡', 4))\n", "t" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
nameid
Արամ Խաչատրյան1
Johann Strauß2
Вагиф Сәмәдоғлу3
章子怡4
Արամ Խաչատրյան1
\n", "

...

" ], "text/plain": [ "+-------------------+----+\n", "| name | id |\n", "+===================+====+\n", "| 'Արամ Խաչատրյան' | 1 |\n", "+-------------------+----+\n", "| 'Johann Strauß' | 2 |\n", "+-------------------+----+\n", "| 'Вагиф Сәмәдоғлу' | 3 |\n", "+-------------------+----+\n", "| '章子怡' | 4 |\n", "+-------------------+----+\n", "| 'Արամ Խաչատրյան' | 1 |\n", "+-------------------+----+\n", "..." ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tbl2 = etl.wrap(t)\n", "tbl2" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
unicode example
nameid
Արամ Խաչատրյան1
Johann Strauß2
Вагиф Сәмәдоғлу3
章子怡4
Արամ Խաչատրյան1
\n", "

...

" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "tbl2.display(caption='unicode example')" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.4.3+" } }, "nbformat": 4, "nbformat_minor": 0 } petl-1.7.15/requirements-database.txt000066400000000000000000000002041457414240700176030ustar00rootroot00000000000000# packages required for testing petl with databases cryptography pymysql SQLAlchemy>=1.3.6,<2.0 psycopg2-binary # PyMySQL==0.9.3 petl-1.7.15/requirements-docs.txt000066400000000000000000000000641457414240700167730ustar00rootroot00000000000000setuptools setuptools_scm sphinx sphinx-issues mock petl-1.7.15/requirements-formats.txt000066400000000000000000000005041457414240700175150ustar00rootroot00000000000000Cython numpy numexpr intervaltree>=3.0.2 lxml>=4.6.5 openpyxl>=2.6.2 pandas Whoosh>=2.7.4 xlrd>=2.0.1 xlwt>=1.3.0 fastavro>=0.24.2 ; python_version >= '3.4' fastavro==0.24.2 ; python_version < '3.0' gspread>=3.4.0 ; python_version >= '3.4' # version 3.7.0 doesn't work yet with python3.11 tables ; python_version != '3.11' petl-1.7.15/requirements-linting.txt000066400000000000000000000003271457414240700175110ustar00rootroot00000000000000## Used as main formatter/linter: ruff >= 0.3 # Used in Github: pylint >= 3.0.0 flake8 >= 7.0.0 black >= 24.0.0 bandit[toml,sarif] >= 1.7.0 ## Suggestions: # pre-commit #? Obs: Should work with python >= 3.8 petl-1.7.15/requirements-optional.txt000066400000000000000000000006161457414240700176730ustar00rootroot00000000000000# Packages bellow need complex local setup # # Also check: .github/workflows/test-changes.yml # Throubleshooting: # 1. $ export DISABLE_BLOSC_AVX2=1 # 2. $ brew install c-blosc blosc ; python_version >= '3.7' # Throubleshooting: # 1. pip install --prefer-binary -r requirements-optional.txt # 2. 
pip install --prefer-binary bcolz bcolz ; python_version >= '3.7' and python_version < '3.10' petl-1.7.15/requirements-remote.txt000066400000000000000000000004201457414240700173320ustar00rootroot00000000000000# packages for testing remote sources fastavro>=0.24.2 ; python_version >= '3.4' smbprotocol>=1.0.1 paramiko>=2.7.1 requests; python_version >= '3.4' fsspec>=0.7.4 ; python_version >= '3.4' aiohttp>=3.6.2 ; python_version >= '3.5.3' s3fs>=0.2.2 ; python_version >= '3.4' petl-1.7.15/requirements-tests.txt000066400000000000000000000001731457414240700172060ustar00rootroot00000000000000wheel setuptools pytest-cov>=2.12.0 pytest>=4.6.6,<7.0.0 tox coveralls coverage setuptools-scm mock; python_version < '3.0'petl-1.7.15/setup.py000066400000000000000000000042101457414240700142700ustar00rootroot00000000000000from __future__ import print_function, absolute_import, division from setuptools import setup, find_packages setup( name='petl', author='Alistair Miles', author_email='alimanfoo@googlemail.com', package_dir={'': '.'}, packages=find_packages('.'), scripts=['bin/petl'], url='https://github.com/petl-developers/petl', license='MIT License', description='A Python package for extracting, transforming and loading ' 'tables of data.', long_description=open('README.txt').read(), python_requires='>=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*', setup_requires=["setuptools>18.0", "setuptools-scm>1.5.4"], extras_require={ 'avro': ['fastavro>=0.24.0'], 'bcolz': ['bcolz>=1.2.1'], 'db': ['SQLAlchemy>=1.3.6,<2.0'], 'hdf5': ['cython>=0.29.13', 'numpy>=1.16.4', 'numexpr>=2.6.9', 'tables>=3.5.2'], 'http': ['aiohttp>=3.6.2', 'requests'], 'interval': ['intervaltree>=3.0.2'], 'numpy': ['numpy>=1.16.4'], 'pandas': ['pandas>=0.24.2'], 'remote': ['fsspec>=0.7.4'], 'smb': ['smbprotocol>=1.0.1'], 'xls': ['xlrd>=2.0.1', 'xlwt>=1.3.0'], 'xlsx': ['openpyxl>=2.6.2'], 'xpath': ['lxml>=4.4.0'], 'whoosh': ['whoosh'], }, use_scm_version={ "version_scheme": "guess-next-dev", "local_scheme": "dirty-tag", "write_to": "petl/version.py", }, classifiers=['Intended Audience :: Developers', 'License :: OSI Approved :: MIT License', 'Programming Language :: Python :: 2', 'Programming Language :: Python :: 2.7', 'Programming Language :: Python :: 3', 'Programming Language :: Python :: 3.6', 'Programming Language :: Python :: 3.7', 'Programming Language :: Python :: 3.8', 'Programming Language :: Python :: 3.9', 'Programming Language :: Python :: 3.10', 'Programming Language :: Python :: 3.11', 'Topic :: Software Development :: Libraries :: Python Modules' ] ) petl-1.7.15/tox.ini000066400000000000000000000045751457414240700141070ustar00rootroot00000000000000# Tox (http://tox.testrun.org/) is a tool for running tests # in multiple virtualenvs. This configuration file will run the # test suite on all supported python versions. To use it, "pip install tox" # and then run "tox" from this directory. [tox] envlist = py27, py36, py37, py38, py39, py310, py311, {py36,py37,py38,py39,py310,py311}-docs [testenv] # get stable output for unordered types setenv = PYTHONHASHSEED = 42 py27: PY_MAJOR_VERSION = py2 py36,py37,py38,py39,py310,py311: PY_MAJOR_VERSION = py3 commands = pytest --cov=petl petl coverage report -m deps = -rrequirements-tests.txt -rrequirements-formats.txt [testenv:{py36,py37,py38,py39,py310,py311}-docs] # build documentation under similar environment to readthedocs changedir = docs deps = -rrequirements-docs.txt commands = sphinx-build -W -b html -d {envtmpdir}/doctrees . 
{envtmpdir}/html [testenv:{py36,py37,py38,py39,py310,py311}-doctest] commands = py36,py37,py38,py39,py310,py311: pytest --doctest-modules --cov=petl petl [testenv:{py36,py37,py38,py39}-dochtml] changedir = docs deps = -rrequirements-docs.txt commands = sphinx-build -W -b singlehtml -d {envtmpdir}/doctrees . _build/singlehtml [testenv:remote] # Create test containers with the following commands: # docker run -it --name samba -p 139:139 -p 445:445 -d "dperson/samba" -p -u "petl;test" -s "public;/public-dir;yes;no;yes;all" # docker run -it --name sftp -p 22:22 -d atmoz/sftp petl:test:::public setenv = {[testenv]setenv} PETL_TEST_SMB=smb://WORKGROUP;petl:test@localhost/public/ PETL_TEST_SFTP=sftp://petl:test@localhost/public/ commands = pytest --cov=petl petl deps = {[testenv]deps} -rrequirements-remote.txt [testenv:database] # Create test containers with the following commands: # docker run -it --name mysql -p 3306:3306 -p 33060:33060 -e MYSQL_ROOT_PASSWORD=pass0 -e MYSQL_DATABASE=petl -e MYSQL_USER=petl -e MYSQL_PASSWORD=test -d mysql:latest # docker run -it --name postgres -p 5432:5432 -e POSTGRES_DB=petl -e POSTGRES_USER=petl -e POSTGRES_PASSWORD=test -d postgres:latest setenv = {[testenv]setenv} commands = pytest --cov=petl petl deps = -rrequirements-tests.txt -rrequirements-database.txt [testenv:mysqldb] basepython = python2.7 setenv = PYTHONHASHSEED = 42 deps = MySQL-python==1.2.5 SQLAlchemy==1.2.10 -rrequirements-tests.txt commands = pytest petl
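# Tip: a single environment from the envlist above can be run on its own
# with standard tox usage, e.g. `tox -e py39`.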