pax_global_header 0000666 0000000 0000000 00000000064 14074061410 0014507 g ustar 00root root 0000000 0000000 52 comment=dc0684a082286bd11c50a32045a9624fb6b3a60f
agate-1.6.3/ 0000775 0000000 0000000 00000000000 14074061410 0012577 5 ustar 00root root 0000000 0000000 agate-1.6.3/.github/ 0000775 0000000 0000000 00000000000 14074061410 0014137 5 ustar 00root root 0000000 0000000 agate-1.6.3/.github/CONTRIBUTING.md 0000664 0000000 0000000 00000005660 14074061410 0016377 0 ustar 00root root 0000000 0000000 Contributing
============
Principles
----------
agate is a intended to fill a very particular programming niche. It should not be allowed to become as complex as [numpy] or [pandas]. Please bear in mind the following principles when contemplating an addition:
- Humans have less time than computers. Optimize for humans.
- Most datasets are small. Don’t optimize for “big data”.
- Text is data. It must always be a first-class citizen.
- Python gets it right. Make it work like Python does.
- Humans lives are nasty, brutish and short. Make it easy.
- Mutability leads to confusion. Processes that alter data must create new copies.
- Extensions are the way. Don’t add it to core unless everybody needs it.
Process for contributing code
-----------------------------
Contributors should use the following roadmap to guide them through the process of submitting a contribution:
1. Fork the project on [GitHub].
2. Check out the [issue tracker] and find a task that needs to be done and is of a scope you can realistically expect to complete in a few days. Don’t worry about the priority of the issues at first, but try to choose something you’ll enjoy. You’re much more likely to finish something to the point it can be merged if it’s something you really enjoy hacking on.
3. Comment on the ticket letting everyone know you’re going to be hacking on it so that nobody duplicates your effort. It’s also good practice to provide some general idea of how you plan on resolving the issue so that other developers can make suggestions.
4. Write tests for the feature you’re building. Follow the format of the existing tests in the test directory to see how this works. You can run all the tests with the command `nosetests tests`.
5. Write the code. Try to stay consistent with the style and organization of the existing codebase. A good patch won’t be refused for stylistic reasons, but large parts of it may be rewritten and nobody wants that.
6. As you are coding, periodically merge in work from the master branch and verify you haven’t broken anything by running the test suite.
7. Write documentation. Seriously.
8. Once it works, is tested, and has documentation, submit a pull request on GitHub.
9. Wait for it to either be merged or to receive a comment about what needs to be fixed.
10. Rejoice.
Legalese
--------
To the extent that they care, contributors should keep in mind that the source of agate and therefore of any contributions are licensed under the permissive [MIT license]. By submitting a patch or pull request you are agreeing to release your code under this license. You will be acknowledged in the AUTHORS list, the commit history and the hearts and minds of jo
[numpy]: http://www.numpy.org/
[pandas]: http://pandas.pydata.org/
[GitHub]: https://github.com/wireservice/agate
[issue tracker]: https://github.com/wireservice/agate/issues
[MIT license]: http://www.opensource.org/licenses/mit-license.php
agate-1.6.3/.github/workflows/ 0000775 0000000 0000000 00000000000 14074061410 0016174 5 ustar 00root root 0000000 0000000 agate-1.6.3/.github/workflows/ci.yml 0000664 0000000 0000000 00000004067 14074061410 0017321 0 ustar 00root root 0000000 0000000 name: CI
on: [push, pull_request]
jobs:
build:
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [macos-latest, windows-latest, ubuntu-latest]
python-version: [2.7, 3.6, 3.7, 3.8, 3.9, pypy-2.7, pypy-3.6, pypy-3.7]
exclude:
# UnicodeDecodeError on test_to_csv
- os: windows-latest
python-version: 2.7
- os: windows-latest
python-version: pypy-2.7
- os: windows-latest
python-version: 3.6
- os: windows-latest
python-version: pypy-3.6
- os: windows-latest
python-version: 3.7
- os: windows-latest
python-version: pypy-3.7
steps:
- if: matrix.os == 'ubuntu-latest'
name: Install UTF-8 locales and lxml requirements
run: |
sudo apt install libxml2-dev libxslt-dev
sudo locale-gen de_DE.UTF-8
sudo locale-gen en_US.UTF-8
sudo locale-gen ko_KR.UTF-8
sudo update-locale
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
# https://github.com/actions/cache/blob/main/examples.md#using-a-script-to-get-cache-location
- id: pip-cache
run: python -c "from pip._internal.locations import USER_CACHE_DIR; print('::set-output name=dir::' + USER_CACHE_DIR)"
- uses: actions/cache@v1
with:
path: ${{ steps.pip-cache.outputs.dir }}
key: ${{ runner.os }}-pip-${{ hashFiles('**/setup.py') }}
restore-keys: |
${{ runner.os }}-pip-
- run: pip install --upgrade check-manifest flake8 isort setuptools
- run: check-manifest
- run: flake8 .
- run: isort . --check-only
- run: pip install .[test]
- run: nosetests --with-coverage --cover-package=agate
# UnicodeDecodeError on print_bars
- if: matrix.os != 'windows-latest' && matrix.python-version != '2.7' && matrix.python-version != 'pypy-2.7'
run: python example.py
- run: python charts.py
agate-1.6.3/.gitignore 0000664 0000000 0000000 00000000150 14074061410 0014563 0 ustar 00root root 0000000 0000000 .DS_Store
*.pyc
*.swp
*.swo
*.egg-info
docs/_build
dist
.coverage
build
.proof
.ipynb_checkpoints
.idea
agate-1.6.3/.pre-commit-config.yaml 0000664 0000000 0000000 00000000435 14074061410 0017062 0 ustar 00root root 0000000 0000000 repos:
- repo: https://github.com/pycqa/flake8
rev: 3.9.2
hooks:
- id: flake8
- repo: https://github.com/pycqa/isort
rev: 5.8.0
hooks:
- id: isort
- repo: https://github.com/mgedmin/check-manifest
rev: "0.46"
hooks:
- id: check-manifest
agate-1.6.3/AUTHORS.rst 0000664 0000000 0000000 00000004465 14074061410 0014467 0 ustar 00root root 0000000 0000000 agate is made by a community. The following individuals have contributed code, documentation, or expertise to agate:
* `Christopher Groskopf `_
* `Jeff Larson `_
* `Eric Sagara `_
* `John Heasly `_
* `Mick O'Brien `_
* `David Eads `_
* `Nikhil Sonnad `_
* `Matt Riggott `_
* `Tyler Fisher `_
* `William P. Davis `_
* `Ryan Murphy `_
* `Raphael Deem `_
* `Robin Linderborg `_
* `Chris Keller `_
* `Neil Bedi `_
* `Geoffrey Hing `_
* `Taurus Olson `_
* `Danny Page `_
* `James McKinney `_
* `Tony Papousek `_
* `Mila Frerichs `_
* `Paul Fitzpatrick `_
* `Ben Welsh `_
* `Kevin Schaul `_
* `sandyp `_
* `Lexie Heinle `_
* `Will Skora `_
* `Joe Germuska `_
* `Eli Murray `_
* `Derek Swingley `_
* `Or Sharir `_
* `Anthony DeBarros `_
* `Apoorv Anand `_
* `Ghislain Antony Vaillant `_
* `Neil MartinsenBurrell `_
* `Aliaksei Urbanski `_
* `Forest Gregg `_
* `Robert Schütz `_
* `Wouter de Vries `_
* `Kartik Agaram `_
* `Loïc Corbasson `_
* `Danny Sepler `_
* `brian-from-quantrocket `_
* `mathdesc `_
* `Tim Gates `_
agate-1.6.3/CHANGELOG.rst 0000664 0000000 0000000 00000111340 14074061410 0014620 0 ustar 00root root 0000000 0000000 1.6.3 - July 15, 2021
---------------------
* feat: :meth:`.Table.from_csv` accepts a ``row_limit`` keyword argument. (#740)
* feat: :meth:`.Table.from_json` accepts an ``encoding`` keyword argument. (#734)
* feat: :meth:`.Table.print_html` accepts a ``max_precision`` keyword argument, like :meth:`.Table.print_table`. (#753)
* feat: :class:`.TypeTester` accepts a ``null_values`` keyword argument, like individual data types. (#745)
* feat: :class:`.Min`, :class:`.Max` and :class:`.Sum` (#735) work with :class:`.TimeDelta`.
* feat: :class:`.FieldSizeLimitError` includes the line number in the error message. (#681)
* feat: :class:`.csv.Sniffer` warns on error while sniffing CSV dialect.
* fix: :meth:`.Table.normalize` works with basic processing methods. (#691)
* fix: :meth:`.Table.homogenize` works with basic processing methods. (#756)
* fix: :meth:`.Table.homogenize` casts ``compare_values`` and ``default_row``. (#700)
* fix: :meth:`.Table.homogenize` accepts tuples. (#710)
* fix: :meth:`.TableSet.group_by` accepts input with no rows. (#703)
* fix: :class:`.TypeTester` warns if a column specified by the ``force`` argument is not in the table, instead of raising an error. (#747)
* fix: Aggregations return ``None`` if all values are ``None``, instead of raising an error. Note that ``Sum``, ``MaxLength`` and ``MaxPrecision`` continue to return ``0`` if all values are ``None``. (#706)
* fix: Ensure files are closed when errors occur. (#734)
* build: Make PyICU an optional dependency.
1.6.2 - March 10, 2021
----------------------
* feat: :meth:`.Date.__init__` and :meth:`.DateTime.__init__` accepts a ``locale`` keyword argument (e.g. :code:`en_US`) for parsing formatted dates. (#730)
* feat: :meth:`.Number.cast` casts ``True`` to ``1`` and ``False`` to ``0``. (#733)
* fix: :meth:`.utils.max_precision` ignores infinity when calculating precision. (#726)
* fix: :meth:`.Date.cast` catches ``OverflowError`` when type testing. (#720)
* Included examples in Python package. (#716)
1.6.1 - March 11, 2018
----------------------
* feat: :meth:`.Table.to_json` can use Decimal as keys. (#696)
* fix: :meth:`.Date.cast` and :meth:`.DateTime.cast` no longer parse non-date strings that contain date sub-strings as dates. (#705)
* docs: Link to tutorial now uses version through Sphinx to avoid bad links on future releases. (#682)
1.6.0 - February 28, 2017
-------------------------
This update should not cause any breaking changes, however, it is being classified as major release because the dependency on awesome-slugify, which is licensed with GPLv3, has been replaced with python-slugify, which is licensed with MIT.
* Suppress warning from babel about Time Zone expressions on Python 3.6. (#665)
* Reimplemented slugify with python-slugify instead of awesome-slugify. (#660)
* Slugify renaming of duplicate values is now consistent with :meth:`.Table.init`. (#615)
1.5.5 - December 29, 2016
-------------------------
* Added a "full outer join" example to the SQL section of the cookbook. (#658)
* Warnings are now more explicit when column names are missing. (#652)
* :meth:`.Date.cast` will no longer parse strings like :code:`05_leslie3d_base` as dates. (#653)
* :meth:`.Text.cast` will no longer strip leading or trailing whitespace. (#654)
* Fixed :code:`'NoneType' object has no attribute 'groupdict'` error in :meth:`.TimeDelta.cast`. (#656)
1.5.4 - December 27, 2016
-------------------------
* Cleaned up handling of warnings in tests.
* Blank column names are not treated as unspecified (letter names will be generated).
1.5.3 - December 26, 2016
-------------------------
This is a minor release that adds one feature: sequential joins (by row number). It also fixes several small bugs blocking a downstream release of csvkit.
* Fixed empty :class:`.Table` column names would be intialized as list instead of tuple.
* :meth:`.Table.join` can now join by row numbers—a sequential join.
* :meth:`.Table.join` now supports full outer joins via the ``full_outer`` keyword.
* :meth:`.Table.join` can now accept column indicies instead of column names.
* :meth:`.Table.from_csv` now buffers input files to prevent issues with using STDIN as an input.
1.5.2 - December 24, 2016
-------------------------
* Improved handling of non-ascii encoded CSV files under Python 2.
1.5.1 - December 23, 2016
-------------------------
This is a minor release fixing several small bugs that were blocking a downstream release of csvkit.
* Documented differing behavior of :class:`.MaxLength` under Python 2. (#649)
* agate is now tested against Python 3.6. (#650)
* Fix bug when :class:`.MaxLength` was called on an all-null column.
* Update extensions documentation to match new API. (#645)
* Fix bug in :class:`.Change` and :class:`.PercentChange` where ``0`` values could cause ``None`` to be returned incorrectly.
1.5.0 - November 16, 2016
-------------------------
This release adds SVG charting via the `leather `_ charting library. Charts methods have been added for both :class:`.Table` and :class:`.TableSet`. (The latter create lattice plots.) See the revised tutorial and new cookbook entries for examples. Leather is still an early library. Please `report any bugs `_.
Also in this release are a :class:`.Slugify` computation and a variety of small fixes and improvements.
The complete list of changes is as follows:
* Remove support for monkey-patching of extensions. (#594)
* :class:`.TableSet` methods which proxy :class:`.Table` methods now appear in the API docs. (#640)
* :class:`.Any` and :class:`.All` aggregations no longer behave differently for boolean data. (#636)
* :class:`.Any` and :class:`.All` aggregations now accept a single value as a test argument, in addition to a function.
* :class:`.Any` and :class:`.All` aggregations now require a test argument.
* Tables rendered by :meth:`.Table.print_table` are now GitHub Flavored Markdown (GFM) compatible. (#626)
* The agate tutorial has been converted to a Jupyter Notebook.
* :class:`.Table` now supports ``len`` as a proxy for ``len(table.rows)``.
* Simple SVG charting is now integrated via `leather `_.
* Added :class:`.First` computation. (#634)
* :meth:`.Table.print_table` now has a `max_precision` argument to limit Number precision. (#544)
* Slug computation now accepts an array of column names to merge. (#617)
* Cookbook: standardize column values with :class:`.Slugify` computation. (#613)
* Cookbook: slugify/standardize row and column names. (#612)
* Fixed condition that prevents integer row names to allow bools in :meth:`.Table.__init__`. (#627)
* :class:`.PercentChange` is now null-safe, returns None for null values. (#623)
* :class:`.Table` can now be iterated, yielding :class:`Row` instances. (Previously it was necessarily to iterate :code:`table.rows`.)
1.4.0 - May 26, 2016
--------------------
This release adds several new features, fixes numerous small bug-fixes, and improves performance for common use cases. There are some minor breaking changes, but few user are likely to encounter them. The most important changes in this release are:
1. There is now a :meth:`.TableSet.having` method, which behaves similarly to SQL's ``HAVING`` keyword.
2. :meth:`.Table.from_csv` is much faster. In particular, the type inference routines for parsing numbers have been optimized.
3. The :meth:`.Table.compute` method now accepts a ``replace`` keyword which allows new columns to replace existing columns "in place."" (As with all agate operations, a new table is still created.)
4. There is now a :class:`.Slug` computation which can be used to compute a column of slugs. The :meth:`.Table.rename` method has also added new options for slugifying column and row names.
The complete list of changes is as follows:
* Added a deprecation warning for ``patch`` methods. New extensions should not use it. (#594)
* Added :class:`.Slug` computation (#466)
* Added ``slug_columns`` and ``slug_rows`` arguments to :meth:`Table.rename`. (#466)
* Added :meth:`.utils.slugify` to standardize a sequence of strings. (#466)
* :meth:`.Table.__init__` now prints row and column on ``CastError``. (#593)
* Fix null sorting in :meth:`.Table.order_by` when ordering by multiple columns. (#607)
* Implemented configuration system.
* Fixed bug in :meth:`.Table.print_bars` when ``value_column`` contains ``None`` (#608)
* :meth:`.Table.print_table` now restricts header on max_column_width. (#605)
* Cookbook: filling gaps in a dataset with Table.homogenize. (#538)
* Reduced memory usage and improved performance of :meth:`.Table.from_csv`.
* :meth:`.Table.from_csv` no longer accepts a sequence of row ids for :code:`skip_lines`.
* :meth:`.Number.cast` is now three times as fast.
* :class:`.Number` now accepts :code:`group_symbol`, :code:`decimal_symbol` and :code:`currency_symbols` arguments. (#224)
* Tutorial: clean up state data under computing columns (#570)
* :meth:`.Table.__init__` now explicitly checks that ``row_names`` are not ints. (#322)
* Cookbook: CPI deflation, agate-lookup. (#559)
* :meth:`.Table.bins` now includes values outside ``start`` or ``end`` in computed ``column_names``. (#596)
* Fixed bug in :meth:`.Table.bins` where ``start`` or ``end`` arguments were ignored when specified alone. (#599)
* :meth:`.Table.compute` now accepts a :code:`replace` argument that allows columns to be overwritten. (#597)
* :meth:`.Table.from_fixed` now creates an agate table from a fixed-width file. (#358)
* :mod:`.fixed` now implements a general-purpose fixed-width file reader. (#358)
* :class:`TypeTester` now correctly parses negative currency values as Number. (#595)
* Cookbook: removing a column (`select` and `exclude`). (#592)
* Cookbook: overriding specific column types. (#591)
* :class:`.TableSet` now has a :meth:`.TableSet._fork` method used internally for deriving new tables.
* Added an example of SQL's :code:`HAVING` to the cookbook.
* :meth:`.Table.aggregate` interface has been revised to be more similar to :meth:`.TableSet.aggregate`.
* :meth:`.TableSet.having` is now implemented. (#587)
* There is now a better error when a forced column name does not exist. (#591)
* Arguments to :meth:`.Table.print_html` now mirror :meth:`.Table.print_table`.
1.3.1 - March 30, 2016
----------------------
The major feature of this release is new API documentation. Several minor features and bug fixes are also included. There are no major breaking changes in this release.
Internally, the agate codebase has been reorganized to be more modular, but this should be invisible to most users.
* The :class:`.MaxLength` aggregation now returns a `Decimal` object. (#574)
* Fixed an edge case where datetimes were parsed as dates. (#568)
* Fixed column alignment in tutorial tables. (#572)
* :meth:`.Table.print_table` now defaults to printing ``20`` rows and ``6`` columns. (#589)
* Added Eli Murray to AUTHORS.
* :meth:`.Table.__init__` now accepts a dict to specify partial column types. (#580)
* :meth:`.Table.from_csv` now accepts a ``skip_lines`` argument. (#581)
* Moved every :class:`.Aggregation` and :class:`.Computation` into their own modules. (#565)
* :class:`.Column` and :class:`.Row` are now importable from `agate`.
* Completely reorgnized the API documentation.
* Moved unit tests into modules to match new code organization.
* Moved major :class:`.Table` and :class:`.TableSet` methods into their own modules.
* Fixed bug when using non-unicode encodings with :meth:`.Table.from_csv`. (#560)
* :meth:`.Table.homogenize` now accepts an array of values as compare values if key is a single column name. (#539)
1.3.0 - February 28, 2016
-------------------------
This version implements several new features and includes two major breaking changes.
Please take note of the following breaking changes:
1. There is no longer a :code:`Length` aggregation. The more obvious :class:`.Count` is now used instead.
2. Agate's replacements for Python's CSV reader and writer have been moved to the :code:`agate.csv` namespace. To use as a drop-in replacement: :code:`from agate import csv`.
The major new features in this release are primarly related to transforming (reshaping) tables. They are:
1. :meth:`.Table.normalize` for converting columns to rows.
2. :meth:`.Table.denormalize` for converting rows to columns.
3. :meth:`.Table.pivot` for generating "crosstabs".
4. :meth:`.Table.homogenize` for filling gaps in data series.
Please see the following complete list of changes for a variety of other bug fixes and improvements.
* Moved CSV reader/writer to :code:`agate.csv` namespace.
* Added numerous new examples to the R section of the cookbook. (#529-#535)
* Updated Excel cookbook entry for pivot tables. (#536)
* Updated Excel cookbook entry for VLOOKUP. (#537)
* Fix number rendering in :meth:`.Table.print_table` on Windows. (#528)
* Added cookbook examples of using :meth:`.Table.pivot` to count frequency/distribution.
* :meth:`.Table.bins` now has smarter output column names. (#524)
* :meth:`.Table.bins` is now a wrapper around pivot. (#522)
* :meth:`.Table.counts` has been removed. Use :meth:`.Table.pivot` instead. (#508)
* :class:`.Count` can now count non-null values in a column.
* Removed :class:`.Length`. :class:`.Count` now works without any arguments. (#520)
* :meth:`.Table.pivot` implemented. (#495)
* :meth:`.Table.denormalize` implemented. (#493)
* Added ``columns`` argument to :meth:`Table.join`. (#479)
* Cookbook: Custom statistics/agate.Summary
* Added Kevin Schaul to AUTHORS.
* :meth:`Quantiles.locate` now correctly returns `Decimal` instances. (#509)
* Cookbook: Filter for distinct values of a column (#498)
* Added :meth:`.Column.values_distinct()` (#498)
* Cookbook: Fuzzy phonetic search example. (#207)
* Cookbook: Create a table from a remote file. (#473)
* Added ``printable`` argument to :meth:`.Table.print_bars` to use only printable characters. (#500)
* :class:`.MappedSequence` now throws an explicit error on __setitem__. (#499)
* Added ``require_match`` argument to :meth:`.Table.join`. (#480)
* Cookbook: Rename columns in a table. (#469)
* :meth:`.Table.normalize` implemented. (#487)
* Added :class:`.Percent` computation with example in Cookbook. (#490)
* Added Ben Welsh to AUTHORS.
* :meth:`.Table.__init__` now throws a warning if auto-generated columns are used. (#483)
* :meth:`.Table.__init__` no longer fails on duplicate columns. Instead it renames them and throws a warning. (#484)
* :meth:`.Table.merge` now takes a ``column_names`` argument to specify columns included in new table. (#481)
* :meth:`.Table.select` now accepts a single column name as a key.
* :meth:`.Table.exclude` now accepts a single column name as a key.
* Added :meth:`.Table.homogenize` to find gaps in a table and fill them with default rows. (#407)
* :meth:`.Table.distinct` now accepts sequences of column names as a key.
* :meth:`.Table.join` now accepts sequences of column names as either a left or right key. (#475)
* :meth:`.Table.order_by` now accepts a sequence of column names as a key.
* :meth:`.Table.distinct` now accepts a sequence of column names as a key.
* :meth:`.Table.join` now accepts a sequence of column names as either a left or right key. (#475)
* Cookbook: Create a table from a DBF file. (#472)
* Cookbook: Create a table from an Excel spreadsheet.
* Added explicit error if a filename is passed to the :class:`.Table` constructor. (#438)
1.2.2 - February 5, 2016
------------------------
This release adds several minor features. The only breaking change is that default column names will now be lowercase instead of uppercase. If you depended on these names in your scripts you will need to update them accordingly.
* :class:`.TypeTester` no longer takes a ``locale`` argument. Use ``types`` instead.
* :class:`.TypeTester` now takes a ``types`` argument that is a list of possible types to test. (#461)
* Null conversion can now be disabled for :class:`.Text` by passing ``cast_nulls=False``. (#460)
* Default column names are now lowercase letters instead of uppercase. (#464)
* :meth:`.Table.merge` can now merge tables with different columns or columns in a different order. (#465)
* :meth:`.MappedSequence.get` will no longer raise ``KeyError`` if a default is not provided. (#467)
* :class:`.Number` can now test/cast the ``long`` type on Python 2.
1.2.1 - February 5, 2016
------------------------
This release implements several new features and bug fixes. There are no significant breaking changes.
Special thanks to `Neil Bedi `_ for his extensive contributions to this release.
* Added a ``max_column_width`` argument to :meth:`.Table.print_table`. Defaults to ``20``. (#442)
* :meth:`.Table.from_json` now defers most functionality to :meth:`.Table.from_object`.
* Implemented :meth:`.Table.from_object` for parsing JSON-like Python objects.
* Fixed a bug that prevented :meth:`.Table.order_by` on empty table. (#454)
* :meth:`.Table.from_json` and :meth:`TableSet.from_json` now have ``column_types`` as an optional argument. (#451)
* :class:`.csv.Reader` now has ``line_numbers`` and ``header`` options to add column for line numbers (#447)
* Renamed ``maxfieldsize`` to ``field_size_limit`` in :class:`.csv.Reader` for consistency (#447)
* :meth:`.Table.from_csv` now has a ``sniff_limit`` option to use :class:`.csv.Sniffer` (#444)
* :class:`.csv.Sniffer` implemented. (#444)
* :meth:`.Table.__init__` no longer fails on empty rows. (#445)
* :meth:`.TableSet.from_json` implemented. (#373)
* Fixed a bug that breaks :meth:`TypeTester.run` on variable row length. (#440)
* Added :meth:`.TableSet.__str__` to display :class:`.Table` keys and row counts. (#418)
* Fixed a bug that incorrectly checked for column_types equivalence in :meth:`.Table.merge` and :meth:`.TableSet.__init__`. (#435)
* :meth:`.TableSet.merge` now has the ability to specify grouping factors with ``group``, ``group_name`` and ``group_type``. (#406)
* :class:`.Table` can now be constructed with ``None`` for some column names. Those columns will receive letter names. (#432)
* Slightly changed the parsing of dates and datetimes from strings.
* Numbers are now written to CSV without extra zeros after the decimal point. (#429)
* Made it possible for ``datetime.date`` instances to be considered valid :class:`.DateTime` inputs. (#427)
* Changed preference order in type testing so :class:`.Date` is preferred to :class:`.DateTime`.
* Removed ``float_precision`` argument from :class:`.Number`. (#428)
* :class:`.AgateTestCase` is now available as ``agate.AgateTestCase``. (#426)
* :meth:`.TableSet.to_json` now has an ``indent`` option for use with ``nested``.
* :meth:`.TableSet.to_json` now has a ``nested`` option for writing a single, nested JSON file. (#417)
* :meth:`.TestCase.assertRowNames` and :meth:`.TestCase.assertColumnNames` now validate the row and column instance keys.
* Fixed a bug that prevented :meth:`.Table.rename` from renaming column names in :class:`.Row` instances. (#423)
1.2.0 - January 18, 2016
------------------------
This version introduces one breaking change, which is only relevant if you are using custom :class:`.Computation` subclasses.
1. :class:`.Computation` has been modified so that :meth:`.Computation.run` takes a :class:`.Table` instance as its argument, rather than a single row. It must return a sequence of values to use for a new column. In addition, the :meth:`.Computation._prepare` method has been renamed to :meth:`.Computation.validate` to more accurately describe it's function. These changes were made to facilitate computing moving averages, streaks and other values that require data for the full column.
* Existing :class:`.Aggregation` subclasses have been updated to use :meth:`.Aggregate.validate`. (This brings a noticeable performance boost.)
* :class:`.Aggregation` now has a :meth:`.Aggregation.validate` method that functions identically to :meth:`.Computation.validate`. (#421)
* :meth:`.Change.validate` now correctly raises :class:`.DataTypeError`.
* Added a ``SimpleMovingAverage`` implementation to the cookbook's examples of custom :class:`.Computation` classes.
* :meth:`.Computation._prepare` has been renamed to :meth:`.Computation.validate`.
* :meth:`.Computation.run` now takes a :class:`.Table` instance as an argument. (#415)
* Fix a bug in Python 2 where printing a table could raise ``decimal.InvalidOperation``. (#412)
* Fix :class:`.Rank` so it returns Decimal. (#411)
* Added Taurus Olson to AUTHORS.
* Printing a table will now print the table's structure.
* :meth:`.Table.print_structure` implemented. (#393)
* Added Geoffrey Hing to AUTHORS.
* :meth:`.Table.print_html` implemented. (#408)
* Instances of :class:`.Date` and :class:`.DateTime` can now be pickled. (#362)
* :class:`.AgateTestCase` is available as ``agate.testcase.AgateTestCase`` for extensions to use. (#384)
* :meth:`.Table.exclude` implemented. Opposite of :meth:`.Table.select`. (#388)
* :meth:`.Table.merge` now accepts a ``row_names`` argument. (#403)
* :class:`.Formula` now automatically casts computed values to specified data type unless ``cast`` is set to ``False``. (#398)
* Added Neil Bedi to AUTHORS.
* :meth:`.Table.rename` is implemented. (#389)
* :meth:`.TableSet.to_json` is implemented. (#374)
* :meth:`.Table.to_csv` and :meth:`.Table.to_json` will now create the target directory if it does not exist. (#392)
* :class:`.Boolean` will now correctly cast numerical ``0`` and ``1``. (#386)
* :meth:`.Table.merge` now consistently maps column names to rows. (#402)
1.1.0 - November 4, 2015
------------------------
This version of agate introduces three major changes.
1. :class:`.Table`, :meth:`.Table.from_csv` and :meth:`.TableSet.from_csv` now all take ``column_names`` and ``column_types`` as separate arguments instead of as a sequence of tuples. This was done to enable more flexible type inference and to streamline the API.
2. The interfaces for :meth:`.TableSet.aggregate` and :meth:`.Table.compute` have been changed. In both cases the new column name now comes first. Aggregations have also been modified so that the input column name is an argument to the aggregation class, rather than a third element in the tuple.
3. This version drops support for Python 2.6. Testing and bug-fixing for this version was taking substantial time with no evidence that anyone was actually using it. Also, multiple dependencies claim to not support 2.6, even though agate's tests were passing.
* DataType's now have :meth:`.DataType.csvify` and :meth:`.DataType.jsonify` methods for serializing native values.
* Added a dependency on `isodate `_ for handling ISO8601 formatted dates. (#233)
* :class:`.Aggregation` results are no longer cached. (#378)
* Removed `Column.aggregate` method. Use :meth:`.Table.aggregate` instead. (#378)
* Added :meth:`.Table.aggregate` for aggregating single column results. (#378)
* :class:`.Aggregation` subclasses now take column names as their first argument. (#378)
* :meth:`.TableSet.aggregate` and :meth:`.Table.compute` now take the new column name as the first argument. (#378)
* Remove support for Python 2.6.
* :meth:`.Table.to_json` is implemented. (#345)
* :meth:`.Table.from_json` is implemented. (#344, #347)
* :class:`.Date` and :class:`.DateTime` type testing now takes specified format into account. (#361)
* :class:`.Number` data type now takes a ``float_precision`` argument.
* :class:`.Number` data types now work with native float values. (#370)
* :class:`.TypeTester` can now validate Python native types (not just strings). (#367)
* :class:`.TypeTester` can now be used with the :class:`.Table` constructor, not just :meth:`.Table.from_csv`. (#350)
* :class:`.Table`, :meth:`.Table.from_csv` and :meth:`.TableSet.from_csv` now take ``column_names`` and ``column_types`` as separate parameters. (#350)
* :const:`.DEFAULT_NULL_VALUES` (the list of strings that mean null) is now importable from ``agate``.
* :meth:`.Table.from_csv` and :meth:`.Table.to_csv` are now unicode-safe without separately importing csvkit.
* ``agate`` can now be used as a drop-in replacement for Python's ``csv`` module.
* Migrated `csvkit `_'s unicode CSV reading/writing support into agate. (#354)
1.0.1 - October 29, 2015
------------------------
* TypeTester now takes a "limit" arg that restricts how many rows it tests. (#332)
* Table.from_csv now supports CSVs with neither headers nor manual column names.
* Tables can now be created with automatically generated column names. (#331)
* File handles passed to Table.to_csv are now left open. (#330)
* Added Table.print_csv method. (#307, #339)
* Fixed stripping currency symbols when casting Numbers from strings. (#333)
* Fixed two major join issues. (#336)
1.0.0 - October 22, 2015
------------------------
* Table.from_csv now defaults to TypeTester() if column_info is not provided. (#324)
* New tutorial section: "Navigating table data" (#315)
* 100% test coverage reached. (#312)
* NullCalculationError is now a warning instead of an error. (#311)
* TableSet is now a subclass of MappedSequence.
* Rows and Columns are now subclasses of MappedSequence.
* Add Column.values_without_nulls_sorted().
* Column.get_data_without_nulls() is now Column.values_without_nulls().
* Column.get_data_sorted() is now Column.values_sorted().
* Column.get_data() is now Column.values().
* Columns can now be sliced.
* Columns can now be indexed by row name. (#301)
* Added support for Python 3.5.
* Row objects can now be sliced. (#303)
* Replaced RowSequence and ColumnSequence with MappedSequence.
* Replace RowDoesNotExistError with KeyError.
* Replaced ColumnDoesNotExistError with IndexError.
* Removed unnecessary custom RowIterator, ColumnIterator and CellIterator.
* Performance improvements for Table "forks". (where, limit, etc)
* TableSet keys are now converted to row names during aggregation. (#291)
* Removed fancy __repr__ implementations. Use __str__ instead. (#290)
* Rows can now be accessed by name as well as index. (#282)
* Added row_names argument to Table constructor. (#282)
* Removed Row.table and Row.index properties. (#287)
* Columns can now be accessed by index as well as name. (#281)
* Added column name and type validation to Table constructor. (#285)
* Table now supports variable-length rows during construction. (#39)
* aggregations.Summary implemented for generic aggregations. (#181)
* Fix TableSet.key_type being lost after proxying Table methods. (#278)
* Massive performance increases for joins. (#277)
* Added join benchmark. (#73)
0.11.0 - October 6, 2015
------------------------
* Implemented __repr__ for Table, TableSet, Column and Row. (#261)
* Row.index property added.
* Column constructor no longer takes a data_type argument.
* Column.index and Column.name properties added.
* Table.counts implemented. (#271)
* Table.bins implemented. (#267, #227)
* Table.join now raises ColumnDoesNotExistError. (#264)
* Table.select now raises ColumnDoesNotExistError.
* computations.ZScores moved into agate-stats.
* computations.Rank cmp argument renamed comparer.
* aggregations.MaxPrecision added. (#265)
* Table.print_bars added.
* Table.pretty_print renamed Table.print_table.
* Reimplement Table method proxying via @allow_tableset_proxy decorator. (#263)
* Add agate-stats references to docs.
* Move stdev_outliers, mad_outliers and pearson_correlation into agate-stats. (#260)
* Prevent issues with applying patches multiple times. (#258)
0.10.0 - September 22, 2015
---------------------------
* Add reverse and cmp arguments to Rank computation. (#248)
* Document how to use agate-sql to read/write SQL tables. (#238, #241)
* Document how to write extensions.
* Add monkeypatching extensibility pattern via utils.Patchable.
* Reversed order of argument pairs for Table.compute. (#249)
* TableSet.merge method can be used to ungroup data. (#253)
* Columns with identical names are now suffixed "2" after a Table.join.
* Duplicate key columns are no longer included in the result of a Table.join. (#250)
* Table.join right_key no longer necessary if identical to left_key. (#254)
* Table.inner_join is now more. Use `inner` keyword to Table.join.
* Table.left_outer_join is now Table.join.
0.9.0 - September 14, 2015
--------------------------
* Add many missing unit tests. Up to 99% coverage.
* Add property accessors for TableSet.key_name and TableSet.key_type. (#247)
* Table.rows and Table.columns are now behind properties. (#247)
* Column.data_type is now a property. (#247)
* Table[Set].get_column_types() is now the Table[Set].column_types property. (#247)
* Table[Set].get_column_names() is now the Table[Set].column_names property. (#247)
* Table.pretty_print now displays consistent decimal places for each Number column.
* Discrete data types (Number, Date etc) are now right-aligned in Table.pretty_print.
* Implement aggregation result caching. (#245)
* Reimplement Percentiles, Quartiles, etc as aggregations.
* UnsupportedAggregationError is now used to disable TableSet aggregations.
* Replaced several exceptions with more general DataTypeError.
* Column type information can now be accessed as Column.data_type.
* Eliminated Column subclasses. Restructured around DataType classes.
* Table.merge implemented. (#9)
* Cookbook: guess column types. (#230)
* Fix issue where all group keys were being cast to text. (#235)
* Table.group_by will now default key_type to the type of the grouping column. (#234)
* Add Matt Riggott to AUTHORS. (#231)
* Support file-like objects in Table.to_csv and Table.from_csv. (#229)
* Fix bug when applying multiple computations with Table.compute.
0.8.0 - September 9, 2015
-------------------------
* Cookbook: dealing with locales. (#220)
* Cookbook: working with dates and times.
* Add timezone support to DateTimeType.
* Use pytimeparse instead of python-dateutil. (#221)
* Handle percents and currency symbols when casting numbers. (#217)
* Table.format is now Table.pretty_print. (#223)
* Rename TextType to Text, NumberType to Number, etc.
* Rename agate.ColumnType to agate.DataType (#216)
* Rename agate.column_types to agate.data_types.
* Implement locale support for number parsing. (#116)
* Cookbook: ranking. (#110)
* Cookbook: date change and date ranking. (#113)
* Add tests for unicode support. (#138)
* Fix computations.ZScores calculation. (#123)
* Differentiate sample and population variance and stdev. (#208)
* Support for overriding column inference with "force".
* Competition ranking implemented as default. (#125)
* TypeTester: robust type inference. (#210)
0.7.0 - September 3, 2015
-------------------------
* Cookbook: USA Today diversity index.
* Cookbook: filter to top x%. (#47)
* Cookbook: fuzzy string search example. (#176)
* Values to coerce to true/false can now be overridden for BooleanType.
* Values to coerce to null can now be overridden for all ColumnType subclasses. (#206)
* Add key_type argument to TableSet and Table.group_by. (#205)
* Nested TableSet's and multi-dimensional aggregates. (#204)
* TableSet.aggregate will now use key_name as the group column name. (#203)
* Added key_name argument to TableSet and Table.group_by.
* Added Length aggregation and removed count from TableSet.aggregate output. (#203)
* Fix error messages for RowDoesNotExistError and ColumnDoesNotExistError.
0.6.0 - September 1, 2015
-------------------------
* Fix missing package definition in setup.py.
* Split Analysis off into the proof library.
* Change computation now works with DateType, DateTimeType and TimeDeltaType. (#159)
* TimeDeltaType and TimeDeltaColumn implemented.
* NonNullAggregation class removed.
* Some private Column methods made public. (#183)
* Rename agate.aggegators to agate.aggregations.
* TableSet.to_csv implemented. (#195)
* TableSet.from_csv implemented. (#194)
* Table.to_csv implemented (#169)
* Table.from_csv implemented. (#168)
* Added Table.format method for pretty-printing tables. (#191)
* Analysis class now implements a caching workflow. (#171)
0.5.0 - August 28, 2015
-----------------------
* Table now takes (column_name, column_type) pairs. (#180)
* Renamed the library to agate. (#179)
* Results of common column operations are now cached using a common memoize decorator. (#162)
* ated support for Python version 3.2.
* Added support for Python wheel packaging. (#127)
* Add PercentileRank computation and usage example to cookbook. (#152)
* Add indexed change example to cookbook. (#151)
* Add annual change example to cookbook. (#150)
* Column.aggregate now invokes Aggregations.
* Column.any, NumberColumn.sum, etc. converted to Aggregations.
* Implement Aggregation and subclasses. (#155)
* Move ColumnType subclasses and ColumnOperation subclasses into new modules.
* Table.percent_change, Table.rank and Table.zscores reimplemented as Computers.
* Computer implemented. Table.compute reimplemented. (#147)
* NumberColumn.iqr (inter-quartile range) implemented. (#102)
* Remove Column.counts as it is not the best way.
* Implement ColumnOperation and subclasses.
* Table.aggregate migrated to TableSet.aggregate.
* Table.group_by now supports grouping by a key function. (#140)
* NumberColumn.deciles implemented.
* NumberColumn.quintiles implemented. (#46)
* NumberColumn.quartiles implemented. (#45)
* Added robust test case for NumberColumn.percentiles. (#129)
* NumberColumn.percentiles reimplemented using new method. (#130)
* Reorganized and modularized column implementations.
* Table.group_by now returns a TableSet.
* Implement TableSet object. (#141)
0.4.0 - September 27, 2014
--------------------------
* Upgrade to python-dateutil 2.2. (#134)
* Wrote introductory tutorial. (#133)
* Reorganize documentation (#132)
* Add John Heasly to AUTHORS.
* Implement percentile. (#35)
* no_null_computations now accepts args. (#122)
* Table.z_scores implemented. (#123)
* DateTimeColumn implemented. (#23)
* Column.counts now returns dict instead of Table. (#109)
* ColumnType.create_column renamed _create_column. (#118)
* Added Mick O'Brien to AUTHORS. (#121)
* Pearson correlation implemented. (#103)
0.3.0
-----
* DateType.date_format implemented. (#112)
* Create ColumnType classes to simplify data parsing.
* DateColumn implemented. (#7)
* Cookbook: Excel pivot tables. (#41)
* Cookbook: statistics, including outlier detection. (#82)
* Cookbook: emulating Underscore's any and all. (#107)
* Parameter documention for method parameters. (#108)
* Table.rank now accepts a column name or key function.
* Optionally use cdecimal for improved performance. (#106)
* Smart naming of aggregate columns.
* Duplicate columns names are now an error. (#92)
* BooleanColumn implemented. (#6)
* TextColumn.max_length implemented. (#95)
* Table.find implemented. (#14)
* Better error handling in Table.__init__. (#38)
* Collapse IntColumn and FloatColumn into NumberColumn. (#64)
* Table.mad_outliers implemented. (#93)
* Column.mad implemented. (#93)
* Table.stdev_outliers implemented. (#86)
* Table.group_by implemented. (#3)
* Cookbook: emulating R. (#81)
* Table.left_outer_join now accepts column names or key functions. (#80)
* Table.inner_join now accepts column names or key functions. (#80)
* Table.distinct now accepts a column name or key function. (#80)
* Table.order_by now accepts a column name or key function. (#80)
* Table.rank implemented. (#15)
* Reached 100% test coverage. (#76)
* Tests for Column._cast methods. (#20)
* Table.distinct implemented. (#83)
* Use assertSequenceEqual in tests. (#84)
* Docs: features section. (#87)
* Cookbook: emulating SQL. (#79)
* Table.left_outer_join implemented. (#11)
* Table.inner_join implemented. (#11)
0.2.0
-----
* Python 3.2, 3.3 and 3.4 support. (#52)
* Documented supported platforms.
* Cookbook: csvkit. (#36)
* Cookbook: glob syntax. (#28)
* Cookbook: filter to values in range. (#30)
* RowDoesNotExistError implemented. (#70)
* ColumnDoesNotExistError implemented. (#71)
* Cookbook: percent change. (#67)
* Cookbook: sampleing. (#59)
* Cookbook: random sort order. (#68)
* Eliminate Table.get_data.
* Use tuples everywhere. (#66)
* Fixes for Python 2.6 compatibility. (#53)
* Cookbook: multi-column sorting. (#13)
* Cookbook: simple sorting.
* Destructive Table ops now deepcopy row data. (#63)
* Non-destructive Table ops now share row data. (#63)
* Table.sort_by now accepts a function. (#65)
* Cookbook: pygal.
* Cookbook: Matplotlib.
* Cookbook: VLOOKUP. (#40)
* Cookbook: Excel formulas. (#44)
* Cookbook: Rounding to two decimal places. (#49)
* Better repr for Column and Row. (#56)
* Cookbook: Filter by regex. (#27)
* Cookbook: Underscore filter & reject. (#57)
* Table.limit implemented. (#58)
* Cookbook: writing a CSV. (#51)
* Kill Table.filter and Table.reject. (#55)
* Column.map removed. (#43)
* Column instance & data caching implemented. (#42)
* Table.select implemented. (#32)
* Eliminate repeated column index lookups. (#25)
* Precise DecimalColumn tests.
* Use Decimal type everywhere internally.
* FloatColumn converted to DecimalColumn. (#17)
* Added Eric Sagara to AUTHORS. (#48)
* NumberColumn.variance implemented. (#1)
* Cookbook: loading a CSV. (#37)
* Table.percent_change implemented. (#16)
* Table.compute implemented. (#31)
* Table.filter and Table.reject now take funcs. (#24)
* Column.count implemented. (#12)
* Column.counts implemented. (#8)
* Column.all implemented. (#5)
* Column.any implemented. (#4)
* Added Jeff Larson to AUTHORS. (#18)
* NumberColumn.mode implmented. (#18)
0.1.0
-----
* Initial prototype
agate-1.6.3/COPYING 0000664 0000000 0000000 00000002113 14074061410 0013627 0 ustar 00root root 0000000 0000000 The MIT License
Copyright (c) 2017 Christopher Groskopf and contributors
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
agate-1.6.3/MANIFEST.in 0000664 0000000 0000000 00000000615 14074061410 0014337 0 ustar 00root root 0000000 0000000 include *.ipynb
include *.py
include *.rst
include COPYING
recursive-include benchmarks *.py
recursive-include docs *.py
recursive-include docs *.rst
recursive-include docs *.svg
recursive-include docs Makefile
recursive-include examples *.csv
recursive-include examples *.json
recursive-include examples testfixed
recursive-include tests *.py
exclude .pre-commit-config.yaml
global-exclude *.pyc
agate-1.6.3/README.rst 0000664 0000000 0000000 00000001627 14074061410 0014274 0 ustar 00root root 0000000 0000000 .. image:: https://travis-ci.org/wireservice/agate.png
:target: https://travis-ci.org/wireservice/agate
:alt: Build status
.. image:: https://img.shields.io/pypi/v/agate.svg
:target: https://pypi.python.org/pypi/agate
:alt: Version
.. image:: https://img.shields.io/pypi/l/agate.svg
:target: https://pypi.python.org/pypi/agate
:alt: License
.. image:: https://img.shields.io/pypi/pyversions/agate.svg
:target: https://pypi.python.org/pypi/agate
:alt: Support Python versions
agate is a Python data analysis library that is optimized for humans instead of machines. It is an alternative to numpy and pandas that solves real-world problems with readable code.
agate was previously known as journalism.
Important links:
* Documentation: http://agate.rtfd.org
* Repository: https://github.com/wireservice/agate
* Issues: https://github.com/wireservice/agate/issues
agate-1.6.3/agate/ 0000775 0000000 0000000 00000000000 14074061410 0013660 5 ustar 00root root 0000000 0000000 agate-1.6.3/agate/__init__.py 0000664 0000000 0000000 00000001365 14074061410 0015776 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
import six
from agate.aggregations import *
from agate.columns import Column
from agate.computations import *
from agate.config import get_option, set_option, set_options
from agate.data_types import *
from agate.exceptions import *
# import agate.fixed as fixed
from agate.mapped_sequence import MappedSequence
from agate.rows import Row
from agate.table import Table
from agate.tableset import TableSet
from agate.testcase import AgateTestCase
from agate.type_tester import TypeTester
from agate.utils import *
from agate.warns import DuplicateColumnWarning, NullCalculationWarning, warn_duplicate_column, warn_null_calculation
if six.PY2: # pragma: no cover
import agate.csv_py2 as csv
else:
import agate.csv_py3 as csv
agate-1.6.3/agate/aggregations/ 0000775 0000000 0000000 00000000000 14074061410 0016332 5 ustar 00root root 0000000 0000000 agate-1.6.3/agate/aggregations/__init__.py 0000664 0000000 0000000 00000003403 14074061410 0020443 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
"""
Aggregations create a new value by summarizing a :class:`.Column`. For
example, :class:`.Mean`, when applied to a column containing :class:`.Number`
data, returns a single :class:`decimal.Decimal` value which is the average of
all values in that column.
Aggregations can be applied to single columns using the :meth:`.Table.aggregate`
method. The result is a single value if a one aggregation was applied, or
a tuple of values if a sequence of aggregations was applied.
Aggregations can be applied to instances of :class:`.TableSet` using the
:meth:`.TableSet.aggregate` method. The result is a new :class:`.Table`
with a column for each aggregation and a row for each table in the set.
"""
from agate.aggregations.all import All
from agate.aggregations.any import Any
from agate.aggregations.base import Aggregation
from agate.aggregations.count import Count
from agate.aggregations.deciles import Deciles
from agate.aggregations.first import First
from agate.aggregations.has_nulls import HasNulls
from agate.aggregations.iqr import IQR
from agate.aggregations.mad import MAD
from agate.aggregations.max import Max
from agate.aggregations.max_length import MaxLength
from agate.aggregations.max_precision import MaxPrecision
from agate.aggregations.mean import Mean
from agate.aggregations.median import Median
from agate.aggregations.min import Min
from agate.aggregations.mode import Mode
from agate.aggregations.percentiles import Percentiles
from agate.aggregations.quartiles import Quartiles
from agate.aggregations.quintiles import Quintiles
from agate.aggregations.stdev import PopulationStDev, StDev
from agate.aggregations.sum import Sum
from agate.aggregations.summary import Summary
from agate.aggregations.variance import PopulationVariance, Variance
agate-1.6.3/agate/aggregations/all.py 0000664 0000000 0000000 00000002013 14074061410 0017450 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
from agate.aggregations.base import Aggregation
from agate.data_types import Boolean
class All(Aggregation):
"""
Check if all values in a column pass a test.
:param column_name:
The name of the column to check.
:param test:
Either a single value that all values in the column are compared against
(for equality) or a function that takes a column value and returns
`True` or `False`.
"""
def __init__(self, column_name, test):
self._column_name = column_name
if callable(test):
self._test = test
else:
self._test = lambda d: d == test
def get_aggregate_data_type(self, table):
return Boolean()
def validate(self, table):
table.columns[self._column_name]
def run(self, table):
"""
:returns:
:class:`bool`
"""
column = table.columns[self._column_name]
data = column.values()
return all(self._test(d) for d in data)
agate-1.6.3/agate/aggregations/any.py 0000664 0000000 0000000 00000001710 14074061410 0017472 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
from agate.aggregations.base import Aggregation
from agate.data_types import Boolean
class Any(Aggregation):
"""
Check if any value in a column passes a test.
:param column_name:
The name of the column to check.
:param test:
Either a single value that all values in the column are compared against
(for equality) or a function that takes a column value and returns
`True` or `False`.
"""
def __init__(self, column_name, test):
self._column_name = column_name
if callable(test):
self._test = test
else:
self._test = lambda d: d == test
def get_aggregate_data_type(self, table):
return Boolean()
def validate(self, table):
table.columns[self._column_name]
def run(self, table):
column = table.columns[self._column_name]
data = column.values()
return any(self._test(d) for d in data)
agate-1.6.3/agate/aggregations/base.py 0000664 0000000 0000000 00000003251 14074061410 0017617 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
import six
from agate.exceptions import UnsupportedAggregationError
@six.python_2_unicode_compatible
class Aggregation(object): # pragma: no cover
"""
Aggregations create a new value by summarizing a :class:`.Column`.
Aggregations are applied with :meth:`.Table.aggregate` and
:meth:`.TableSet.aggregate`.
When creating a custom aggregation, ensure that the values returned by
:meth:`.Aggregation.run` are of the type specified by
:meth:`.Aggregation.get_aggregate_data_type`. This can be ensured by using
the :meth:`.DataType.cast` method. See :class:`.Summary` for an example.
"""
def __str__(self):
"""
String representation of this column. May be used as a column name in
generated tables.
"""
return self.__class__.__name__
def get_aggregate_data_type(self, table):
"""
Get the data type that should be used when using this aggregation with
a :class:`.TableSet` to produce a new column.
Should raise :class:`.UnsupportedAggregationError` if this column does
not support aggregation into a :class:`.TableSet`. (For example, if it
does not return a single value.)
"""
raise UnsupportedAggregationError()
def validate(self, table):
"""
Perform any checks necessary to verify this aggregation can run on the
provided table without errors. This is called by
:meth:`.Table.aggregate` before :meth:`run`.
"""
pass
def run(self, table):
"""
Execute this aggregation on a given column and return the result.
"""
raise NotImplementedError()
agate-1.6.3/agate/aggregations/count.py 0000664 0000000 0000000 00000002423 14074061410 0020035 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
from agate.aggregations.base import Aggregation
from agate.data_types import Number
from agate.utils import default
class Count(Aggregation):
"""
Count occurences of a value or values.
This aggregation can be used in three ways:
1. If no arguments are specified, then it will count the number of rows in the table.
2. If only :code:`column_name` is specified, then it will count the number of non-null values in that column.
3. If both :code:`column_name` and :code:`value` are specified, then it will count occurrences of a specific value.
:param column_name:
The column containing the values to be counted.
:param value:
Any value to be counted, including :code:`None`.
"""
def __init__(self, column_name=None, value=default):
self._column_name = column_name
self._value = value
def get_aggregate_data_type(self, table):
return Number()
def run(self, table):
if self._column_name is not None:
if self._value is not default:
return table.columns[self._column_name].values().count(self._value)
else:
return len(table.columns[self._column_name].values_without_nulls())
else:
return len(table.rows)
agate-1.6.3/agate/aggregations/deciles.py 0000664 0000000 0000000 00000002777 14074061410 0020331 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
from agate.aggregations.base import Aggregation
from agate.aggregations.has_nulls import HasNulls
from agate.aggregations.percentiles import Percentiles
from agate.data_types import Number
from agate.exceptions import DataTypeError
from agate.utils import Quantiles
from agate.warns import warn_null_calculation
class Deciles(Aggregation):
"""
Calculate the deciles of a column based on its percentiles.
Deciles will be equivalent to the 10th, 20th ... 90th percentiles.
"Zeroth" (min value) and "Tenth" (max value) deciles are included for
reference and intuitive indexing.
See :class:`Percentiles` for implementation details.
This aggregation can not be applied to a :class:`.TableSet`.
:param column_name:
The name of a column containing :class:`.Number` data.
"""
def __init__(self, column_name):
self._column_name = column_name
def validate(self, table):
column = table.columns[self._column_name]
if not isinstance(column.data_type, Number):
raise DataTypeError('Deciles can only be applied to columns containing Number data.')
has_nulls = HasNulls(self._column_name).run(table)
if has_nulls:
warn_null_calculation(self, column)
def run(self, table):
"""
:returns:
An instance of :class:`Quantiles`.
"""
percentiles = Percentiles(self._column_name).run(table)
return Quantiles([percentiles[i] for i in range(0, 101, 10)])
agate-1.6.3/agate/aggregations/first.py 0000664 0000000 0000000 00000002337 14074061410 0020040 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
from agate.aggregations.base import Aggregation
class First(Aggregation):
"""
Returns the first value that passes a test.
If the test is omitted, the aggregation will return the first value in the column.
If no values pass the test, the aggregation will raise an exception.
:param column_name:
The name of the column to check.
:param test:
A function that takes a value and returns `True` or `False`. Test may be
omitted when checking :class:`.Boolean` data.
"""
def __init__(self, column_name, test=None):
self._column_name = column_name
self._test = test
def get_aggregate_data_type(self, table):
return table.columns[self._column_name].data_type
def validate(self, table):
column = table.columns[self._column_name]
data = column.values()
if self._test is not None and len([d for d in data if self._test(d)]) == 0:
raise ValueError('No values pass the given test.')
def run(self, table):
column = table.columns[self._column_name]
data = column.values()
if self._test is None:
return data[0]
return next((d for d in data if self._test(d)))
agate-1.6.3/agate/aggregations/has_nulls.py 0000664 0000000 0000000 00000000774 14074061410 0020704 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
from agate.aggregations.base import Aggregation
from agate.data_types import Boolean
class HasNulls(Aggregation):
"""
Check if the column contains null values.
:param column_name:
The name of the column to check.
"""
def __init__(self, column_name):
self._column_name = column_name
def get_aggregate_data_type(self, table):
return Boolean()
def run(self, table):
return None in table.columns[self._column_name].values()
agate-1.6.3/agate/aggregations/iqr.py 0000664 0000000 0000000 00000002340 14074061410 0017476 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
from agate.aggregations.base import Aggregation
from agate.aggregations.has_nulls import HasNulls
from agate.aggregations.percentiles import Percentiles
from agate.data_types import Number
from agate.exceptions import DataTypeError
from agate.warns import warn_null_calculation
class IQR(Aggregation):
"""
Calculate the interquartile range of a column.
:param column_name:
The name of a column containing :class:`.Number` data.
"""
def __init__(self, column_name):
self._column_name = column_name
self._percentiles = Percentiles(column_name)
def get_aggregate_data_type(self, table):
return Number()
def validate(self, table):
column = table.columns[self._column_name]
if not isinstance(column.data_type, Number):
raise DataTypeError('IQR can only be applied to columns containing Number data.')
has_nulls = HasNulls(self._column_name).run(table)
if has_nulls:
warn_null_calculation(self, column)
def run(self, table):
percentiles = self._percentiles.run(table)
if percentiles[75] is not None and percentiles[25] is not None:
return percentiles[75] - percentiles[25]
agate-1.6.3/agate/aggregations/mad.py 0000664 0000000 0000000 00000002533 14074061410 0017450 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
from agate.aggregations.base import Aggregation
from agate.aggregations.has_nulls import HasNulls
from agate.aggregations.median import Median
from agate.data_types import Number
from agate.exceptions import DataTypeError
from agate.utils import median
from agate.warns import warn_null_calculation
class MAD(Aggregation):
"""
Calculate the `median absolute deviation `_
of a column.
:param column_name:
The name of a column containing :class:`.Number` data.
"""
def __init__(self, column_name):
self._column_name = column_name
self._median = Median(column_name)
def get_aggregate_data_type(self, table):
return Number()
def validate(self, table):
column = table.columns[self._column_name]
if not isinstance(column.data_type, Number):
raise DataTypeError('MAD can only be applied to columns containing Number data.')
has_nulls = HasNulls(self._column_name).run(table)
if has_nulls:
warn_null_calculation(self, column)
def run(self, table):
column = table.columns[self._column_name]
data = column.values_without_nulls_sorted()
if data:
m = self._median.run(table)
return median(tuple(abs(n - m) for n in data))
agate-1.6.3/agate/aggregations/max.py 0000664 0000000 0000000 00000002247 14074061410 0017476 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
from agate.aggregations.base import Aggregation
from agate.data_types import Date, DateTime, Number, TimeDelta
from agate.exceptions import DataTypeError
class Max(Aggregation):
"""
Find the maximum value in a column.
This aggregation can be applied to columns containing :class:`.Date`,
:class:`.DateTime`, or :class:`.Number` data.
:param column_name:
The name of the column to be searched.
"""
def __init__(self, column_name):
self._column_name = column_name
def get_aggregate_data_type(self, table):
column = table.columns[self._column_name]
if isinstance(column.data_type, (Date, DateTime, Number, TimeDelta)):
return column.data_type
def validate(self, table):
column = table.columns[self._column_name]
if not isinstance(column.data_type, (Date, DateTime, Number, TimeDelta)):
raise DataTypeError('Min can only be applied to columns containing DateTime, Date or Number data.')
def run(self, table):
column = table.columns[self._column_name]
data = column.values_without_nulls()
if data:
return max(data)
agate-1.6.3/agate/aggregations/max_length.py 0000664 0000000 0000000 00000002350 14074061410 0021032 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
from decimal import Decimal
from agate.aggregations.base import Aggregation
from agate.data_types import Number, Text
from agate.exceptions import DataTypeError
class MaxLength(Aggregation):
"""
Find the length of the longest string in a column.
Note: On Python 2.7 this function may miscalcuate the length of unicode
strings that contain "wide characters". For details see this StackOverflow
answer: http://stackoverflow.com/a/35462951
:param column_name:
The name of a column containing :class:`.Text` data.
"""
def __init__(self, column_name):
self._column_name = column_name
def get_aggregate_data_type(self, table):
return Number()
def validate(self, table):
column = table.columns[self._column_name]
if not isinstance(column.data_type, Text):
raise DataTypeError('MaxLength can only be applied to columns containing Text data.')
def run(self, table):
"""
:returns:
:class:`int`.
"""
column = table.columns[self._column_name]
lens = [len(d) for d in column.values_without_nulls()]
if not lens:
return Decimal('0')
return Decimal(max(lens))
agate-1.6.3/agate/aggregations/max_precision.py 0000664 0000000 0000000 00000001613 14074061410 0021545 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
from agate.aggregations.base import Aggregation
from agate.data_types import Number
from agate.exceptions import DataTypeError
from agate.utils import max_precision
class MaxPrecision(Aggregation):
"""
Find the most decimal places present for any value in this column.
:param column_name:
The name of the column to be searched.
"""
def __init__(self, column_name):
self._column_name = column_name
def get_aggregate_data_type(self, table):
return Number()
def validate(self, table):
column = table.columns[self._column_name]
if not isinstance(column.data_type, Number):
raise DataTypeError('MaxPrecision can only be applied to columns containing Number data.')
def run(self, table):
column = table.columns[self._column_name]
return max_precision(column.values_without_nulls())
agate-1.6.3/agate/aggregations/mean.py 0000664 0000000 0000000 00000002310 14074061410 0017620 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
from agate.aggregations.base import Aggregation
from agate.aggregations.has_nulls import HasNulls
from agate.aggregations.sum import Sum
from agate.data_types import Number
from agate.exceptions import DataTypeError
from agate.warns import warn_null_calculation
class Mean(Aggregation):
"""
Calculate the mean of a column.
:param column_name:
The name of a column containing :class:`.Number` data.
"""
def __init__(self, column_name):
self._column_name = column_name
self._sum = Sum(column_name)
def get_aggregate_data_type(self, table):
return Number()
def validate(self, table):
column = table.columns[self._column_name]
if not isinstance(column.data_type, Number):
raise DataTypeError('Mean can only be applied to columns containing Number data.')
has_nulls = HasNulls(self._column_name).run(table)
if has_nulls:
warn_null_calculation(self, column)
def run(self, table):
column = table.columns[self._column_name]
data = column.values_without_nulls()
if data:
sum_total = self._sum.run(table)
return sum_total / len(data)
agate-1.6.3/agate/aggregations/median.py 0000664 0000000 0000000 00000002346 14074061410 0020146 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
from agate.aggregations.base import Aggregation
from agate.aggregations.has_nulls import HasNulls
from agate.aggregations.percentiles import Percentiles
from agate.data_types import Number
from agate.exceptions import DataTypeError
from agate.warns import warn_null_calculation
class Median(Aggregation):
"""
Calculate the median of a column.
Median is equivalent to the 50th percentile. See :class:`Percentiles`
for implementation details.
:param column_name:
The name of a column containing :class:`.Number` data.
"""
def __init__(self, column_name):
self._column_name = column_name
self._percentiles = Percentiles(column_name)
def get_aggregate_data_type(self, table):
return Number()
def validate(self, table):
column = table.columns[self._column_name]
if not isinstance(column.data_type, Number):
raise DataTypeError('Median can only be applied to columns containing Number data.')
has_nulls = HasNulls(self._column_name).run(table)
if has_nulls:
warn_null_calculation(self, column)
def run(self, table):
percentiles = self._percentiles.run(table)
return percentiles[50]
agate-1.6.3/agate/aggregations/min.py 0000664 0000000 0000000 00000002247 14074061410 0017474 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
from agate.aggregations.base import Aggregation
from agate.data_types import Date, DateTime, Number, TimeDelta
from agate.exceptions import DataTypeError
class Min(Aggregation):
"""
Find the minimum value in a column.
This aggregation can be applied to columns containing :class:`.Date`,
:class:`.DateTime`, or :class:`.Number` data.
:param column_name:
The name of the column to be searched.
"""
def __init__(self, column_name):
self._column_name = column_name
def get_aggregate_data_type(self, table):
column = table.columns[self._column_name]
if isinstance(column.data_type, (Date, DateTime, Number, TimeDelta)):
return column.data_type
def validate(self, table):
column = table.columns[self._column_name]
if not isinstance(column.data_type, (Date, DateTime, Number, TimeDelta)):
raise DataTypeError('Min can only be applied to columns containing DateTime, Date or Number data.')
def run(self, table):
column = table.columns[self._column_name]
data = column.values_without_nulls()
if data:
return min(data)
agate-1.6.3/agate/aggregations/mode.py 0000664 0000000 0000000 00000002350 14074061410 0017630 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
from collections import defaultdict
from agate.aggregations.base import Aggregation
from agate.aggregations.has_nulls import HasNulls
from agate.data_types import Number
from agate.exceptions import DataTypeError
from agate.warns import warn_null_calculation
class Mode(Aggregation):
"""
Calculate the mode of a column.
:param column_name:
The name of a column containing :class:`.Number` data.
"""
def __init__(self, column_name):
self._column_name = column_name
def get_aggregate_data_type(self, table):
return Number()
def validate(self, table):
column = table.columns[self._column_name]
if not isinstance(column.data_type, Number):
raise DataTypeError('Sum can only be applied to columns containing Number data.')
has_nulls = HasNulls(self._column_name).run(table)
if has_nulls:
warn_null_calculation(self, column)
def run(self, table):
column = table.columns[self._column_name]
data = column.values_without_nulls()
if data:
state = defaultdict(int)
for n in data:
state[n] += 1
return max(state.keys(), key=lambda x: state[x])
agate-1.6.3/agate/aggregations/percentiles.py 0000664 0000000 0000000 00000004421 14074061410 0021222 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
import math
from agate.aggregations.base import Aggregation
from agate.aggregations.has_nulls import HasNulls
from agate.data_types import Number
from agate.exceptions import DataTypeError
from agate.utils import Quantiles
from agate.warns import warn_null_calculation
class Percentiles(Aggregation):
"""
Divide a column into 100 equal-size groups using the "CDF" method.
See `this explanation `_
of the various methods for computing percentiles.
"Zeroth" (min value) and "Hundredth" (max value) percentiles are included
for reference and intuitive indexing.
A reference implementation was provided by
`pycalcstats `_.
This aggregation can not be applied to a :class:`.TableSet`.
:param column_name:
The name of a column containing :class:`.Number` data.
"""
def __init__(self, column_name):
self._column_name = column_name
def validate(self, table):
column = table.columns[self._column_name]
if not isinstance(column.data_type, Number):
raise DataTypeError('Percentiles can only be applied to columns containing Number data.')
has_nulls = HasNulls(self._column_name).run(table)
if has_nulls:
warn_null_calculation(self, column)
def run(self, table):
"""
:returns:
An instance of :class:`Quantiles`.
"""
column = table.columns[self._column_name]
data = column.values_without_nulls_sorted()
if not data:
return Quantiles([None for percentile in range(101)])
# Zeroth percentile is first datum
quantiles = [data[0]]
for percentile in range(1, 100):
k = len(data) * (float(percentile) / 100)
low = max(1, int(math.ceil(k)))
high = min(len(data), int(math.floor(k + 1)))
# No remainder
if low == high:
value = data[low - 1]
# Remainder
else:
value = (data[low - 1] + data[high - 1]) / 2
quantiles.append(value)
# Hundredth percentile is final datum
quantiles.append(data[-1])
return Quantiles(quantiles)
agate-1.6.3/agate/aggregations/quartiles.py 0000664 0000000 0000000 00000003014 14074061410 0020713 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
from agate.aggregations.base import Aggregation
from agate.aggregations.has_nulls import HasNulls
from agate.aggregations.percentiles import Percentiles
from agate.data_types import Number
from agate.exceptions import DataTypeError
from agate.utils import Quantiles
from agate.warns import warn_null_calculation
class Quartiles(Aggregation):
"""
Calculate the quartiles of column based on its percentiles.
Quartiles will be equivalent to the the 25th, 50th and 75th percentiles.
"Zeroth" (min value) and "Fourth" (max value) quartiles are included for
reference and intuitive indexing.
See :class:`Percentiles` for implementation details.
This aggregation can not be applied to a :class:`.TableSet`.
:param column_name:
The name of a column containing :class:`.Number` data.
"""
def __init__(self, column_name):
self._column_name = column_name
def validate(self, table):
column = table.columns[self._column_name]
if not isinstance(column.data_type, Number):
raise DataTypeError('Quartiles can only be applied to columns containing Number data.')
has_nulls = HasNulls(self._column_name).run(table)
if has_nulls:
warn_null_calculation(self, column)
def run(self, table):
"""
:returns:
An instance of :class:`Quantiles`.
"""
percentiles = Percentiles(self._column_name).run(table)
return Quantiles([percentiles[i] for i in range(0, 101, 25)])
agate-1.6.3/agate/aggregations/quintiles.py 0000664 0000000 0000000 00000003017 14074061410 0020722 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
from agate.aggregations.base import Aggregation
from agate.aggregations.has_nulls import HasNulls
from agate.aggregations.percentiles import Percentiles
from agate.data_types import Number
from agate.exceptions import DataTypeError
from agate.utils import Quantiles
from agate.warns import warn_null_calculation
class Quintiles(Aggregation):
"""
Calculate the quintiles of a column based on its percentiles.
Quintiles will be equivalent to the 20th, 40th, 60th and 80th percentiles.
"Zeroth" (min value) and "Fifth" (max value) quintiles are included for
reference and intuitive indexing.
See :class:`Percentiles` for implementation details.
This aggregation can not be applied to a :class:`.TableSet`.
:param column_name:
The name of a column containing :class:`.Number` data.
"""
def __init__(self, column_name):
self._column_name = column_name
def validate(self, table):
column = table.columns[self._column_name]
if not isinstance(column.data_type, Number):
raise DataTypeError('Quintiles can only be applied to columns containing Number data.')
has_nulls = HasNulls(self._column_name).run(table)
if has_nulls:
warn_null_calculation(self, column)
def run(self, table):
"""
:returns:
An instance of :class:`Quantiles`.
"""
percentiles = Percentiles(self._column_name).run(table)
return Quantiles([percentiles[i] for i in range(0, 101, 20)])
agate-1.6.3/agate/aggregations/stdev.py 0000664 0000000 0000000 00000004350 14074061410 0020033 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
from agate.aggregations import Aggregation
from agate.aggregations.has_nulls import HasNulls
from agate.aggregations.variance import PopulationVariance, Variance
from agate.data_types import Number
from agate.exceptions import DataTypeError
from agate.warns import warn_null_calculation
class StDev(Aggregation):
"""
Calculate the sample standard of deviation of a column.
For the population standard of deviation see :class:`.PopulationStDev`.
:param column_name:
The name of a column containing :class:`.Number` data.
"""
def __init__(self, column_name):
self._column_name = column_name
self._variance = Variance(column_name)
def get_aggregate_data_type(self, table):
return Number()
def validate(self, table):
column = table.columns[self._column_name]
if not isinstance(column.data_type, Number):
raise DataTypeError('StDev can only be applied to columns containing Number data.')
has_nulls = HasNulls(self._column_name).run(table)
if has_nulls:
warn_null_calculation(self, column)
def run(self, table):
variance = self._variance.run(table)
if variance is not None:
return variance.sqrt()
class PopulationStDev(StDev):
"""
Calculate the population standard of deviation of a column.
For the sample standard of deviation see :class:`.StDev`.
:param column_name:
The name of a column containing :class:`.Number` data.
"""
def __init__(self, column_name):
self._column_name = column_name
self._population_variance = PopulationVariance(column_name)
def get_aggregate_data_type(self, table):
return Number()
def validate(self, table):
column = table.columns[self._column_name]
if not isinstance(column.data_type, Number):
raise DataTypeError('PopulationStDev can only be applied to columns containing Number data.')
has_nulls = HasNulls(self._column_name).run(table)
if has_nulls:
warn_null_calculation(self, column)
def run(self, table):
variance = self._population_variance.run(table)
if variance is not None:
return variance.sqrt()
agate-1.6.3/agate/aggregations/sum.py 0000664 0000000 0000000 00000002136 14074061410 0017512 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
import datetime
from agate.aggregations.base import Aggregation
from agate.data_types import Number, TimeDelta
from agate.exceptions import DataTypeError
class Sum(Aggregation):
"""
Calculate the sum of a column.
:param column_name:
The name of a column containing :class:`.Number` data.
"""
def __init__(self, column_name):
self._column_name = column_name
def get_aggregate_data_type(self, table):
column = table.columns[self._column_name]
if isinstance(column.data_type, (Number, TimeDelta)):
return column.data_type
def validate(self, table):
column = table.columns[self._column_name]
if not isinstance(column.data_type, (Number, TimeDelta)):
raise DataTypeError('Sum can only be applied to columns containing Number or TimeDelta data.')
def run(self, table):
column = table.columns[self._column_name]
start = 0
if isinstance(column.data_type, TimeDelta):
start = datetime.timedelta()
return sum(column.values_without_nulls(), start)
agate-1.6.3/agate/aggregations/summary.py 0000664 0000000 0000000 00000002034 14074061410 0020400 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
from agate.aggregations.base import Aggregation
class Summary(Aggregation):
"""
Apply an arbitrary function to a column.
:param column_name:
The name of a column to be summarized.
:param data_type:
The return type of this aggregation.
:param func:
A function which will be passed the column for processing.
:param cast:
If :code:`True`, each return value will be cast to the specified
:code:`data_type` to ensure it is valid. Only disable this if you are
certain your summary always returns the correct type.
"""
def __init__(self, column_name, data_type, func, cast=True):
self._column_name = column_name
self._data_type = data_type
self._func = func
self._cast = cast
def get_aggregate_data_type(self, table):
return self._data_type
def run(self, table):
v = self._func(table.columns[self._column_name])
if self._cast:
v = self._data_type.cast(v)
return v
agate-1.6.3/agate/aggregations/variance.py 0000664 0000000 0000000 00000004536 14074061410 0020504 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
from agate.aggregations.base import Aggregation
from agate.aggregations.has_nulls import HasNulls
from agate.aggregations.mean import Mean
from agate.data_types import Number
from agate.exceptions import DataTypeError
from agate.warns import warn_null_calculation
class Variance(Aggregation):
"""
Calculate the sample variance of a column.
For the population variance see :class:`.PopulationVariance`.
:param column_name:
The name of a column containing :class:`.Number` data.
"""
def __init__(self, column_name):
self._column_name = column_name
self._mean = Mean(column_name)
def get_aggregate_data_type(self, table):
return Number()
def validate(self, table):
column = table.columns[self._column_name]
if not isinstance(column.data_type, Number):
raise DataTypeError('Variance can only be applied to columns containing Number data.')
has_nulls = HasNulls(self._column_name).run(table)
if has_nulls:
warn_null_calculation(self, column)
def run(self, table):
column = table.columns[self._column_name]
data = column.values_without_nulls()
if data:
mean = self._mean.run(table)
return sum((n - mean) ** 2 for n in data) / (len(data) - 1)
class PopulationVariance(Variance):
"""
Calculate the population variance of a column.
For the sample variance see :class:`.Variance`.
:param column_name:
The name of a column containing :class:`.Number` data.
"""
def __init__(self, column_name):
self._column_name = column_name
self._mean = Mean(column_name)
def get_aggregate_data_type(self, table):
return Number()
def validate(self, table):
column = table.columns[self._column_name]
if not isinstance(column.data_type, Number):
raise DataTypeError('PopulationVariance can only be applied to columns containing Number data.')
has_nulls = HasNulls(self._column_name).run(table)
if has_nulls:
warn_null_calculation(self, column)
def run(self, table):
column = table.columns[self._column_name]
data = column.values_without_nulls()
if data:
mean = self._mean.run(table)
return sum((n - mean) ** 2 for n in data) / len(data)
agate-1.6.3/agate/columns.py 0000664 0000000 0000000 00000007223 14074061410 0015716 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
"""
This module contains the :class:`Column` class, which defines a "vertical"
array of tabular data. Whereas :class:`.Row` instances are independent of their
parent :class:`.Table`, columns depend on knowledge of both their position in
the parent (column name, data type) as well as the rows that contain their data.
"""
import six
from agate.mapped_sequence import MappedSequence
from agate.utils import NullOrder, memoize
if six.PY3: # pragma: no cover
# pylint: disable=W0622
xrange = range
def null_handler(k):
"""
Key method for sorting nulls correctly.
"""
if k is None:
return NullOrder()
return k
class Column(MappedSequence):
"""
Proxy access to column data. Instances of :class:`Column` should
not be constructed directly. They are created by :class:`.Table`
instances and are unique to them.
Columns are implemented as subclass of :class:`.MappedSequence`. They
deviate from the underlying implementation in that loading of their data
is deferred until it is needed.
:param name:
The name of this column.
:param data_type:
An instance of :class:`.DataType`.
:param rows:
A :class:`.MappedSequence` that contains the :class:`.Row` instances
containing the data for this column.
:param row_names:
An optional list of row names (keys) for this column.
"""
__slots__ = ['_index', '_name', '_data_type', '_rows', '_row_names']
def __init__(self, index, name, data_type, rows, row_names=None):
self._index = index
self._name = name
self._data_type = data_type
self._rows = rows
self._keys = row_names
def __getstate__(self):
"""
Return state values to be pickled.
This is necessary on Python2.7 when using :code:`__slots__`.
"""
return {
'_index': self._index,
'_name': self._name,
'_data_type': self._data_type,
'_rows': self._rows,
'_keys': self._keys
}
def __setstate__(self, data):
"""
Restore pickled state.
This is necessary on Python2.7 when using :code:`__slots__`.
"""
self._index = data['_index']
self._name = data['_name']
self._data_type = data['_data_type']
self._rows = data['_rows']
self._keys = data['_keys']
@property
def index(self):
"""
This column's index.
"""
return self._index
@property
def name(self):
"""
This column's name.
"""
return self._name
@property
def data_type(self):
"""
This column's data type.
"""
return self._data_type
@memoize
def values(self):
"""
Get the values in this column, as a tuple.
"""
return tuple(row[self._index] for row in self._rows)
@memoize
def values_distinct(self):
"""
Get the distinct values in this column, as a tuple.
"""
return tuple(set(self.values()))
@memoize
def values_without_nulls(self):
"""
Get the values in this column with any null values removed.
"""
return tuple(d for d in self.values() if d is not None)
@memoize
def values_sorted(self):
"""
Get the values in this column sorted.
"""
return sorted(self.values(), key=null_handler)
@memoize
def values_without_nulls_sorted(self):
"""
Get the values in this column with any null values removed and sorted.
"""
return sorted(self.values_without_nulls(), key=null_handler)
agate-1.6.3/agate/computations/ 0000775 0000000 0000000 00000000000 14074061410 0016405 5 ustar 00root root 0000000 0000000 agate-1.6.3/agate/computations/__init__.py 0000664 0000000 0000000 00000002054 14074061410 0020517 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
"""
Computations create a new value for each :class:`.Row` in a :class:`.Table`.
When used with :meth:`.Table.compute` these new values become a new column.
For instance, the :class:`.PercentChange` computation takes two column names as
arguments and computes the percentage change between them for each row.
There are a variety of basic computations, such as :class:`.Change` and
:class:`.Percent`. If none of these meet your needs you can use the
:class:`Formula` computation to apply an arbitrary function to the row.
If this still isn't flexible enough, it's simple to create a custom computation
class by inheriting from :class:`Computation`.
"""
from agate.computations.base import Computation
from agate.computations.change import Change
from agate.computations.formula import Formula
from agate.computations.percent import Percent
from agate.computations.percent_change import PercentChange
from agate.computations.percentile_rank import PercentileRank
from agate.computations.rank import Rank
from agate.computations.slug import Slug
agate-1.6.3/agate/computations/base.py 0000664 0000000 0000000 00000002542 14074061410 0017674 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
import six
@six.python_2_unicode_compatible
class Computation(object): # pragma: no cover
"""
Computations produce a new column by performing a calculation on each row.
Computations are applied with :class:`.TableSet.compute`.
When implementing a custom computation, ensure that the values returned by
:meth:`.Computation.run` are of the type specified by
:meth:`.Computation.get_computed_data_type`. This can be ensured by using
the :meth:`.DataType.cast` method. See :class:`.Formula` for an example.
"""
def __str__(self):
"""
String representation of this column. May be used as a column name in
generated tables.
"""
return self.__class__.__name__
def get_computed_data_type(self, table):
"""
Returns an instantiated :class:`.DataType` which will be appended to
the table.
"""
raise NotImplementedError()
def validate(self, table):
"""
Perform any checks necessary to verify this computation can run on the
provided table without errors. This is called by :meth:`.Table.compute`
before :meth:`run`.
"""
pass
def run(self, table):
"""
When invoked with a table, returns a sequence of new column values.
"""
raise NotImplementedError()
agate-1.6.3/agate/computations/change.py 0000664 0000000 0000000 00000004715 14074061410 0020213 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
from agate.aggregations.has_nulls import HasNulls
from agate.computations.base import Computation
from agate.data_types import Date, DateTime, Number, TimeDelta
from agate.exceptions import DataTypeError
from agate.warns import warn_null_calculation
class Change(Computation):
"""
Calculate the difference between two columns.
This calculation can be applied to :class:`.Number` columns to calculate
numbers. It can also be applied to :class:`.Date`, :class:`.DateTime`, and
:class:`.TimeDelta` columns to calculate time deltas.
:param before_column_name:
The name of a column containing the "before" values.
:param after_column_name:
The name of a column containing the "after" values.
"""
def __init__(self, before_column_name, after_column_name):
self._before_column_name = before_column_name
self._after_column_name = after_column_name
def get_computed_data_type(self, table):
before_column = table.columns[self._before_column_name]
if isinstance(before_column.data_type, (Date, DateTime, TimeDelta)):
return TimeDelta()
elif isinstance(before_column.data_type, Number):
return Number()
def validate(self, table):
before_column = table.columns[self._before_column_name]
after_column = table.columns[self._after_column_name]
for data_type in (Number, Date, DateTime, TimeDelta):
if isinstance(before_column.data_type, data_type):
if not isinstance(after_column.data_type, data_type):
raise DataTypeError('Specified columns must be of the same type')
if HasNulls(self._before_column_name).run(table):
warn_null_calculation(self, before_column)
if HasNulls(self._after_column_name).run(table):
warn_null_calculation(self, after_column)
return
raise DataTypeError('Change before and after columns must both contain data that is one of: '
'Number, Date, DateTime or TimeDelta.')
def run(self, table):
new_column = []
for row in table.rows:
before = row[self._before_column_name]
after = row[self._after_column_name]
if before is not None and after is not None:
new_column.append(after - before)
else:
new_column.append(None)
return new_column
agate-1.6.3/agate/computations/formula.py 0000664 0000000 0000000 00000002040 14074061410 0020420 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
from agate.computations.base import Computation
class Formula(Computation):
"""
Apply an arbitrary function to each row.
:param data_type:
The data type this formula will return.
:param func:
The function to be applied to each row. Must return a valid value for
the specified data type.
:param cast:
If :code:`True`, each return value will be cast to the specified
:code:`data_type` to ensure it is valid. Only disable this if you are
certain your formula always returns the correct type.
"""
def __init__(self, data_type, func, cast=True):
self._data_type = data_type
self._func = func
self._cast = cast
def get_computed_data_type(self, table):
return self._data_type
def run(self, table):
new_column = []
for row in table.rows:
v = self._func(row)
if self._cast:
v = self._data_type.cast(v)
new_column.append(v)
return new_column
agate-1.6.3/agate/computations/percent.py 0000664 0000000 0000000 00000004716 14074061410 0020427 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
from agate.aggregations.has_nulls import HasNulls
from agate.aggregations.sum import Sum
from agate.computations.base import Computation
from agate.data_types import Number
from agate.exceptions import DataTypeError
from agate.warns import warn_null_calculation
class Percent(Computation):
"""
Calculate each values percentage of a total.
:param column_name:
The name of a column containing the :class:`.Number` values.
:param total:
If specified, the total value for each number to be divided into. By
default, the :class:`.Sum` of the values in the column will be used.
"""
def __init__(self, column_name, total=None):
self._column_name = column_name
self._total = total
def get_computed_data_type(self, table):
return Number()
def validate(self, table):
column = table.columns[self._column_name]
if not isinstance(column.data_type, Number):
raise DataTypeError('Percent column must contain Number data.')
if self._total is not None and self._total <= 0:
raise DataTypeError('The total must be a positive number')
# Throw a warning if there are nulls in there
if HasNulls(self._column_name).run(table):
warn_null_calculation(self, column)
def run(self, table):
"""
:returns:
:class:`decimal.Decimal`
"""
# If the user has provided a total, use that
if self._total is not None:
total = self._total
# Otherwise compute the sum of all the values in that column to
# act as our denominator
else:
total = table.aggregate(Sum(self._column_name))
# Raise error if sum is less than or equal to zero
if total <= 0:
raise DataTypeError('The sum of column values must be a positive number')
# Create a list new rows
new_column = []
# Loop through the existing rows
for row in table.rows:
# Pull the value
value = row[self._column_name]
if value is None:
new_column.append(None)
continue
# Try to divide it out of the total
percent = value / total
# And multiply it by 100
percent = percent * 100
# Append the value to the new list
new_column.append(percent)
# Pass out the list
return new_column
agate-1.6.3/agate/computations/percent_change.py 0000664 0000000 0000000 00000003656 14074061410 0021736 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
from agate.aggregations.has_nulls import HasNulls
from agate.computations.base import Computation
from agate.data_types import Number
from agate.exceptions import DataTypeError
from agate.warns import warn_null_calculation
class PercentChange(Computation):
"""
Calculate the percent difference between two columns.
:param before_column_name:
The name of a column containing the "before" :class:`.Number` values.
:param after_column_name:
The name of a column containing the "after" :class:`.Number` values.
"""
def __init__(self, before_column_name, after_column_name):
self._before_column_name = before_column_name
self._after_column_name = after_column_name
def get_computed_data_type(self, table):
return Number()
def validate(self, table):
before_column = table.columns[self._before_column_name]
after_column = table.columns[self._after_column_name]
if not isinstance(before_column.data_type, Number):
raise DataTypeError('PercentChange before column must contain Number data.')
if not isinstance(after_column.data_type, Number):
raise DataTypeError('PercentChange after column must contain Number data.')
if HasNulls(self._before_column_name).run(table):
warn_null_calculation(self, before_column)
if HasNulls(self._after_column_name).run(table):
warn_null_calculation(self, after_column)
def run(self, table):
"""
:returns:
:class:`decimal.Decimal`
"""
new_column = []
for row in table.rows:
before = row[self._before_column_name]
after = row[self._after_column_name]
if before is not None and after is not None:
new_column.append((after - before) / before * 100)
else:
new_column.append(None)
return new_column
agate-1.6.3/agate/computations/percentile_rank.py 0000664 0000000 0000000 00000001735 14074061410 0022132 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
from agate.aggregations.percentiles import Percentiles
from agate.computations.rank import Rank
from agate.data_types import Number
from agate.exceptions import DataTypeError
class PercentileRank(Rank):
"""
Calculate the percentile into which each value falls.
See :class:`.Percentiles` for implementation details.
:param column_name:
The name of a column containing the :class:`.Number` values.
"""
def validate(self, table):
column = table.columns[self._column_name]
if not isinstance(column.data_type, Number):
raise DataTypeError('PercentileRank column must contain Number data.')
def run(self, table):
"""
:returns:
:class:`int`
"""
percentiles = Percentiles(self._column_name).run(table)
new_column = []
for row in table.rows:
new_column.append(percentiles.locate(row[self._column_name]))
return new_column
agate-1.6.3/agate/computations/rank.py 0000664 0000000 0000000 00000003450 14074061410 0017714 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
from decimal import Decimal
import six
if six.PY3:
from functools import cmp_to_key
from agate.computations.base import Computation
from agate.data_types import Number
class Rank(Computation):
"""
Calculate rank order of the values in a column.
Uses the "competition" ranking method: if there are four values and the
middle two are tied, then the output will be `[1, 2, 2, 4]`.
Null values will always be ranked last.
:param column_name:
The name of the column to rank.
:param comparer:
An optional comparison function. If not specified ranking will be
ascending, with nulls ranked last.
:param reverse:
Reverse sort order before ranking.
"""
def __init__(self, column_name, comparer=None, reverse=None):
self._column_name = column_name
self._comparer = comparer
self._reverse = reverse
def get_computed_data_type(self, table):
return Number()
def run(self, table):
"""
:returns:
:class:`int`
"""
column = table.columns[self._column_name]
if self._comparer:
if six.PY3:
data_sorted = sorted(column.values(), key=cmp_to_key(self._comparer))
else: # pragma: no cover
data_sorted = sorted(column.values(), cmp=self._comparer)
else:
data_sorted = column.values_sorted()
if self._reverse:
data_sorted.reverse()
ranks = {}
rank = 0
for c in data_sorted:
rank += 1
if c in ranks:
continue
ranks[c] = Decimal(rank)
new_column = []
for row in table.rows:
new_column.append(ranks[row[self._column_name]])
return new_column
agate-1.6.3/agate/computations/slug.py 0000664 0000000 0000000 00000004042 14074061410 0017731 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
from agate.aggregations.has_nulls import HasNulls
from agate.computations.base import Computation
from agate.data_types import Text
from agate.exceptions import DataTypeError
from agate.utils import issequence, slugify
class Slug(Computation):
"""
Convert text values from one or more columns into slugs. If multiple column
names are given, values from those columns will be appended in the given
order before standardizing.
:param column_name:
The name of a column or a sequence of column names containing
:class:`.Text` values.
:param ensure_unique:
If True, any duplicate values will be appended with unique identifers.
Defaults to False.
"""
def __init__(self, column_name, ensure_unique=False, **kwargs):
self._column_name = column_name
self._ensure_unique = ensure_unique
self._slug_args = kwargs
def get_computed_data_type(self, table):
return Text()
def validate(self, table):
if issequence(self._column_name):
column_names = self._column_name
else:
column_names = [self._column_name]
for column_name in column_names:
column = table.columns[column_name]
if not isinstance(column.data_type, Text):
raise DataTypeError('Slug column must contain Text data.')
if HasNulls(column_name).run(table):
raise ValueError('Slug column cannot contain `None`.')
def run(self, table):
"""
:returns:
:class:`string`
"""
new_column = []
for row in table.rows:
if issequence(self._column_name):
column_value = ''
for column_name in self._column_name:
column_value = column_value + ' ' + row[column_name]
new_column.append(column_value)
else:
new_column.append(row[self._column_name])
return slugify(new_column, ensure_unique=self._ensure_unique, **self._slug_args)
agate-1.6.3/agate/config.py 0000664 0000000 0000000 00000007705 14074061410 0015510 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# -*- coding: utf8 -*-
"""
This module contains the global configuration for agate. Users should use
:meth:`get_option` and :meth:`set_option` to modify the global
configuration.
**Available configuation options:**
+-------------------------+------------------------------------------+-----------------------------------------+
| Option | Description | Default value |
+=========================+==========================================+=========================================+
| default_locale | Default locale for number formatting | default_locale('LC_NUMERIC') or 'en_US' |
+-------------------------+------------------------------------------+-----------------------------------------+
| horizontal_line_char | Character to render for horizontal lines | u'-' |
+-------------------------+------------------------------------------+-----------------------------------------+
| vertical_line_char | Character to render for vertical lines | u'|' |
+-------------------------+------------------------------------------+-----------------------------------------+
| bar_char | Character to render for bar chart units | u'░' |
+-------------------------+------------------------------------------+-----------------------------------------+
| printable_bar_char | Printable character for bar chart units | u':' |
+-------------------------+------------------------------------------+-----------------------------------------+
| zero_line_char | Character to render for zero line units | u'▓' |
+-------------------------+------------------------------------------+-----------------------------------------+
| printable_zero_line_char| Printable character for zero line units | u'|' |
+-------------------------+------------------------------------------+-----------------------------------------+
| tick_char | Character to render for axis ticks | u'+' |
+-------------------------+------------------------------------------+-----------------------------------------+
| ellipsis_chars | Characters to render for ellipsis | u'...' |
+-------------------------+------------------------------------------+-----------------------------------------+
"""
from babel.core import default_locale
_options = {
#: Default locale for number formatting
'default_locale': default_locale('LC_NUMERIC') or 'en_US',
#: Character to render for horizontal lines
'horizontal_line_char': u'-',
#: Character to render for vertical lines
'vertical_line_char': u'|',
#: Character to render for bar chart units
'bar_char': u'░',
#: Printable character to render for bar chart units
'printable_bar_char': u':',
#: Character to render for zero line units
'zero_line_char': u'▓',
#: Printable character to render for zero line units
'printable_zero_line_char': u'|',
#: Character to render for axis ticks
'tick_char': u'+',
#: Characters to render for ellipsis
'ellipsis_chars': u'...',
}
def get_option(key):
"""
Get a global configuration option for agate.
:param key:
The name of the configuration option.
"""
return _options[key]
def set_option(key, value):
"""
Set a global configuration option for agate.
:param key:
The name of the configuration option.
:param value:
The new value to set for the configuration option.
"""
_options[key] = value
def set_options(options):
"""
Set a dictionary of options simultaneously.
:param hash:
A dictionary of option names and values.
"""
_options.update(options)
agate-1.6.3/agate/csv_py2.py 0000664 0000000 0000000 00000017321 14074061410 0015623 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
"""
This module contains the Python 2 replacement for :mod:`csv`.
"""
import codecs
import csv
import warnings
import six
from agate.exceptions import FieldSizeLimitError
EIGHT_BIT_ENCODINGS = [
'utf-8', 'u8', 'utf', 'utf8',
'latin-1', 'iso-8859-1', 'iso8859-1', '8859', 'cp819', 'latin', 'latin1', 'l1'
]
POSSIBLE_DELIMITERS = [',', '\t', ';', ' ', ':', '|']
class UTF8Recoder(six.Iterator):
"""
Iterator that reads an encoded stream and reencodes the input to UTF-8.
"""
def __init__(self, f, encoding):
self.reader = codecs.getreader(encoding)(f)
def __iter__(self):
return self
def __next__(self):
return next(self.reader).encode('utf-8')
class UnicodeReader(object):
"""
A CSV reader which will read rows from a file in a given encoding.
"""
def __init__(self, f, encoding='utf-8', field_size_limit=None, line_numbers=False, header=True, **kwargs):
self.line_numbers = line_numbers
self.header = header
f = UTF8Recoder(f, encoding)
self.reader = csv.reader(f, **kwargs)
if field_size_limit:
csv.field_size_limit(field_size_limit)
def next(self):
try:
row = next(self.reader)
except csv.Error as e:
# Terrible way to test for this exception, but there is no subclass
if 'field larger than field limit' in str(e):
raise FieldSizeLimitError(csv.field_size_limit(), self.line_num)
else:
raise e
if self.line_numbers:
if self.header and self.line_num == 1:
row.insert(0, 'line_numbers')
else:
row.insert(0, str(self.line_num - 1 if self.header else self.line_num))
return [six.text_type(s, 'utf-8') for s in row]
def __iter__(self):
return self
@property
def dialect(self):
return self.reader.dialect
@property
def line_num(self):
return self.reader.line_num
class UnicodeWriter(object):
"""
A CSV writer which will write rows to a file in the specified encoding.
NB: Optimized so that eight-bit encodings skip re-encoding. See:
https://github.com/wireservice/csvkit/issues/175
"""
def __init__(self, f, encoding='utf-8', **kwargs):
self.encoding = encoding
self._eight_bit = (self.encoding.lower().replace('_', '-') in EIGHT_BIT_ENCODINGS)
if self._eight_bit:
self.writer = csv.writer(f, **kwargs)
else:
# Redirect output to a queue for reencoding
self.queue = six.StringIO()
self.writer = csv.writer(self.queue, **kwargs)
self.stream = f
self.encoder = codecs.getincrementalencoder(encoding)()
def writerow(self, row):
if self._eight_bit:
self.writer.writerow([six.text_type(s if s is not None else '').encode(self.encoding) for s in row])
else:
self.writer.writerow([six.text_type(s if s is not None else '').encode('utf-8') for s in row])
# Fetch UTF-8 output from the queue...
data = self.queue.getvalue()
data = data.decode('utf-8')
# ...and reencode it into the target encoding
data = self.encoder.encode(data)
# write to the file
self.stream.write(data)
# empty the queue
self.queue.truncate(0)
def writerows(self, rows):
for row in rows:
self.writerow(row)
class UnicodeDictReader(csv.DictReader):
"""
Defer almost all implementation to :class:`csv.DictReader`, but wraps our
unicode reader instead of :func:`csv.reader`.
"""
def __init__(self, f, fieldnames=None, restkey=None, restval=None, *args, **kwargs):
reader = UnicodeReader(f, *args, **kwargs)
if 'encoding' in kwargs:
kwargs.pop('encoding')
csv.DictReader.__init__(self, f, fieldnames, restkey, restval, *args, **kwargs)
self.reader = reader
class UnicodeDictWriter(csv.DictWriter):
"""
Defer almost all implementation to :class:`csv.DictWriter`, but wraps our
unicode writer instead of :func:`csv.writer`.
"""
def __init__(self, f, fieldnames, restval='', extrasaction='raise', *args, **kwds):
self.fieldnames = fieldnames
self.restval = restval
if extrasaction.lower() not in ('raise', 'ignore'):
raise ValueError('extrasaction (%s) must be "raise" or "ignore"' % extrasaction)
self.extrasaction = extrasaction
self.writer = UnicodeWriter(f, *args, **kwds)
class Reader(UnicodeReader):
"""
A unicode-aware CSV reader.
"""
pass
class Writer(UnicodeWriter):
"""
A unicode-aware CSV writer.
"""
def __init__(self, f, encoding='utf-8', line_numbers=False, **kwargs):
self.row_count = 0
self.line_numbers = line_numbers
if 'lineterminator' not in kwargs:
kwargs['lineterminator'] = '\n'
UnicodeWriter.__init__(self, f, encoding, **kwargs)
def _append_line_number(self, row):
if self.row_count == 0:
row.insert(0, 'line_number')
else:
row.insert(0, self.row_count)
self.row_count += 1
def writerow(self, row):
if self.line_numbers:
row = list(row)
self._append_line_number(row)
# Convert embedded Mac line endings to unix style line endings so they get quoted
row = [i.replace('\r', '\n') if isinstance(i, six.string_types) else i for i in row]
UnicodeWriter.writerow(self, row)
def writerows(self, rows):
for row in rows:
self.writerow(row)
class DictReader(UnicodeDictReader):
"""
A unicode-aware CSV DictReader.
"""
pass
class DictWriter(UnicodeDictWriter):
"""
A unicode-aware CSV DictWriter.
"""
def __init__(self, f, fieldnames, encoding='utf-8', line_numbers=False, **kwargs):
self.row_count = 0
self.line_numbers = line_numbers
if 'lineterminator' not in kwargs:
kwargs['lineterminator'] = '\n'
UnicodeDictWriter.__init__(self, f, fieldnames, encoding=encoding, **kwargs)
def _append_line_number(self, row):
if self.row_count == 0:
row['line_number'] = 0
else:
row['line_number'] = self.row_count
self.row_count += 1
def writerow(self, row):
if self.line_numbers:
row = list(row)
self._append_line_number(row)
# Convert embedded Mac line endings to unix style line endings so they get quoted
row = dict([
(k, v.replace('\r', '\n')) if isinstance(v, basestring) else (k, v) for k, v in row.items() # noqa: F821
])
UnicodeDictWriter.writerow(self, row)
def writerows(self, rows):
for row in rows:
self.writerow(row)
class Sniffer(object):
"""
A functional wrapper of ``csv.Sniffer()``.
"""
def sniff(self, sample):
"""
A functional version of ``csv.Sniffer().sniff``, that extends the
list of possible delimiters to include some seen in the wild.
"""
try:
dialect = csv.Sniffer().sniff(sample, POSSIBLE_DELIMITERS)
except csv.Error as e:
warnings.warn('Error sniffing CSV dialect: %s' % e, RuntimeWarning, stacklevel=2)
dialect = None
return dialect
def reader(*args, **kwargs):
"""
A replacement for Python's :func:`csv.reader` that uses
:class:`.csv_py2.Reader`.
"""
return Reader(*args, **kwargs)
def writer(*args, **kwargs):
"""
A replacement for Python's :func:`csv.writer` that uses
:class:`.csv_py2.Writer`.
"""
return Writer(*args, **kwargs)
agate-1.6.3/agate/csv_py3.py 0000664 0000000 0000000 00000011127 14074061410 0015622 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
"""
This module contains the Python 3 replacement for :mod:`csv`.
"""
import csv
import warnings
import six
from agate.exceptions import FieldSizeLimitError
POSSIBLE_DELIMITERS = [',', '\t', ';', ' ', ':', '|']
class Reader(six.Iterator):
"""
A wrapper around Python 3's builtin :func:`csv.reader`.
"""
def __init__(self, f, field_size_limit=None, line_numbers=False, header=True, **kwargs):
self.line_numbers = line_numbers
self.header = header
if field_size_limit:
csv.field_size_limit(field_size_limit)
self.reader = csv.reader(f, **kwargs)
def __iter__(self):
return self
def __next__(self):
try:
row = next(self.reader)
except csv.Error as e:
# Terrible way to test for this exception, but there is no subclass
if 'field larger than field limit' in str(e):
raise FieldSizeLimitError(csv.field_size_limit(), self.line_num)
else:
raise e
if not self.line_numbers:
return row
else:
if self.line_numbers:
if self.header and self.line_num == 1:
row.insert(0, 'line_numbers')
else:
row.insert(0, str(self.line_num - 1 if self.header else self.line_num))
return row
@property
def dialect(self):
return self.reader.dialect
@property
def line_num(self):
return self.reader.line_num
class Writer(object):
"""
A wrapper around Python 3's builtin :func:`csv.writer`.
"""
def __init__(self, f, line_numbers=False, **kwargs):
self.row_count = 0
self.line_numbers = line_numbers
if 'lineterminator' not in kwargs:
kwargs['lineterminator'] = '\n'
self.writer = csv.writer(f, **kwargs)
def _append_line_number(self, row):
if self.row_count == 0:
row.insert(0, 'line_number')
else:
row.insert(0, self.row_count)
self.row_count += 1
def writerow(self, row):
if self.line_numbers:
row = list(row)
self._append_line_number(row)
# Convert embedded Mac line endings to unix style line endings so they get quoted
row = [i.replace('\r', '\n') if isinstance(i, six.string_types) else i for i in row]
self.writer.writerow(row)
def writerows(self, rows):
for row in rows:
self.writerow(row)
class DictReader(csv.DictReader):
"""
A wrapper around Python 3's builtin :class:`csv.DictReader`.
"""
pass
class DictWriter(csv.DictWriter):
"""
A wrapper around Python 3's builtin :class:`csv.DictWriter`.
"""
def __init__(self, f, fieldnames, line_numbers=False, **kwargs):
self.row_count = 0
self.line_numbers = line_numbers
if 'lineterminator' not in kwargs:
kwargs['lineterminator'] = '\n'
if self.line_numbers:
fieldnames.insert(0, 'line_number')
csv.DictWriter.__init__(self, f, fieldnames, **kwargs)
def _append_line_number(self, row):
if self.row_count == 0:
row['line_number'] = 'line_number'
else:
row['line_number'] = self.row_count
self.row_count += 1
def writerow(self, row):
# Convert embedded Mac line endings to unix style line endings so they get quoted
row = dict([(k, v.replace('\r', '\n')) if isinstance(v, six.string_types) else (k, v) for k, v in row.items()])
if self.line_numbers:
self._append_line_number(row)
csv.DictWriter.writerow(self, row)
def writerows(self, rows):
for row in rows:
self.writerow(row)
class Sniffer(object):
"""
A functional wrapper of ``csv.Sniffer()``.
"""
def sniff(self, sample):
"""
A functional version of ``csv.Sniffer().sniff``, that extends the
list of possible delimiters to include some seen in the wild.
"""
try:
dialect = csv.Sniffer().sniff(sample, POSSIBLE_DELIMITERS)
except csv.Error as e:
warnings.warn('Error sniffing CSV dialect: %s' % e, RuntimeWarning, stacklevel=2)
dialect = None
return dialect
def reader(*args, **kwargs):
"""
A replacement for Python's :func:`csv.reader` that uses
:class:`.csv_py3.Reader`.
"""
return Reader(*args, **kwargs)
def writer(*args, **kwargs):
"""
A replacement for Python's :func:`csv.writer` that uses
:class:`.csv_py3.Writer`.
"""
return Writer(*args, **kwargs)
agate-1.6.3/agate/data_types/ 0000775 0000000 0000000 00000000000 14074061410 0016015 5 ustar 00root root 0000000 0000000 agate-1.6.3/agate/data_types/__init__.py 0000664 0000000 0000000 00000001322 14074061410 0020124 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
"""
Data types define how data should be imported during the creation of a
:class:`.Table`.
If column types are not explicitly specified when a :class:`.Table` is created,
agate will attempt to guess them. The :class:`.TypeTester` class can be used to
control how types are guessed.
"""
from agate.data_types.base import DEFAULT_NULL_VALUES, DataType
from agate.data_types.boolean import DEFAULT_FALSE_VALUES, DEFAULT_TRUE_VALUES, Boolean
from agate.data_types.date import Date
from agate.data_types.date_time import DateTime
from agate.data_types.number import Number
from agate.data_types.text import Text
from agate.data_types.time_delta import TimeDelta
from agate.exceptions import CastError
agate-1.6.3/agate/data_types/base.py 0000664 0000000 0000000 00000002627 14074061410 0017310 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
import six
from agate.exceptions import CastError
#: Default values which will be automatically cast to :code:`None`
DEFAULT_NULL_VALUES = ('', 'na', 'n/a', 'none', 'null', '.')
class DataType(object): # pragma: no cover
"""
Specifies how values should be parsed when creating a :class:`.Table`.
:param null_values: A sequence of values which should be cast to
:code:`None` when encountered by this data type.
"""
def __init__(self, null_values=DEFAULT_NULL_VALUES):
self.null_values = null_values
def test(self, d):
"""
Test, for purposes of type inference, if a value could possibly be
coerced to this data type.
This is really just a thin wrapper around :meth:`DataType.cast`.
"""
try:
self.cast(d)
except CastError:
return False
return True
def cast(self, d):
"""
Coerce a given string value into this column's data type.
"""
raise NotImplementedError
def csvify(self, d):
"""
Format a given native value for CSV serialization.
"""
if d is None:
return None
return six.text_type(d)
def jsonify(self, d):
"""
Format a given native value for JSON serialization.
"""
if d is None:
return None
return six.text_type(d)
agate-1.6.3/agate/data_types/boolean.py 0000664 0000000 0000000 00000004132 14074061410 0020006 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
try:
from cdecimal import Decimal
except ImportError: # pragma: no cover
from decimal import Decimal
import six
from agate.data_types.base import DEFAULT_NULL_VALUES, DataType
from agate.exceptions import CastError
#: Default values which will be automatically cast to :code:`True`.
DEFAULT_TRUE_VALUES = ('yes', 'y', 'true', 't', '1')
#: Default values which will be automatically cast to :code:`False`.
DEFAULT_FALSE_VALUES = ('no', 'n', 'false', 'f', '0')
class Boolean(DataType):
"""
Data representing true and false.
Note that by default numerical `1` and `0` are considered valid boolean
values, but other numbers are not.
:param true_values: A sequence of values which should be cast to
:code:`True` when encountered with this type.
:param false_values: A sequence of values which should be cast to
:code:`False` when encountered with this type.
"""
def __init__(self, true_values=DEFAULT_TRUE_VALUES, false_values=DEFAULT_FALSE_VALUES,
null_values=DEFAULT_NULL_VALUES):
super(Boolean, self).__init__(null_values=null_values)
self.true_values = true_values
self.false_values = false_values
def cast(self, d):
"""
Cast a single value to :class:`bool`.
:param d: A value to cast.
:returns: :class:`bool` or :code:`None`.
"""
if d is None:
return d
elif type(d) is bool and type(d) is not int:
return d
elif type(d) is int or isinstance(d, Decimal):
if d == 1:
return True
elif d == 0:
return False
elif isinstance(d, six.string_types):
d = d.replace(',', '').strip()
d_lower = d.lower()
if d_lower in self.null_values:
return None
elif d_lower in self.true_values:
return True
elif d_lower in self.false_values:
return False
raise CastError('Can not convert value %s to bool.' % d)
def jsonify(self, d):
return d
agate-1.6.3/agate/data_types/date.py 0000664 0000000 0000000 00000006677 14074061410 0017324 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
import locale
from datetime import date, datetime, time
import parsedatetime
import six
from agate.data_types.base import DataType
from agate.exceptions import CastError
ZERO_DT = datetime.combine(date.min, time.min)
class Date(DataType):
"""
Data representing dates alone.
:param date_format:
A formatting string for :meth:`datetime.datetime.strptime` to use
instead of using regex-based parsing.
:param locale:
A locale specification such as :code:`en_US` or :code:`de_DE` to use
for parsing formatted dates.
"""
def __init__(self, date_format=None, locale=None, **kwargs):
super(Date, self).__init__(**kwargs)
self.date_format = date_format
self.locale = locale
self._constants = parsedatetime.Constants(localeID=self.locale)
self._parser = parsedatetime.Calendar(constants=self._constants, version=parsedatetime.VERSION_CONTEXT_STYLE)
def __getstate__(self):
"""
Return state values to be pickled. Exclude _constants and _parser because parsedatetime
cannot be pickled.
"""
odict = self.__dict__.copy()
del odict['_constants']
del odict['_parser']
return odict
def __setstate__(self, ndict):
"""
Restore state from the unpickled state values. Set _constants to an instance
of the parsedatetime Constants class, and _parser to an instance
of the parsedatetime Calendar class.
"""
self.__dict__.update(ndict)
self._constants = parsedatetime.Constants(localeID=self.locale)
self._parser = parsedatetime.Calendar(constants=self._constants, version=parsedatetime.VERSION_CONTEXT_STYLE)
def cast(self, d):
"""
Cast a single value to a :class:`datetime.date`.
If both `date_format` and `locale` have been specified
in the `agate.Date` instance, the `cast()` function
is not thread-safe.
:returns:
:class:`datetime.date` or :code:`None`.
"""
if type(d) is date or d is None:
return d
elif isinstance(d, six.string_types):
d = d.strip()
if d.lower() in self.null_values:
return None
else:
raise CastError('Can not parse value "%s" as date.' % d)
if self.date_format:
orig_locale = None
if self.locale:
orig_locale = locale.getlocale(locale.LC_TIME)
locale.setlocale(locale.LC_TIME, (self.locale, 'UTF-8'))
try:
dt = datetime.strptime(d, self.date_format)
except (ValueError, TypeError):
raise CastError('Value "%s" does not match date format.' % d)
finally:
if orig_locale:
locale.setlocale(locale.LC_TIME, orig_locale)
return dt.date()
try:
(value, ctx, _, _, matched_text), = self._parser.nlp(d, sourceTime=ZERO_DT)
except (TypeError, ValueError, OverflowError):
raise CastError('Value "%s" does not match date format.' % d)
else:
if matched_text == d and ctx.hasDate and not ctx.hasTime:
return value.date()
raise CastError('Can not parse value "%s" as date.' % d)
def csvify(self, d):
if d is None:
return None
return d.isoformat()
def jsonify(self, d):
return self.csvify(d)
agate-1.6.3/agate/data_types/date_time.py 0000664 0000000 0000000 00000010342 14074061410 0020322 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
import datetime
import locale
import isodate
import parsedatetime
import six
from agate.data_types.base import DataType
from agate.exceptions import CastError
class DateTime(DataType):
"""
Data representing dates with times.
:param datetime_format:
A formatting string for :meth:`datetime.datetime.strptime` to use
instead of using regex-based parsing.
:param timezone:
A `pytz `_ timezone to apply to each
parsed date.
:param locale:
A locale specification such as :code:`en_US` or :code:`de_DE` to use
for parsing formatted datetimes.
"""
def __init__(self, datetime_format=None, timezone=None, locale=None, **kwargs):
super(DateTime, self).__init__(**kwargs)
self.datetime_format = datetime_format
self.timezone = timezone
self.locale = locale
now = datetime.datetime.now()
self._source_time = datetime.datetime(
now.year, now.month, now.day, 0, 0, 0, 0, None
)
self._constants = parsedatetime.Constants(localeID=self.locale)
self._parser = parsedatetime.Calendar(constants=self._constants, version=parsedatetime.VERSION_CONTEXT_STYLE)
def __getstate__(self):
"""
Return state values to be pickled. Exclude _parser because parsedatetime
cannot be pickled.
"""
odict = self.__dict__.copy()
del odict['_constants']
del odict['_parser']
return odict
def __setstate__(self, ndict):
"""
Restore state from the unpickled state values. Set _constants to an instance
of the parsedatetime Constants class, and _parser to an instance
of the parsedatetime Calendar class.
"""
self.__dict__.update(ndict)
self._constants = parsedatetime.Constants(localeID=self.locale)
self._parser = parsedatetime.Calendar(constants=self._constants, version=parsedatetime.VERSION_CONTEXT_STYLE)
def cast(self, d):
"""
Cast a single value to a :class:`datetime.datetime`.
If both `date_format` and `locale` have been specified
in the `agate.DateTime` instance, the `cast()` function
is not thread-safe.
:returns:
:class:`datetime.datetime` or :code:`None`.
"""
if isinstance(d, datetime.datetime) or d is None:
return d
elif isinstance(d, datetime.date):
return datetime.datetime.combine(d, datetime.time(0, 0, 0))
elif isinstance(d, six.string_types):
d = d.strip()
if d.lower() in self.null_values:
return None
else:
raise CastError('Can not parse value "%s" as datetime.' % d)
if self.datetime_format:
orig_locale = None
if self.locale:
orig_locale = locale.getlocale(locale.LC_TIME)
locale.setlocale(locale.LC_TIME, (self.locale, 'UTF-8'))
try:
dt = datetime.datetime.strptime(d, self.datetime_format)
except (ValueError, TypeError):
raise CastError('Value "%s" does not match date format.' % d)
finally:
if orig_locale:
locale.setlocale(locale.LC_TIME, orig_locale)
return dt
try:
(_, _, _, _, matched_text), = self._parser.nlp(d, sourceTime=self._source_time)
except Exception:
matched_text = None
else:
value, ctx = self._parser.parseDT(
d,
sourceTime=self._source_time,
tzinfo=self.timezone
)
if matched_text == d and ctx.hasDate and ctx.hasTime:
return value
elif matched_text == d and ctx.hasDate and not ctx.hasTime:
return datetime.datetime.combine(value.date(), datetime.time.min)
try:
dt = isodate.parse_datetime(d)
return dt
except Exception:
pass
raise CastError('Can not parse value "%s" as datetime.' % d)
def csvify(self, d):
if d is None:
return None
return d.isoformat()
def jsonify(self, d):
return self.csvify(d)
agate-1.6.3/agate/data_types/number.py 0000664 0000000 0000000 00000006627 14074061410 0017672 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# -*- coding: utf8 -*-
try:
from cdecimal import Decimal, InvalidOperation
except ImportError: # pragma: no cover
from decimal import Decimal, InvalidOperation
import warnings
import six
from babel.core import Locale
from agate.data_types.base import DataType
from agate.exceptions import CastError
#: A list of currency symbols sourced from `Xe `_.
DEFAULT_CURRENCY_SYMBOLS = [u'؋', u'$', u'ƒ', u'៛', u'¥', u'₡', u'₱', u'£', u'€', u'¢', u'﷼', u'₪', u'₩', u'₭', u'₮',
u'₦', u'฿', u'₤', u'₫']
POSITIVE = Decimal('1')
NEGATIVE = Decimal('-1')
class Number(DataType):
"""
Data representing numbers.
:param locale:
A locale specification such as :code:`en_US` or :code:`de_DE` to use
for parsing formatted numbers.
:param group_symbol:
A grouping symbol used in the numbers. Overrides the value provided by
the specified :code:`locale`.
:param decimal_symbol:
A decimal separate symbol used in the numbers. Overrides the value
provided by the specified :code:`locale`.
:param currency_symbols:
A sequence of currency symbols to strip from numbers.
"""
def __init__(self, locale='en_US', group_symbol=None, decimal_symbol=None,
currency_symbols=DEFAULT_CURRENCY_SYMBOLS, **kwargs):
super(Number, self).__init__(**kwargs)
self.locale = Locale.parse(locale)
self.currency_symbols = currency_symbols
# Suppress Babel warning on Python 3.6
# See #665
with warnings.catch_warnings():
warnings.simplefilter("ignore")
self.group_symbol = group_symbol or self.locale.number_symbols.get('group', ',')
self.decimal_symbol = decimal_symbol or self.locale.number_symbols.get('decimal', '.')
def cast(self, d):
"""
Cast a single value to a :class:`decimal.Decimal`.
:returns:
:class:`decimal.Decimal` or :code:`None`.
"""
if isinstance(d, Decimal) or d is None:
return d
t = type(d)
if t is int:
return Decimal(d)
elif six.PY2 and t is long: # noqa: F821
return Decimal(d)
elif t is float:
return Decimal(repr(d))
elif d is False:
return Decimal(0)
elif d is True:
return Decimal(1)
elif not isinstance(d, six.string_types):
raise CastError('Can not parse value "%s" as Decimal.' % d)
d = d.strip()
if d.lower() in self.null_values:
return None
d = d.strip('%')
if len(d) > 0 and d[0] == '-':
d = d[1:]
sign = NEGATIVE
else:
sign = POSITIVE
for symbol in self.currency_symbols:
d = d.strip(symbol)
d = d.replace(self.group_symbol, '')
d = d.replace(self.decimal_symbol, '.')
try:
return Decimal(d) * sign
# The Decimal class will return an InvalidOperation exception on most Python implementations,
# but PyPy3 may return a ValueError if the string is not translatable to ASCII
except (InvalidOperation, ValueError):
pass
raise CastError('Can not parse value "%s" as Decimal.' % d)
def jsonify(self, d):
if d is None:
return d
return float(d)
agate-1.6.3/agate/data_types/text.py 0000664 0000000 0000000 00000001632 14074061410 0017355 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
import six
from agate.data_types.base import DataType
class Text(DataType):
"""
Data representing text.
:param cast_nulls:
If :code:`True`, values in :data:`.DEFAULT_NULL_VALUES` will be
converted to `None`. Disable to retain them as strings.
"""
def __init__(self, cast_nulls=True, **kwargs):
super(Text, self).__init__(**kwargs)
self.cast_nulls = cast_nulls
def cast(self, d):
"""
Cast a single value to :func:`unicode` (:func:`str` in Python 3).
:param d:
A value to cast.
:returns:
:func:`unicode` (:func:`str` in Python 3) or :code:`None`
"""
if d is None:
return d
elif isinstance(d, six.string_types):
if self.cast_nulls and d.strip().lower() in self.null_values:
return None
return six.text_type(d)
agate-1.6.3/agate/data_types/time_delta.py 0000664 0000000 0000000 00000002072 14074061410 0020477 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
import datetime
import pytimeparse
import six
from agate.data_types.base import DataType
from agate.exceptions import CastError
class TimeDelta(DataType):
"""
Data representing the interval between two dates and/or times.
"""
def cast(self, d):
"""
Cast a single value to :class:`datetime.timedelta`.
:param d:
A value to cast.
:returns:
:class:`datetime.timedelta` or :code:`None`
"""
if isinstance(d, datetime.timedelta) or d is None:
return d
elif isinstance(d, six.string_types):
d = d.strip()
if d.lower() in self.null_values:
return None
else:
raise CastError('Can not parse value "%s" as timedelta.' % d)
try:
seconds = pytimeparse.parse(d)
except AttributeError:
seconds = None
if seconds is None:
raise CastError('Can not parse value "%s" to as timedelta.' % d)
return datetime.timedelta(seconds=seconds)
agate-1.6.3/agate/exceptions.py 0000664 0000000 0000000 00000002157 14074061410 0016420 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
"""
This module contains various exceptions raised by agate.
"""
class DataTypeError(TypeError): # pragma: no cover
"""
A calculation was attempted with an invalid :class:`.DataType`.
"""
pass
class UnsupportedAggregationError(TypeError): # pragma: no cover
"""
An :class:`.Aggregation` was attempted which is not supported.
For example, if a :class:`.Percentiles` is applied to a :class:`.TableSet`.
"""
pass
class CastError(Exception): # pragma: no cover
"""
A column value can not be cast to the correct type.
"""
pass
class FieldSizeLimitError(Exception): # pragma: no cover
"""
A field in a CSV file exceeds the maximum length.
This length may be the default or one set by the user.
"""
def __init__(self, limit, line_number):
super(FieldSizeLimitError, self).__init__(
'CSV contains a field longer than the maximum length of %i characters on line %i. Try raising the maximum '
'with the field_size_limit parameter, or try setting quoting=csv.QUOTE_NONE.' % (limit, line_number)
)
agate-1.6.3/agate/fixed.py 0000664 0000000 0000000 00000004072 14074061410 0015334 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
"""
This module contains a generic parser for fixed-width files. It operates
similar to Python's built-in CSV reader.
"""
from collections import OrderedDict, namedtuple
import six
Field = namedtuple('Field', ['name', 'start', 'length'])
class Reader(six.Iterator):
"""
Reads a fixed-width file using a column schema in CSV format.
This works almost exactly like Python's built-in CSV reader.
Schemas must be in the "ffs" format, with :code:`column`, :code:`start`,
and :code:`length` columns. There is a repository of such schemas
maintained at `wireservice/ffs `_.
"""
def __init__(self, f, schema_f):
from agate import csv
self.file = f
self.fields = []
reader = csv.reader(schema_f)
header = next(reader)
if header != ['column', 'start', 'length']:
raise ValueError('Schema must contain exactly three columns: "column", "start", and "length".')
for row in reader:
self.fields.append(Field(row[0], int(row[1]), int(row[2])))
def __iter__(self):
return self
def __next__(self):
line = next(self.file)
values = []
for field in self.fields:
values.append(line[field.start:field.start + field.length].strip())
return values
@property
def fieldnames(self):
"""
The names of the columns read from the schema.
"""
return [field.name for field in self.fields]
class DictReader(Reader):
"""
A fixed-width reader that returns :class:`collections.OrderedDict` rather
than a list.
"""
def __next__(self):
line = next(self.file)
values = OrderedDict()
for field in self.fields:
values[field.name] = line[field.start:field.start + field.length].strip()
return values
def reader(*args, **kwargs):
"""
A wrapper around :class:`.fixed.Reader`, so that it can be used in the same
way as a normal CSV reader.
"""
return Reader(*args, **kwargs)
agate-1.6.3/agate/mapped_sequence.py 0000664 0000000 0000000 00000010777 14074061410 0017404 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
"""
This module contains the :class:`MappedSequence` class that forms the foundation
for agate's :class:`.Row` and :class:`.Column` as well as for named sequences of
rows and columns.
"""
from collections import OrderedDict
try:
from collections.abc import Sequence
except ImportError:
from collections import Sequence
import six
from six.moves import range # pylint: disable=W0622
from agate.utils import memoize
class MappedSequence(Sequence):
"""
A generic container for immutable data that can be accessed either by
numeric index or by key. This is similar to an
:class:`collections.OrderedDict` except that the keys are optional and
iteration over it returns the values instead of keys.
This is the base class for both :class:`.Column` and :class:`.Row`.
:param values:
A sequence of values.
:param keys:
A sequence of keys.
"""
__slots__ = ['_values', '_keys']
def __init__(self, values, keys=None):
self._values = tuple(values)
if keys is not None:
self._keys = keys
else:
self._keys = None
def __getstate__(self):
"""
Return state values to be pickled.
This is necessary on Python2.7 when using :code:`__slots__`.
"""
return {
'_values': self._values,
'_keys': self._keys
}
def __setstate__(self, data):
"""
Restore pickled state.
This is necessary on Python2.7 when using :code:`__slots__`.
"""
self._values = data['_values']
self._keys = data['_keys']
def __unicode__(self):
"""
Print a unicode sample of the contents of this sequence.
"""
sample = u', '.join(repr(d) for d in self.values()[:5])
if len(self) > 5:
sample = u'%s, ...' % sample
return u'' % (type(self).__name__, sample)
def __str__(self):
"""
Print an ascii sample of the contents of this sequence.
"""
if six.PY2: # pragma: no cover
return str(self.__unicode__().encode('utf8'))
return str(self.__unicode__())
def __repr__(self):
return self.__str__()
def __getitem__(self, key):
"""
Retrieve values from this array by index, slice or key.
"""
if isinstance(key, slice):
indices = range(*key.indices(len(self)))
values = self.values()
return tuple(values[i] for i in indices)
# Note: can't use isinstance because bool is a subclass of int
elif type(key) is int:
return self.values()[key]
else:
return self.dict()[key]
def __setitem__(self, key, value):
"""
Set values by index, which we want to fail loudly.
"""
raise TypeError('Rows and columns can not be modified directly. You probably need to compute a new column.')
def __iter__(self):
"""
Iterate over values.
"""
return iter(self.values())
@memoize
def __len__(self):
return len(self.values())
def __eq__(self, other):
"""
Equality test with other sequences.
"""
if not isinstance(other, Sequence):
return False
return self.values() == tuple(other)
def __ne__(self, other):
"""
Inequality test with other sequences.
"""
return not self.__eq__(other)
def __contains__(self, value):
return self.values().__contains__(value)
def keys(self):
"""
Equivalent to :meth:`collections.OrderedDict.keys`.
"""
return self._keys
def values(self):
"""
Equivalent to :meth:`collections.OrderedDict.values`.
"""
return self._values
@memoize
def items(self):
"""
Equivalent to :meth:`collections.OrderedDict.items`.
"""
return tuple(zip(self.keys(), self.values()))
def get(self, key, default=None):
"""
Equivalent to :meth:`collections.OrderedDict.get`.
"""
try:
return self.dict()[key]
except KeyError:
if default:
return default
else:
return None
@memoize
def dict(self):
"""
Retrieve the contents of this sequence as an
:class:`collections.OrderedDict`.
"""
if self.keys() is None:
raise KeyError
return OrderedDict(self.items())
agate-1.6.3/agate/rows.py 0000664 0000000 0000000 00000001365 14074061410 0015231 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
"""
This module contains agate's :class:`Row` implementation. Rows are independent
of both the :class:`.Table` that contains them as well as the :class:`.Columns`
that access their data. This independence, combined with rows immutability
allows them to be safely shared between table instances.
"""
from agate.mapped_sequence import MappedSequence
class Row(MappedSequence):
"""
A row of data. Values within a row can be accessed by column name or column
index. Row are immutable and may be shared between :class:`.Table`
instances.
Currently row instances are a no-op subclass of :class:`MappedSequence`.
They are being maintained in this fashion in order to support future
features.
"""
pass
agate-1.6.3/agate/table/ 0000775 0000000 0000000 00000000000 14074061410 0014747 5 ustar 00root root 0000000 0000000 agate-1.6.3/agate/table/__init__.py 0000664 0000000 0000000 00000031473 14074061410 0017070 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
"""
The :class:`.Table` object is the most important class in agate. Tables are
created by supplying row data, column names and subclasses of :class:`.DataType`
to the constructor. Once created, the data in a table **can not be changed**.
This concept is central to agate.
Instead of modifying the data, various methods can be used to create new,
derivative tables. For example, the :meth:`.Table.select` method creates a new
table with only the specified columns. The :meth:`.Table.where` method creates
a new table with only those rows that pass a test. And :meth:`.Table.order_by`
creates a sorted table. In all of these cases the output is a new :class:`.Table`
and the existing table remains unmodified.
Tables are not themselves iterable, but the columns of the table can be
accessed via :attr:`.Table.columns` and the rows via :attr:`.Table.rows`. Both
sequences can be accessed either by numeric index or by name. (In the case of
rows, row names are optional.)
"""
import sys
import warnings
from itertools import chain
import six
from six.moves import range # pylint: disable=W0622
from agate import utils
from agate.columns import Column
from agate.data_types import DataType
from agate.exceptions import CastError
from agate.mapped_sequence import MappedSequence
from agate.rows import Row
from agate.type_tester import TypeTester
@six.python_2_unicode_compatible
class Table(object):
"""
A dataset consisting of rows and columns. Columns refer to "vertical" slices
of data that must all be of the same type. Rows refer to "horizontal" slices
of data that may (and usually do) contain mixed types.
The sequence of :class:`.Column` instances are retrieved via the
:attr:`.Table.columns` property. They may be accessed by either numeric
index or by unique column name.
The sequence of :class:`.Row` instances are retrieved via the
:attr:`.Table.rows` property. They may be accessed by either numeric index
or, if specified, unique row names.
:param rows:
The data as a sequence of any sequences: tuples, lists, etc. If
any row has fewer values than the number of columns, it will be filled
out with nulls. No row may have more values than the number of columns.
:param column_names:
A sequence of string names for each column or `None`, in which case
column names will be automatically assigned using :func:`.letter_name`.
:param column_types:
A sequence of instances of :class:`.DataType` or an instance of
:class:`.TypeTester` or `None` in which case a generic TypeTester will
be used. Alternatively, a dictionary with column names as keys and
instances of :class:`.DataType` as values to specify some types.
:param row_names:
Specifies unique names for each row. This parameter is
optional. If specified it may be 1) the name of a single column that
contains a unique identifier for each row, 2) a key function that takes
a :class:`.Row` and returns a unique identifier or 3) a sequence of
unique identifiers of the same length as the sequence of rows. The
uniqueness of resulting identifiers is not validated, so be certain
the values you provide are truly unique.
:param _is_fork:
Used internally to skip certain validation steps when data
is propagated from an existing table. When :code:`True`, rows are
assumed to be :class:`.Row` instances, rather than raw data.
"""
def __init__(self, rows, column_names=None, column_types=None, row_names=None, _is_fork=False):
if isinstance(rows, six.string_types):
raise ValueError('When created directly, the first argument to Table must be a sequence of rows. '
'Did you want agate.Table.from_csv?')
# Validate column names
if column_names:
self._column_names = utils.deduplicate(column_names, column_names=True)
elif rows:
self._column_names = tuple(utils.letter_name(i) for i in range(len(rows[0])))
warnings.warn('Column names not specified. "%s" will be used as names.' % str(self._column_names),
RuntimeWarning, stacklevel=2)
else:
self._column_names = tuple()
len_column_names = len(self._column_names)
# Validate column_types
if column_types is None:
column_types = TypeTester()
elif isinstance(column_types, dict):
for v in column_types.values():
if not isinstance(v, DataType):
raise ValueError('Column types must be instances of DataType.')
column_types = TypeTester(force=column_types)
elif not isinstance(column_types, TypeTester):
for column_type in column_types:
if not isinstance(column_type, DataType):
raise ValueError('Column types must be instances of DataType.')
if isinstance(column_types, TypeTester):
self._column_types = column_types.run(rows, self._column_names)
else:
self._column_types = tuple(column_types)
if len_column_names != len(self._column_types):
raise ValueError('column_names and column_types must be the same length.')
if not _is_fork:
new_rows = []
cast_funcs = [c.cast for c in self._column_types]
for i, row in enumerate(rows):
len_row = len(row)
if len_row > len_column_names:
raise ValueError(
'Row %i has %i values, but Table only has %i columns.' % (i, len_row, len_column_names)
)
elif len(row) < len_column_names:
row = chain(row, [None] * (len_column_names - len_row))
row_values = []
for j, d in enumerate(row):
try:
row_values.append(cast_funcs[j](d))
except CastError as e:
raise CastError(str(e) + ' Error at row %s column %s.' % (i, self._column_names[j]))
new_rows.append(Row(row_values, self._column_names))
else:
new_rows = rows
if row_names:
computed_row_names = []
if isinstance(row_names, six.string_types):
for row in new_rows:
name = row[row_names]
computed_row_names.append(name)
elif hasattr(row_names, '__call__'):
for row in new_rows:
name = row_names(row)
computed_row_names.append(name)
elif utils.issequence(row_names):
computed_row_names = row_names
else:
raise ValueError('row_names must be a column name, function or sequence')
for row_name in computed_row_names:
if type(row_name) is int:
raise ValueError('Row names cannot be of type int. Use Decimal for numbered row names.')
self._row_names = tuple(computed_row_names)
else:
self._row_names = None
self._rows = MappedSequence(new_rows, self._row_names)
# Build columns
new_columns = []
for i in range(len_column_names):
name = self._column_names[i]
data_type = self._column_types[i]
column = Column(i, name, data_type, self._rows, row_names=self._row_names)
new_columns.append(column)
self._columns = MappedSequence(new_columns, self._column_names)
def __str__(self):
"""
Print the table's structure using :meth:`.Table.print_structure`.
"""
structure = six.StringIO()
self.print_structure(output=structure)
return structure.getvalue()
def __len__(self):
"""
Shorthand for :code:`len(table.rows)`.
"""
return self._rows.__len__()
def __iter__(self):
"""
Shorthand for :code:`iter(table.rows)`.
"""
return self._rows.__iter__()
def __getitem__(self, key):
"""
Shorthand for :code:`table.rows[foo]`.
"""
return self._rows.__getitem__(key)
@property
def column_types(self):
"""
An tuple :class:`.DataType` instances.
"""
return self._column_types
@property
def column_names(self):
"""
An tuple of strings.
"""
return self._column_names
@property
def row_names(self):
"""
An tuple of strings, if this table has row names.
If this table does not have row names, then :code:`None`.
"""
return self._row_names
@property
def columns(self):
"""
A :class:`.MappedSequence` with column names for keys and
:class:`.Column` instances for values.
"""
return self._columns
@property
def rows(self):
"""
A :class:`.MappedSeqeuence` with row names for keys (if specified) and
:class:`.Row` instances for values.
"""
return self._rows
def _fork(self, rows, column_names=None, column_types=None, row_names=None):
"""
Create a new table using the metadata from this one.
This method is used internally by functions like
:meth:`.Table.order_by`.
:param rows:
Row data for the forked table.
:param column_names:
Column names for the forked table. If not specified, fork will use
this table's column names.
:param column_types:
Column types for the forked table. If not specified, fork will use
this table's column names.
:param row_names:
Row names for the forked table. If not specified, fork will use
this table's row names.
"""
if column_names is None:
column_names = self._column_names
if column_types is None:
column_types = self._column_types
if row_names is None:
row_names = self._row_names
return Table(rows, column_names, column_types, row_names=row_names, _is_fork=True)
def print_csv(self, **kwargs):
"""
Print this table as a CSV.
This is the same as passing :code:`sys.stdout` to :meth:`.Table.to_csv`.
:code:`kwargs` will be passed on to :meth:`.Table.to_csv`.
"""
self.to_csv(sys.stdout, **kwargs)
def print_json(self, **kwargs):
"""
Print this table as JSON.
This is the same as passing :code:`sys.stdout` to
:meth:`.Table.to_json`.
:code:`kwargs` will be passed on to :meth:`.Table.to_json`.
"""
self.to_json(sys.stdout, **kwargs)
from agate.table.aggregate import aggregate
from agate.table.bar_chart import bar_chart
from agate.table.bins import bins
from agate.table.column_chart import column_chart
from agate.table.compute import compute
from agate.table.denormalize import denormalize
from agate.table.distinct import distinct
from agate.table.exclude import exclude
from agate.table.find import find
from agate.table.from_csv import from_csv
from agate.table.from_fixed import from_fixed
from agate.table.from_json import from_json
from agate.table.from_object import from_object
from agate.table.group_by import group_by
from agate.table.homogenize import homogenize
from agate.table.join import join
from agate.table.limit import limit
from agate.table.line_chart import line_chart
from agate.table.merge import merge
from agate.table.normalize import normalize
from agate.table.order_by import order_by
from agate.table.pivot import pivot
from agate.table.print_bars import print_bars
from agate.table.print_html import print_html
from agate.table.print_structure import print_structure
from agate.table.print_table import print_table
from agate.table.rename import rename
from agate.table.scatterplot import scatterplot
from agate.table.select import select
from agate.table.to_csv import to_csv
from agate.table.to_json import to_json
from agate.table.where import where
Table.aggregate = aggregate
Table.bar_chart = bar_chart
Table.bins = bins
Table.column_chart = column_chart
Table.compute = compute
Table.denormalize = denormalize
Table.distinct = distinct
Table.exclude = exclude
Table.find = find
Table.from_csv = from_csv
Table.from_fixed = from_fixed
Table.from_json = from_json
Table.from_object = from_object
Table.group_by = group_by
Table.homogenize = homogenize
Table.join = join
Table.limit = limit
Table.line_chart = line_chart
Table.merge = merge
Table.normalize = normalize
Table.order_by = order_by
Table.pivot = pivot
Table.print_bars = print_bars
Table.print_html = print_html
Table.print_structure = print_structure
Table.print_table = print_table
Table.rename = rename
Table.scatterplot = scatterplot
Table.select = select
Table.to_csv = to_csv
Table.to_json = to_json
Table.where = where
agate-1.6.3/agate/table/aggregate.py 0000664 0000000 0000000 00000001744 14074061410 0017255 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# pylint: disable=W0212
from collections import OrderedDict
from agate import utils
def aggregate(self, aggregations):
"""
Apply one or more :class:`.Aggregation` instances to this table.
:param aggregations:
A single :class:`.Aggregation` instance or a sequence of tuples in the
format :code:`(name, aggregation)`, where each :code:`aggregation` is
an instance of :class:`.Aggregation`.
:returns:
If the input was a single :class:`Aggregation` then a single result
will be returned. If it was a sequence then an :class:`.OrderedDict` of
results will be returned.
"""
if utils.issequence(aggregations):
results = OrderedDict()
for name, agg in aggregations:
agg.validate(self)
for name, agg in aggregations:
results[name] = agg.run(self)
return results
else:
aggregations.validate(self)
return aggregations.run(self)
agate-1.6.3/agate/table/bar_chart.py 0000664 0000000 0000000 00000002376 14074061410 0017256 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# pylint: disable=W0212
import leather
def bar_chart(self, label=0, value=1, path=None, width=None, height=None):
"""
Render a bar chart using :class:`leather.Chart`.
:param label:
The name or index of a column to plot as the labels of the chart.
Defaults to the first column in the table.
:param value:
The name or index of a column to plot as the values of the chart.
Defaults to the second column in the table.
:param path:
If specified, the resulting SVG will be saved to this location. If
:code:`None` and running in IPython, then the SVG will be rendered
inline. Otherwise, the SVG data will be returned as a string.
:param width:
The width of the output SVG.
:param height:
The height of the output SVG.
"""
if type(label) is int:
label_name = self.column_names[label]
else:
label_name = label
if type(value) is int:
value_name = self.column_names[value]
else:
value_name = value
chart = leather.Chart()
chart.add_x_axis(name=value_name)
chart.add_y_axis(name=label_name)
chart.add_bars(self, x=value, y=label)
return chart.to_svg(path=path, width=width, height=height)
agate-1.6.3/agate/table/bins.py 0000664 0000000 0000000 00000006447 14074061410 0016267 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# pylint: disable=W0212
try:
from cdecimal import Decimal
except ImportError: # pragma: no cover
from decimal import Decimal
from babel.numbers import format_decimal
from agate import utils
from agate.aggregations import Max, Min
def bins(self, column_name, count=10, start=None, end=None):
"""
Generates (approximately) evenly sized bins for the values in a column.
Bins may not be perfectly even if the spread of the data does not divide
evenly, but all values will always be included in some bin.
The resulting table will have two columns. The first will have
the same name as the specified column, but will be type :class:`.Text`.
The second will be named :code:`count` and will be of type
:class:`.Number`.
:param column_name:
The name of the column to bin. Must be of type :class:`.Number`
:param count:
The number of bins to create. If not specified then each value will
be counted as its own bin.
:param start:
The minimum value to start the bins at. If not specified the
minimum value in the column will be used.
:param end:
The maximum value to end the bins at. If not specified the maximum
value in the column will be used.
:returns:
A new :class:`Table`.
"""
minimum, maximum = utils.round_limits(
Min(column_name).run(self),
Max(column_name).run(self)
)
# Infer bin start/end positions
start = minimum if not start else Decimal(start)
end = maximum if not end else Decimal(end)
# Calculate bin size
spread = abs(end - start)
size = spread / count
breaks = [start]
# Calculate breakpoints
for i in range(1, count + 1):
top = start + (size * i)
breaks.append(top)
# Format bin names
decimal_places = utils.max_precision(breaks)
break_formatter = utils.make_number_formatter(decimal_places)
def name_bin(i, j, first_exclusive=True, last_exclusive=False):
inclusive = format_decimal(i, format=break_formatter)
exclusive = format_decimal(j, format=break_formatter)
output = u'[' if first_exclusive else u'('
output += u'%s - %s' % (inclusive, exclusive)
output += u']' if last_exclusive else u')'
return output
# Generate bins
bin_names = []
for i in range(1, len(breaks)):
last_exclusive = (i == len(breaks) - 1)
if i == 1 and minimum < start:
name = name_bin(minimum, breaks[i], last_exclusive=last_exclusive)
elif i == len(breaks) - 1 and maximum > end:
name = name_bin(breaks[i - 1], maximum, last_exclusive=last_exclusive)
else:
name = name_bin(breaks[i - 1], breaks[i], last_exclusive=last_exclusive)
bin_names.append(name)
bin_names.append(None)
# Lambda method for actually assigning values to bins
def binner(row):
value = row[column_name]
if value is None:
return None
i = 1
try:
while value >= breaks[i]:
i += 1
except IndexError:
i -= 1
return bin_names[i - 1]
# Pivot by lambda
table = self.pivot(binner, key_name=column_name)
# Sort by bin order
return table.order_by(lambda r: bin_names.index(r[column_name]))
agate-1.6.3/agate/table/column_chart.py 0000664 0000000 0000000 00000002407 14074061410 0020002 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# pylint: disable=W0212
import leather
def column_chart(self, label=0, value=1, path=None, width=None, height=None):
"""
Render a column chart using :class:`leather.Chart`.
:param label:
The name or index of a column to plot as the labels of the chart.
Defaults to the first column in the table.
:param value:
The name or index of a column to plot as the values of the chart.
Defaults to the second column in the table.
:param path:
If specified, the resulting SVG will be saved to this location. If
:code:`None` and running in IPython, then the SVG will be rendered
inline. Otherwise, the SVG data will be returned as a string.
:param width:
The width of the output SVG.
:param height:
The height of the output SVG.
"""
if type(label) is int:
label_name = self.column_names[label]
else:
label_name = label
if type(value) is int:
value_name = self.column_names[value]
else:
value_name = value
chart = leather.Chart()
chart.add_x_axis(name=label_name)
chart.add_y_axis(name=value_name)
chart.add_columns(self, x=label, y=value)
return chart.to_svg(path=path, width=width, height=height)
agate-1.6.3/agate/table/compute.py 0000664 0000000 0000000 00000004066 14074061410 0017003 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# pylint: disable=W0212
from collections import OrderedDict
from copy import copy
from agate.rows import Row
def compute(self, computations, replace=False):
"""
Create a new table by applying one or more :class:`.Computation` instances
to each row.
:param computations:
A sequence of pairs of new column names and :class:`.Computation`
instances.
:param replace:
If :code:`True` then new column names can match existing names, and
those columns will be replaced with the computed data.
:returns:
A new :class:`.Table`.
"""
column_names = list(copy(self._column_names))
column_types = list(copy(self._column_types))
for new_column_name, computation in computations:
new_column_type = computation.get_computed_data_type(self)
if new_column_name in column_names:
if not replace:
raise ValueError(
'New column name "%s" already exists. Specify replace=True to replace with computed data.'
)
i = column_names.index(new_column_name)
column_types[i] = new_column_type
else:
column_names.append(new_column_name)
column_types.append(new_column_type)
computation.validate(self)
new_columns = OrderedDict()
for new_column_name, computation in computations:
new_columns[new_column_name] = computation.run(self)
new_rows = []
for i, row in enumerate(self._rows):
# Slow version if using replace
if replace:
values = []
for j, column_name in enumerate(column_names):
if column_name in new_columns:
values.append(new_columns[column_name][i])
else:
values.append(row[j])
# Faster version if not using replace
else:
values = row.values() + tuple(c[i] for c in new_columns.values())
new_rows.append(Row(values, column_names))
return self._fork(new_rows, column_names, column_types)
agate-1.6.3/agate/table/denormalize.py 0000664 0000000 0000000 00000010613 14074061410 0017633 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# pylint: disable=W0212
from collections import OrderedDict
try:
from cdecimal import Decimal
except ImportError: # pragma: no cover
from decimal import Decimal
import six
from agate import utils
from agate.data_types import Number
from agate.rows import Row
from agate.type_tester import TypeTester
def denormalize(self, key=None, property_column='property', value_column='value', default_value=utils.default,
column_types=None):
"""
Create a new table with row values converted into columns.
For example:
+---------+-----------+---------+
| name | property | value |
+=========+===========+=========+
| Jane | gender | female |
+---------+-----------+---------+
| Jane | race | black |
+---------+-----------+---------+
| Jane | age | 24 |
+---------+-----------+---------+
| ... | ... | ... |
+---------+-----------+---------+
Can be denormalized so that each unique value in `field` becomes a
column with `value` used for its values.
+---------+----------+--------+-------+
| name | gender | race | age |
+=========+==========+========+=======+
| Jane | female | black | 24 |
+---------+----------+--------+-------+
| Jack | male | white | 35 |
+---------+----------+--------+-------+
| Joe | male | black | 28 |
+---------+----------+--------+-------+
If one or more keys are specified then the resulting table will
automatically have :code:`row_names` set to those keys.
This is the opposite of :meth:`.Table.normalize`.
:param key:
A column name or a sequence of column names that should be
maintained as they are in the normalized table. Typically these
are the tables unique identifiers and any metadata about them. Or,
:code:`None` if there are no key columns.
:param field_column:
The column whose values should become column names in the new table.
:param property_column:
The column whose values should become the values of the property
columns in the new table.
:param default_value:
Value to be used for missing values in the pivot table. If not
specified :code:`Decimal(0)` will be used for aggregations that
return :class:`.Number` data and :code:`None` will be used for
all others.
:param column_types:
A sequence of column types with length equal to number of unique
values in field_column or an instance of :class:`.TypeTester`.
Defaults to a generic :class:`.TypeTester`.
:returns:
A new :class:`.Table`.
"""
from agate.table import Table
if key is None:
key = []
elif not utils.issequence(key):
key = [key]
field_names = []
row_data = OrderedDict()
for row in self.rows:
row_key = tuple(row[k] for k in key)
if row_key not in row_data:
row_data[row_key] = OrderedDict()
f = six.text_type(row[property_column])
v = row[value_column]
if f not in field_names:
field_names.append(f)
row_data[row_key][f] = v
if default_value == utils.default:
if isinstance(self.columns[value_column].data_type, Number):
default_value = Decimal(0)
else:
default_value = None
new_column_names = key + field_names
new_rows = []
row_names = []
for k, v in row_data.items():
row = list(k)
if len(k) == 1:
row_names.append(k[0])
else:
row_names.append(k)
for f in field_names:
if f in v:
row.append(v[f])
else:
row.append(default_value)
new_rows.append(Row(row, new_column_names))
key_column_types = [self.column_types[self.column_names.index(name)] for name in key]
if column_types is None or isinstance(column_types, TypeTester):
tester = TypeTester() if column_types is None else column_types
force_update = dict(zip(key, key_column_types))
force_update.update(tester._force)
tester._force = force_update
new_column_types = tester.run(new_rows, new_column_names)
else:
new_column_types = key_column_types + list(column_types)
return Table(new_rows, new_column_names, new_column_types, row_names=row_names)
agate-1.6.3/agate/table/distinct.py 0000664 0000000 0000000 00000002317 14074061410 0017145 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# pylint: disable=W0212
from agate import utils
def distinct(self, key=None):
"""
Create a new table with only unique rows.
:param key:
Either the name of a single column to use to identify unique rows, a
sequence of such column names, a :class:`function` that takes a
row and returns a value to identify unique rows, or `None`, in
which case the entire row will be checked for uniqueness.
:returns:
A new :class:`.Table`.
"""
key_is_row_function = hasattr(key, '__call__')
key_is_sequence = utils.issequence(key)
uniques = []
rows = []
if self._row_names is not None:
row_names = []
else:
row_names = None
for i, row in enumerate(self._rows):
if key_is_row_function:
k = key(row)
elif key_is_sequence:
k = (row[j] for j in key)
elif key is None:
k = tuple(row)
else:
k = row[key]
if k not in uniques:
uniques.append(k)
rows.append(row)
if self._row_names is not None:
row_names.append(self._row_names[i])
return self._fork(rows, row_names=row_names)
agate-1.6.3/agate/table/exclude.py 0000664 0000000 0000000 00000000767 14074061410 0016764 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# pylint: disable=W0212
from agate import utils
def exclude(self, key):
"""
Create a new table without the specified columns.
:param key:
Either the name of a single column to exclude or a sequence of such
names.
:returns:
A new :class:`.Table`.
"""
if not utils.issequence(key):
key = [key]
selected_column_names = tuple(n for n in self._column_names if n not in key)
return self.select(selected_column_names)
agate-1.6.3/agate/table/find.py 0000664 0000000 0000000 00000000666 14074061410 0016251 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# pylint: disable=W0212
def find(self, test):
"""
Find the first row that passes a test.
:param test:
A function that takes a :class:`.Row` and returns :code:`True` if
it matches.
:type test:
:class:`function`
:returns:
A single :class:`.Row` if found, or `None`.
"""
for row in self._rows:
if test(row):
return row
return None
agate-1.6.3/agate/table/from_csv.py 0000664 0000000 0000000 00000005674 14074061410 0017153 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
import io
import itertools
import six
@classmethod
def from_csv(cls, path, column_names=None, column_types=None, row_names=None, skip_lines=0, header=True, sniff_limit=0,
encoding='utf-8', row_limit=None, **kwargs):
"""
Create a new table from a CSV.
This method uses agate's builtin CSV reader, which supplies encoding
support for both Python 2 and Python 3.
:code:`kwargs` will be passed through to the CSV reader.
:param path:
Filepath or file-like object from which to read CSV data. If a file-like
object is specified, it must be seekable. If using Python 2, the file
should be opened in binary mode (`rb`).
:param column_names:
See :meth:`.Table.__init__`.
:param column_types:
See :meth:`.Table.__init__`.
:param row_names:
See :meth:`.Table.__init__`.
:param skip_lines:
The number of lines to skip from the top of the file.
:param header:
If :code:`True`, the first row of the CSV is assumed to contain column
names. If :code:`header` and :code:`column_names` are both specified
then a row will be skipped, but :code:`column_names` will be used.
:param sniff_limit:
Limit CSV dialect sniffing to the specified number of bytes. Set to
None to sniff the entire file. Defaults to 0 (no sniffing).
:param encoding:
Character encoding of the CSV file. Note: if passing in a file
handle it is assumed you have already opened it with the correct
encoding specified.
:param row_limit:
Limit how many rows of data will be read.
"""
from agate import csv
from agate.table import Table
close = False
try:
if hasattr(path, 'read'):
f = path
else:
if six.PY2:
f = open(path, 'Urb')
else:
f = io.open(path, encoding=encoding)
close = True
if isinstance(skip_lines, int):
while skip_lines > 0:
f.readline()
skip_lines -= 1
else:
raise ValueError('skip_lines argument must be an int')
contents = six.StringIO(f.read())
if sniff_limit is None:
kwargs['dialect'] = csv.Sniffer().sniff(contents.getvalue())
elif sniff_limit > 0:
kwargs['dialect'] = csv.Sniffer().sniff(contents.getvalue()[:sniff_limit])
if six.PY2:
kwargs['encoding'] = encoding
reader = csv.reader(contents, header=header, **kwargs)
if header:
if column_names is None:
column_names = next(reader)
else:
next(reader)
if row_limit is None:
rows = tuple(reader)
else:
rows = tuple(itertools.islice(reader, row_limit))
finally:
if close:
f.close()
return Table(rows, column_names, column_types, row_names=row_names)
agate-1.6.3/agate/table/from_fixed.py 0000664 0000000 0000000 00000004071 14074061410 0017445 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
import io
from agate import fixed, utils
@classmethod
def from_fixed(cls, path, schema_path, column_names=utils.default, column_types=None, row_names=None, encoding='utf-8',
schema_encoding='utf-8'):
"""
Create a new table from a fixed-width file and a CSV schema.
Schemas must be in the "ffs" format. There is a repository of such schemas
maintained at `wireservice/ffs `_.
:param path:
File path or file-like object from which to read fixed-width data.
:param schema_path:
File path or file-like object from which to read schema (CSV) data.
:param column_names:
By default, these will be parsed from the schema. For alternatives, see
:meth:`.Table.__init__`.
:param column_types:
See :meth:`.Table.__init__`.
:param row_names:
See :meth:`.Table.__init__`.
:param encoding:
Character encoding of the fixed-width file. Note: if passing in a file
handle it is assumed you have already opened it with the correct
encoding specified.
:param schema_encoding:
Character encoding of the schema file. Note: if passing in a file
handle it is assumed you have already opened it with the correct
encoding specified.
"""
from agate.table import Table
close_f = False
close_schema_f = False
try:
if not hasattr(path, 'read'):
f = io.open(path, encoding=encoding)
close_f = True
else:
f = path
if not hasattr(schema_path, 'read'):
schema_f = io.open(schema_path, encoding=schema_encoding)
close_schema_f = True
else:
schema_f = path
reader = fixed.reader(f, schema_f)
rows = list(reader)
finally:
if close_f:
f.close()
if close_schema_f:
schema_f.close()
if column_names == utils.default:
column_names = reader.fieldnames
return Table(rows, column_names, column_types, row_names=row_names)
agate-1.6.3/agate/table/from_json.py 0000664 0000000 0000000 00000005317 14074061410 0017323 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
import io
import json
from collections import OrderedDict
from decimal import Decimal
@classmethod
def from_json(cls, path, row_names=None, key=None, newline=False, column_types=None, encoding='utf-8', **kwargs):
"""
Create a new table from a JSON file.
Once the JSON has been deseralized, the resulting Python object is
passed to :meth:`.Table.from_object`.
If the file contains a top-level dictionary you may specify what
property contains the row list using the :code:`key` parameter.
:code:`kwargs` will be passed through to :meth:`json.load`.
:param path:
Filepath or file-like object from which to read JSON data.
:param row_names:
See the :meth:`.Table.__init__`.
:param key:
The key of the top-level dictionary that contains a list of row
arrays.
:param newline:
If `True` then the file will be parsed as "newline-delimited JSON".
:param column_types:
See :meth:`.Table.__init__`.
:param encoding:
According to RFC4627, JSON text shall be encoded in Unicode; the default encoding is
UTF-8. You can override this by using any encoding supported by your Python's open() function
if :code:`path` is a filepath. If passing in a file handle, it is assumed you have already opened it with the
correct encoding specified.
"""
from agate.table import Table
if key is not None and newline:
raise ValueError('key and newline may not be specified together.')
close = False
try:
if newline:
js = []
if hasattr(path, 'read'):
for line in path:
js.append(json.loads(line, object_pairs_hook=OrderedDict, parse_float=Decimal, **kwargs))
else:
f = io.open(path, encoding=encoding)
close = True
for line in f:
js.append(json.loads(line, object_pairs_hook=OrderedDict, parse_float=Decimal, **kwargs))
else:
if hasattr(path, 'read'):
js = json.load(path, object_pairs_hook=OrderedDict, parse_float=Decimal, **kwargs)
else:
f = io.open(path, encoding=encoding)
close = True
js = json.load(f, object_pairs_hook=OrderedDict, parse_float=Decimal, **kwargs)
if isinstance(js, dict):
if not key:
raise TypeError(
'When converting a JSON document with a top-level dictionary element, a key must be specified.'
)
js = js[key]
finally:
if close:
f.close()
return Table.from_object(js, row_names=row_names, column_types=column_types)
agate-1.6.3/agate/table/from_object.py 0000664 0000000 0000000 00000003272 14074061410 0017616 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
from agate import utils
@classmethod
def from_object(cls, obj, row_names=None, column_types=None):
"""
Create a new table from a Python object.
The object should be a list containing a dictionary for each "row".
Nested objects or lists will also be parsed. For example, this object:
.. code-block:: python
{
'one': {
'a': 1,
'b': 2,
'c': 3
},
'two': [4, 5, 6],
'three': 'd'
}
Would generate these columns and values:
.. code-block:: python
{
'one/a': 1,
'one/b': 2,
'one/c': 3,
'two.0': 4,
'two.1': 5,
'two.2': 6,
'three': 'd'
}
Column names and types will be inferred from the data.
Not all rows are required to have the same keys. Missing elements will
be filled in with null values.
:param obj:
Filepath or file-like object from which to read JSON data.
:param row_names:
See :meth:`.Table.__init__`.
:param column_types:
See :meth:`.Table.__init__`.
"""
from agate.table import Table
column_names = []
row_objects = []
for sub in obj:
parsed = utils.parse_object(sub)
for key in parsed.keys():
if key not in column_names:
column_names.append(key)
row_objects.append(parsed)
rows = []
for sub in row_objects:
r = []
for name in column_names:
r.append(sub.get(name, None))
rows.append(r)
return Table(rows, column_names, row_names=row_names, column_types=column_types)
agate-1.6.3/agate/table/group_by.py 0000664 0000000 0000000 00000004023 14074061410 0017146 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# pylint: disable=W0212
from collections import OrderedDict
from agate.data_types import Text
from agate.tableset import TableSet
def group_by(self, key, key_name=None, key_type=None):
"""
Create a :class:`.TableSet` with a table for each unique key.
Note that group names will always be coerced to a string, regardless of the
format of the input column.
:param key:
Either the name of a column from the this table to group by, or a
:class:`function` that takes a row and returns a value to group by.
:param key_name:
A name that describes the grouped properties. Defaults to the
column name that was grouped on or "group" if grouping with a key
function. See :class:`.TableSet` for more.
:param key_type:
An instance of any subclass of :class:`.DataType`. If not provided
it will default to a :class`.Text`.
:returns:
A :class:`.TableSet` mapping where the keys are unique values from
the :code:`key` and the values are new :class:`.Table` instances
containing the grouped rows.
"""
key_is_row_function = hasattr(key, '__call__')
if key_is_row_function:
key_name = key_name or 'group'
key_type = key_type or Text()
else:
column = self._columns[key]
key_name = key_name or column.name
key_type = key_type or column.data_type
groups = OrderedDict()
for row in self._rows:
if key_is_row_function:
group_name = key(row)
else:
group_name = row[column.name]
group_name = key_type.cast(group_name)
if group_name not in groups:
groups[group_name] = []
groups[group_name].append(row)
if not groups:
return TableSet([self._fork([])], [], key_name=key_name, key_type=key_type)
output = OrderedDict()
for group, rows in groups.items():
output[group] = self._fork(rows)
return TableSet(output.values(), output.keys(), key_name=key_name, key_type=key_type)
agate-1.6.3/agate/table/homogenize.py 0000664 0000000 0000000 00000006006 14074061410 0017467 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# pylint: disable=W0212
from agate import utils
from agate.rows import Row
def homogenize(self, key, compare_values, default_row=None):
"""
Fill in missing rows in a series.
This can be used, for instance, to add rows for missing years in a time
series.
Missing rows are found by comparing the values in the :code:`key` columns
with those provided as :code:`compare_values`.
Values not found in the table will be used to generate new rows with
the given :code:`default_row`.
:code:`default_row` should be an array of values or an array-generating
function. If not specified, the new rows will have :code:`None` in columns
all columns not specified in :code:`key`.
If :code:`default_row` is an array of values, its length should be row
length minus the number of column names provided in the :code:`key`.
If it is an array-generating function, the function should take an array
of missing values for each new row and output a full row including those
values.
:param key:
Either a column name or a sequence of such names.
:param compare_values:
Either an array of column values if key is a single column name or a
sequence of arrays of values if key is a sequence of names. It can
also be a generator that yields either of the two. A row is created for
each value or list of values not found in the rows of the table.
:param default_row:
An array of values or a function to generate new rows. The length of
the input array should be equal to row length minus column_names
count. The length of array generated by the function should be the
row length.
:returns:
A new :class:`.Table`.
"""
rows = list(self._rows)
if not utils.issequence(key):
key = [key]
if len(key) == 1:
if any(not utils.issequence(compare_value) for compare_value in compare_values):
compare_values = [[compare_value] for compare_value in compare_values]
column_values = [self._columns.get(name) for name in key]
column_indexes = [self._column_names.index(name) for name in key]
compare_values = [[column_values[i].data_type.cast(v) for i, v in enumerate(values)] for values in compare_values]
column_values = zip(*column_values)
differences = list(set(map(tuple, compare_values)) - set(column_values))
for difference in differences:
if callable(default_row):
new_row = default_row(difference)
else:
if default_row is not None:
new_row = list(default_row)
else:
new_row = [None] * (len(self._column_names) - len(key))
for i, d in zip(column_indexes, difference):
new_row.insert(i, d)
new_row = [self._columns[i].data_type.cast(v) for i, v in enumerate(new_row)]
rows.append(Row(new_row, self._column_names))
# Do not copy the row_names, since this function adds rows.
return self._fork(rows, row_names=[])
agate-1.6.3/agate/table/join.py 0000664 0000000 0000000 00000020035 14074061410 0016260 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# pylint: disable=W0212
from agate import utils
from agate.rows import Row
def join(self, right_table, left_key=None, right_key=None, inner=False, full_outer=False, require_match=False,
columns=None):
"""
Create a new table by joining two table's on common values. This method
implements most varieties of SQL join, in addition to some unique features.
If :code:`left_key` and :code:`right_key` are both :code:`None` then this
method will perform a "sequential join", which is to say it will join on row
number. The :code:`inner` and :code:`full_outer` arguments will determine
whether dangling left-hand and right-hand rows are included, respectively.
If :code:`left_key` is specified, then a "left outer join" will be
performed. This will combine columns from the :code:`right_table` anywhere
that :code:`left_key` and :code:`right_key` are equal. Unmatched rows from
the left table will be included with the right-hand columns set to
:code:`None`.
If :code:`inner` is :code:`True` then an "inner join" will be performed.
Unmatched rows from either table will be left out.
If :code:`full_outer` is :code:`True` then a "full outer join" will be
performed. Unmatched rows from both tables will be included, with the
columns in the other table set to :code:`None`.
In all cases, if :code:`right_key` is :code:`None` then it :code:`left_key`
will be used for both tables.
If :code:`left_key` and :code:`right_key` are column names, the right-hand
identifier column will not be included in the output table.
If :code:`require_match` is :code:`True` unmatched rows will raise an
exception. This is like an "inner join" except any row that doesn't have a
match will raise an exception instead of being dropped. This is useful for
enforcing expectations about datasets that should match.
Column names from the right table which also exist in this table will
be suffixed "2" in the new table.
A subset of columns from the right-hand table can be included in the joined
table using the :code:`columns` argument.
:param right_table:
The "right" table to join to.
:param left_key:
Either the name of a column from the this table to join on, the index
of a column, a sequence of such column identifiers, a
:class:`function` that takes a row and returns a value to join on, or
:code:`None` in which case the tables will be joined on row number.
:param right_key:
Either the name of a column from :code:table` to join on, the index of
a column, a sequence of such column identifiers, or a :class:`function`
that takes a ow and returns a value to join on. If :code:`None` then
:code:`left_key` will be used for both. If :code:`left_key` is
:code:`None` then this value is ignored.
:param inner:
Perform a SQL-style "inner join" instead of a left outer join. Rows
which have no match for :code:`left_key` will not be included in
the output table.
:param full_outer:
Perform a SQL-style "full outer" join rather than a left or a right.
May not be used in combination with :code:`inner`.
:param require_match:
If true, an exception will be raised if there is a left_key with no
matching right_key.
:param columns:
A sequence of column names from :code:`right_table` to include in
the final output table. Defaults to all columns not in
:code:`right_key`. Ignored when :code:`full_outer` is :code:`True`.
:returns:
A new :class:`.Table`.
"""
if inner and full_outer:
raise ValueError('A join can not be both "inner" and "full_outer".')
if right_key is None:
right_key = left_key
# Get join columns
right_key_indices = []
left_key_is_func = hasattr(left_key, '__call__')
left_key_is_sequence = utils.issequence(left_key)
# Left key is None
if left_key is None:
left_data = tuple(range(len(self._rows)))
# Left key is a function
elif left_key_is_func:
left_data = [left_key(row) for row in self._rows]
# Left key is a sequence
elif left_key_is_sequence:
left_columns = [self._columns[key] for key in left_key]
left_data = zip(*[column.values() for column in left_columns])
# Left key is a column name/index
else:
left_data = self._columns[left_key].values()
right_key_is_func = hasattr(right_key, '__call__')
right_key_is_sequence = utils.issequence(right_key)
# Sequential join
if left_key is None:
right_data = tuple(range(len(right_table._rows)))
# Right key is a function
elif right_key_is_func:
right_data = [right_key(row) for row in right_table._rows]
# Right key is a sequence
elif right_key_is_sequence:
right_columns = [right_table._columns[key] for key in right_key]
right_data = zip(*[column.values() for column in right_columns])
right_key_indices = [right_table._columns._keys.index(key) for key in right_key]
# Right key is a column name/index
else:
right_column = right_table._columns[right_key]
right_data = right_column.values()
right_key_indices = [right_table._columns.index(right_column)]
# Build names and type lists
column_names = list(self._column_names)
column_types = list(self._column_types)
for i, column in enumerate(right_table._columns):
name = column.name
if not full_outer:
if columns is None and i in right_key_indices:
continue
if columns is not None and name not in columns:
continue
if name in self.column_names:
column_names.append('%s2' % name)
else:
column_names.append(name)
column_types.append(column.data_type)
if columns is not None and not full_outer:
right_table = right_table.select([n for n in right_table._column_names if n in columns])
right_hash = {}
for i, value in enumerate(right_data):
if value not in right_hash:
right_hash[value] = []
right_hash[value].append(right_table._rows[i])
# Collect new rows
rows = []
if self._row_names is not None and not full_outer:
row_names = []
else:
row_names = None
# Iterate over left column
for left_index, left_value in enumerate(left_data):
matching_rows = right_hash.get(left_value, None)
if require_match and matching_rows is None:
raise ValueError('Left key "%s" does not have a matching right key.' % left_value)
# Rows with matches
if matching_rows:
for right_row in matching_rows:
new_row = list(self._rows[left_index])
for k, v in enumerate(right_row):
if columns is None and k in right_key_indices and not full_outer:
continue
new_row.append(v)
rows.append(Row(new_row, column_names))
if self._row_names is not None and not full_outer:
row_names.append(self._row_names[left_index])
# Rows without matches
elif not inner:
new_row = list(self._rows[left_index])
for k, v in enumerate(right_table._column_names):
if columns is None and k in right_key_indices and not full_outer:
continue
new_row.append(None)
rows.append(Row(new_row, column_names))
if self._row_names is not None and not full_outer:
row_names.append(self._row_names[left_index])
# Full outer join
if full_outer:
left_set = set(left_data)
for right_index, right_value in enumerate(right_data):
if right_value in left_set:
continue
new_row = ([None] * len(self._columns)) + list(right_table.rows[right_index])
rows.append(Row(new_row, column_names))
return self._fork(rows, column_names, column_types, row_names=row_names)
agate-1.6.3/agate/table/limit.py 0000664 0000000 0000000 00000001557 14074061410 0016447 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# pylint: disable=W0212
def limit(self, start_or_stop=None, stop=None, step=None):
"""
Create a new table with fewer rows.
See also: Python's builtin :func:`slice`.
:param start_or_stop:
If the only argument, then how many rows to include, otherwise,
the index of the first row to include.
:param stop:
The index of the last row to include.
:param step:
The size of the jump between rows to include. (`step=2` will return
every other row.)
:returns:
A new :class:`.Table`.
"""
if stop or step:
s = slice(start_or_stop, stop, step)
else:
s = slice(start_or_stop)
rows = self._rows[s]
if self._row_names is not None:
row_names = self._row_names[s]
else:
row_names = None
return self._fork(rows, row_names=row_names)
agate-1.6.3/agate/table/line_chart.py 0000664 0000000 0000000 00000002236 14074061410 0017434 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# pylint: disable=W0212
import leather
def line_chart(self, x=0, y=1, path=None, width=None, height=None):
"""
Render a line chart using :class:`leather.Chart`.
:param x:
The name or index of a column to plot as the x-axis. Defaults to the
first column in the table.
:param y:
The name or index of a column to plot as the y-axis. Defaults to the
second column in the table.
:param path:
If specified, the resulting SVG will be saved to this location. If
:code:`None` and running in IPython, then the SVG will be rendered
inline. Otherwise, the SVG data will be returned as a string.
:param width:
The width of the output SVG.
:param height:
The height of the output SVG.
"""
if type(x) is int:
x_name = self.column_names[x]
else:
x_name = x
if type(y) is int:
y_name = self.column_names[y]
else:
y_name = y
chart = leather.Chart()
chart.add_x_axis(name=x_name)
chart.add_y_axis(name=y_name)
chart.add_line(self, x=x, y=y)
return chart.to_svg(path=path, width=width, height=height)
agate-1.6.3/agate/table/merge.py 0000664 0000000 0000000 00000004472 14074061410 0016427 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# pylint: disable=W0212
from collections import OrderedDict
from agate.exceptions import DataTypeError
from agate.rows import Row
@classmethod
def merge(cls, tables, row_names=None, column_names=None):
"""
Create a new table from a sequence of similar tables.
This method will not carry over row names from the merged tables, but new
row names can be specified with the :code:`row_names` argument.
It is possible to limit the columns included in the new :class:`.Table`
with :code:`column_names` argument. For example, to only include columns
from a specific table, set :code:`column_names` equal to
:code:`table.column_names`.
:param tables:
An sequence of :class:`.Table` instances.
:param row_names:
See :class:`.Table` for the usage of this parameter.
:param column_names:
A sequence of column names to include in the new :class:`.Table`. If
not specified, all distinct column names from `tables` are included.
:returns:
A new :class:`.Table`.
"""
from agate.table import Table
new_columns = OrderedDict()
for table in tables:
for i in range(0, len(table.columns)):
if column_names is None or table.column_names[i] in column_names:
column_name = table.column_names[i]
column_type = table.column_types[i]
if column_name in new_columns:
if not isinstance(column_type, type(new_columns[column_name])):
raise DataTypeError('Tables contain columns with the same names, but different types.')
else:
new_columns[column_name] = column_type
column_keys = tuple(new_columns.keys())
column_types = tuple(new_columns.values())
rows = []
for table in tables:
# Performance optimization for identical table structures
if table.column_names == column_keys and table.column_types == column_types:
rows.extend(table.rows)
else:
for row in table.rows:
data = []
for column_key in column_keys:
data.append(row.get(column_key, None))
rows.append(Row(data, column_keys))
return Table(rows, column_keys, column_types, row_names=row_names, _is_fork=True)
agate-1.6.3/agate/table/normalize.py 0000664 0000000 0000000 00000006257 14074061410 0017333 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# pylint: disable=W0212
from agate import utils
from agate.rows import Row
from agate.type_tester import TypeTester
def normalize(self, key, properties, property_column='property', value_column='value', column_types=None):
"""
Create a new table with columns converted into rows values.
For example:
+---------+----------+--------+-------+
| name | gender | race | age |
+=========+==========+========+=======+
| Jane | female | black | 24 |
+---------+----------+--------+-------+
| Jack | male | white | 35 |
+---------+----------+--------+-------+
| Joe | male | black | 28 |
+---------+----------+--------+-------+
can be normalized on columns 'gender', 'race' and 'age':
+---------+-----------+---------+
| name | property | value |
+=========+===========+=========+
| Jane | gender | female |
+---------+-----------+---------+
| Jane | race | black |
+---------+-----------+---------+
| Jane | age | 24 |
+---------+-----------+---------+
| ... | ... | ... |
+---------+-----------+---------+
This is the opposite of :meth:`.Table.denormalize`.
:param key:
A column name or a sequence of column names that should be
maintained as they are in the normalized self. Typically these
are the tables unique identifiers and any metadata about them.
:param properties:
A column name or a sequence of column names that should be
converted to properties in the new self.
:param property_column:
The name to use for the column containing the property names.
:param value_column:
The name to use for the column containing the property values.
:param column_types:
A sequence of two column types for the property and value column in
that order or an instance of :class:`.TypeTester`. Defaults to a
generic :class:`.TypeTester`.
:returns:
A new :class:`.Table`.
"""
from agate.table import Table
new_rows = []
if not utils.issequence(key):
key = [key]
if not utils.issequence(properties):
properties = [properties]
new_column_names = key + [property_column, value_column]
row_names = []
for row in self._rows:
k = tuple(row[n] for n in key)
left_row = list(k)
if len(k) == 1:
row_names.append(k[0])
else:
row_names.append(k)
for f in properties:
new_rows.append(Row((left_row + [f, row[f]]), new_column_names))
key_column_types = [self._column_types[self._column_names.index(name)] for name in key]
if column_types is None or isinstance(column_types, TypeTester):
tester = TypeTester() if column_types is None else column_types
force_update = dict(zip(key, key_column_types))
force_update.update(tester._force)
tester._force = force_update
new_column_types = tester.run(new_rows, new_column_names)
else:
new_column_types = key_column_types + list(column_types)
return Table(new_rows, new_column_names, new_column_types)
agate-1.6.3/agate/table/order_by.py 0000664 0000000 0000000 00000002530 14074061410 0017126 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# pylint: disable=W0212
from agate import utils
def order_by(self, key, reverse=False):
"""
Create a new table that is sorted.
:param key:
Either the name of a single column to sort by, a sequence of such
names, or a :class:`function` that takes a row and returns a value
to sort by.
:param reverse:
If `True` then sort in reverse (typically, descending) order.
:returns:
A new :class:`.Table`.
"""
if len(self._rows) == 0:
return self._fork(self._rows)
else:
key_is_row_function = hasattr(key, '__call__')
key_is_sequence = utils.issequence(key)
def sort_key(data):
row = data[1]
if key_is_row_function:
k = key(row)
elif key_is_sequence:
k = tuple(utils.NullOrder() if row[n] is None else row[n] for n in key)
else:
k = row[key]
if k is None:
return utils.NullOrder()
return k
results = sorted(enumerate(self._rows), key=sort_key, reverse=reverse)
indices, rows = zip(*results)
if self._row_names is not None:
row_names = [self._row_names[i] for i in indices]
else:
row_names = None
return self._fork(rows, row_names=row_names)
agate-1.6.3/agate/table/pivot.py 0000664 0000000 0000000 00000011175 14074061410 0016467 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# pylint: disable=W0212
import six
from agate import utils
from agate.aggregations import Count
def pivot(self, key=None, pivot=None, aggregation=None, computation=None, default_value=utils.default, key_name=None):
"""
Create a new table by grouping the data, aggregating those groups,
applying a computation, and then organizing the groups into new rows and
columns.
This is sometimes called a "crosstab".
+---------+---------+--------+
| name | race | gender |
+=========+=========+========+
| Joe | white | male |
+---------+---------+--------+
| Jane | black | female |
+---------+---------+--------+
| Josh | black | male |
+---------+---------+--------+
| Jim | asian | female |
+---------+---------+--------+
This table can be pivoted with :code:`key` equal to "race" and
:code:`columns` equal to "gender". The default aggregation is
:class:`.Count`. This would result in the following table.
+---------+---------+--------+
| race | male | female |
+=========+=========+========+
| white | 1 | 0 |
+---------+---------+--------+
| black | 1 | 1 |
+---------+---------+--------+
| asian | 0 | 1 |
+---------+---------+--------+
If one or more keys are specified then the resulting table will
automatically have :code:`row_names` set to those keys.
See also the related method :meth:`.Table.denormalize`.
:param key:
Either the name of a column from the this table to group by, a
sequence of such column names, a :class:`function` that takes a
row and returns a value to group by, or :code:`None`, in which case
there will be only a single row in the output table.
:param pivot:
A column name whose unique values will become columns in the new
table, or :code:`None` in which case there will be a single value
column in the output table.
:param aggregation:
An instance of an :class:`.Aggregation` to perform on each group of
data in the pivot table. (Each cell is the result of an aggregation
of the grouped data.)
If not specified this defaults to :class:`.Count` with no arguments.
:param computation:
An optional :class:`.Computation` instance to be applied to the
aggregated sequence of values before they are transposed into the
pivot table.
Use the class name of the aggregation as your column name argument
when constructing your computation. (This is "Count" if using the
default value for :code:`aggregation`.)
:param default_value:
Value to be used for missing values in the pivot table. Defaults to
:code:`Decimal(0)`. If performing non-mathematical aggregations you
may wish to set this to :code:`None`.
:param key_name:
A name for the key column in the output table. This is most
useful when the provided key is a function. This argument is not
valid when :code:`key` is a sequence.
:returns:
A new :class:`.Table`.
"""
if key is None:
key = []
elif not utils.issequence(key):
key = [key]
elif key_name:
raise ValueError('key_name is not a valid argument when key is a sequence.')
if aggregation is None:
aggregation = Count()
groups = self
for k in key:
groups = groups.group_by(k, key_name=key_name)
aggregation_name = six.text_type(aggregation)
computation_name = six.text_type(computation) if computation else None
def apply_computation(table):
computed = table.compute([
(computation_name, computation)
])
excluded = computed.exclude([aggregation_name])
return excluded
if pivot is not None:
groups = groups.group_by(pivot)
column_type = aggregation.get_aggregate_data_type(self)
table = groups.aggregate([
(aggregation_name, aggregation)
])
pivot_count = len(set(table.columns[pivot].values()))
if computation is not None:
column_types = computation.get_computed_data_type(table)
table = apply_computation(table)
column_types = [column_type] * pivot_count
table = table.denormalize(key, pivot, computation_name or aggregation_name, default_value=default_value,
column_types=column_types)
else:
table = groups.aggregate([
(aggregation_name, aggregation)
])
if computation:
table = apply_computation(table)
return table
agate-1.6.3/agate/table/print_bars.py 0000664 0000000 0000000 00000017173 14074061410 0017475 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# -*- coding: utf8 -*-
# pylint: disable=W0212
from collections import OrderedDict
try:
from cdecimal import Decimal
except ImportError: # pragma: no cover
from decimal import Decimal
import sys
import six
from babel.numbers import format_decimal
from agate import config, utils
from agate.aggregations import Max, Min
from agate.data_types import Number
from agate.exceptions import DataTypeError
def print_bars(self, label_column_name='group', value_column_name='Count', domain=None, width=120, output=sys.stdout,
printable=False):
"""
Print a text-based bar chart based on this table.
:param label_column_name:
The column containing the label values. Defaults to :code:`group`, which
is the default output of :meth:`.Table.pivot` or :meth:`.Table.bins`.
:param value_column_name:
The column containing the bar values. Defaults to :code:`Count`, which
is the default output of :meth:`.Table.pivot` or :meth:`.Table.bins`.
:param domain:
A 2-tuple containing the minimum and maximum values for the chart's
x-axis. The domain must be large enough to contain all values in
the column.
:param width:
The width, in characters, to use for the bar chart. Defaults to
:code:`120`.
:param output:
A file-like object to print to. Defaults to :code:`sys.stdout`.
:param printable:
If true, only printable characters will be outputed.
"""
tick_mark = config.get_option('tick_char')
horizontal_line = config.get_option('horizontal_line_char')
locale = config.get_option('default_locale')
if printable:
bar_mark = config.get_option('printable_bar_char')
zero_mark = config.get_option('printable_zero_line_char')
else:
bar_mark = config.get_option('bar_char')
zero_mark = config.get_option('zero_line_char')
y_label = label_column_name
label_column = self._columns[label_column_name]
# if not isinstance(label_column.data_type, Text):
# raise ValueError('Only Text data is supported for bar chart labels.')
x_label = value_column_name
value_column = self._columns[value_column_name]
if not isinstance(value_column.data_type, Number):
raise DataTypeError('Only Number data is supported for bar chart values.')
output = output
width = width
# Format numbers
decimal_places = utils.max_precision(value_column)
value_formatter = utils.make_number_formatter(decimal_places)
formatted_labels = []
for label in label_column:
formatted_labels.append(six.text_type(label))
formatted_values = []
for value in value_column:
if value is None:
formatted_values.append('-')
else:
formatted_values.append(format_decimal(
value,
format=value_formatter,
locale=locale
))
max_label_width = max(max([len(label) for label in formatted_labels]), len(y_label))
max_value_width = max(max([len(value) for value in formatted_values]), len(x_label))
plot_width = width - (max_label_width + max_value_width + 2)
min_value = Min(value_column_name).run(self)
max_value = Max(value_column_name).run(self)
# Calculate dimensions
if domain:
x_min = Decimal(domain[0])
x_max = Decimal(domain[1])
if min_value < x_min or max_value > x_max:
raise ValueError('Column contains values outside specified domain')
else:
x_min, x_max = utils.round_limits(min_value, max_value)
# All positive
if x_min >= 0:
x_min = Decimal('0')
plot_negative_width = 0
zero_line = 0
plot_positive_width = plot_width - 1
# All negative
elif x_max <= 0:
x_max = Decimal('0')
plot_negative_width = plot_width - 1
zero_line = plot_width - 1
plot_positive_width = 0
# Mixed signs
else:
spread = x_max - x_min
negative_portion = (x_min.copy_abs() / spread)
# Subtract one for zero line
plot_negative_width = int(((plot_width - 1) * negative_portion).to_integral_value())
zero_line = plot_negative_width
plot_positive_width = plot_width - (plot_negative_width + 1)
def project(value):
if value >= 0:
return plot_negative_width + int((plot_positive_width * (value / x_max)).to_integral_value())
else:
return plot_negative_width - int((plot_negative_width * (value / x_min)).to_integral_value())
# Calculate ticks
ticks = OrderedDict()
# First tick
ticks[0] = x_min
ticks[plot_width - 1] = x_max
tick_fractions = [Decimal('0.25'), Decimal('0.5'), Decimal('0.75')]
# All positive
if x_min >= 0:
for fraction in tick_fractions:
value = x_max * fraction
ticks[project(value)] = value
# All negative
elif x_max <= 0:
for fraction in tick_fractions:
value = x_min * fraction
ticks[project(value)] = value
# Mixed signs
else:
# Zero tick
ticks[zero_line] = Decimal('0')
# Halfway between min and 0
value = x_min * Decimal('0.5')
ticks[project(value)] = value
# Halfway between 0 and max
value = x_max * Decimal('0.5')
ticks[project(value)] = value
decimal_places = utils.max_precision(ticks.values())
tick_formatter = utils.make_number_formatter(decimal_places)
ticks_formatted = OrderedDict()
for k, v in ticks.items():
ticks_formatted[k] = format_decimal(
v,
format=tick_formatter,
locale=locale
)
def write(line):
output.write(line + '\n')
# Chart top
top_line = u'%s %s' % (y_label.ljust(max_label_width), x_label.rjust(max_value_width))
write(top_line)
# Bars
for i, label in enumerate(formatted_labels):
value = value_column[i]
if value == 0 or value is None:
bar_width = 0
elif value > 0:
bar_width = project(value) - plot_negative_width
elif value < 0:
bar_width = plot_negative_width - project(value)
label_text = label.ljust(max_label_width)
value_text = formatted_values[i].rjust(max_value_width)
bar = bar_mark * bar_width
if value is not None and value >= 0:
gap = (u' ' * plot_negative_width)
# All positive
if x_min <= 0:
bar = gap + zero_mark + bar
else:
bar = bar + gap + zero_mark
else:
bar = u' ' * (plot_negative_width - bar_width) + bar
# All negative or mixed signs
if value is None or x_max > value:
bar = bar + zero_mark
bar = bar.ljust(plot_width)
write('%s %s %s' % (label_text, value_text, bar))
# Axis & ticks
axis = horizontal_line * plot_width
tick_text = u' ' * width
for i, (tick, label) in enumerate(ticks_formatted.items()):
# First tick
if tick == 0:
offset = 0
# Last tick
elif tick == plot_width - 1:
offset = -(len(label) - 1)
else:
offset = int(-(len(label) / 2))
pos = (width - plot_width) + tick + offset
# Don't print intermediate ticks that would overlap
if tick != 0 and tick != plot_width - 1:
if tick_text[pos - 1:pos + len(label) + 1] != ' ' * (len(label) + 2):
continue
tick_text = tick_text[:pos] + label + tick_text[pos + len(label):]
axis = axis[:tick] + tick_mark + axis[tick + 1:]
write(axis.rjust(width))
write(tick_text)
agate-1.6.3/agate/table/print_html.py 0000664 0000000 0000000 00000010622 14074061410 0017502 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# pylint: disable=W0212
import math
import sys
import six
from babel.numbers import format_decimal
from agate import config, utils
from agate.data_types import Number, Text
def print_html(self, max_rows=20, max_columns=6, output=sys.stdout, max_column_width=20, locale=None, max_precision=3):
"""
Print an HTML version of this table.
:param max_rows:
The maximum number of rows to display before truncating the data. This
defaults to :code:`20` to prevent accidental printing of the entire
table. Pass :code:`None` to disable the limit.
:param max_columns:
The maximum number of columns to display before truncating the data.
This defaults to :code:`6` to prevent wrapping in most cases. Pass
:code:`None` to disable the limit.
:param output:
A file-like object to print to. Defaults to :code:`sys.stdout`, unless
running in Jupyter. (See above.)
:param max_column_width:
Truncate all columns to at most this width. The remainder will be
replaced with ellipsis.
:param locale:
Provide a locale you would like to be used to format the output.
By default it will use the system's setting.
:max_precision:
Puts a limit on the maximum precision displayed for number types.
Numbers with lesser precision won't be affected.
This defaults to :code:`3`. Pass :code:`None` to disable limit.
"""
if max_rows is None:
max_rows = len(self._rows)
if max_columns is None:
max_columns = len(self._columns)
if max_precision is None:
max_precision = float('inf')
ellipsis = config.get_option('ellipsis_chars')
locale = locale or config.get_option('default_locale')
rows_truncated = max_rows < len(self._rows)
columns_truncated = max_columns < len(self._column_names)
column_names = list(self._column_names[:max_columns])
if columns_truncated:
column_names.append(ellipsis)
number_formatters = []
formatted_data = []
# Determine correct number of decimal places for each Number column
for i, c in enumerate(self._columns):
if i >= max_columns:
break
if isinstance(c.data_type, Number):
max_places = utils.max_precision(c[:max_rows])
add_ellipsis = False
if max_places > max_precision:
add_ellipsis = True
max_places = max_precision
number_formatters.append(utils.make_number_formatter(max_places, add_ellipsis))
else:
number_formatters.append(None)
# Format data
for i, row in enumerate(self._rows):
if i >= max_rows:
break
formatted_row = []
for j, v in enumerate(row):
if j >= max_columns:
v = ellipsis
elif v is None:
v = ''
elif number_formatters[j] is not None and not math.isinf(v):
v = format_decimal(
v,
format=number_formatters[j],
locale=locale
)
else:
v = six.text_type(v)
if max_column_width is not None and len(v) > max_column_width:
v = '%s...' % v[:max_column_width - 3]
formatted_row.append(v)
if j >= max_columns:
break
formatted_data.append(formatted_row)
def write(line):
output.write(line + '\n')
def write_row(formatted_row):
"""
Helper function that formats individual rows.
"""
write('
')
for j, d in enumerate(formatted_row):
# Text is left-justified, all other values are right-justified
if isinstance(self._column_types[j], Text):
write('
%s
' % d)
else:
write('
%s
' % d)
write('
')
# Header
write('
')
write('')
write('
')
for i, col in enumerate(column_names):
write('
%s
' % col)
write('
')
write('')
write('')
# Rows
for formatted_row in formatted_data:
write_row(formatted_row)
# Row indicating data was truncated
if rows_truncated:
write_row([ellipsis for n in column_names])
# Footer
write('')
write('
')
agate-1.6.3/agate/table/print_structure.py 0000664 0000000 0000000 00000001321 14074061410 0020572 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# pylint: disable=W0212
import sys
from agate.data_types import Text
def print_structure(self, output=sys.stdout, max_rows=None):
"""
Print this table's column names and types as a plain-text table.
:param output:
The output to print to.
"""
from agate.table import Table
name_column = [n for n in self._column_names]
type_column = [t.__class__.__name__ for t in self._column_types]
rows = zip(name_column, type_column)
column_names = ['column', 'data_type']
text = Text()
column_types = [text, text]
table = Table(rows, column_names, column_types)
return table.print_table(output=output, max_column_width=None, max_rows=max_rows)
agate-1.6.3/agate/table/print_table.py 0000664 0000000 0000000 00000011637 14074061410 0017634 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# pylint: disable=W0212
import math
import sys
import six
from babel.numbers import format_decimal
from agate import config, utils
from agate.data_types import Number, Text
def print_table(self, max_rows=20, max_columns=6, output=sys.stdout, max_column_width=20, locale=None,
max_precision=3):
"""
Print a text-based view of the data in this table.
The output of this method is GitHub Flavored Markdown (GFM) compatible.
:param max_rows:
The maximum number of rows to display before truncating the data. This
defaults to :code:`20` to prevent accidental printing of the entire
table. Pass :code:`None` to disable the limit.
:param max_columns:
The maximum number of columns to display before truncating the data.
This defaults to :code:`6` to prevent wrapping in most cases. Pass
:code:`None` to disable the limit.
:param output:
A file-like object to print to.
:param max_column_width:
Truncate all columns to at most this width. The remainder will be
replaced with ellipsis.
:param locale:
Provide a locale you would like to be used to format the output.
By default it will use the system's setting.
:max_precision:
Puts a limit on the maximum precision displayed for number types.
Numbers with lesser precision won't be affected.
This defaults to :code:`3`. Pass :code:`None` to disable limit.
"""
if max_rows is None:
max_rows = len(self._rows)
if max_columns is None:
max_columns = len(self._columns)
if max_precision is None:
max_precision = float('inf')
ellipsis = config.get_option('ellipsis_chars')
h_line = config.get_option('horizontal_line_char')
v_line = config.get_option('vertical_line_char')
locale = locale or config.get_option('default_locale')
rows_truncated = max_rows < len(self._rows)
columns_truncated = max_columns < len(self._column_names)
column_names = []
for column_name in self.column_names[:max_columns]:
if max_column_width is not None and len(column_name) > max_column_width:
column_names.append('%s...' % column_name[:max_column_width - 3])
else:
column_names.append(column_name)
if columns_truncated:
column_names.append(ellipsis)
widths = [len(n) for n in column_names]
number_formatters = []
formatted_data = []
# Determine correct number of decimal places for each Number column
for i, c in enumerate(self._columns):
if i >= max_columns:
break
if isinstance(c.data_type, Number):
max_places = utils.max_precision(c[:max_rows])
add_ellipsis = False
if max_places > max_precision:
add_ellipsis = True
max_places = max_precision
number_formatters.append(utils.make_number_formatter(max_places, add_ellipsis))
else:
number_formatters.append(None)
# Format data and display column widths
for i, row in enumerate(self._rows):
if i >= max_rows:
break
formatted_row = []
for j, v in enumerate(row):
if j >= max_columns:
v = ellipsis
elif v is None:
v = ''
elif number_formatters[j] is not None and not math.isinf(v):
v = format_decimal(
v,
format=number_formatters[j],
locale=locale
)
else:
v = six.text_type(v)
if max_column_width is not None and len(v) > max_column_width:
v = '%s...' % v[:max_column_width - 3]
if len(v) > widths[j]:
widths[j] = len(v)
formatted_row.append(v)
if j >= max_columns:
break
formatted_data.append(formatted_row)
def write(line):
output.write(line + '\n')
def write_row(formatted_row):
"""
Helper function that formats individual rows.
"""
row_output = []
for j, d in enumerate(formatted_row):
# Text is left-justified, all other values are right-justified
if isinstance(self._column_types[j], Text):
output = ' %s ' % d.ljust(widths[j])
else:
output = ' %s ' % d.rjust(widths[j])
row_output.append(output)
text = v_line.join(row_output)
write('%s%s%s' % (v_line, text, v_line))
divider = '%(v_line)s %(columns)s %(v_line)s' % {
'v_line': v_line,
'columns': ' | '.join(h_line * w for w in widths)
}
# Headers
write_row(column_names)
write(divider)
# Rows
for formatted_row in formatted_data:
write_row(formatted_row)
# Row indicating data was truncated
if rows_truncated:
write_row([ellipsis for n in column_names])
agate-1.6.3/agate/table/rename.py 0000664 0000000 0000000 00000004671 14074061410 0016600 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# pylint: disable=W0212
from agate import utils
def rename(self, column_names=None, row_names=None, slug_columns=False, slug_rows=False, **kwargs):
"""
Create a copy of this table with different column names or row names.
By enabling :code:`slug_columns` or :code:`slug_rows` and not specifying
new names you may slugify the table's existing names.
:code:`kwargs` will be passed to the slugify method in python-slugify. See:
https://github.com/un33k/python-slugify
:param column_names:
New column names for the renamed table. May be either an array or
a dictionary mapping existing column names to new names. If not
specified, will use this table's existing column names.
:param row_names:
New row names for the renamed table. May be either an array or
a dictionary mapping existing row names to new names. If not
specified, will use this table's existing row names.
:param slug_columns:
If True, column names will be converted to slugs and duplicate names
will have unique identifiers appended.
:param slug_rows:
If True, row names will be converted to slugs and dupicate names will
have unique identifiers appended.
"""
from agate.table import Table
if isinstance(column_names, dict):
column_names = [column_names[name] if name in column_names else name for name in self._column_names]
if isinstance(row_names, dict):
row_names = [row_names[name] if name in row_names else name for name in self._row_names]
if slug_columns:
column_names = column_names or self._column_names
if column_names is not None:
if column_names == self._column_names:
column_names = utils.slugify(column_names, ensure_unique=False, **kwargs)
else:
column_names = utils.slugify(column_names, ensure_unique=True, **kwargs)
if slug_rows:
row_names = row_names or self.row_names
if row_names is not None:
row_names = utils.slugify(row_names, ensure_unique=True, **kwargs)
if column_names is not None and column_names != self._column_names:
if row_names is None:
row_names = self._row_names
return Table(self._rows, column_names, self._column_types, row_names=row_names, _is_fork=False)
else:
return self._fork(self._rows, column_names, self._column_types, row_names=row_names)
agate-1.6.3/agate/table/scatterplot.py 0000664 0000000 0000000 00000002240 14074061410 0017663 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# pylint: disable=W0212
import leather
def scatterplot(self, x=0, y=1, path=None, width=None, height=None):
"""
Render a scatterplot using :class:`leather.Chart`.
:param x:
The name or index of a column to plot as the x-axis. Defaults to the
first column in the table.
:param y:
The name or index of a column to plot as the y-axis. Defaults to the
second column in the table.
:param path:
If specified, the resulting SVG will be saved to this location. If
:code:`None` and running in IPython, then the SVG will be rendered
inline. Otherwise, the SVG data will be returned as a string.
:param width:
The width of the output SVG.
:param height:
The height of the output SVG.
"""
if type(x) is int:
x_name = self.column_names[x]
else:
x_name = x
if type(y) is int:
y_name = self.column_names[y]
else:
y_name = y
chart = leather.Chart()
chart.add_x_axis(name=x_name)
chart.add_y_axis(name=y_name)
chart.add_dots(self, x=x, y=y)
return chart.to_svg(path=path, width=width, height=height)
agate-1.6.3/agate/table/select.py 0000664 0000000 0000000 00000001261 14074061410 0016600 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# pylint: disable=W0212
from agate import utils
from agate.rows import Row
def select(self, key):
"""
Create a new table with only the specified columns.
:param key:
Either the name of a single column to include or a sequence of such
names.
:returns:
A new :class:`.Table`.
"""
if not utils.issequence(key):
key = [key]
indexes = tuple(self._column_names.index(k) for k in key)
column_types = tuple(self._column_types[i] for i in indexes)
new_rows = []
for row in self._rows:
new_rows.append(Row((row[i] for i in indexes), key))
return self._fork(new_rows, key, column_types)
agate-1.6.3/agate/table/to_csv.py 0000664 0000000 0000000 00000002125 14074061410 0016616 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# pylint: disable=W0212
import os
def to_csv(self, path, **kwargs):
"""
Write this table to a CSV. This method uses agate's builtin CSV writer,
which supports unicode on both Python 2 and Python 3.
`kwargs` will be passed through to the CSV writer.
:param path:
Filepath or file-like object to write to.
"""
from agate import csv
if 'lineterminator' not in kwargs:
kwargs['lineterminator'] = '\n'
close = True
f = None
try:
if hasattr(path, 'write'):
f = path
close = False
else:
dirpath = os.path.dirname(path)
if dirpath and not os.path.exists(dirpath):
os.makedirs(dirpath)
f = open(path, 'w')
writer = csv.writer(f, **kwargs)
writer.writerow(self._column_names)
csv_funcs = [c.csvify for c in self._column_types]
for row in self._rows:
writer.writerow(tuple(csv_funcs[i](d) for i, d in enumerate(row)))
finally:
if close and f is not None:
f.close()
agate-1.6.3/agate/table/to_json.py 0000664 0000000 0000000 00000006037 14074061410 0017002 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# pylint: disable=W0212
import codecs
import json
import os
from collections import OrderedDict
import six
def to_json(self, path, key=None, newline=False, indent=None, **kwargs):
"""
Write this table to a JSON file or file-like object.
:code:`kwargs` will be passed through to the JSON encoder.
:param path:
File path or file-like object to write to.
:param key:
If specified, JSON will be output as an hash instead of a list. May
be either the name of a column from the this table containing
unique values or a :class:`function` that takes a row and returns
a unique value.
:param newline:
If `True`, output will be in the form of "newline-delimited JSON".
:param indent:
If specified, the number of spaces to indent the JSON for
formatting.
"""
if key is not None and newline:
raise ValueError('key and newline may not be specified together.')
if newline and indent is not None:
raise ValueError('newline and indent may not be specified together.')
key_is_row_function = hasattr(key, '__call__')
json_kwargs = {
'ensure_ascii': False,
'indent': indent
}
if six.PY2:
json_kwargs['encoding'] = 'utf-8'
# Pass remaining kwargs through to JSON encoder
json_kwargs.update(kwargs)
json_funcs = [c.jsonify for c in self._column_types]
close = True
f = None
try:
if hasattr(path, 'write'):
f = path
close = False
else:
if os.path.dirname(path) and not os.path.exists(os.path.dirname(path)):
os.makedirs(os.path.dirname(path))
f = open(path, 'w')
if six.PY2:
f = codecs.getwriter('utf-8')(f)
def dump_json(data):
json.dump(data, f, **json_kwargs)
if newline:
f.write('\n')
# Keyed
if key is not None:
output = OrderedDict()
for row in self._rows:
if key_is_row_function:
k = key(row)
else:
k = str(row[key]) if six.PY3 else unicode(row[key]) # noqa: F821
if k in output:
raise ValueError('Value %s is not unique in the key column.' % six.text_type(k))
values = tuple(json_funcs[i](d) for i, d in enumerate(row))
output[k] = OrderedDict(zip(row.keys(), values))
dump_json(output)
# Newline-delimited
elif newline:
for row in self._rows:
values = tuple(json_funcs[i](d) for i, d in enumerate(row))
dump_json(OrderedDict(zip(row.keys(), values)))
# Normal
else:
output = []
for row in self._rows:
values = tuple(json_funcs[i](d) for i, d in enumerate(row))
output.append(OrderedDict(zip(row.keys(), values)))
dump_json(output)
finally:
if close and f is not None:
f.close()
agate-1.6.3/agate/table/where.py 0000664 0000000 0000000 00000001344 14074061410 0016435 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# pylint: disable=W0212
def where(self, test):
"""
Create a new :class:`.Table` with only those rows that pass a test.
:param test:
A function that takes a :class:`.Row` and returns :code:`True` if
it should be included in the new :class:`.Table`.
:type test:
:class:`function`
:returns:
A new :class:`.Table`.
"""
rows = []
if self._row_names is not None:
row_names = []
else:
row_names = None
for i, row in enumerate(self._rows):
if test(row):
rows.append(row)
if row_names is not None:
row_names.append(self._row_names[i])
return self._fork(rows, row_names=row_names)
agate-1.6.3/agate/tableset/ 0000775 0000000 0000000 00000000000 14074061410 0015463 5 ustar 00root root 0000000 0000000 agate-1.6.3/agate/tableset/__init__.py 0000664 0000000 0000000 00000015661 14074061410 0017605 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
"""
The :class:`.TableSet` class collects a set of related tables in a single data
structure. The most common way of creating a :class:`.TableSet` is using the
:meth:`.Table.group_by` method, which is similar to SQL's ``GROUP BY`` keyword.
The resulting set of tables will all have identical columns structure.
:class:`.TableSet` functions as a dictionary. Individual tables in the set can
be accessed by using their name as a key. If the table set was created using
:meth:`.Table.group_by` then the names of the tables will be the grouping
factors found in the original data.
:class:`.TableSet` replicates the majority of the features of :class:`.Table`.
When methods such as :meth:`.TableSet.select`, :meth:`.TableSet.where` or
:meth:`.TableSet.order_by` are used, the operation is applied to *each* table
in the set and the result is a new :class:`TableSet` instance made up of
entirely new :class:`.Table` instances.
:class:`.TableSet` instances can also contain other TableSet's. This means you
can chain calls to :meth:`.Table.group_by` and :meth:`.TableSet.group_by`
and end up with data grouped across multiple dimensions.
:meth:`.TableSet.aggregate` on nested TableSets will then group across multiple
dimensions.
"""
import six
from six.moves import zip_longest
from agate.data_types import Text
from agate.mapped_sequence import MappedSequence
class TableSet(MappedSequence):
"""
An group of named tables with identical column definitions. Supports
(almost) all the same operations as :class:`.Table`. When executed on a
:class:`TableSet`, any operation that would have returned a new
:class:`.Table` instead returns a new :class:`TableSet`. Any operation
that would have returned a single value instead returns a dictionary of
values.
TableSet is implemented as a subclass of :class:`.MappedSequence`
:param tables:
A sequence :class:`Table` instances.
:param keys:
A sequence of keys corresponding to the tables. These may be any type
except :class:`int`.
:param key_name:
A name that describes the grouping properties. Used as the column
header when the groups are aggregated. Defaults to the column name that
was grouped on.
:param key_type:
An instance some subclass of :class:`.DataType`. If not provided it
will default to a :class`.Text`.
:param _is_fork:
Used internally to skip certain validation steps when data
is propagated from an existing tablset.
"""
def __init__(self, tables, keys, key_name='group', key_type=None, _is_fork=False):
tables = tuple(tables)
keys = tuple(keys)
self._key_name = key_name
self._key_type = key_type or Text()
self._sample_table = tables[0]
while isinstance(self._sample_table, TableSet):
self._sample_table = self._sample_table[0]
self._column_types = self._sample_table.column_types
self._column_names = self._sample_table.column_names
if not _is_fork:
for table in tables:
if any(not isinstance(a, type(b)) for a, b in zip_longest(table.column_types, self._column_types)):
raise ValueError('Not all tables have the same column types!')
if table.column_names != self._column_names:
raise ValueError('Not all tables have the same column names!')
MappedSequence.__init__(self, tables, keys)
def __str__(self):
"""
Print the tableset's structure via :meth:`TableSet.print_structure`.
"""
structure = six.StringIO()
self.print_structure(output=structure)
return structure.getvalue()
@property
def key_name(self):
"""
Get the name of the key this TableSet is grouped by. (If created using
:meth:`.Table.group_by` then this is the original column name.)
"""
return self._key_name
@property
def key_type(self):
"""
Get the :class:`.DataType` this TableSet is grouped by. (If created
using :meth:`.Table.group_by` then this is the original column type.)
"""
return self._key_type
@property
def column_types(self):
"""
Get an ordered list of this :class:`.TableSet`'s column types.
:returns:
A :class:`tuple` of :class:`.DataType` instances.
"""
return self._column_types
@property
def column_names(self):
"""
Get an ordered list of this :class:`TableSet`'s column names.
:returns:
A :class:`tuple` of strings.
"""
return self._column_names
def _fork(self, tables, keys, key_name=None, key_type=None):
"""
Create a new :class:`.TableSet` using the metadata from this one.
This method is used internally by functions like
:meth:`.TableSet.having`.
"""
if key_name is None:
key_name = self._key_name
if key_type is None:
key_type = self._key_type
return TableSet(tables, keys, key_name, key_type, _is_fork=True)
def _proxy(self, method_name, *args, **kwargs):
"""
Calls a method on each table in this :class:`.TableSet`.
"""
tables = []
for key, table in self.items():
tables.append(getattr(table, method_name)(*args, **kwargs))
return self._fork(
tables,
self.keys()
)
from agate.tableset.aggregate import aggregate
from agate.tableset.bar_chart import bar_chart
from agate.tableset.column_chart import column_chart
from agate.tableset.from_csv import from_csv
from agate.tableset.from_json import from_json
from agate.tableset.having import having
from agate.tableset.line_chart import line_chart
from agate.tableset.merge import merge
from agate.tableset.print_structure import print_structure
from agate.tableset.proxy_methods import (bins, compute, denormalize, distinct, exclude, find, group_by, homogenize,
join, limit, normalize, order_by, pivot, select, where)
from agate.tableset.scatterplot import scatterplot
from agate.tableset.to_csv import to_csv
from agate.tableset.to_json import to_json
TableSet.aggregate = aggregate
TableSet.bar_chart = bar_chart
TableSet.bins = bins
TableSet.column_chart = column_chart
TableSet.compute = compute
TableSet.denormalize = denormalize
TableSet.distinct = distinct
TableSet.exclude = exclude
TableSet.find = find
TableSet.from_csv = from_csv
TableSet.from_json = from_json
TableSet.group_by = group_by
TableSet.having = having
TableSet.homogenize = homogenize
TableSet.join = join
TableSet.limit = limit
TableSet.line_chart = line_chart
TableSet.merge = merge
TableSet.normalize = normalize
TableSet.order_by = order_by
TableSet.pivot = pivot
TableSet.print_structure = print_structure
TableSet.scatterplot = scatterplot
TableSet.select = select
TableSet.to_csv = to_csv
TableSet.to_json = to_json
TableSet.where = where
agate-1.6.3/agate/tableset/aggregate.py 0000664 0000000 0000000 00000005177 14074061410 0017775 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# pylint: disable=W0212
from agate.table import Table
def _aggregate(self, aggregations=[]):
"""
Recursive aggregation allowing for TableSet's to be nested inside
one another.
"""
from agate.tableset import TableSet
output = []
# Process nested TableSet's
if isinstance(self._values[0], TableSet):
for key, nested_tableset in self.items():
column_names, column_types, nested_output, row_name_columns = _aggregate(nested_tableset, aggregations)
for row in nested_output:
row.insert(0, key)
output.append(row)
column_names.insert(0, self._key_name)
column_types.insert(0, self._key_type)
row_name_columns.insert(0, self._key_name)
# Regular Tables
else:
column_names = [self._key_name]
column_types = [self._key_type]
row_name_columns = [self._key_name]
for new_column_name, aggregation in aggregations:
column_names.append(new_column_name)
column_types.append(aggregation.get_aggregate_data_type(self._sample_table))
for name, table in self.items():
for new_column_name, aggregation in aggregations:
aggregation.validate(table)
for name, table in self.items():
new_row = [name]
for new_column_name, aggregation in aggregations:
new_row.append(aggregation.run(table))
output.append(new_row)
return column_names, column_types, output, row_name_columns
def aggregate(self, aggregations):
"""
Aggregate data from the tables in this set by performing some
set of column operations on the groups and coalescing the results into
a new :class:`.Table`.
:code:`aggregations` must be a sequence of tuples, where each has two
parts: a :code:`new_column_name` and a :class:`.Aggregation` instance.
The resulting table will have the keys from this :class:`TableSet` (and
any nested TableSets) set as its :code:`row_names`. See
:meth:`.Table.__init__` for more details.
:param aggregations:
A list of tuples in the format :code:`(new_column_name, aggregation)`,
where each :code:`aggregation` is an instance of :class:`.Aggregation`.
:returns:
A new :class:`.Table`.
"""
column_names, column_types, output, row_name_columns = _aggregate(self, aggregations)
if len(row_name_columns) == 1:
row_names = row_name_columns[0]
else:
def row_names(r):
return tuple(r[n] for n in row_name_columns)
return Table(output, column_names, column_types, row_names=row_names)
agate-1.6.3/agate/tableset/bar_chart.py 0000664 0000000 0000000 00000002504 14074061410 0017763 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# pylint: disable=W0212
import leather
def bar_chart(self, label=0, value=1, path=None, width=None, height=None):
"""
Render a lattice/grid of bar charts using :class:`leather.Lattice`.
:param label:
The name or index of a column to plot as the labels of the chart.
Defaults to the first column in the table.
:param value:
The name or index of a column to plot as the values of the chart.
Defaults to the second column in the table.
:param path:
If specified, the resulting SVG will be saved to this location. If
:code:`None` and running in IPython, then the SVG will be rendered
inline. Otherwise, the SVG data will be returned as a string.
:param width:
The width of the output SVG.
:param height:
The height of the output SVG.
"""
if type(label) is int:
label_name = self.column_names[label]
else:
label_name = label
if type(value) is int:
value_name = self.column_names[value]
else:
value_name = value
chart = leather.Lattice(shape=leather.Bars())
chart.add_x_axis(name=value_name)
chart.add_y_axis(name=label_name)
chart.add_many(self.values(), x=value, y=label, titles=self.keys())
return chart.to_svg(path=path, width=width, height=height)
agate-1.6.3/agate/tableset/column_chart.py 0000664 0000000 0000000 00000002515 14074061410 0020516 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# pylint: disable=W0212
import leather
def column_chart(self, label=0, value=1, path=None, width=None, height=None):
"""
Render a lattice/grid of column charts using :class:`leather.Lattice`.
:param label:
The name or index of a column to plot as the labels of the chart.
Defaults to the first column in the table.
:param value:
The name or index of a column to plot as the values of the chart.
Defaults to the second column in the table.
:param path:
If specified, the resulting SVG will be saved to this location. If
:code:`None` and running in IPython, then the SVG will be rendered
inline. Otherwise, the SVG data will be returned as a string.
:param width:
The width of the output SVG.
:param height:
The height of the output SVG.
"""
if type(label) is int:
label_name = self.column_names[label]
else:
label_name = label
if type(value) is int:
value_name = self.column_names[value]
else:
value_name = value
chart = leather.Lattice(shape=leather.Columns())
chart.add_x_axis(name=label_name)
chart.add_y_axis(name=value_name)
chart.add_many(self.values(), x=label, y=value, titles=self.keys())
return chart.to_svg(path=path, width=width, height=height)
agate-1.6.3/agate/tableset/from_csv.py 0000664 0000000 0000000 00000002260 14074061410 0017653 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
import os
from collections import OrderedDict
from glob import glob
from agate.table import Table
@classmethod
def from_csv(cls, dir_path, column_names=None, column_types=None, row_names=None, header=True, **kwargs):
"""
Create a new :class:`TableSet` from a directory of CSVs.
See :meth:`.Table.from_csv` for additional details.
:param dir_path:
Path to a directory full of CSV files. All CSV files in this
directory will be loaded.
:param column_names:
See :meth:`Table.__init__`.
:param column_types:
See :meth:`Table.__init__`.
:param row_names:
See :meth:`Table.__init__`.
:param header:
See :meth:`Table.from_csv`.
"""
from agate.tableset import TableSet
if not os.path.isdir(dir_path):
raise IOError('Specified path doesn\'t exist or isn\'t a directory.')
tables = OrderedDict()
for path in glob(os.path.join(dir_path, '*.csv')):
name = os.path.split(path)[1].strip('.csv')
tables[name] = Table.from_csv(path, column_names, column_types, row_names=row_names, header=header, **kwargs)
return TableSet(tables.values(), tables.keys())
agate-1.6.3/agate/tableset/from_json.py 0000664 0000000 0000000 00000004245 14074061410 0020036 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
import json
import os
from collections import OrderedDict
from decimal import Decimal
from glob import glob
import six
from agate.table import Table
@classmethod
def from_json(cls, path, column_names=None, column_types=None, keys=None, **kwargs):
"""
Create a new :class:`TableSet` from a directory of JSON files or a
single JSON object with key value (Table key and list of row objects)
pairs for each :class:`Table`.
See :meth:`.Table.from_json` for additional details.
:param path:
Path to a directory containing JSON files or filepath/file-like
object of nested JSON file.
:param keys:
A list of keys of the top-level dictionaries for each file. If
specified, length must be equal to number of JSON files in path.
:param column_types:
See :meth:`Table.__init__`.
"""
from agate.tableset import TableSet
if isinstance(path, six.string_types) and not os.path.isdir(path) and not os.path.isfile(path):
raise IOError('Specified path doesn\'t exist.')
tables = OrderedDict()
if isinstance(path, six.string_types) and os.path.isdir(path):
filepaths = glob(os.path.join(path, '*.json'))
if keys is not None and len(keys) != len(filepaths):
raise ValueError('If specified, keys must have length equal to number of JSON files')
for i, filepath in enumerate(filepaths):
name = os.path.split(filepath)[1].strip('.json')
if keys is not None:
tables[name] = Table.from_json(filepath, keys[i], column_types=column_types, **kwargs)
else:
tables[name] = Table.from_json(filepath, column_types=column_types, **kwargs)
else:
if hasattr(path, 'read'):
js = json.load(path, object_pairs_hook=OrderedDict, parse_float=Decimal, **kwargs)
else:
with open(path, 'r') as f:
js = json.load(f, object_pairs_hook=OrderedDict, parse_float=Decimal, **kwargs)
for key, value in js.items():
tables[key] = Table.from_object(value, column_types=column_types, **kwargs)
return TableSet(tables.values(), tables.keys())
agate-1.6.3/agate/tableset/having.py 0000664 0000000 0000000 00000002145 14074061410 0017313 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# pylint: disable=W0212
def having(self, aggregations, test):
"""
Create a new :class:`.TableSet` with only those tables that pass a test.
This works by applying a sequence of :class:`Aggregation` instances to
each table. The resulting dictionary of properties is then passed to
the :code:`test` function.
This method does not modify the underlying tables in any way.
:param aggregations:
A list of tuples in the format :code:`(name, aggregation)`, where
each :code:`aggregation` is an instance of :class:`.Aggregation`.
:param test:
A function that takes a dictionary of aggregated properties and returns
:code:`True` if it should be included in the new :class:`.TableSet`.
:type test:
:class:`function`
:returns:
A new :class:`.TableSet`.
"""
new_tables = []
new_keys = []
for key, table in self.items():
props = table.aggregate(aggregations)
if test(props):
new_tables.append(table)
new_keys.append(key)
return self._fork(new_tables, new_keys)
agate-1.6.3/agate/tableset/line_chart.py 0000664 0000000 0000000 00000002376 14074061410 0020155 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# pylint: disable=W0212
import leather
def line_chart(self, x=0, y=1, path=None, width=None, height=None):
"""
Render a lattice/grid of line charts using :class:`leather.Lattice`.
:param x:
The name or index of a column to plot as the x axis of the chart.
Defaults to the first column in the table.
:param y:
The name or index of a column to plot as the y axis of the chart.
Defaults to the second column in the table.
:param path:
If specified, the resulting SVG will be saved to this location. If
:code:`None` and running in IPython, then the SVG will be rendered
inline. Otherwise, the SVG data will be returned as a string.
:param width:
The width of the output SVG.
:param height:
The height of the output SVG.
"""
if type(x) is int:
x_name = self.column_names[x]
else:
x_name = x
if type(y) is int:
y_name = self.column_names[y]
else:
y_name = y
chart = leather.Lattice(shape=leather.Line())
chart.add_x_axis(name=x_name)
chart.add_y_axis(name=y_name)
chart.add_many(self.values(), x=x, y=y, titles=self.keys())
return chart.to_svg(path=path, width=width, height=height)
agate-1.6.3/agate/tableset/merge.py 0000664 0000000 0000000 00000003471 14074061410 0017141 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# pylint: disable=W0212
from agate.rows import Row
from agate.table import Table
def merge(self, groups=None, group_name=None, group_type=None):
"""
Convert this TableSet into a single table. This is the inverse of
:meth:`.Table.group_by`.
Any `row_names` set on the merged tables will be lost in this
process.
:param groups:
A list of grouping factors to add to merged rows in a new column.
If specified, it should have exactly one element per :class:`Table`
in the :class:`TableSet`. If not specified or None, the grouping
factor will be the name of the :class:`Row`'s original Table.
:param group_name:
This will be the column name of the grouping factors. If None,
defaults to the :attr:`TableSet.key_name`.
:param group_type:
This will be the column type of the grouping factors. If None,
defaults to the :attr:`TableSet.key_type`.
:returns:
A new :class:`Table`.
"""
if type(groups) is not list and groups is not None:
raise ValueError('Groups must be None or a list.')
if type(groups) is list and len(groups) != len(self):
raise ValueError('Groups length must be equal to TableSet length.')
column_names = list(self._column_names)
column_types = list(self._column_types)
column_names.insert(0, group_name if group_name else self._key_name)
column_types.insert(0, group_type if group_type else self._key_type)
rows = []
for index, (key, table) in enumerate(self.items()):
for row in table._rows:
if groups is None:
rows.append(Row((key,) + tuple(row), column_names))
else:
rows.append(Row((groups[index],) + tuple(row), column_names))
return Table(rows, column_names, column_types)
agate-1.6.3/agate/tableset/print_structure.py 0000664 0000000 0000000 00000001613 14074061410 0021312 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
import sys
from agate.data_types import Text
from agate.table import Table
def print_structure(self, max_rows=20, output=sys.stdout):
"""
Print the keys and row counts of each table in the tableset.
:param max_rows:
The maximum number of rows to display before truncating the data.
Defaults to 20.
:param output:
The output used to print the structure of the :class:`Table`.
:returns:
None
"""
max_length = min(len(self.items()), max_rows)
name_column = self.keys()[0:max_length]
type_column = [str(len(table.rows)) for key, table in self.items()[0:max_length]]
rows = zip(name_column, type_column)
column_names = ['table', 'rows']
text = Text()
column_types = [text, text]
table = Table(rows, column_names, column_types)
return table.print_table(output=output, max_column_width=None)
agate-1.6.3/agate/tableset/proxy_methods.py 0000664 0000000 0000000 00000004761 14074061410 0020751 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
def bins(self, *args, **kwargs):
"""
Calls :meth:`.Table.bins` on each table in the TableSet.
"""
return self._proxy('bins', *args, **kwargs)
def compute(self, *args, **kwargs):
"""
Calls :meth:`.Table.compute` on each table in the TableSet.
"""
return self._proxy('compute', *args, **kwargs)
def denormalize(self, *args, **kwargs):
"""
Calls :meth:`.Table.denormalize` on each table in the TableSet.
"""
return self._proxy('denormalize', *args, **kwargs)
def distinct(self, *args, **kwargs):
"""
Calls :meth:`.Table.distinct` on each table in the TableSet.
"""
return self._proxy('distinct', *args, **kwargs)
def exclude(self, *args, **kwargs):
"""
Calls :meth:`.Table.exclude` on each table in the TableSet.
"""
return self._proxy('exclude', *args, **kwargs)
def find(self, *args, **kwargs):
"""
Calls :meth:`.Table.find` on each table in the TableSet.
"""
return self._proxy('find', *args, **kwargs)
def group_by(self, *args, **kwargs):
"""
Calls :meth:`.Table.group_by` on each table in the TableSet.
"""
return self._proxy('group_by', *args, **kwargs)
def homogenize(self, *args, **kwargs):
"""
Calls :meth:`.Table.homogenize` on each table in the TableSet.
"""
return self._proxy('homogenize', *args, **kwargs)
def join(self, *args, **kwargs):
"""
Calls :meth:`.Table.join` on each table in the TableSet.
"""
return self._proxy('join', *args, **kwargs)
def limit(self, *args, **kwargs):
"""
Calls :meth:`.Table.limit` on each table in the TableSet.
"""
return self._proxy('limit', *args, **kwargs)
def normalize(self, *args, **kwargs):
"""
Calls :meth:`.Table.normalize` on each table in the TableSet.
"""
return self._proxy('normalize', *args, **kwargs)
def order_by(self, *args, **kwargs):
"""
Calls :meth:`.Table.order_by` on each table in the TableSet.
"""
return self._proxy('order_by', *args, **kwargs)
def pivot(self, *args, **kwargs):
"""
Calls :meth:`.Table.pivot` on each table in the TableSet.
"""
return self._proxy('pivot', *args, **kwargs)
def select(self, *args, **kwargs):
"""
Calls :meth:`.Table.select` on each table in the TableSet.
"""
return self._proxy('select', *args, **kwargs)
def where(self, *args, **kwargs):
"""
Calls :meth:`.Table.where` on each table in the TableSet.
"""
return self._proxy('where', *args, **kwargs)
agate-1.6.3/agate/tableset/scatterplot.py 0000664 0000000 0000000 00000002400 14074061410 0020375 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# pylint: disable=W0212
import leather
def scatterplot(self, x=0, y=1, path=None, width=None, height=None):
"""
Render a lattice/grid of scatterplots using :class:`leather.Lattice`.
:param x:
The name or index of a column to plot as the x axis of the chart.
Defaults to the first column in the table.
:param y:
The name or index of a column to plot as the y axis of the chart.
Defaults to the second column in the table.
:param path:
If specified, the resulting SVG will be saved to this location. If
:code:`None` and running in IPython, then the SVG will be rendered
inline. Otherwise, the SVG data will be returned as a string.
:param width:
The width of the output SVG.
:param height:
The height of the output SVG.
"""
if type(x) is int:
x_name = self.column_names[x]
else:
x_name = x
if type(y) is int:
y_name = self.column_names[y]
else:
y_name = y
chart = leather.Lattice(shape=leather.Dots())
chart.add_x_axis(name=x_name)
chart.add_y_axis(name=y_name)
chart.add_many(self.values(), x=x, y=y, titles=self.keys())
return chart.to_svg(path=path, width=width, height=height)
agate-1.6.3/agate/tableset/to_csv.py 0000664 0000000 0000000 00000000762 14074061410 0017337 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
import os
def to_csv(self, dir_path, **kwargs):
"""
Write each table in this set to a separate CSV in a given
directory.
See :meth:`.Table.to_csv` for additional details.
:param dir_path:
Path to the directory to write the CSV files to.
"""
if not os.path.exists(dir_path):
os.makedirs(dir_path)
for name, table in self.items():
path = os.path.join(dir_path, '%s.csv' % name)
table.to_csv(path, **kwargs)
agate-1.6.3/agate/tableset/to_json.py 0000664 0000000 0000000 00000003525 14074061410 0017515 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
import json
import os
from collections import OrderedDict
import six
def to_json(self, path, nested=False, indent=None, **kwargs):
"""
Write :class:`TableSet` to either a set of JSON files for each table or
a single nested JSON file.
See :meth:`.Table.to_json` for additional details.
:param path:
Path to the directory to write the JSON file(s) to. If nested is
`True`, this should be a file path or file-like object to write to.
:param nested:
If `True`, the output will be a single nested JSON file with each
Table's key paired with a list of row objects. Otherwise, the output
will be a set of files for each table. Defaults to `False`.
:param indent:
See :meth:`Table.to_json`.
"""
if not nested:
if not os.path.exists(path):
os.makedirs(path)
for name, table in self.items():
filepath = os.path.join(path, '%s.json' % name)
table.to_json(filepath, indent=indent, **kwargs)
else:
close = True
tableset_dict = OrderedDict()
for name, table in self.items():
output = six.StringIO()
table.to_json(output, **kwargs)
tableset_dict[name] = json.loads(output.getvalue(), object_pairs_hook=OrderedDict)
if hasattr(path, 'write'):
f = path
close = False
else:
dirpath = os.path.dirname(path)
if dirpath and not os.path.exists(dirpath):
os.makedirs(dirpath)
f = open(path, 'w')
json_kwargs = {'ensure_ascii': False, 'indent': indent}
if six.PY2:
json_kwargs['encoding'] = 'utf-8'
json_kwargs.update(kwargs)
json.dump(tableset_dict, f, **json_kwargs)
if close and f is not None:
f.close()
agate-1.6.3/agate/testcase.py 0000664 0000000 0000000 00000003662 14074061410 0016054 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
try:
import unittest2 as unittest
except ImportError:
import unittest
import agate
class AgateTestCase(unittest.TestCase):
"""
Unittest case for quickly asserting logic about tables.
"""
def assertColumnNames(self, table, names):
"""
Verify the column names in the given table match what is expected.
"""
self.assertIsInstance(table, agate.Table)
self.assertSequenceEqual(table.column_names, names)
self.assertSequenceEqual(
[c.name for c in table.columns],
names
)
for row in table.rows:
self.assertSequenceEqual(
row.keys(),
names
)
def assertColumnTypes(self, table, types):
"""
Verify the column types in the given table are of the expected types.
"""
self.assertIsInstance(table, agate.Table)
table_types = table.column_types
column_types = [c.data_type for c in table.columns]
for i, test_type in enumerate(types):
self.assertIsInstance(table_types[i], test_type)
self.assertIsInstance(column_types[i], test_type)
def assertRows(self, table, rows):
"""
Verify the row data in the given table match what is expected.
"""
self.assertIsInstance(table, agate.Table)
for i, row in enumerate(rows):
self.assertSequenceEqual(table.rows[i], row)
def assertRowNames(self, table, names):
"""
Verify the row names in the given table match what is expected.
"""
self.assertIsInstance(table, agate.Table)
self.assertSequenceEqual(table.row_names, names)
self.assertSequenceEqual(
table.rows.keys(),
names
)
for column in table.columns:
self.assertSequenceEqual(
column.keys(),
names
)
agate-1.6.3/agate/type_tester.py 0000664 0000000 0000000 00000011324 14074061410 0016602 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
import warnings
from copy import copy
from agate.data_types.base import DEFAULT_NULL_VALUES
from agate.data_types.boolean import Boolean
from agate.data_types.date import Date
from agate.data_types.date_time import DateTime
from agate.data_types.number import Number
from agate.data_types.text import Text
from agate.data_types.time_delta import TimeDelta
class TypeTester(object):
"""
Control how data types are inferred for columns in a given set of data.
This class is used by passing it to the :code:`column_types` argument of
the :class:`.Table` constructor, or the same argument for any other method
that create a :class:`.Table`
Type inference can be a slow process. To limit the number of rows of data to
be tested, pass the :code:`limit` argument. Note that may cause errors if
your data contains different types of values after the specified number of
rows.
By default, data types will be tested against each column in this order:
1. :class:`.Boolean`
2. :class:`.Number`
3. :class:`.TimeDelta`
#. :class:`.Date`
#. :class:`.DateTime`
#. :class:`.Text`
Individual types may be specified using the :code:`force` argument. The type
order by be changed, or entire types disabled, by using the :code:`types`
argument. Beware that changing the order of the types may cause unexpected
behavior.
:param force:
A dictionary where each key is a column name and each value is a
:class:`.DataType` instance that overrides inference.
:param limit:
An optional limit on how many rows to evaluate before selecting the
most likely type. Note that applying a limit may mean errors arise when
the data is cast--if the guess is proved incorrect in further rows of
data.
:param types:
A sequence of possible types to test against. This be used to specify
what data formats you want to test against. For instance, you may want
to exclude :class:`TimeDelta` from testing. It can also be used to pass
options such as ``locale`` to :class:`.Number` or ``cast_nulls`` to
:class:`.Text`. Take care in specifying the order of the list. It is
the order they are tested in. :class:`.Text` should always be last.
:param null_values:
If :code:`types` is :code:`None`, a sequence of values which should be
cast to :code:`None` when encountered by the default data types.
"""
def __init__(self, force={}, limit=None, types=None, null_values=DEFAULT_NULL_VALUES):
self._force = force
self._limit = limit
if types:
self._possible_types = types
else:
# In order of preference
self._possible_types = [
Boolean(null_values=null_values),
Number(null_values=null_values),
TimeDelta(null_values=null_values),
Date(null_values=null_values),
DateTime(null_values=null_values),
Text(null_values=null_values)
]
def run(self, rows, column_names):
"""
Apply type inference to the provided data and return an array of
column types.
:param rows:
The data as a sequence of any sequences: tuples, lists, etc.
"""
num_columns = len(column_names)
hypotheses = [set(self._possible_types) for i in range(num_columns)]
force_indices = []
for name in self._force.keys():
try:
force_indices.append(column_names.index(name))
except ValueError:
warnings.warn('"%s" does not match the name of any column in this table.' % name, RuntimeWarning)
if self._limit:
sample_rows = rows[:self._limit]
elif self._limit == 0:
text = Text()
return tuple([text] * num_columns)
else:
sample_rows = rows
for row in sample_rows:
for i in range(num_columns):
if i in force_indices:
continue
h = hypotheses[i]
if len(h) == 1:
continue
for column_type in copy(h):
if len(row) > i and not column_type.test(row[i]):
h.remove(column_type)
column_types = []
for i in range(num_columns):
if i in force_indices:
column_types.append(self._force[column_names[i]])
continue
h = hypotheses[i]
# Select in prefer order
for t in self._possible_types:
if t in h:
column_types.append(t)
break
return tuple(column_types)
agate-1.6.3/agate/utils.py 0000664 0000000 0000000 00000021440 14074061410 0015373 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# -*- coding: utf8 -*-
"""
This module contains a collection of utility classes and functions used in
agate.
"""
from collections import OrderedDict
try:
from collections.abc import Sequence
except ImportError:
from collections import Sequence
import math
import string
from functools import wraps
from slugify import slugify as pslugify
from agate.warns import warn_duplicate_column, warn_unnamed_column
try:
from cdecimal import ROUND_CEILING, ROUND_FLOOR, Decimal, getcontext
except ImportError: # pragma: no cover
from decimal import Decimal, ROUND_FLOOR, ROUND_CEILING, getcontext
import six
#: Sentinal for use when `None` is an valid argument value
default = object()
def memoize(func):
"""
Dead-simple memoize decorator for instance methods that take no arguments.
This is especially useful since so many of our classes are immutable.
"""
memo = None
@wraps(func)
def wrapper(self):
if memo is not None:
return memo
return func(self)
return wrapper
class NullOrder(object):
"""
Dummy object used for sorting in place of None.
Sorts as "greater than everything but other nulls."
"""
def __lt__(self, other):
return False
def __gt__(self, other):
if other is None:
return False
return True
class Quantiles(Sequence):
"""
A class representing quantiles (percentiles, quartiles, etc.) for a given
column of Number data.
"""
def __init__(self, quantiles):
self._quantiles = quantiles
def __getitem__(self, i):
return self._quantiles.__getitem__(i)
def __iter__(self):
return self._quantiles.__iter__()
def __len__(self):
return self._quantiles.__len__()
def __repr__(self):
return repr(self._quantiles)
def __eq__(self, other):
return self._quantiles == other._quantiles
def locate(self, value):
"""
Identify which quantile a given value is part of.
"""
i = 0
if value < self._quantiles[0]:
raise ValueError('Value is less than minimum quantile value.')
if value > self._quantiles[-1]:
raise ValueError('Value is greater than maximum quantile value.')
if value == self._quantiles[-1]:
return Decimal(len(self._quantiles) - 1)
while value >= self._quantiles[i + 1]:
i += 1
return Decimal(i)
def median(data_sorted):
"""
Finds the median value of a given series of values.
:param data_sorted:
The values to find the median of. Must be sorted.
"""
length = len(data_sorted)
if length % 2 == 1:
return data_sorted[((length + 1) // 2) - 1]
half = length // 2
a = data_sorted[half - 1]
b = data_sorted[half]
return (a + b) / 2
def max_precision(values):
"""
Given a series of values (such as a :class:`.Column`) returns the most
significant decimal places present in any value.
:param values:
The values to analyze.
"""
max_whole_places = 1
max_decimal_places = 0
precision = getcontext().prec
for value in values:
if value is None or math.isnan(value) or math.isinf(value):
continue
sign, digits, exponent = value.normalize().as_tuple()
exponent_places = exponent * -1
whole_places = len(digits) - exponent_places
if whole_places > max_whole_places:
max_whole_places = whole_places
if exponent_places > max_decimal_places:
max_decimal_places = exponent_places
# In Python 2 it was possible for the total digits to exceed the
# available context precision. This ensures that can't happen. See #412
if max_whole_places + max_decimal_places > precision: # pragma: no cover
max_decimal_places = precision - max_whole_places
return max_decimal_places
def make_number_formatter(decimal_places, add_ellipsis=False):
"""
Given a number of decimal places creates a formatting string that will
display numbers with that precision.
:param decimal_places:
The number of decimal places
:param add_ellipsis:
Optionally add an ellipsis symbol at the end of a number
"""
fraction = u'0' * decimal_places
ellipsis = u'…' if add_ellipsis else u''
return u''.join([u'#,##0.', fraction, ellipsis, u';-#,##0.', fraction, ellipsis])
def round_limits(minimum, maximum):
"""
Rounds a pair of minimum and maximum values to form reasonable "round"
values suitable for use as axis minimum and maximum values.
Values are rounded "out": up for maximum and down for minimum, and "off":
to one higher than the first significant digit shared by both.
See unit tests for examples.
"""
min_bits = minimum.normalize().as_tuple()
max_bits = maximum.normalize().as_tuple()
max_digits = max(
len(min_bits.digits) + min_bits.exponent,
len(max_bits.digits) + max_bits.exponent
)
# Whole number rounding
if max_digits > 0:
multiplier = Decimal('10') ** (max_digits - 1)
min_fraction = (minimum / multiplier).to_integral_value(rounding=ROUND_FLOOR)
max_fraction = (maximum / multiplier).to_integral_value(rounding=ROUND_CEILING)
return (
min_fraction * multiplier,
max_fraction * multiplier
)
max_exponent = max(min_bits.exponent, max_bits.exponent)
# Fractional rounding
q = Decimal('10') ** (max_exponent + 1)
return (
minimum.quantize(q, rounding=ROUND_FLOOR).normalize(),
maximum.quantize(q, rounding=ROUND_CEILING).normalize()
)
def letter_name(index):
"""
Given a column index, assign a "letter" column name equivalent to
Excel. For example, index ``4`` would return ``E``.
Index ``30`` would return ``EE``.
"""
letters = string.ascii_lowercase
count = len(letters)
return letters[index % count] * ((index // count) + 1)
def parse_object(obj, path=''):
"""
Recursively parse JSON-like Python objects as a dictionary of paths/keys
and values.
Inspired by JSONPipe (https://github.com/dvxhouse/jsonpipe).
"""
if isinstance(obj, dict):
iterator = obj.items()
elif isinstance(obj, (list, tuple)):
iterator = enumerate(obj)
else:
return {path.strip('/'): obj}
d = OrderedDict()
for key, value in iterator:
key = six.text_type(key)
d.update(parse_object(value, path + key + '/'))
return d
def issequence(obj):
"""
Returns :code:`True` if the given object is an instance of
:class:`.Sequence` that is not also a string.
"""
return isinstance(obj, Sequence) and not isinstance(obj, six.string_types)
def deduplicate(values, column_names=False, separator='_'):
"""
Append a unique identifer to duplicate strings in a given sequence of
strings. Identifers are an underscore followed by the occurance number of
the specific string.
['abc', 'abc', 'cde', 'abc'] -> ['abc', 'abc_2', 'cde', 'abc_3']
:param column_names:
If True, values are treated as column names. Warnings will be thrown
if column names are None or duplicates. None values will be replaced with
letter indices.
"""
final_values = []
for i, value in enumerate(values):
if column_names:
if not value:
new_value = letter_name(i)
warn_unnamed_column(i, new_value)
elif isinstance(value, six.string_types):
new_value = value
else:
raise ValueError('Column names must be strings or None.')
else:
new_value = value
final_value = new_value
duplicates = 0
while final_value in final_values:
final_value = new_value + separator + str(duplicates + 2)
duplicates += 1
if column_names and duplicates > 0:
warn_duplicate_column(new_value, final_value)
final_values.append(final_value)
return tuple(final_values)
def slugify(values, ensure_unique=False, **kwargs):
"""
Given a sequence of strings, returns a standardized version of the sequence.
If ``ensure_unique`` is True, any duplicate strings will be appended with
a unique identifier.
agate uses an underscore as a default separator but this can be changed with
kwargs.
Any kwargs will be passed to the slugify method in python-slugify. See:
https://github.com/un33k/python-slugify
"""
slug_args = {'separator': '_'}
slug_args.update(kwargs)
if ensure_unique:
new_values = tuple(pslugify(value, **slug_args) for value in values)
return deduplicate(new_values, separator=slug_args['separator'])
else:
return tuple(pslugify(value, **slug_args) for value in values)
agate-1.6.3/agate/warns.py 0000664 0000000 0000000 00000002504 14074061410 0015365 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
import warnings
class NullCalculationWarning(RuntimeWarning): # pragma: no cover
"""
Warning raised if a calculation which can not logically
account for null values is performed on a :class:`.Column` containing
nulls.
"""
pass
def warn_null_calculation(operation, column):
warnings.warn('Column "%s" contains nulls. These will be excluded from %s calculation.' % (
column.name,
operation.__class__.__name__
), NullCalculationWarning, stacklevel=2)
class DuplicateColumnWarning(RuntimeWarning): # pragma: no cover
"""
Warning raised if multiple columns with the same name are added to a new
:class:`.Table`.
"""
pass
def warn_duplicate_column(column_name, column_rename):
warnings.warn('Column name "%s" already exists in Table. Column will be renamed to "%s".' % (
column_name,
column_rename
), DuplicateColumnWarning, stacklevel=2)
class UnnamedColumnWarning(RuntimeWarning): # pragma: no cover
"""
Warning raised when a column has no name and an a programmatically generated
name is used.
"""
pass
def warn_unnamed_column(column_id, new_column_name):
warnings.warn('Column %i has no name. Using "%s".' % (
column_id,
new_column_name
), UnnamedColumnWarning, stacklevel=2)
agate-1.6.3/benchmarks/ 0000775 0000000 0000000 00000000000 14074061410 0014714 5 ustar 00root root 0000000 0000000 agate-1.6.3/benchmarks/__init__.py 0000664 0000000 0000000 00000000000 14074061410 0017013 0 ustar 00root root 0000000 0000000 agate-1.6.3/benchmarks/test_joins.py 0000664 0000000 0000000 00000001637 14074061410 0017456 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# -*- coding: utf8 -*-
from random import shuffle
from timeit import Timer
try:
import unittest2 as unittest
except ImportError:
import unittest
import six
from six.moves import range
import agate
class TestTableJoin(unittest.TestCase):
def test_join(self):
left_rows = [(six.text_type(i), i) for i in range(100000)]
right_rows = [(six.text_type(i), i) for i in range(100000)]
shuffle(left_rows)
shuffle(right_rows)
column_names = ['text', 'number']
column_types = [agate.Text(), agate.Number()]
left = agate.Table(left_rows, column_names, column_types)
right = agate.Table(right_rows, column_names, column_types)
def test():
left.join(right, 'text')
results = Timer(test).repeat(10, 1)
min_time = min(results)
self.assertLess(min_time, 10) # CI unreliable, 5s witnessed
agate-1.6.3/charts.py 0000775 0000000 0000000 00000001742 14074061410 0014444 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
import agate
table = agate.Table.from_csv('examples/realdata/Datagov_FY10_EDU_recp_by_State.csv')
table.limit(10).bar_chart('State Name', 'TOTAL', 'docs/images/bar_chart.svg')
table.limit(10).column_chart('State Name', 'TOTAL', 'docs/images/column_chart.svg')
table = agate.Table.from_csv('examples/realdata/exonerations-20150828.csv')
by_year_exonerated = table.group_by('exonerated')
counts = by_year_exonerated.aggregate([
('count', agate.Count())
])
counts.order_by('exonerated').line_chart('exonerated', 'count', 'docs/images/line_chart.svg')
table.scatterplot('exonerated', 'age', 'docs/images/dots_chart.svg')
top_crimes = table.group_by('crime').having([
('count', agate.Count())
], lambda t: t['count'] > 100)
by_year = top_crimes.group_by('exonerated')
counts = by_year.aggregate([
('count', agate.Count())
])
by_crime = counts.group_by('crime')
by_crime.order_by('exonerated').line_chart('exonerated', 'count', 'docs/images/lattice.svg')
agate-1.6.3/docs/ 0000775 0000000 0000000 00000000000 14074061410 0013527 5 ustar 00root root 0000000 0000000 agate-1.6.3/docs/Makefile 0000664 0000000 0000000 00000010752 14074061410 0015174 0 ustar 00root root 0000000 0000000 # Makefile for Sphinx documentation
#
# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
PAPER =
BUILDDIR = _build
# Internal variables.
PAPEROPT_a4 = -D latex_paper_size=a4
PAPEROPT_letter = -D latex_paper_size=letter
ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest
help:
@echo "Please use \`make ' where is one of"
@echo " html to make standalone HTML files"
@echo " dirhtml to make HTML files named index.html in directories"
@echo " singlehtml to make a single large HTML file"
@echo " pickle to make pickle files"
@echo " json to make JSON files"
@echo " htmlhelp to make HTML files and a HTML help project"
@echo " qthelp to make HTML files and a qthelp project"
@echo " devhelp to make HTML files and a Devhelp project"
@echo " epub to make an epub"
@echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
@echo " latexpdf to make LaTeX files and run them through pdflatex"
@echo " text to make text files"
@echo " man to make manual pages"
@echo " changes to make an overview of all changed/added/deprecated items"
@echo " linkcheck to check all external links for integrity"
@echo " doctest to run all doctests embedded in the documentation (if enabled)"
clean:
-rm -rf $(BUILDDIR)/*
html:
$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
@echo
@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
dirhtml:
$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
@echo
@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."
singlehtml:
$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
@echo
@echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."
pickle:
$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
@echo
@echo "Build finished; now you can process the pickle files."
json:
$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
@echo
@echo "Build finished; now you can process the JSON files."
htmlhelp:
$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
@echo
@echo "Build finished; now you can run HTML Help Workshop with the" \
".hhp project file in $(BUILDDIR)/htmlhelp."
qthelp:
$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
@echo
@echo "Build finished; now you can run "qcollectiongenerator" with the" \
".qhcp project file in $(BUILDDIR)/qthelp, like this:"
@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/agate.qhcp"
@echo "To view the help file:"
@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/agate.qhc"
devhelp:
$(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
@echo
@echo "Build finished."
@echo "To view the help file:"
@echo "# mkdir -p $$HOME/.local/share/devhelp/agate"
@echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/agate"
@echo "# devhelp"
epub:
$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
@echo
@echo "Build finished. The epub file is in $(BUILDDIR)/epub."
latex:
$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
@echo
@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
@echo "Run \`make' in that directory to run these through (pdf)latex" \
"(use \`make latexpdf' here to do that automatically)."
latexpdf:
$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
@echo "Running LaTeX files through pdflatex..."
make -C $(BUILDDIR)/latex all-pdf
@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
text:
$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
@echo
@echo "Build finished. The text files are in $(BUILDDIR)/text."
man:
$(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
@echo
@echo "Build finished. The manual pages are in $(BUILDDIR)/man."
changes:
$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
@echo
@echo "The overview file is in $(BUILDDIR)/changes."
linkcheck:
$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
@echo
@echo "Link check complete; look for any errors in the above output " \
"or in $(BUILDDIR)/linkcheck/output.txt."
doctest:
$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
@echo "Testing of doctests in the sources finished, look at the " \
"results in $(BUILDDIR)/doctest/output.txt."
agate-1.6.3/docs/about.rst 0000664 0000000 0000000 00000002577 14074061410 0015406 0 ustar 00root root 0000000 0000000 ===========
About agate
===========
Why agate?
==========
* A readable and user-friendly API.
* A complete set of SQL-like operations.
* Unicode support everywhere.
* Decimal precision everywhere.
* Exhaustive user documentation.
* Pluggable `extensions `_ that add SQL integration, Excel support, and more.
* Designed with `iPython `_, `Jupyter `_ and `atom/hydrogen `_ in mind.
* Pure Python. No C dependencies to compile.
* Exhaustive test coverage.
* MIT licensed and free for all purposes.
* Zealously `zen `_.
* Made with love.
Principles
==========
agate is a intended to fill a very particular programming niche. It should not be allowed to become as complex as `numpy `_ or `pandas `_. Please bear in mind the following principles when considering a new feature:
* Humans have less time than computers. Optimize for humans.
* Most datasets are small. Don't optimize for "big data".
* Text is data. It must always be a first-class citizen.
* Python gets it right. Make it work like Python does.
* Humans lives are nasty, brutish and short. Make it easy.
* Mutability leads to confusion. Processes that alter data must create new copies.
* Extensions are the way. Don't add it to core unless everybody needs it.
agate-1.6.3/docs/api.rst 0000664 0000000 0000000 00000000435 14074061410 0015034 0 ustar 00root root 0000000 0000000 ===
API
===
.. toctree::
:maxdepth: 1
api/table
api/tableset
api/columns_and_rows
api/data_types
api/type_tester
api/aggregations
api/computations
api/csv
api/fixed
api/misc
api/exceptions
api/warns
api/testcase
api/config
agate-1.6.3/docs/api/ 0000775 0000000 0000000 00000000000 14074061410 0014300 5 ustar 00root root 0000000 0000000 agate-1.6.3/docs/api/aggregations.rst 0000664 0000000 0000000 00000002773 14074061410 0017515 0 ustar 00root root 0000000 0000000 ============
Aggregations
============
.. automodule:: agate.aggregations
:no-members:
.. autosummary::
:nosignatures:
agate.Aggregation
agate.Summary
Basic aggregations
------------------
.. autosummary::
:nosignatures:
agate.All
agate.Any
agate.Count
agate.HasNulls
agate.Min
agate.Max
agate.MaxPrecision
Statistical aggregations
------------------------
.. autosummary::
:nosignatures:
agate.Deciles
agate.IQR
agate.MAD
agate.Mean
agate.Median
agate.Mode
agate.Percentiles
agate.PopulationStDev
agate.PopulationVariance
agate.Quartiles
agate.Quintiles
agate.StDev
agate.Sum
agate.Variance
Text aggregations
-----------------
.. autosummary::
:nosignatures:
agate.MaxLength
Detailed list
-------------
.. autoclass:: agate.Aggregation
.. autoclass:: agate.All
.. autoclass:: agate.Any
.. autoclass:: agate.Count
.. autoclass:: agate.Deciles
.. autoclass:: agate.HasNulls
.. autoclass:: agate.IQR
.. autoclass:: agate.MAD
.. autoclass:: agate.Min
.. autoclass:: agate.Max
.. autoclass:: agate.MaxLength
.. autoclass:: agate.MaxPrecision
.. autoclass:: agate.Mean
.. autoclass:: agate.Median
.. autoclass:: agate.Mode
.. autoclass:: agate.Percentiles
.. autoclass:: agate.PopulationStDev
.. autoclass:: agate.PopulationVariance
.. autoclass:: agate.Quartiles
.. autoclass:: agate.Quintiles
.. autoclass:: agate.StDev
.. autoclass:: agate.Sum
.. autoclass:: agate.Summary
.. autoclass:: agate.Variance
agate-1.6.3/docs/api/columns_and_rows.rst 0000664 0000000 0000000 00000000355 14074061410 0020411 0 ustar 00root root 0000000 0000000 ================
Columns and rows
================
.. autosummary::
:nosignatures:
agate.MappedSequence
agate.Column
agate.Row
.. autoclass:: agate.MappedSequence
.. autoclass:: agate.Column
.. autoclass:: agate.Row
agate-1.6.3/docs/api/computations.rst 0000664 0000000 0000000 00000001170 14074061410 0017556 0 ustar 00root root 0000000 0000000 ============
Computations
============
.. automodule:: agate.computations
:no-members:
.. autosummary::
:nosignatures:
agate.Computation
agate.Formula
Mathematical computations
-------------------------
.. autosummary::
:nosignatures:
agate.Change
agate.Percent
agate.PercentChange
agate.PercentileRank
agate.Rank
Detailed list
-------------
.. autoclass:: agate.Change
.. autoclass:: agate.Computation
.. autoclass:: agate.Formula
.. autoclass:: agate.Percent
.. autoclass:: agate.PercentChange
.. autoclass:: agate.PercentileRank
.. autoclass:: agate.Rank
.. autoclass:: agate.Slug
agate-1.6.3/docs/api/config.rst 0000664 0000000 0000000 00000000131 14074061410 0016272 0 ustar 00root root 0000000 0000000 ======
Config
======
.. automodule:: agate.config
:members:
:inherited-members:
agate-1.6.3/docs/api/csv.rst 0000664 0000000 0000000 00000003166 14074061410 0015633 0 ustar 00root root 0000000 0000000 =====================
CSV reader and writer
=====================
Agate contains CSV readers and writers that are intended to be used as a drop-in replacement for :mod:`csv`. These versions add unicode support for Python 2 and several other minor features.
Agate methods will use these version automatically. If you would like to use them in your own code, you can import them, like this:
.. code-block:: python
from agate import csv
Due to nuanced differences between the versions, these classes are implemented seperately for Python 2 and Python 3. The documentation for both versions is provided below, but only the one for your version of Python is imported with the above code.
Python 3
--------
.. autosummary::
:nosignatures:
agate.csv_py3.reader
agate.csv_py3.writer
agate.csv_py3.Reader
agate.csv_py3.Writer
agate.csv_py3.DictReader
agate.csv_py3.DictWriter
Python 2
--------
.. autosummary::
:nosignatures:
agate.csv_py2.reader
agate.csv_py2.writer
agate.csv_py2.Reader
agate.csv_py2.Writer
agate.csv_py2.DictReader
agate.csv_py2.DictWriter
Python 3 details
----------------
.. autofunction:: agate.csv_py3.reader
.. autofunction:: agate.csv_py3.writer
.. autoclass:: agate.csv_py3.Reader
.. autoclass:: agate.csv_py3.Writer
.. autoclass:: agate.csv_py3.DictReader
.. autoclass:: agate.csv_py3.DictWriter
Python 2 details
----------------
.. autofunction:: agate.csv_py2.reader
.. autofunction:: agate.csv_py2.writer
.. autoclass:: agate.csv_py2.Reader
.. autoclass:: agate.csv_py2.Writer
.. autoclass:: agate.csv_py2.DictReader
.. autoclass:: agate.csv_py2.DictWriter
agate-1.6.3/docs/api/data_types.rst 0000664 0000000 0000000 00000001042 14074061410 0017164 0 ustar 00root root 0000000 0000000 ==========
Data types
==========
.. automodule:: agate.data_types
:no-members:
.. autosummary::
:nosignatures:
agate.DataType
Supported types
---------------
.. autosummary::
:nosignatures:
agate.Text
agate.Number
agate.Boolean
agate.Date
agate.DateTime
agate.TimeDelta
Detailed list
-------------
.. autoclass:: agate.DataType
.. autoclass:: agate.Text
.. autoclass:: agate.Number
.. autoclass:: agate.Boolean
.. autoclass:: agate.Date
.. autoclass:: agate.DateTime
.. autoclass:: agate.TimeDelta
agate-1.6.3/docs/api/exceptions.rst 0000664 0000000 0000000 00000000550 14074061410 0017213 0 ustar 00root root 0000000 0000000 ============
Exceptions
============
.. autosummary::
:nosignatures:
agate.DataTypeError
agate.UnsupportedAggregationError
agate.CastError
agate.FieldSizeLimitError
.. autoexception:: agate.DataTypeError
.. autoexception:: agate.UnsupportedAggregationError
.. autoexception:: agate.CastError
.. autoexception:: agate.FieldSizeLimitError
agate-1.6.3/docs/api/fixed.rst 0000664 0000000 0000000 00000001025 14074061410 0016127 0 ustar 00root root 0000000 0000000 ==================
Fixed-width reader
==================
Agate contains a fixed-width file reader that is designed to work like Python's :mod:`csv`.
These readers work with CSV-formatted schemas, such as those maintained at `wireservice/ffs `_.
.. autosummary::
:nosignatures:
agate.fixed.reader
agate.fixed.Reader
agate.fixed.DictReader
Detailed list
-------------
.. autofunction:: agate.fixed.reader
.. autoclass:: agate.fixed.Reader
.. autoclass:: agate.fixed.DictReader
agate-1.6.3/docs/api/misc.rst 0000664 0000000 0000000 00000000267 14074061410 0015772 0 ustar 00root root 0000000 0000000 =============
Miscellaneous
=============
.. autosummary::
:nosignatures:
agate.NullOrder
agate.Quantiles
.. autoclass:: agate.NullOrder
.. autoclass:: agate.Quantiles
agate-1.6.3/docs/api/table.rst 0000664 0000000 0000000 00000003271 14074061410 0016124 0 ustar 00root root 0000000 0000000 =====
Table
=====
.. automodule:: agate.table
:no-members:
.. autosummary::
:nosignatures:
agate.Table
Properties
----------
.. autosummary::
:nosignatures:
agate.Table.columns
agate.Table.column_names
agate.Table.column_types
agate.Table.rows
agate.Table.row_names
Creating
--------
.. autosummary::
:nosignatures:
agate.Table.from_csv
agate.Table.from_json
agate.Table.from_fixed
agate.Table.from_object
Saving
------
.. autosummary::
:nosignatures:
agate.Table.to_csv
agate.Table.to_json
Basic processing
----------------
.. autosummary::
:nosignatures:
agate.Table.distinct
agate.Table.exclude
agate.Table.find
agate.Table.limit
agate.Table.order_by
agate.Table.select
agate.Table.where
Calculating new data
--------------------
.. autosummary::
:nosignatures:
agate.Table.aggregate
agate.Table.compute
Advanced processing
-------------------
.. autosummary::
:nosignatures:
agate.Table.bins
agate.Table.denormalize
agate.Table.group_by
agate.Table.homogenize
agate.Table.join
agate.Table.merge
agate.Table.normalize
agate.Table.pivot
agate.Table.rename
Previewing
----------
.. autosummary::
:nosignatures:
agate.Table.print_bars
agate.Table.print_csv
agate.Table.print_html
agate.Table.print_json
agate.Table.print_structure
agate.Table.print_table
Charting
--------
.. autosummary::
:nosignatures:
agate.Table.bar_chart
agate.Table.column_chart
agate.Table.line_chart
agate.Table.scatterplot
Detailed list
-------------
.. autoclass:: agate.Table
:members:
:inherited-members:
agate-1.6.3/docs/api/tableset.rst 0000664 0000000 0000000 00000002752 14074061410 0016643 0 ustar 00root root 0000000 0000000 ========
TableSet
========
.. automodule:: agate.tableset
:no-members:
.. autosummary::
:nosignatures:
agate.TableSet
Properties
----------
.. autosummary::
:nosignatures:
agate.TableSet.key_name
agate.TableSet.key_type
agate.TableSet.column_types
agate.TableSet.column_names
Creating
--------
.. autosummary::
:nosignatures:
agate.TableSet.from_csv
agate.TableSet.from_json
Saving
------
.. autosummary::
:nosignatures:
agate.TableSet.to_csv
agate.TableSet.to_json
Processing
----------
.. autosummary::
:nosignatures:
agate.TableSet.aggregate
agate.TableSet.having
agate.TableSet.merge
Previewing
----------
.. autosummary::
:nosignatures:
agate.TableSet.print_structure
Charting
--------
.. autosummary::
:nosignatures:
agate.TableSet.bar_chart
agate.TableSet.column_chart
agate.TableSet.line_chart
agate.TableSet.scatterplot
Table Proxy Methods
-------------------
.. autosummary::
:nosignatures:
agate.TableSet.bins
agate.TableSet.compute
agate.TableSet.denormalize
agate.TableSet.distinct
agate.TableSet.exclude
agate.TableSet.find
agate.TableSet.group_by
agate.TableSet.homogenize
agate.TableSet.join
agate.TableSet.limit
agate.TableSet.normalize
agate.TableSet.order_by
agate.TableSet.pivot
agate.TableSet.select
agate.TableSet.where
Detailed list
-------------
.. autoclass:: agate.TableSet
:inherited-members:
agate-1.6.3/docs/api/testcase.rst 0000664 0000000 0000000 00000000262 14074061410 0016645 0 ustar 00root root 0000000 0000000 ====================
Unit testing helpers
====================
.. autoclass:: agate.AgateTestCase
:members: assertColumnNames, assertColumnTypes, assertRows, assertRowNames
agate-1.6.3/docs/api/type_tester.rst 0000664 0000000 0000000 00000000202 14074061410 0017373 0 ustar 00root root 0000000 0000000 ==============
Type inference
==============
.. automodule:: agate.type_tester
:no-members:
.. autoclass:: agate.TypeTester
agate-1.6.3/docs/api/warns.rst 0000664 0000000 0000000 00000000321 14074061410 0016160 0 ustar 00root root 0000000 0000000 ========
Warnings
========
.. autoclass:: agate.NullCalculationWarning
.. autoclass:: agate.DuplicateColumnWarning
.. autofunction:: agate.warn_null_calculation
.. autofunction:: agate.warn_duplicate_column
agate-1.6.3/docs/changelog.rst 0000664 0000000 0000000 00000000075 14074061410 0016212 0 ustar 00root root 0000000 0000000 =========
Changelog
=========
.. include:: ../CHANGELOG.rst
agate-1.6.3/docs/conf.py 0000664 0000000 0000000 00000002064 14074061410 0015030 0 ustar 00root root 0000000 0000000 #!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import sys
# Path munging
sys.path.insert(0, os.path.abspath('..'))
# Extensions
extensions = [
'sphinx.ext.autosummary',
'sphinx.ext.autodoc',
'sphinx.ext.intersphinx'
]
# autodoc_member_order = 'bysource'
autodoc_default_flags = ['members', 'show-inheritance']
intersphinx_mapping = {
'python': ('http://docs.python.org/3.5', None),
'leather': ('http://leather.readthedocs.io/en/latest/', None)
}
# Templates
templates_path = ['_templates']
master_doc = 'index'
# Metadata
project = u'agate'
copyright = u'2017, Christopher Groskopf'
version = '1.6.3'
release = version
exclude_patterns = ['_build']
pygments_style = 'sphinx'
# HTMl theming
html_theme = 'default'
on_rtd = os.environ.get('READTHEDOCS', None) == 'True'
if not on_rtd: # only import and set the theme if we're building docs locally
import sphinx_rtd_theme
html_theme = 'sphinx_rtd_theme'
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
html_static_path = ['_static']
htmlhelp_basename = 'agatedoc'
agate-1.6.3/docs/contributing.rst 0000664 0000000 0000000 00000007777 14074061410 0017012 0 ustar 00root root 0000000 0000000 ============
Contributing
============
agate actively encourages contributions from people of all genders, races, ethnicities, ages, creeds, nationalities, persuasions, alignments, sizes, shapes, and journalistic affiliations. You are welcome here.
We seek contributions from developers and non-developers of all skill levels. We will typically accept bug fixes, documentation updates, and new cookbook recipes with minimal fuss. If you want to work on a larger feature—great! The maintainers will be happy to provide feedback and code review on your implementation.
Before making any changes or additions to agate, please be sure to read about the principles of agate in the `About `_ section of the documentation.
Process for documentation
=========================
Not a developer? That's fine! As long as you can use `git` (there are many tutorials) then you can contribute to agate. Please follow this process:
#. Fork the project on `GitHub `_.
#. If you don't have a specific task in mind, check out the `issue tracker `_ and find a documentation ticket that needs to be done.
#. Comment on the ticket letting everyone know you're going to be working on it so that nobody duplicates your effort.
#. Write the documentation. Documentation files live in the `docs` directory and are in Restructured Text Format.
#. Add yourself to the AUTHORS file if you aren't already there.
#. Once your contribution is complete, submit a pull request on GitHub.
#. Wait for it to either be merged by a maintainer or to receive feedback about what needs to be revised.
#. Rejoice!
Process for code
================
Hacker? We'd love to have you hack with us. Please follow this process to make your contribution:
#. Fork the project on `GitHub `_.
#. If you don't have a specific task in mind, check out the `issue tracker `_ and find a task that needs to be done and is of a scope you can realistically expect to complete in a few days. Don't worry about the priority of the issues at first, but try to choose something you'll enjoy. You're much more likely to finish something to the point it can be merged if it's something you really enjoy hacking on.
#. If you already have a task you know you want to work on, open a ticket or comment on the existing ticket letting everyone know you're going to be working on it. It's also good practice to provide some general idea of how you plan on resolving the issue so that other developers can make suggestions.
#. Write tests for the feature you're building. Follow the format of the existing tests in the test directory to see how this works. You can run all the tests with the command ``nosetests tests``.
#. Write the code. Try to stay consistent with the style and organization of the existing codebase. A good patch won't be refused for stylistic reasons, but large parts of it may be rewritten and nobody wants that.
#. As you are coding, periodically merge in work from the master branch and verify you haven't broken anything by running the test suite.
#. Write documentation. This means docstrings on all classes and methods, including parameter explanations. It also means, when relevant, cookbook recipes and updates to the agate user tutorial.
#. Add yourself to the AUTHORS file if you aren't already there.
#. Once your contribution is complete, tested, and has documentation, submit a pull request on GitHub.
#. Wait for it to either be merged by a maintainer or to receive feedback about what needs to be revisited.
#. Rejoice!
Licensing
=========
To the extent that they care, contributors should keep in mind that the source of agate and therefore of any contributions are licensed under the permissive `MIT license `_. By submitting a patch or pull request you are agreeing to release your code under this license. You will be acknowledged in the AUTHORS list, the commit history and the hearts and minds of journalists everywhere.
agate-1.6.3/docs/cookbook.rst 0000664 0000000 0000000 00000003721 14074061410 0016072 0 ustar 00root root 0000000 0000000 ========
Cookbook
========
Welcome to the agate cookbook, a source of how-to's and use cases.
.. toctree::
:hidden:
:maxdepth: 2
cookbook/create
cookbook/save
cookbook/remove
cookbook/filter
cookbook/sort
cookbook/search
cookbook/standardize
cookbook/statistics
cookbook/compute
cookbook/datetime
cookbook/sql
cookbook/excel
cookbook/r
cookbook/underscore
cookbook/homogenize
cookbook/columns
cookbook/transform
cookbook/locale
cookbook/rank
cookbook/charting
cookbook/lookup
Basics
======
* `Creating tables from various data types `_
* `Saving data to various data types `_
* `Removing columns from a table `_
* `Filtering rows of data `_
* `Sorting rows of data `_
* `Searching through a table `_
* `Standardize names and values `_
* `Calculating statistics `_
* `Computing new columns `_
* `Handling dates and times `_
Coming from other tools
=======================
* `SQL `_
* `Excel `_
* `R `_
* `Underscore.js `_
* Pandas (coming soon!)
Advanced techniques
===================
* `Filling missing rows in a dataset `_
* `Renaming and reordering columns `_
* `Transforming data (pivot/normalize/denormalize) `_
* `Setting your locale and working with foreign data `_
* `Ranking a sequence of data `_
* `Creating simple charts `_
* `Mapping columns to common lookup tables `_
Have a common use case that isn't covered? Please `submit an issue `_ on the GitHub repository.
agate-1.6.3/docs/cookbook/ 0000775 0000000 0000000 00000000000 14074061410 0015335 5 ustar 00root root 0000000 0000000 agate-1.6.3/docs/cookbook/charting.rst 0000664 0000000 0000000 00000010431 14074061410 0017665 0 ustar 00root root 0000000 0000000 ======
Charts
======
Agate offers two kinds of built in charting: very simple text bar charts and SVG charting via `leather `_. Both are intended for efficiently exploring data, rather than producing publication-ready charts.
Text-based bar chart
====================
agate has a builtin text-based bar-chart generator:
.. code-block:: python
table.limit(10).print_bars('State Name', 'TOTAL', width=80)
.. code-block:: bash
State Name TOTAL
ALABAMA 19,582 ▓░░░░░░░░░░░░░
ALASKA 2,705 ▓░░
ARIZONA 46,743 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
ARKANSAS 7,932 ▓░░░░░
CALIFORNIA 76,639 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
COLORADO 21,485 ▓░░░░░░░░░░░░░░░
CONNECTICUT 4,350 ▓░░░
DELAWARE 1,904 ▓░
DIST. OF COLUMBIA 2,185 ▓░
FLORIDA 59,519 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
+-------------+------------+------------+-------------+
0 20,000 40,000 60,000 80,000
Text-based histogram
====================
:meth:`.Table.print_bars` can be combined with :meth:`.Table.pivot` or :meth:`.Table.bins` to produce fast histograms:
.. code-block:: Python
table.bins('TOTAL', start=0, end=100000).print_bars('TOTAL', width=80)
.. code-block:: bash
TOTAL Count
[0 - 10,000) 30 ▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
[10,000 - 20,000) 12 ▓░░░░░░░░░░░░░░░░░░░░░░
[20,000 - 30,000) 7 ▓░░░░░░░░░░░░░
[30,000 - 40,000) 1 ▓░░
[40,000 - 50,000) 2 ▓░░░░
[50,000 - 60,000) 1 ▓░░
[60,000 - 70,000) 1 ▓░░
[70,000 - 80,000) 1 ▓░░
[80,000 - 90,000) 0 ▓
[90,000 - 100,000] 0 ▓
+-------------+------------+------------+-------------+
0.0 7.5 15.0 22.5 30.0
SVG bar chart
=============
.. code-block:: Python
table.limit(10).bar_chart('State Name', 'TOTAL', 'docs/images/bar_chart.svg')
.. figure:: ../images/bar_chart.svg
SVG column chart
================
.. code-block:: Python
table.limit(10).column_chart('State Name', 'TOTAL', 'docs/images/column_chart.svg')
.. figure:: ../images/column_chart.svg
SVG line chart
==============
.. code-block:: Python
by_year_exonerated = table.group_by('exonerated')
counts = by_year_exonerated.aggregate([
('count', agate.Count())
])
counts.order_by('exonerated').line_chart('exonerated', 'count', 'docs/images/line_chart.svg')
.. figure:: ../images/line_chart.svg
SVG dots chart
==============
.. code-block:: Python
table.scatterplot('exonerated', 'age', 'docs/images/dots_chart.svg')
.. figure:: ../images/dots_chart.svg
SVG lattice chart
==================
.. code-block:: Python
top_crimes = table.group_by('crime').having([
('count', agate.Count())
], lambda t: t['count'] > 100)
by_year = top_crimes.group_by('exonerated')
counts = by_year.aggregate([
('count', agate.Count())
])
by_crime = counts.group_by('crime')
by_crime.order_by('exonerated').line_chart('exonerated', 'count', 'docs/images/lattice.svg')
.. figure:: ../images/lattice.svg
Using matplotlib
================
If you need to make more complex charts, you can always use agate with `matplotlib `_.
Here is an example of how you might generate a line chart:
.. code-block:: python
import pylab
pylab.plot(table.columns['homeruns'], table.columns['wins'])
pylab.xlabel('Homeruns')
pylab.ylabel('Wins')
pylab.title('How homeruns correlate to wins')
pylab.show()
agate-1.6.3/docs/cookbook/columns.rst 0000664 0000000 0000000 00000001503 14074061410 0017546 0 ustar 00root root 0000000 0000000 ===============================
Renaming and reordering columns
===============================
Rename columns
===============
You can rename the columns in a table by using the :meth:`.Table.rename` method and specifying the new column names as an array or dictionary mapping old column names to new ones.
.. code-block:: python
table = Table(rows, column_names = ['a', 'b', 'c'])
new_table = table.rename(column_names = ['one', 'two', 'three'])
# or
new_table = table.rename(column_names = {'a': 'one', 'b': 'two', 'c': 'three'})
Reorder columns
===============
You can reorder the columns in a table by using the :meth:`.Table.select` method and specifying the column names in the order you want:
.. code-block:: python
new_table = table.select(['3rd_column_name', '1st_column_name', '2nd_column_name'])
agate-1.6.3/docs/cookbook/compute.rst 0000664 0000000 0000000 00000021317 14074061410 0017547 0 ustar 00root root 0000000 0000000 ==================
Compute new values
==================
Change
======
.. code-block:: python
new_table = table.compute([
('2000_change', agate.Change('2000', '2001')),
('2001_change', agate.Change('2001', '2002')),
('2002_change', agate.Change('2002', '2003'))
])
Or, better yet, compute the whole decade using a loop:
.. code-block:: Python
computations = []
for year in range(2000, 2010):
change = agate.Change(year, year + 1)
computations.append(('%i_change' % year, change))
new_table = table.compute(computations)
Percent
=======
Calculate the percentage for each value in a column with :class:`.Percent`.
Values are divided into the sum of the column by default.
.. code-block:: python
columns = ('value',)
rows = ([1],[2],[2],[5])
new_table = agate.Table(rows, columns)
new_table = new_table.compute([
('percent', agate.Percent('value'))
])
new_table.print_table()
| value | percent |
| ----- | ------- |
| 1 | 10 |
| 2 | 20 |
| 2 | 20 |
| 5 | 50 |
Override the denominator with a keyword argument.
.. code-block:: python
new_table = new_table.compute([
('percent', agate.Percent('value', 5))
])
new_table.print_table()
| value | percent |
| ----- | ------- |
| 1 | 20 |
| 2 | 40 |
| 2 | 40 |
| 5 | 100 |
Percent change
==============
Want percent change instead of value change? Just swap out the :class:`.Computation`:
.. code-block:: Python
computations = []
for year in range(2000, 2010):
change = agate.PercentChange(year, year + 1)
computations.append(('%i_change' % year, change))
new_table = table.compute(computations)
Indexed/cumulative change
=========================
Need your change indexed to a starting year? Just fix the first argument:
.. code-block:: Python
computations = []
for year in range(2000, 2010):
change = agate.Change(2000, year + 1)
computations.append(('%i_change' % year, change))
new_table = table.compute(computations)
Of course you can also use :class:`.PercentChange` if you need percents rather than values.
Round to two decimal places
===========================
agate stores numerical values using Python's :class:`decimal.Decimal` type. This data type ensures numerical precision beyond what is supported by the native :func:`float` type, however, because of this we can not use Python's builtin :func:`round` function. Instead we must use :meth:`decimal.Decimal.quantize`.
We can use :meth:`.Table.compute` to apply the quantize to generate a rounded column from an existing one:
.. code-block:: python
from decimal import Decimal
number_type = agate.Number()
def round_price(row):
return row['price'].quantize(Decimal('0.01'))
new_table = table.compute([
('price_rounded', agate.Formula(number_type, round_price))
])
To round to one decimal place you would simply change :code:`0.01` to :code:`0.1`.
.. _difference_between_dates:
Difference between dates
========================
Calculating the difference between dates (or dates and times) works exactly the same as it does for numbers:
.. code-block:: python
new_table = table.compute([
('age_at_death', agate.Change('born', 'died'))
])
Levenshtein edit distance
=========================
The Levenshtein edit distance is a common measure of string similarity. It can be used, for instance, to check for typos between manually-entered names and a version that is known to be spelled correctly.
Implementing Levenshtein requires writing a custom :class:`.Computation`. To save ourselves building the whole thing from scratch, we will lean on the `python-Levenshtein `_ library for the actual algorithm.
.. code-block:: python
import agate
from Levenshtein import distance
import six
class LevenshteinDistance(agate.Computation):
"""
Computes Levenshtein edit distance between the column and a given string.
"""
def __init__(self, column_name, compare_string):
self._column_name = column_name
self._compare_string = compare_string
def get_computed_data_type(self, table):
"""
The return value is a numerical distance.
"""
return agate.Number()
def validate(self, table):
"""
Verify the column is text.
"""
column = table.columns[self._column_name]
if not isinstance(column.data_type, agate.Text):
raise agate.DataTypeError('Can only be applied to Text data.')
def run(self, table):
"""
Find the distance, returning null when the input column was null.
"""
new_column = []
for row in table.rows:
val = row[self._column_name]
if val is None:
new_column.append(None)
else:
new_column.append(distance(val, self._compare_string))
return new_column
This code can now be applied to any :class:`.Table` just as any other :class:`.Computation` would be:
.. code-block:: python
new_table = table.compute([
('distance', LevenshteinDistance('column_name', 'string to compare'))
])
The resulting column will contain an integer measuring the edit distance between the value in the column and the comparison string.
USA Today Diversity Index
=========================
The `USA Today Diversity Index `_ is a widely cited method for evaluating the racial diversity of a given area. Using a custom :class:`.Computation` makes it simple to calculate.
Assuming that your data has a column for the total population, another for the population of each race and a final column for the hispanic population, you can implement the diversity index like this:
.. code-block:: python
class USATodayDiversityIndex(agate.Computation):
def get_computed_data_type(self, table):
return agate.Number()
def run(self, table):
new_column = []
for row in table.rows:
race_squares = 0
for race in ['white', 'black', 'asian', 'american_indian', 'pacific_islander']:
race_squares += (row[race] / row['population']) ** 2
hispanic_squares = (row['hispanic'] / row['population']) ** 2
hispanic_squares += (1 - (row['hispanic'] / row['population'])) ** 2
new_column.append((1 - (race_squares * hispanic_squares)) * 100)
return new_column
We apply the diversity index like any other computation:
.. code-block:: Python
with_index = table.compute([
('diversity_index', USATodayDiversityIndex())
])
Simple Moving Average
=====================
A simple moving average is the average of some number of prior values in a series. It is typically used to smooth out variation in time series data.
The following custom :class:`.Computation` will compute a simple moving average. This example assumes your data is already sorted.
.. code-block:: python
class SimpleMovingAverage(agate.Computation):
"""
Computes the simple moving average of a column over some interval.
"""
def __init__(self, column_name, interval):
self._column_name = column_name
self._interval = interval
def get_computed_data_type(self, table):
"""
The return value is a numerical average.
"""
return agate.Number()
def validate(self, table):
"""
Verify the column is numerical.
"""
column = table.columns[self._column_name]
if not isinstance(column.data_type, agate.Number):
raise agate.DataTypeError('Can only be applied to Number data.')
def run(self, table):
new_column = []
for i, row in enumerate(table.rows):
if i < self._interval:
new_column.append(None)
else:
values = tuple(r[self._column_name] for r in table.rows[i - self._interval:i])
if None in values:
new_column.append(None)
else:
new_column.append(sum(values) / self._interval)
return new_column
You would use the simple moving average like so:
.. code-block:: Python
with_average = table.compute([
('six_month_moving_average', SimpleMovingAverage('price', 6))
])
agate-1.6.3/docs/cookbook/create.rst 0000664 0000000 0000000 00000011707 14074061410 0017340 0 ustar 00root root 0000000 0000000 ===============
Creating tables
===============
From data in memory
===================
From a list of lists.
.. code-block:: python
column_names = ['letter', 'number']
column_types = [agate.Text(), agate.Number()]
rows = [
('a', 1),
('b', 2),
('c', None)
]
table = agate.Table(rows, column_names, column_types)
From a list of dictionaries.
.. code-block:: python
rows = [
dict(letter='a', number=1),
dict(letter='b', number=2),
dict(letter='c', number=None)
]
table = agate.Table.from_object(rows)
From a CSV
==========
By default, loading a table from a CSV will use agate's builtin :class:`.TypeTester` to infer column types:
.. code-block:: python
table = agate.Table.from_csv('filename.csv')
Override type inference
=======================
In some cases agate's :class:`.TypeTester` may guess incorrectly. To override the type for some columns and use TypeTester for the rest, pass a dictionary to the ``column_types`` argument.
.. code-block:: python
specified_types = {
'column_name_one': agate.Text(),
'column_name_two': agate.Number()
}
table = agate.Table.from_csv('filename.csv', column_types=specified_types)
This will use a generic TypeTester and override your specified columns with ``TypeTester.force``.
Limit type inference
====================
For large datasets :class:`.TypeTester` may be unreasonably slow. In order to limit the amount of data it uses you can specify the ``limit`` argument. Note that if data after the limit invalidates the TypeTester's inference you may get errors when the data is loaded.
.. code-block:: python
tester = agate.TypeTester(limit=100)
table = agate.Table.from_csv('filename.csv', column_types=tester)
Manually specify columns
========================
If you know the types of your data you may find it more efficient to manually specify the names and types of your columns. This also gives you an opportunity to rename columns when you load them.
.. code-block:: python
text_type = agate.Text()
number_type = agate.Number()
column_names = ['city', 'area', 'population']
column_types = [text_type, number_type, number_type]
table = agate.Table.from_csv('population.csv', column_names, column_types)
Or, you can use this method to load data from a file that does not have a header row:
.. code-block:: python
table = agate.Table.from_csv('population.csv', column_names, column_types, header=False)
From a unicode CSV
==================
You don't have to do anything special. It just works!
From a latin1 CSV
=================
.. code-block:: python
table = agate.Table.from_csv('census.csv', encoding='latin1')
From a semicolon delimited CSV
==============================
Normally, agate will automatically guess the delimiter of your CSV, but if that guess fails you can specify it manually:
.. code-block:: python
table = agate.Table.from_csv('filename.csv', delimiter=';')
From a TSV (tab-delimited CSV)
==============================
This is the same as the previous example, but in this case we specify that the delimiter is a tab:
.. code-block:: python
table = agate.Table.from_csv('filename.csv', delimiter='\t')
From JSON
=========
.. code-block:: python
table = agate.Table.from_json('filename.json')
From newline-delimited JSON
===========================
.. code-block:: python
table = agate.Table.from_json('filename.json', newline=True)
.. _load_a_table_from_a_sql_database:
From a SQL database
===================
Use the `agate-sql `_ extension.
.. code-block:: python
import agatesql
agatesql.patch()
table = agate.Table.from_sql('postgresql:///database', 'input_table')
From an Excel spreadsheet
=========================
Use the `agate-excel `_ extension. It supports both .xls and .xlsx files.
.. code-block:: python
import agateexcel
agateexcel.patch()
table = agate.Table.from_xls('test.xls', sheet='data')
table2 = agate.Table.from_xlsx('test.xlsx', sheet='data')
From a DBF table
================
DBF is the file format used to hold tabular data for ArcGIS shapefiles. `agate-dbf `_ extension.
.. code-block:: python
import agatedbf
agatedbf.patch()
table = agate.Table.from_dbf('test.dbf')
From a remote file
==================
Use the `agate-remote `_ extension.
.. code-block:: python
import agateremote
agateremote.patch()
table = agate.Table.from_url('https://raw.githubusercontent.com/wireservice/agate/master/examples/test.csv')
agate-remote also let’s you create an Archive, which is a reference to a group of tables with a known path structure.
.. code-block:: python
archive = agateremote.Archive('https://github.com/vincentarelbundock/Rdatasets/raw/master/csv/')
table = archive.get_table('sandwich/PublicSchools.csv')
agate-1.6.3/docs/cookbook/datetime.rst 0000664 0000000 0000000 00000005542 14074061410 0017671 0 ustar 00root root 0000000 0000000 ===============
Dates and times
===============
Specify a date format
=====================
By default agate will attempt to guess the format of a :class:`.Date` or :class:`.DateTime` column. In some cases, it may not be possible to automatically figure out the format of a date. In this case you can specify a :meth:`datetime.datetime.strptime` formatting string to specify how the dates should be parsed. For example, if your dates were formatted as "15-03-15" (March 15th, 2015) then you could specify:
.. code-block:: python
date_type = agate.Date('%d-%m-%y')
Another use for this feature is if you have a column that contains extraneous data. For instance, imagine that your column contains hours and minutes, but they are always zero. It would make more sense to load that data as type :class:`.Date` and ignore the extra time information:
.. code-block:: python
date_type = agate.Date('%m/%d/%Y 00:00')
.. _specify_a_timezone:
Specify a timezone
==================
Timezones are hard. Under normal circumstances (no arguments specified), agate will not try to parse timezone information, nor will it apply a timezone to the :class:`datetime.datetime` instances it creates. (They will be *naive* in Python parlance.) There are two ways to force timezone data into your agate columns.
The first is to use a format string, as shown above, and specify a pattern for timezone information:
.. code-block:: python
datetime_type = agate.DateTime('%Y-%m-%d %H:%M:%S%z')
The second way is to specify a timezone as an argument to the type constructor:
.. code-block:: python
import pytz
eastern = pytz.timezone('US/Eastern')
datetime_type = agate.DateTime(timezone=eastern)
In this case all timezones that are processed will be set to have the Eastern timezone. Note, the timezone will be **set**, not converted. You cannot use this method to convert your timezones from UTC to another timezone. To do that see :ref:`convert_timezones`.
Calculate a time difference
=============================
See :ref:`difference_between_dates`.
Sort by date
============
See :ref:`sort_by_date`.
.. _convert_timezones:
Convert timezones
====================
If you load data from a spreadsheet in one timezone and you need to convert it to another, you can do this using a :class:`.Formula`. Your datetime column must have timezone data for the following example to work. See :ref:`specify_a_timezone`.
.. code-block:: python
import pytz
us_eastern = pytz.timezone('US/Eastern')
datetime_type = agate.DateTime(timezone=us_eastern)
column_names = ['what', 'when']
column_types = [text_type, datetime_type]
table = agate.Table.from_csv('events.csv', columns)
rome = timezone('Europe/Rome')
timezone_shifter = agate.Formula(lambda r: r['when'].astimezone(rome))
table = agate.Table.compute([
('when_in_rome', timezone_shifter)
])
agate-1.6.3/docs/cookbook/excel.rst 0000664 0000000 0000000 00000007516 14074061410 0017200 0 ustar 00root root 0000000 0000000 =============
Emulate Excel
=============
One of agate's most powerful assets is that instead of a wimpy "formula" language, you have the entire Python language at your disposal. Here are examples of how to translate a few common Excel operations.
Simple formulas
===============
If you need to simulate a simple Excel formula you can use the :class:`.Formula` class to apply an arbitrary function.
Excel:
.. code::
=($A1 + $B1) / $C1
agate:
.. code-block:: python
def f(row):
return (row['a'] + row['b']) / row['c']
new_table = table.compute([
('new_column', agate.Formula(agate.Number(), f))
])
If this still isn't enough flexibility, you can also create your own subclass of :class:`.Computation`.
SUM
===
.. code-block:: python
number_type = agate.Number()
def five_year_total(row):
columns = ('2009', '2010', '2011', '2012', '2013')
return sum(tuple(row[c] for c in columns)]
formula = agate.Formula(number_type, five_year_total)
new_table = table.compute([
('five_year_total', formula)
])
TRIM
====
.. code-block:: python
new_table = table.compute([
('name_stripped', agate.Formula(text_type, lambda r: r['name'].strip()))
])
CONCATENATE
===========
.. code-block:: python
new_table = table.compute([
('full_name', agate.Formula(text_type, lambda r: '%(first_name)s %(middle_name)s %(last_name)s' % r))
])
IF
==
.. code-block:: python
new_table = table.compute([
('mvp_candidate', agate.Formula(boolean_type, lambda r: row['batting_average'] > 0.3))
])
VLOOKUP
=======
There are two ways to get the equivalent of Excel's VLOOKUP with agate. If your lookup source is another agate :class:`.Table`, then you'll want to use the :meth:`.Table.join` method:
.. code-block:: python
new_table = mvp_table.join(states, 'state_abbr')
This will add all the columns from the `states` table to the `mvp_table`, where their `state_abbr` columns match.
If your lookup source is a Python dictionary or some other object you can implement the lookup using a :class:`.Formula` computation:
.. code-block:: python
states = {
'AL': 'Alabama',
'AK': 'Alaska',
'AZ': 'Arizona',
...
}
new_table = table.compute([
('mvp_candidate', agate.Formula(text_type, lambda r: states[row['state_abbr']]))
])
Pivot tables as cross-tabulations
=================================
Pivot tables in Excel implement a tremendous range of functionality. Agate divides this functionality into a few different methods.
If what you want is to convert rows to columns to create a "crosstab", then you'll want to use the :meth:`.Table.pivot` method:
.. code-block:: python
jobs_by_state_and_year = employees.pivot('state', 'year')
This will generate a table with a row for each value in the `state` column and a column for each value in the `year` column. The intersecting cells will contains the counts grouped by state and year. You can pass the `aggregation` keyword to aggregate some other value, such as :class:`.Mean` or :class:`.Median`.
Pivot tables as summaries
=========================
On the other hand, if what you want is to summarize your table with descriptive statistics, then you'll want to use :meth:`.Table.group_by` and :meth:`.TableSet.aggregate`:
.. code-block:: python
jobs = employees.group_by('job_title')
summary = jobs.aggregate([
('employee_count', agate.Count()),
('salary_mean', agate.Mean('salary')),
('salary_median', agate.Median('salary'))
])
The resulting ``summary`` table will have four columns: ``job_title``, ``employee_count``, ``salary_mean`` and ``salary_median``.
You may also want to look at the :meth:`.Table.normalize` and :meth:`.Table.denormalize` methods for examples of functionality frequently accomplished with Excel's pivot tables.
agate-1.6.3/docs/cookbook/filter.rst 0000664 0000000 0000000 00000005525 14074061410 0017363 0 ustar 00root root 0000000 0000000 ===========
Filter rows
===========
By regex
========
You can use Python's builtin :mod:`re` module to introduce a regular expression into a :meth:`.Table.where` query.
For example, here we find all states that start with "C".
.. code-block:: python
import re
new_table = table.where(lambda row: re.match('^C', str(row['state'])))
This can also be useful for finding values that **don't** match your expectations. For example, finding all values in the "phone number" column that don't look like phone numbers:
.. code-block:: python
new_table = table.where(lambda row: not re.match('\d{3}-\d{3}-\d{4}', str(row['phone'])))
By glob
=======
Hate regexes? You can use glob (:mod:`fnmatch`) syntax too!
.. code-block:: python
from fnmatch import fnmatch
new_table = table.where(lambda row: fnmatch('C*', row['state']))
Values within a range
=====================
This snippet filters the dataset to incomes between 100,000 and 200,000.
.. code-block:: python
new_table = table.where(lambda row: 100000 < row['income'] < 200000)
Dates within a range
====================
This snippet filters the dataset to events during the summer of 2015:
.. code-block:: python
import datetime
new_table = table.where(lambda row: datetime.datetime(2015, 6, 1) <= row['date'] <= datetime.datetime(2015, 8, 31))
If you want to filter to events during the summer of any year:
.. code-block:: python
new_table = table.where(lambda row: 6 <= row['date'].month <= 8)
Top N percent
=============
To filter a dataset to the top 10% percent of values we first compute the percentiles for the column and then use the result in the :meth:`.Table.where` truth test:
.. code-block:: python
percentiles = table.aggregate(agate.Percentiles('salary'))
top_ten_percent = table.where(lambda r: r['salary'] >= percentiles[90])
Random sample
=============
By combining a random sort with limiting, we can effectively get a random sample from a table.
.. code-block:: python
import random
randomized = table.order_by(lambda row: random.random())
sampled = table.limit(10)
Ordered sample
==============
With can also get an ordered sample by simply using the :code:`step` parameter of the :meth:`.Table.limit` method to get every Nth row.
.. code-block:: python
sampled = table.limit(step=10)
Distinct values
===============
You can retrieve a distinct list of values in a column using :meth:`.Column.values_distinct` or :meth:`.Table.distinct`.
:meth:`.Table.distinct` returns the entire row so it's necessary to chain a select on the specific column.
.. code-block:: python
columns = ('value',)
rows = ([1],[2],[2],[5])
new_table = agate.Table(rows, columns)
new_table.columns['value'].values_distinct()
# or
new_table.distinct('value').columns['value'].values()
(Decimal('1'), Decimal('2'), Decimal('5'))
agate-1.6.3/docs/cookbook/homogenize.rst 0000664 0000000 0000000 00000005572 14074061410 0020244 0 ustar 00root root 0000000 0000000 ===============
Homogenize rows
===============
Fill in missing rows in a series. This can be used, for instance, to add rows for missing years in a time series.
Create rows for missing values
==============================
We can insert a default row for each value that is missing in a table from a given sequence of values.
Starting with a table like this, we can fill in rows for all missing years:
+-------+--------------+------------+
| year | female_count | male_count |
+=======+==============+============+
| 1997 | 2 | 1 |
+-------+--------------+------------+
| 2000 | 4 | 3 |
+-------+--------------+------------+
| 2002 | 4 | 5 |
+-------+--------------+------------+
| 2003 | 1 | 2 |
+-------+--------------+------------+
.. code-block:: python
key = 'year'
expected_values = (1997, 1998, 1999, 2000, 2001, 2002, 2003)
# Your default row should specify column values not in `key`
default_row = (0, 0)
new_table = table.homogenize(key, expected_values, default_row)
The result will be:
+-------+--------------+------------+
| year | female_count | male_count |
+=======+==============+============+
| 1997 | 2 | 1 |
+-------+--------------+------------+
| 1998 | 0 | 0 |
+-------+--------------+------------+
| 1999 | 0 | 0 |
+-------+--------------+------------+
| 2000 | 4 | 3 |
+-------+--------------+------------+
| 2001 | 0 | 0 |
+-------+--------------+------------+
| 2002 | 4 | 5 |
+-------+--------------+------------+
| 2003 | 1 | 2 |
+-------+--------------+------------+
Create dynamic rows based on missing values
===========================================
We can also specify new row values with a value-generating function:
.. code-block:: python
key = 'year'
expected_values = (1997, 1998, 1999, 2000, 2001, 2002, 2003)
# If default row is a function, it should return a full row
def default_row(missing_value):
return (missing_value, missing_value-1997, missing_value-1997)
new_table = table.homogenize(key, expected_values, default_row)
The new table will be:
+-------+--------------+------------+
| year | female_count | male_count |
+=======+==============+============+
| 1997 | 2 | 1 |
+-------+--------------+------------+
| 1998 | 1 | 1 |
+-------+--------------+------------+
| 1999 | 2 | 2 |
+-------+--------------+------------+
| 2000 | 4 | 3 |
+-------+--------------+------------+
| 2001 | 4 | 4 |
+-------+--------------+------------+
| 2002 | 4 | 5 |
+-------+--------------+------------+
| 2003 | 1 | 2 |
+-------+--------------+------------+
agate-1.6.3/docs/cookbook/locale.rst 0000664 0000000 0000000 00000002506 14074061410 0017331 0 ustar 00root root 0000000 0000000 =======
Locales
=======
agate strives to work equally well for users from all parts of the world. This means properly handling foreign currencies, date formats, etc. To facilitate this, agate makes a hard distinction between *your* locale and the locale of *the data* you are working with. This allows you to work seamlessly with data from other countries.
Set your locale
===============
Setting your locale will change how numbers are displayed when you print an agate :class:`.Table` or serialize it to, for example, a CSV file. This works the same as it does for any other Python module. See the :mod:`locale` documentation for details. Changing your locale will not affect how they are parsed from the files you are using. To change how data is parsed see :ref:`specify_locale_of_numbers`.
.. _specify_locale_of_numbers:
Specify locale of numbers
=========================
To correctly parse numbers from non-US locales, you must pass a :code:`locale` parameter to the :class:`.Number` constructor. For example, to parse Dutch numbers (which use a period to separate thousands and a comma to separate fractions):
.. code-block:: python
dutch_numbers = agate.Number(locale='nl_NL')
column_names = ['city', 'population']
column_types = [text_type, dutch_numbers]
table = agate.Table.from_csv('dutch_cities.csv', columns)
agate-1.6.3/docs/cookbook/lookup.rst 0000664 0000000 0000000 00000004414 14074061410 0017403 0 ustar 00root root 0000000 0000000 ======
Lookup
======
Generate new columns by mapping existing data to common `lookup `_ tables.
CPI deflation
=============
The `agate-lookup `_ extension adds a ``lookup`` method to agate's Table class.
Starting with a table that looks like this:
+-------+-------+
| year | cost |
+=======+=======+
| 1995 | 2.0 |
+-------+-------+
| 1997 | 2.2 |
+-------+-------+
| 1996 | 2.3 |
+-------+-------+
| 2003 | 4.0 |
+-------+-------+
| 2007 | 5.0 |
+-------+-------+
| 2005 | 6.0 |
+-------+-------+
We can map the ``year`` column to its annual CPI index in one lookup call.
.. code-block:: python
import agatelookup
agatelookup.patch()
join_year_cpi = table.lookup('year', 'cpi')
The return table will have now have a new column:
+-------+------+----------+
| year | cost | cpi |
+=======+======+==========+
| 1995 | 2.0 | 152.383 |
+-------+------+----------+
| 1997 | 2.2 | 160.525 |
+-------+------+----------+
| 1996 | 2.3 | 156.858 |
+-------+------+----------+
| 2003 | 4.0 | 184.000 |
+-------+------+----------+
| 2007 | 5.0 | 207.344 |
+-------+------+----------+
| 2005 | 6.0 | 195.267 |
+-------+------+----------+
A simple computation tacked on to this lookup can then get the 2015 equivalent values of each cost:
.. code-block:: python
cpi_2015 = Decimal(216.909)
def cpi_adjust_2015(row):
return (row['cost'] * (cpi_2015 / row['cpi'])).quantize(Decimal('0.01'))
cost_2015 = join_year_cpi.compute([
('cost_2015', agate.Formula(agate.Number(), cpi_adjust_2015))
])
And the final table will look like this:
+-------+------+---------+------------+
| year | cost | cpi | cost_2015 |
+=======+======+=========+============+
| 1995 | 2.0 | 152.383 | 2.85 |
+-------+------+---------+------------+
| 1997 | 2.2 | 160.525 | 2.97 |
+-------+------+---------+------------+
| 1996 | 2.3 | 156.858 | 3.18 |
+-------+------+---------+------------+
| 2003 | 4.0 | 184.000 | 4.72 |
+-------+------+---------+------------+
| 2007 | 5.0 | 207.344 | 5.23 |
+-------+------+---------+------------+
| 2005 | 6.0 | 195.267 | 6.66 |
+-------+------+---------+------------+
agate-1.6.3/docs/cookbook/r.rst 0000664 0000000 0000000 00000004565 14074061410 0016342 0 ustar 00root root 0000000 0000000 =========
Emulate R
=========
c()
===
Agate's :meth:`.Table.select` and :meth:`.Table.exclude` are the equivalent of R's :code:`c` for selecting columns.
R:
.. code-block:: r
selected <- data[c("last_name", "first_name", "age")]
excluded <- data[!c("last_name", "first_name", "age")]
agate:
.. code-block:: python
selected = table.select(['last_name', 'first_name', 'age'])
excluded = table.exclude(['last_name', 'first_name', 'age'])
subset
======
Agate's :meth:`.Table.where` is the equivalent of R's :code:`subset`.
R:
.. code-block:: r
newdata <- subset(data, age >= 20 | age < 10)
agate:
.. code-block:: python
new_table = table.where(lambda row: row['age'] >= 20 or row['age'] < 10)
order
=====
Agate's :meth:`.Table.order_by` is the equivalent of R's :code:`order`.
R:
.. code-block:: r
newdata <- employees[order(last_name),]
agate:
.. code-block:: python
new_table = employees.order_by('last_name')
merge
=====
Agate's :meth:`.Table.join` is the equivalent of R's :code:`merge`.
R:
.. code-block:: r
joined <- merge(employees, states, by="usps")
agate:
.. code-block:: python
joined = employees.join(states, 'usps')
rbind
=====
Agate's :meth:`.Table.merge` is the equivalent of R's :code:`rbind`.
R:
.. code-block:: r
merged <- rbind(first_year, second_year)
agate:
.. code-block:: python
merged = agate.Table.merge(first_year, second_year)
aggregate
=========
Agate's :meth:`.Table.group_by` and :meth:`.TableSet.aggregate` can be used to recreate the functionality of R's :code:`aggregate`.
R:
.. code-block:: r
aggregates = aggregate(employees$salary, list(job = employees$job), mean)
agate:
.. code-block:: python
jobs = employees.group_by('job')
aggregates = jobs.aggregate([
('mean', agate.Mean('salary'))
])
melt
====
Agate's :meth:`.Table.normalize` is the equivalent of R's :code:`melt`.
R:
.. code-block:: r
melt(employees, id=c("last_name", "first_name"))
agate:
.. code-block:: python
employees.normalize(['last_name', 'first_name'])
cast
====
Agate's :meth:`.Table.denormalize` is the equivalent of R's :code:`cast`.
R:
.. code-block:: r
melted = melt(employees, id=c("name"))
casted = cast(melted, name~variable, mean)
agate:
.. code-block:: python
normalized = employees.normalize(['name'])
denormalized = normalized.denormalize('name')
agate-1.6.3/docs/cookbook/rank.rst 0000664 0000000 0000000 00000003375 14074061410 0017032 0 ustar 00root root 0000000 0000000 ====
Rank
====
There are many ways to rank a sequence of values. agate strives to find a balance between simple, intuitive ranking and flexibility when you need it.
Competition rank
================
The basic rank supported by agate is standard "competition ranking". In this model the values :code:`[3, 4, 4, 5]` would be ranked :code:`[1, 2, 2, 4]`. You can apply competition ranking using the :class:`.Rank` computation:
.. code-block:: python
new_table = table.compute([
('rank', agate.Rank('value'))
])
Rank descending
===============
Descending competition ranking is specified using the :code:`reverse` argument.
.. code-block:: python
new_table = table.compute([
('rank', agate.Rank('value', reverse=True))
])
Rank change
===========
You can compute the change from one rank to another by combining the :class:`.Rank` and :class:`.Change` computations:
.. code-block:: python
new_table = table.compute([
('rank2014', agate.Rank('value2014')),
('rank2015', agate.Rank('value2015'))
])
new_table2 = new_table.compute([
('rank_change', agate.Change('rank2014', 'rank2015'))
])
Percentile rank
===============
"Percentile rank" is a bit of a misnomer. Really, this is the percentile in which each value in a column is located. This column can be computed for your data using the :class:`.PercentileRank` computation:
.. code-block:: Python
new_table = table.compute([
('percentile_rank', agate.PercentileRank('value'))
])
Note that there is no entirely standard method for computing percentiles. The percentiles computed in this manner may not agree precisely with those generated by other software. See the :class:`.Percentiles` class documentation for implementation details.
agate-1.6.3/docs/cookbook/remove.rst 0000664 0000000 0000000 00000001023 14074061410 0017360 0 ustar 00root root 0000000 0000000 ==============
Remove columns
==============
Include specific columns
=========================
Create a new table with only a specific set of columns:
.. code-block:: python
include_columns = ['column_name_one', 'column_name_two']
new_table = table.select(include_columns)
Exclude specific columns
========================
Create a new table without a specific set of columns:
.. code-block:: python
exclude_columns = ['column_name_one', 'column_name_two']
new_table = table.exclude(exclude_columns)
agate-1.6.3/docs/cookbook/save.rst 0000664 0000000 0000000 00000001023 14074061410 0017021 0 ustar 00root root 0000000 0000000 ============
Save a table
============
To a CSV
========
.. code-block:: python
table.to_csv('filename.csv')
To JSON
=======
.. code-block:: python
table.to_json('filename.json')
To newline-delimited JSON
=========================
.. code-block:: python
table.to_json('filename.json', newline=True)
To a SQL database
=================
Use the `agate-sql `_ extension.
.. code-block:: python
import agatesql
table.to_sql('postgresql:///database', 'output_table')
agate-1.6.3/docs/cookbook/search.rst 0000664 0000000 0000000 00000002606 14074061410 0017340 0 ustar 00root root 0000000 0000000 ======
Search
======
Exact search
============
Find all individuals with the last_name "Groskopf":
.. code-block:: python
family = table.where(lambda r: r['last_name'] == 'Groskopf')
Fuzzy search by edit distance
=============================
By leveraging an `existing Python library `_ for computing the `Levenshtein edit distance `_ it is trivially easy to implement a fuzzy string search.
For example, to find all names within 2 edits of "Groskopf":
.. code-block:: python
from Levenshtein import distance
fuzzy_family = table.where(lambda r: distance(r['last_name'], 'Groskopf') <= 2)
These results will now include all those "Grosskopfs" and "Groskoffs" whose mail I am always getting.
Fuzzy search by phonetic similarity
===================================
By using `Fuzzy `_ to calculate phonetic similarity, it is possible to implement a fuzzy phonetic search.
For example to find all rows with `first_name` phonetically similar to "Catherine":
.. code-block:: python
import fuzzy
dmetaphone = fuzzy.DMetaphone(4)
phonetic_search = dmetaphone('Catherine')
def phonetic_match(r):
return any(x in dmetaphone(r['first_name']) for x in phonetic_search)
phonetic_family = table.where(lambda r: phonetic_match(r))
agate-1.6.3/docs/cookbook/sort.rst 0000664 0000000 0000000 00000002151 14074061410 0017055 0 ustar 00root root 0000000 0000000 ====
Sort
====
Alphabetical
============
Order a table by the :code:`last_name` column:
.. code-block:: python
new_table = table.order_by('last_name')
Numerical
=========
Order a table by the :code:`cost` column:
.. code-block:: python
new_table = table.order_by('cost')
.. _sort_by_date:
By date
=======
Order a table by the :code:`birth_date` column:
.. code-block:: python
new_table = table.order_by('birth_date')
Reverse order
=============
The order of any sort can be reversed by using the :code:`reverse` keyword:
.. code-block:: python
new_table = table.order_by('birth_date', reverse=True)
Multiple columns
================
Because Python's internal sorting works natively with sequences, we can implement multi-column sort by returning a tuple from the key function.
.. code-block:: python
new_table = table.order_by(lambda row: (row['last_name'], row['first_name']))
This table will now be ordered by :code:`last_name`, then :code:`first_name`.
Random order
============
.. code-block:: python
import random
new_table = table.order_by(lambda row: random.random())
agate-1.6.3/docs/cookbook/sql.rst 0000664 0000000 0000000 00000010705 14074061410 0016671 0 ustar 00root root 0000000 0000000 ===========
Emulate SQL
===========
agate's command structure is very similar to SQL. The primary difference between agate and SQL is that commands like :code:`SELECT` and :code:`WHERE` explicitly create new tables. You can chain them together as you would with SQL, but be aware each command is actually creating a new table.
.. note::
All examples in this section use the `PostgreSQL `_ dialect for comparison.
If you want to read and write data from SQL, see :ref:`load_a_table_from_a_sql_database`.
SELECT
======
SQL:
.. code-block:: postgres
SELECT state, total FROM table;
agate:
.. code-block:: python
new_table = table.select(['state', 'total'])
WHERE
=====
SQL:
.. code-block:: postgres
SELECT * FROM table WHERE LOWER(state) = 'california';
agate:
.. code-block:: python
new_table = table.where(lambda row: row['state'].lower() == 'california')
ORDER BY
========
SQL:
.. code-block:: postgres
SELECT * FROM table ORDER BY total DESC;
agate:
.. code-block:: python
new_table = table.order_by(lambda row: row['total'], reverse=True)
DISTINCT
========
SQL:
.. code-block:: postgres
SELECT DISTINCT ON (state) * FROM table;
agate:
.. code-block:: python
new_table = table.distinct('state')
.. note::
Unlike most SQL implementations, agate always returns the full row. Use :meth:`.Table.select` if you want to filter the columns first.
INNER JOIN
==========
SQL (two ways):
.. code-block:: postgres
SELECT * FROM patient, doctor WHERE patient.doctor = doctor.id;
SELECT * FROM patient INNER JOIN doctor ON (patient.doctor = doctor.id);
agate:
.. code-block:: python
joined = patients.join(doctors, 'doctor', 'id', inner=True)
LEFT OUTER JOIN
===============
SQL:
.. code-block:: postgres
SELECT * FROM patient LEFT OUTER JOIN doctor ON (patient.doctor = doctor.id);
agate:
.. code-block:: python
joined = patients.join(doctors, 'doctor', 'id')
FULL OUTER JOIN
===============
SQL:
.. code-block:: postgres
SELECT * FROM patient FULL OUTER JOIN doctor ON (patient.doctor = doctor.id);
agate:
.. code-block:: python
joined = patients.join(doctors, 'doctor', 'id', full_outer=True)
GROUP BY
========
agate's :meth:`.Table.group_by` works slightly different than SQLs. It does not require an aggregate function. Instead it returns :py:class:`.TableSet`. To see how to perform the equivalent of a SQL aggregate, see below.
.. code-block:: python
doctors = patients.group_by('doctor')
You can group by two or more columns by chaining the command.
.. code-block:: python
doctors_by_state = patients.group_by('state').group_by('doctor')
HAVING
======
agate's :meth:`.TableSet.having` works very similar to SQL's keyword of the same name.
.. code-block:: python
doctors = patients.group_by('doctor')
popular_doctors = doctors.having([
('patient_count', Count())
], lambda t: t['patient_count'] > 100)
This filters to only those doctors whose table includes at least 100 results. Can add as many aggregations as you want to the list and each will be available, by name in the test function you pass.
For example, here we filter to popular doctors with more an average review of at least three stars:
.. code-block:: python
doctors = patients.group_by('doctor')
popular_doctors = doctors.having([
('patient_count', Count()),
('average_stars', Average('stars'))
], lambda t: t['patient_count'] > 100 and t['average_stars'] >= 3)
Chain commands together
=======================
SQL:
.. code-block:: postgres
SELECT state, total FROM table WHERE LOWER(state) = 'california' ORDER BY total DESC;
agate:
.. code-block:: python
new_table = table \
.select(['state', 'total']) \
.where(lambda row: row['state'].lower() == 'california') \
.order_by('total', reverse=True)
.. note::
Chaining commands in this way is sometimes not a good idea. Being explicit about each step can lead to clearer code.
Aggregate functions
===================
SQL:
.. code-block:: postgres
SELECT mean(age), median(age) FROM patients GROUP BY doctor;
agate:
.. code-block:: python
doctors = patients.group_by('doctor')
patient_ages = doctors.aggregate([
('patient_count', agate.Count()),
('age_mean', agate.Mean('age')),
('age_median', agate.Median('age'))
])
The resulting table will have four columns: ``doctor``, ``patient_count``, ``age_mean`` and ``age_median``.
agate-1.6.3/docs/cookbook/standardize.rst 0000664 0000000 0000000 00000002270 14074061410 0020400 0 ustar 00root root 0000000 0000000 ============================
Standardize names and values
============================
Standardize row and columns names
=================================
The :meth:`Table.rename` method has arguments to convert row or column names to slugs and append unique identifiers to duplicate values.
Using an existing table object:
.. code-block:: python
# Convert column names to unique slugs
table.rename(slug_columns=True)
# Convert row names to unique slugs
table.rename(slug_rows=True)
# Convert both column and row names to unique slugs
table.rename(slug_columns=True, slug_rows=True)
Standardize column values
=========================
agate has a :class:`Slug` computation that can be used to also standardize text column values. The computation has an option to also append unique identifiers to duplicate values.
Using an existing table object:
.. code-block:: python
# Convert the values in column 'title' to slugs
new_table = table.compute([
('title-slug', agate.Slug('title'))
])
# Convert the values in column 'title' to unique slugs
new_table = table.compute([
('title-slug', agate.Slug('title', ensure_unique=True))
])
agate-1.6.3/docs/cookbook/statistics.rst 0000664 0000000 0000000 00000010507 14074061410 0020264 0 ustar 00root root 0000000 0000000 ==========
Statistics
==========
Common descriptive and aggregate statistics are included with the core agate library. For additional statistical methods beyond the scope of agate consider using the `agate-stats `_ extension or integrating with `scipy `_.
Descriptive statistics
======================
agate includes a full set of standard descriptive statistics that can be applied to any column containing :class:`.Number` data.
.. code-block:: python
table.aggregate(agate.Sum('salary'))
table.aggregate(agate.Min('salary'))
table.aggregate(agate.Max('salary'))
table.aggregate(agate.Mean('salary'))
table.aggregate(agate.Median('salary'))
table.aggregate(agate.Mode('salary'))
table.aggregate(agate.Variance('salary'))
table.aggregate(agate.StDev('salary'))
table.aggregate(agate.MAD('salary'))
Or, get several at once:
.. code-block:: python
table.aggregate([
agate.Min('salary'),
agate.Mean('salary'),
agate.Max('salary')
])
Aggregate statistics
====================
You can also generate aggregate statistics for subsets of data (sometimes referred to as "rolling up"):
.. code-block:: python
doctors = patients.group_by('doctor')
patient_ages = doctors.aggregate([
('patient_count', agate.Count()),
('age_mean', agate.Mean('age')),
('age_median', agate.Median('age'))
])
The resulting table will have four columns: ``doctor``, ``patient_count``, ``age_mean`` and ``age_median``.
You can roll up by multiple columns by chaining agate's :meth:`.Table.group_by` method.
.. code-block:: python
doctors_by_state = patients.group_by("state").group_by('doctor')
Distribution by count (frequency)
=================================
Counting the number of each unique value in a column can be accomplished with the :meth:`.Table.pivot` method:
.. code-block:: python
# Counts of a single column's values
table.pivot('doctor')
# Counts of all combinations of more than one column's values
table.pivot(['doctor', 'hospital'])
The resulting tables will have a column for each key column and another :code:`Count` column counting the number of instances of each value.
Distribution by percent
=======================
:meth:`.Table.pivot` can also be used to calculate the distribution of values as a percentage of the total number:
.. code-block:: python
# Percents of a single column's values
table.pivot('doctor', computation=agate.Percent('Count'))
# Percents of all combinations of more than one column's values
table.pivot(['doctor', 'hospital'], computation=agate.Percent('Count'))
The output table will be the same format as the previous example, except the value column will be named :code:`Percent`.
Identify outliers
=================
The `agate-stats `_ extension adds methods for finding outliers.
.. code-block:: python
import agatestats
agatestats.patch()
outliers = table.stdev_outliers('salary', deviations=3, reject=False)
By specifying :code:`reject=True` you can instead return a table including only those values **not** identified as outliers.
.. code-block:: python
not_outliers = table.stdev_outliers('salary', deviations=3, reject=True)
The second, more robust, method for identifying outliers is by identifying values which are more than some number of "median absolute deviations" from the median (typically 3).
.. code-block:: python
outliers = table.mad_outliers('salary', deviations=3, reject=False)
As with the first example, you can specify :code:`reject=True` to exclude outliers in the resulting table.
Custom statistics
==================
You can also generate custom aggregated statistics for your data by defining your own 'summary' aggregation. This might be especially useful for performing calculations unique to your data. Here's a simple example:
.. code-block:: python
# Create a custom summary aggregation with agate.Summary
# Input a column name, a return data type and a function to apply on the column
count_millionaires = agate.Summary('salary', agate.Number(), lambda r: sum(salary > 1000000 for salary in r.values()))
table.aggregate([
count_millionaires
])
Your custom aggregation can be used to determine both descriptive and aggregate statistics shown above.
agate-1.6.3/docs/cookbook/transform.rst 0000664 0000000 0000000 00000014235 14074061410 0020107 0 ustar 00root root 0000000 0000000 =========
Transform
=========
Pivot by a single column
========================
The :meth:`.Table.pivot` method is a general process for grouping data by row and, optionally, by column, and then calculating some aggregation for each group. Consider the following table:
+---------+---------+--------+-------+
| name | race | gender | age |
+=========+=========+========+=======+
| Joe | white | female | 20 |
+---------+---------+--------+-------+
| Jane | asian | male | 20 |
+---------+---------+--------+-------+
| Jill | black | female | 20 |
+---------+---------+--------+-------+
| Jim | latino | male | 25 |
+---------+---------+--------+-------+
| Julia | black | female | 25 |
+---------+---------+--------+-------+
| Joan | asian | female | 25 |
+---------+---------+--------+-------+
In the very simplest case, this table can be pivoted to count the number occurences of values in a column:
.. code-block:: python
transformed = table.pivot('race')
Result:
+---------+--------+
| race | pivot |
+=========+========+
| white | 1 |
+---------+--------+
| asian | 2 |
+---------+--------+
| black | 2 |
+---------+--------+
| latino | 1 |
+---------+--------+
Pivot by multiple columns
=========================
You can pivot by multiple columns either as additional row-groups, or as intersecting columns. For example, given the table in the previous example:
.. code-block:: python
transformed = table.pivot(['race', 'gender'])
Result:
+---------+--------+-------+
| race | gender | pivot |
+=========+========+=======+
| white | female | 1 |
+---------+--------+-------+
| asian | male | 1 |
+---------+--------+-------+
| black | female | 2 |
+---------+--------+-------+
| latino | male | 1 |
+---------+--------+-------+
| asian | female | 1 |
+---------+--------+-------+
For the column, version you would do:
.. code-block:: python
transformed = table.pivot('race', 'gender')
Result:
+---------+--------+--------+
| race | male | female |
+=========+========+========+
| white | 0 | 1 |
+---------+--------+--------+
| asian | 1 | 1 |
+---------+--------+--------+
| black | 0 | 2 |
+---------+--------+--------+
| latino | 1 | 0 |
+---------+--------+--------+
Pivot to sum
============
The default pivot aggregation is :class:`.Count` but you can also supply other operations. For example, to aggregate each group by :class:`.Sum` of their ages:
.. code-block:: python
transformed = table.pivot('race', 'gender', aggregation=agate.Sum('age'))
+---------+--------+--------+
| race | male | female |
+=========+========+========+
| white | 0 | 20 |
+---------+--------+--------+
| asian | 20 | 25 |
+---------+--------+--------+
| black | 0 | 45 |
+---------+--------+--------+
| latino | 25 | 0 |
+---------+--------+--------+
Pivot to percent of total
=========================
Pivot allows you to apply a :class:`.Computation` to each row of aggregated results prior to returning the table. Use the stringified name of the aggregation as the column argument to your computation:
.. code-block:: python
transformed = table.pivot('race', 'gender', aggregation=agate.Sum('age'), computation=agate.Percent('sum'))
+---------+--------+--------+
| race | male | female |
+=========+========+========+
| white | 0 | 14.8 |
+---------+--------+--------+
| asian | 14.8 | 18.4 |
+---------+--------+--------+
| black | 0 | 33.3 |
+---------+--------+--------+
| latino | 18.4 | 0 |
+---------+--------+--------+
*Note: actual computed percentages will be much more precise.*
It's helpful when constructing these cases to think of all the cells in the pivot table as a single sequence.
Denormalize key/value columns into separate columns
===================================================
It's common for very large datasets to be distributed in a "normalized" format, such as:
+---------+-----------+---------+
| name | property | value |
+=========+===========+=========+
| Jane | gender | female |
+---------+-----------+---------+
| Jane | race | black |
+---------+-----------+---------+
| Jane | age | 24 |
+---------+-----------+---------+
| ... | ... | ... |
+---------+-----------+---------+
The :meth:`.Table.denormalize` method can be used to transform the table so that each unique property has its own column.
.. code-block:: python
transformed = table.denormalize('name', 'property', 'value')
Result:
+---------+----------+--------+-------+
| name | gender | race | age |
+=========+==========+========+=======+
| Jane | female | black | 24 |
+---------+----------+--------+-------+
| Jack | male | white | 35 |
+---------+----------+--------+-------+
| Joe | male | black | 28 |
+---------+----------+--------+-------+
Normalize separate columns into key/value columns
=================================================
Sometimes you have a dataset where each property has its own column, but your analysis would be easier if all properties were stored together. Consider this table:
+---------+----------+--------+-------+
| name | gender | race | age |
+=========+==========+========+=======+
| Jane | female | black | 24 |
+---------+----------+--------+-------+
| Jack | male | white | 35 |
+---------+----------+--------+-------+
| Joe | male | black | 28 |
+---------+----------+--------+-------+
The :meth:`.Table.normalize` method can be used to transform the table so that all the properties and their values share two columns.
.. code-block:: python
transformed = table.normalize('name', ['gender', 'race', 'age'])
Result:
+---------+-----------+---------+
| name | property | value |
+=========+===========+=========+
| Jane | gender | female |
+---------+-----------+---------+
| Jane | race | black |
+---------+-----------+---------+
| Jane | age | 24 |
+---------+-----------+---------+
| ... | ... | ... |
+---------+-----------+---------+
agate-1.6.3/docs/cookbook/underscore.rst 0000664 0000000 0000000 00000002127 14074061410 0020242 0 ustar 00root root 0000000 0000000 =====================
Emulate underscore.js
=====================
filter
======
agate's :meth:`.Table.where` functions exactly like Underscore's :code:`filter`.
.. code-block:: python
new_table = table.where(lambda row: row['state'] == 'Texas')
reject
======
To simulate Underscore's :code:`reject`, simply negate the return value of the function you pass into agate's :meth:`.Table.where`.
.. code-block:: python
new_table = table.where(lambda row: not (row['state'] == 'Texas'))
find
====
agate's :meth:`.Table.find` works exactly like Underscore's :code:`find`.
.. code-block:: python
row = table.find(lambda row: row['state'].startswith('T'))
any
===
The :class:`.Any` aggregation works like Underscore's :code:`any`.
.. code-block:: python
true_or_false = table.aggregate(Any('salaries', lambda d: d > 100000))
You can also use :meth:`.Table.where` to filter to columns that pass the truth test.
all
===
The :class:`.All` aggregation works like Underscore's :code:`all`.
.. code-block:: python
true_or_false = table.aggregate(All('salaries', lambda d: d > 100000))
agate-1.6.3/docs/extensions.rst 0000664 0000000 0000000 00000004737 14074061410 0016473 0 ustar 00root root 0000000 0000000 ==========
Extensions
==========
The core agate library is designed rely on as few dependencies as possible. However, in the real world you're often going to want to interface with more specialized tools, or with other formats, such as SQL or Excel.
Using extensions
================
agate support's plugin-style extensions using a monkey-patching pattern. Libraries can be created that add new methods onto :class:`.Table` and :class:`.TableSet`. For example, `agate-sql `_ adds the ability to read and write tables from a SQL database:
.. code-block:: python
import agate
import agatesql
# After calling patch the from_sql and to_sql methods are now part of the Table class
table = agate.Table.from_sql('postgresql:///database', 'input_table')
table.to_sql('postgresql:///database', 'output_table')
List of extensions
==================
Here is a list of agate extensions that are known to be actively maintained:
* `agate-sql `_: Read and write tables in SQL databases
* `agate-stats `_: Additional statistical methods
* `agate-excel `_: Read excel tables (xls and xlsx)
* `agate-dbf `_: Read dbf tables (from shapefiles)
* `agate-remote `_: Read from remote files
* `agate-lookup `_: Instantly join to hosted `lookup `_ tables.
Writing your own extensions
===========================
Writing your own extensions is straightforward. Create a function that acts as your "patch" and then dynamically add it to :class:`.Table` or :class:`.TableSet`.
.. code-block:: python
import agate
def new_method(self):
print('I do something to a Table when you call me.')
agate.Table.new_method = new_method
You can also create new classmethods:
.. code-block:: python
def new_class_method(cls):
print('I make Tables when you call me.')
agate.Table.new_method = classmethod(new_method)
These methods can now be called on :class:`.Table` class in your code:
.. code-block:: python
>>> import agate
>>> import myextension
>>> table = agate.Table(rows, column_names, column_types)
>>> table.new_method()
'I do something to a Table when you call me.'
>>> agate.Table.new_class_method()
'I make Tables when you call me.'
The same pattern also works for adding methods to :class:`.TableSet`.
agate-1.6.3/docs/images/ 0000775 0000000 0000000 00000000000 14074061410 0014774 5 ustar 00root root 0000000 0000000 agate-1.6.3/docs/images/bar_chart.svg 0000664 0000000 0000000 00000010677 14074061410 0017455 0 ustar 00root root 0000000 0000000