pax_global_header00006660000000000000000000000064130417042710014511gustar00rootroot0000000000000052 comment=b032da3be4be09183d75a02431f5ee6bae16d494 py-statistics-3.4.0b3/000077500000000000000000000000001304170427100145625ustar00rootroot00000000000000py-statistics-3.4.0b3/.gitignore000066400000000000000000000000221304170427100165440ustar00rootroot00000000000000.idea *.sublime-* py-statistics-3.4.0b3/LICENSE000066400000000000000000000307371304170427100156010ustar00rootroot00000000000000A. HISTORY OF THE SOFTWARE ========================== Python was created in the early 1990s by Guido van Rossum at Stichting Mathematisch Centrum (CWI, see http://www.cwi.nl) in the Netherlands as a successor of a language called ABC. Guido remains Python's principal author, although it includes many contributions from others. In 1995, Guido continued his work on Python at the Corporation for National Research Initiatives (CNRI, see http://www.cnri.reston.va.us) in Reston, Virginia where he released several versions of the software. In May 2000, Guido and the Python core development team moved to BeOpen.com to form the BeOpen PythonLabs team. In October of the same year, the PythonLabs team moved to Digital Creations (now Zope Corporation, see http://www.zope.com). In 2001, the Python Software Foundation (PSF, see http://www.python.org/psf/) was formed, a non-profit organization created specifically to own Python-related Intellectual Property. Zope Corporation is a sponsoring member of the PSF. All Python releases are Open Source (see http://www.opensource.org for the Open Source Definition). Historically, most, but not all, Python releases have also been GPL-compatible; the table below summarizes the various releases. Release Derived Year Owner GPL- from compatible? (1) 0.9.0 thru 1.2 1991-1995 CWI yes 1.3 thru 1.5.2 1.2 1995-1999 CNRI yes 1.6 1.5.2 2000 CNRI no 2.0 1.6 2000 BeOpen.com no 1.6.1 1.6 2001 CNRI yes (2) 2.1 2.0+1.6.1 2001 PSF no 2.0.1 2.0+1.6.1 2001 PSF yes 2.1.1 2.1+2.0.1 2001 PSF yes 2.1.2 2.1.1 2002 PSF yes 2.1.3 2.1.2 2002 PSF yes 2.2 and above 2.1.1 2001-now PSF yes Footnotes: (1) GPL-compatible doesn't mean that we're distributing Python under the GPL. All Python licenses, unlike the GPL, let you distribute a modified version without making your changes open source. The GPL-compatible licenses make it possible to combine Python with other software that is released under the GPL; the others don't. (2) According to Richard Stallman, 1.6.1 is not GPL-compatible, because its license has a choice of law clause. According to CNRI, however, Stallman's lawyer has told CNRI's lawyer that 1.6.1 is "not incompatible" with the GPL. Thanks to the many outside volunteers who have worked under Guido's direction to make these releases possible. B. TERMS AND CONDITIONS FOR ACCESSING OR OTHERWISE USING PYTHON =============================================================== PYTHON SOFTWARE FOUNDATION LICENSE VERSION 2 -------------------------------------------- 1. This LICENSE AGREEMENT is between the Python Software Foundation ("PSF"), and the Individual or Organization ("Licensee") accessing and otherwise using this software ("Python") in source or binary form and its associated documentation. 2. Subject to the terms and conditions of this License Agreement, PSF hereby grants Licensee a nonexclusive, royalty-free, world-wide license to reproduce, analyze, test, perform and/or display publicly, prepare derivative works, distribute, and otherwise use Python alone or in any derivative version, provided, however, that PSF's License Agreement and PSF's notice of copyright, i.e., "Copyright (c) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016 Python Software Foundation; All Rights Reserved" are retained in Python alone or in any derivative version prepared by Licensee. 3. In the event Licensee prepares a derivative work that is based on or incorporates Python or any part thereof, and wants to make the derivative work available to others as provided herein, then Licensee hereby agrees to include in any such work a brief summary of the changes made to Python. 4. PSF is making Python available to Licensee on an "AS IS" basis. PSF MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, PSF MAKES NO AND DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF PYTHON WILL NOT INFRINGE ANY THIRD PARTY RIGHTS. 5. PSF SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF PYTHON FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING PYTHON, OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF. 6. This License Agreement will automatically terminate upon a material breach of its terms and conditions. 7. Nothing in this License Agreement shall be deemed to create any relationship of agency, partnership, or joint venture between PSF and Licensee. This License Agreement does not grant permission to use PSF trademarks or trade name in a trademark sense to endorse or promote products or services of Licensee, or any third party. 8. By copying, installing or otherwise using Python, Licensee agrees to be bound by the terms and conditions of this License Agreement. BEOPEN.COM LICENSE AGREEMENT FOR PYTHON 2.0 ------------------------------------------- BEOPEN PYTHON OPEN SOURCE LICENSE AGREEMENT VERSION 1 1. This LICENSE AGREEMENT is between BeOpen.com ("BeOpen"), having an office at 160 Saratoga Avenue, Santa Clara, CA 95051, and the Individual or Organization ("Licensee") accessing and otherwise using this software in source or binary form and its associated documentation ("the Software"). 2. Subject to the terms and conditions of this BeOpen Python License Agreement, BeOpen hereby grants Licensee a non-exclusive, royalty-free, world-wide license to reproduce, analyze, test, perform and/or display publicly, prepare derivative works, distribute, and otherwise use the Software alone or in any derivative version, provided, however, that the BeOpen Python License is retained in the Software, alone or in any derivative version prepared by Licensee. 3. BeOpen is making the Software available to Licensee on an "AS IS" basis. BEOPEN MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, BEOPEN MAKES NO AND DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF THE SOFTWARE WILL NOT INFRINGE ANY THIRD PARTY RIGHTS. 4. BEOPEN SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF THE SOFTWARE FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS A RESULT OF USING, MODIFYING OR DISTRIBUTING THE SOFTWARE, OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF. 5. This License Agreement will automatically terminate upon a material breach of its terms and conditions. 6. This License Agreement shall be governed by and interpreted in all respects by the law of the State of California, excluding conflict of law provisions. Nothing in this License Agreement shall be deemed to create any relationship of agency, partnership, or joint venture between BeOpen and Licensee. This License Agreement does not grant permission to use BeOpen trademarks or trade names in a trademark sense to endorse or promote products or services of Licensee, or any third party. As an exception, the "BeOpen Python" logos available at http://www.pythonlabs.com/logos.html may be used according to the permissions granted on that web page. 7. By copying, installing or otherwise using the software, Licensee agrees to be bound by the terms and conditions of this License Agreement. CNRI LICENSE AGREEMENT FOR PYTHON 1.6.1 --------------------------------------- 1. This LICENSE AGREEMENT is between the Corporation for National Research Initiatives, having an office at 1895 Preston White Drive, Reston, VA 20191 ("CNRI"), and the Individual or Organization ("Licensee") accessing and otherwise using Python 1.6.1 software in source or binary form and its associated documentation. 2. Subject to the terms and conditions of this License Agreement, CNRI hereby grants Licensee a nonexclusive, royalty-free, world-wide license to reproduce, analyze, test, perform and/or display publicly, prepare derivative works, distribute, and otherwise use Python 1.6.1 alone or in any derivative version, provided, however, that CNRI's License Agreement and CNRI's notice of copyright, i.e., "Copyright (c) 1995-2001 Corporation for National Research Initiatives; All Rights Reserved" are retained in Python 1.6.1 alone or in any derivative version prepared by Licensee. Alternately, in lieu of CNRI's License Agreement, Licensee may substitute the following text (omitting the quotes): "Python 1.6.1 is made available subject to the terms and conditions in CNRI's License Agreement. This Agreement together with Python 1.6.1 may be located on the Internet using the following unique, persistent identifier (known as a handle): 1895.22/1013. This Agreement may also be obtained from a proxy server on the Internet using the following URL: http://hdl.handle.net/1895.22/1013". 3. In the event Licensee prepares a derivative work that is based on or incorporates Python 1.6.1 or any part thereof, and wants to make the derivative work available to others as provided herein, then Licensee hereby agrees to include in any such work a brief summary of the changes made to Python 1.6.1. 4. CNRI is making Python 1.6.1 available to Licensee on an "AS IS" basis. CNRI MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, CNRI MAKES NO AND DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF PYTHON 1.6.1 WILL NOT INFRINGE ANY THIRD PARTY RIGHTS. 5. CNRI SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF PYTHON 1.6.1 FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING PYTHON 1.6.1, OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF. 6. This License Agreement will automatically terminate upon a material breach of its terms and conditions. 7. This License Agreement shall be governed by the federal intellectual property law of the United States, including without limitation the federal copyright law, and, to the extent such U.S. federal law does not apply, by the law of the Commonwealth of Virginia, excluding Virginia's conflict of law provisions. Notwithstanding the foregoing, with regard to derivative works based on Python 1.6.1 that incorporate non-separable material that was previously distributed under the GNU General Public License (GPL), the law of the Commonwealth of Virginia shall govern this License Agreement only as to issues arising under or with respect to Paragraphs 4, 5, and 7 of this License Agreement. Nothing in this License Agreement shall be deemed to create any relationship of agency, partnership, or joint venture between CNRI and Licensee. This License Agreement does not grant permission to use CNRI trademarks or trade name in a trademark sense to endorse or promote products or services of Licensee, or any third party. 8. By clicking on the "ACCEPT" button where indicated, or by copying, installing or otherwise using Python 1.6.1, Licensee agrees to be bound by the terms and conditions of this License Agreement. ACCEPT CWI LICENSE AGREEMENT FOR PYTHON 0.9.0 THROUGH 1.2 -------------------------------------------------- Copyright (c) 1991 - 1995, Stichting Mathematisch Centrum Amsterdam, The Netherlands. All rights reserved. Permission to use, copy, modify, and distribute this software and its documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appear in all copies and that both that copyright notice and this permission notice appear in supporting documentation, and that the name of Stichting Mathematisch Centrum or CWI not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission. STICHTING MATHEMATISCH CENTRUM DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL STICHTING MATHEMATISCH CENTRUM BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. py-statistics-3.4.0b3/README.rst000066400000000000000000000024751304170427100162610ustar00rootroot00000000000000============================================== statistics - Mathematical statistics functions ============================================== A port of Python 3.4 statistics module to Python 2.*, initially done through the `3to2 tool `_. This module provides functions for calculating mathematical statistics of numeric (Real-valued) data. Bugs ==== Please use the github tracker (see github link later) Version ======= Currently corresponds to Python 3.4.0 beta 3 - commit `720bb69ea9b39adc6c39d2fd9da5072a5c1e8d03 `_ Sources ======= Relevant links: * This package: https://github.com/digitalemagine/py-statistics * Official Python documentation: https://docs.python.org/3/library/statistics.html * Original PEP 450: http://www.python.org/dev/peps/pep-0450/ * Original source code: https://github.com/python/cpython/blob/master/Lib/statistics.py (`alt `_) * Original source documentation: https://github.com/python/cpython/blob/master/Doc/library/statistics.rst License ======= This is a backport, and adopts the same PSF License as the original code TODO ==== Make a version for Python 3.0 -> 3.2 (http://pythonhosted.org//setuptools/python3.html) py-statistics-3.4.0b3/docs/000077500000000000000000000000001304170427100155125ustar00rootroot00000000000000py-statistics-3.4.0b3/docs/statistics.rst000066400000000000000000000330731304170427100204440ustar00rootroot00000000000000:mod:`statistics` --- Mathematical statistics functions ======================================================= .. module:: statistics :synopsis: mathematical statistics functions .. moduleauthor:: Steven D'Aprano .. sectionauthor:: Steven D'Aprano .. portauthor:: Stefano Crosta .. versionadded:: 2.* (backport from 3.4) .. testsetup:: * from statistics import * __name__ = '' **Source code:** :source:`Lib/statistics.py` -------------- This module provides functions for calculating mathematical statistics of numeric (:class:`Real`-valued) data. .. note:: Unless explicitly noted otherwise, these functions support :class:`int`, :class:`float`, :class:`decimal.Decimal` and :class:`fractions.Fraction`. Behaviour with other types (whether in the numeric tower or not) is currently unsupported. Mixed types are also undefined and implementation-dependent. If your input data consists of mixed types, you may be able to use :func:`map` to ensure a consistent result, e.g. ``map(float, input_data)``. Averages and measures of central location ----------------------------------------- These functions calculate an average or typical value from a population or sample. ======================= ============================================= :func:`mean` Arithmetic mean ("average") of data. :func:`median` Median (middle value) of data. :func:`median_low` Low median of data. :func:`median_high` High median of data. :func:`median_grouped` Median, or 50th percentile, of grouped data. :func:`mode` Mode (most common value) of discrete data. ======================= ============================================= Measures of spread ------------------ These functions calculate a measure of how much the population or sample tends to deviate from the typical or average values. ======================= ============================================= :func:`pstdev` Population standard deviation of data. :func:`pvariance` Population variance of data. :func:`stdev` Sample standard deviation of data. :func:`variance` Sample variance of data. ======================= ============================================= Function details ---------------- Note: The functions do not require the data given to them to be sorted. However, for reading convenience, most of the examples show sorted sequences. .. function:: mean(data) Return the sample arithmetic mean of *data*, a sequence or iterator of real-valued numbers. The arithmetic mean is the sum of the data divided by the number of data points. It is commonly called "the average", although it is only one of many different mathematical averages. It is a measure of the central location of the data. If *data* is empty, :exc:`StatisticsError` will be raised. Some examples of use: .. doctest:: >>> mean([1, 2, 3, 4, 4]) 2.8 >>> mean([-1.0, 2.5, 3.25, 5.75]) 2.625 >>> from fractions import Fraction as F >>> mean([F(3, 7), F(1, 21), F(5, 3), F(1, 3)]) Fraction(13, 21) >>> from decimal import Decimal as D >>> mean([D("0.5"), D("0.75"), D("0.625"), D("0.375")]) Decimal('0.5625') .. note:: The mean is strongly affected by outliers and is not a robust estimator for central location: the mean is not necessarily a typical example of the data points. For more robust, although less efficient, measures of central location, see :func:`median` and :func:`mode`. (In this case, "efficient" refers to statistical efficiency rather than computational efficiency.) The sample mean gives an unbiased estimate of the true population mean, which means that, taken on average over all the possible samples, ``mean(sample)`` converges on the true mean of the entire population. If *data* represents the entire population rather than a sample, then ``mean(data)`` is equivalent to calculating the true population mean μ. .. function:: median(data) Return the median (middle value) of numeric data, using the common "mean of middle two" method. If *data* is empty, :exc:`StatisticsError` is raised. The median is a robust measure of central location, and is less affected by the presence of outliers in your data. When the number of data points is odd, the middle data point is returned: .. doctest:: >>> median([1, 3, 5]) 3 When the number of data points is even, the median is interpolated by taking the average of the two middle values: .. doctest:: >>> median([1, 3, 5, 7]) 4.0 This is suited for when your data is discrete, and you don't mind that the median may not be an actual data point. .. seealso:: :func:`median_low`, :func:`median_high`, :func:`median_grouped` .. function:: median_low(data) Return the low median of numeric data. If *data* is empty, :exc:`StatisticsError` is raised. The low median is always a member of the data set. When the number of data points is odd, the middle value is returned. When it is even, the smaller of the two middle values is returned. .. doctest:: >>> median_low([1, 3, 5]) 3 >>> median_low([1, 3, 5, 7]) 3 Use the low median when your data are discrete and you prefer the median to be an actual data point rather than interpolated. .. function:: median_high(data) Return the high median of data. If *data* is empty, :exc:`StatisticsError` is raised. The high median is always a member of the data set. When the number of data points is odd, the middle value is returned. When it is even, the larger of the two middle values is returned. .. doctest:: >>> median_high([1, 3, 5]) 3 >>> median_high([1, 3, 5, 7]) 5 Use the high median when your data are discrete and you prefer the median to be an actual data point rather than interpolated. .. function:: median_grouped(data, interval=1) Return the median of grouped continuous data, calculated as the 50th percentile, using interpolation. If *data* is empty, :exc:`StatisticsError` is raised. .. doctest:: >>> median_grouped([52, 52, 53, 54]) 52.5 In the following example, the data are rounded, so that each value represents the midpoint of data classes, e.g. 1 is the midpoint of the class 0.5-1.5, 2 is the midpoint of 1.5-2.5, 3 is the midpoint of 2.5-3.5, etc. With the data given, the middle value falls somewhere in the class 3.5-4.5, and interpolation is used to estimate it: .. doctest:: >>> median_grouped([1, 2, 2, 3, 4, 4, 4, 4, 4, 5]) 3.7 Optional argument *interval* represents the class interval, and defaults to 1. Changing the class interval naturally will change the interpolation: .. doctest:: >>> median_grouped([1, 3, 3, 5, 7], interval=1) 3.25 >>> median_grouped([1, 3, 3, 5, 7], interval=2) 3.5 This function does not check whether the data points are at least *interval* apart. .. impl-detail:: Under some circumstances, :func:`median_grouped` may coerce data points to floats. This behaviour is likely to change in the future. .. seealso:: * "Statistics for the Behavioral Sciences", Frederick J Gravetter and Larry B Wallnau (8th Edition). * Calculating the `median `_. * The `SSMEDIAN `_ function in the Gnome Gnumeric spreadsheet, including `this discussion `_. .. function:: mode(data) Return the most common data point from discrete or nominal *data*. The mode (when it exists) is the most typical value, and is a robust measure of central location. If *data* is empty, or if there is not exactly one most common value, :exc:`StatisticsError` is raised. ``mode`` assumes discrete data, and returns a single value. This is the standard treatment of the mode as commonly taught in schools: .. doctest:: >>> mode([1, 1, 2, 3, 3, 3, 3, 4]) 3 The mode is unique in that it is the only statistic which also applies to nominal (non-numeric) data: .. doctest:: >>> mode(["red", "blue", "blue", "red", "green", "red", "red"]) 'red' .. function:: pstdev(data, mu=None) Return the population standard deviation (the square root of the population variance). See :func:`pvariance` for arguments and other details. .. doctest:: >>> pstdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75]) 0.986893273527251 .. function:: pvariance(data, mu=None) Return the population variance of *data*, a non-empty iterable of real-valued numbers. Variance, or second moment about the mean, is a measure of the variability (spread or dispersion) of data. A large variance indicates that the data is spread out; a small variance indicates it is clustered closely around the mean. If the optional second argument *mu* is given, it should be the mean of *data*. If it is missing or ``None`` (the default), the mean is automatically calculated. Use this function to calculate the variance from the entire population. To estimate the variance from a sample, the :func:`variance` function is usually a better choice. Raises :exc:`StatisticsError` if *data* is empty. Examples: .. doctest:: >>> data = [0.0, 0.25, 0.25, 1.25, 1.5, 1.75, 2.75, 3.25] >>> pvariance(data) 1.25 If you have already calculated the mean of your data, you can pass it as the optional second argument *mu* to avoid recalculation: .. doctest:: >>> mu = mean(data) >>> pvariance(data, mu) 1.25 This function does not attempt to verify that you have passed the actual mean as *mu*. Using arbitrary values for *mu* may lead to invalid or impossible results. Decimals and Fractions are supported: .. doctest:: >>> from decimal import Decimal as D >>> pvariance([D("27.5"), D("30.25"), D("30.25"), D("34.5"), D("41.75")]) Decimal('24.815') >>> from fractions import Fraction as F >>> pvariance([F(1, 4), F(5, 4), F(1, 2)]) Fraction(13, 72) .. note:: When called with the entire population, this gives the population variance σ². When called on a sample instead, this is the biased sample variance s², also known as variance with N degrees of freedom. If you somehow know the true population mean μ, you may use this function to calculate the variance of a sample, giving the known population mean as the second argument. Provided the data points are representative (e.g. independent and identically distributed), the result will be an unbiased estimate of the population variance. .. function:: stdev(data, xbar=None) Return the sample standard deviation (the square root of the sample variance). See :func:`variance` for arguments and other details. .. doctest:: >>> stdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75]) 1.0810874155219827 .. function:: variance(data, xbar=None) Return the sample variance of *data*, an iterable of at least two real-valued numbers. Variance, or second moment about the mean, is a measure of the variability (spread or dispersion) of data. A large variance indicates that the data is spread out; a small variance indicates it is clustered closely around the mean. If the optional second argument *xbar* is given, it should be the mean of *data*. If it is missing or ``None`` (the default), the mean is automatically calculated. Use this function when your data is a sample from a population. To calculate the variance from the entire population, see :func:`pvariance`. Raises :exc:`StatisticsError` if *data* has fewer than two values. Examples: .. doctest:: >>> data = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5] >>> variance(data) 1.3720238095238095 If you have already calculated the mean of your data, you can pass it as the optional second argument *xbar* to avoid recalculation: .. doctest:: >>> m = mean(data) >>> variance(data, m) 1.3720238095238095 This function does not attempt to verify that you have passed the actual mean as *xbar*. Using arbitrary values for *xbar* can lead to invalid or impossible results. Decimal and Fraction values are supported: .. doctest:: >>> from decimal import Decimal as D >>> variance([D("27.5"), D("30.25"), D("30.25"), D("34.5"), D("41.75")]) Decimal('31.01875') >>> from fractions import Fraction as F >>> variance([F(1, 6), F(1, 2), F(5, 3)]) Fraction(67, 108) .. note:: This is the sample variance s² with Bessel's correction, also known as variance with N-1 degrees of freedom. Provided that the data points are representative (e.g. independent and identically distributed), the result should be an unbiased estimate of the true population variance. If you somehow know the actual population mean μ you should pass it to the :func:`pvariance` function as the *mu* parameter to get the variance of a sample. Exceptions ---------- A single exception is defined: .. exception:: StatisticsError Subclass of :exc:`ValueError` for statistics-related exceptions. .. # This modelines must appear within the last ten lines of the file. kate: indent-width 3; remove-trailing-space on; replace-tabs on; encoding utf-8;py-statistics-3.4.0b3/setup.py000066400000000000000000000073631304170427100163050ustar00rootroot00000000000000# -*- coding: UTF-8 -*- __author__ = 'Stefano Crosta' from setuptools import setup, find_packages # Always prefer setuptools over distutils from codecs import open # To use a consistent encoding from os import path here = path.abspath(path.dirname(__file__)) # Get the long description from the relevant file with open(path.join(here, 'README.rst'), encoding='utf-8') as f: long_description = f.read() # DESCRIPTION CONTAINS DOCS setup( name='statistics', # Versions should comply with PEP440. For a discussion on single-sourcing # the version across setup.py and the project code, see # http://packaging.python.org/en/latest/tutorial.html#version version='3.4.0b3', description='A Python 2.* port of 3.4 Statistics Module', long_description=long_description, # The project's main homepage. url='https://github.com/digitalemagine/py-statistics', # Maintainer details (this is a backport) maintainer='Stefano Crosta', maintainer_email='stefano@digitalemagine.com', # Choose your license license='PSF License', # See https://pypi.python.org/pypi?%3Aaction=list_classifiers classifiers=[ # How mature is this project? Common values are # 3 - Alpha # 4 - Beta # 5 - Production/Stable 'Development Status :: 5 - Production/Stable', # Indicate who your project is intended for 'Intended Audience :: Developers', # 'Topic :: Software Development :: Build Tools', # Pick your license as you wish (should match "license" above) 'License :: OSI Approved :: Python Software Foundation License', # Specify the Python versions you support here. In particular, ensure # that you indicate whether you support Python 2, Python 3 or both. 'Programming Language :: Python :: 2', 'Programming Language :: Python :: 2.6', 'Programming Language :: Python :: 2.7', ], # What does your project relate to? keywords='statistics', # You can just specify the packages manually here if your project is simple. # Or you can use find_packages(). packages=['statistics'], # List run-time dependencies here. These will be installed by pip when your # project is installed. For an analysis of "install_requires" vs pip's # requirements files see: # https://packaging.python.org/en/latest/technical.html#install-requires-vs-requirements-files install_requires = ['docutils>=0.3'], # Tests # # Tests must be wrapped in a unittest test suite by either a # function, a TestCase class or method, or a module or package # containing TestCase classes. If the named suite is a package, # any submodules and subpackages are recursively added to the # overall test suite. test_suite = 'statistics.tests.suite', # If there are data files included in your packages that need to be # installed, specify them here. If using Python 2.6 or less, then these # have to be included in MANIFEST.in as well. # package_data={ '': ['*.rst'] }, # Although 'package_data' is the preferred approach, in some case you may # need to place data files outside of your packages. # see http://docs.python.org/3.4/distutils/setupscript.html#installing-additional-files # In this case, 'data_file' will be installed into '/my_data' ## data_files=[('my_data', ['data/data_file'])], # To provide executable scripts, use entry points in preference to the # "scripts" keyword. Entry points provide cross-platform support and allow # pip to create the appropriate form of executable for the target platform. # # entry_points={ # 'console_scripts': [ # 'command=module:main', # ], # }, ) py-statistics-3.4.0b3/source/000077500000000000000000000000001304170427100160625ustar00rootroot00000000000000py-statistics-3.4.0b3/source/statistics.py000066400000000000000000000427021304170427100206330ustar00rootroot00000000000000## Module statistics.py ## ## Copyright (c) 2013 Steven D'Aprano . ## ## Licensed under the Apache License, Version 2.0 (the "License"); ## you may not use this file except in compliance with the License. ## You may obtain a copy of the License at ## ## http://www.apache.org/licenses/LICENSE-2.0 ## ## Unless required by applicable law or agreed to in writing, software ## distributed under the License is distributed on an "AS IS" BASIS, ## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. ## See the License for the specific language governing permissions and ## limitations under the License. """ Basic statistics module. This module provides functions for calculating statistics of data, including averages, variance, and standard deviation. Calculating averages -------------------- ================== ============================================= Function Description ================== ============================================= mean Arithmetic mean (average) of data. median Median (middle value) of data. median_low Low median of data. median_high High median of data. median_grouped Median, or 50th percentile, of grouped data. mode Mode (most common value) of data. ================== ============================================= Calculate the arithmetic mean ("the average") of data: >>> mean([-1.0, 2.5, 3.25, 5.75]) 2.625 Calculate the standard median of discrete data: >>> median([2, 3, 4, 5]) 3.5 Calculate the median, or 50th percentile, of data grouped into class intervals centred on the data values provided. E.g. if your data points are rounded to the nearest whole number: >>> median_grouped([2, 2, 3, 3, 3, 4]) #doctest: +ELLIPSIS 2.8333333333... This should be interpreted in this way: you have two data points in the class interval 1.5-2.5, three data points in the class interval 2.5-3.5, and one in the class interval 3.5-4.5. The median of these data points is 2.8333... Calculating variability or spread --------------------------------- ================== ============================================= Function Description ================== ============================================= pvariance Population variance of data. variance Sample variance of data. pstdev Population standard deviation of data. stdev Sample standard deviation of data. ================== ============================================= Calculate the standard deviation of sample data: >>> stdev([2.5, 3.25, 5.5, 11.25, 11.75]) #doctest: +ELLIPSIS 4.38961843444... If you have previously calculated the mean, you can pass it as the optional second argument to the four "spread" functions to avoid recalculating it: >>> data = [1, 2, 2, 4, 4, 4, 5, 6] >>> mu = mean(data) >>> pvariance(data, mu) 2.5 Exceptions ---------- A single exception is defined: StatisticsError is a subclass of ValueError. """ __all__ = [ 'StatisticsError', 'pstdev', 'pvariance', 'stdev', 'variance', 'median', 'median_low', 'median_high', 'median_grouped', 'mean', 'mode', ] import collections import math from fractions import Fraction from decimal import Decimal # === Exceptions === class StatisticsError(ValueError): pass # === Private utilities === def _sum(data, start=0): """_sum(data [, start]) -> value Return a high-precision sum of the given numeric data. If optional argument ``start`` is given, it is added to the total. If ``data`` is empty, ``start`` (defaulting to 0) is returned. Examples -------- >>> _sum([3, 2.25, 4.5, -0.5, 1.0], 0.75) 11.0 Some sources of round-off error will be avoided: >>> _sum([1e50, 1, -1e50] * 1000) # Built-in sum returns zero. 1000.0 Fractions and Decimals are also supported: >>> from fractions import Fraction as F >>> _sum([F(2, 3), F(7, 5), F(1, 4), F(5, 6)]) Fraction(63, 20) >>> from decimal import Decimal as D >>> data = [D("0.1375"), D("0.2108"), D("0.3061"), D("0.0419")] >>> _sum(data) Decimal('0.6963') Mixed types are currently treated as an error, except that int is allowed. """ # We fail as soon as we reach a value that is not an int or the type of # the first value which is not an int. E.g. _sum([int, int, float, int]) # is okay, but sum([int, int, float, Fraction]) is not. allowed_types = set([int, type(start)]) n, d = _exact_ratio(start) partials = {d: n} # map {denominator: sum of numerators} # Micro-optimizations. exact_ratio = _exact_ratio partials_get = partials.get # Add numerators for each denominator. for x in data: _check_type(type(x), allowed_types) n, d = exact_ratio(x) partials[d] = partials_get(d, 0) + n # Find the expected result type. If allowed_types has only one item, it # will be int; if it has two, use the one which isn't int. assert len(allowed_types) in (1, 2) if len(allowed_types) == 1: assert allowed_types.pop() is int T = int else: T = (allowed_types - set([int])).pop() if None in partials: assert issubclass(T, (float, Decimal)) assert not math.isfinite(partials[None]) return T(partials[None]) total = Fraction() for d, n in sorted(partials.items()): total += Fraction(n, d) if issubclass(T, int): assert total.denominator == 1 return T(total.numerator) if issubclass(T, Decimal): return T(total.numerator)/total.denominator return T(total) def _check_type(T, allowed): if T not in allowed: if len(allowed) == 1: allowed.add(T) else: types = ', '.join([t.__name__ for t in allowed] + [T.__name__]) raise TypeError("unsupported mixed types: %s" % types) def _exact_ratio(x): """Convert Real number x exactly to (numerator, denominator) pair. >>> _exact_ratio(0.25) (1, 4) x is expected to be an int, Fraction, Decimal or float. """ try: try: # int, Fraction return (x.numerator, x.denominator) except AttributeError: # float try: return x.as_integer_ratio() except AttributeError: # Decimal try: return _decimal_to_ratio(x) except AttributeError: msg = "can't convert type '{}' to numerator/denominator" raise TypeError(msg.format(type(x).__name__)) from None except (OverflowError, ValueError): # INF or NAN if __debug__: # Decimal signalling NANs cannot be converted to float :-( if isinstance(x, Decimal): assert not x.is_finite() else: assert not math.isfinite(x) return (x, None) # FIXME This is faster than Fraction.from_decimal, but still too slow. def _decimal_to_ratio(d): """Convert Decimal d to exact integer ratio (numerator, denominator). >>> from decimal import Decimal >>> _decimal_to_ratio(Decimal("2.6")) (26, 10) """ sign, digits, exp = d.as_tuple() if exp in ('F', 'n', 'N'): # INF, NAN, sNAN assert not d.is_finite() raise ValueError num = 0 for digit in digits: num = num*10 + digit if exp < 0: den = 10**-exp else: num *= 10**exp den = 1 if sign: num = -num return (num, den) def _counts(data): # Generate a table of sorted (value, frequency) pairs. table = collections.Counter(iter(data)).most_common() if not table: return table # Extract the values with the highest frequency. maxfreq = table[0][1] for i in range(1, len(table)): if table[i][1] != maxfreq: table = table[:i] break return table # === Measures of central tendency (averages) === def mean(data): """Return the sample arithmetic mean of data. >>> mean([1, 2, 3, 4, 4]) 2.8 >>> from fractions import Fraction as F >>> mean([F(3, 7), F(1, 21), F(5, 3), F(1, 3)]) Fraction(13, 21) >>> from decimal import Decimal as D >>> mean([D("0.5"), D("0.75"), D("0.625"), D("0.375")]) Decimal('0.5625') If ``data`` is empty, StatisticsError will be raised. """ if iter(data) is data: data = list(data) n = len(data) if n < 1: raise StatisticsError('mean requires at least one data point') return _sum(data)/n # FIXME: investigate ways to calculate medians without sorting? Quickselect? def median(data): """Return the median (middle value) of numeric data. When the number of data points is odd, return the middle data point. When the number of data points is even, the median is interpolated by taking the average of the two middle values: >>> median([1, 3, 5]) 3 >>> median([1, 3, 5, 7]) 4.0 """ data = sorted(data) n = len(data) if n == 0: raise StatisticsError("no median for empty data") if n%2 == 1: return data[n//2] else: i = n//2 return (data[i - 1] + data[i])/2 def median_low(data): """Return the low median of numeric data. When the number of data points is odd, the middle value is returned. When it is even, the smaller of the two middle values is returned. >>> median_low([1, 3, 5]) 3 >>> median_low([1, 3, 5, 7]) 3 """ data = sorted(data) n = len(data) if n == 0: raise StatisticsError("no median for empty data") if n%2 == 1: return data[n//2] else: return data[n//2 - 1] def median_high(data): """Return the high median of data. When the number of data points is odd, the middle value is returned. When it is even, the larger of the two middle values is returned. >>> median_high([1, 3, 5]) 3 >>> median_high([1, 3, 5, 7]) 5 """ data = sorted(data) n = len(data) if n == 0: raise StatisticsError("no median for empty data") return data[n//2] def median_grouped(data, interval=1): """"Return the 50th percentile (median) of grouped continuous data. >>> median_grouped([1, 2, 2, 3, 4, 4, 4, 4, 4, 5]) 3.7 >>> median_grouped([52, 52, 53, 54]) 52.5 This calculates the median as the 50th percentile, and should be used when your data is continuous and grouped. In the above example, the values 1, 2, 3, etc. actually represent the midpoint of classes 0.5-1.5, 1.5-2.5, 2.5-3.5, etc. The middle value falls somewhere in class 3.5-4.5, and interpolation is used to estimate it. Optional argument ``interval`` represents the class interval, and defaults to 1. Changing the class interval naturally will change the interpolated 50th percentile value: >>> median_grouped([1, 3, 3, 5, 7], interval=1) 3.25 >>> median_grouped([1, 3, 3, 5, 7], interval=2) 3.5 This function does not check whether the data points are at least ``interval`` apart. """ data = sorted(data) n = len(data) if n == 0: raise StatisticsError("no median for empty data") elif n == 1: return data[0] # Find the value at the midpoint. Remember this corresponds to the # centre of the class interval. x = data[n//2] for obj in (x, interval): if isinstance(obj, (str, bytes)): raise TypeError('expected number but got %r' % obj) try: L = x - interval/2 # The lower limit of the median interval. except TypeError: # Mixed type. For now we just coerce to float. L = float(x) - float(interval)/2 cf = data.index(x) # Number of values below the median interval. # FIXME The following line could be more efficient for big lists. f = data.count(x) # Number of data points in the median interval. return L + interval*(n/2 - cf)/f def mode(data): """Return the most common data point from discrete or nominal data. ``mode`` assumes discrete data, and returns a single value. This is the standard treatment of the mode as commonly taught in schools: >>> mode([1, 1, 2, 3, 3, 3, 3, 4]) 3 This also works with nominal (non-numeric) data: >>> mode(["red", "blue", "blue", "red", "green", "red", "red"]) 'red' If there is not exactly one most common value, ``mode`` will raise StatisticsError. """ # Generate a table of sorted (value, frequency) pairs. table = _counts(data) if len(table) == 1: return table[0][0] elif table: raise StatisticsError( 'no unique mode; found %d equally common values' % len(table) ) else: raise StatisticsError('no mode for empty data') # === Measures of spread === # See http://mathworld.wolfram.com/Variance.html # http://mathworld.wolfram.com/SampleVariance.html # http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance # # Under no circumstances use the so-called "computational formula for # variance", as that is only suitable for hand calculations with a small # amount of low-precision data. It has terrible numeric properties. # # See a comparison of three computational methods here: # http://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation/ def _ss(data, c=None): """Return sum of square deviations of sequence data. If ``c`` is None, the mean is calculated in one pass, and the deviations from the mean are calculated in a second pass. Otherwise, deviations are calculated from ``c`` as given. Use the second case with care, as it can lead to garbage results. """ if c is None: c = mean(data) ss = _sum((x-c)**2 for x in data) # The following sum should mathematically equal zero, but due to rounding # error may not. ss -= _sum((x-c) for x in data)**2/len(data) assert not ss < 0, 'negative sum of square deviations: %f' % ss return ss def variance(data, xbar=None): """Return the sample variance of data. data should be an iterable of Real-valued numbers, with at least two values. The optional argument xbar, if given, should be the mean of the data. If it is missing or None, the mean is automatically calculated. Use this function when your data is a sample from a population. To calculate the variance from the entire population, see ``pvariance``. Examples: >>> data = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5] >>> variance(data) 1.3720238095238095 If you have already calculated the mean of your data, you can pass it as the optional second argument ``xbar`` to avoid recalculating it: >>> m = mean(data) >>> variance(data, m) 1.3720238095238095 This function does not check that ``xbar`` is actually the mean of ``data``. Giving arbitrary values for ``xbar`` may lead to invalid or impossible results. Decimals and Fractions are supported: >>> from decimal import Decimal as D >>> variance([D("27.5"), D("30.25"), D("30.25"), D("34.5"), D("41.75")]) Decimal('31.01875') >>> from fractions import Fraction as F >>> variance([F(1, 6), F(1, 2), F(5, 3)]) Fraction(67, 108) """ if iter(data) is data: data = list(data) n = len(data) if n < 2: raise StatisticsError('variance requires at least two data points') ss = _ss(data, xbar) return ss/(n-1) def pvariance(data, mu=None): """Return the population variance of ``data``. data should be an iterable of Real-valued numbers, with at least one value. The optional argument mu, if given, should be the mean of the data. If it is missing or None, the mean is automatically calculated. Use this function to calculate the variance from the entire population. To estimate the variance from a sample, the ``variance`` function is usually a better choice. Examples: >>> data = [0.0, 0.25, 0.25, 1.25, 1.5, 1.75, 2.75, 3.25] >>> pvariance(data) 1.25 If you have already calculated the mean of the data, you can pass it as the optional second argument to avoid recalculating it: >>> mu = mean(data) >>> pvariance(data, mu) 1.25 This function does not check that ``mu`` is actually the mean of ``data``. Giving arbitrary values for ``mu`` may lead to invalid or impossible results. Decimals and Fractions are supported: >>> from decimal import Decimal as D >>> pvariance([D("27.5"), D("30.25"), D("30.25"), D("34.5"), D("41.75")]) Decimal('24.815') >>> from fractions import Fraction as F >>> pvariance([F(1, 4), F(5, 4), F(1, 2)]) Fraction(13, 72) """ if iter(data) is data: data = list(data) n = len(data) if n < 1: raise StatisticsError('pvariance requires at least one data point') ss = _ss(data, mu) return ss/n def stdev(data, xbar=None): """Return the square root of the sample variance. See ``variance`` for arguments and other details. >>> stdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75]) 1.0810874155219827 """ var = variance(data, xbar) try: return var.sqrt() except AttributeError: return math.sqrt(var) def pstdev(data, mu=None): """Return the square root of the population variance. See ``pvariance`` for arguments and other details. >>> pstdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75]) 0.986893273527251 """ var = pvariance(data, mu) try: return var.sqrt() except AttributeError: return math.sqrt(var)py-statistics-3.4.0b3/statistics/000077500000000000000000000000001304170427100167545ustar00rootroot00000000000000py-statistics-3.4.0b3/statistics/__init__.py000066400000000000000000000430361304170427100210730ustar00rootroot00000000000000# -*- coding: UTF-8 -*- ## Module statistics.py ## ## Copyright (c) 2013 Steven D'Aprano . ## ## Licensed under the Apache License, Version 2.0 (the "License"); ## you may not use this file except in compliance with the License. ## You may obtain a copy of the License at ## ## http://www.apache.org/licenses/LICENSE-2.0 ## ## Unless required by applicable law or agreed to in writing, software ## distributed under the License is distributed on an "AS IS" BASIS, ## WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. ## See the License for the specific language governing permissions and ## limitations under the License. u""" Basic statistics module. This module provides functions for calculating statistics of data, including averages, variance, and standard deviation. Calculating averages -------------------- ================== ============================================= Function Description ================== ============================================= mean Arithmetic mean (average) of data. median Median (middle value) of data. median_low Low median of data. median_high High median of data. median_grouped Median, or 50th percentile, of grouped data. mode Mode (most common value) of data. ================== ============================================= Calculate the arithmetic mean ("the average") of data: >>> mean([-1.0, 2.5, 3.25, 5.75]) 2.625 Calculate the standard median of discrete data: >>> median([2, 3, 4, 5]) 3.5 Calculate the median, or 50th percentile, of data grouped into class intervals centred on the data values provided. E.g. if your data points are rounded to the nearest whole number: >>> median_grouped([2, 2, 3, 3, 3, 4]) #doctest: +ELLIPSIS 2.8333333333... This should be interpreted in this way: you have two data points in the class interval 1.5-2.5, three data points in the class interval 2.5-3.5, and one in the class interval 3.5-4.5. The median of these data points is 2.8333... Calculating variability or spread --------------------------------- ================== ============================================= Function Description ================== ============================================= pvariance Population variance of data. variance Sample variance of data. pstdev Population standard deviation of data. stdev Sample standard deviation of data. ================== ============================================= Calculate the standard deviation of sample data: >>> stdev([2.5, 3.25, 5.5, 11.25, 11.75]) #doctest: +ELLIPSIS 4.38961843444... If you have previously calculated the mean, you can pass it as the optional second argument to the four "spread" functions to avoid recalculating it: >>> data = [1, 2, 2, 4, 4, 4, 5, 6] >>> mu = mean(data) >>> pvariance(data, mu) 2.5 Exceptions ---------- A single exception is defined: StatisticsError is a subclass of ValueError. """ from __future__ import division __all__ = [ u'StatisticsError', u'pstdev', u'pvariance', u'stdev', u'variance', u'median', u'median_low', u'median_high', u'median_grouped', u'mean', u'mode', ] import collections import math from fractions import Fraction from decimal import Decimal # === Exceptions === class StatisticsError(ValueError): pass # === Private utilities === def _sum(data, start=0): u"""_sum(data [, start]) -> value Return a high-precision sum of the given numeric data. If optional argument ``start`` is given, it is added to the total. If ``data`` is empty, ``start`` (defaulting to 0) is returned. Examples -------- >>> _sum([3, 2.25, 4.5, -0.5, 1.0], 0.75) 11.0 Some sources of round-off error will be avoided: >>> _sum([1e50, 1, -1e50] * 1000) # Built-in sum returns zero. 1000.0 Fractions and Decimals are also supported: >>> from fractions import Fraction as F >>> _sum([F(2, 3), F(7, 5), F(1, 4), F(5, 6)]) Fraction(63, 20) >>> from decimal import Decimal as D >>> data = [D("0.1375"), D("0.2108"), D("0.3061"), D("0.0419")] >>> _sum(data) Decimal('0.6963') Mixed types are currently treated as an error, except that int is allowed. """ # We fail as soon as we reach a value that is not an int or the type of # the first value which is not an int. E.g. _sum([int, int, float, int]) # is okay, but sum([int, int, float, Fraction]) is not. allowed_types = set([int, type(start)]) n, d = _exact_ratio(start) partials = {d: n} # map {denominator: sum of numerators} # Micro-optimizations. exact_ratio = _exact_ratio partials_get = partials.get # Add numerators for each denominator. for x in data: _check_type(type(x), allowed_types) n, d = exact_ratio(x) partials[d] = partials_get(d, 0) + n # Find the expected result type. If allowed_types has only one item, it # will be int; if it has two, use the one which isn't int. assert len(allowed_types) in (1, 2) if len(allowed_types) == 1: assert allowed_types.pop() is int T = int else: T = (allowed_types - set([int])).pop() if None in partials: assert issubclass(T, (float, Decimal)) assert not math.isfinite(partials[None]) return T(partials[None]) total = Fraction() for d, n in sorted(partials.items()): total += Fraction(n, d) if issubclass(T, int): assert total.denominator == 1 return T(total.numerator) if issubclass(T, Decimal): return T(total.numerator)/total.denominator return T(total) def _check_type(T, allowed): if T not in allowed: if len(allowed) == 1: allowed.add(T) else: types = u', '.join([t.__name__ for t in allowed] + [T.__name__]) raise TypeError(u"unsupported mixed types: %s" % types) def _exact_ratio(x): u"""Convert Real number x exactly to (numerator, denominator) pair. >>> _exact_ratio(0.25) (1, 4) x is expected to be an int, Fraction, Decimal or float. """ try: try: # int, Fraction return (x.numerator, x.denominator) except AttributeError: # float try: return x.as_integer_ratio() except AttributeError: # Decimal try: return _decimal_to_ratio(x) except AttributeError: msg = u"can't convert type '{}' to numerator/denominator" raise TypeError(msg.format(type(x).__name__)) except (OverflowError, ValueError): # INF or NAN if __debug__: # Decimal signalling NANs cannot be converted to float :-( if isinstance(x, Decimal): assert not x.is_finite() else: assert not math.isfinite(x) return (x, None) # FIXME This is faster than Fraction.from_decimal, but still too slow. def _decimal_to_ratio(d): u"""Convert Decimal d to exact integer ratio (numerator, denominator). >>> from decimal import Decimal >>> _decimal_to_ratio(Decimal("2.6")) (26, 10) """ sign, digits, exp = d.as_tuple() if exp in (u'F', u'n', u'N'): # INF, NAN, sNAN assert not d.is_finite() raise ValueError num = 0 for digit in digits: num = num*10 + digit if exp < 0: den = 10**-exp else: num *= 10**exp den = 1 if sign: num = -num return (num, den) def _counts(data): # Generate a table of sorted (value, frequency) pairs. table = collections.Counter(iter(data)).most_common() if not table: return table # Extract the values with the highest frequency. maxfreq = table[0][1] for i in xrange(1, len(table)): if table[i][1] != maxfreq: table = table[:i] break return table # === Measures of central tendency (averages) === def mean(data): u"""Return the sample arithmetic mean of data. >>> mean([1, 2, 3, 4, 4]) 2.8 >>> from fractions import Fraction as F >>> mean([F(3, 7), F(1, 21), F(5, 3), F(1, 3)]) Fraction(13, 21) >>> from decimal import Decimal as D >>> mean([D("0.5"), D("0.75"), D("0.625"), D("0.375")]) Decimal('0.5625') If ``data`` is empty, StatisticsError will be raised. """ if iter(data) is data: data = list(data) n = len(data) if n < 1: raise StatisticsError(u'mean requires at least one data point') return _sum(data)/n # FIXME: investigate ways to calculate medians without sorting? Quickselect? def median(data): u"""Return the median (middle value) of numeric data. When the number of data points is odd, return the middle data point. When the number of data points is even, the median is interpolated by taking the average of the two middle values: >>> median([1, 3, 5]) 3 >>> median([1, 3, 5, 7]) 4.0 """ data = sorted(data) n = len(data) if n == 0: raise StatisticsError(u"no median for empty data") if n%2 == 1: return data[n//2] else: i = n//2 return (data[i - 1] + data[i])/2 def median_low(data): u"""Return the low median of numeric data. When the number of data points is odd, the middle value is returned. When it is even, the smaller of the two middle values is returned. >>> median_low([1, 3, 5]) 3 >>> median_low([1, 3, 5, 7]) 3 """ data = sorted(data) n = len(data) if n == 0: raise StatisticsError(u"no median for empty data") if n%2 == 1: return data[n//2] else: return data[n//2 - 1] def median_high(data): u"""Return the high median of data. When the number of data points is odd, the middle value is returned. When it is even, the larger of the two middle values is returned. >>> median_high([1, 3, 5]) 3 >>> median_high([1, 3, 5, 7]) 5 """ data = sorted(data) n = len(data) if n == 0: raise StatisticsError(u"no median for empty data") return data[n//2] def median_grouped(data, interval=1): u""""Return the 50th percentile (median) of grouped continuous data. >>> median_grouped([1, 2, 2, 3, 4, 4, 4, 4, 4, 5]) 3.7 >>> median_grouped([52, 52, 53, 54]) 52.5 This calculates the median as the 50th percentile, and should be used when your data is continuous and grouped. In the above example, the values 1, 2, 3, etc. actually represent the midpoint of classes 0.5-1.5, 1.5-2.5, 2.5-3.5, etc. The middle value falls somewhere in class 3.5-4.5, and interpolation is used to estimate it. Optional argument ``interval`` represents the class interval, and defaults to 1. Changing the class interval naturally will change the interpolated 50th percentile value: >>> median_grouped([1, 3, 3, 5, 7], interval=1) 3.25 >>> median_grouped([1, 3, 3, 5, 7], interval=2) 3.5 This function does not check whether the data points are at least ``interval`` apart. """ data = sorted(data) n = len(data) if n == 0: raise StatisticsError(u"no median for empty data") elif n == 1: return data[0] # Find the value at the midpoint. Remember this corresponds to the # centre of the class interval. x = data[n//2] for obj in (x, interval): if isinstance(obj, (unicode, str)): raise TypeError(u'expected number but got %r' % obj) try: L = x - interval/2 # The lower limit of the median interval. except TypeError: # Mixed type. For now we just coerce to float. L = float(x) - float(interval)/2 cf = data.index(x) # Number of values below the median interval. # FIXME The following line could be more efficient for big lists. f = data.count(x) # Number of data points in the median interval. return L + interval*(n/2 - cf)/f def mode(data): u"""Return the most common data point from discrete or nominal data. ``mode`` assumes discrete data, and returns a single value. This is the standard treatment of the mode as commonly taught in schools: >>> mode([1, 1, 2, 3, 3, 3, 3, 4]) 3 This also works with nominal (non-numeric) data: >>> mode(["red", "blue", "blue", "red", "green", "red", "red"]) 'red' If there is not exactly one most common value, ``mode`` will raise StatisticsError. """ # Generate a table of sorted (value, frequency) pairs. table = _counts(data) if len(table) == 1: return table[0][0] elif table: raise StatisticsError( u'no unique mode; found %d equally common values' % len(table) ) else: raise StatisticsError(u'no mode for empty data') # === Measures of spread === # See http://mathworld.wolfram.com/Variance.html # http://mathworld.wolfram.com/SampleVariance.html # http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance # # Under no circumstances use the so-called "computational formula for # variance", as that is only suitable for hand calculations with a small # amount of low-precision data. It has terrible numeric properties. # # See a comparison of three computational methods here: # http://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation/ def _ss(data, c=None): u"""Return sum of square deviations of sequence data. If ``c`` is None, the mean is calculated in one pass, and the deviations from the mean are calculated in a second pass. Otherwise, deviations are calculated from ``c`` as given. Use the second case with care, as it can lead to garbage results. """ if c is None: c = mean(data) ss = _sum((x-c)**2 for x in data) # The following sum should mathematically equal zero, but due to rounding # error may not. ss -= _sum((x-c) for x in data)**2/len(data) assert not ss < 0, u'negative sum of square deviations: %f' % ss return ss def variance(data, xbar=None): u"""Return the sample variance of data. data should be an iterable of Real-valued numbers, with at least two values. The optional argument xbar, if given, should be the mean of the data. If it is missing or None, the mean is automatically calculated. Use this function when your data is a sample from a population. To calculate the variance from the entire population, see ``pvariance``. Examples: >>> data = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5] >>> variance(data) 1.3720238095238095 If you have already calculated the mean of your data, you can pass it as the optional second argument ``xbar`` to avoid recalculating it: >>> m = mean(data) >>> variance(data, m) 1.3720238095238095 This function does not check that ``xbar`` is actually the mean of ``data``. Giving arbitrary values for ``xbar`` may lead to invalid or impossible results. Decimals and Fractions are supported: >>> from decimal import Decimal as D >>> variance([D("27.5"), D("30.25"), D("30.25"), D("34.5"), D("41.75")]) Decimal('31.01875') >>> from fractions import Fraction as F >>> variance([F(1, 6), F(1, 2), F(5, 3)]) Fraction(67, 108) """ if iter(data) is data: data = list(data) n = len(data) if n < 2: raise StatisticsError(u'variance requires at least two data points') ss = _ss(data, xbar) return ss/(n-1) def pvariance(data, mu=None): u"""Return the population variance of ``data``. data should be an iterable of Real-valued numbers, with at least one value. The optional argument mu, if given, should be the mean of the data. If it is missing or None, the mean is automatically calculated. Use this function to calculate the variance from the entire population. To estimate the variance from a sample, the ``variance`` function is usually a better choice. Examples: >>> data = [0.0, 0.25, 0.25, 1.25, 1.5, 1.75, 2.75, 3.25] >>> pvariance(data) 1.25 If you have already calculated the mean of the data, you can pass it as the optional second argument to avoid recalculating it: >>> mu = mean(data) >>> pvariance(data, mu) 1.25 This function does not check that ``mu`` is actually the mean of ``data``. Giving arbitrary values for ``mu`` may lead to invalid or impossible results. Decimals and Fractions are supported: >>> from decimal import Decimal as D >>> pvariance([D("27.5"), D("30.25"), D("30.25"), D("34.5"), D("41.75")]) Decimal('24.815') >>> from fractions import Fraction as F >>> pvariance([F(1, 4), F(5, 4), F(1, 2)]) Fraction(13, 72) """ if iter(data) is data: data = list(data) n = len(data) if n < 1: raise StatisticsError(u'pvariance requires at least one data point') ss = _ss(data, mu) return ss/n def stdev(data, xbar=None): u"""Return the square root of the sample variance. See ``variance`` for arguments and other details. >>> stdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75]) 1.0810874155219827 """ var = variance(data, xbar) try: return var.sqrt() except AttributeError: return math.sqrt(var) def pstdev(data, mu=None): u"""Return the square root of the population variance. See ``pvariance`` for arguments and other details. >>> pstdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75]) 0.986893273527251 """ var = pvariance(data, mu) try: return var.sqrt() except AttributeError: return math.sqrt(var)py-statistics-3.4.0b3/statistics/tests/000077500000000000000000000000001304170427100201165ustar00rootroot00000000000000py-statistics-3.4.0b3/statistics/tests/__init__.py000066400000000000000000000004241304170427100222270ustar00rootroot00000000000000# -*- coding: UTF-8 -*- import statistics import unittest def suite(): import doctest suite = unittest.TestSuite() suite.addTests(doctest.DocTestSuite(statistics)) return suite if __name__ == '__main__': unittest.TextTestRunner(verbosity=2).run(suite())