pycm-4.2/ 0000775 0000000 0000000 00000000000 14741064252 0012334 5 ustar 00root root 0000000 0000000 pycm-4.2/.coveragerc 0000664 0000000 0000000 00000000322 14741064252 0014452 0 ustar 00root root 0000000 0000000 [run]
branch = True
omit =
    */pycm/__main__.py
    */pycm/__init__.py
    */pycm/profile.py
    */pycm/basic_test.py
[report]
# Regexes for lines to exclude from consideration
exclude_lines =
    pragma: no cover
pycm-4.2/.gitattributes 0000664 0000000 0000000 00000000207 14741064252 0015226 0 ustar 00root root 0000000 0000000 *.html linguist-detectable=false
*.ipynb linguist-detectable=false
Document/* linguist-vendored
Otherfiles/test.html linguist-vendored
pycm-4.2/.github/ 0000775 0000000 0000000 00000000000 14741064252 0013674 5 ustar 00root root 0000000 0000000 pycm-4.2/.github/CODE_OF_CONDUCT.md 0000664 0000000 0000000 00000012527 14741064252 0016502 0 ustar 00root root 0000000 0000000 # Contributor Covenant Code of Conduct
## Our Pledge
We as members, contributors, and leaders pledge to make participation in our
community a harassment-free experience for everyone, regardless of age, body
size, visible or invisible disability, ethnicity, sex characteristics, gender
identity and expression, level of experience, education, socio-economic status,
nationality, personal appearance, race, caste, color, religion, or sexual
identity and orientation.
We pledge to act and interact in ways that contribute to an open, welcoming,
diverse, inclusive, and healthy community.
## Our Standards
Examples of behavior that contributes to a positive environment for our
community include:
* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes,
and learning from the experience
* Focusing on what is best not just for us as individuals, but for the overall
community
Examples of unacceptable behavior include:
* The use of sexualized language or imagery, and sexual attention or advances of
any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email address,
without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a professional setting
## Enforcement Responsibilities
Community leaders are responsible for clarifying and enforcing our standards of
acceptable behavior and will take appropriate and fair corrective action in
response to any behavior that they deem inappropriate, threatening, offensive,
or harmful.
Community leaders have the right and responsibility to remove, edit, or reject
comments, commits, code, wiki edits, issues, and other contributions that are
not aligned to this Code of Conduct, and will communicate reasons for moderation
decisions when appropriate.
## Scope
This Code of Conduct applies both within project spaces and in public spaces
when an individual is representing the project or its community.
Examples of representing our community include using an official e-mail address,
posting via an official social media account, or acting as an appointed
representative at an online or offline event.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement at
info@pycm.io.
All complaints will be reviewed and investigated promptly and fairly.
All community leaders are obligated to respect the privacy and security of the
reporter of any incident.
## Enforcement Guidelines
Community leaders will follow these Community Impact Guidelines in determining
the consequences for any action they deem in violation of this Code of Conduct:
### 1. Correction
**Community Impact**: Use of inappropriate language or other behavior deemed
unprofessional or unwelcome in the community.
**Consequence**: A private, written warning from community leaders, providing
clarity around the nature of the violation and an explanation of why the
behavior was inappropriate. A public apology may be requested.
### 2. Warning
**Community Impact**: A violation through a single incident or series of
actions.
**Consequence**: A warning with consequences for continued behavior. No
interaction with the people involved, including unsolicited interaction with
those enforcing the Code of Conduct, for a specified period of time. This
includes avoiding interactions in community spaces as well as external channels
like social media. Violating these terms may lead to a temporary or permanent
ban.
### 3. Temporary Ban
**Community Impact**: A serious violation of community standards, including
sustained inappropriate behavior.
**Consequence**: A temporary ban from any sort of interaction or public
communication with the community for a specified period of time. No public or
private interaction with the people involved, including unsolicited interaction
with those enforcing the Code of Conduct, is allowed during this period.
Violating these terms may lead to a permanent ban.
### 4. Permanent Ban
**Community Impact**: Demonstrating a pattern of violation of community
standards, including sustained inappropriate behavior, harassment of an
individual, or aggression toward or disparagement of classes of individuals.
**Consequence**: A permanent ban from any sort of public interaction within the
community.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 2.1, available at
[https://www.contributor-covenant.org/version/2/1/code_of_conduct.html][v2.1].
Community Impact Guidelines were inspired by
[Mozilla's code of conduct enforcement ladder][Mozilla CoC].
For answers to common questions about this code of conduct, see the FAQ at
[https://www.contributor-covenant.org/faq][FAQ]. Translations are available at
[https://www.contributor-covenant.org/translations][translations].
[homepage]: https://www.contributor-covenant.org
[v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html
[Mozilla CoC]: https://github.com/mozilla/diversity
[FAQ]: https://www.contributor-covenant.org/faq
[translations]: https://www.contributor-covenant.org/translations
pycm-4.2/.github/CONTRIBUTING.md 0000664 0000000 0000000 00000010306 14741064252 0016125 0 ustar 00root root 0000000 0000000 # Contribution
**Last Update: 2024-10-08**
Changes and improvements are more than welcome! ❤️ Feel free to fork and open a pull request.
Please consider the following:
1. Fork it!
2. Create your feature branch (based on the `dev` branch)
3. Add your new features or fix detected bugs
    - To add a new class statistic visit [here](#class-statistic)
    - To add a new overall statistic visit [here](#overall-statistic)
    - To add a new interpretation visit [here](#interpretation)
4. Add standard `docstring` to your functions/methods according to the [standard format](#standard-docstring-format)
5. Add tests for your functions/methods (`doctest`, `Test` folder)
6. Update `README.md` (if needed)
7. Update `Document.ipynb` (if needed)
8. Pass all CI tests
9. Update `CHANGELOG.md`
    - Describe changes under `[Unreleased]` section
10. Update `AUTHORS.md`
    - Add your name under `# Other Contributors #` section
11. Submit a pull request into `dev` (please complete the pull request template)
## Class statistic
1. Add new functions to `class_funcs.py`
2. Update `CLASS_PARAMS` dictionary in `params.py`
3. Update `class_statistics` function in `class_funcs.py`
    - Call statistic function and store result in `result` dictionary
4. Update `PARAMS_DESCRIPTION` dictionary in `params.py` with a short description
    - If you don't want capitalization, update `CAPITALIZE_FILTER` list in `params.py` (*Optional*)
5. Update `References` section in `Document.ipynb` (`IEEE` format)
6. Add description to `Class Statistics` section in `Document.ipynb`
    - Cite reference
    - Update table of contents
    - Use `LaTeX` for formula
7. Update `PARAMS_LINK` dictionary in `params.py` with the document tag (without `#`)
8. Add tests to `overall_test.py` and `function_test.py` in `Test` folder
    - If you have any verified tests, add them to `verified_test.py`
9. Run `autopep8.bat`/`autopep8.sh` (*Optional*, requires the latest version of the `autopep8` package)
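As a rough illustration of steps 1 to 4, here is a minimal, self-contained sketch. The statistic `EX_calc`, its `"EX"` key, and the stand-in dictionaries are hypothetical and exist only for this example; the real `CLASS_PARAMS`, `PARAMS_DESCRIPTION` and `class_statistics` in `params.py` and `class_funcs.py` may be shaped differently, so treat this as a pattern rather than the actual implementation.

```python
# Hypothetical sketch: names and dictionary shapes are assumptions, not PyCM's actual code.


def EX_calc(TP, FN):
    """
    Calculate example statistic (EX).

    :param TP: true positive
    :type TP: int
    :param FN: false negative
    :type FN: int
    :return: EX as float
    """
    try:
        return TP / (TP + FN)
    except ZeroDivisionError:
        # Assumed convention for this sketch: undefined values are reported as "None".
        return "None"


# Stand-ins for the dictionaries kept in `params.py` (shapes assumed for illustration).
CLASS_PARAMS = {"EX": "EX"}
PARAMS_DESCRIPTION = {"EX": "example statistic"}


def class_statistics(TP, FN):
    """
    Return class statistics (simplified stand-in).

    :param TP: true positive dict for all classes
    :type TP: dict
    :param FN: false negative dict for all classes
    :type FN: dict
    :return: result as dict
    """
    result = {key: {} for key in CLASS_PARAMS}
    for label in TP:
        # Call the statistic function and store its value in the `result` dictionary.
        result["EX"][label] = EX_calc(TP[label], FN[label])
    return result
```

With these stand-ins, `class_statistics({"a": 3, "b": 0}, {"a": 1, "b": 0})` yields one entry per class under the `"EX"` key, which is the shape a test in the `Test` folder could assert against.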
## Overall statistic
1. Add new functions to `overall_funcs.py`
2. Update `OVERALL_PARAMS` dictionary in `params.py`
3. Update `overall_statistics` function in `overall_funcs.py`
    - Call statistic function and store result in a variable
    - Add this variable to output
4. Update `References` section in `Document.ipynb` (`IEEE` format)
5. Add description to `Overall Statistics` section in `Document.ipynb`
    - Cite reference
    - Update table of contents
    - Use `LaTeX` for formula
6. Update `PARAMS_LINK` dictionary in `params.py` with the document tag (without `#`)
7. Add tests to `overall_test.py` and `function_test.py` in `Test` folder
    - If you have any verified tests, add them to `verified_test.py`
8. Run `autopep8.bat`/`autopep8.sh` (*Optional*, requires the latest version of the `autopep8` package)
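The overall-statistic steps follow the same pattern, except that the value is stored in a variable and then added to the output. The sketch below assumes a hypothetical statistic `overall_EX_calc` and a heavily simplified `overall_statistics` signature; the real function takes different arguments and returns many more entries.

```python
# Hypothetical sketch: `overall_EX_calc` and the simplified signature are assumptions.


def overall_EX_calc(TP, POP):
    """
    Calculate overall example statistic (OEX).

    :param TP: true positive dict for all classes
    :type TP: dict
    :param POP: total number of samples
    :type POP: int
    :return: OEX as float
    """
    try:
        return sum(TP.values()) / POP
    except ZeroDivisionError:
        # Assumed convention for this sketch: undefined values are reported as "None".
        return "None"


def overall_statistics(TP, POP):
    """
    Return overall statistics (simplified stand-in).

    :param TP: true positive dict for all classes
    :type TP: dict
    :param POP: total number of samples
    :type POP: int
    :return: overall statistics as dict
    """
    # Call the statistic function and store the result in a variable ...
    overall_EX = overall_EX_calc(TP, POP)
    # ... then add this variable to the output.
    return {"Overall EX": overall_EX}
```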
## Interpretation
1. Add new interpretation table as a function to `interpret.py`
2. Add a score dictionary to `params.py`
    - Example: ```PLRI_SCORE = {"Good": 4, "Fair": 3, "Poor": 2, "Negligible": 1, "None": "None"}```
3. Add a color dictionary to `BENCHMARK_COLOR` in `params.py`
    - Example: ```"PLRI": {"Negligible": "Red","Poor": "Orange","Fair": "Yellow","Good": "Green","None": "White"}```
4. If interpretation table is for a class statistic:
    - Steps 2-7 of [class statistic](#class-statistic)
    - Update `CLASS_BENCHMARK_SCORE_DICT` in `params.py`
5. If interpretation table is for an overall statistic:
    - Steps 2-6 of [overall statistic](#overall-statistic)
    - Update `OVERALL_BENCHMARK_SCORE_DICT` in `params.py`
6. Add tests to `compare_test.py`, `overall_test.py` and `function_test.py` in `Test` folder
    - If you have any verified tests, add them to `verified_test.py`
7. Run `autopep8.bat`/`autopep8.sh` (*Optional*, requires the latest version of the `autopep8` package)
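To show how the three pieces from steps 1 to 3 fit together, here is a small sketch built around the `PLRI` dictionaries quoted above. The function name `PLR_analysis_sketch` and its cut-off values are illustrative assumptions; check `interpret.py` for the thresholds PyCM actually uses.

```python
# Score and color dictionaries copied from the examples above.
PLRI_SCORE = {"Good": 4, "Fair": 3, "Poor": 2, "Negligible": 1, "None": "None"}
BENCHMARK_COLOR = {
    "PLRI": {
        "Negligible": "Red",
        "Poor": "Orange",
        "Fair": "Yellow",
        "Good": "Green",
        "None": "White"}}


def PLR_analysis_sketch(PLR):
    """
    Interpret positive likelihood ratio (PLR) with illustrative cut-offs.

    :param PLR: positive likelihood ratio
    :type PLR: float
    :return: interpretation result as str
    """
    try:
        if PLR == "None":
            return "None"
        # Assumed thresholds, for illustration only.
        if PLR < 1:
            return "Negligible"
        if PLR < 5:
            return "Poor"
        if PLR < 10:
            return "Fair"
        return "Good"
    except TypeError:
        # Non-numeric input cannot be ranked.
        return "None"
```

Every label the function can return appears as a key in both dictionaries, so a result can always be mapped to a score (`PLRI_SCORE[label]`) and a color (`BENCHMARK_COLOR["PLRI"][label]`).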
## Standard docstring format
Here, the `docstring` format mainly follows the PEP suggested structure. Note the following items:
- Start the `docstring` description with an uppercase letter and end it with a dot
- All other descriptions should be written in lowercase (unless an exception applies)
- Declare the abbreviations before using them
Example:
def DF_calc(classes):
    """
    Calculate Chi-squared degree of freedom (DF).

    :param classes: confusion matrix classes
    :type classes: list
    :return: DF as int
    """
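The first rule above (uppercase start, trailing dot) is easy to check mechanically. The helper below is a hypothetical sketch, not part of PyCM, and it covers only the summary line; the CI workflow relies on `pydocstyle` for real docstring checks.

```python
import inspect


def DF_calc(classes):
    """
    Calculate Chi-squared degree of freedom (DF).

    :param classes: confusion matrix classes
    :type classes: list
    :return: DF as int
    """


def summary_follows_format(func):
    """
    Check the summary line of a docstring against the format rules.

    :param func: function whose docstring is checked
    :type func: function
    :return: check result as bool
    """
    # `inspect.getdoc` strips indentation and leading blank lines.
    doc = inspect.getdoc(func) or ""
    if not doc:
        return False
    summary = doc.splitlines()[0].strip()
    return summary[:1].isupper() and summary.endswith(".")
```

Calling `summary_follows_format(DF_calc)` passes for the example docstring, while a function without a docstring, or one whose summary starts in lowercase, is rejected.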
pycm-4.2/.github/FUNDING.yml 0000664 0000000 0000000 00000000050 14741064252 0015504 0 ustar 00root root 0000000 0000000 custom: https://www.pycm.io/donate.html
pycm-4.2/.github/ISSUE_TEMPLATE/ 0000775 0000000 0000000 00000000000 14741064252 0016057 5 ustar 00root root 0000000 0000000 pycm-4.2/.github/ISSUE_TEMPLATE/bug_report.yml 0000664 0000000 0000000 00000006453 14741064252 0020762 0 ustar 00root root 0000000 0000000 name: Bug Report
description: File a bug report
title: "[Bug]: "
body:
  - type: markdown
    attributes:
      value: |
        Thanks for taking the time to fill out this bug report!
  - type: input
    id: contact
    attributes:
      label: Contact details
      description: How can we get in touch with you if we need more info?
      placeholder: ex. email@example.com
    validations:
      required: false
  - type: textarea
    id: what-happened
    attributes:
      label: What happened?
      description: Provide a clear and concise description of what the bug is.
      placeholder: >
        Give us a description of the bug.
    validations:
      required: true
  - type: textarea
    id: step-to-reproduce
    attributes:
      label: Steps to reproduce
      description: Provide details of how to reproduce the bug.
      placeholder: >
        ex. 1. Go to '...'
    validations:
      required: true
  - type: textarea
    id: expected-behavior
    attributes:
      label: Expected behavior
      description: What did you expect to happen?
      placeholder: >
        ex. I expected '...' to happen
    validations:
      required: true
  - type: textarea
    id: actual-behavior
    attributes:
      label: Actual behavior
      description: What actually happened?
      placeholder: >
        ex. Instead '...' happened
    validations:
      required: true
  - type: dropdown
    id: operating-system
    attributes:
      label: Operating system
      description: Which operating system are you using?
      options:
        - Windows
        - macOS
        - Linux
      default: 0
    validations:
      required: true
  - type: dropdown
    id: python-version
    attributes:
      label: Python version
      description: Which version of Python are you using?
      options:
        - Python 3.13
        - Python 3.12
        - Python 3.11
        - Python 3.10
        - Python 3.9
        - Python 3.8
        - Python 3.7
        - Python 3.6
        - Python 3.5
      default: 1
    validations:
      required: true
  - type: dropdown
    id: pycm-version
    attributes:
      label: PyCM version
      description: Which version of PyCM are you using?
      options:
        - PyCM 4.2
        - PyCM 4.1
        - PyCM 4.0
        - PyCM 3.9
        - PyCM 3.8
        - PyCM 3.7
        - PyCM 3.6
        - PyCM 3.5
        - PyCM 3.4
        - PyCM 3.3
        - PyCM 3.2
        - PyCM 3.1
        - PyCM 3.0
        - PyCM 2.9
        - PyCM 2.8
        - PyCM 2.7
        - PyCM 2.6
        - PyCM 2.5
        - PyCM 2.4
        - PyCM 2.3
        - PyCM 2.2
        - PyCM 2.1
        - PyCM 2.0
        - PyCM 1.9
        - PyCM 1.8
        - PyCM 1.7
        - PyCM 1.6
        - PyCM 1.5
        - PyCM 1.4
        - PyCM 1.3
        - PyCM 1.2
        - PyCM 1.1
        - PyCM 1.0
        - PyCM 0.9.5
        - PyCM 0.9
        - PyCM 0.8.6
        - PyCM 0.8.5
        - PyCM 0.8.1
        - PyCM 0.7
        - PyCM 0.6
        - PyCM 0.5
        - PyCM 0.4
        - PyCM 0.3
        - PyCM 0.2
        - PyCM 0.1
      default: 0
    validations:
      required: true
  - type: textarea
    id: logs
    attributes:
      label: Relevant log output
      description: Please copy and paste any relevant log output. This will be automatically formatted into code, so no need for backticks.
      render: shell
pycm-4.2/.github/ISSUE_TEMPLATE/config.yml 0000664 0000000 0000000 00000000523 14741064252 0020047 0 ustar 00root root 0000000 0000000 blank_issues_enabled: false
contact_links:
  - name: Discord
    url: https://discord.com/invite/zqpU2b3J3f
    about: Ask questions and discuss with other PyCM community members
  - name: Mailing List
    url: https://mail.python.org/mailman3/lists/pycm.python.org/
    about: Ask questions and discuss with other PyCM community members
pycm-4.2/.github/ISSUE_TEMPLATE/feature_request.yml 0000664 0000000 0000000 00000001707 14741064252 0022012 0 ustar 00root root 0000000 0000000 name: Feature Request
description: Suggest a feature for this project
title: "[Feature]: "
body:
  - type: textarea
    id: description
    attributes:
      label: Describe the feature you want to add
      placeholder: >
        I'd like to be able to [...]
    validations:
      required: true
  - type: textarea
    id: possible-solution
    attributes:
      label: Describe your proposed solution
      placeholder: >
        I think this could be done by [...]
    validations:
      required: false
  - type: textarea
    id: alternatives
    attributes:
      label: Describe alternatives you've considered, if relevant
      placeholder: >
        Another way to do this would be [...]
    validations:
      required: false
  - type: textarea
    id: additional-context
    attributes:
      label: Additional context
      placeholder: >
        Add any other context or screenshots about the feature request here.
    validations:
      required: false
pycm-4.2/.github/PULL_REQUEST_TEMPLATE.md 0000664 0000000 0000000 00000000160 14741064252 0017472 0 ustar 00root root 0000000 0000000 #### Reference Issues/PRs
#### What does this implement/fix? Explain your changes.
#### Any other comments?
pycm-4.2/.github/dependabot.yml 0000664 0000000 0000000 00000000342 14741064252 0016523 0 ustar 00root root 0000000 0000000 version: 2
updates:
  - package-ecosystem: pip
    directory: "/"
    schedule:
      interval: weekly
      time: "01:30"
    open-pull-requests-limit: 10
    target-branch: dev
    assignees:
      - "sadrasabouri"
      - "sepandhaghighi"
pycm-4.2/.github/workflows/ 0000775 0000000 0000000 00000000000 14741064252 0015731 5 ustar 00root root 0000000 0000000 pycm-4.2/.github/workflows/publish_conda.yaml 0000664 0000000 0000000 00000000673 14741064252 0021435 0 ustar 00root root 0000000 0000000 name: publish_conda
on:
  push:
    # Sequence of patterns matched against refs/tags
    tags:
      - '*' # Push events to matching v*, i.e. v1.0, v20.15.10
jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: publish-to-conda
        uses: sepandhaghighi/conda-package-publish-action@v1.2
        with:
          subDir: 'Otherfiles'
          AnacondaToken: ${{ secrets.ANACONDA_TOKEN }}
pycm-4.2/.github/workflows/publish_pypi.yml 0000664 0000000 0000000 00000001755 14741064252 0021173 0 ustar 00root root 0000000 0000000 # This workflow will upload a Python Package using Twine when a release is created
# For more information see: https://help.github.com/en/actions/language-and-framework-guides/using-python-with-github-actions#publishing-to-package-registries
name: Upload Python Package
on:
  push:
    # Sequence of patterns matched against refs/tags
    tags:
      - '*' # Push events to matching v*, i.e. v1.0, v20.15.10
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.x'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install setuptools wheel twine
      - name: Build and publish
        env:
          TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
          TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
        run: |
          python setup.py sdist bdist_wheel
          twine upload dist/*.tar.gz
          twine upload dist/*.whl
pycm-4.2/.github/workflows/test.yml 0000664 0000000 0000000 00000005113 14741064252 0017433 0 ustar 00root root 0000000 0000000 # This workflow will install Python dependencies, run tests and lint with a variety of Python versions
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions
name: CI
on:
  push:
    branches:
      - master
      - dev
  pull_request:
    branches:
      - master
      - dev
env:
  TEST_PYTHON_VERSION: 3.9
  TEST_OS: 'ubuntu-20.04'
jobs:
  build:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-20.04, windows-2022, macOS-13]
        python-version: [3.6, 3.7, 3.8, 3.9, 3.10.0, 3.11.0, 3.12.0, 3.13.0]
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - name: Installation
        run: |
          python -m pip install --upgrade pip
          pip install .
      - name: First test
        run: |
          python -m pycm test
          python -m pycm
      - name: Test requirements Installation
        run: |
          python Otherfiles/requirements-splitter.py
          pip install --upgrade --upgrade-strategy=only-if-needed -r test-requirements.txt
      - name: Test with pytest (Basic)
        run: |
          python -m pytest --cov=pycm --cov-report=term --ignore-glob=Test/plot_test.py
      - name: Plot requirements Installation
        run: |
          pip install --upgrade --upgrade-strategy=only-if-needed -r plot-requirements.txt
      - name: Test with pytest (+Plot)
        run: |
          python -m pytest --cov=pycm --cov-report=term --ignore-glob=Test/plot_error_test.py
      - name: Version check
        run: |
          python Otherfiles/version_check.py
        if: matrix.python-version == env.TEST_PYTHON_VERSION
      - name: Notebook check
        run: |
          pip install "notebook>=5.2.2"
          python Otherfiles/notebook_run.py
        if: matrix.python-version == env.TEST_PYTHON_VERSION && matrix.os == env.TEST_OS
      - name: Other tests
        run: |
          python -m vulture pycm/ Otherfiles/ setup.py --min-confidence 65 --exclude=__init__.py --sort-by-size
          python -m bandit -r pycm -s B311
          python -m pydocstyle -v --match-dir=pycm
        if: matrix.python-version == env.TEST_PYTHON_VERSION
      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v4
        with:
          fail_ci_if_error: true
          token: ${{ secrets.CODECOV_TOKEN }}
        if: matrix.python-version == env.TEST_PYTHON_VERSION && matrix.os == env.TEST_OS
      - name: cProfile
        run: |
          python -m cProfile -s cumtime pycm/profile.py
pycm-4.2/.gitignore 0000664 0000000 0000000 00000002312 14741064252 0014322 0 ustar 00root root 0000000 0000000 # Created by .ignore support plugin (hsz.mobi)
### Python template
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
*.egg-info/
.installed.cfg
*.egg
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*,cover
.hypothesis/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
# pyenv
.python-version
# celery beat schedule file
celerybeat-schedule
# dotenv
.env
# virtualenv
.venv/
venv/
ENV/
# Spyder project settings
.spyderproject
# Rope project settings
.ropeproject
### Example user template template
### Example user template
# IntelliJ project files
.idea
*.iml
out
gen
pycm-4.2/AUTHORS.md 0000664 0000000 0000000 00000001554 14741064252 0014010 0 ustar 00root root 0000000 0000000 # Core Developers
----------
- Sepand Haghighi - Open Science Laboratory ([GitHub](https://github.com/sepandhaghighi)) **
- Alireza Zolanvari - Open Science Laboratory ([GitHub](https://github.com/AlirezaZolanvari)) **
- Sadra Sabouri - Open Science Laboratory ([GitHub](https://github.com/sadrasabouri)) **
- Masoomeh Jasemi - Microsoft ([GitHub](https://github.com/MasoomehJasemi))
- Shaahin Hessabi - Sharif University of Technology ([Email](mailto:hessabi@sharif.edu))
** **Maintainer**
# Other Contributors
----------
- [@soheeyang](https://github.com/soheeyang)
- [@mahi97](https://github.com/mahi97)
- [@cclauss](https://github.com/cclauss)
- [@negarzabetian](https://github.com/negarzabetian)
- [@GeetDsa](https://github.com/GeetDsa)
- [@the-lay](https://github.com/the-lay)
- [@lewiuberg](https://github.com/lewiuberg)
- [@AHReccese](https://github.com/AHReccese)
pycm-4.2/CHANGELOG.md 0000664 0000000 0000000 00000052752 14741064252 0014160 0 ustar 00root root 0000000 0000000 # Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
## [Unreleased]
## [4.2] - 2025-01-14
### Added
- 5 new distance/similarity
    1. KuhnsIII
    2. KuhnsIV
    3. KuhnsV
    4. KuhnsVI
    5. KuhnsVII
### Changed
- Test system modified
- PyPI badge in `README.md`
- GitHub actions are limited to the `dev` and `master` branches
- `AUTHORS.md` updated
- `README.md` modified
- Document modified
## [4.1] - 2024-10-17
### Added
- 5 new distance/similarity
    1. KoppenI
    2. KoppenII
    3. KuderRichardson
    4. KuhnsI
    5. KuhnsII
- `feature_request.yml` template
- `config.yml` for issue template
- `SECURITY.md`
### Changed
- Bug report template modified
- `thresholds_calc` function updated
- `__midpoint_numeric_integral__` function updated
- `__trapezoidal_numeric_integral__` function updated
- Diagrams updated
- Document modified
- Document build system updated
- `AUTHORS.md` updated
- `README.md` modified
- Test system modified
- `Python 3.12` added to `test.yml`
- `Python 3.13` added to `test.yml`
- Warning and error messages updated
- `pycm_util.py` renamed to `utils.py`
- `pycm_test.py` renamed to `basic_test.py`
- `pycm_profile.py` renamed to `profile.py`
- `pycm_param.py` renamed to `params.py`
- `pycm_overall_func.py` renamed to `overall_funcs.py`
- `pycm_output.py` renamed to `output.py`
- `pycm_obj.py` renamed to `cm.py`
- `pycm_multilabel_cm.py` renamed to `multilabel_cm.py`
- `pycm_interpret.py` renamed to `interpret.py`
- `pycm_handler.py` renamed to `handlers.py`
- `pycm_error.py` renamed to `errors.py`
- `pycm_distance.py` renamed to `distance.py`
- `pycm_curve.py` renamed to `curve.py`
- `pycm_compare.py` renamed to `compare.py`
- `pycm_class_func.py` renamed to `class_funcs.py`
- `pycm_ci.py` renamed to `ci.py`
## [4.0] - 2023-06-07
### Added
- `pycmMultiLabelError` class
- `MultiLabelCM` class
- `get_cm_by_class` method
- `get_cm_by_sample` method
- `__mlcm_vector_handler__` function
- `__mlcm_assign_classes__` function
- `__mlcm_vectors_filter__` function
- `__set_to_multihot__` function
- `deprecated` function
### Changed
- Document modified
- `README.md` modified
- Example-4 modified
- Test system modified
- Python 3.5 support dropped
## [3.9] - 2023-05-01
### Added
- `OVERALL_PARAMS` dictionary
- `__imbalancement_handler__` function
- `vector_serializer` function
- NPV micro/macro
- `log_loss` method
- 23 new distance/similarity
    1. Dennis
    2. Digby
    3. Dispersion
    4. Doolittle
    5. Eyraud
    6. Fager & McGowan
    7. Faith
    8. Fleiss-Levin-Paik
    9. Forbes I
    10. Forbes II
    11. Fossum
    12. Gilbert & Wells
    13. Goodall
    14. Goodman & Kruskal's Lambda
    15. Goodman & Kruskal Lambda-r
    16. Guttman's Lambda A
    17. Guttman's Lambda B
    18. Hamann
    19. Harris & Lahey
    20. Hawkins & Dotson
    21. Kendall's Tau
    22. Kent & Foster I
    23. Kent & Foster II
### Changed
- `metrics_off` parameter added to ConfusionMatrix `__init__` method
- `CLASS_PARAMS` changed to a dictionary
- Code style modified
- `sort` parameter added to `relabel` method
- Document modified
- `CONTRIBUTING.md` updated
- `codecov` removed from `dev-requirements.txt`
- Test system modified
## [3.8] - 2023-02-01
### Added
- `distance` method
- `__contains__` method
- `__getitem__` method
- Goodman-Kruskal's Lambda A benchmark
- Goodman-Kruskal's Lambda B benchmark
- Krippendorff's Alpha benchmark
- Pearson's C benchmark
- 30 new distance/similarity
1. AMPLE
2. Anderberg's D
3. Andres & Marzo's Delta
4. Baroni-Urbani & Buser I
5. Baroni-Urbani & Buser II
6. Batagelj & Bren
7. Baulieu I
8. Baulieu II
9. Baulieu III
10. Baulieu IV
11. Baulieu V
12. Baulieu VI
13. Baulieu VII
14. Baulieu VIII
15. Baulieu IX
16. Baulieu X
17. Baulieu XI
18. Baulieu XII
19. Baulieu XIII
20. Baulieu XIV
21. Baulieu XV
22. Benini I
23. Benini II
24. Canberra
25. Clement
26. Consonni & Todeschini I
27. Consonni & Todeschini II
28. Consonni & Todeschini III
29. Consonni & Todeschini IV
30. Consonni & Todeschini V
### Changed
- `relabel` method sort bug fixed
- `README.md` modified
- `Compare` overall benchmarks default weights updated
- Document modified
- Test system modified
## [3.7] - 2022-12-15
### Added
- `Curve` class
- `ROCCurve` class
- `PRCurve` class
- `pycmCurveError` class
### Changed
- `CONTRIBUTING.md` updated
- `matrix_params_calc` function optimized
- `README.md` modified
- Document modified
- Test system modified
- `Python 3.11` added to `test.yml`
## [3.6] - 2022-08-17
### Added
- Hamming distance
- Braun-Blanquet similarity
### Changed
- `classes` parameter added to `matrix_params_from_table` function
- Matrices with `numpy.integer` elements are now accepted
- Arrays added to `matrix` parameter accepting formats
- Website changed to [http://www.pycm.io](http://www.pycm.io)
- Document modified
- `README.md` modified
## [3.5] - 2022-04-27
### Added
- Anaconda workflow
- Custom iterating setting
- Custom casting setting
### Changed
- `plot` method updated
- `class_statistics` function modified
- `overall_statistics` function modified
- `BCD_calc` function modified
- `CONTRIBUTING.md` updated
- `CODE_OF_CONDUCT.md` updated
- Document modified
## [3.4] - 2022-01-26
### Added
- Colab badge
- Discord badge
- `brier_score` method
### Changed
- `J (Jaccard index)` section in `Document.ipynb` updated
- `save_obj` method updated
- `Python 3.10` added to `test.yml`
- Example-3 updated
- Docstrings of the functions updated
- `CONTRIBUTING.md` updated
## [3.3] - 2021-10-27
### Added
- `__compare_weight_handler__` function
### Changed
- `is_imbalanced` parameter added to ConfusionMatrix `__init__` method
- `class_benchmark_weight` and `overall_benchmark_weight` parameters added to Compare `__init__` method
- `statistic_recommend` function modified
- Compare `weight` parameter renamed to `class_weight`
- Document modified
- License updated
- `AUTHORS.md` updated
- `README.md` modified
- Block diagrams updated
## [3.2] - 2021-08-11
### Added
- `classes_filter` function
### Changed
- `classes` parameter added to `matrix_params_calc` function
- `classes` parameter added to `__obj_vector_handler__` function
- `classes` parameter added to ConfusionMatrix `__init__` method
- `name` parameter removed from `html_init` function
- `shortener` parameter added to `html_table` function
- `shortener` parameter added to `save_html` method
- Document modified
- HTML report modified
## [3.1] - 2021-03-11
### Added
- `requirements-splitter.py`
- `sensitivity_index` method
### Changed
- Test system modified
- `overall_statistics` function modified
- HTML report modified
- Document modified
- References format updated
- `CONTRIBUTING.md` updated
## [3.0] - 2020-10-26
### Added
- `plot_test.py`
- `axes_gen` function
- `add_number_label` function
- `plot` method
- `combine` method
- `matrix_combine` function
### Changed
- Document modified
- `README.md` modified
- Example-2 deprecated
- Example-7 deprecated
- Error messages modified
## [2.9] - 2020-09-23
### Added
- `notebook_check.py`
- `to_array` method
- `__copy__` method
- `copy` method
### Changed
- `average` method refactored
## [2.8] - 2020-07-09
### Added
- `label_map` attribute
- `positions` attribute
- `position` method
- Krippendorff's Alpha
- Aickin's Alpha
- `weighted_alpha` method
### Changed
- Single class bug fixed
- `CLASS_NUMBER_ERROR` error type changed to `pycmMatrixError`
- `relabel` method bug fixed
- Document modified
- `README.md` modified
## [2.7] - 2020-05-11
### Added
- `average` method
- `weighted_average` method
- `weighted_kappa` method
- `pycmAverageError` class
- Bangdiwala's B
- MATLAB examples
- GitHub action
### Changed
- Document modified
- `README.md` modified
- `relabel` method bug fixed
- `sparse_table_print` function bug fixed
- `matrix_check` function bug fixed
- Minor bug in `Compare` class fixed
- Class names mismatch bug fixed
## [2.6] - 2020-03-25
### Added
- `custom_rounder` function
- `complement` function
- `sparse_matrix` attribute
- `sparse_normalized_matrix` attribute
- Net benefit (NB)
- Yule's Q interpretation (QI)
- Adjusted Rand index (ARI)
- TNR micro/macro
- FPR micro/macro
- FNR micro/macro
### Changed
- `sparse` parameter added to `print_matrix`,`print_normalized_matrix` and `save_stat` methods
- `header` parameter added to `save_csv` method
- Handler functions moved to `pycm_handler.py`
- Error objects moved to `pycm_error.py`
- Verified tests references updated
- Verified tests moved to `verified_test.py`
- Test system modified
- `CONTRIBUTING.md` updated
- Namespace optimized
- `README.md` modified
- Document modified
- `print_normalized_matrix` method modified
- `normalized_table_calc` function modified
- `setup.py` modified
- summary mode updated
- Dockerfile updated
- `Python 3.8` added to `.travis.yaml` and `appveyor.yml`
### Removed
- `PC_PI_calc` function
## [2.5] - 2019-10-16
### Added
- `__version__` variable
- Individual classification success index (ICSI)
- Classification success index (CSI)
- Example-8 (Confidence interval)
- `install.sh`
- `autopep8.sh`
- Dockerfile
- `CI` method (supported statistics : `ACC`,`AUC`,`Overall ACC`,`Kappa`,`TPR`,`TNR`,`PPV`,`NPV`,`PLR`,`NLR`,`PRE`)
### Changed
- `test.sh` moved to `.travis` folder
- Python 3.4 support dropped
- Python 2.7 support dropped
- `AUTHORS.md` updated
- `save_stat`,`save_csv` and `save_html` methods Non-ASCII character bug fixed
- Mixed type input vectors bug fixed
- `CONTRIBUTING.md` updated
- Example-3 updated
- `README.md` modified
- Document modified
- `CI` attribute renamed to `CI95`
- `kappa_se_calc` function renamed to `kappa_SE_calc`
- `se_calc` function modified and renamed to `SE_calc`
- CI/SE functions moved to `pycm_ci.py`
- Minor bug in `save_html` method fixed
## [2.4] - 2019-07-31
### Added
- Tversky index (TI)
- Area under the PR curve (AUPR)
- `FUNDING.yml`
### Changed
- `AUC_calc` function modified
- Document modified
- `summary` parameter added to `save_html`,`save_stat`,`save_csv` and `stat` methods
- `sample_weight` bug in `numpy` array format fixed
- Inputs manipulation bug fixed
- Test system modified
- Warning system modified
- `alt_link` parameter added to `save_html` method and `online_help` function
- `Compare` class tests moved to `compare_test.py`
- Warning tests moved to `warning_test.py`
## [2.3] - 2019-06-27
### Added
- Adjusted F-score (AGF)
- Overlap coefficient (OC)
- Otsuka-Ochiai coefficient (OOC)
### Changed
- `save_stat` and `save_vector` parameters added to `save_obj` method
- Document modified
- `README.md` modified
- Parameters recommendation for imbalance dataset modified
- Minor bug in `Compare` class fixed
- `pycm_help` function modified
- Benchmarks color modified
## [2.2] - 2019-05-30
### Added
- Negative likelihood ratio interpretation (NLRI)
- Cramer's benchmark (SOA5)
- Matthews correlation coefficient interpretation (MCCI)
- Matthews's benchmark (SOA6)
- F1 macro
- F1 micro
- Accuracy macro
### Changed
- `Compare` class score calculation modified
- Parameters recommendation for multi-class dataset modified
- Parameters recommendation for imbalance dataset modified
- `README.md` modified
- Document modified
- Logo updated
## [2.1] - 2019-05-06
### Added
- Adjusted geometric mean (AGM)
- Yule's Q (Q)
- `Compare` class and parameters recommendation system block diagrams
### Changed
- Document links bug fixed
- Document modified
## [2.0] - 2019-04-15
### Added
- G-Mean (GM)
- Index of balanced accuracy (IBA)
- Optimized precision (OP)
- Pearson's C (C)
- `Compare` class
- Parameters recommendation warning
- `ConfusionMatrix` equal method
### Changed
- Document modified
- `stat_print` function bug fixed
- `table_print` function bug fixed
- `Beta` parameter renamed to `beta` (`F_calc` function & `F_beta` method)
- Parameters recommendation for imbalance dataset modified
- `normalize` parameter added to `save_html` method
- `pycm_func.py` split into `pycm_class_func.py` and `pycm_overall_func.py`
- `vector_filter`,`vector_check`,`class_check` and `matrix_check` functions moved to `pycm_util.py`
- `RACC_calc` and `RACCU_calc` functions exception handler modified
- Docstrings modified
## [1.9] - 2019-02-25
### Added
- Automatic/Manual (AM)
- Bray-Curtis dissimilarity (BCD)
- `CODE_OF_CONDUCT.md`
- `ISSUE_TEMPLATE.md`
- `PULL_REQUEST_TEMPLATE.md`
- `CONTRIBUTING.md`
- X11 color names support for `save_html` method
- Parameters recommendation system
- Warning message for high dimension matrix print
- Interactive notebooks section (binder)
### Changed
- `save_matrix` and `normalize` parameters added to `save_csv` method
- `README.md` modified
- Document modified
- `ConfusionMatrix.__init__` optimized
- Document and examples output files moved to different folders
- Test system modified
- `relabel` method bug fixed
## [1.8] - 2019-01-05
### Added
- Lift score (LS)
- `version_check.py`
### Changed
- `color` parameter added to `save_html` method
- Error messages modified
- Document modified
- Website changed to [http://www.pycm.ir](http://www.pycm.ir)
- Interpretation functions moved to `pycm_interpret.py`
- Utility functions moved to `pycm_util.py`
- Unnecessary `else` and `elif` removed
- `==` changed to `is`
## [1.7] - 2018-12-18
### Added
- Gini index (GI)
- Example-7
- `pycm_profile.py`
### Changed
- `class_name` parameter added to `stat`,`save_stat`,`save_csv` and `save_html` methods
- `overall_param` and `class_param` parameters empty list bug fixed
- `matrix_params_calc`, `matrix_params_from_table` and `vector_filter` functions optimized
- `overall_MCC_calc`, `CEN_misclassification_calc` and `convex_combination` functions optimized
- Document modified
## [1.6] - 2018-12-06
### Added
- AUC value interpretation (AUCI)
- Example-6
- Anaconda cloud package
### Changed
- `overall_param` and `class_param` parameters added to `stat`,`save_stat` and `save_html` methods
- `class_param` parameter added to `save_csv` method
- `_` removed from overall statistics names
- `README.md` modified
- Document modified
## [1.5] - 2018-11-26
### Added
- Relative classifier information (RCI)
- Discriminator power (DP)
- Youden's index (Y)
- Discriminant power interpretation (DPI)
- Positive likelihood ratio interpretation (PLRI)
- `__len__` method
- `relabel` method
- `__class_stat_init__` function
- `__overall_stat_init__` function
- `matrix` attribute as dict
- `normalized_matrix` attribute as dict
- `normalized_table` attribute as dict
### Changed
- `README.md` modified
- Document modified
- `LR+` renamed to `PLR`
- `LR-` renamed to `NLR`
- `normalized_matrix` method renamed to `print_normalized_matrix`
- `matrix` method renamed to `print_matrix`
- `entropy_calc` fixed
- `cross_entropy_calc` fixed
- `conditional_entropy_calc` fixed
- `print_table` bug for large numbers fixed
- JSON key bug in `save_obj` fixed
- `transpose` bug in `save_obj` fixed
- `Python 3.7` added to `.travis.yml` and `appveyor.yml`
## [1.4] - 2018-11-12
### Added
- Area under curve (AUC)
- AUNU
- AUNP
- Class balance accuracy (CBA)
- Global performance index (RR)
- Overall MCC
- Distance index (dInd)
- Similarity index (sInd)
- `one_vs_all`
- `dev-requirements.txt`
### Changed
- `README.md` modified
- Document modified
- `save_stat` modified
- `requirements.txt` modified
## [1.3] - 2018-10-10
### Added
- Confusion entropy (CEN)
- Overall confusion entropy (Overall CEN)
- Modified confusion entropy (MCEN)
- Overall modified confusion entropy (Overall MCEN)
- Information score (IS)
### Changed
- `README.md` modified
## [1.2] - 2018-10-01
### Added
- No information rate (NIR)
- P-Value
- `sample_weight`
- `transpose`
### Changed
- `README.md` modified
- Key error in some parameters fixed
- `OSX` env added to `.travis.yml`
## [1.1] - 2018-09-08
### Added
- Zero-one loss
- Support
- `online_help` function
### Changed
- `README.md` modified
- `html_table` function modified
- `table_print` function modified
- `normalized_table_print` function modified
## [1.0] - 2018-08-30
### Added
- Hamming loss
### Changed
- `README.md` modified
## [0.9.5] - 2018-07-08
### Added
- Obj load
- Obj save
- Example-4
### Changed
- `README.md` modified
- Block diagram updated
## [0.9] - 2018-06-28
### Added
- Activation threshold
- Example-3
- Jaccard index
- Overall Jaccard index
### Changed
- `README.md` modified
- `setup.py` modified
## [0.8.6] - 2018-05-31
### Added
- Example section in document
- Python 2.7 CI
- JOSS paper pdf
### Changed
- Cite section
- ConfusionMatrix docstring
- `round` function changed to `numpy.around`
- `README.md` modified
## [0.8.5] - 2018-05-21
### Added
- Example-1 (Comparison of three different classifiers)
- Example-2 (How to plot via matplotlib)
- JOSS paper
- ConfusionMatrix docstring
### Changed
- Table size in HTML report
- Test system
- `README.md` modified
## [0.8.1] - 2018-03-22
### Added
- Goodman and Kruskal's lambda B
- Goodman and Kruskal's lambda A
- Cross entropy
- Conditional entropy
- Joint entropy
- Reference entropy
- Response entropy
- Kullback-Leibler divergence
- Direct ConfusionMatrix
- Kappa unbiased
- Kappa no prevalence
- Random accuracy unbiased
- `pycmVectorError` class
- `pycmMatrixError` class
- Mutual information
- Support `numpy` arrays
### Changed
- Notebook file updated
### Removed
- `pycmError` class
## [0.7] - 2018-02-26
### Added
- Cramer's V
- 95% confidence interval
- Chi-Squared
- Phi-Squared
- Chi-Squared DF
- Standard error
- Kappa standard error
- Kappa 95% confidence interval
- Cicchetti benchmark
### Changed
- Overall statistics color in HTML report
- Parameters description link in HTML report
## [0.6] - 2018-02-21
### Added
- CSV report
- Changelog
- Output files
- `digit` parameter to `ConfusionMatrix` object
### Changed
- Confusion matrix color in HTML report
- Parameters description link in HTML report
- Capitalize descriptions
## [0.5] - 2018-02-17
### Added
- Scott's pi
- Gwet's AC1
- Bennett S score
- HTML report
## [0.4] - 2018-02-05
### Added
- TPR micro/macro
- PPV micro/macro
- Overall RACC
- Error rate (ERR)
- FBeta score
- F0.5
- F2
- Fleiss benchmark
- Altman benchmark
- Output file(.pycm)
### Changed
- Class with zero item
- Normalized matrix
### Removed
- Kappa and SOA for each class
## [0.3] - 2018-01-27
### Added
- Kappa
- Random accuracy
- Landis and Koch benchmark
- `overall_stat`
## [0.2] - 2018-01-24
### Added
- Population
- Condition positive
- Condition negative
- Test outcome positive
- Test outcome negative
- Prevalence
- G-measure
- Matrix method
- Normalized matrix method
- Params method
### Changed
- `statistic_result` to `class_stat`
- `params` to `stat`
## [0.1] - 2018-01-22
### Added
- ACC
- BM
- DOR
- F1-Score
- FDR
- FNR
- FOR
- FPR
- LR+
- LR-
- MCC
- MK
- NPV
- PPV
- TNR
- TPR
- documents and `README.md`
[Unreleased]: https://github.com/sepandhaghighi/pycm/compare/v4.2...dev
[4.2]: https://github.com/sepandhaghighi/pycm/compare/v4.1...v4.2
[4.1]: https://github.com/sepandhaghighi/pycm/compare/v4.0...v4.1
[4.0]: https://github.com/sepandhaghighi/pycm/compare/v3.9...v4.0
[3.9]: https://github.com/sepandhaghighi/pycm/compare/v3.8...v3.9
[3.8]: https://github.com/sepandhaghighi/pycm/compare/v3.7...v3.8
[3.7]: https://github.com/sepandhaghighi/pycm/compare/v3.6...v3.7
[3.6]: https://github.com/sepandhaghighi/pycm/compare/v3.5...v3.6
[3.5]: https://github.com/sepandhaghighi/pycm/compare/v3.4...v3.5
[3.4]: https://github.com/sepandhaghighi/pycm/compare/v3.3...v3.4
[3.3]: https://github.com/sepandhaghighi/pycm/compare/v3.2...v3.3
[3.2]: https://github.com/sepandhaghighi/pycm/compare/v3.1...v3.2
[3.1]: https://github.com/sepandhaghighi/pycm/compare/v3.0...v3.1
[3.0]: https://github.com/sepandhaghighi/pycm/compare/v2.9...v3.0
[2.9]: https://github.com/sepandhaghighi/pycm/compare/v2.8...v2.9
[2.8]: https://github.com/sepandhaghighi/pycm/compare/v2.7...v2.8
[2.7]: https://github.com/sepandhaghighi/pycm/compare/v2.6...v2.7
[2.6]: https://github.com/sepandhaghighi/pycm/compare/v2.5...v2.6
[2.5]: https://github.com/sepandhaghighi/pycm/compare/v2.4...v2.5
[2.4]: https://github.com/sepandhaghighi/pycm/compare/v2.3...v2.4
[2.3]: https://github.com/sepandhaghighi/pycm/compare/v2.2...v2.3
[2.2]: https://github.com/sepandhaghighi/pycm/compare/v2.1...v2.2
[2.1]: https://github.com/sepandhaghighi/pycm/compare/v2.0...v2.1
[2.0]: https://github.com/sepandhaghighi/pycm/compare/v1.9...v2.0
[1.9]: https://github.com/sepandhaghighi/pycm/compare/v1.8...v1.9
[1.8]: https://github.com/sepandhaghighi/pycm/compare/v1.7...v1.8
[1.7]: https://github.com/sepandhaghighi/pycm/compare/v1.6...v1.7
[1.6]: https://github.com/sepandhaghighi/pycm/compare/v1.5...v1.6
[1.5]: https://github.com/sepandhaghighi/pycm/compare/v1.4...v1.5
[1.4]: https://github.com/sepandhaghighi/pycm/compare/v1.3...v1.4
[1.3]: https://github.com/sepandhaghighi/pycm/compare/v1.2...v1.3
[1.2]: https://github.com/sepandhaghighi/pycm/compare/v1.1...v1.2
[1.1]: https://github.com/sepandhaghighi/pycm/compare/v1.0...v1.1
[1.0]: https://github.com/sepandhaghighi/pycm/compare/v0.9.5...v1.0
[0.9.5]: https://github.com/sepandhaghighi/pycm/compare/v0.9...v0.9.5
[0.9]: https://github.com/sepandhaghighi/pycm/compare/v0.8.6...v0.9
[0.8.6]: https://github.com/sepandhaghighi/pycm/compare/v0.8.5...v0.8.6
[0.8.5]: https://github.com/sepandhaghighi/pycm/compare/v0.8.1...v0.8.5
[0.8.1]: https://github.com/sepandhaghighi/pycm/compare/v0.7...v0.8.1
[0.7]: https://github.com/sepandhaghighi/pycm/compare/v0.6...v0.7
[0.6]: https://github.com/sepandhaghighi/pycm/compare/v0.5...v0.6
[0.5]: https://github.com/sepandhaghighi/pycm/compare/v0.4...v0.5
[0.4]: https://github.com/sepandhaghighi/pycm/compare/v0.3...v0.4
[0.3]: https://github.com/sepandhaghighi/pycm/compare/v0.2...v0.3
[0.2]: https://github.com/sepandhaghighi/pycm/compare/v0.1...v0.2
[0.1]: https://github.com/sepandhaghighi/pycm/compare/1e238cd...v0.1
pycm-4.2/CITATION.cff 0000664 0000000 0000000 00000003204 14741064252 0014225 0 ustar 00root root 0000000 0000000 cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "pycm"
abstract: "PyCM is a multi-class confusion matrix library written in Python that supports both input data vectors and direct matrix, and a proper tool for post-classification model evaluation that supports most classes and overall statistics parameters. PyCM is the swiss-army knife of confusion matrices, targeted mainly at data scientists that need a broad array of metrics for predictive models and accurate evaluation of a large variety of classifiers."
authors:
- family-names: "Haghighi"
given-names: "Sepand"
- family-names: "Zolanvari"
given-names: "Alireza"
- family-names: "Sabouri"
given-names: "Sadra"
version: 3.3
date-released: 2021-10-27
repository-code: "https://github.com/sepandhaghighi/pycm"
url: "https://www.pycm.io"
license: MIT
keywords:
- "confusion matrix"
- "python"
- "F-score"
- "Accuracy"
preferred-citation:
type: article
authors:
- family-names: "Haghighi"
given-names: "Sepand"
orcid: "https://orcid.org/0000-0001-9450-2375"
- family-names: "Jasemi"
given-names: "Masoomeh"
orcid: "https://orcid.org/0000-0002-4831-1698"
- family-names: "Hessabi"
given-names: "Shaahin"
orcid: "https://orcid.org/0000-0003-3193-2567"
- family-names: "Zolanvari"
given-names: "Alireza"
orcid: "https://orcid.org/0000-0003-2367-8343"
doi: "10.21105/joss.00729"
journal: "Journal of Open Source Software"
month: 5
start: 729 # First page number
end: 729 # Last page number
title: "PyCM: Multiclass confusion matrix library in Python"
issue: 25
volume: 3
year: 2018
pycm-4.2/Document/ 0000775 0000000 0000000 00000000000 14741064252 0014112 5 ustar 00root root 0000000 0000000 pycm-4.2/Document/Distance.ipynb 0000664 0000000 0000000 00000226217 14741064252 0016721 0 ustar 00root root 0000000 0000000 {
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"2- V. Dallmeier, C. Lindig, and A. Zeller, \"Lightweight defect localization for Java,\" in European Conference on Object-Oriented Programming, 2005: Springer, pp. 528-550.\n",
"\n",
"3- R. Abreu, P. Zoeteweij, and A. J. Van Gemund, \"An evaluation of similarity coefficients for software fault localization,\" in 2006 12th Pacific Rim International Symposium on Dependable Computing (PRDC'06), 2006: IEEE, pp. 39-46.\n",
"\n",
"4- M. R. Anderberg, Cluster analysis for applications: probability and mathematical statistics: a series of monographs and textbooks. Academic Press, 2014.\n",
"\n",
"5- A. M. Andrés and P. F. Marzo, \"Delta: A new measure of agreement between two raters,\" British Journal of Mathematical and Statistical Psychology, vol. 57, no. 1, pp. 1-19, 2004.\n",
"\n",
"6- C. Baroni-Urbani and M. W. Buser, \"Similarity of binary data,\" Systematic Zoology, vol. 25, no. 3, pp. 251-259, 1976.\n",
"\n",
"7- V. Batagelj and M. Bren, \"Comparing resemblance measures,\" Journal of Classification, vol. 12, no. 1, pp. 73-90, 1995.\n",
"\n",
"8- F. B. Baulieu, \"A classification of presence/absence based dissimilarity coefficients,\" Journal of Classification, vol. 6, no. 1, pp. 233-246, 1989.\n",
"\n",
"9- F. B. Baulieu, \"Two variant axiom systems for presence/absence based dissimilarity coefficients,\" Journal of Classification, vol. 14, no. 1, pp. 159-170, 1997.\n",
"\n",
"10- R. Benini, Principii di demografia. Barbera, 1901.\n",
"\n",
"11- G. N. Lance and W. T. Williams, \"Computer programs for hierarchical polythetic classification (“similarity analyses”),\" The Computer Journal, vol. 9, no. 1, pp. 60-64, 1966.\n",
"\n",
"12- G. N. Lance and W. T. Williams, \"Mixed-Data Classificatory Programs I - Agglomerative Systems,\" Australian Computer Journal, vol. 1, no. 1, pp. 15-20, 1967.\n",
"\n",
"13- P. W. Clement, \"A formula for computing inter-observer agreement,\" Psychological Reports, vol. 39, no. 1, pp. 257-258, 1976.\n",
"\n",
"14- V. Consonni and R. Todeschini, \"New similarity coefficients for binary data,\" Match-Communications in Mathematical and Computer Chemistry, vol. 68, no. 2, p. 581, 2012.\n",
"\n",
"15- S. F. Dennis, \"The Construction of a Thesaurus Automatically From,\" in Statistical Association Methods for Mechanized Documentation: Symposium Proceedings, 1965, vol. 269: US Government Printing Office, p. 61.\n",
"\n",
"16- P. G. Digby, \"Approximating the tetrachoric correlation coefficient,\" Biometrics, pp. 753-757, 1983.\n",
"\n",
"17- IBM Corp, \"IBM SPSS Statistics Algorithms,\" ed: IBM Corp Armonk, NY, USA, 2017.\n",
"\n",
"18- M. H. Doolittle, \"The verification of predictions,\" Bulletin of the Philosophical Society of Washington, vol. 7, pp. 122-127, 1885.\n",
"\n",
"19- H. Eyraud, \"Les principes de la mesure des correlations,\" Ann. Univ. Lyon, III. Ser., Sect. A, vol. 1, no. 30-47, p. 111, 1936.\n",
"\n",
"20- E. W. Fager, \"Determination and analysis of recurrent groups,\" Ecology, vol. 38, no. 4, pp. 586-595, 1957.\n",
"\n",
"21- E. W. Fager and J. A. McGowan, \"Zooplankton Species Groups in the North Pacific: Co-occurrences of species can be used to derive groups whose members react similarly to water-mass types,\" Science, vol. 140, no. 3566, pp. 453-460, 1963.\n",
"\n",
"22- D. P. Faith, \"Asymmetric binary similarity measures,\" Oecologia, vol. 57, pp. 287-290, 1983.\n",
"\n",
"23- J. L. Fleiss, B. Levin, and M. C. Paik, Statistical methods for rates and proportions. John Wiley & Sons, 2013.\n",
"\n",
"24- S. A. Forbes, On the local distribution of certain Illinois fishes: an essay in statistical ecology. Illinois State Laboratory of Natural History, 1907.\n",
"\n",
"25- A. Mozley, \"The statistical analysis of the distribution of pond molluscs in western Canada,\" The American Naturalist, vol. 70, no. 728, pp. 237-244, 1936.\n",
"\n",
"26- S. A. Forbes, \"Method of determining and measuring the associative relations of species,\" Science, vol. 61, no. 1585, pp. 518-524, 1925.\n",
"\n",
"27- E. G. Fossum and G. Kaskey, \"Optimization and standardization of information retrieval language and systems,\" SPERRY RAND CORP PHILADELPHIA PA UNIVAC DIV, 1966.\n",
"\n",
"28- N. Gilbert and T. C. Wells, \"Analysis of quadrat data,\" The Journal of Ecology, pp. 675-685, 1966.\n",
"\n",
"29- D. W. Goodall, \"The distribution of the matching coefficient,\" Biometrics, pp. 647-656, 1967.\n",
"\n",
"30- B. Austin and R. R. Colwell, \"Evaluation of some coefficients for use in numerical taxonomy of microorganisms,\" International Journal of Systematic and Evolutionary Microbiology, vol. 27, no. 3, pp. 204-210, 1977.\n",
"\n",
"31- L. A. Goodman and W. H. Kruskal, Measures of association for cross classifications. Springer, 1979.\n",
"\n",
"32- L. Guttman, \"An outline of the statistical theory of prediction,\" The prediction of personal adjustment, vol. 48, pp. 253-318, 1941.\n",
"\n",
"33- U. Hamann, \"Merkmalsbestand und Verwandtschaftsbeziehungen der Farinosae: Ein Beitrag zum System der Monokotyledonen,\" Willdenowia, pp. 639-768, 1961.\n",
"\n",
"34- F. C. Harris and B. B. Lahey, \"A method for combining occurrence and nonoccurrence interobserver agreement scores,\" Journal of Applied Behavior Analysis, vol. 11, no. 4, pp. 523-527, 1978.\n",
"\n",
"35- R. P. Hawkins and V. A. Dotson, \"Reliability Scores That Delude: An Alice in Wonderland Trip Through the Misleading Characteristics of Inter-Observer Agreement Scores in Interval Recording,\" 1973.\n",
"\n",
"36- M. G. Kendall, \"A new measure of rank correlation,\" Biometrika, vol. 30, no. 1/2, pp. 81-93, 1938.\n",
"\n",
"37- R. N. Kent and S. L. Foster, \"Direct observational procedures: Methodological issues in naturalistic settings,\" Handbook of behavioral assessment, pp. 279-328, 1977.\n",
"\n",
"38- W. Köppen, \"In Repertorium für Meteorologie,\" Akademiia Nauk, pp. 189-238, 1870.\n",
"\n",
"39- G. F. Kuder and M. W. Richardson, \"The theory of the estimation of test reliability,\" Psychometrika, pp. 151-160, 1937.\n",
"\n",
"40- J. L. Kuhns, \"Statistical Association Methods for Mechanized Documentation,\" National Bureau of Standards Miscellaneous Publication, pp. 33-40, 1964.\n",
"PyCM is a multi-class confusion matrix library written in Python that supports both input data vectors and direct matrix, and a proper tool for post-classification model evaluation that supports most classes and overall statistics parameters.\n",
"PyCM is the swiss-army knife of confusion matrices, targeted mainly at data scientists that need a broad array of metrics for predictive models and accurate evaluation of a large variety of classifiers."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Parameter recommender"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This option was added in `version 1.9` to recommend the most relevant parameters given the characteristics of the input dataset. The suggested parameters are selected according to characteristics of the input such as being balanced/imbalanced and binary/multi-class. All suggestions fall into three main groups: imbalanced dataset, binary classification for a balanced dataset, and multi-class classification for a balanced dataset. The recommendation lists were compiled from the respective paper of each parameter and the capabilities claimed in that paper."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Fig2. Parameter Recommender Block Diagram"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To determine whether the dataset is imbalanced, we use the following strategy:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$R=\\frac{Max(P)}{Min(P)}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$State=\\begin{cases}Balance & R\\leq 3\\\\Imbalance & R > 3\\end{cases}$$"
]
},
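{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a worked illustration with hypothetical numbers (not taken from the example used in this document), assume that $P$ holds the number of actual samples in each class and that three classes contain 10, 20 and 50 samples:\n",
"\n",
"$$R=\\frac{Max(P)}{Min(P)}=\\frac{50}{10}=5>3\\Rightarrow State=Imbalance$$\n",
"\n",
"With 10, 20 and 30 samples instead, $R=3$ and the rule gives $Balance$, because the threshold is inclusive on the balanced side."
]
},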
{
"cell_type": "code",
"execution_count": 72,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 72,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.imbalance"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 73,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.binary"
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['ERR',\n",
" 'TPR Micro',\n",
" 'TPR Macro',\n",
" 'F1 Macro',\n",
" 'PPV Macro',\n",
" 'NPV Macro',\n",
" 'ACC',\n",
" 'Overall ACC',\n",
" 'MCC',\n",
" 'MCCI',\n",
" 'Overall MCC',\n",
" 'SOA6(Matthews)',\n",
" 'BCD',\n",
" 'Hamming Loss',\n",
" 'Zero-one Loss']"
]
},
"execution_count": 74,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.recommended_list"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `is_imbalanced` parameter was added in `version 3.3` so that the user can indicate whether the dataset is imbalanced. If the user does not provide this information, the automatic detection algorithm is used."
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 75,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm4 = ConfusionMatrix(y_actu, y_pred, is_imbalanced=True)\n",
"cm4.imbalance"
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 76,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm4 = ConfusionMatrix(y_actu, y_pred, is_imbalanced=False)\n",
"cm4.imbalance"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice : The recommender system assumes that the input is the result of classification over the whole data rather than just a part of it. If the confusion matrix is the result of test data classification, the recommendation is not valid."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Compare"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In `version 2.0`, a method for comparing several confusion matrices was introduced. This option combines several overall and class-based benchmarks. Each benchmark rates the performance of the classification algorithm from good to poor and assigns it a numeric score. The scores of good and poor performance are 1 and 0, respectively.\n",
"\n",
"After that, two scores are calculated for each confusion matrix: overall and class-based. The overall score is the average of the scores of seven overall benchmarks: Landis & Koch, Cramer, Matthews, Goodman-Kruskal's Lambda A, Goodman-Kruskal's Lambda B, Krippendorff's Alpha, and Pearson's C. In the same manner, the class-based score is the average of the scores of six class-based benchmarks: Positive Likelihood Ratio Interpretation, Negative Likelihood Ratio Interpretation, Discriminant Power Interpretation, AUC value Interpretation, Matthews Correlation Coefficient Interpretation, and Yule's Q Interpretation. Note that if one of the benchmarks returns none for one of the classes, that benchmark is excluded from the averaging. If the user sets weights for the classes, the average over the class-based benchmark scores becomes a weighted average.\n",
"\n",
"If the user sets the `by_class` boolean input to `True`, the best confusion matrix is the one with the maximum class-based score. Otherwise, a confusion matrix is reported as the best only if it obtains the maximum of both the overall and class-based scores; in any other case, the `Compare` object does not select a best confusion matrix."
]
},
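{
"cell_type": "markdown",
"metadata": {},
"source": [
"In generic notation (the symbols $s_i$, $w_i$ and $n$ are introduced here for illustration and are not PyCM identifiers), the averaging described above can be written as a weighted mean of $n$ benchmark scores $s_i\\in[0,1]$ with non-negative weights $w_i$:\n",
"\n",
"$$Score=\\frac{\\sum_{i=1}^{n}w_i s_i}{\\sum_{i=1}^{n}w_i}$$\n",
"\n",
"With equal weights this reduces to the plain average $\\frac{1}{n}\\sum_{i=1}^{n}s_i$. For example, hypothetical scores of 1, 0.5 and 0 with equal weights give $Score=0.5$. How PyCM normalizes user-supplied weights internally is not specified in this section, so the denominator should be read as an assumption."
]
},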
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice : From `version 3.8`, Goodman-Kruskal's Lambda A, Goodman-Kruskal's Lambda B, Krippendorff's Alpha, and Pearson's C benchmarks are considered in the overall score and default weights of the overall benchmarks are modified accordingly."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ROC curve"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`ROCCurve`, added in `version 3.7`, computes the Receiver Operating Characteristic (ROC) curve. In a ROC curve, the Y axis represents the True Positive Rate (TPR) and the X axis represents the False Positive Rate (FPR). Thus, the ideal point is located at the top left of the plot, and a larger area under the curve represents better performance. The ROC curve is a graphical representation of a binary classifier's performance. In PyCM, `ROCCurve` binarizes the output based on the \"One vs. Rest\" strategy to extend ROC to multi-class classifiers. Given the actual labels vector and the target probability estimates of the positive classes, it computes and plots TPR-FPR pairs for different discrimination thresholds and computes the area under the ROC curve. \n",
"The thresholds for which the TPR-FPR pairs are calculated can be either specified by the user (by setting the `thresholds` input) or calculated automatically. Furthermore, sample weights can be set via the `sample_weight` input; otherwise, they are assumed to be 1. `ROCCurve` has two methods named `area()` and `plot()`. `area()` returns the area under the curve, which can be calculated using either the `trapezoidal` (default) or the `midpoint` numerical integration technique. `plot()` plots the curve."
]
},
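{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the `trapezoidal` option named above corresponds, in its standard textbook form, to the following sum over curve points $(x_i, y_i)$ sorted by $x$ (for ROC, $x$ is FPR and $y$ is TPR). This is the generic rule rather than a transcription of PyCM's implementation, and the symbols $x_i$, $y_i$ and $n$ are notation introduced here:\n",
"\n",
"$$Area\\approx\\sum_{i=1}^{n-1}\\frac{(x_{i+1}-x_i)(y_i+y_{i+1})}{2}$$\n",
"\n",
"For the hypothetical points $(0,0)$, $(0,0.5)$, $(0.5,0.5)$, $(0.5,1)$ and $(1,1)$, the four terms are $0$, $0.25$, $0$ and $0.5$, so the area is $0.75$."
]
},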
{
"cell_type": "code",
"execution_count": 91,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.75"
]
},
"execution_count": 91,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from pycm import ROCCurve\n",
"crv = ROCCurve(\n",
" actual_vector=numpy.array([1, 1, 2, 2]),\n",
" probs=numpy.array([[0.1, 0.9], [0.4, 0.6], [0.35, 0.65], [0.8, 0.2]]),\n",
" classes=[2, 1])\n",
"crv.thresholds\n",
"auc_trp = crv.area()\n",
"auc_trp[1]\n",
"auc_trp[2]"
]
},
{
"cell_type": "code",
"execution_count": 92,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 92,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"
"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Precision-Recall curve"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`PRCurve`, added in `version 3.7`, computes the Precision-Recall curve, in which the Y axis represents the Precision and the X axis represents the Recall of a classifier. Thus, the ideal point is located at the top right of the plot, and a larger area under the curve represents better performance. The Precision-Recall curve is a graphical representation of a binary classifier's performance. In PyCM, `PRCurve` binarizes the output based on the \"One vs. Rest\" strategy to extend this curve to multi-class classifiers. Given the actual labels vector and the target probability estimates of the positive classes, it computes and plots Precision-Recall pairs for different discrimination thresholds and computes the area under the curve. \n",
"The thresholds for which the Precision-Recall pairs are calculated can be either specified by the user (by setting the `thresholds` input) or calculated automatically. Furthermore, sample weights can be set via the `sample_weight` input; otherwise, they are assumed to be 1. `PRCurve` has two methods named `area()` and `plot()`. `area()` returns the area under the curve, which can be calculated using either the `trapezoidal` (default) or the `midpoint` numerical integration technique. `plot()` plots the curve."
]
},
{
"cell_type": "code",
"execution_count": 93,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"C:\\Users\\Sepkjaer\\AppData\\Local\\Programs\\Python\\Python35-32\\lib\\site-packages\\pycm-4.2-py3.5.egg\\pycm\\curve.py:382: RuntimeWarning: The curve contains non-numerical value(s).\n"
]
},
{
"data": {
"text/plain": [
"0.29166666666666663"
]
},
"execution_count": 93,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from pycm import PRCurve\n",
"crv = PRCurve(\n",
" actual_vector=numpy.array([1, 1, 2, 2]),\n",
" probs=numpy.array([[0.1, 0.9], [0.4, 0.6], [0.35, 0.65], [0.8, 0.2]]),\n",
" classes=[2, 1])\n",
"crv.thresholds\n",
"auc_trp = crv.area()\n",
"auc_trp[1]\n",
"auc_trp[2]"
]
},
{
"cell_type": "code",
"execution_count": 94,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 94,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"Notice : `overall_benchmark_weight` and `class_benchmark_weight`, new in `version 3.3`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"ROCCurve"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. `actual_vector` : python `list` or numpy `array` of any stringable objects\n",
"2. `probs` : python `list` or numpy `array`\n",
"3. `classes` : python `list`\n",
"4. `thresholds` : python `list` or numpy `array`\n",
"5. `sample_weight` : python `list` or numpy `array`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* run `help(ROCCurve)` for more information"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"PRCurve"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. `actual_vector` : python `list` or numpy `array` of any stringable objects\n",
"2. `probs` : python `list` or numpy `array`\n",
"3. `classes` : python `list`\n",
"4. `thresholds` : python `list` or numpy `array`\n",
"5. `sample_weight` : python `list` or numpy `array`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* run `help(PRCurve)` for more information"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"MultiLabelCM"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. `actual_vector` : python `list` or numpy array of `sets`\n",
"2. `predict_vector` : python `list` or numpy array of `sets`\n",
"3. `sample_weight`: python `list` or numpy `array`\n",
"4. `classes` : python `list`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* run `help(MultiLabelCM)` for more information"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Basic parameters"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### TP (True positive)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A true positive test result is one that detects the condition when the\n",
"condition is present (correctly identified) [[3]](#ref3)."
]
},
{
"cell_type": "code",
"execution_count": 97,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 3, 'L2': 1, 'L3': 3}"
]
},
"execution_count": 97,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.TP"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### TN (True negative)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A true negative test result is one that does not detect the condition when\n",
"the condition is absent (correctly rejected) [[3]](#ref3)."
]
},
{
"cell_type": "code",
"execution_count": 98,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 7, 'L2': 8, 'L3': 4}"
]
},
"execution_count": 98,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.TN"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### FP (False positive)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A false positive test result is one that detects the condition when the\n",
"condition is absent (incorrectly identified) [[3]](#ref3)."
]
},
{
"cell_type": "code",
"execution_count": 99,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 0, 'L2': 2, 'L3': 3}"
]
},
"execution_count": 99,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.FP"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### FN (False negative)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A false negative test result is one that does not detect the condition when\n",
"the condition is present (incorrectly rejected) [[3]](#ref3)."
]
},
{
"cell_type": "code",
"execution_count": 100,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 2, 'L2': 1, 'L3': 2}"
]
},
"execution_count": 100,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.FN"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### P (Condition positive)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Number of positive samples.\n",
"Also known as support (the number of occurrences of each class in y_true) [[3]](#ref3)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$P=TP+FN$$"
]
},
{
"cell_type": "code",
"execution_count": 101,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 5, 'L2': 2, 'L3': 5}"
]
},
"execution_count": 101,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.P"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### N (Condition negative)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Number of negative samples [[3]](#ref3)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$N=TN+FP$$"
]
},
{
"cell_type": "code",
"execution_count": 102,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 7, 'L2': 10, 'L3': 7}"
]
},
"execution_count": 102,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.N"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### TOP (Test outcome positive)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Number of positive outcomes [[3]](#ref3)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$TOP=TP+FP$$"
]
},
{
"cell_type": "code",
"execution_count": 103,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 3, 'L2': 3, 'L3': 6}"
]
},
"execution_count": 103,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.TOP"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### TON (Test outcome negative)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Number of negative outcomes [[3]](#ref3)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$TON=TN+FN$$"
]
},
{
"cell_type": "code",
"execution_count": 104,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 9, 'L2': 9, 'L3': 6}"
]
},
"execution_count": 104,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.TON"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### POP (Population)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Total sample size [[3]](#ref3)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$POP=TP+TN+FN+FP$$"
]
},
{
"cell_type": "code",
"execution_count": 105,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 12, 'L2': 12, 'L3': 12}"
]
},
"execution_count": 105,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.POP"
]
},
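{
"cell_type": "markdown",
"metadata": {},
"source": [
"The derived counts in this section can be checked by hand. The cell below is a minimal sketch, not PyCM's implementation: the four dictionaries are copied from the `cm.TP`, `cm.TN`, `cm.FP` and `cm.FN` outputs above, and `add_counts` is a hypothetical helper name used only for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Basic counts copied from the outputs above\n",
"TP = {'L1': 3, 'L2': 1, 'L3': 3}\n",
"TN = {'L1': 7, 'L2': 8, 'L3': 4}\n",
"FP = {'L1': 0, 'L2': 2, 'L3': 3}\n",
"FN = {'L1': 2, 'L2': 1, 'L3': 2}\n",
"\n",
"def add_counts(a, b):\n",
"    # Hypothetical helper: add two per-class count dictionaries\n",
"    return {label: a[label] + b[label] for label in a}\n",
"\n",
"P = add_counts(TP, FN)    # condition positive\n",
"N = add_counts(TN, FP)    # condition negative\n",
"TOP = add_counts(TP, FP)  # test outcome positive\n",
"TON = add_counts(TN, FN)  # test outcome negative\n",
"POP = add_counts(P, N)    # population\n",
"P, N, TOP, TON, POP"
]
},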
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Class statistics"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### TPR (True positive rate)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sensitivity (also called the true positive rate, the recall, or probability of detection in some fields) measures the proportion of positives that are correctly identified as such (e.g. the percentage of sick people who are correctly identified as having the condition) [[3]](#ref3)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$TPR=\\frac{TP}{P}=\\frac{TP}{TP+FN}$$"
]
},
{
"cell_type": "code",
"execution_count": 106,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 0.6, 'L2': 0.5, 'L3': 0.6}"
]
},
"execution_count": 106,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.TPR"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### TNR (True negative rate)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Specificity (also called the true negative rate) measures the proportion of negatives that are correctly identified as such (e.g. the percentage of healthy people who are correctly identified as not having the condition) [[3]](#ref3)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$TNR=\\frac{TN}{N}=\\frac{TN}{TN+FP}$$"
]
},
{
"cell_type": "code",
"execution_count": 107,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 1.0, 'L2': 0.8, 'L3': 0.5714285714285714}"
]
},
"execution_count": 107,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.TNR"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### PPV (Positive predictive value)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Positive predictive value (PPV) is the proportion of positive test outcomes that correspond to\n",
"the presence of the condition [[3]](#ref3)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$PPV=\\frac{TP}{TP+FP}$$"
]
},
{
"cell_type": "code",
"execution_count": 108,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 1.0, 'L2': 0.3333333333333333, 'L3': 0.5}"
]
},
"execution_count": 108,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.PPV"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### NPV (Negative predictive value)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Negative predictive value (NPV) is the proportion of negative test outcomes that correspond to\n",
"the absence of the condition [[3]](#ref3)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$NPV=\\frac{TN}{TN+FN}$$"
]
},
{
"cell_type": "code",
"execution_count": 109,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 0.7777777777777778, 'L2': 0.8888888888888888, 'L3': 0.6666666666666666}"
]
},
"execution_count": 109,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.NPV"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### FNR (False negative rate)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The false negative rate is the proportion of positives which yield negative test outcomes with the test, i.e., the conditional probability of a negative test result given that the condition being looked for is present [[3]](#ref3)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$FNR=\\frac{FN}{P}=\\frac{FN}{FN+TP}=1-TPR$$"
]
},
{
"cell_type": "code",
"execution_count": 110,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 0.4, 'L2': 0.5, 'L3': 0.4}"
]
},
"execution_count": 110,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.FNR"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### FPR (False positive rate)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The false positive rate is the proportion of all negatives that still yield positive test outcomes, i.e., the conditional probability of a positive test result given an event that was not present [[3]](#ref3).\n",
"\n",
"The false positive rate is equal to the significance level. The specificity of the test is equal to $ 1 $ minus the false positive rate."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$FPR=\\frac{FP}{N}=\\frac{FP}{FP+TN}=1-TNR$$"
]
},
{
"cell_type": "code",
"execution_count": 111,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 0.0, 'L2': 0.19999999999999996, 'L3': 0.4285714285714286}"
]
},
"execution_count": 111,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.FPR"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### FDR (False discovery rate)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The false discovery rate (FDR) is a method of conceptualizing the rate of type I errors in null hypothesis testing when conducting multiple comparisons. FDR-controlling procedures are designed to control the expected proportion of \"discoveries\" (rejected null hypotheses) that are false (incorrect rejections) [[3]](#ref3)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$FDR=\\frac{FP}{FP+TP}=1-PPV$$"
]
},
{
"cell_type": "code",
"execution_count": 112,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 0.0, 'L2': 0.6666666666666667, 'L3': 0.5}"
]
},
"execution_count": 112,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.FDR"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### FOR (False omission rate)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"False omission rate (FOR) is the complement of the negative predictive value. It measures the proportion of negative test outcomes that are false negatives, i.e., cases in which the condition is present but was incorrectly rejected [[3]](#ref3)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$FOR=\\frac{FN}{FN+TN}=1-NPV$$"
]
},
{
"cell_type": "code",
"execution_count": 113,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 0.2222222222222222,\n",
" 'L2': 0.11111111111111116,\n",
" 'L3': 0.33333333333333337}"
]
},
"execution_count": 113,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.FOR"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ACC (Accuracy)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The accuracy is the proportion of correct predictions among all predictions made [[3]](#ref3)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ACC=\\frac{TP+TN}{P+N}=\\frac{TP+TN}{TP+TN+FP+FN}$$"
]
},
{
"cell_type": "code",
"execution_count": 114,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 0.8333333333333334, 'L2': 0.75, 'L3': 0.5833333333333334}"
]
},
"execution_count": 114,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.ACC"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ERR (Error rate)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The error rate is the proportion of incorrect predictions among all predictions made [[3]](#ref3)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ERR=\\frac{FP+FN}{P+N}=\\frac{FP+FN}{TP+TN+FP+FN}=1-ACC$$"
]
},
{
"cell_type": "code",
"execution_count": 115,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 0.16666666666666663, 'L2': 0.25, 'L3': 0.41666666666666663}"
]
},
"execution_count": 115,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.ERR"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Notice : new in version 0.4"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### FBeta-Score"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In statistical analysis of classification, the F1 score (also F-score or F-measure) is a measure of a test's accuracy. It considers both the precision $ p $ and the recall $ r $ of the test to compute the score.\n",
"The F1 score is the harmonic mean of the precision and recall; it reaches its best value at $ 1 $ (perfect precision and recall) and its worst at $ 0 $. The FBeta-Score generalizes it by treating recall as $ \\beta $ times as important as precision [[3]](#ref3)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$F_{\\beta}=(1+\\beta^2)\\times \\frac{PPV\\times TPR}{(\\beta^2 \\times PPV)+TPR}=\\frac{(1+\\beta^2) \\times TP}{(1+\\beta^2)\\times TP+FP+\\beta^2 \\times FN}$$"
]
},
{
"cell_type": "code",
"execution_count": 116,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 0.75, 'L2': 0.4, 'L3': 0.5454545454545454}"
]
},
"execution_count": 116,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.F1"
]
},
{
"cell_type": "code",
"execution_count": 117,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 0.8823529411764706, 'L2': 0.35714285714285715, 'L3': 0.5172413793103449}"
]
},
"execution_count": 117,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.F05"
]
},
{
"cell_type": "code",
"execution_count": 118,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 0.6521739130434783, 'L2': 0.45454545454545453, 'L3': 0.5769230769230769}"
]
},
"execution_count": 118,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.F2"
]
},
{
"cell_type": "code",
"execution_count": 119,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 0.6144578313253012, 'L2': 0.4857142857142857, 'L3': 0.5930232558139535}"
]
},
"execution_count": 119,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.F_beta(beta=4)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Parameters "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. `beta` : beta parameter (type : `float`)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Output"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`{class1: FBeta-Score1, class2: FBeta-Score2, ...}`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Notice : new in version 0.4"
]
},
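{
"cell_type": "markdown",
"metadata": {},
"source": [
"The count-based form of the FBeta-Score formula above can be checked by hand. The following cell is a minimal sketch, not PyCM's implementation: the counts for class `L1` are copied from the `cm.TP`, `cm.FP` and `cm.FN` outputs earlier in this document, and `f_beta` is a hypothetical helper name. Its results should agree with the `L1` entries of `cm.F05`, `cm.F1`, `cm.F2` and `cm.F_beta(beta=4)` up to floating-point rounding."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Counts for class 'L1', copied from the cm.TP, cm.FP and cm.FN outputs\n",
"tp, fp, fn = 3, 0, 2\n",
"\n",
"def f_beta(tp, fp, fn, beta):\n",
"    # Count-based form of the FBeta-Score formula\n",
"    b2 = beta ** 2\n",
"    return (1 + b2) * tp / ((1 + b2) * tp + fp + b2 * fn)\n",
"\n",
"# One score per beta value: 0.5, 1, 2 and 4\n",
"[f_beta(tp, fp, fn, beta) for beta in (0.5, 1, 2, 4)]"
]
},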
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### MCC (Matthews correlation coefficient)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Matthews correlation coefficient is used in machine learning as a measure of the quality of binary (two-class) classifications, introduced by biochemist Brian W. Matthews in 1975. It takes into account true and false positives and negatives and is generally regarded as a balanced measure that can be used even if the classes are of very different sizes. The MCC is, in essence, a correlation coefficient between the observed and predicted binary classifications; it returns a value between $ -1 $ and $ +1 $. A coefficient of $ +1 $ represents a perfect prediction, $ 0 $ no better than random prediction and $ -1 $ indicates total disagreement between prediction and observation [[27]](#ref27)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$MCC=\\frac{TP \\times TN-FP \\times FN}{\\sqrt{(TP+FP)\\times (TP+FN)\\times (TN+FP)\\times (TN+FN)}}$$"
]
},
{
"cell_type": "code",
"execution_count": 120,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 0.6831300510639732, 'L2': 0.25819888974716115, 'L3': 0.1690308509457033}"
]
},
"execution_count": 120,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.MCC"
]
},
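{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a cross-check of the MCC formula, the cell below recomputes the coefficient for each class from its four counts. It is a sketch under stated assumptions: the counts are copied from the outputs in the Basic parameters section, `mcc` is a hypothetical helper name, and returning `None` for a zero denominator is a choice made for this example only."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"\n",
"# Per-class counts as (TP, TN, FP, FN), copied from the outputs above\n",
"counts = {\n",
"    'L1': (3, 7, 0, 2),\n",
"    'L2': (1, 8, 2, 1),\n",
"    'L3': (3, 4, 3, 2),\n",
"}\n",
"\n",
"def mcc(tp, tn, fp, fn):\n",
"    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))\n",
"    # Assumption of this sketch: undefined values are reported as None\n",
"    if denominator == 0:\n",
"        return None\n",
"    return (tp * tn - fp * fn) / denominator\n",
"\n",
"{label: mcc(*values) for label, values in counts.items()}"
]
},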
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### BM (Bookmaker informedness)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The informedness of a prediction method as captured by a contingency matrix is defined as the probability that the prediction method will make a correct decision as opposed to guessing and is calculated using the bookmaker algorithm [[2]](#ref2).\n",
"\n",
"Equal to the Youden index."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$BM=TPR+TNR-1$$"
]
},
{
"cell_type": "code",
"execution_count": 121,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 0.6000000000000001,\n",
" 'L2': 0.30000000000000004,\n",
" 'L3': 0.17142857142857126}"
]
},
"execution_count": 121,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.BM"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### MK (Markedness)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In statistics and psychology, the social science concept of markedness is quantified as a measure of how much one variable is marked as a predictor or possible cause of another and is also known as $ \\triangle P $ in simple two-choice cases [[2]](#ref2)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$MK=PPV+NPV-1$$"
]
},
{
"cell_type": "code",
"execution_count": 122,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 0.7777777777777777, 'L2': 0.2222222222222221, 'L3': 0.16666666666666652}"
]
},
"execution_count": 122,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.MK"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### PLR (Positive likelihood ratio)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Likelihood ratios are used for assessing the value of performing a diagnostic test. They use the sensitivity and specificity of the test to determine whether a test result usefully changes the probability that a condition (such as a disease state) exists. The first description of the use of likelihood ratios for decision rules was made at a symposium on information theory in 1954 [[28]](#ref28)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$LR_+=PLR=\\frac{TPR}{FPR}$$"
]
},
{
"cell_type": "code",
"execution_count": 123,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 'None', 'L2': 2.5000000000000004, 'L3': 1.4}"
]
},
"execution_count": 123,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.PLR"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Notice : `LR+` renamed to `PLR` in version 1.5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### NLR (Negative likelihood ratio)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Likelihood ratios are used for assessing the value of performing a diagnostic test. They use the sensitivity and specificity of the test to determine whether a test result usefully changes the probability that a condition (such as a disease state) exists. The first description of the use of likelihood ratios for decision rules was made at a symposium on information theory in 1954 [[28]](#ref28)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$LR_-=NLR=\\frac{FNR}{TNR}$$"
]
},
{
"cell_type": "code",
"execution_count": 124,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 0.4, 'L2': 0.625, 'L3': 0.7000000000000001}"
]
},
"execution_count": 124,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.NLR"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Notice : `LR-` renamed to `NLR` in version 1.5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### DOR (Diagnostic odds ratio)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The diagnostic odds ratio is a measure of the effectiveness of a diagnostic test. It is defined as the ratio of the odds of the test being positive if the subject has a disease relative to the odds of the test being positive if the subject does not have the disease [[28]](#ref28)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$DOR=\\frac{LR_+}{LR_-}$$"
]
},
{
"cell_type": "code",
"execution_count": 125,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 'None', 'L2': 4.000000000000001, 'L3': 1.9999999999999998}"
]
},
"execution_count": 125,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.DOR"
]
},
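{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `'None'` entries for class `L1` in `cm.PLR` and `cm.DOR` arise because its FPR is zero, so the ratio is undefined. The cell below is a small sketch of that behavior, not PyCM's implementation: the rates are copied from the `cm.TPR` and `cm.TNR` outputs above, and `ratio` is a hypothetical helper that returns the string `'None'` to mirror those outputs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Rates copied from the cm.TPR and cm.TNR outputs above\n",
"TPR = {'L1': 0.6, 'L2': 0.5, 'L3': 0.6}\n",
"TNR = {'L1': 1.0, 'L2': 0.8, 'L3': 0.5714285714285714}\n",
"\n",
"def ratio(numerator, denominator):\n",
"    # Propagate undefined values and guard against division by zero\n",
"    if numerator == 'None' or denominator == 'None' or denominator == 0:\n",
"        return 'None'\n",
"    return numerator / denominator\n",
"\n",
"PLR = {c: ratio(TPR[c], 1 - TNR[c]) for c in TPR}  # TPR / FPR\n",
"NLR = {c: ratio(1 - TPR[c], TNR[c]) for c in TPR}  # FNR / TNR\n",
"DOR = {c: ratio(PLR[c], NLR[c]) for c in TPR}      # PLR / NLR\n",
"PLR, NLR, DOR"
]
},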
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### PRE (Prevalence)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Prevalence is a statistical concept referring to the number of cases of a disease that are present in a particular population at a given time (Reference Likelihood) [[14]](#ref14)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$Prevalence=\\frac{P}{POP}$$"
]
},
{
"cell_type": "code",
"execution_count": 126,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 0.4166666666666667, 'L2': 0.16666666666666666, 'L3': 0.4166666666666667}"
]
},
"execution_count": 126,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.PRE"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### G (G-measure)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The geometric mean of precision and sensitivity, also known as the Fowlkes–Mallows index [[3]](#ref3)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$G=\\sqrt{PPV\\times TPR}$$"
]
},
{
"cell_type": "code",
"execution_count": 127,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 0.7745966692414834, 'L2': 0.408248290463863, 'L3': 0.5477225575051661}"
]
},
"execution_count": 127,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.G"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### RACC (Random accuracy)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The expected accuracy from a strategy of randomly guessing categories according to reference and response distributions [[24]](#ref24)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$RACC=\\frac{TOP \\times P}{POP^2}$$"
]
},
{
"cell_type": "code",
"execution_count": 128,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 0.10416666666666667,\n",
" 'L2': 0.041666666666666664,\n",
" 'L3': 0.20833333333333334}"
]
},
"execution_count": 128,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.RACC"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Notice : new in version 0.3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### RACCU (Random accuracy unbiased)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The expected accuracy from a strategy of randomly guessing categories according to the average of the reference and response distributions [[25]](#ref25)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$RACCU=(\\frac{TOP+P}{2 \\times POP})^2$$"
]
},
{
"cell_type": "code",
"execution_count": 129,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 0.1111111111111111,\n",
" 'L2': 0.04340277777777778,\n",
" 'L3': 0.21006944444444442}"
]
},
"execution_count": 129,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.RACCU"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Notice : new in version 0.8.1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### J (Jaccard index)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Jaccard index, also known as Intersection over Union and the Jaccard similarity coefficient (originally coined coefficient de communauté by Paul Jaccard), is a statistic used for comparing the similarity and diversity of sample sets [[29]](#ref29).\n",
"\n",
"Some articles also refer to it as F* (An Interpretable Transformation of the F-measure) [[77]](#ref77)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$J=\\frac{TP}{TOP+P-TP}$$"
]
},
{
"cell_type": "code",
"execution_count": 130,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 0.6, 'L2': 0.25, 'L3': 0.375}"
]
},
"execution_count": 130,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.J"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Notice : new in version 0.9"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### IS (Information score)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The amount of information needed to correctly classify an example into\n",
"class C, whose prior probability is $ p(C) $, is defined as $ -\\log_2(p(C)) $ [[18]](#ref18) [[39]](#ref39)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$IS=-\\log_2(\\frac{TP+FN}{POP})+\\log_2(\\frac{TP}{TP+FP})$$"
]
},
{
"cell_type": "code",
"execution_count": 131,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 1.2630344058337937, 'L2': 0.9999999999999998, 'L3': 0.26303440583379367}"
]
},
"execution_count": 131,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.IS"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Notice : new in version 1.3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### CEN (Confusion entropy)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"CEN is based upon the concept of entropy for evaluating classifier performance. By exploiting the misclassification information of confusion matrices, the measure evaluates the confusion level of the class distribution of\n",
"misclassified samples. Both theoretical analysis and statistical results show that the proposed measure is more discriminating than accuracy and RCI while it remains relatively consistent with the two measures. Moreover, it is more capable of measuring how the samples of different classes have been separated from each\n",
"other. Hence the proposed measure is more precise than the two measures and can substitute for them to evaluate classifiers in classification applications [[17]](#ref17)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$P_{i,j}^{j}=\\frac{Matrix(i,j)}{\\sum_{k=1}^{|C|}\\Big(Matrix(j,k)+Matrix(k,j)\\Big)}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$P_{i,j}^{i}=\\frac{Matrix(i,j)}{\\sum_{k=1}^{|C|}\\Big(Matrix(i,k)+Matrix(k,i)\\Big)}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$CEN_j=-\\sum_{k=1,k\\neq j}^{|C|}\\Bigg(P_{j,k}^j\\log_{2(|C|-1)}\\Big(P_{j,k}^j\\Big)+P_{k,j}^j\\log_{2(|C|-1)}\\Big(P_{k,j}^j\\Big)\\Bigg)$$"
]
},
{
"cell_type": "code",
"execution_count": 132,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 0.25, 'L2': 0.49657842846620864, 'L3': 0.6044162769630221}"
]
},
"execution_count": 132,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.CEN"
]
},
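{
"cell_type": "markdown",
"metadata": {},
"source": [
"The three CEN formulas above can be traced step by step on a small matrix. The cell below is a sketch, not PyCM's implementation. The confusion matrix is an assumption: it is not shown in this section, but it was chosen to be consistent with the `cm.TP`, `cm.FP` and `cm.FN` outputs earlier in this document (rows are actual classes, columns are predicted classes). The helper names `plogp` and `cen` are hypothetical, and $ 0 \\times \\log(0) $ is treated as $ 0 $."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"\n",
"# Assumed confusion matrix: matrix[actual][predicted]\n",
"matrix = {\n",
"    'L1': {'L1': 3, 'L2': 0, 'L3': 2},\n",
"    'L2': {'L1': 0, 'L2': 1, 'L3': 1},\n",
"    'L3': {'L1': 0, 'L2': 2, 'L3': 3},\n",
"}\n",
"classes = list(matrix)\n",
"\n",
"def plogp(p, base):\n",
"    # Convention used in this sketch: 0 * log(0) counts as 0\n",
"    return p * math.log(p, base) if p > 0 else 0.0\n",
"\n",
"def cen(j):\n",
"    base = 2 * (len(classes) - 1)\n",
"    # Shared denominator of P^j_{j,k} and P^j_{k,j}: row j plus column j\n",
"    denominator = sum(matrix[j][k] + matrix[k][j] for k in classes)\n",
"    total = 0.0\n",
"    for k in classes:\n",
"        if k != j:\n",
"            total += plogp(matrix[j][k] / denominator, base)\n",
"            total += plogp(matrix[k][j] / denominator, base)\n",
"    return -total\n",
"\n",
"{j: cen(j) for j in classes}"
]
},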
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### AUC (Area under the ROC curve)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The area under the curve (often referred to as simply the AUC) is equal to the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one (assuming 'positive' ranks higher than 'negative').\n",
"Here, the AUC of each class is approximated by the arithmetic mean of its sensitivity and specificity [[23]](#ref23)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$AUC=\\frac{TNR+TPR}{2}$$"
]
},
{
"cell_type": "code",
"execution_count": 134,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 0.8, 'L2': 0.65, 'L3': 0.5857142857142856}"
]
},
"execution_count": 134,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.AUC"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Notice : new in version 1.4\n",
"* Notice : this is an approximate calculation of AUC"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### dInd (Distance index)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Euclidean distance of a ROC point from the top left corner of the ROC space, which can take values between 0 (perfect classification) and $ \\sqrt{2} $ [[23]](#ref23)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$dInd=\\sqrt{(1-TNR)^2+(1-TPR)^2}$$"
]
},
{
"cell_type": "code",
"execution_count": 135,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 0.4, 'L2': 0.5385164807134504, 'L3': 0.5862367008195198}"
]
},
"execution_count": 135,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.dInd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### AGF (Adjusted F-score)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The F-measures use only three of the four elements of the confusion matrix, and hence two classifiers with different TNR values may have the same F-score. Therefore, the AGF metric is introduced to use all elements of the confusion matrix and give more weight to samples that are correctly classified in the minority class [[50]](#ref50)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$AGF=\\sqrt{F_2 \\times InvF_{0.5}}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$F_{2}=5\\times \\frac{PPV\\times TPR}{(4 \\times PPV)+TPR}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$InvF_{0.5}=(1+0.5^2)\\times \\frac{NPV\\times TNR}{(0.5^2 \\times NPV)+TNR}$$"
]
},
{
"cell_type": "code",
"execution_count": 156,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 0.7285871475307653, 'L2': 0.6286946134619315, 'L3': 0.610088876086563}"
]
},
"execution_count": 156,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.AGF"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Notice : new in version 2.3"
]
},
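{
"cell_type": "markdown",
"metadata": {},
"source": [
"The AGF formulas above share one F-score form, applied once to the positive class and once to the negative class. The cell below is a sketch for class `L1`, not PyCM's implementation: the four rates are copied from the `cm.PPV`, `cm.TPR`, `cm.NPV` and `cm.TNR` outputs earlier in this document, and `f_score` is a hypothetical helper name. The result should agree with the `L1` entry of `cm.AGF` up to floating-point rounding."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"\n",
"# Class 'L1' rates copied from the outputs above\n",
"ppv, tpr = 1.0, 0.6\n",
"npv, tnr = 0.7777777777777778, 1.0\n",
"\n",
"def f_score(precision, recall, beta):\n",
"    # General F-beta form shared by F_2 and InvF_0.5\n",
"    b2 = beta ** 2\n",
"    return (1 + b2) * precision * recall / (b2 * precision + recall)\n",
"\n",
"f2 = f_score(ppv, tpr, beta=2)         # F_2 from PPV and TPR\n",
"inv_f05 = f_score(npv, tnr, beta=0.5)  # InvF_0.5 from NPV and TNR\n",
"agf = math.sqrt(f2 * inv_f05)\n",
"agf"
]
},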
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### OC (Overlap coefficient)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The overlap coefficient, or Szymkiewicz–Simpson coefficient, is a similarity measure that measures the overlap between two finite sets. It is defined as the size of the intersection divided by the size of the smaller of the two sets [[52]](#ref52)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$OC=\\frac{TP}{\\min(TOP,P)}=\\max(PPV,TPR)$$"
]
},
{
"cell_type": "code",
"execution_count": 157,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 1.0, 'L2': 0.5, 'L3': 0.6}"
]
},
"execution_count": 157,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.OC"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice : new in version 2.3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### BB (Braun-Blanquet similarity)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Braun-Blanquet coefficient is a similarity measure that is mostly used in botany. It is defined as the size of the intersection divided by the size of the larger of the two sets [[82]](#ref82) [[83]](#ref83)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$BB=\\frac{TP}{max(TOP,P)}=min(PPV,TPR)$$"
]
},
{
"cell_type": "code",
"execution_count": 158,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 0.6, 'L2': 0.3333333333333333, 'L3': 0.5}"
]
},
"execution_count": 158,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.BB"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice : new in version 3.6"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### OOC (Otsuka-Ochiai coefficient)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Otsuka-Ochiai coefficient, named after Yanosuke Otsuka and Akira Ochiai and also known as the Ochiai-Barkman or Ochiai coefficient, is a similarity index used in biology. If sets are represented as bit vectors, the Otsuka-Ochiai coefficient is the same as the cosine similarity [[53]](#ref53)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$OOC=\\frac{TP}{\\sqrt{TOP\\times P}}$$"
]
},
{
"cell_type": "code",
"execution_count": 159,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 0.7745966692414834, 'L2': 0.4082482904638631, 'L3': 0.5477225575051661}"
]
},
"execution_count": 159,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.OOC"
]
},
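{
"cell_type": "markdown",
"metadata": {},
"source": [
"Rewriting the definition shows that OOC is the geometric mean of PPV and TPR, so it always lies between BB (their minimum) and OC (their maximum). The `L2` check below assumes $TP=1$, $TOP=3$ and $P=2$, counts inferred from the outputs in this document rather than stated here.\n",
"\n",
"$$OOC=\\frac{TP}{\\sqrt{TOP\\times P}}=\\sqrt{\\frac{TP}{TOP}\\times \\frac{TP}{P}}=\\sqrt{PPV\\times TPR}$$\n",
"\n",
"$$OOC_{L2}=\\frac{1}{\\sqrt{3\\times 2}}=\\sqrt{\\frac{1}{3}\\times \\frac{1}{2}}\\approx 0.4082$$\n",
"\n",
"This sits between $BB_{L2}\\approx 0.3333$ and $OC_{L2}=0.5$, as expected."
]
},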
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice : new in version 2.3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### TI (Tversky index)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Tversky index, named after Amos Tversky, is an asymmetric similarity measure on sets that compares a variant to a prototype. The Tversky index can be seen as a generalization of the Dice coefficient and the Tanimoto coefficient [[54]](#ref54)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$TI(\\alpha,\\beta)=\\frac{TP}{TP+\\alpha FN+\\beta FP}$$"
]
},
{
"cell_type": "code",
"execution_count": 160,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 0.42857142857142855, 'L2': 0.1111111111111111, 'L3': 0.1875}"
]
},
"execution_count": 160,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.TI(2,3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Parameters "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. `alpha` : alpha coefficient (type : `float`)\n",
"2. `beta` : beta coefficient (type : `float`)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Output"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`{class1: TI1, class2: TI2, ...}`"
]
},
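{
"cell_type": "markdown",
"metadata": {},
"source": [
"Two special cases illustrate the generalization: $\\alpha=\\beta=1$ gives the Tanimoto (Jaccard) coefficient $\\frac{TP}{TP+FN+FP}$, and $\\alpha=\\beta=0.5$ gives the Dice coefficient $\\frac{2TP}{2TP+FN+FP}$. For the call `cm.TI(2,3)` shown above, a hand check of class `L3` might look as follows, assuming $TP=3$, $FN=2$ and $FP=3$ (counts inferred from the outputs in this document, not stated here).\n",
"\n",
"$$TI_{L3}(2,3)=\\frac{3}{3+(2\\times 2)+(3\\times 3)}=\\frac{3}{16}=0.1875$$"
]
},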
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice : new in version 2.4"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### AUPR (Area under the PR curve)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A PR curve plots precision against recall. The precision-recall area under the curve (AUPR) is the area under the PR curve; the higher it is, the better the model [[55]](#ref55) [[56]](#ref56)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$AUPR=\\frac{TPR+PPV}{2}$$"
]
},
{
"cell_type": "code",
"execution_count": 161,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 0.8, 'L2': 0.41666666666666663, 'L3': 0.55}"
]
},
"execution_count": 161,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.AUPR"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice : new in version 2.4\n",
"\n",
"Notice : this is an approximate calculation of AUPR"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ICSI (Individual classification success index)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Individual Classification Success Index (ICSI) is a\n",
"class-specific symmetric measure defined for classification\n",
"assessment purposes. ICSI is $ 1 $ minus the sum of the type I and type II errors.\n",
"It ranges from $ -1 $ (both errors are maximal, i.e. $ 1 $) to $ 1 $ (both\n",
"errors are minimal, i.e. $ 0 $), but the value $ 0 $ does not have any\n",
"clear meaning. The measure is symmetric and linearly related\n",
"to the arithmetic mean of TPR and PPV [[58]](#ref58)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ICSI=PPV+TPR-1$$"
]
},
{
"cell_type": "code",
"execution_count": 162,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 0.6000000000000001,\n",
" 'L2': -0.16666666666666674,\n",
" 'L3': 0.10000000000000009}"
]
},
"execution_count": 162,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.ICSI"
]
},
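{
"cell_type": "markdown",
"metadata": {},
"source": [
"The linear relation to the arithmetic mean can be made explicit, and the negative `L2` value above checked by hand. The inputs $PPV=\\frac{1}{3}$ and $TPR=\\frac{1}{2}$ are assumptions inferred from the other outputs in this document.\n",
"\n",
"$$ICSI=2\\times \\frac{PPV+TPR}{2}-1=1-\\Big((1-PPV)+(1-TPR)\\Big)$$\n",
"\n",
"$$ICSI_{L2}=\\frac{1}{3}+\\frac{1}{2}-1=-\\frac{1}{6}\\approx -0.1667$$\n",
"\n",
"A negative value means that the two error terms, here $\\frac{2}{3}$ and $\\frac{1}{2}$, sum to more than $1$."
]
},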
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice : new in version 2.5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### CI (Confidence interval)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In statistics, a confidence interval (CI) is a type of interval estimate (of a population parameter) that is computed from the observed data. The confidence level is the frequency (i.e., the proportion) of possible confidence intervals that contain the true value of their corresponding parameter. In other words, if confidence intervals are constructed using a given confidence level in an infinite number of independent experiments, the proportion of those intervals that contain the true value of the parameter will match the confidence level [[31]](#ref31).\n",
"\n",
"Supported statistics : `ACC`,`AUC`,`PRE`,`Overall ACC`,`Kappa`,`TPR`,`TNR`,`PPV`,`NPV`,`PLR`,`NLR`\n",
"\n",
"Supported alpha values (two-sided) : 0.001, 0.002, 0.01, 0.02, 0.05, 0.1, 0.2\n",
"\n",
"Supported alpha values (one-sided) : 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Confidence intervals for `TPR`,`TNR`,`PPV`,`NPV`,`ACC`,`PRE` and `Overall ACC` are calculated using the normal approximation to the binomial distribution [[59]](#ref59), Wilson score [[62]](#ref62) and Agresti-Coull method [[63]](#ref63): \n",
"\n",
"#### Normal approximation\n",
"\n",
"$$SE=\\sqrt{\\frac{\\hat{p}(1-\\hat{p})}{n}}$$\n",
"\n",
"$$CI=\\hat{p}\\pm z\\times SE$$\n",
"\n",
"$$n=\\begin{cases}P & \\hat{p} == TPR/FNR\\\\N & \\hat{p} == TNR/FPR\\\\TOP & \\hat{p} == PPV\\\\TON & \\hat{p} ==NPV \\\\POP& \\hat{p} == ACC/ACC_{Overall}\\end{cases}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Wilson score\n",
"\n",
"$$CI=\\frac{\\hat{p}+\\frac{z^2}{2n}}{1+\\frac{z^2}{n}}\\pm\\frac{z}{1+\\frac{z^2}{n}}\\sqrt{\\frac{\\hat{p}(1-\\hat{p})}{n}+\\frac{z^2}{4n^2}}$$\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Agresti-Coull\n",
"\n",
"$$\\hat{p}=\\frac{x}{n}$$\n",
"\n",
"$$\\tilde{p}=\\frac{x+\\frac{z^2}{2}}{n+z^2}$$\n",
"\n",
"$$CI =\\tilde{p}\\pm z\\times \\sqrt{\\frac{\\tilde{p}(1-\\tilde{p})}{n+z^2}}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The confidence interval for `Kappa` is calculated using the Fleiss formula [[24]](#ref24) [[38]](#ref38) :\n",
"\n",
"$$SE_{Kappa}=\\sqrt{\\frac{ACC_{Overall}\\times (1-ACC_{Overall})}{POP\\times (1-RACC_{Overall})^2}}$$\n",
"\n",
"$$CI_{Kappa}=Kappa\\pm z\\times SE_{Kappa}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Confidence intervals for `NLR` and `PLR` are calculated using the log method [[60]](#ref60) :\n",
"\n",
"$$SE_{LR}=\\sqrt{\\frac{1}{a}-\\frac{1}{b}+\\frac{1}{c}-\\frac{1}{d}}$$\n",
"\n",
"$$CI_{LR}=e^{ln(LR)\\pm z\\times SE_{LR}}$$\n",
"\n",
"$$PLR:\\begin{cases}a=TP\\\\b=P\\\\c=FP\\\\d=N\\end{cases}$$\n",
"\n",
"$$NLR:\\begin{cases}a=FN\\\\b=P\\\\c=TN\\\\d=N\\end{cases}$$\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Confidence interval for `AUC` is calculated using Hanley and McNeil formula [[61]](#ref61) :\n",
"\n",
"$$SE_{AUC}=\\sqrt{\\frac{q_0+(N-1)q_1+(P-1)q_2}{N\\times P}}$$\n",
"\n",
"$$q_0=AUC(1-AUC)$$\n",
"\n",
"$$q_1=\\frac{AUC}{2-AUC}-AUC^2$$\n",
"\n",
"$$q_2=\\frac{2AUC^2}{1+AUC}-AUC^2$$\n",
"\n",
"$$CI_{AUC}=AUC\\pm z\\times SE_{AUC}$$"
]
},
{
"cell_type": "code",
"execution_count": 163,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': [0.21908902300206645, (0.17058551491594975, 1.0294144850840503)],\n",
" 'L2': [0.3535533905932738, (-0.19296464556281656, 1.1929646455628165)],\n",
" 'L3': [0.21908902300206645, (0.17058551491594975, 1.0294144850840503)]}"
]
},
"execution_count": 163,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.CI(\"TPR\")"
]
},
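{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `L1` entry returned by `cm.CI(\"TPR\")` can be reproduced by hand with the normal approximation. This sketch assumes $\\hat{p}=TPR=0.6$ and $n=P=5$ (values inferred from the output itself) and $z=1.96$ for the default two-sided $\\alpha=0.05$.\n",
"\n",
"$$SE=\\sqrt{\\frac{0.6\\times 0.4}{5}}\\approx 0.2191$$\n",
"\n",
"$$CI=0.6\\pm 1.96\\times 0.2191\\approx (0.1706,\\ 1.0294)$$\n",
"\n",
"The upper bound exceeds $1$ because the normal approximation is not constrained to $[0,1]$, which is most visible for small $n$; the Wilson score and Agresti-Coull intervals usually behave better in that regime."
]
},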
{
"cell_type": "code",
"execution_count": 164,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': [0.21908902300206645, (-0.2769850810763853, 1.0769850810763852)],\n",
" 'L2': [0.3535533905932738, (-0.5924799769332159, 1.5924799769332159)],\n",
" 'L3': [0.21908902300206645, (-0.2769850810763853, 1.0769850810763852)]}"
]
},
"execution_count": 164,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.CI(\"FNR\", alpha=0.001, one_sided=True)"
]
},
{
"cell_type": "code",
"execution_count": 165,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': [0.14231876063832774, (0.19325746190524654, 0.6804926643446272)],\n",
" 'L2': [0.10758287072798381, (0.04696414761482223, 0.44803635738467273)],\n",
" 'L3': [0.14231876063832774, (0.19325746190524654, 0.6804926643446272)]}"
]
},
"execution_count": 165,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.CI(\"PRE\", alpha=0.05, binom_method=\"wilson\")"
]
},
{
"cell_type": "code",
"execution_count": 166,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[0.14231876063832777, (0.2805568916340536, 0.8343177950165198)]"
]
},
"execution_count": 166,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.CI(\"Overall ACC\", alpha=0.02, binom_method=\"agresti-coull\")"
]
},
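{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Agresti-Coull interval returned above can be approximated by hand. This sketch assumes $x=7$ correct predictions out of $n=POP=12$ (inferred from the outputs in this document) and $z\\approx 2.326$ for the two-sided $\\alpha=0.02$, with the half-width scaled by $z$.\n",
"\n",
"$$\\tilde{p}=\\frac{7+\\frac{2.326^2}{2}}{12+2.326^2}\\approx \\frac{9.7051}{17.4103}\\approx 0.55744$$\n",
"\n",
"$$CI\\approx 0.55744\\pm 2.326\\times \\sqrt{\\frac{0.55744\\times 0.44256}{17.4103}}\\approx 0.55744\\pm 0.27688\\approx (0.28056,\\ 0.83432)$$\n",
"\n",
"The first element of the returned list, about $0.1423$, appears to be the plain standard error $\\sqrt{\\hat{p}(1-\\hat{p})/n}$ rather than the Agresti-Coull term."
]
},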
{
"cell_type": "code",
"execution_count": 167,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[0.14231876063832777, (0.30438856248221097, 0.8622781041844558)]"
]
},
"execution_count": 167,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.CI(\"Overall ACC\", alpha=0.05)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Parameters "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. `param` : input parameter (type : `str`)\n",
"2. `alpha` : type I error (type : `float`, default : `0.05`)\n",
"3. `one_sided` : one-sided mode flag (type : `bool`, default : `False`)\n",
"4. `binom_method` : binomial confidence intervals method (type : `str`, default : `normal-approx`)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Output"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. Two-sided : `{class1: [SE1, (Lower CI, Upper CI)], ...}`\n",
"2. One-sided : `{class1: [SE1, (Lower one-sided CI, Upper one-sided CI)], ...}`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sensitivity index"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The sensitivity index or d′ is a statistic used in signal detection theory. It provides the separation between the means of the signal and the noise distributions, compared against the standard deviation of the signal or noise distribution.\n",
"d′ can be estimated from the observed hit rate and false-alarm rate, as follows [[76]](#ref76):"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$d^{\\prime}=Z(TPR) - Z(FPR)$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The function Z(p), p ∈ [0,1], is the inverse of the cumulative distribution function of the Gaussian distribution."
]
},
{
"cell_type": "code",
"execution_count": 176,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 'None', 'L2': 0.8416212335729143, 'L3': 0.4333594729285047}"
]
},
"execution_count": 176,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.sensitivity_index()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Output"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`{class1: SI1, class2: SI2, ...}`"
]
},
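{
"cell_type": "markdown",
"metadata": {},
"source": [
"A hand check of the `L2` value, assuming $TPR=0.5$ and $FPR=\\frac{2}{10}=0.2$ (rates inferred from the outputs in this document) and the standard normal quantiles $Z(0.5)=0$ and $Z(0.2)\\approx -0.8416$:\n",
"\n",
"$$d^{\\prime}_{L2}=Z(0.5)-Z(0.2)\\approx 0-(-0.8416)=0.8416$$\n",
"\n",
"For `L1` the inferred $FPR$ is $0$, and $Z(0)$ diverges to $-\\infty$, which is presumably why `'None'` is reported for that class."
]
},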
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice : new in version 3.1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### HD (Hamming distance)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In information theory, the Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. In other words, it measures the minimum number of substitutions required to change one string into the other, or the minimum number of errors that could have transformed one string into the other. In a more general context, the Hamming distance is one of several string metrics for measuring the edit distance between two sequences. It is named after the American mathematician Richard Hamming [[80]](#ref80) [[81]](#ref81).\n",
"\n",
"A major application is in coding theory, more specifically to block codes, in which the equal-length strings are vectors over a finite field."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$HD = FN + FP$$"
]
},
{
"cell_type": "code",
"execution_count": 177,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'L1': 2, 'L2': 3, 'L3': 5}"
]
},
"execution_count": 177,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.HD"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice : new in version 3.6"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Overall statistics"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Kappa"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Kappa is a statistic that measures inter-rater agreement for qualitative (categorical) items. It is generally thought to be a more robust measure than a simple percent agreement calculation, as kappa takes into account the possibility of the agreement occurring by chance [[24]](#ref24)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$Kappa=\\frac{ACC_{Overall}-RACC_{Overall}}{1-RACC_{Overall}}$$"
]
},
{
"cell_type": "code",
"execution_count": 178,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.35483870967741943"
]
},
"execution_count": 178,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.Kappa"
]
},
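{
"cell_type": "markdown",
"metadata": {},
"source": [
"The value above can be reproduced by hand. This sketch assumes $ACC_{Overall}=\\frac{7}{12}$ and takes $RACC_{Overall}$ to be $\\sum_{i}\\frac{TOP_i\\times P_i}{POP^2}$ with $(TOP_i, P_i)$ equal to $(3,5)$, $(3,2)$ and $(6,5)$; both the counts and this form of $RACC_{Overall}$ are assumptions inferred from the rest of the document, not stated in this section.\n",
"\n",
"$$RACC_{Overall}=\\frac{(3\\times 5)+(3\\times 2)+(6\\times 5)}{12^2}=\\frac{51}{144}$$\n",
"\n",
"$$Kappa=\\frac{\\frac{84}{144}-\\frac{51}{144}}{1-\\frac{51}{144}}=\\frac{33}{93}\\approx 0.3548$$"
]
},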
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice : new in version 0.3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Kappa unbiased"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The unbiased kappa value is defined in terms of total accuracy and a slightly different computation of expected likelihood that averages the reference and response probabilities [[25]](#ref25).\n",
"\n",
"Equal to [Scott's Pi](#Scott's-Pi)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$Kappa_{Unbiased}=\\frac{ACC_{Overall}-RACCU_{Overall}}{1-RACCU_{Overall}}$$"
]
},
{
"cell_type": "code",
"execution_count": 179,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.34426229508196726"
]
},
"execution_count": 179,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.KappaUnbiased"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Phi-squared"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In statistics, the phi coefficient (or mean square contingency coefficient) is a measure of association for two binary variables. Introduced by Karl Pearson, this measure is similar to the Pearson correlation coefficient in its interpretation. In fact, a Pearson correlation coefficient estimated for two binary variables will return the phi coefficient [[10]](#ref10)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$\\phi^2=\\frac{\\chi^2}{POP}$$"
]
},
{
"cell_type": "code",
"execution_count": 187,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.55"
]
},
"execution_count": 187,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.Phi_Squared"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice : new in version 0.7"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Cramer's V"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In statistics, Cramér's V (sometimes referred to as Cramér's phi) is a measure of association between two nominal variables, giving a value between $ 0 $ and $ +1 $ (inclusive). It is based on Pearson's chi-squared statistic and was published by Harald Cramér in 1946 [[26]](#ref26)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$V=\\sqrt{\\frac{\\phi^2}{|C|-1}}$$"
]
},
{
"cell_type": "code",
"execution_count": 188,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.5244044240850758"
]
},
"execution_count": 188,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.V"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice : new in version 0.7"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Standard error"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The standard error (SE) of a statistic (usually an estimate of a parameter) is the standard deviation of its sampling distribution or an estimate of that standard deviation [[31]](#ref31)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$SE_{ACC}=\\sqrt{\\frac{ACC\\times (1-ACC)}{POP}}$$"
]
},
{
"cell_type": "code",
"execution_count": 189,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.14231876063832777"
]
},
"execution_count": 189,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.SE"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice : new in version 0.7"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 95% CI"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In statistics, a confidence interval (CI) is a type of interval estimate (of a population parameter) that is computed from the observed data. The confidence level is the frequency (i.e., the proportion) of possible confidence intervals that contain the true value of their corresponding parameter. In other words, if confidence intervals are constructed using a given confidence level in an infinite number of independent experiments, the proportion of those intervals that contain the true value of the parameter will match the confidence level [[31]](#ref31)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$CI=ACC \\pm 1.96\\times SE_{ACC}$$"
]
},
{
"cell_type": "code",
"execution_count": 190,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(0.30438856248221097, 0.8622781041844558)"
]
},
"execution_count": 190,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.CI95"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Bennett's S"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Bennett, Alpert & Goldstein’s S is a statistical measure of inter-rater agreement. It was created by Bennett et al. in 1954.\n",
"Bennett et al. suggested that adjusting inter-rater reliability for the percentage of rater agreement that might be expected by chance gives a better measure than simple agreement between raters [[8]](#ref8)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$p_c=\\frac{1}{|C|}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$S=\\frac{ACC_{Overall}-p_c}{1-p_c}$$"
]
},
{
"cell_type": "code",
"execution_count": 191,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.37500000000000006"
]
},
"execution_count": 191,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.S"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice : new in version 0.5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Scott's Pi"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Scott's pi (named after William A. Scott) is a statistic for measuring inter-rater reliability for nominal data in communication studies. Textual entities are annotated with categories by different annotators, and various measures are used to assess the extent of agreement between the annotators, one of which is Scott's pi. Since automatically annotating text is a popular problem in natural language processing, and the goal is to get the computer program that is being developed to agree with the humans in the annotations it creates, assessing the extent to which humans agree with each other is important for establishing a reasonable upper limit on computer performance [[7]](#ref7).\n",
"\n",
"Equal to [Kappa Unbiased](#Kappa-unbiased)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$p_c=\\sum_{i=1}^{|C|}(\\frac{TOP_i + P_i}{2\\times POP})^2$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$\\pi=\\frac{ACC_{Overall}-p_c}{1-p_c}$$"
]
},
{
"cell_type": "code",
"execution_count": 192,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.34426229508196726"
]
},
"execution_count": 192,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.PI"
]
},
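{
"cell_type": "markdown",
"metadata": {},
"source": [
"A hand evaluation of the two formulas, assuming $POP=12$, $ACC_{Overall}=\\frac{7}{12}$ and $TOP_i+P_i$ equal to $8$, $5$ and $11$ for `L1`, `L2` and `L3` (counts inferred from the outputs in this document, not stated in this section):\n",
"\n",
"$$p_c=\\left(\\frac{8}{24}\\right)^2+\\left(\\frac{5}{24}\\right)^2+\\left(\\frac{11}{24}\\right)^2=\\frac{210}{576}\\approx 0.3646$$\n",
"\n",
"$$\\pi=\\frac{\\frac{336}{576}-\\frac{210}{576}}{1-\\frac{210}{576}}=\\frac{126}{366}\\approx 0.3443$$"
]
},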
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice : new in version 0.5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Gwet's AC1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"AC1 was originally introduced by Gwet in 2001 (Gwet, 2001). The interpretation of AC1 is similar to that of generalized kappa (Fleiss, 1971), which is used to assess inter-rater reliability when there are multiple raters. Gwet (2002) demonstrated that AC1 can overcome kappa's sensitivity to trait prevalence and to the raters' classification probabilities (i.e., marginal probabilities), and thus provides a more robust measure of inter-rater reliability [[6]](#ref6)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$\\pi_i=\\frac{TOP_i + P_i}{2\\times POP}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$p_c=\\frac{1}{|C|-1}\\sum_{i=1}^{|C|}\\Big(\\pi_i\\times (1-\\pi_i)\\Big)$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$AC_1=\\frac{ACC_{Overall}-p_c}{1-p_c}$$"
]
},
{
"cell_type": "code",
"execution_count": 193,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.3893129770992367"
]
},
"execution_count": 193,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.AC1"
]
},
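{
"cell_type": "markdown",
"metadata": {},
"source": [
"A hand evaluation of the three formulas, assuming $|C|=3$, $ACC_{Overall}=\\frac{7}{12}$ and $\\pi_i$ equal to $\\frac{8}{24}$, $\\frac{5}{24}$ and $\\frac{11}{24}$ (values inferred from the outputs in this document, not stated in this section):\n",
"\n",
"$$p_c=\\frac{1}{3-1}\\left(\\frac{8}{24}\\times \\frac{16}{24}+\\frac{5}{24}\\times \\frac{19}{24}+\\frac{11}{24}\\times \\frac{13}{24}\\right)=\\frac{1}{2}\\times \\frac{366}{576}\\approx 0.3177$$\n",
"\n",
"$$AC_1\\approx \\frac{0.5833-0.3177}{1-0.3177}\\approx 0.3893$$"
]
},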
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice : new in version 0.5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Reference entropy"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The entropy of the decision problem itself as defined by the counts for the reference. The entropy of a distribution is the average negative log probability of outcomes [[30]](#ref30)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$Likelihood_{Reference}=\\frac{P_i}{POP}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$Entropy_{Reference}=-\\sum_{i=1}^{|C|}Likelihood_{Reference}(i)\\times\\log_{2}{Likelihood_{Reference}(i)}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$0\\times\\log_{2}{0}\\equiv0$$"
]
},
{
"cell_type": "code",
"execution_count": 194,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1.4833557549816874"
]
},
"execution_count": 194,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.ReferenceEntropy"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice : new in version 0.8.1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Response entropy"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The entropy of the response distribution. The entropy of a distribution is the average negative log probability of outcomes [[30]](#ref30)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$Likelihood_{Response}=\\frac{TOP_i}{POP}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$Entropy_{Response}=-\\sum_{i=1}^{|C|}Likelihood_{Response}(i)\\times\\log_{2}{Likelihood_{Response}(i)}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$0\\times\\log_{2}{0}\\equiv0$$"
]
},
{
"cell_type": "code",
"execution_count": 195,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1.5"
]
},
"execution_count": 195,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.ResponseEntropy"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice : new in version 0.8.1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Cross entropy"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cross-entropy of the response distribution against the reference distribution. The cross-entropy is defined by the negative log probabilities of the response distribution weighted by the reference distribution [[30]](#ref30)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$Likelihood_{Reference}=\\frac{P_i}{POP}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$Likelihood_{Response}=\\frac{TOP_i}{POP}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$Entropy_{Cross}=-\\sum_{i=1}^{|C|}Likelihood_{Reference}(i)\\times\\log_{2}{Likelihood_{Response}(i)}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$0\\times\\log_{2}{0}\\equiv0$$"
]
},
{
"cell_type": "code",
"execution_count": 196,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1.5833333333333335"
]
},
"execution_count": 196,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.CrossEntropy"
]
},
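{
"cell_type": "markdown",
"metadata": {},
"source": [
"A hand evaluation, assuming $POP=12$ with reference counts $P=(5,2,5)$ and response counts $TOP=(3,3,6)$ for `L1`, `L2` and `L3` (counts inferred from the outputs in this document, not stated in this section):\n",
"\n",
"$$Entropy_{Cross}=-\\left(\\frac{5}{12}\\log_{2}{\\frac{3}{12}}+\\frac{2}{12}\\log_{2}{\\frac{3}{12}}+\\frac{5}{12}\\log_{2}{\\frac{6}{12}}\\right)=\\frac{10}{12}+\\frac{4}{12}+\\frac{5}{12}=\\frac{19}{12}\\approx 1.5833$$\n",
"\n",
"The result is larger than the reference entropy (about $1.4834$), as it must be: the cross-entropy is never smaller than the entropy of the reference distribution."
]
},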
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### P-Value"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In statistical hypothesis testing, the p-value or probability value is, for a given statistical model, the probability that, when the null hypothesis is true, the statistical summary (such as the absolute value of the sample mean difference between two compared groups) would be greater than or equal to the actual observed results [[31]](#ref31).\n",
"Here, a one-sided binomial test is used to check whether the accuracy is better than the no information rate [[57]](#ref57)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$x=\\sum_{i=1}^{|C|}TP_{i}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$p=NIR$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$n=POP$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$P-Value_{(ACC > NIR)}=1-\\sum_{i=0}^{x-1}\\left(\\begin{array}{c}n\\\\ i\\end{array}\\right)p^{i}(1-p)^{n-i}$$"
]
},
{
"cell_type": "code",
"execution_count": 235,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.18926430237560654"
]
},
"execution_count": 235,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.PValue"
]
},
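{
"cell_type": "markdown",
"metadata": {},
"source": [
"The value returned above matches the upper tail $P(X\\geq x)$ of a binomial distribution. This sketch assumes $x=7$, $n=12$ and $p=NIR=\\frac{5}{12}$, taking $NIR$ to be the share of the largest reference class; these inputs are inferred from the outputs in this document, not stated in this section.\n",
"\n",
"$$P(X\\geq 7)=\\sum_{i=7}^{12}\\left(\\begin{array}{c}12\\\\ i\\end{array}\\right)\\left(\\frac{5}{12}\\right)^{i}\\left(\\frac{7}{12}\\right)^{12-i}$$\n",
"\n",
"$$\\approx 0.11664+0.05207+0.01653+0.00354+0.00046+0.00003\\approx 0.1893$$\n",
"\n",
"At a conventional $0.05$ threshold, such a value would not support the claim that the accuracy exceeds the no information rate."
]
},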
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ARI (Adjusted Rand index)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Rand index or Rand measure (named after William M. Rand) in statistics, and in particular in data clustering, is a measure of the similarity between two data clusterings. A form of the Rand index that is adjusted for the chance grouping of elements is called the adjusted Rand index. From a mathematical standpoint, the Rand index is related to accuracy, but it is applicable even when class labels are not used [[68]](#ref68).\n",
"\n",
"The Adjusted Rand Index (ARI) is frequently used in cluster validation since it is a measure of agreement between two partitions: one given by the clustering process and the other defined by external criteria, but it can also be used in supervised learning [[69]](#ref69)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$X=\\frac{\\sum_{i}C_{2}^{P_i}\\times \\sum_{j}C_{2}^{TOP_j}}{C_2^{POP}}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ARI=\\frac{\\sum_{i,j}C_{2}^{Matrix(i,j)}-X}{\\frac{1}{2}[\\sum_{i}C_{2}^{P_i} + \\sum_{j}C_{2}^{TOP_j}]-X}$$"
]
},
{
"cell_type": "code",
"execution_count": 246,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.09206349206349207"
]
},
"execution_count": 246,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.ARI"
]
},
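{
"cell_type": "markdown",
"metadata": {},
"source": [
"A hand evaluation for a $3\\times 3$ example. The matrix rows $(3,0,2)$, $(0,1,1)$ and $(0,2,3)$ are an assumption reconstructed from the per-class outputs in this document; they give reference totals $P=(5,2,5)$, response totals $TOP=(3,3,6)$ and $POP=12$.\n",
"\n",
"$$\\sum_{i,j}C_{2}^{Matrix(i,j)}=3+1+1+3=8,\\quad \\sum_{i}C_{2}^{P_i}=10+1+10=21,\\quad \\sum_{j}C_{2}^{TOP_j}=3+3+15=21,\\quad C_{2}^{POP}=66$$\n",
"\n",
"$$X=\\frac{21\\times 21}{66}\\approx 6.6818$$\n",
"\n",
"$$ARI=\\frac{8-6.6818}{\\frac{1}{2}[21+21]-6.6818}=\\frac{87}{945}\\approx 0.0921$$"
]
},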
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice : $ C_{r}^{n} $ is the number of combinations of $ n $ objects taken $ r $ at a time"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Bangdiwala's B"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Bangdiwala's B statistic was created by Shrikant Bangdiwala in 1985 and is a measure of inter-rater agreement. While not as commonly used as the kappa statistic, the B test has been used by various researchers [[72]](#ref72) [[73]](#ref73)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$B=\\frac{\\sum_{i=1}^{|C|}TP_i^2}{\\sum_{i=1}^{|C|}TOP_i\\times P_i}$$"
]
},
{
"cell_type": "code",
"execution_count": 247,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.37254901960784315"
]
},
"execution_count": 247,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.B"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice : new in version 2.7"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Krippendorff's alpha"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Krippendorff's alpha coefficient, named after academic Klaus Krippendorff, is a statistical measure of the agreement achieved when coding a set of units of analysis in terms of the values of a variable.\n",
"Krippendorff's alpha generalizes several known statistics, often called measures of inter-coder agreement, inter-rater reliability, or reliability of coding given sets of units (as distinct from unitizing), but it also distinguishes itself from statistics that are called reliability coefficients but are unsuitable to the particulars of coding data generated for subsequent analysis [[74]](#ref74)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$\\epsilon = \\frac{1}{2\\times POP}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$P_a=(1-\\epsilon)\\times ACC_{Overall}+\\epsilon$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$P_e=RACCU_{Overall}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$\\alpha=\\frac{P_a-P_e}{1-P_e}$$"
]
},
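{
"cell_type": "markdown",
"metadata": {},
"source": [
"A worked illustration of the four steps above, for a hypothetical two-class confusion matrix with $POP=10$, $TP_1=3$, $TP_2=4$, $P_1=4$, $P_2=6$, $TOP_1=5$ and $TOP_2=5$ (assumed values, not the `cm` object used in this document), so that $ACC_{Overall}=0.7$. The sketch also assumes the per-class definition $RACCU_i=\\left(\\frac{TOP_i+P_i}{2\\times POP}\\right)^2$, summed over the classes to give $RACCU_{Overall}$, which is not restated in this section:\n",
"\n",
"$$\\epsilon=\\frac{1}{2\\times 10}=0.05$$\n",
"\n",
"$$P_a=(1-0.05)\\times 0.7+0.05=0.715$$\n",
"\n",
"$$P_e=\\left(\\frac{5+4}{20}\\right)^2+\\left(\\frac{5+6}{20}\\right)^2=0.2025+0.3025=0.505$$\n",
"\n",
"$$\\alpha=\\frac{0.715-0.505}{1-0.505}=\\frac{0.21}{0.495}\\approx 0.424$$"
]
},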
{
"cell_type": "code",
"execution_count": 248,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.3715846994535519"
]
},
"execution_count": 248,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm.Alpha"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice: `pos_class` defaults to the greatest class name (i.e. `max(classes)`) unless `actual_vector` contains strings. In that case, `pos_class` has no default value and must be specified explicitly; otherwise an error is raised."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Log loss"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In information theory, the cross-entropy between two probability distributions\n",
"$p$ and $q$ over the same underlying set of events measures the average number of bits needed to identify an event drawn from the set if a coding scheme used for the set is optimized for an estimated probability distribution $q$, rather than the true distribution $p$.\n",
"This is also known as the log loss (logarithmic loss or logistic loss); the terms \"log loss\" and \"cross-entropy loss\" are used interchangeably [[30]](#ref30).\n",
"\n",
"Wikipedia Page"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$L_{\\log}(y, p) = -(y \\log (p) + (1 - y) \\log (1 - p))$$"
]
},
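{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a worked illustration of the formula above (natural logarithm assumed), take two hypothetical samples that are both assigned probability $p=0.9$ for the positive class, one with true label $y=1$ and one with $y=0$:\n",
"\n",
"$$L_{\\log}(1, 0.9) = -\\log(0.9) \\approx 0.105$$\n",
"\n",
"$$L_{\\log}(0, 0.9) = -\\log(1 - 0.9) \\approx 2.303$$\n",
"\n",
"The confident but wrong prediction is penalized far more heavily than the confident and correct one. If the per-sample losses are averaged (assumed here to be the effect of a normalization flag), the result for these two samples is approximately $1.204$."
]
},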
{
"cell_type": "code",
"execution_count": 255,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.19763488164214868"
]
},
"execution_count": 255,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm_test.log_loss()"
]
},
{
"cell_type": "code",
"execution_count": 256,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1.854645225687032"
]
},
"execution_count": 256,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cm_test.log_loss(pos_class=0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Parameters "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"1. `pos_class` : positive class name (type : `int/str`, default : `None`)\n",
"2. `normalize` : normalization flag (type : `bool`, default : `True`)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Output"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`Log loss`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice: `pos_class` defaults to the greatest class name (i.e. `max(classes)`) unless `actual_vector` contains strings. In that case, `pos_class` has no default value and must be specified explicitly; otherwise an error is raised."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Input errors"
]
},
{
"cell_type": "code",
"execution_count": 301,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Input vectors must be provided as a list or a NumPy array.\n"
]
}
],
"source": [
"try:\n",
" cm2 = ConfusionMatrix(y_actu, 2)\n",
"except pycmVectorError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 302,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Input vectors must have the same length.\n"
]
}
],
"source": [
"try:\n",
" cm3 = ConfusionMatrix(y_actu, [1, 2, 3])\n",
"except pycmVectorError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 303,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Input vectors must not be empty.\n"
]
}
],
"source": [
"try:\n",
" cm_4 = ConfusionMatrix([], [])\n",
"except pycmVectorError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 304,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Input vectors must have the same length.\n"
]
}
],
"source": [
"try:\n",
" cm_5 = ConfusionMatrix([1, 1, 1], [1, 1, 1, 1])\n",
"except pycmVectorError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 305,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Invalid input confusion matrix format.\n"
]
}
],
"source": [
"try:\n",
" cm3 = ConfusionMatrix(matrix={})\n",
"except pycmMatrixError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 306,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"All input matrix classes must be of the same type.\n"
]
}
],
"source": [
"try:\n",
" cm_4 = ConfusionMatrix(matrix={1: {1: 2, \"1\": 2}, \"1\": {1: 2, \"1\": 3}})\n",
"except pycmMatrixError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 307,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The number of classes must be at least 2.\n"
]
}
],
"source": [
"try:\n",
" cm_5 = ConfusionMatrix(matrix={1: {1: 2}})\n",
"except pycmMatrixError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 308,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Input must be provided as a dictionary.\n"
]
}
],
"source": [
"try:\n",
" cp = Compare([cm2, cm3])\n",
"except pycmCompareError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 309,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"All ConfusionMatrix objects must have the same domain (same sample size and number of classes).\n"
]
}
],
"source": [
"try:\n",
" cp = Compare({\"cm1\": cm, \"cm2\": cm2})\n",
"except pycmCompareError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 310,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Input must be a dictionary containing pycm.ConfusionMatrix objects.\n"
]
}
],
"source": [
"try:\n",
" cp = Compare({\"cm1\": [], \"cm2\": cm2})\n",
"except pycmCompareError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 311,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"At least 2 confusion matrices are required for comparison.\n"
]
}
],
"source": [
"try:\n",
" cp = Compare({\"cm2\": cm2})\n",
"except pycmCompareError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 312,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"`class_weight` must be a dictionary and specified for all classes.\n"
]
}
],
"source": [
"try:\n",
" cp = Compare(\n",
" {\"cm1\": cm2, \"cm2\": cm3},\n",
" by_class=True,\n",
" class_weight={1: 2, 2: 0})\n",
"except pycmCompareError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 313,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"`class_benchmark_weight` must be a dictionary and specified for all class benchmarks.\n"
]
}
],
"source": [
"try:\n",
" cp = Compare(\n",
" {\"cm1\": cm2, \"cm2\": cm3},\n",
" class_benchmark_weight={1: 2, 2: 0})\n",
"except pycmCompareError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 314,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"`overall_benchmark_weight` must be a dictionary and specified for all overall benchmarks.\n"
]
}
],
"source": [
"try:\n",
" cp = Compare(\n",
" {\"cm1\": cm2, \"cm2\": cm3},\n",
" overall_benchmark_weight={1: 2, 2: 0})\n",
"except pycmCompareError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 315,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Confidence interval calculation for this parameter is not supported in this version of pycm.\n",
" Supported parameters are: TPR, TNR, PPV, NPV, ACC, PLR, NLR, FPR, FNR, AUC, PRE, Kappa, Overall ACC\n"
]
}
],
"source": [
"try:\n",
" cm.CI(\"MCC\")\n",
"except pycmCIError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 316,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Input must be provided as a string.\n"
]
}
],
"source": [
"try:\n",
" cm.CI(2)\n",
"except pycmCIError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 317,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Invalid parameter!\n"
]
}
],
"source": [
"try:\n",
" cm.average(\"AXY\")\n",
"except pycmAverageError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 318,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Invalid parameter!\n"
]
}
],
"source": [
"try:\n",
" cm.weighted_average(\"AXY\")\n",
"except pycmAverageError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 319,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"`weight` must be a dictionary and specified for all classes."
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n"
]
}
],
"source": [
"try:\n",
" cm.weighted_average(\"AUC\", weight={1: 22})\n",
"except pycmAverageError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 320,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"This option is only available in vector mode."
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n"
]
}
],
"source": [
"try:\n",
" cm.position()\n",
"except pycmVectorError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 321,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Input must be an instance of pycm.ConfusionMatrix.\n"
]
}
],
"source": [
"try:\n",
" cm.combine(2)\n",
"except pycmMatrixError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 322,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The number of classes must be at least 2.\n"
]
}
],
"source": [
"try:\n",
" cm6 = ConfusionMatrix([1, 1, 1, 1], [1, 1, 1, 1], classes=[])\n",
"except pycmMatrixError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 323,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"`classes` must contain unique labels with no duplicates.\n"
]
}
],
"source": [
"try:\n",
" cm7 = ConfusionMatrix([1, 1, 1, 1], [1, 1, 1, 1], classes=[1, 1, 2])\n",
"except pycmVectorError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 324,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Input vectors must be provided as a list or a NumPy array.\n"
]
}
],
"source": [
"try:\n",
" crv = ROCCurve([1, 2, 2, 1], {1, 2, 2, 1}, classes=[1, 2])\n",
"except pycmCurveError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 325,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Input vectors must have the same length.\n"
]
}
],
"source": [
"try:\n",
" crv = ROCCurve([1, 2, 2, 1], [[0.1, 0.9]], classes=[1, 2])\n",
"except pycmCurveError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 326,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The sum of the probability values must equal 1.\n"
]
}
],
"source": [
"try:\n",
" crv = ROCCurve(\n",
" [1, 2, 2, 1],\n",
" [[0.1, 0.9], [0.1, 0.9], [0.1, 0.9], [0.2, 0.9]],\n",
" classes=[1, 2])\n",
"except pycmCurveError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 327,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"`classes` must be provided as a list.\n"
]
}
],
"source": [
"try:\n",
" crv = ROCCurve(\n",
" [1, 2, 2, 1],\n",
" [[0.1, 0.9], [0.1, 0.9], [0.1, 0.9], [0.1, 0.9]],\n",
" classes={1, 2})\n",
"except pycmCurveError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 328,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"`classes` does not match the actual vector.\n"
]
}
],
"source": [
"try:\n",
" crv = ROCCurve(\n",
" [1, 2, 2, 1],\n",
" [[0.1, 0.9], [0.1, 0.9], [0.1, 0.9], [0.1, 0.9]],\n",
" classes=[1, 2, 3])\n",
"except pycmCurveError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 329,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The number of classes must be at least 2.\n"
]
}
],
"source": [
"try:\n",
" crv = ROCCurve(\n",
" [1, 1, 1, 1],\n",
" [[0.1, 0.9], [0.1, 0.9], [0.1, 0.9], [0.1, 0.9]],\n",
" classes=[1])\n",
"except pycmCurveError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 330,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Probability vector elements must be numeric.\n"
]
}
],
"source": [
"try:\n",
" crv = ROCCurve(\n",
" [1, 2, 2, 1],\n",
" [[0.1, 0.9], [0.1, 0.9], [0.1, 0.9], [0.2, \"s\"]],\n",
" classes=[1, 2])\n",
"except pycmCurveError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 331,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"`thresholds` must be provided as a list or a NumPy array.\n"
]
}
],
"source": [
"try:\n",
" crv = ROCCurve(\n",
" [1, 2, 2, 1],\n",
" [[0.1, 0.9], [0.1, 0.9], [0.1, 0.9], [0.2, 0.8]],\n",
" classes=[1, 2],\n",
" thresholds={1, 2})\n",
"except pycmCurveError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 332,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The number of thresholds must be at least 2.\n"
]
}
],
"source": [
"try:\n",
" crv = ROCCurve(\n",
" [1, 2, 2, 1],\n",
" [[0.1, 0.9], [0.1, 0.9], [0.1, 0.9], [0.2, 0.8]],\n",
" classes=[1, 2],\n",
" thresholds=[0.1])\n",
"except pycmCurveError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 333,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"`thresholds` must contain only numeric values.\n"
]
}
],
"source": [
"try:\n",
" crv = ROCCurve(\n",
" [1, 2, 2, 1],\n",
" [[0.1, 0.9], [0.1, 0.9], [0.1, 0.9], [0.2, 0.8]],\n",
" classes=[1, 2],\n",
" thresholds=[0.1, \"q\"])\n",
"except pycmCurveError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 334,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"`classes` must contain unique labels with no duplicates.\n"
]
}
],
"source": [
"try:\n",
" crv = ROCCurve(\n",
" [1, 2, 2, 1],\n",
" [[0.1, 0.9], [0.1, 0.9], [0.1, 0.9], [0.2, 0.8]],\n",
" classes=[1, 1, 2])\n",
"except pycmCurveError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 335,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"All elements of the probability vector must have the same length and match the number of classes.\n"
]
}
],
"source": [
"try:\n",
" crv = ROCCurve(\n",
" [1, 2, 2, 1],\n",
" [[0.1, 0.9], [0.1, 0.9], [0.1, 0.8, 0.1], [0.2, 0.8]],\n",
" classes=[1, 2])\n",
"except pycmCurveError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 336,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The integral method must be either 'trapezoidal' or 'midpoint'.\n"
]
}
],
"source": [
"try:\n",
" crv = ROCCurve(\n",
" actual_vector=numpy.array([1, 1, 2, 2]),\n",
" probs=numpy.array([[0.1, 0.9], [0.4, 0.6], [0.35, 0.65], [0.8, 0.2]]),\n",
" classes=[2, 1])\n",
" crv.area(method=\"trpz\")\n",
"except pycmCurveError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 337,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Failed to extract classes from input. Input vectors should be a list of sets with unified types.\n"
]
}
],
"source": [
"try:\n",
" mlcm = MultiLabelCM([[0, 1], [1, 1]], [[1, 0], [1, 0]])\n",
"except pycmVectorError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 338,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The specified class name is not among the confusion matrix's classes.\n"
]
}
],
"source": [
"try:\n",
" mlcm = MultiLabelCM([{'dog'}, {'cat', 'dog'}], [{'cat'}, {'cat'}])\n",
" mlcm.get_cm_by_class(1)\n",
"except pycmMultiLabelError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 339,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Index is out of range for the given vector.\n"
]
}
],
"source": [
"try:\n",
" mlcm.get_cm_by_sample(2)\n",
"except pycmMultiLabelError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 340,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Input vectors must be provided as a list or a NumPy array."
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n"
]
}
],
"source": [
"try:\n",
" mlcm = MultiLabelCM(2, [{1, 0}, {1, 0}])\n",
"except pycmVectorError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 341,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Input vectors must have the same length.\n"
]
}
],
"source": [
"try:\n",
" mlcm = MultiLabelCM([{1, 0}, {1, 0}, {1, 1}], [{1, 0}, {1, 0}])\n",
"except pycmVectorError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 342,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Input vectors must not be empty.\n"
]
}
],
"source": [
"try:\n",
" mlcm = MultiLabelCM([], [])\n",
"except pycmVectorError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "code",
"execution_count": 343,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"`classes` must contain unique labels with no duplicates.\n"
]
}
],
"source": [
"try:\n",
" mlcm = MultiLabelCM([{1, 0}, {1, 0}], [{1, 0}, {1, 0}], classes=[1, 0, 1])\n",
"except pycmVectorError as e:\n",
" print(str(e))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice: updated in version 4.0"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Examples"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"### Example-1 (Comparison of three different classifiers)\n",
"\n",
"- [Jupyter Notebook](https://nbviewer.jupyter.org/github/sepandhaghighi/pycm/blob/master/Document/Example1.ipynb)\n",
"- [HTML](http://www.pycm.io/doc/Example1.html)\n",
"\n",
"### Example-2 (How to plot via matplotlib)\n",
"\n",
"- [Jupyter Notebook](https://nbviewer.jupyter.org/github/sepandhaghighi/pycm/blob/master/Document/Example2.ipynb)\n",
"- [HTML](http://www.pycm.io/doc/Example2.html)\n",
"\n",
"### Example-3 (Activation threshold)\n",
"\n",
"- [Jupyter Notebook](https://nbviewer.jupyter.org/github/sepandhaghighi/pycm/blob/master/Document/Example3.ipynb)\n",
"- [HTML](http://www.pycm.io/doc/Example3.html)\n",
"\n",
"### Example-4 (File)\n",
"\n",
"- [Jupyter Notebook](https://nbviewer.jupyter.org/github/sepandhaghighi/pycm/blob/master/Document/Example4.ipynb)\n",
"- [HTML](http://www.pycm.io/doc/Example4.html)\n",
"\n",
"### Example-5 (Sample weights)\n",
"\n",
"- [Jupyter Notebook](https://nbviewer.jupyter.org/github/sepandhaghighi/pycm/blob/master/Document/Example5.ipynb)\n",
"- [HTML](http://www.pycm.io/doc/Example5.html)\n",
"\n",
"### Example-6 (Unbalanced data)\n",
"\n",
"- [Jupyter Notebook](https://nbviewer.jupyter.org/github/sepandhaghighi/pycm/blob/master/Document/Example6.ipynb)\n",
"- [HTML](http://www.pycm.io/doc/Example6.html)\n",
"\n",
"### Example-7 (How to plot via seaborn+pandas)\n",
"\n",
"- [Jupyter Notebook](https://nbviewer.jupyter.org/github/sepandhaghighi/pycm/blob/master/Document/Example7.ipynb)\n",
"- [HTML](http://www.pycm.io/doc/Example7.html)\n",
"\n",
"### Example-8 (Confidence interval)\n",
"\n",
"- [Jupyter Notebook](https://nbviewer.jupyter.org/github/sepandhaghighi/pycm/blob/master/Document/Example8.ipynb)\n",
"- [HTML](http://www.pycm.io/doc/Example8.html)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Cite"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you use PyCM in your research, we would appreciate citations to the following paper:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"
Haghighi, S., Jasemi, M., Hessabi, S. and Zolanvari, A. (2018). PyCM: Multiclass confusion matrix library in Python. Journal of Open Source Software, 3(25), p.729.
1- J. R. Landis and G. G. Koch, \"The measurement of observer agreement for categorical data,\" biometrics, pp. 159-174, 1977.
\n",
"\n",
"
2- D. M. Powers, \"Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation,\" arXiv preprint arXiv:2010.16061, 2020.
\n",
"\n",
"\n",
"
3- C. Sammut and G. I. Webb, Encyclopedia of machine learning. Springer Science & Business Media, 2011.
\n",
"\n",
"
4- J. L. Fleiss, \"Measuring nominal scale agreement among many raters,\" Psychological bulletin, vol. 76, no. 5, p. 378, 1971.
\n",
"\n",
"
5- D. G. Altman, Practical statistics for medical research. CRC press, 1990.
\n",
"\n",
"
6- K. L. Gwet, \"Computing inter-rater reliability and its variance in the presence of high agreement,\" British Journal of Mathematical and Statistical Psychology, vol. 61, no. 1, pp. 29-48, 2008.
\n",
"\n",
"
7- W. A. Scott, \"Reliability of content analysis: The case of nominal scale coding,\" Public opinion quarterly, pp. 321-325, 1955.
\n",
"\n",
"
8- E. M. Bennett, R. Alpert, and A. Goldstein, \"Communications through limited-response questioning,\" Public Opinion Quarterly, vol. 18, no. 3, pp. 303-308, 1954.
\n",
"\n",
"
9- D. V. Cicchetti, \"Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology,\" Psychological assessment, vol. 6, no. 4, p. 284, 1994.
\n",
"\n",
"
10- R. B. Davies, \"Algorithm AS 155: The distribution of a linear combination of χ2 random variables,\" Applied Statistics, pp. 323-333, 1980.
\n",
"\n",
"
11- S. Kullback and R. A. Leibler, \"On information and sufficiency,\" The annals of mathematical statistics, vol. 22, no. 1, pp. 79-86, 1951.
\n",
"\n",
"
12- L. A. Goodman and W. H. Kruskal, \"Measures of association for cross classifications, IV: Simplification of asymptotic variances,\" Journal of the American Statistical Association, vol. 67, no. 338, pp. 415-421, 1972.
\n",
"\n",
"
13- L. A. Goodman and W. H. Kruskal, \"Measures of association for cross classifications III: Approximate sampling theory,\" Journal of the American Statistical Association, vol. 58, no. 302, pp. 310-364, 1963.
\n",
"\n",
"
14- T. Byrt, J. Bishop, and J. B. Carlin, \"Bias, prevalence and kappa,\" Journal of clinical epidemiology, vol. 46, no. 5, pp. 423-429, 1993.
\n",
"\n",
"
15- M. Shepperd, D. Bowes, and T. Hall, \"Researcher bias: The use of machine learning in software defect prediction,\" IEEE Transactions on Software Engineering, vol. 40, no. 6, pp. 603-616, 2014.
\n",
"\n",
"
16- X. Deng, Q. Liu, Y. Deng, and S. Mahadevan, \"An improved method to construct basic probability assignment based on the confusion matrix for classification problem,\" Information Sciences, vol. 340, pp. 250-261, 2016.
\n",
"\n",
"
17- J.-M. Wei, X.-J. Yuan, Q.-H. Hu, and S.-Q. Wang, \"A novel measure for evaluating classifiers,\" Expert Systems with Applications, vol. 37, no. 5, pp. 3799-3809, 2010.
\n",
"\n",
"
18- I. Kononenko and I. Bratko, \"Information-based evaluation criterion for classifier's performance,\" Machine learning, vol. 6, no. 1, pp. 67-80, 1991.
\n",
"\n",
"
19- R. Delgado and J. D. Núnez-González, \"Enhancing confusion entropy as measure for evaluating classifiers,\" in The 13th International Conference on Soft Computing Models in Industrial and Environmental Applications, 2018: Springer, pp. 79-89.
\n",
"\n",
"
20- J. Gorodkin, \"Comparing two K-category assignments by a K-category correlation coefficient,\" Computational biology and chemistry, vol.28, no. 5-6, pp. 367-374, 2004.
\n",
"\n",
"
21- C. O. Freitas, J. M. De Carvalho, J. Oliveira, S. B. Aires, and R. Sabourin, \"Confusion matrix disagreement for multiple classifiers,\" in Iberoamerican Congress on Pattern Recognition, 2007: Springer, pp. 387-396.
\n",
"\n",
"
22- P. Branco, L. Torgo, and R. P. Ribeiro, \"Relevance-based evaluation metrics for multi-class imbalanced domains,\" in Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2017: Springer, pp. 698-710.
\n",
"\n",
"
23- D. Ballabio, F. Grisoni, and R. Todeschini, \"Multivariate comparison of classification performance measures,\" Chemometrics and Intelligent Laboratory Systems, vol. 174, pp. 33-44, 2018.
\n",
"\n",
"
24- J. Cohen, \"A coefficient of agreement for nominal scales,\" Educational and psychological measurement, vol. 20, no. 1, pp. 37-46, 1960.
\n",
"\n",
"
25- S. Siegel, \"Nonparametric statistics for the behavioral sciences,\" 1956.
\n",
"\n",
"
26- H. Cramér, Mathematical methods of statistics. Princeton university press, 1999.
\n",
"\n",
"
27- B. W. Matthews, \"Comparison of the predicted and observed secondary structure of T4 phage lysozyme,\" Biochimica et Biophysica Acta (BBA)-Protein Structure, vol. 405, no. 2, pp. 442-451, 1975.
\n",
"\n",
"
28- J. A. Swets, \"The relative operating characteristic in psychology: a technique for isolating effects of response bias finds wide use in the study of perception and cognition,\" Science, vol. 182, no. 4116, pp. 990-1000, 1973.
\n",
"\n",
"
29- P. Jaccard, \"Étude comparative de la distribution florale dans une portion des Alpes et des Jura,\" Bull Soc Vaudoise Sci Nat, vol. 37, pp. 547-579, 1901.
\n",
"\n",
"
30- T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley & Sons, 2012.
\n",
"\n",
"
31- E. S. Keeping, Introduction to statistical inference. Courier Corporation, 1995.
\n",
"\n",
"
32- V. Sindhwani, P. Bhattacharya, and S. Rakshit, \"Information theoretic feature crediting in multiclass support vector machines,\" in Proceedings of the 2001 SIAM International Conference on Data Mining, 2001: SIAM, pp. 1-18.
\n",
"\n",
"
33- M. Bekkar, H. K. Djemaa, and T. A. Alitouche, \"Evaluation measures for models assessment over imbalanced data sets,\" J Inf Eng Appl, vol. 3, no. 10, 2013.
\n",
"\n",
"
34- W. J. Youden, \"Index for rating diagnostic tests,\" Cancer, vol. 3, no. 1, pp. 32-35, 1950.
\n",
"\n",
"
35- S. Brin, R. Motwani, J. D. Ullman, and S. Tsur, \"Dynamic itemset counting and implication rules for market basket data,\" in Proceedings of the 1997 ACM SIGMOD international conference on Management of data, 1997, pp. 255-264.
\n",
"\n",
"
36- S. Raschka, \"MLxtend: Providing machine learning and data science utilities and extensions to Python's scientific computing stack,\" Journal of open source software, vol. 3, no. 24, p. 638, 2018.
\n",
"\n",
"
37- J. R. Bray and J. T. Curtis, \"An ordination of the upland forest communities of southern Wisconsin,\" Ecological monographs, vol. 27, no. 4, pp. 325-349, 1957.
\t\n",
"\n",
"
38- J. L. Fleiss, J. Cohen, and B. S. Everitt, \"Large sample standard errors of kappa and weighted kappa,\" Psychological bulletin, vol. 72, no. 5, p. 323, 1969.
\n",
"\n",
"
39- M. Felkin, \"Comparing classification results between n-ary and binary problems,\" in Quality Measures in Data Mining: Springer, 2007, pp. 277-301.
\n",
"\n",
"
40- R. Ranawana and V. Palade, \"Optimized precision-a new measure for classifier performance evaluation,\" in 2006 IEEE International Conference on Evolutionary Computation, 2006: IEEE, pp. 2254-2261.
\n",
"\n",
"
41- V. García, R. A. Mollineda, and J. S. Sánchez, \"Index of balanced accuracy: A performance measure for skewed class distributions,\" in Iberian conference on pattern recognition and image analysis, 2009: Springer, pp. 441-448.
\n",
"\n",
"
42- P. Branco, L. Torgo, and R. P. Ribeiro, \"A survey of predictive modeling on imbalanced domains,\" ACM Computing Surveys (CSUR), vol. 49, no. 2, pp. 1-50, 2016.
\n",
"\n",
"
43- K. Pearson, \"Notes on Regression and Inheritance in the Case of Two Parents,\" in Proceedings of the Royal Society of London, p. 240-242, 1895.
\n",
"\n",
"
44- W. J. Conover, Practical nonparametric statistics. John Wiley & Sons, 1998.
\n",
"\n",
"
45- G. U. Yule, \"On the methods of measuring association between two attributes,\" Journal of the Royal Statistical Society, vol. 75, no. 6, pp. 579-652, 1912.
\n",
"\n",
"
46- R. Batuwita and V. Palade, \"A new performance measure for class imbalance learning. application to bioinformatics problems,\" in 2009 International Conference on Machine Learning and Applications, 2009: IEEE, pp. 545-550.
\n",
"\n",
"
47- D. K. Lee, \"Alternatives to P value: confidence interval and effect size,\" Korean journal of anesthesiology, vol. 69, no. 6, p. 555, 2016.
\n",
"\n",
"
48- M. A. Raslich, R. J. Markert, and S. A. Stutes, \"Selecting and interpreting diagnostic tests,\" Biochemia Medica, vol. 17, no. 2, pp. 151-161, 2007.
\n",
"\n",
"
49- D. E. Hinkle, W. Wiersma, and S. G. Jurs, Applied statistics for the behavioral sciences. Houghton Mifflin College Division, 2003.
\n",
"\n",
"
50- A. Maratea, A. Petrosino, and M. Manzo, \"Adjusted F-measure and kernel scaling for imbalanced data learning,\" Information Sciences, vol. 257, pp. 331-341, 2014.
\n",
"\n",
"
51- L. Mosley, \"A balanced approach to the multi-class imbalance problem,\" 2013.
\n",
"\n",
"
52- M. Vijaymeena and K. Kavitha, \"A survey on similarity measures in text mining,\" Machine Learning and Applications: An International Journal, vol. 3, no. 2, pp. 19-28, 2016.
\n",
"\n",
"
53- Y. Otsuka, \"The faunal character of the Japanese Pleistocene marine Mollusca, as evidence of climate having become colder during the Pleistocene in Japan,\" Biogeograph Soc Japan, vol. 6, no. 16, pp. 165-170, 1936.
\n",
"\n",
"
54- A. Tversky, \"Features of similarity,\" Psychological review, vol. 84, no. 4, p. 327, 1977.
\n",
"\n",
"
55- K. Boyd, K. H. Eng, and C. D. Page, \"Area under the precision-recall curve: point estimates and confidence intervals,\" in Joint European conference on machine learning and knowledge discovery in databases, 2013: Springer, pp. 451-466.
\n",
"\n",
"
56- J. Davis and M. Goadrich, \"The relationship between Precision-Recall and ROC curves,\" in Proceedings of the 23rd international conference on Machine learning, 2006, pp. 233-240.
\n",
"\n",
"
57- M. Kuhn, \"Building predictive models in R using the caret package,\" J Stat Softw, vol. 28, no. 5, pp. 1-26, 2008.
\n",
"\n",
"
58- V. Labatut and H. Cherifi, \"Accuracy measures for the comparison of classifiers,\" arXiv preprint arXiv:1207.3790, 2012.
\n",
"\n",
"
59- S. Wallis, \"Binomial confidence intervals and contingency tests: mathematical fundamentals and the evaluation of alternative methods,\" Journal of Quantitative Linguistics, vol. 20, no. 3, pp. 178-208, 2013.
\n",
"\n",
"
60- D. Altman, D. Machin, T. Bryant, and M. Gardner, Statistics with confidence: confidence intervals and statistical guidelines. John Wiley & Sons, 2013.
\n",
"\n",
"
61- J. A. Hanley and B. J. McNeil, \"The meaning and use of the area under a receiver operating characteristic (ROC) curve,\" Radiology, vol. 143, no. 1, pp. 29-36, 1982.
\n",
"\n",
"
62- E. B. Wilson, \"Probable inference, the law of succession, and statistical inference,\" Journal of the American Statistical Association, vol. 22, no. 158, pp. 209-212, 1927.
\n",
"\n",
"
63- A. Agresti and B. A. Coull, \"Approximate is better than “exact” for interval estimation of binomial proportions,\" The American Statistician, vol. 52, no. 2, pp. 119-126, 1998.
\n",
"\n",
"
64- C. S. Peirce, \"The numerical measure of the success of predictions,\" Science, no. 93, pp. 453-454, 1884.
\n",
"\n",
"
65- E. W. Steyerberg, B. Van Calster, and M. J. Pencina, \"Performance measures for prediction models and markers: evaluation of predictions and classifications,\" Revista Española de Cardiología (English Edition), vol. 64, no. 9, pp. 788-794, 2011.
\n",
"\n",
"
66- A. J. Vickers and E. B. Elkin, \"Decision curve analysis: a novel method for evaluating prediction models,\" Medical Decision Making, vol. 26, no. 6, pp. 565-574, 2006.
\n",
"\n",
"
67- G. W. Bohrnstedt and D. Knoke, \"Statistics for social data analysis,\" 1982.
\n",
"\n",
"
68- W. M. Rand, \"Objective criteria for the evaluation of clustering methods,\" Journal of the American Statistical Association, vol. 66, no. 336, pp. 846-850, 1971.
\n",
"\n",
"
69- J. M. Santos and M. Embrechts, \"On the use of the adjusted rand index as a metric for evaluating supervised classification,\" in International conference on artificial neural networks, 2009: Springer, pp. 175-184.
\n",
"\n",
"
70- J. Cohen, \"Weighted kappa: nominal scale agreement provision for scaled disagreement or partial credit,\" Psychological bulletin, vol. 70, no. 4, p. 213, 1968.
\n",
"\n",
"
71- R. Bakeman and J. M. Gottman, Observing interaction: An introduction to sequential analysis. Cambridge university press, 1997.
\n",
"\n",
"
72- S. Bangdiwala, \"A graphical test for observer agreement,\" in 45th International Statistical Institute Meeting, 1985, vol. 1985, p. 307.
\n",
"\n",
"
73- K. Bangdiwala and H. Bryan, \"Using SAS software graphical procedures for the observer agreement chart,\" in Proceedings of the SAS Users Group International Conference, 1987, vol. 12, pp. 1083-1088.
\n",
"\n",
"
74- A. F. Hayes and K. Krippendorff, \"Answering the call for a standard reliability measure for coding data,\" Communication methods and measures, vol. 1, no. 1, pp. 77-89, 2007.
\n",
"\n",
"
75- M. Aickin, \"Maximum likelihood estimation of agreement in the constant predictive probability model, and its relation to Cohen's kappa,\" Biometrics, pp. 293-302, 1990.
\n",
"\n",
"
76- N. A. Macmillan and C. D. Creelman, Detection theory: A user's guide. Psychology press, 2004.
\n",
"\n",
"
77- D. J. Hand, P. Christen, and N. Kirielle, \"F*: an interpretable transformation of the F-measure,\" Machine Learning, vol. 110, no. 3, pp. 451-456, 2021.
\n",
"\n",
"
78- G. W. Brier, \"Verification of forecasts expressed in terms of probability,\" Monthly weather review, vol. 78, no. 1, pp. 1-3, 1950.
\n",
"\n",
"
79- L. Buitinck et al., \"API design for machine learning software: experiences from the scikit-learn project,\" arXiv preprint arXiv:1309.0238, 2013.
\n",
"\n",
"
80- R. W. Hamming, \"Error detecting and error correcting codes,\" The Bell system technical journal, vol. 29, no. 2, pp. 147-160, 1950.
\n",
"\n",
"
81- S. S. Choi, S. H. Cha, and C. C. Tappert, \"A survey of binary similarity and distance measures,\" Journal of systemics, cybernetics and informatics, vol. 8, no. 1, pp. 43-48, 2010.
\n",
"\n",
"
82- J. Braun-Blanquet, Plant sociology: The study of plant communities, 1st ed., 1932.
\n",
"\n",
"
83- C. C. Little, \"Abydos Documentation,\" 2020.
\n",
"\n",
"
84- K. Villela, A. Silva, T. Vale, and E. S. de Almeida, \"A survey on software variability management approaches,\" in Proceedings of the 18th International Software Product Line Conference-Volume 1, 2014, pp. 147-156.
\n",
" \n",
"
85- J. R. Saura, A. Reyes-Menendez, and P. Palos-Sanchez, \"Are black Friday deals worth it? Mining Twitter users’ sentiment and behavior response,\" Journal of Open Innovation: Technology, Market, and Complexity, vol. 5, no. 3, p. 58, 2019.
\n",
"\n",
"
86- P. Schubert and U. Leimstoll, \"Importance and use of information technology in small and medium‐sized companies,\" Electronic Markets, vol. 17, no. 1, pp. 38-55, 2007.
Note 1: Recommended statistics for this type of classification are highlighted in aqua.
Note 2: The recommendation system assumes that the input is the result of classification over the entire dataset, not just a subset. If the confusion matrix is based on test data classification, the recommendation may not be valid.