pax_global_header: comment=7c18cd7c07276fa7df369e031745f47b683decc3

pycm-4.2/.coveragerc

[run]
branch = True
omit =
    */pycm/__main__.py
    */pycm/__init__.py
    */pycm/profile.py
    */pycm/basic_test.py

[report]
# Regexes for lines to exclude from consideration
exclude_lines =
    pragma: no cover

pycm-4.2/.gitattributes

*.html linguist-detectable=false
*.ipynb linguist-detectable=false
Document/* linguist-vendored
Otherfiles/test.html linguist-vendored

pycm-4.2/.github/CODE_OF_CONDUCT.md

# Contributor Covenant Code of Conduct

## Our Pledge

We as members, contributors, and leaders pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, caste, color, religion, or sexual identity and orientation.

We pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community.
## Our Standards

Examples of behavior that contributes to a positive environment for our community include:

* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes, and learning from the experience
* Focusing on what is best not just for us as individuals, but for the overall community

Examples of unacceptable behavior include:

* The use of sexualized language or imagery, and sexual attention or advances of any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email address, without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a professional setting

## Enforcement Responsibilities

Community leaders are responsible for clarifying and enforcing our standards of acceptable behavior and will take appropriate and fair corrective action in response to any behavior that they deem inappropriate, threatening, offensive, or harmful.

Community leaders have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, and will communicate reasons for moderation decisions when appropriate.

## Scope

This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing our community include using an official e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event.
## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement at info@pycm.io. All complaints will be reviewed and investigated promptly and fairly.

All community leaders are obligated to respect the privacy and security of the reporter of any incident.

## Enforcement Guidelines

Community leaders will follow these Community Impact Guidelines in determining the consequences for any action they deem in violation of this Code of Conduct:

### 1. Correction

**Community Impact**: Use of inappropriate language or other behavior deemed unprofessional or unwelcome in the community.

**Consequence**: A private, written warning from community leaders, providing clarity around the nature of the violation and an explanation of why the behavior was inappropriate. A public apology may be requested.

### 2. Warning

**Community Impact**: A violation through a single incident or series of actions.

**Consequence**: A warning with consequences for continued behavior. No interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, for a specified period of time. This includes avoiding interactions in community spaces as well as external channels like social media. Violating these terms may lead to a temporary or permanent ban.

### 3. Temporary Ban

**Community Impact**: A serious violation of community standards, including sustained inappropriate behavior.

**Consequence**: A temporary ban from any sort of interaction or public communication with the community for a specified period of time. No public or private interaction with the people involved, including unsolicited interaction with those enforcing the Code of Conduct, is allowed during this period. Violating these terms may lead to a permanent ban.

### 4. Permanent Ban

**Community Impact**: Demonstrating a pattern of violation of community standards, including sustained inappropriate behavior, harassment of an individual, or aggression toward or disparagement of classes of individuals.

**Consequence**: A permanent ban from any sort of public interaction within the community.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 2.1, available at [https://www.contributor-covenant.org/version/2/1/code_of_conduct.html][v2.1].

Community Impact Guidelines were inspired by [Mozilla's code of conduct enforcement ladder][Mozilla CoC].

For answers to common questions about this code of conduct, see the FAQ at [https://www.contributor-covenant.org/faq][FAQ]. Translations are available at [https://www.contributor-covenant.org/translations][translations].

[homepage]: https://www.contributor-covenant.org
[v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html
[Mozilla CoC]: https://github.com/mozilla/diversity
[FAQ]: https://www.contributor-covenant.org/faq
[translations]: https://www.contributor-covenant.org/translations

pycm-4.2/.github/CONTRIBUTING.md

# Contribution

**Last Update: 2024-10-08**

Changes and improvements are more than welcome! ❤️ Feel free to fork and open a pull request.

Please consider the following steps:

1. Fork it!
2. Create your feature branch (from the `dev` branch)
3. Add your new features or fix detected bugs
    - To add a new class statistic, see [here](#class-statistic)
    - To add a new overall statistic, see [here](#overall-statistic)
    - To add a new interpretation, see [here](#interpretation)
4. Add a standard `docstring` to your functions/methods according to the [standard format](#standard-docstring-format)
5. Add tests for your functions/methods (`doctest`, `Test` folder)
6. Update `README.md` (if needed)
7. Update `Document.ipynb` (if needed)
8. Make sure all CI tests pass
9. Update `CHANGELOG.md`
    - Describe changes under the `[Unreleased]` section
10. Update `AUTHORS.md`
    - Add your name under the `# Other Contributors #` section
11. Submit a pull request into `dev` (please complete the pull request template)

## Class statistic

1. Add new functions to `class_funcs.py`
2. Update the `CLASS_PARAMS` dictionary in `params.py`
3. Update the `class_statistics` function in `class_funcs.py`
    - Call the statistic function and store its result in the `result` dictionary
4. Update the `PARAMS_DESCRIPTION` dictionary in `params.py` with a short description
    - If you don't want capitalization, update the `CAPITALIZE_FILTER` list in `params.py` (*Optional*)
5. Update the `References` section in `Document.ipynb` (`IEEE` format)
6. Add a description to the `Class Statistics` section in `Document.ipynb`
    - Cite the reference
    - Update the table of contents
    - Use `LaTeX` for formulas
7. Update the `PARAMS_LINK` dictionary in `params.py` with the document tag (without `#`)
8. Add tests to `overall_test.py` and `function_test.py` in the `Test` folder
    - If you have any verified tests, add them to `verified_test.py`
9. Run `autopep8.bat`/`autopep8.sh` (*Optional*; requires the latest version of the `autopep8` package)

## Overall statistic

1. Add new functions to `overall_funcs.py`
2. Update the `OVERALL_PARAMS` dictionary in `params.py`
3. Update the `overall_statistics` function in `overall_funcs.py`
    - Call the statistic function and store its result in a variable
    - Add this variable to the output
4. Update the `References` section in `Document.ipynb` (`IEEE` format)
5. Add a description to the `Overall Statistics` section in `Document.ipynb`
    - Cite the reference
    - Update the table of contents
    - Use `LaTeX` for formulas
6. Update the `PARAMS_LINK` dictionary in `params.py` with the document tag (without `#`)
7. Add tests to `overall_test.py` and `function_test.py` in the `Test` folder
    - If you have any verified tests, add them to `verified_test.py`
8. Run `autopep8.bat`/`autopep8.sh` (*Optional*; requires the latest version of the `autopep8` package)

## Interpretation

1. Add the new interpretation table as a function to `interpret.py`
2. Add a score dictionary to `params.py`
    - Example: ```PLRI_SCORE = {"Good": 4, "Fair": 3, "Poor": 2, "Negligible": 1, "None": "None"}```
3. Add a color dictionary to `BENCHMARK_COLOR` in `params.py`
    - Example: ```"PLRI": {"Negligible": "Red","Poor": "Orange","Fair": "Yellow","Good": "Green","None": "White"}```
4. If the interpretation table is for a class statistic:
    - Steps 2-7 of [class statistic](#class-statistic)
    - Update `CLASS_BENCHMARK_SCORE_DICT` in `params.py`
5. If the interpretation table is for an overall statistic:
    - Steps 2-6 of [overall statistic](#overall-statistic)
    - Update `OVERALL_BENCHMARK_SCORE_DICT` in `params.py`
6. Add tests to `compare_test.py`, `overall_test.py` and `function_test.py` in the `Test` folder
    - If you have any verified tests, add them to `verified_test.py`
7. Run `autopep8.bat`/`autopep8.sh` (*Optional*; requires the latest version of the `autopep8` package)

## Standard docstring format

Here, the `docstring` format mainly follows the PEP-suggested structure. Note the following items:

- Start the `docstring` description with an uppercase letter and end it with a dot
- Write all other descriptions in lowercase (with some exceptions)
- Declare abbreviations before using them

Example:

    def DF_calc(classes):
        """
        Calculate Chi-squared degree of freedom (DF).

        :param classes: confusion matrix classes
        :type classes: list
        :return: DF as int
        """

pycm-4.2/.github/FUNDING.yml

custom: https://www.pycm.io/donate.html

pycm-4.2/.github/ISSUE_TEMPLATE/bug_report.yml

name: Bug Report
description: File a bug report
title: "[Bug]: "
body:
  - type: markdown
    attributes:
      value: |
        Thanks for taking the time to fill out this bug report!
  - type: input
    id: contact
    attributes:
      label: Contact details
      description: How can we get in touch with you if we need more info?
      placeholder: ex. email@example.com
    validations:
      required: false
  - type: textarea
    id: what-happened
    attributes:
      label: What happened?
      description: Provide a clear and concise description of what the bug is.
      placeholder: >
        Describe the bug.
    validations:
      required: true
  - type: textarea
    id: step-to-reproduce
    attributes:
      label: Steps to reproduce
      description: Provide details of how to reproduce the bug.
      placeholder: >
        ex. 1. Go to '...'
    validations:
      required: true
  - type: textarea
    id: expected-behavior
    attributes:
      label: Expected behavior
      description: What did you expect to happen?
      placeholder: >
        ex. I expected '...' to happen
    validations:
      required: true
  - type: textarea
    id: actual-behavior
    attributes:
      label: Actual behavior
      description: What actually happened?
      placeholder: >
        ex. Instead '...' happened
    validations:
      required: true
  - type: dropdown
    id: operating-system
    attributes:
      label: Operating system
      description: Which operating system are you using?
      options:
        - Windows
        - macOS
        - Linux
      default: 0
    validations:
      required: true
  - type: dropdown
    id: python-version
    attributes:
      label: Python version
      description: Which version of Python are you using?
      options:
        - Python 3.13
        - Python 3.12
        - Python 3.11
        - Python 3.10
        - Python 3.9
        - Python 3.8
        - Python 3.7
        - Python 3.6
        - Python 3.5
      default: 1
    validations:
      required: true
  - type: dropdown
    id: pycm-version
    attributes:
      label: PyCM version
      description: Which version of PyCM are you using?
      options:
        - PyCM 4.2
        - PyCM 4.1
        - PyCM 4.0
        - PyCM 3.9
        - PyCM 3.8
        - PyCM 3.7
        - PyCM 3.6
        - PyCM 3.5
        - PyCM 3.4
        - PyCM 3.3
        - PyCM 3.2
        - PyCM 3.1
        - PyCM 3.0
        - PyCM 2.9
        - PyCM 2.8
        - PyCM 2.7
        - PyCM 2.6
        - PyCM 2.5
        - PyCM 2.4
        - PyCM 2.3
        - PyCM 2.2
        - PyCM 2.1
        - PyCM 2.0
        - PyCM 1.9
        - PyCM 1.8
        - PyCM 1.7
        - PyCM 1.6
        - PyCM 1.5
        - PyCM 1.4
        - PyCM 1.3
        - PyCM 1.2
        - PyCM 1.1
        - PyCM 1.0
        - PyCM 0.9.5
        - PyCM 0.9
        - PyCM 0.8.6
        - PyCM 0.8.5
        - PyCM 0.8.1
        - PyCM 0.7
        - PyCM 0.6
        - PyCM 0.5
        - PyCM 0.4
        - PyCM 0.3
        - PyCM 0.2
        - PyCM 0.1
      default: 0
    validations:
      required: true
  - type: textarea
    id: logs
    attributes:
      label: Relevant log output
      description: Please copy and paste any relevant log output. This will be automatically formatted into code, so no need for backticks.
      render: shell

pycm-4.2/.github/ISSUE_TEMPLATE/config.yml

blank_issues_enabled: false
contact_links:
  - name: Discord
    url: https://discord.com/invite/zqpU2b3J3f
    about: Ask questions and discuss with other PyCM community members
  - name: Mailing List
    url: https://mail.python.org/mailman3/lists/pycm.python.org/
    about: Ask questions and discuss with other PyCM community members

pycm-4.2/.github/ISSUE_TEMPLATE/feature_request.yml

name: Feature Request
description: Suggest a feature for this project
title: "[Feature]: "
body:
  - type: textarea
    id: description
    attributes:
      label: Describe the feature you want to add
      placeholder: >
        I'd like to be able to [...]
    validations:
      required: true
  - type: textarea
    id: possible-solution
    attributes:
      label: Describe your proposed solution
      placeholder: >
        I think this could be done by [...]
    validations:
      required: false
  - type: textarea
    id: alternatives
    attributes:
      label: Describe alternatives you've considered, if relevant
      placeholder: >
        Another way to do this would be [...]
    validations:
      required: false
  - type: textarea
    id: additional-context
    attributes:
      label: Additional context
      placeholder: >
        Add any other context or screenshots about the feature request here.
    validations:
      required: false

pycm-4.2/.github/PULL_REQUEST_TEMPLATE.md

#### Reference Issues/PRs

#### What does this implement/fix? Explain your changes.

#### Any other comments?

pycm-4.2/.github/dependabot.yml

version: 2
updates:
  - package-ecosystem: pip
    directory: "/"
    schedule:
      interval: weekly
      time: "01:30"
    open-pull-requests-limit: 10
    target-branch: dev
    assignees:
      - "sadrasabouri"
      - "sepandhaghighi"

pycm-4.2/.github/workflows/publish_conda.yaml

name: publish_conda

on:
  push:
    # Sequence of patterns matched against refs/tags
    tags:
      - '*' # Push events for any tag, e.g. v1.0, v20.15.10

jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: publish-to-conda
        uses: sepandhaghighi/conda-package-publish-action@v1.2
        with:
          subDir: 'Otherfiles'
          AnacondaToken: ${{ secrets.ANACONDA_TOKEN }}

pycm-4.2/.github/workflows/publish_pypi.yml

# This workflow will upload a Python Package using Twine when a tag is pushed
# For more information see: https://help.github.com/en/actions/language-and-framework-guides/using-python-with-github-actions#publishing-to-package-registries

name: Upload Python Package

on:
  push:
    # Sequence of patterns matched against refs/tags
    tags:
      - '*' # Push events for any tag, e.g. v1.0, v20.15.10

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.x'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install setuptools wheel twine
      - name: Build and publish
        env:
          TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
          TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
        run: |
          python setup.py sdist bdist_wheel
          twine upload dist/*.tar.gz
          twine upload dist/*.whl

pycm-4.2/.github/workflows/test.yml

# This workflow will install Python dependencies, run tests and lint with a variety of Python versions
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions

name: CI

on:
  push:
    branches:
      - master
      - dev
  pull_request:
    branches:
      - master
      - dev

env:
  TEST_PYTHON_VERSION: 3.9
  TEST_OS: 'ubuntu-20.04'

jobs:
  build:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-20.04, windows-2022, macOS-13]
        python-version: [3.6, 3.7, 3.8, 3.9, 3.10.0, 3.11.0, 3.12.0, 3.13.0]
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - name: Installation
        run: |
          python -m pip install --upgrade pip
          pip install .
      - name: First test
        run: |
          python -m pycm test
          python -m pycm
      - name: Test requirements Installation
        run: |
          python Otherfiles/requirements-splitter.py
          pip install --upgrade --upgrade-strategy=only-if-needed -r test-requirements.txt
      - name: Test with pytest (Basic)
        run: |
          python -m pytest --cov=pycm --cov-report=term --ignore-glob=Test/plot_test.py
      - name: Plot requirements Installation
        run: |
          pip install --upgrade --upgrade-strategy=only-if-needed -r plot-requirements.txt
      - name: Test with pytest (+Plot)
        run: |
          python -m pytest --cov=pycm --cov-report=term --ignore-glob=Test/plot_error_test.py
      - name: Version check
        run: |
          python Otherfiles/version_check.py
        if: matrix.python-version == env.TEST_PYTHON_VERSION
      - name: Notebook check
        run: |
          pip install "notebook>=5.2.2"
          python Otherfiles/notebook_run.py
        if: matrix.python-version == env.TEST_PYTHON_VERSION && matrix.os == env.TEST_OS
      - name: Other tests
        run: |
          python -m vulture pycm/ Otherfiles/ setup.py --min-confidence 65 --exclude=__init__.py --sort-by-size
          python -m bandit -r pycm -s B311
          python -m pydocstyle -v --match-dir=pycm
        if: matrix.python-version == env.TEST_PYTHON_VERSION
      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v4
        with:
          fail_ci_if_error: true
          token: ${{ secrets.CODECOV_TOKEN }}
        if: matrix.python-version == env.TEST_PYTHON_VERSION && matrix.os == env.TEST_OS
      - name: cProfile
        run: |
          python -m cProfile -s cumtime pycm/profile.py

pycm-4.2/.gitignore

# Created by .ignore support plugin (hsz.mobi)
### Python template
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
*.egg-info/
.installed.cfg
*.egg

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*,cover
.hypothesis/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# dotenv
.env

# virtualenv
.venv/
venv/
ENV/

# Spyder project settings
.spyderproject

# Rope project settings
.ropeproject
### Example user template template
### Example user template

# IntelliJ project files
.idea
*.iml
out
gen

pycm-4.2/AUTHORS.md

# Core Developers
----------
- Sepand Haghighi - Open Science Laboratory ([Github](https://github.com/sepandhaghighi)) **
- Alireza Zolanvari - Open Science Laboratory ([Github](https://github.com/AlirezaZolanvari)) **
- Sadra Sabouri - Open Science Laboratory ([Github](https://github.com/sadrasabouri)) **
- Masoomeh Jasemi - Microsoft ([Github](https://github.com/MasoomehJasemi))
- Shaahin Hessabi - Sharif University of Technology ([Email](mailto:hessabi@sharif.edu))

** **Maintainer**

# Other Contributors
----------
- [@soheeyang](https://github.com/soheeyang)
- [@mahi97](https://github.com/mahi97)
- [@cclauss](https://github.com/cclauss)
- [@negarzabetian](https://github.com/negarzabetian)
- [@GeetDsa](https://github.com/GeetDsa)
- [@the-lay](https://github.com/the-lay)
- [@lewiuberg](https://github.com/lewiuberg)
- [@AHReccese](https://github.com/AHReccese)

pycm-4.2/CHANGELOG.md

# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).

## [Unreleased]
## [4.2] - 2025-01-14
### Added
- 5 new distance/similarity measures
    1. KuhnsIII
    2. KuhnsIV
    3. KuhnsV
    4. KuhnsVI
    5. KuhnsVII
### Changed
- Test system modified
- PyPI badge in `README.md`
- GitHub actions are limited to the `dev` and `master` branches
- `AUTHORS.md` updated
- `README.md` modified
- Document modified
## [4.1] - 2024-10-17
### Added
- 5 new distance/similarity measures
    1. KoppenI
    2. KoppenII
    3. KuderRichardson
    4. KuhnsI
    5. KuhnsII
- `feature_request.yml` template
- `config.yml` for issue template
- `SECURITY.md`
### Changed
- Bug report template modified
- `thresholds_calc` function updated
- `__midpoint_numeric_integral__` function updated
- `__trapezoidal_numeric_integral__` function updated
- Diagrams updated
- Document modified
- Document build system updated
- `AUTHORS.md` updated
- `README.md` modified
- Test system modified
- `Python 3.12` added to `test.yml`
- `Python 3.13` added to `test.yml`
- Warning and error messages updated
- `pycm_util.py` renamed to `utils.py`
- `pycm_test.py` renamed to `basic_test.py`
- `pycm_profile.py` renamed to `profile.py`
- `pycm_param.py` renamed to `params.py`
- `pycm_overall_func.py` renamed to `overall_funcs.py`
- `pycm_output.py` renamed to `output.py`
- `pycm_obj.py` renamed to `cm.py`
- `pycm_multilabel_cm.py` renamed to `multilabel_cm.py`
- `pycm_interpret.py` renamed to `interpret.py`
- `pycm_handler.py` renamed to `handlers.py`
- `pycm_error.py` renamed to `errors.py`
- `pycm_distance.py` renamed to `distance.py`
- `pycm_curve.py` renamed to `curve.py`
- `pycm_compare.py` renamed to `compare.py`
- `pycm_class_func.py` renamed to `class_funcs.py`
- `pycm_ci.py` renamed to `ci.py`
## [4.0] - 2023-06-07
### Added
- `pycmMultiLabelError` class
- `MultiLabelCM` class
- `get_cm_by_class` method
- `get_cm_by_sample` method
- `__mlcm_vector_handler__` function
- `__mlcm_assign_classes__` function
- `__mlcm_vectors_filter__` function
- `__set_to_multihot__` function
- `deprecated` function
### Changed
- Document modified
- `README.md` modified
- Example-4 modified
- Test system modified
- Python 3.5 support dropped
## [3.9] - 2023-05-01
### Added
- `OVERALL_PARAMS` dictionary
- `__imbalancement_handler__` function
- `vector_serializer` function
- NPV micro/macro
- `log_loss` method
- 23 new distance/similarity measures
    1. Dennis
    2. Digby
    3. Dispersion
    4. Doolittle
    5. Eyraud
    6. Fager & McGowan
    7. Faith
    8. Fleiss-Levin-Paik
    9. Forbes I
    10. Forbes II
    11. Fossum
    12. Gilbert & Wells
    13. Goodall
    14. Goodman & Kruskal's Lambda
    15. Goodman & Kruskal Lambda-r
    16. Guttman's Lambda A
    17. Guttman's Lambda B
    18. Hamann
    19. Harris & Lahey
    20. Hawkins & Dotson
    21. Kendall's Tau
    22. Kent & Foster I
    23. Kent & Foster II
### Changed
- `metrics_off` parameter added to ConfusionMatrix `__init__` method
- `CLASS_PARAMS` changed to a dictionary
- Code style modified
- `sort` parameter added to `relabel` method
- Document modified
- `CONTRIBUTING.md` updated
- `codecov` removed from `dev-requirements.txt`
- Test system modified
## [3.8] - 2023-02-01
### Added
- `distance` method
- `__contains__` method
- `__getitem__` method
- Goodman-Kruskal's Lambda A benchmark
- Goodman-Kruskal's Lambda B benchmark
- Krippendorff's Alpha benchmark
- Pearson's C benchmark
- 30 new distance/similarity measures
    1. AMPLE
    2. Anderberg's D
    3. Andres & Marzo's Delta
    4. Baroni-Urbani & Buser I
    5. Baroni-Urbani & Buser II
    6. Batagelj & Bren
    7. Baulieu I
    8. Baulieu II
    9. Baulieu III
    10. Baulieu IV
    11. Baulieu V
    12. Baulieu VI
    13. Baulieu VII
    14. Baulieu VIII
    15. Baulieu IX
    16. Baulieu X
    17. Baulieu XI
    18. Baulieu XII
    19. Baulieu XIII
    20. Baulieu XIV
    21. Baulieu XV
    22. Benini I
    23. Benini II
    24. Canberra
    25. Clement
    26. Consonni & Todeschini I
    27. Consonni & Todeschini II
    28. Consonni & Todeschini III
    29. Consonni & Todeschini IV
    30. Consonni & Todeschini V
### Changed
- `relabel` method sort bug fixed
- `README.md` modified
- `Compare` overall benchmarks default weights updated
- Document modified
- Test system modified
## [3.7] - 2022-12-15
### Added
- `Curve` class
- `ROCCurve` class
- `PRCurve` class
- `pycmCurveError` class
### Changed
- `CONTRIBUTING.md` updated
- `matrix_params_calc` function optimized
- `README.md` modified
- Document modified
- Test system modified
- `Python 3.11` added to `test.yml`
## [3.6] - 2022-08-17
### Added
- Hamming distance
- Braun-Blanquet similarity
### Changed
- `classes` parameter added to `matrix_params_from_table` function
- Matrices with `numpy.integer` elements are now accepted
- Arrays added to the accepted formats of the `matrix` parameter
- Website changed to [http://www.pycm.io](http://www.pycm.io)
- Document modified
- `README.md` modified
## [3.5] - 2022-04-27
### Added
- Anaconda workflow
- Custom iterating setting
- Custom casting setting
### Changed
- `plot` method updated
- `class_statistics` function modified
- `overall_statistics` function modified
- `BCD_calc` function modified
- `CONTRIBUTING.md` updated
- `CODE_OF_CONDUCT.md` updated
- Document modified
## [3.4] - 2022-01-26
### Added
- Colab badge
- Discord badge
- `brier_score` method
### Changed
- `J (Jaccard index)` section in `Document.ipynb` updated
- `save_obj` method updated
- `Python 3.10` added to `test.yml`
- Example-3 updated
- Docstrings of the functions updated
- `CONTRIBUTING.md` updated
## [3.3] - 2021-10-27
### Added
- `__compare_weight_handler__` function
### Changed
- `is_imbalanced` parameter added to ConfusionMatrix `__init__` method
- `class_benchmark_weight` and `overall_benchmark_weight` parameters added to Compare `__init__` method
- `statistic_recommend` function modified
- Compare `weight` parameter renamed to `class_weight`
- Document modified
- License updated
- `AUTHORS.md` updated
- `README.md` modified
- Block diagrams updated
## [3.2] - 2021-08-11
### Added
- `classes_filter` function
### Changed
- `classes` parameter added to `matrix_params_calc` function
- `classes` parameter added to `__obj_vector_handler__` function
- `classes` parameter added to ConfusionMatrix `__init__` method
- `name` parameter removed from `html_init` function
- `shortener` parameter added to `html_table` function
- `shortener` parameter added to `save_html` method
- Document modified
- HTML report modified
## [3.1] - 2021-03-11
### Added
- `requirements-splitter.py`
- `sensitivity_index` method
### Changed
- Test system modified
- `overall_statistics` function modified
- HTML report modified
- Document modified
- References format updated
- `CONTRIBUTING.md` updated
## [3.0] - 2020-10-26
### Added
- `plot_test.py`
- `axes_gen` function
- `add_number_label` function
- `plot` method
- `combine` method
- `matrix_combine` function
### Changed
- Document modified
- `README.md` modified
- Example-2 deprecated
- Example-7 deprecated
- Error messages modified
## [2.9] - 2020-09-23
### Added
- `notebook_check.py`
- `to_array` method
- `__copy__` method
- `copy` method
### Changed
- `average` method refactored
## [2.8] - 2020-07-09
### Added
- `label_map` attribute
- `positions` attribute
- `position` method
- Krippendorff's Alpha
- Aickin's Alpha
- `weighted_alpha` method
### Changed
- Single class bug fixed
- `CLASS_NUMBER_ERROR` error type changed to `pycmMatrixError`
- `relabel` method bug fixed
- Document modified
- `README.md` modified
## [2.7] - 2020-05-11
### Added
- `average` method
- `weighted_average` method
- `weighted_kappa` method
- `pycmAverageError` class
- Bangdiwala's B
- MATLAB examples
- Github action
### Changed
- Document modified
- `README.md` modified
- `relabel` method bug fixed
- `sparse_table_print` function bug fixed
- `matrix_check` function bug fixed
- Minor bug in `Compare` class fixed
- Class names mismatch bug fixed
## [2.6] - 2020-03-25
### Added
- `custom_rounder` function
- `complement` function
- `sparse_matrix` attribute
- `sparse_normalized_matrix` attribute
- Net benefit (NB)
- Yule's Q interpretation (QI)
- Adjusted Rand index (ARI)
- TNR micro/macro
- FPR micro/macro
- FNR micro/macro
### Changed
- `sparse` parameter added to `print_matrix`, `print_normalized_matrix` and `save_stat` methods
- `header` parameter added to `save_csv` method
- Handler functions moved to `pycm_handler.py`
- Error objects moved to `pycm_error.py`
- Verified tests references updated
- Verified tests moved to `verified_test.py`
- Test system modified
- `CONTRIBUTING.md` updated
- Namespace optimized
- `README.md` modified
- Document modified
- `print_normalized_matrix` method modified
- `normalized_table_calc` function modified
- `setup.py` modified
- Summary mode updated
- Dockerfile updated
- `Python 3.8` added to `.travis.yaml` and `appveyor.yml`
### Removed
- `PC_PI_calc` function
## [2.5] - 2019-10-16
### Added
- `__version__` variable
- Individual classification success index (ICSI)
- Classification success index (CSI)
- Example-8 (Confidence interval)
- `install.sh`
- `autopep8.sh`
- Dockerfile
- `CI` method (supported statistics: `ACC`, `AUC`, `Overall ACC`, `Kappa`, `TPR`, `TNR`, `PPV`, `NPV`, `PLR`, `NLR`, `PRE`)
### Changed
- `test.sh` moved to `.travis` folder
- Python 3.4 support dropped
- Python 2.7 support dropped
- `AUTHORS.md` updated
- `save_stat`, `save_csv` and `save_html` methods non-ASCII character bug fixed
- Mixed type input vectors bug fixed
- `CONTRIBUTING.md` updated
- Example-3 updated
- `README.md` modified
- Document modified
- `CI` attribute renamed to `CI95`
- `kappa_se_calc` function renamed to `kappa_SE_calc`
- `se_calc` function modified and renamed to `SE_calc`
- CI/SE functions moved to `pycm_ci.py`
- Minor bug in `save_html` method fixed
## [2.4] - 2019-07-31
### Added
- Tversky index (TI)
- Area under the PR curve (AUPR)
- `FUNDING.yml`
### Changed
- `AUC_calc` function modified
- Document modified
- `summary` parameter added to `save_html`, `save_stat`, `save_csv` and `stat` methods
- `sample_weight` bug in `numpy` array format fixed
- Inputs manipulation bug fixed
- Test system modified
- Warning system modified
- `alt_link` parameter added to `save_html` method and `online_help` function
- `Compare` class tests moved to `compare_test.py`
- Warning tests moved to `warning_test.py`
## [2.3] - 2019-06-27
### Added
- Adjusted F-score (AGF)
- Overlap coefficient (OC)
- Otsuka-Ochiai coefficient (OOC)
### Changed
- `save_stat` and `save_vector` parameters added to `save_obj` method
- Document modified
- `README.md` modified
- Parameters recommendation for imbalanced dataset modified
- Minor bug in `Compare` class fixed
- `pycm_help` function modified
- Benchmarks color modified
## [2.2] - 2019-05-30
### Added
- Negative likelihood ratio interpretation (NLRI)
- Cramer's benchmark (SOA5)
- Matthews correlation coefficient interpretation (MCCI)
- Matthews's benchmark (SOA6)
- F1 macro
- F1 micro
- Accuracy macro
### Changed
- `Compare` class score calculation modified
- Parameters recommendation for multi-class dataset modified
- Parameters recommendation for imbalanced dataset modified
- `README.md` modified
- Document modified
- Logo updated
## [2.1] - 2019-05-06
### Added
- Adjusted geometric mean (AGM)
- Yule's Q (Q)
- `Compare` class and parameters recommendation system block diagrams
### Changed
- Document links bug fixed
- Document modified
## [2.0] - 2019-04-15
### Added
- G-Mean (GM)
- Index of balanced accuracy (IBA)
- Optimized precision (OP)
- Pearson's C (C)
- `Compare` class
- Parameters recommendation warning
- `ConfusionMatrix` equal method
### Changed
- Document modified
- `stat_print` function bug fixed
- `table_print` function bug fixed
- `Beta` parameter renamed to `beta` (`F_calc` function & `F_beta` method)
- Parameters recommendation for imbalanced dataset modified
- `normalize` parameter added to `save_html` method
- `pycm_func.py` split into `pycm_class_func.py` and `pycm_overall_func.py`
- `vector_filter`, `vector_check`, `class_check` and `matrix_check` functions moved to `pycm_util.py`
- `RACC_calc` and `RACCU_calc` functions exception handler modified
- Docstrings modified
## [1.9] - 2019-02-25
### Added
- Automatic/Manual (AM)
- Bray-Curtis dissimilarity (BCD)
- `CODE_OF_CONDUCT.md`
- `ISSUE_TEMPLATE.md`
- `PULL_REQUEST_TEMPLATE.md`
- `CONTRIBUTING.md`
- X11 color names support for `save_html` method
- Parameters recommendation system
- Warning message for high dimension matrix print
- Interactive notebooks section (binder)
### Changed
- `save_matrix` and `normalize` parameters added to `save_csv` method
- `README.md` modified
- Document modified
- `ConfusionMatrix.__init__` optimized
- Document and examples output files moved to different folders
- Test system modified
- `relabel` method bug fixed
## [1.8] - 2019-01-05
### Added
- Lift score (LS)
- `version_check.py`
### Changed
- `color` parameter added to `save_html` method
- Error messages modified
- Document modified
- Website changed to [http://www.pycm.ir](http://www.pycm.ir)
- Interpretation functions moved to `pycm_interpret.py`
- Utility functions moved to `pycm_util.py`
- Unnecessary `else` and `elif` removed
- `==` changed to `is`
## [1.7] - 2018-12-18
### Added
- Gini index (GI)
- Example-7
- `pycm_profile.py`
### Changed
- `class_name` parameter added to `stat`, `save_stat`, `save_csv` and `save_html` methods
- `overall_param` and `class_param` parameters empty list bug fixed
- `matrix_params_calc`, `matrix_params_from_table` and `vector_filter` functions optimized
- `overall_MCC_calc`, `CEN_misclassification_calc` and `convex_combination` functions optimized
- Document modified
## [1.6] - 2018-12-06
### Added
- AUC value interpretation (AUCI)
- Example-6
- Anaconda cloud package
### Changed
- `overall_param` and `class_param` parameters added to `stat`, `save_stat` and `save_html` methods
- `class_param` parameter added to `save_csv` method
- `_` removed from overall statistics names
- 
`README.md` modified - Document modified ## [1.5] - 2018-11-26 ### Added - Relative classifier information (RCI) - Discriminator power (DP) - Youden's index (Y) - Discriminant power interpretation (DPI) - Positive likelihood ratio interpretation (PLRI) - `__len__` method - `relabel` method - `__class_stat_init__` function - `__overall_stat_init__` function - `matrix` attribute as dict - `normalized_matrix` attribute as dict - `normalized_table` attribute as dict ### Changed - `README.md` modified - Document modified - `LR+` renamed to `PLR` - `LR-` renamed to `NLR` - `normalized_matrix` method renamed to `print_normalized_matrix` - `matrix` method renamed to `print_matrix` - `entropy_calc` fixed - `cross_entropy_calc` fixed - `conditional_entropy_calc` fixed - `print_table` bug for large numbers fixed - JSON key bug in `save_obj` fixed - `transpose` bug in `save_obj` fixed - `Python 3.7` added to `.travis.yaml` and `appveyor.yml` ## [1.4] - 2018-11-12 ### Added - Area under curve (AUC) - AUNU - AUNP - Class balance accuracy (CBA) - Global performance index (RR) - Overall MCC - Distance index (dInd) - Similarity index (sInd) - `one_vs_all` - `dev-requirements.txt` ### Changed - `README.md` modified - Document modified - `save_stat` modified - `requirements.txt` modified ## [1.3] - 2018-10-10 ### Added - Confusion entropy (CEN) - Overall confusion entropy (Overall CEN) - Modified confusion entropy (MCEN) - Overall modified confusion entropy (Overall MCEN) - Information score (IS) ### Changed - `README.md` modified ## [1.2] - 2018-10-01 ### Added - No information rate (NIR) - P-Value - `sample_weight` - `transpose` ### Changed - `README.md` modified - Key error in some parameters fixed - `OSX` env added to `.travis.yml` ## [1.1] - 2018-09-08 ### Added - Zero-one loss - Support - `online_help` function ### Changed - `README.md` modified - `html_table` function modified - `table_print` function modified - `normalized_table_print` function modified ## [1.0] - 2018-08-30 
### Added - Hamming loss ### Changed - `README.md` modified ## [0.9.5] - 2018-07-08 ### Added - Obj load - Obj save - Example-4 ### Changed - `README.md` modified - Block diagram updated ## [0.9] - 2018-06-28 ### Added - Activation threshold - Example-3 - Jaccard index - Overall Jaccard index ### Changed - `README.md` modified - `setup.py` modified ## [0.8.6] - 2018-05-31 ### Added - Example section in document - Python 2.7 CI - JOSS paper PDF ### Changed - Cite section - ConfusionMatrix docstring - round function changed to numpy.around - `README.md` modified ## [0.8.5] - 2018-05-21 ### Added - Example-1 (Comparison of three different classifiers) - Example-2 (How to plot via matplotlib) - JOSS paper - ConfusionMatrix docstring ### Changed - Table size in HTML report - Test system - `README.md` modified ## [0.8.1] - 2018-03-22 ### Added - Goodman and Kruskal's lambda B - Goodman and Kruskal's lambda A - Cross entropy - Conditional entropy - Joint entropy - Reference entropy - Response entropy - Kullback-Leibler divergence - Direct ConfusionMatrix - Kappa unbiased - Kappa no prevalence - Random accuracy unbiased - `pycmVectorError` class - `pycmMatrixError` class - Mutual information - Support `numpy` arrays ### Changed - Notebook file updated ### Removed - `pycmError` class ## [0.7] - 2018-02-26 ### Added - Cramer's V - 95% confidence interval - Chi-Squared - Phi-Squared - Chi-Squared DF - Standard error - Kappa standard error - Kappa 95% confidence interval - Cicchetti benchmark ### Changed - Overall statistics color in HTML report - Parameters description link in HTML report ## [0.6] - 2018-02-21 ### Added - CSV report - Changelog - Output files - `digit` parameter to `ConfusionMatrix` object ### Changed - Confusion matrix color in HTML report - Parameters description link in HTML report - Capitalize descriptions ## [0.5] - 2018-02-17 ### Added - Scott's pi - Gwet's AC1 - Bennett S score - HTML report ## [0.4] - 2018-02-05 ### Added - TPR micro/macro - PPV 
micro/macro - Overall RACC - Error rate (ERR) - FBeta score - F0.5 - F2 - Fleiss benchmark - Altman benchmark - Output file (.pycm) ### Changed - Class with zero items - Normalized matrix ### Removed - Kappa and SOA for each class ## [0.3] - 2018-01-27 ### Added - Kappa - Random accuracy - Landis and Koch benchmark - `overall_stat` ## [0.2] - 2018-01-24 ### Added - Population - Condition positive - Condition negative - Test outcome positive - Test outcome negative - Prevalence - G-measure - Matrix method - Normalized matrix method - Params method ### Changed - `statistic_result` to `class_stat` - `params` to `stat` ## [0.1] - 2018-01-22 ### Added - ACC - BM - DOR - F1-Score - FDR - FNR - FOR - FPR - LR+ - LR- - MCC - MK - NPV - PPV - TNR - TPR - documents and `README.md` [Unreleased]: https://github.com/sepandhaghighi/pycm/compare/v4.2...dev [4.2]: https://github.com/sepandhaghighi/pycm/compare/v4.1...v4.2 [4.1]: https://github.com/sepandhaghighi/pycm/compare/v4.0...v4.1 [4.0]: https://github.com/sepandhaghighi/pycm/compare/v3.9...v4.0 [3.9]: https://github.com/sepandhaghighi/pycm/compare/v3.8...v3.9 [3.8]: https://github.com/sepandhaghighi/pycm/compare/v3.7...v3.8 [3.7]: https://github.com/sepandhaghighi/pycm/compare/v3.6...v3.7 [3.6]: https://github.com/sepandhaghighi/pycm/compare/v3.5...v3.6 [3.5]: https://github.com/sepandhaghighi/pycm/compare/v3.4...v3.5 [3.4]: https://github.com/sepandhaghighi/pycm/compare/v3.3...v3.4 [3.3]: https://github.com/sepandhaghighi/pycm/compare/v3.2...v3.3 [3.2]: https://github.com/sepandhaghighi/pycm/compare/v3.1...v3.2 [3.1]: https://github.com/sepandhaghighi/pycm/compare/v3.0...v3.1 [3.0]: https://github.com/sepandhaghighi/pycm/compare/v2.9...v3.0 [2.9]: https://github.com/sepandhaghighi/pycm/compare/v2.8...v2.9 [2.8]: https://github.com/sepandhaghighi/pycm/compare/v2.7...v2.8 [2.7]: https://github.com/sepandhaghighi/pycm/compare/v2.6...v2.7 [2.6]: https://github.com/sepandhaghighi/pycm/compare/v2.5...v2.6 [2.5]: 
https://github.com/sepandhaghighi/pycm/compare/v2.4...v2.5 [2.4]: https://github.com/sepandhaghighi/pycm/compare/v2.3...v2.4 [2.3]: https://github.com/sepandhaghighi/pycm/compare/v2.2...v2.3 [2.2]: https://github.com/sepandhaghighi/pycm/compare/v2.1...v2.2 [2.1]: https://github.com/sepandhaghighi/pycm/compare/v2.0...v2.1 [2.0]: https://github.com/sepandhaghighi/pycm/compare/v1.9...v2.0 [1.9]: https://github.com/sepandhaghighi/pycm/compare/v1.8...v1.9 [1.8]: https://github.com/sepandhaghighi/pycm/compare/v1.7...v1.8 [1.7]: https://github.com/sepandhaghighi/pycm/compare/v1.6...v1.7 [1.6]: https://github.com/sepandhaghighi/pycm/compare/v1.5...v1.6 [1.5]: https://github.com/sepandhaghighi/pycm/compare/v1.4...v1.5 [1.4]: https://github.com/sepandhaghighi/pycm/compare/v1.3...v1.4 [1.3]: https://github.com/sepandhaghighi/pycm/compare/v1.2...v1.3 [1.2]: https://github.com/sepandhaghighi/pycm/compare/v1.1...v1.2 [1.1]: https://github.com/sepandhaghighi/pycm/compare/v1.0...v1.1 [1.0]: https://github.com/sepandhaghighi/pycm/compare/v0.9.5...v1.0 [0.9.5]: https://github.com/sepandhaghighi/pycm/compare/v0.9...v0.9.5 [0.9]: https://github.com/sepandhaghighi/pycm/compare/v0.8.6...v0.9 [0.8.6]: https://github.com/sepandhaghighi/pycm/compare/v0.8.5...v0.8.6 [0.8.5]: https://github.com/sepandhaghighi/pycm/compare/v0.8.1...v0.8.5 [0.8.1]: https://github.com/sepandhaghighi/pycm/compare/v0.7...v0.8.1 [0.7]: https://github.com/sepandhaghighi/pycm/compare/v0.6...v0.7 [0.6]: https://github.com/sepandhaghighi/pycm/compare/v0.5...v0.6 [0.5]: https://github.com/sepandhaghighi/pycm/compare/v0.4...v0.5 [0.4]: https://github.com/sepandhaghighi/pycm/compare/v0.3...v0.4 [0.3]: https://github.com/sepandhaghighi/pycm/compare/v0.2...v0.3 [0.2]: https://github.com/sepandhaghighi/pycm/compare/v0.1...v0.2 [0.1]: https://github.com/sepandhaghighi/pycm/compare/1e238cd...v0.1 pycm-4.2/CITATION.cff000066400000000000000000000032041474106425200142250ustar00rootroot00000000000000cff-version: 1.2.0 message: 
"If you use this software, please cite it as below." title: "pycm" abstract: "PyCM is a multi-class confusion matrix library written in Python that supports both input data vectors and direct matrix, and a proper tool for post-classification model evaluation that supports most classes and overall statistics parameters. PyCM is the swiss-army knife of confusion matrices, targeted mainly at data scientists that need a broad array of metrics for predictive models and accurate evaluation of a large variety of classifiers." authors: - family-names: "Haghighi" given-names: "Sepand" - family-names: "Zolanvari" given-names: "Alireza" - family-names: "Sabouri" given-names: "Sadra" version: 3.3 date-released: 2021-10-27 repository-code: "https://github.com/sepandhaghighi/pycm" url: "https://www.pycm.io" license: MIT keywords: - "confusion matrix" - "python" - "F-score" - "Accuracy" preferred-citation: type: article authors: - family-names: "Haghighi" given-names: "Sepand" orcid: "https://orcid.org/0000-0001-9450-2375" - family-names: "Jasemi" given-names: "Masoomeh" orcid: "https://orcid.org/0000-0002-4831-1698" - family-names: "Hessabi" given-names: "Shaahin" orcid: "https://orcid.org/0000-0003-3193-2567" - family-names: "Zolanvari" given-names: "Alireza" orcid: "https://orcid.org/0000-0003-2367-8343" doi: "10.21105/joss.00729" journal: "Journal of Open Source Software" month: 5 start: 729 # First page number end: 729 # Last page number title: "PyCM: Multiclass confusion matrix library in Python" issue: 25 volume: 3 year: 2018 pycm-4.2/Document/000077500000000000000000000000001474106425200141125ustar00rootroot00000000000000pycm-4.2/Document/Distance.ipynb000066400000000000000000002262171474106425200167210ustar00rootroot00000000000000{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "

Please cite us if you use the software

\n", "\n", "\n", " \n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Distance/Similarity" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "PyCM's `distance` method provides a wide range of string distance/similarity metrics that evaluate a confusion matrix by measuring its distance to a perfect confusion matrix. Distance/similarity metrics quantify how far apart two vectors of numbers are; a small distance between two objects indicates high similarity. The measure is selected by passing a member of `DistanceType` as the `metric` argument of the `distance` method. The measures' names follow the naming style suggested in [[1]](#ref1)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from pycm import ConfusionMatrix, DistanceType" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "cm = ConfusionMatrix(matrix={0: {0: 3, 1: 0, 2: 0}, 1: {0: 0, 1: 1, 2: 2}, 2: {0: 2, 1: 1, 2: 3}})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$TP \\rightarrow \\text{True Positive}$$\n", "$$TN \\rightarrow \\text{True Negative}$$\n", "$$FP \\rightarrow \\text{False Positive}$$\n", "$$FN \\rightarrow \\text{False Negative}$$\n", "$$POP \\rightarrow \\text{Population}$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## AMPLE" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "AMPLE similarity [[2]](#ref2) [[3]](#ref3)."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{AMPLE}=|\\frac{TP}{TP+FP}-\\frac{FN}{FN+TN}|$$" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.6, 1: 0.3, 2: 0.17142857142857143}" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.AMPLE)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Anderberg's D" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Anderberg's D [[4]](#ref4)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{Anderberg} =\n", "\\frac{(max(TP,FP)+max(FN,TN)+max(TP,FN)+max(FP,TN))-\n", "(max(TP+FP,FP+TN)+max(TP+FP,FN+TN))}{2\\times POP}$$" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.16666666666666666, 1: 0.0, 2: 0.041666666666666664}" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.Anderberg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Andres & Marzo's Delta" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Andres & Marzo's Delta correlation [[5]](#ref5)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$corr_{AndresMarzo_\\Delta} = \\Delta =\n", "\\frac{TP+TN-2 \\times \\sqrt{FP \\times FN}}{POP}$$" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.8333333333333334, 1: 0.5142977396044842, 2: 0.17508504286947035}" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.AndresMarzoDelta)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Baroni-Urbani & Buser I" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Baroni-Urbani & Buser I similarity [[6]](#ref6)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{BaroniUrbaniBuserI} =\n", "\\frac{\\sqrt{TP\\times TN}+TP}{\\sqrt{TP\\times TN}+TP+FP+FN}$$" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.79128784747792, 1: 0.5606601717798213, 2: 0.5638559245324765}" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BaroniUrbaniBuserI)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Baroni-Urbani & Buser II" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Baroni-Urbani & Buser II correlation [[6]](#ref6)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$corr_{BaroniUrbaniBuserII} =\n", "\\frac{\\sqrt{TP \\times TN}+TP-FP-FN}{\\sqrt{TP \\times TN}+TP+FP+FN}$$" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.58257569495584, 1: 0.12132034355964261, 2: 0.1277118490649528}" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BaroniUrbaniBuserII)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Batagelj & Bren" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Batagelj & Bren distance [[7]](#ref7)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$dist_{BatageljBren} =\n", "\\frac{FP \\times FN}{TP \\times TN}$$" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.0, 1: 0.25, 2: 0.5}" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BatageljBren)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Baulieu I" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Baulieu I distance [[8]](#ref8)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$dist_{BaulieuI} =\n", "\\frac{(TP+FP) \\times (TP+FN)-TP^2}{(TP+FP) \\times (TP+FN)}$$" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.4, 1: 0.8333333333333334, 2: 0.7}" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BaulieuI)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Baulieu II" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Baulieu II similarity [[8]](#ref8)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{BaulieuII} =\n", "\\frac{TP^2 \\times TN^2}{(TP+FP) \\times (TP+FN) \\times (FP+TN) \\times (FN+TN)}$$" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.4666666666666667, 1: 0.11851851851851852, 2: 0.11428571428571428}" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BaulieuII)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Baulieu III" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Baulieu III distance [[8]](#ref8)."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$dist_{BaulieuIII} =\n", "\\frac{POP^2 - 4 \\times (TP \\times TN-FP \\times FN)}{2 \\times POP^2}$$" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.20833333333333334, 1: 0.4166666666666667, 2: 0.4166666666666667}" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BaulieuIII)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Baulieu IV" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Baulieu IV distance [[9]](#ref9)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$dist_{BaulieuIV} = \\frac{FP+FN-(TP+\\frac{1}{2})\\times(TN+\\frac{1}{2})\\times TN \\times k}{POP}$$" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: -41.45702383161246, 1: -22.855395541901885, 2: -13.85431293274332}" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BaulieuIV)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* The default value of $k$ is Euler's number $e$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Baulieu V" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Baulieu V distance [[9]](#ref9)."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$dist_{BaulieuV} = \\frac{FP+FN+1}{TP+FP+FN+1}$$" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.5, 1: 0.8, 2: 0.6666666666666666}" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BaulieuV)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Baulieu VI" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Baulieu VI distance [[9]](#ref9)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$dist_{BaulieuVI} = \\frac{FP+FN}{TP+FP+FN+1}$$" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.3333333333333333, 1: 0.6, 2: 0.5555555555555556}" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BaulieuVI)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Baulieu VII" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Baulieu VII distance [[9]](#ref9)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$dist_{BaulieuVII} = \\frac{FP+FN}{POP + TP \\times (TP-4)^2}$$" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.13333333333333333, 1: 0.14285714285714285, 2: 0.3333333333333333}" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BaulieuVII)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Baulieu VIII" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Baulieu VIII distance [[9]](#ref9)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$dist_{BaulieuVIII} = \\frac{(FP-FN)^2}{POP^2}$$" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.027777777777777776, 1: 0.006944444444444444, 2: 0.006944444444444444}" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BaulieuVIII)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Baulieu IX" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Baulieu IX distance [[9]](#ref9)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$dist_{BaulieuIX} = \\frac{FP+2 \\times FN}{TP+FP+2 \\times FN+TN}$$" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.16666666666666666, 1: 0.35714285714285715, 2: 0.5333333333333333}" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BaulieuIX)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Baulieu X" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Baulieu X distance [[9]](#ref9)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$dist_{BaulieuX} = \\frac{FP+FN+max(FP,FN)}{POP+max(FP,FN)}$$" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.2857142857142857, 1: 0.35714285714285715, 2: 0.5333333333333333}" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BaulieuX)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Baulieu XI" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Baulieu XI distance [[9]](#ref9)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$dist_{BaulieuXI} = \\frac{FP+FN}{FP+FN+TN}$$" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.2222222222222222, 1: 0.2727272727272727, 2: 0.5555555555555556}" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BaulieuXI)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Baulieu XII" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Baulieu XII distance [[9]](#ref9)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$dist_{BaulieuXII} = \\frac{FP+FN}{TP+FP+FN-1}$$" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.5, 1: 1.0, 2: 0.7142857142857143}" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BaulieuXII)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Baulieu XIII" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Baulieu XIII distance [[9]](#ref9)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$dist_{BaulieuXIII} = \\frac{FP+FN}{TP+FP+FN+TP \\times (TP-4)^2}$$" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.25, 1: 0.23076923076923078, 2: 0.45454545454545453}" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BaulieuXIII)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Baulieu XIV" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Baulieu XIV distance [[9]](#ref9)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$dist_{BaulieuXIV} = \\frac{FP+2 \\times FN}{TP+FP+2 \\times FN}$$" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.4, 1: 0.8333333333333334, 2: 0.7272727272727273}" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BaulieuXIV)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Baulieu XV" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Baulieu XV distance [[9]](#ref9)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$dist_{BaulieuXV} = \\frac{FP+FN+max(FP, FN)}{TP+FP+FN+max(FP, FN)}$$" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.5714285714285714, 1: 0.8333333333333334, 2: 0.7272727272727273}" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BaulieuXV)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Benini I" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Benini I correlation [[10]](#ref10)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$corr_{BeniniI} = \\frac{TP \\times TN-FP \\times FN}{(TP+FN)\\times(FN+TN)}$$" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 1.0, 1: 0.2, 2: 0.14285714285714285}" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BeniniI)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Benini II" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Benini II correlation [[10]](#ref10)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$corr_{BeniniII} = \\frac{TP \\times TN-FP \\times FN}{min((TP+FN)\\times(FN+TN), (TP+FP)\\times(FP+TN))}$$" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 1.0, 1: 0.3333333333333333, 2: 0.2}" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BeniniII)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Canberra" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Canberra distance [[11]](#ref11) [[12]](#ref12)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$dist_{Canberra} =\n", "\\frac{FP+FN}{(TP+FP)+(TP+FN)}$$" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.25, 1: 0.6, 2: 0.45454545454545453}" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.Canberra)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Clement" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Clement similarity [[13]](#ref13)."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{Clement} =\n", "\\frac{TP}{TP+FP}\\times\\Big(1 - \\frac{TP+FP}{POP}\\Big) +\n", "\\frac{TN}{FN+TN}\\times\\Big(1 - \\frac{FN+TN}{POP}\\Big)$$" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.7666666666666666, 1: 0.55, 2: 0.588095238095238}" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.Clement)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Consonni & Todeschini I" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Consonni & Todeschini I similarity [[14]](#ref14)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{ConsonniTodeschiniI} =\n", "\\frac{log(1+TP+TN)}{log(1+POP)}$$" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.9348704159880586, 1: 0.8977117175026231, 2: 0.8107144632819592}" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.ConsonniTodeschiniI)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Consonni & Todeschini II" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Consonni & Todeschini II similarity [[14]](#ref14)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{ConsonniTodeschiniII} =\n", "\\frac{log(1+POP)-log(1+FP+FN)}{log(1+POP)}$$" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.5716826589686053, 1: 0.4595236911453605, 2: 0.3014445045412856}" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.ConsonniTodeschiniII)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Consonni & Todeschini III" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Consonni & Todeschini III similarity [[14]](#ref14)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{ConsonniTodeschiniIII} =\n", "\\frac{log(1+TP)}{log(1+POP)}$$" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.5404763088546395, 1: 0.27023815442731974, 2: 0.5404763088546395}" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.ConsonniTodeschiniIII)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Consonni & Todeschini IV" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Consonni & Todeschini IV similarity [[14]](#ref14)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{ConsonniTodeschiniIV} =\n", "\\frac{log(1+TP)}{log(1+TP+FP+FN)}$$" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.7737056144690831, 1: 0.43067655807339306, 2: 0.6309297535714574}" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.ConsonniTodeschiniIV)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Consonni & Todeschini V" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Consonni & Todeschini V correlation [[14]](#ref14)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$corr_{ConsonniTodeschiniV} =\n", "\\frac{log(1+TP \\times TN)-log(1+FP \\times FN)}{log(1+\\frac{POP^2}{4})}$$" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.8560267854703983, 1: 0.30424737289682985, 2: 0.17143541431350617}" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.ConsonniTodeschiniV)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dennis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Dennis similarity [[15]](#ref15)." 
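The Dennis similarity has the form of a standardized residual. It takes the observed TP, subtracts its chance-expected value E = (TP+FP)(TP+FN)/POP, and divides by sqrt(E). A minimal sketch follows. It is not PyCM's code, and its counts are assumed, inferred from the printed outputs.

```python
import math

# Assumed per-class (TP, FP, FN, TN) counts, inferred from the printed outputs
COUNTS = {0: (3, 2, 0, 7), 1: (1, 1, 2, 8), 2: (3, 2, 3, 4)}


def dennis(tp, fp, fn, tn):
    """(TP - E) / sqrt(E), with E = (TP+FP)(TP+FN)/POP the chance-expected TP."""
    pop = tp + fp + fn + tn
    expected_tp = (tp + fp) * (tp + fn) / pop
    return (tp - expected_tp) / math.sqrt(expected_tp)


result = {label: dennis(*c) for label, c in COUNTS.items()}
print(result)
```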
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{Dennis} =\n", "\\frac{TP-\\frac{(TP+FP)\\times(TP+FN)}{POP}}{\\sqrt{\\frac{(TP+FP)\\times(TP+FN)}{POP}}}$$" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 1.5652475842498528, 1: 0.7071067811865475, 2: 0.31622776601683794}" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.Dennis)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Digby" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Digby correlation [[16]](#ref16)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$corr_{Digby} =\n", "\\frac{(TP \\times TN) ^\\frac{3}{4}-(FP \\times FN)^\\frac{3}{4}}{(TP \\times TN)^\\frac{3}{4}+(FP \\times FN)^\\frac{3}{4}}$$" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 1.0, 1: 0.47759225007251715, 2: 0.2542302383508219}" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.Digby)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dispersion" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Dispersion correlation [[17]](#ref17)." 
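A small sketch of the Dispersion correlation that uses `fractions.Fraction` for exact arithmetic. It is illustrative only, not PyCM's implementation. The counts are assumed, inferred from the outputs printed in this notebook.

```python
from fractions import Fraction

# Assumed per-class (TP, FP, FN, TN) counts, inferred from the printed outputs
COUNTS = {0: (3, 2, 0, 7), 1: (1, 1, 2, 8), 2: (3, 2, 3, 4)}


def dispersion(tp, fp, fn, tn):
    """(TP*TN - FP*FN) / POP**2 as an exact fraction."""
    pop = tp + fp + fn + tn
    return Fraction(tp * tn - fp * fn, pop ** 2)


result = {label: dispersion(*c) for label, c in COUNTS.items()}
print(result)
print({label: float(value) for label, value in result.items()})
```

Exact fractions show that classes 1 and 2 tie at 6/144 = 1/24, even though their individual cell counts differ.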
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$corr_{dispersion} =\n", "\\frac{TP \\times TN -FP \\times FN}{POP^2}\n", "$$" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.14583333333333334, 1: 0.041666666666666664, 2: 0.041666666666666664}" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.Dispersion)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Doolittle" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Doolittle similarity [[18]](#ref18)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{Doolittle} =\n", "\\frac{(TP\\times POP - (TP+FP)\\times(TP+FN))^2}{(TP+FP)\\times(TP+FN)\\times(FP+TN)\\times(FN+TN)}$$" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.4666666666666667, 1: 0.06666666666666667, 2: 0.02857142857142857}" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.Doolittle)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Eyraud" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Eyraud similarity [[19]](#ref19)." 
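A minimal exact-arithmetic sketch of the Eyraud similarity. It is not PyCM's source. The per-class counts are an assumption inferred from the printed outputs.

```python
from fractions import Fraction

# Assumed per-class (TP, FP, FN, TN) counts, inferred from the printed outputs
COUNTS = {0: (3, 2, 0, 7), 1: (1, 1, 2, 8), 2: (3, 2, 3, 4)}


def eyraud(tp, fp, fn, tn):
    """(TP - (TP+FP)(TP+FN)) / ((TP+FP)(TP+FN)(FP+TN)(FN+TN))."""
    numerator = tp - (tp + fp) * (tp + fn)
    denominator = (tp + fp) * (tp + fn) * (fp + tn) * (fn + tn)
    return Fraction(numerator, denominator)


result = {label: eyraud(*c) for label, c in COUNTS.items()}
print(result)
```

All three values are negative here because (TP+FP)(TP+FN) exceeds TP for every class. The product of four marginals in the denominator keeps the magnitudes small.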
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{Eyraud} =\n", "\\frac{TP-(TP+FP)\\times(TP+FN)}{(TP+FP)\\times(TP+FN)\\times(FP+TN)\\times(FN+TN)}$$" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: -0.012698412698412698, 1: -0.009259259259259259, 2: -0.02142857142857143}" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.Eyraud)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fager & McGowan" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fager & McGowan similarity [[20]](#ref20) [[21]](#ref21)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{FagerMcGowan} =\n", "\\frac{TP}{\\sqrt{(TP+FP)\\times(TP+FN)}} - \\frac{1}{2\\sqrt{max(TP+FP, TP+FN)}}$$" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.5509898714915045, 1: 0.11957315586905015, 2: 0.3435984122732345}" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.FagerMcGowan)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Faith" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Faith similarity [[22]](#ref22)." 
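In the Faith similarity, true negatives count half as much as true positives. A minimal sketch follows. It is illustrative only, not PyCM's code, and uses counts assumed from the printed outputs.

```python
from fractions import Fraction

# Assumed per-class (TP, FP, FN, TN) counts, inferred from the printed outputs
COUNTS = {0: (3, 2, 0, 7), 1: (1, 1, 2, 8), 2: (3, 2, 3, 4)}


def faith(tp, fp, fn, tn):
    """(TP + TN/2) / POP."""
    pop = tp + fp + fn + tn
    return (tp + Fraction(tn, 2)) / pop


result = {label: faith(*c) for label, c in COUNTS.items()}
print(result)
```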
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{Faith} =\n", "\\frac{TP+\\frac{TN}{2}}{POP}$$" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.5416666666666666, 1: 0.4166666666666667, 2: 0.4166666666666667}" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.Faith)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fleiss-Levin-Paik" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fleiss-Levin-Paik similarity [[23]](#ref23)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{FleissLevinPaik} =\n", "\\frac{2 \\times TN}{2 \\times TN + FP + FN}$$" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.875, 1: 0.8421052631578947, 2: 0.6153846153846154}" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.FleissLevinPaik)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Forbes I" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Forbes I similarity [[24]](#ref24) [[25]](#ref25)." 
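Forbes I equals TP divided by its chance-expected value (TP+FP)(TP+FN)/POP. Values above 1 therefore mean more true positives than chance alone would give. A minimal sketch follows. It is not PyCM's implementation, and its counts are assumed, inferred from the printed outputs.

```python
from fractions import Fraction

# Assumed per-class (TP, FP, FN, TN) counts, inferred from the printed outputs
COUNTS = {0: (3, 2, 0, 7), 1: (1, 1, 2, 8), 2: (3, 2, 3, 4)}


def forbes_i(tp, fp, fn, tn):
    """POP * TP / ((TP+FP)(TP+FN)), i.e. observed TP over chance-expected TP."""
    pop = tp + fp + fn + tn
    return Fraction(pop * tp, (tp + fp) * (tp + fn))


result = {label: forbes_i(*c) for label, c in COUNTS.items()}
print(result)
```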
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{ForbesI} =\n", "\\frac{POP \\times TP}{(TP+FP)\\times(TP+FN)}$$" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 2.4, 1: 2.0, 2: 1.2}" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.ForbesI)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Forbes II" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Forbes II correlation [[26]](#ref26)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$corr_{ForbesII} =\n", "\\frac{FP \\times FN-TP \\times TN}{(TP+FP)\\times(TP+FN) - POP \\times min(TP+FP, TP+FN)}$$" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 1.0, 1: 0.3333333333333333, 2: 0.2}" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.ForbesII)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fossum" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fossum similarity [[27]](#ref27)." 
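The Fossum similarity has the same denominator as Forbes I but squares TP - 1/2 in the numerator. A minimal exact sketch follows. It is illustrative, not PyCM's code, and uses counts assumed from the printed outputs.

```python
from fractions import Fraction

# Assumed per-class (TP, FP, FN, TN) counts, inferred from the printed outputs
COUNTS = {0: (3, 2, 0, 7), 1: (1, 1, 2, 8), 2: (3, 2, 3, 4)}


def fossum(tp, fp, fn, tn):
    """POP * (TP - 1/2)**2 / ((TP+FP)(TP+FN))."""
    pop = tp + fp + fn + tn
    return pop * (tp - Fraction(1, 2)) ** 2 / ((tp + fp) * (tp + fn))


result = {label: fossum(*c) for label, c in COUNTS.items()}
print(result)
```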
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{Fossum} =\n", "\\frac{POP \\times (TP-\\frac{1}{2})^2}{(TP+FP)\\times(TP+FN)}$$" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 5.0, 1: 0.5, 2: 2.5}" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.Fossum)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Gilbert & Wells" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Gilbert & Wells similarity [[28]](#ref28)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{GilbertWells} =\n", "ln \\frac{POP^3}{2\\pi (TP+FP)\\times(TP+FN)\\times(FP+TN)\\times(FN+TN)} +\n", "2ln \\frac{POP! \\times TP! \\times FP! \\times FN! \\times TN!}{(TP+FP)! \\times (TP+FN)! \\times (FP+TN)! \\times (FN+TN)!}$$" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 4.947742862177545, 1: 1.1129094954405283, 2: 0.4195337173255813}" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.GilbertWells)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Goodall" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Goodall similarity [[29]](#ref29) [[30]](#ref30)." 
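The Goodall similarity is an arcsine square-root transform of the accuracy (TP+TN)/POP, rescaled by 2/pi. An accuracy of 0 maps to 0 and an accuracy of 1 maps to 1. A minimal sketch follows. It is not PyCM's source, and its counts are assumed, inferred from the printed outputs.

```python
import math

# Assumed per-class (TP, FP, FN, TN) counts, inferred from the printed outputs
COUNTS = {0: (3, 2, 0, 7), 1: (1, 1, 2, 8), 2: (3, 2, 3, 4)}


def goodall(tp, fp, fn, tn):
    """(2/pi) * asin(sqrt((TP + TN) / POP))."""
    pop = tp + fp + fn + tn
    return 2 / math.pi * math.asin(math.sqrt((tp + tn) / pop))


result = {label: goodall(*c) for label, c in COUNTS.items()}
print(result)
```

For class 1, the accuracy is 9/12 = 3/4. Since asin(sqrt(3)/2) = pi/3, the result is 2/3.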
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{Goodall} =\\frac{2}{\\pi} \\sin^{-1}\\Big(\n", "\\sqrt{\\frac{TP + TN}{POP}}\n", "\\Big)$$" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.7322795271987701, 1: 0.6666666666666666, 2: 0.5533003790381138}" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.Goodall)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Goodman & Kruskal's Lambda" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Goodman & Kruskal's Lambda similarity [[31]](#ref31)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{GK_\\lambda} =\n", "\\frac{\\frac{1}{2}((max(TP,FP)+max(FN,TN)+max(TP,FN)+max(FP,TN))-\n", "(max(TP+FP,FN+TN)+max(TP+FN,FP+TN)))}\n", "{POP-\\frac{1}{2}(max(TP+FP,FN+TN)+max(TP+FN,FP+TN))}$$" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.5, 1: 0.0, 2: 0.09090909090909091}" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.GoodmanKruskalLambda)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Goodman & Kruskal Lambda-r" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Goodman & Kruskal Lambda-r correlation [[31]](#ref31)." 
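A minimal exact sketch of the Goodman & Kruskal Lambda-r correlation. It is illustrative only, not PyCM's implementation. The baseline is half the sum of the larger predicted total and the larger actual total. The counts are assumed, inferred from the printed outputs.

```python
from fractions import Fraction

# Assumed per-class (TP, FP, FN, TN) counts, inferred from the printed outputs
COUNTS = {0: (3, 2, 0, 7), 1: (1, 1, 2, 8), 2: (3, 2, 3, 4)}


def gk_lambda_r(tp, fp, fn, tn):
    pop = tp + fp + fn + tn
    # (max(TP+FP, FN+TN) + max(TP+FN, FP+TN)) / 2
    baseline = Fraction(max(tp + fp, fn + tn) + max(tp + fn, fp + tn), 2)
    return (tp + tn - baseline) / (pop - baseline)


result = {label: gk_lambda_r(*c) for label, c in COUNTS.items()}
print(result)
```

The negative value for class 1 reflects that its agreements, TP + TN = 9, fall below the baseline of 9.5.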
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$corr_{GK_{\\lambda_r}} =\n", "\\frac{TP + TN - \\frac{1}{2}(max(TP+FP,FN+TN)+max(TP+FN,FP+TN))}\n", "{POP - \\frac{1}{2}(max(TP+FP,FN+TN)+max(TP+FN,FP+TN))}\n", "$$" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.5, 1: -0.2, 2: 0.09090909090909091}" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.GoodmanKruskalLambdaR)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Guttman's Lambda A" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Guttman's Lambda A similarity [[32]](#ref32)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{Guttman_{\\lambda_a}} =\n", "\\frac{max(TP, FN) + max(FP, TN) - max(TP+FP, FN+TN)}{POP - max(TP+FP, FN+TN)}\n", "$$" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.6, 1: 0.0, 2: 0.0}" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.GuttmanLambdaA)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Guttman's Lambda B" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Guttman's Lambda B similarity [[32]](#ref32)." 
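Guttman's Lambda B is Lambda A with the roles of FP and FN swapped. The sketch below implements both directly from their formulas and checks that relation. It is not PyCM's code, and its counts are assumed, inferred from the printed outputs.

```python
from fractions import Fraction

# Assumed per-class (TP, FP, FN, TN) counts, inferred from the printed outputs
COUNTS = {0: (3, 2, 0, 7), 1: (1, 1, 2, 8), 2: (3, 2, 3, 4)}


def guttman_lambda_a(tp, fp, fn, tn):
    pop = tp + fp + fn + tn
    larger_predicted = max(tp + fp, fn + tn)  # larger predicted-class total
    return Fraction(max(tp, fn) + max(fp, tn) - larger_predicted, pop - larger_predicted)


def guttman_lambda_b(tp, fp, fn, tn):
    pop = tp + fp + fn + tn
    larger_actual = max(tp + fn, fp + tn)  # larger actual-class total
    return Fraction(max(tp, fp) + max(fn, tn) - larger_actual, pop - larger_actual)


lambda_b = {label: guttman_lambda_b(*c) for label, c in COUNTS.items()}
print(lambda_b)
```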
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{Guttman_{\\lambda_b}} =\n", "\\frac{max(TP, FP) + max(FN, TN) - max(TP+FN, FP+TN)}{POP - max(TP+FN, FP+TN)}\n", "$$" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.3333333333333333, 1: 0.0, 2: 0.16666666666666666}" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.GuttmanLambdaB)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Hamann" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hamann correlation [[33]](#ref33)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$corr_{Hamann} =\n", "\\frac{TP+TN-FP-FN}{POP}\n", "$$" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.6666666666666666, 1: 0.5, 2: 0.16666666666666666}" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.Hamann)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Harris & Lahey" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Harris & Lahey similarity [[34]](#ref34)." 
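The Harris & Lahey similarity combines two agreement ratios. The occurrence ratio is TP/(TP+FP+FN) and the nonoccurrence ratio is TN/(TN+FP+FN). Each ratio is weighted by a term built from the opposite cell. A minimal exact sketch follows. It is not PyCM's implementation, and its counts are assumed, inferred from the printed outputs.

```python
from fractions import Fraction

# Assumed per-class (TP, FP, FN, TN) counts, inferred from the printed outputs
COUNTS = {0: (3, 2, 0, 7), 1: (1, 1, 2, 8), 2: (3, 2, 3, 4)}


def harris_lahey(tp, fp, fn, tn):
    pop = tp + fp + fn + tn
    disagree = fp + fn
    occurrence = Fraction(tp, tp + disagree) * Fraction(2 * tn + disagree, 2 * pop)
    nonoccurrence = Fraction(tn, tn + disagree) * Fraction(2 * tp + disagree, 2 * pop)
    return occurrence + nonoccurrence


result = {label: harris_lahey(*c) for label, c in COUNTS.items()}
print({label: float(value) for label, value in result.items()})
```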
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{HarrisLahey} =\n", "\\frac{TP}{TP+FP+FN} \\times \\frac{2TN+FP+FN}{2POP}+\n", "\\frac{TN}{TN+FP+FN} \\times \\frac{2TP+FP+FN}{2POP}\n", "$$" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.6592592592592592, 1: 0.3494318181818182, 2: 0.4068287037037037}" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.HarrisLahey)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Hawkins & Dotson" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hawkins & Dotson similarity [[35]](#ref35)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{HawkinsDotson} =\n", "\\frac{1}{2} \\times \\Big(\\frac{TP}{TP+FP+FN}+\\frac{TN}{FP+FN+TN}\\Big)\n", "$$" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.6888888888888889, 1: 0.48863636363636365, 2: 0.4097222222222222}" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.HawkinsDotson)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Kendall's Tau" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Kendall's Tau correlation [[36]](#ref36)." 
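As written here, Kendall's Tau equals 2 × Hamann / (POP - 1), since both share the numerator TP + TN - FP - FN. The sketch below checks that identity with exact fractions. It is illustrative only, not PyCM's code, and its counts are assumed, inferred from the printed outputs.

```python
from fractions import Fraction

# Assumed per-class (TP, FP, FN, TN) counts, inferred from the printed outputs
COUNTS = {0: (3, 2, 0, 7), 1: (1, 1, 2, 8), 2: (3, 2, 3, 4)}


def hamann(tp, fp, fn, tn):
    """(TP + TN - FP - FN) / POP."""
    pop = tp + fp + fn + tn
    return Fraction(tp + tn - fp - fn, pop)


def kendall_tau(tp, fp, fn, tn):
    """2 * (TP + TN - FP - FN) / (POP * (POP - 1))."""
    pop = tp + fp + fn + tn
    return Fraction(2 * (tp + tn - fp - fn), pop * (pop - 1))


tau = {label: kendall_tau(*c) for label, c in COUNTS.items()}
print(tau)
```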
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$corr_{KendallTau} =\n", "\\frac{2 \\times (TP+TN-FP-FN)}{POP \\times (POP-1)}\n", "$$" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.12121212121212122, 1: 0.09090909090909091, 2: 0.030303030303030304}" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.KendallTau)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Kent & Foster I" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Kent & Foster I similarity [[37]](#ref37)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{KentFosterI} =\n", "\\frac{TP-\\frac{(TP+FP)\\times(TP+FN)}{TP+FP+FN}}{TP-\\frac{(TP+FP)\\times(TP+FN)}{TP+FP+FN}+FP+FN}\n", "$$" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.0, 1: -0.2, 2: -0.17647058823529413}" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.KentFosterI)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Kent & Foster II" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Kent & Foster II similarity [[37]](#ref37)." 
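Kent & Foster II applies the Kent & Foster I construction to the TN cell. It compares TN with (FP+TN)(FN+TN)/(FP+FN+TN). This sketch uses that expected-TN term in both the numerator and the denominator, which reproduces the printed outputs. It also checks that Kent & Foster II equals Kent & Foster I with TP and TN swapped. It is not PyCM's code, and its counts are assumed, inferred from the printed outputs.

```python
from fractions import Fraction

# Assumed per-class (TP, FP, FN, TN) counts, inferred from the printed outputs
COUNTS = {0: (3, 2, 0, 7), 1: (1, 1, 2, 8), 2: (3, 2, 3, 4)}


def kent_foster_i(tp, fp, fn, tn):
    """Excess of TP over (TP+FP)(TP+FN)/(TP+FP+FN), relative to excess + FP + FN."""
    excess = tp - Fraction((tp + fp) * (tp + fn), tp + fp + fn)
    return excess / (excess + fp + fn)


def kent_foster_ii(tp, fp, fn, tn):
    """Same construction applied to TN, with (FP+TN)(FN+TN) in both places."""
    excess = tn - Fraction((fp + tn) * (fn + tn), fp + fn + tn)
    return excess / (excess + fp + fn)


result = {label: kent_foster_ii(*c) for label, c in COUNTS.items()}
print(result)
```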
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{KentFosterII} =\n", "\\frac{TN-\\frac{(FP+TN)\\times(FN+TN)}{FP+FN+TN}}{TN-\\frac{(FP+TN)\\times(FN+TN)}{FP+FN+TN}+FP+FN}\n", "$$" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.0, 1: -0.06451612903225801, 2: -0.15384615384615394}" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.KentFosterII)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Köppen I" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Köppen I correlation [[38]](#ref38)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{KoppenI} =\n", "\\frac{\\frac{2 \\times TP+FP+FN}{2} \\times \\frac{2 \\times TN+FP+FN}{2} - \\frac{FP+FN}{2}}\n", "{\\frac{2 \\times TP+FP+FN}{2} \\times \\frac{2 \\times TN+FP+FN}{2}}\n", "$$" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.96875, 1: 0.9368421052631579, 2: 0.9300699300699301}" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.KoppenI)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Köppen II" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Köppen II correlation [[38]](#ref38)."
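Köppen II, TP + (FP + FN)/2, is exactly the first factor (2 × TP + FP + FN)/2 in the Köppen I formula. The sketch below builds Köppen I from it. It is illustrative only, not PyCM's implementation, and its counts are assumed, inferred from the printed outputs.

```python
from fractions import Fraction

# Assumed per-class (TP, FP, FN, TN) counts, inferred from the printed outputs
COUNTS = {0: (3, 2, 0, 7), 1: (1, 1, 2, 8), 2: (3, 2, 3, 4)}


def koppen_ii(tp, fp, fn, tn):
    """TP + (FP + FN) / 2."""
    return tp + Fraction(fp + fn, 2)


def koppen_i(tp, fp, fn, tn):
    """Koppen I, built from Koppen II and its TN-side counterpart."""
    x = koppen_ii(tp, fp, fn, tn)  # (2*TP + FP + FN) / 2
    y = tn + Fraction(fp + fn, 2)  # (2*TN + FP + FN) / 2
    return (x * y - Fraction(fp + fn, 2)) / (x * y)


ii = {label: koppen_ii(*c) for label, c in COUNTS.items()}
print(ii)
```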
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{KoppenII} =\n", "TP + \\frac{FP + FN}{2}\n", "$$" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 4.0, 1: 2.5, 2: 5.5}" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.KoppenII)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Kuder & Richardson" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Kuder & Richardson correlation [[39]](#ref39)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$corr_{KuderRichardson} =\n", "\\frac{4 \\times (TP \\times TN - FP \\times FN)}\n", "{(TP+FP)(FN+TN) + (TP+FN)(FP+TN) + 2(TP \\times TN - FP \\times FN)}\n", "$$" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.8076923076923077, 1: 0.4067796610169492, 2: 0.2891566265060241}" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.KuderRichardson)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Kuhns I" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Kuhns I correlation [[40]](#ref40). In the Kuhns formulas below, $N$ denotes the total population, i.e. $N = POP$."
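The Kuhns coefficients share the term delta, which is TP minus its chance-expected value. Because `TP*POP - (TP+FP)*(TP+FN) = TP*TN - FP*FN`, delta equals (TP×TN - FP×FN)/POP. Kuhns I is therefore twice the Dispersion correlation. The sketch below checks this with exact fractions. It is not PyCM's code, and its counts are assumed, inferred from the printed outputs.

```python
from fractions import Fraction

# Assumed per-class (TP, FP, FN, TN) counts, inferred from the printed outputs
COUNTS = {0: (3, 2, 0, 7), 1: (1, 1, 2, 8), 2: (3, 2, 3, 4)}


def kuhns_delta(tp, fp, fn, tn):
    """TP minus its chance-expected value (TP+FP)(TP+FN)/POP."""
    pop = tp + fp + fn + tn
    return tp - Fraction((tp + fp) * (tp + fn), pop)


def kuhns_i(tp, fp, fn, tn):
    """2 * delta / POP."""
    pop = tp + fp + fn + tn
    return 2 * kuhns_delta(tp, fp, fn, tn) / pop


result = {label: kuhns_i(*c) for label, c in COUNTS.items()}
print(result)
```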
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$corr_{KuhnsI} =\n", "\\frac{2 \\times \\delta(TP + FP, TP + FN)}\n", "{N}\n", "$$\n", "\n", "$$\n", "\\delta(TP + FP, TP + FN) = TP - \\frac{(TP + FP) \\times (TP + FN)}{N}\n", "$$" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.2916666666666667, 1: 0.08333333333333333, 2: 0.08333333333333333}" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.KuhnsI)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Kuhns II" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Kuhns II correlation [[40]](#ref40)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$corr_{KuhnsII} =\n", "\\frac{\\delta(TP + FP, TP + FN)}\n", "{\\max(TP + FP, TP + FN)}\n", "$$\n", "\n", "$$\n", "\\delta(TP + FP, TP + FN) = TP - \\frac{(TP + FP) \\times (TP + FN)}{N}\n", "$$" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.35, 1: 0.16666666666666666, 2: 0.08333333333333333}" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.KuhnsII)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Kuhns III" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Kuhns III correlation [[40]](#ref40)." 
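A minimal exact sketch of Kuhns III. Here s = 2 × TP + FP + FN, which appears twice in the denominator. The sketch is illustrative only, not PyCM's implementation, and its counts are assumed, inferred from the printed outputs.

```python
from fractions import Fraction

# Assumed per-class (TP, FP, FN, TN) counts, inferred from the printed outputs
COUNTS = {0: (3, 2, 0, 7), 1: (1, 1, 2, 8), 2: (3, 2, 3, 4)}


def kuhns_iii(tp, fp, fn, tn):
    pop = tp + fp + fn + tn
    expected_tp = Fraction((tp + fp) * (tp + fn), pop)
    delta = tp - expected_tp
    s = 2 * tp + fp + fn  # appears twice in the denominator
    return delta / ((1 - Fraction(tp, s)) * (s - expected_tp))


result = {label: kuhns_iii(*c) for label, c in COUNTS.items()}
print(result)
```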
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$corr_{KuhnsIII} =\n", "\\frac{\\delta(TP + FP, TP + FN)}\n", "{(1-\\frac{TP}{2 \\times TP + FP + FN})(2 \\times TP + FP + FN-\\frac{(TP + FP)(TP + FN)}{N})}\n", "$$\n", "\n", "$$\n", "\\delta(TP + FP, TP + FN) = TP - \\frac{(TP + FP) \\times (TP + FN)}{N}\n", "$$" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.4148148148148148, 1: 0.1388888888888889, 2: 0.08088235294117647}" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.KuhnsIII)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Kuhns IV" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Kuhns IV correlation [[40]](#ref40)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$corr_{KuhnsIV} =\n", "\\frac{\\delta(TP + FP, TP + FN)}\n", "{\\min(TP + FP, TP + FN)}\n", "$$\n", "\n", "$$\n", "\\delta(TP + FP, TP + FN) = TP - \\frac{(TP + FP) \\times (TP + FN)}{N}\n", "$$" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.5833333333333334, 1: 0.25, 2: 0.1}" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.KuhnsIV)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Kuhns V" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Kuhns V correlation [[40]](#ref40)." 
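Kuhns V and Kuhns VI differ only in taking the max or the min of the two terms m × (1 - m/POP). Here m is the predicted total TP + FP or the actual total TP + FN. The sketch below shares one helper, parameterized by `max` or `min`. The helper name is hypothetical and the code is not PyCM's. The counts are assumed, inferred from the printed outputs.

```python
from fractions import Fraction

# Assumed per-class (TP, FP, FN, TN) counts, inferred from the printed outputs
COUNTS = {0: (3, 2, 0, 7), 1: (1, 1, 2, 8), 2: (3, 2, 3, 4)}


def _kuhns_v_vi(tp, fp, fn, tn, pick):
    """delta divided by pick() of the two terms m * (1 - m / POP)."""
    pop = tp + fp + fn + tn
    predicted, actual = tp + fp, tp + fn
    delta = tp - Fraction(predicted * actual, pop)
    spread = pick(predicted * (1 - Fraction(predicted, pop)),
                  actual * (1 - Fraction(actual, pop)))
    return delta / spread


def kuhns_v(tp, fp, fn, tn):
    return _kuhns_v_vi(tp, fp, fn, tn, pick=max)


def kuhns_vi(tp, fp, fn, tn):
    return _kuhns_v_vi(tp, fp, fn, tn, pick=min)


v = {label: kuhns_v(*c) for label, c in COUNTS.items()}
vi = {label: kuhns_vi(*c) for label, c in COUNTS.items()}
print(v, vi)
```

Exact arithmetic gives 3/5 for class 0. The printed 0.6000000000000001 is floating-point rounding.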
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$corr_{KuhnsV} =\n", "\\frac{\\delta(TP + FP, TP + FN)}\n", "{\\max((TP+FP)(1-\\frac{TP+FP}{N}), (TP+FN)(1-\\frac{TP+FN}{N}))}\n", "$$\n", "\n", "$$\n", "\\delta(TP + FP, TP + FN) = TP - \\frac{(TP + FP) \\times (TP + FN)}{N}\n", "$$" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.6000000000000001, 1: 0.2222222222222222, 2: 0.16666666666666666}" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.KuhnsV)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Kuhns VI" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Kuhns VI correlation [[40]](#ref40)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$corr_{KuhnsVI} =\n", "\\frac{\\delta(TP + FP, TP + FN)}\n", "{\\min((TP+FP)(1-\\frac{TP+FP}{N}), (TP+FN)(1-\\frac{TP+FN}{N}))}\n", "$$\n", "\n", "$$\n", "\\delta(TP + FP, TP + FN) = TP - \\frac{(TP + FP) \\times (TP + FN)}{N}\n", "$$" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.7777777777777778, 1: 0.3, 2: 0.17142857142857146}" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.KuhnsVI)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Kuhns VII" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Kuhns VII correlation [[40]](#ref40)." 
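Kuhns VII divides delta by the geometric mean of the two marginals TP + FP and TP + FN. Kuhns II divides by their max and Kuhns IV by their min. Whenever delta is non-negative, Kuhns VII therefore lies between Kuhns II and Kuhns IV. The sketch below checks this. It is not PyCM's code, the helper name is hypothetical, and the counts are assumed, inferred from the printed outputs.

```python
import math
from fractions import Fraction

# Assumed per-class (TP, FP, FN, TN) counts, inferred from the printed outputs
COUNTS = {0: (3, 2, 0, 7), 1: (1, 1, 2, 8), 2: (3, 2, 3, 4)}


def _delta_and_margins(tp, fp, fn, tn):
    pop = tp + fp + fn + tn
    predicted, actual = tp + fp, tp + fn
    return tp - Fraction(predicted * actual, pop), predicted, actual


def kuhns_ii(*counts):
    delta, predicted, actual = _delta_and_margins(*counts)
    return delta / max(predicted, actual)


def kuhns_iv(*counts):
    delta, predicted, actual = _delta_and_margins(*counts)
    return delta / min(predicted, actual)


def kuhns_vii(*counts):
    delta, predicted, actual = _delta_and_margins(*counts)
    return float(delta) / math.sqrt(predicted * actual)


vii = {label: kuhns_vii(*c) for label, c in COUNTS.items()}
print(vii)
```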
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$corr_{KuhnsVII} =\n", "\\frac{\\delta(TP + FP, TP + FN)}\n", "{\\sqrt{(TP + FP) \\times (TP + FN)}}\n", "$$\n", "\n", "$$\n", "\\delta(TP + FP, TP + FN) = TP - \\frac{(TP + FP) \\times (TP + FN)}{N}\n", "$$" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.45184805705753195, 1: 0.20412414523193154, 2: 0.09128709291752768}" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.KuhnsVII)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## References" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
1- C. C. Little, \"Abydos Documentation,\" 2018.
\n", "\n", "
2- V. Dallmeier, C. Lindig, and A. Zeller, \"Lightweight defect localization for Java,\" in European conference on object-oriented programming, 2005: Springer, pp. 528-550.
\n", "\n", "
3- R. Abreu, P. Zoeteweij, and A. J. Van Gemund, \"An evaluation of similarity coefficients for software fault localization,\" in 2006 12th Pacific Rim International Symposium on Dependable Computing (PRDC'06), 2006: IEEE, pp. 39-46.
\n", "\n", "
4- M. R. Anderberg, Cluster analysis for applications: probability and mathematical statistics: a series of monographs and textbooks. Academic Press, 2014.
\n", "\n", "
5- A. M. Andrés and P. F. Marzo, \"Delta: A new measure of agreement between two raters,\" British journal of mathematical and statistical psychology, vol. 57, no. 1, pp. 1-19, 2004.
\n", "\n", "
6- C. Baroni-Urbani and M. W. Buser, \"Similarity of binary data,\" Systematic Zoology, vol. 25, no. 3, pp. 251-259, 1976.
\n", "\n", "
7- V. Batagelj and M. Bren, \"Comparing resemblance measures,\" Journal of classification, vol. 12, no. 1, pp. 73-90, 1995.
\n", "\n", "
8- F. B. Baulieu, \"A classification of presence/absence based dissimilarity coefficients,\" Journal of Classification, vol. 6, no. 1, pp. 233-246, 1989.
\n", "\n", "
9- F. B. Baulieu, \"Two variant axiom systems for presence/absence based dissimilarity coefficients,\" Journal of Classification, vol. 14, no. 1, pp. 159-170, 1997.
\n", "\n", "
10- R. Benini, Principii di demografia. Barbera, 1901.
\n", "\n", "
11- G. N. Lance and W. T. Williams, \"Computer programs for hierarchical polythetic classification (“similarity analyses”),\" The Computer Journal, vol. 9, no. 1, pp. 60-64, 1966.
\n", "\n", "
12- G. N. Lance and W. T. Williams, \"Mixed-Data Classificatory Programs I - Agglomerative Systems,\" Australian Computer Journal, vol. 1, no. 1, pp. 15-20, 1967.
\n", "\n", "
13- P. W. Clement, \"A formula for computing inter-observer agreement,\" Psychological Reports, vol. 39, no. 1, pp. 257-258, 1976.
\n", "\n", "
14- V. Consonni and R. Todeschini, \"New similarity coefficients for binary data,\" Match-Communications in Mathematical and Computer Chemistry, vol. 68, no. 2, p. 581, 2012.
\n", "\n", "
15- S. F. Dennis, \"The Construction of a Thesaurus Automatically From,\" in Statistical Association Methods for Mechanized Documentation: Symposium Proceedings, 1965, vol. 269: US Government Printing Office, p. 61.
\n", "\n", "
16- P. G. Digby, \"Approximating the tetrachoric correlation coefficient,\" Biometrics, pp. 753-757, 1983.
\n", "\n", "
17- IBM Corp., \"IBM SPSS Statistics Algorithms,\" IBM Corp., Armonk, NY, USA, 2017.
\n", "\n", "
18- M. H. Doolittle, \"The verification of predictions,\" Bulletin of the Philosophical Society of Washington, vol. 7, pp. 122-127, 1885.
\n", "\n", "
19- H. Eyraud, \"Les principes de la mesure des correlations,\" Ann. Univ. Lyon, III. Ser., Sect. A, vol. 1, no. 30-47, p. 111, 1936.
\n", "\n", "
20- E. W. Fager, \"Determination and analysis of recurrent groups,\" Ecology, vol. 38, no. 4, pp. 586-595, 1957.
\n", "\n", "
21- E. W. Fager and J. A. McGowan, \"Zooplankton Species Groups in the North Pacific: Co-occurrences of species can be used to derive groups whose members react similarly to water-mass types,\" Science, vol. 140, no. 3566, pp. 453-460, 1963.
\n", "\n", "
22- D. P. Faith, \"Asymmetric binary similarity measures,\" Oecologia, vol. 57, pp. 287-290, 1983.
\n", "\n", "
23- J. L. Fleiss, B. Levin, and M. C. Paik, Statistical methods for rates and proportions. John Wiley & Sons, 2013.
\n", "\n", "
24- S. A. Forbes, On the local distribution of certain Illinois fishes: an essay in statistical ecology. Illinois State Laboratory of Natural History, 1907.
\n", "\n", "
25- A. Mozley, \"The statistical analysis of the distribution of pond molluscs in western Canada,\" The American Naturalist, vol. 70, no. 728, pp. 237-244, 1936.
\n", "\n", "
26- S. A. Forbes, \"Method of determining and measuring the associative relations of species,\" Science, vol. 61, no. 1585, pp. 518-524, 1925.
\n", "\n", "
27- E. G. Fossum and G. Kaskey, \"Optimization and standardization of information retrieval language and systems,\" Sperry Rand Corp., UNIVAC Division, Philadelphia, PA, 1966.
\n", "\n", "
28- N. Gilbert and T. C. Wells, \"Analysis of quadrat data,\" The Journal of Ecology, pp. 675-685, 1966.
\n", "\n", "
29- D. W. Goodall, \"The distribution of the matching coefficient,\" Biometrics, pp. 647-656, 1967.
\n", "\n", "
30- B. Austin and R. R. Colwell, \"Evaluation of some coefficients for use in numerical taxonomy of microorganisms,\" International Journal of Systematic and Evolutionary Microbiology, vol. 27, no. 3, pp. 204-210, 1977.
\n", "\n", "
31- L. A. Goodman and W. H. Kruskal, Measures of association for cross classifications. Springer, 1979.
\n", "\n", "
32- L. Guttman, \"An outline of the statistical theory of prediction,\" The prediction of personal adjustment, vol. 48, pp. 253-318, 1941.
\n", "\n", "
33- U. Hamann, \"Merkmalsbestand und Verwandtschaftsbeziehungen der Farinosae: ein Beitrag zum System der Monokotyledonen,\" Willdenowia, pp. 639-768, 1961.
\n", "\n", "
34- F. C. Harris and B. B. Lahey, \"A method for combining occurrence and nonoccurrence interobserver agreement scores,\" Journal of Applied Behavior Analysis, vol. 11, no. 4, pp. 523-527, 1978.
\n", "\n", "
35- R. P. Hawkins and V. A. Dotson, \"Reliability Scores That Delude: An Alice in Wonderland Trip Through the Misleading Characteristics of Inter-Observer Agreement Scores in Interval Recording,\" 1973.
\n", "\n", "
36- M. G. Kendall, \"A new measure of rank correlation,\" Biometrika, vol. 30, no. 1/2, pp. 81-93, 1938.
\n", "\n", "
37- R. N. Kent and S. L. Foster, \"Direct observational procedures: Methodological issues in naturalistic settings,\" Handbook of behavioral assessment, pp. 279-328, 1977.
\n", "\n", "
38- W. Köppen, \"In Repertorium für Meteorologie,\" Akademiia Nauk, pp. 189-238, 1870.
\n", "\n", "
39- G. F. Kuder and M. W. Richardson, \"The theory of the estimation of test reliability,\" Psychometrika, pp. 151-160, 1937.
\n", "\n", "
40- J. L. Kuhns, \"Statistical Association Methods for Mechanized Documentation,\" National Bureau of Standards Miscellaneous Publication, pp. 33-40, 1964.
" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": false, "sideBar": true, "skip_h1_title": true, "title_cell": "Table of Contents", "title_sidebar": "Distance/Similarity", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 } pycm-4.2/Document/Document.ipynb000066400000000000000000017555461474106425200167620ustar00rootroot00000000000000{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "

Please cite us if you use the software

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# PyCM Document" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Version : 4.2 " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "-----" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "html_hide" ] }, "source": [ "## Table of contents" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "html_hide" ] }, "source": [ "