nameparser-0.5.6/PKG-INFO
Metadata-Version: 1.1
Name: nameparser
Version: 0.5.6
Summary: A simple Python module for parsing human names into their individual components.
Home-page: https://github.com/derek73/python-nameparser
Author: Derek Gulbranson
Author-email: derek73@gmail.com
License: LGPL
Description-Content-Type: UNKNOWN
Description: Name Parser
        ===========

        .. image:: https://travis-ci.org/derek73/python-nameparser.svg?branch=master
           :target: https://travis-ci.org/derek73/python-nameparser
        .. image:: https://badge.fury.io/py/nameparser.svg
           :target: http://badge.fury.io/py/nameparser

        A simple Python (3.2+ & 2.6+) module for parsing human names into their
        individual components.

        * hn.title
        * hn.first
        * hn.middle
        * hn.last
        * hn.suffix
        * hn.nickname

        Supported Name Structures
        ~~~~~~~~~~~~~~~~~~~~~~~~~

        The supported name structure is generally "Title First Middle Last Suffix",
        where all pieces are optional. Comma-separated format like "Last, First" is
        also supported.

        1. Title Firstname "Nickname" Middle Middle Lastname Suffix
        2. Lastname [Suffix], Title Firstname (Nickname) Middle Middle[,] Suffix [, Suffix]
        3. Title Firstname M Lastname [Suffix], Suffix [Suffix] [, Suffix]

        Instantiating the `HumanName` class with a string splits it on commas and
        then on spaces, classifying name parts based on their placement in the string
        and on matches against known name pieces like titles and suffixes.

        It correctly handles some common conjunctions and special prefixes to last
        names like "del". Titles and conjunctions can be chained together to handle
        complex titles like "Asst Secretary of State". It can also try to correct
        the capitalization of names that are all upper- or all lowercase.

        It attempts the best guess that can be made with a simple, rule-based approach.
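To make the "simple, rule-based approach" concrete, here is a hypothetical, heavily simplified sketch of position-based bucketing. It is not nameparser's implementation: the `classify` function and the tiny `TITLES` and `SUFFIXES` sets are invented for this illustration, and the sketch ignores conjunctions, last-name prefixes, nicknames and capitalization.

```python
# Hypothetical sketch of position-based name bucketing; NOT nameparser's code.
# It ignores conjunctions, last-name prefixes, nicknames and capitalization.
TITLES = {"dr", "mr", "mrs", "ms", "sir"}                # tiny illustrative set
SUFFIXES = {"jr", "sr", "ii", "iii", "iv", "phd", "md"}  # tiny illustrative set


def _norm(piece):
    """Lower-case and strip surrounding periods, so 'Dr.' matches 'dr'."""
    return piece.lower().strip(".")


def classify(full_name):
    """Split on commas, then spaces, and bucket the pieces by position."""
    name = {"title": [], "first": [], "middle": [], "last": [], "suffix": []}
    parts = [p.strip() for p in full_name.split(",") if p.strip()]
    if len(parts) > 1 and all(_norm(w) in SUFFIXES for w in parts[1].split()):
        # "First Last, Suffix[, Suffix]": the commas only set off suffixes.
        words = parts[0].split()
        name["suffix"] = [w for p in parts[1:] for w in p.split()]
    elif len(parts) > 1:
        # "Last, Title First Middle[, Suffix]": text before the comma is the last name.
        name["last"] = parts[0].split()
        words = parts[1].split()
        name["suffix"] = [w for p in parts[2:] for w in p.split()]
    else:
        words = parts[0].split() if parts else []

    # Known titles at the front are titles ...
    while words and _norm(words[0]) in TITLES:
        name["title"].append(words.pop(0))
    # ... and known suffixes at the end are suffixes (keep at least one word).
    while len(words) > 1 and _norm(words[-1]) in SUFFIXES:
        name["suffix"].insert(0, words.pop())
    # Whatever remains: first word, optional last word, middle in between.
    if words:
        name["first"].append(words.pop(0))
    if words and not name["last"]:
        name["last"].append(words.pop())
    name["middle"] = words
    return {key: " ".join(value) for key, value in name.items()}


print(classify("Dr. Juan Q. Xavier Vega III"))
print(classify("Vega, Juan Q."))
```

Because the pieces are bucketed purely by position, "Vega, Juan Q." and "Juan Q. Vega" land in the same fields; the real parser layers many more rules and its configurable sets on top of this basic idea.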
        Its main use case is English and it is not likely to be useful for languages
        that do not conform to the supported name structure. It's not perfect, but it
        gets you pretty far.

        Installation
        ------------

        ::

            pip install nameparser

        If you want to try out the latest code from GitHub you can install with pip
        using the command below.

        ``pip install -e git+git://github.com/derek73/python-nameparser.git#egg=nameparser``

        If you're looking for a web service, check out `eyeseast's nameparse service `_,
        a simple Heroku-friendly Flask wrapper for this module.

        Quick Start Example
        -------------------

        ::

            >>> from nameparser import HumanName
            >>> name = HumanName("Dr. Juan Q. Xavier de la Vega III (Doc Vega)")
            >>> name
            >>> name.last
            'de la Vega'
            >>> name.as_dict()
            {'last': 'de la Vega', 'suffix': 'III', 'title': 'Dr.', 'middle': 'Q. Xavier', 'nickname': 'Doc Vega', 'first': 'Juan'}
            >>> str(name)
            'Dr. Juan Q. Xavier de la Vega III (Doc Vega)'
            >>> name.string_format = "{first} {last}"
            >>> str(name)
            'Juan de la Vega'

        The parser does not attempt to correct mistakes in the input. It mostly just
        splits on white space and puts things in buckets based on their position in
        the string. This also means the difference between 'title' and 'suffix' is
        positional, not semantic. "Dr" is a title when it comes before the name and a
        suffix when it comes after. ("Pre-nominal" and "post-nominal" would probably
        be better names.)

        ::

            >>> name = HumanName("1 & 2, 3 4 5, Mr.")
            >>> name

        Customization
        -------------

        Your project may need some adjustment for your dataset. You can do this in
        your own pre- or post-processing, by `customizing the configured pre-defined sets`_
        of titles, prefixes, etc., or by subclassing the `HumanName` class. See the
        `full documentation`_ for more information.

        `Full documentation`_
        ~~~~~~~~~~~~~~~~~~~~~

        .. _customizing the configured pre-defined sets: http://nameparser.readthedocs.org/en/latest/customize.html
        .. _Full documentation: http://nameparser.readthedocs.org/en/latest/

        Contributing
        ------------

        If you come across a name piece that you think should be in the default
        config, you're probably right. `Start a New Issue`_ and we can get them added.

        Please let me know if there are ways this library could be structured to make
        it easier for you to use in your projects. Read CONTRIBUTING.md_ for more info
        on running the tests and contributing to the project.

        **GitHub Project**

        https://github.com/derek73/python-nameparser

        .. _CONTRIBUTING.md: https://github.com/derek73/python-nameparser/tree/master/CONTRIBUTING.md
        .. _Start a New Issue: https://github.com/derek73/python-nameparser/issues
        .. _click here to propose changes to the titles: https://github.com/derek73/python-nameparser/edit/master/nameparser/config/titles.py

Keywords: names,parser
Platform: UNKNOWN
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: License :: OSI Approved :: GNU Library or Lesser General Public License (LGPL)
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2.6
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.2
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Development Status :: 5 - Production/Stable
Classifier: Natural Language :: English
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Linguistic

nameparser-0.5.6/nameparser.egg-info/SOURCES.txt
AUTHORS
LICENSE
MANIFEST.in
README.rst
setup.cfg
setup.py
tests.py
nameparser/__init__.py
nameparser/parser.py
nameparser/util.py
nameparser.egg-info/PKG-INFO
nameparser.egg-info/SOURCES.txt
nameparser.egg-info/dependency_links.txt
nameparser.egg-info/top_level.txt
nameparser/config/__init__.py
nameparser/config/capitalization.py
nameparser/config/conjunctions.py
nameparser/config/prefixes.py
nameparser/config/regexes.py
nameparser/config/suffixes.py
nameparser/config/titles.py

nameparser-0.5.6/nameparser.egg-info/top_level.txt
nameparser

nameparser-0.5.6/nameparser.egg-info/dependency_links.txt


nameparser-0.5.6/LICENSE
Copyright Derek Gulbranson . http://derekgulbranson.com/
-----

LGPL-2.1+
http://www.opensource.org/licenses/lgpl-license.html

This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.

This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
nameparser-0.5.6/AUTHORS0000644000076500000240000000004512317106352015143 0ustar derekstaff00000000000000Derek Gulbranson nameparser-0.5.6/MANIFEST.in0000644000076500000240000000010413016171547015632 0ustar derekstaff00000000000000include AUTHORS include LICENSE include README.rst include tests.py nameparser-0.5.6/setup.py0000755000076500000240000000310013016171547015610 0ustar derekstaff00000000000000#!/usr/bin/env python try: from setuptools import setup except ImportError: from distutils.core import setup import nameparser import os def read(fname): return open(os.path.join(os.path.dirname(__file__), fname)).read() README = read('README.rst') setup(name='nameparser', packages = ['nameparser','nameparser.config'], description = 'A simple Python module for parsing human names into their individual components.', long_description = README, version = nameparser.__version__, url = nameparser.__url__, author = nameparser.__author__, author_email = nameparser.__author_email__, license = nameparser.__license__, keywords = ['names','parser'], classifiers = [ 'Intended Audience :: Developers', 'Operating System :: OS Independent', "License :: OSI Approved :: GNU Library or Lesser General Public License (LGPL)", 'Programming Language :: Python', 'Programming Language :: Python :: 2.6', 'Programming Language :: Python :: 2.7', 'Programming Language :: Python :: 3', 'Programming Language :: Python :: 3.2', 'Programming Language :: Python :: 3.3', 'Programming Language :: Python :: 3.4', 'Programming Language :: Python :: 3.5', 'Development Status :: 5 - Production/Stable', 'Natural Language :: English', "Topic :: Software Development :: Libraries :: Python Modules", 'Topic :: Text Processing :: Linguistic', ] ) nameparser-0.5.6/nameparser/0000755000076500000240000000000013227234650016235 5ustar derekstaff00000000000000nameparser-0.5.6/nameparser/config/0000755000076500000240000000000013227234650017502 5ustar 
derekstaff00000000000000nameparser-0.5.6/nameparser/config/conjunctions.py0000644000076500000240000000055413150650632022571 0ustar derekstaff00000000000000# -*- coding: utf-8 -*- from __future__ import unicode_literals CONJUNCTIONS = set([ '&', 'and', 'et', 'e', 'of', 'the', 'und', 'y', ]) """ Pieces that should join to their neighboring pieces, e.g. "and", "y" and "&". "of" and "the" are also include to facilitate joining multiple titles, e.g. "President of the United States". """nameparser-0.5.6/nameparser/config/suffixes.py0000644000076500000240000000311013225542711021701 0ustar derekstaff00000000000000# -*- coding: utf-8 -*- from __future__ import unicode_literals SUFFIX_NOT_ACRONYMS = set([ 'dr', 'esq', 'esquire', 'jr', 'jnr', 'sr', 'snr', '2', 'i', 'ii', 'iii', 'iv', 'v', ]) """ Post-nominal pieces that are not acronyms. The parser does not remove periods when matching against these pieces. """ SUFFIX_ACRONYMS = set([ 'ae', 'afc', 'afm', 'arrc', 'bart', 'bem', 'bt', 'cb', 'cbe', 'cfp', 'cgc', 'cgm', 'ch', 'chfc', 'cie', 'clu', 'cmg', 'cpa', 'cpm', 'csi', 'csm', 'cvo', 'dbe', 'dcb', 'dcm', 'dcmg', 'dcvo', 'dds', 'dfc', 'dfm', 'dmd', 'do', 'dpm', 'dsc', 'dsm', 'dso', 'dvm', 'ed', 'erd', 'gbe', 'gc', 'gcb', 'gcie', 'gcmg', 'gcsi', 'gcvo', 'gm', 'idsm', 'iom', 'iso', 'jd', 'kbe', 'kcb', 'kcie', 'kcmg', 'kcsi', 'kcvo', 'kg', 'kp', 'kt', 'lg', 'lt', 'lvo', 'ma', 'mba', 'mbe', 'mc', 'md', 'mm', 'mp', 'msc' 'msm', 'mvo', 'obe', 'obi', 'om', 'phd', 'phr', 'pmp', 'qam', 'qc', 'qfsm', 'qgm', 'qpm', 'rd', 'rrc', 'rvm', 'sgm', 'td', 'ud', 'vc', 'vd', 'vrd', ]) """ Post-nominal acronyms. Titles, degrees and other things people stick after their name that may or may not have periods between the letters. The parser removes periods when matching against these pieces. 
"""

nameparser-0.5.6/nameparser/config/__init__.py
# -*- coding: utf-8 -*-
"""
The :py:mod:`nameparser.config` module manages the configuration of the
nameparser.

A module-level instance of :py:class:`~nameparser.config.Constants` is created
and used by default for all HumanName instances. You can adjust the entire module's
configuration by importing this instance and changing it.

::

    >>> from nameparser.config import CONSTANTS
    >>> CONSTANTS.titles.remove('hon').add('chemistry','dean') # doctest: +ELLIPSIS
    SetManager(set([u'msgt', ..., u'adjutant']))

You can also adjust the configuration of individual instances by passing
``None`` as the second argument upon instantiation.

::

    >>> from nameparser import HumanName
    >>> hn = HumanName("Dean Robert Johns", None)
    >>> hn.C.titles.add('dean') # doctest: +ELLIPSIS
    SetManager(set([u'msgt', ..., u'adjutant']))
    >>> hn.parse_full_name() # need to run this again after config changes

**Potential Gotcha**: If you do not pass ``None`` as the second argument,
``hn.C`` will be a reference to the module config, possibly yielding
unexpected results. See `Customizing the Parser `_.
"""
from __future__ import unicode_literals
import sys

try:
    # collections.abc exists on Python 3.3+; collections.Set was removed in 3.10.
    from collections.abc import Set
except ImportError:
    # Python 2 and Python 3.2 keep the ABC directly in collections.
    from collections import Set

from nameparser.util import binary_type
from nameparser.util import lc
from nameparser.config.prefixes import PREFIXES
from nameparser.config.capitalization import CAPITALIZATION_EXCEPTIONS
from nameparser.config.conjunctions import CONJUNCTIONS
from nameparser.config.suffixes import SUFFIX_ACRONYMS
from nameparser.config.suffixes import SUFFIX_NOT_ACRONYMS
from nameparser.config.titles import TITLES
from nameparser.config.titles import FIRST_NAME_TITLES
from nameparser.config.regexes import REGEXES

DEFAULT_ENCODING = 'UTF-8'


class SetManager(Set):
    '''
    Easily add and remove config variables per module or instance. Subclass of
    ``collections.abc.Set`` (``collections.Set`` on older Pythons).

    The only special functionality beyond that provided by set() is to
    normalize constants for comparison (lower case, leading and trailing
    periods stripped) when they are add()ed and remove()d, and to allow
    passing multiple string arguments to the :py:func:`add()` and
    :py:func:`remove()` methods.
    '''

    def __init__(self, elements):
        self.elements = set(elements)

    def __call__(self):
        return self.elements

    def __repr__(self):
        return "SetManager({})".format(self.elements) # used for docs

    def __iter__(self):
        return iter(self.elements)

    def __contains__(self, value):
        return value in self.elements

    def __len__(self):
        return len(self.elements)

    def next(self):
        return self.__next__()

    def __next__(self):
        # Not used by for-loops: __iter__ above delegates to the underlying set.
        if self.count >= len(self.elements):
            self.count = 0
            raise StopIteration
        else:
            c = self.count
            self.count = c + 1
            return getattr(self, self.elements[c]) or next(self)

    def add_with_encoding(self, s, encoding=None):
        """
        Add the lower case version of the string, with leading and trailing
        periods stripped, to the set. Pass an explicit `encoding` parameter
        to specify the encoding of binary strings that are not
        DEFAULT_ENCODING (UTF-8).
        """
        # sys.stdin can be None (e.g. under pythonw), so look up its encoding defensively.
        encoding = encoding or getattr(sys.stdin, 'encoding', None) or DEFAULT_ENCODING
        if isinstance(s, binary_type):
            s = s.decode(encoding)
        self.elements.add(lc(s))

    def add(self, *strings):
        """
        Add the lower case version of each string argument, with leading and
        trailing periods stripped, to the set. Pass several strings as
        separate arguments (unpack a list with ``add(*names)``).
        Returns ``self`` for chaining.
        """
        for s in strings:
            self.add_with_encoding(s)
        return self

    def remove(self, *strings):
        """
        Remove the lower case version of each string argument, with leading
        and trailing periods stripped, from the set. Strings that are not in
        the set are ignored. Returns ``self`` for chaining.
        """
        for s in strings:
            if lc(s) in self.elements:
                self.elements.remove(lc(s))
        return self


class TupleManager(dict):
    '''
    A dictionary with dot.notation access. Subclass of ``dict``. Makes the tuple
    constants more friendly.
    '''

    def __getattr__(self, attr):
        return self.get(attr)
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__

    def __getstate__(self):
        return dict(self)

    def __setstate__(self, state):
        self.__init__(state)

    def __reduce__(self):
        return (TupleManager, (), self.__getstate__())


class Constants(object):
    """
    An instance of this class holds all of the configuration constants for the parser.

    :param set prefixes:
        :py:attr:`prefixes` wrapped with :py:class:`SetManager`.
    :param set titles:
        :py:attr:`titles` wrapped with :py:class:`SetManager`.
    :param set first_name_titles:
        :py:attr:`~titles.FIRST_NAME_TITLES` wrapped with :py:class:`SetManager`.
    :param set suffix_acronyms:
        :py:attr:`~suffixes.SUFFIX_ACRONYMS` wrapped with :py:class:`SetManager`.
    :param set suffix_not_acronyms:
        :py:attr:`~suffixes.SUFFIX_NOT_ACRONYMS` wrapped with :py:class:`SetManager`.
    :param set conjunctions:
        :py:attr:`conjunctions` wrapped with :py:class:`SetManager`.
    :type capitalization_exceptions: tuple or dict
    :param capitalization_exceptions:
        :py:attr:`~capitalization.CAPITALIZATION_EXCEPTIONS` wrapped with :py:class:`TupleManager`.
    :type regexes: tuple or dict
    :param regexes:
        :py:attr:`regexes` wrapped with :py:class:`TupleManager`.
    """

    string_format = "{title} {first} {middle} {last} {suffix} ({nickname})"
    """
    The default string format used for all new `HumanName` instances.
    """

    empty_attribute_default = ''
    """
    Default return value for empty attributes.

    .. doctest::

        >>> from nameparser import HumanName
        >>> from nameparser.config import CONSTANTS
        >>> CONSTANTS.empty_attribute_default = None
        >>> name = HumanName("John Doe")
        >>> print(name.title)
        None
        >>> name.first
        'John'

    """

    def __init__(self,
                 prefixes=PREFIXES,
                 suffix_acronyms=SUFFIX_ACRONYMS,
                 suffix_not_acronyms=SUFFIX_NOT_ACRONYMS,
                 titles=TITLES,
                 first_name_titles=FIRST_NAME_TITLES,
                 conjunctions=CONJUNCTIONS,
                 capitalization_exceptions=CAPITALIZATION_EXCEPTIONS,
                 regexes=REGEXES
                 ):
        self.prefixes = SetManager(prefixes)
        self.suffix_acronyms = SetManager(suffix_acronyms)
        self.suffix_not_acronyms = SetManager(suffix_not_acronyms)
        self.titles = SetManager(titles)
        self.first_name_titles = SetManager(first_name_titles)
        self.conjunctions = SetManager(conjunctions)
        self.capitalization_exceptions = TupleManager(capitalization_exceptions)
        self.regexes = TupleManager(regexes)
        self._pst = None

    @property
    def suffixes_prefixes_titles(self):
        if not self._pst:
            self._pst = self.prefixes | self.suffix_acronyms | self.suffix_not_acronyms | self.titles
        return self._pst

    def __repr__(self):
        return ""

    def __setstate__(self, state):
        self.__init__(state)

    def __getstate__(self):
        attrs = [x for x in dir(self) if not x.startswith('_')]
        return dict([(a, getattr(self, a)) for a in attrs])

#: A module-level instance of the :py:class:`Constants()` class.
#: Provides a common instance for the module to share
#: to easily adjust configuration for the entire module.
#: See `Customizing the Parser with Your Own Configuration `_.
CONSTANTS = Constants()

nameparser-0.5.6/nameparser/config/titles.py
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

FIRST_NAME_TITLES = set([
    'aunt', 'auntie', 'brother', 'dame', 'father', 'king', 'maid', 'master',
    'mother', 'pope', 'queen', 'sir', 'sister', 'uncle', 'sheikh', 'sheik',
    'shaik', 'shayk', 'shaykh', 'shaikh', 'cheikh', 'shekh',
])
"""
When these titles appear with a single other name, that name is a first name, e.g.
"Sir John", "Sister Mary", "Queen Elizabeth".
"""

#: **Cannot include things that could also be first names**, e.g. "dean".
#: Many of these are from Wikipedia: https://en.wikipedia.org/wiki/Title.
#: The parser recognizes chains of these, including conjunctions, allowing
#: recognition of titles like "Deputy Secretary of State".
TITLES = FIRST_NAME_TITLES | set([
    "attaché", "chargé d'affaires", "king's", "marchioness", "marquess",
    "marquis", "marquise", "queen's", '10th', '1lt', '1sgt', '1st', '1stlt',
    '1stsgt', '2lt', '2nd', '2ndlt', '3rd', '4th', '5th', '6th', '7th', '8th',
    '9th', 'a1c', 'ab', 'abbess', 'abbot', 'abolitionist', 'academic', 'acolyte',
    'activist', 'actor', 'actress', 'adept', 'adjutant', 'adm', 'admiral',
    'advertising', 'adviser', 'advocate', 'air', 'akhoond', 'alderman', 'almoner',
    'ambassador', 'amn', 'analytics', 'anarchist', 'animator', 'anthropologist',
    'appellate', 'apprentice', 'arbitrator', 'archbishop', 'archdeacon',
    'archdruid', 'archduchess', 'archduke', 'archeologist', 'architect', 'arhat',
    'army', 'arranger', 'assistant', 'assoc', 'associate', 'asst', 'astronomer',
    'attache', 'attorney', 'author', 'award-winning', 'ayatollah', 'baba',
    'bailiff', 'ballet', 'bandleader', 'banker', 'banner', 'bard', 'baron',
    'barrister', 'baseball', 'bearer', 'behavioral', 'bench', 'bg', 'bgen',
    'biblical', 'bibliographer', 'biochemist', 'biographer', 'biologist',
    'bishop', 'blessed', 'blogger', 'blues', 'bodhisattva', 'bookseller',
'botanist', 'brigadier', 'briggen', 'british', 'broadcaster', 'buddha', 'burgess', 'burlesque', 'business', 'businessman', 'businesswoman', 'bwana', 'canon', 'capt', 'captain', 'cardinal', 'cartographer', 'cartoonist', 'catholicos', 'ccmsgt', 'cdr', 'celebrity', 'ceo', 'cfo', 'chair', 'chairs', 'chancellor', 'chaplain', 'chef', 'chemist', 'chief', 'chieftain', 'choreographer', 'civil', 'classical', 'clergyman', 'clerk', 'cmsaf', 'cmsgt', 'co-chair', 'co-chairs', 'co-founder', 'coach', 'col', 'collector', 'colonel', 'comedian', 'comedienne', 'comic', 'commander', 'commander-in-chief', 'commodore', 'composer', 'compositeur', 'comptroller', 'computer', 'comtesse', 'conductor', 'consultant', 'contessa', 'controller', 'corporal', 'corporate', 'correspondent', 'councillor', 'counselor', 'count', 'countess', 'courtier', 'cpl', 'cpo', 'cpt', 'credit', 'criminal', 'criminologist', 'critic', 'csm', 'curator', 'customs', 'cwo-2', 'cwo-3', 'cwo-4', 'cwo-5', 'cwo2', 'cwo3', 'cwo4', 'cwo5', 'cyclist', 'dancer', 'deacon', 'delegate', 'deputy', 'designated', 'designer', 'detective', 'developer', 'diplomat', 'dir', 'director', 'discovery', 'dissident', 'district', 'division', 'do', 'docent', 'docket', 'doctor', 'doyen', 'dpty', 'dr', 'dramatist', 'druid', 'drummer', 'duchesse', 'duke', 'dutchess', 'ecologist', 'economist', 'editor', 'edmi', 'edohen', 'educator', 'effendi', 'ekegbian', 'elder', 'elerunwon', 'eminence', 'emperor', 'empress', 'engineer', 'english', 'ens', 'entertainer', 'entrepreneur', 'envoy', 'essayist', 'evangelist', 'excellency', 'excellent', 'exec', 'executive', 'expert', 'fadm', 'family', 'federal', 'field', 'film', 'financial', 'first', 'flag', 'flying', 'foreign', 'forester', 'founder', 'friar', 'gaf', 'gen', 'general', 'generalissimo', 'gentiluomo', 'giani', 'goodman', 'goodwife', 'governor', 'graf', 'grand', 'group', 'guitarist', 'guru', 'gyani', 'gysgt', 'hajji', 'headman', 'heir', 'heiress', 'her', 'hereditary', 'high', 'highness', 'his', 'historian', 
    'historicus', 'historien', 'holiness',
    'hon',  # sorry Hon Solo, but judges seem more common.
    'honorable', 'honourable', 'host', 'illustrator', 'imam', 'industrialist',
    'information', 'instructor', 'intelligence', 'intendant', 'inventor',
    'investigator', 'investor', 'journalist', 'journeyman', 'jr', 'judge',
    'judicial', 'junior', 'jurist', 'keyboardist', 'kingdom', 'knowledge', 'lady',
    'lama', 'lamido', 'law', 'lawyer', 'lcdr', 'lcpl', 'leader', 'lecturer',
    'legal', 'librarian', 'lieutenant', 'linguist', 'literary', 'lord', 'lt',
    'ltc', 'ltcol', 'ltg', 'ltgen', 'ltjg', 'lyricist', 'madam', 'madame',
    'mademoiselle', 'mag', 'mag-judge', 'mag/judge', 'magistrate',
    'magistrate-judge', 'magnate', 'maharajah', 'maharani', 'mahdi', 'maj',
    'majesty', 'majgen', 'manager', 'marcher', 'marchess', 'marketing', 'marquis',
    'mathematician', 'mathematics', 'matriarch', 'mayor', 'mcpo', 'mcpoc', 'mcpon',
    'md', 'member', 'memoirist', 'merchant', 'metropolitan', 'mg', 'mgr',
    'mgysgt', 'military', 'minister', 'miss', 'misses', 'missionary', 'mister',
    'mlle', 'mme', 'mobster', 'model', 'monk', 'monsignor', 'most', 'mountaineer',
    'mpco-cg', 'mr', 'mrs', 'ms', 'msg', 'msgt', 'mufti', 'mullah', 'municipal',
    'murshid', 'musician', 'musicologist', 'mystery', 'nanny', 'narrator',
    'national', 'naturalist', 'navy', 'neuroscientist', 'novelist', 'nurse',
    'obstetrician', 'officer', 'opera', 'operating', 'ornithologist', 'painter',
    'paleontologist', 'pastor', 'patriarch', 'pediatrician', 'personality',
    'petty', 'pfc', 'pharaoh', 'phd', 'philanthropist', 'philosopher',
    'photographer', 'physician', 'physicist', 'pianist', 'pilot', 'pioneer', 'pir',
    'player', 'playwright', 'po1', 'po2', 'po3', 'poet', 'police', 'political',
    'politician', 'prefect', 'prelate', 'premier', 'pres', 'presbyter',
    'president', 'presiding', 'priest', 'priestess', 'primate', 'prime', 'prin',
    'prince', 'princess', 'principal', 'printer', 'printmaker', 'prior', 'private',
    'pro', 'producer', 'prof', 'professor', 'provost',
    'pslc', 'psychiatrist', 'psychologist', 'publisher', 'pursuivant', 'pv2', 'pvt',
    'rabbi', 'radio', 'radm', 'rangatira', 'ranger', 'rdml', 'rear', 'rebbe',
    'registrar', 'rep', 'representative', 'researcher', 'resident', 'rev',
    'revenue', 'reverend', 'right', 'risk', 'rock', 'royal', 'rt', 'sa', 'sailor',
    'saint', 'sainte', 'saoshyant', 'satirist', 'scholar', 'schoolmaster',
    'scientist', 'scpo', 'screenwriter', 'se', 'secretary', 'security', 'seigneur',
    'senator', 'senior', 'senior-judge', 'sergeant', 'servant', 'sfc', 'sgm', 'sgt',
    'sgtmaj', 'sgtmajmc', 'shehu', 'sheikh', 'sheriff', 'siddha', 'singer',
    'singer-songwriter', 'sma', 'smsgt', 'sn', 'soccer', 'social', 'sociologist',
    'software', 'soldier', 'solicitor', 'soprano', 'spc', 'speaker', 'special', 'sr',
    'sra', 'ssg', 'ssgt', 'staff', 'state', 'states', 'strategy', 'subaltern',
    'subedar', 'suffragist', 'sultan', 'sultana', 'superior', 'supreme', 'surgeon',
    'swami', 'swordbearer', 'sysselmann', 'tax', 'teacher', 'technical',
    'technologist', 'television', 'tenor', 'theater', 'theatre', 'theologian',
    'theorist', 'timi', 'tirthankar', 'translator', 'travel', 'treasurer', 'tsar',
    'tsarina', 'tsgt', 'uk', 'united', 'us', 'vadm', 'vardapet', 'vc', 'venerable',
    'verderer', 'vicar', 'vice', 'viscount', 'vizier', 'vocalist', 'voice',
    'warden', 'warrant', 'wing', 'wm', 'wo-1', 'wo1', 'wo2', 'wo3', 'wo4', 'wo5',
    'woodman', 'writer', 'zoologist',
])

nameparser-0.5.6/nameparser/config/prefixes.py
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

#: Name pieces that appear before a last name. They join to the piece that
#: follows them to make one new piece.
PREFIXES = set([
    'abu', 'bin', 'bon', 'da', 'dal', 'de', 'degli', 'dei', 'del', 'dela', 'della',
    'delle', 'delli', 'dello', 'der', 'di', 'du', 'dí', 'ibn', 'la', 'le', 'san',
    'santa', 'st', 'ste', 'van', 'vel', 'von',
])

nameparser-0.5.6/nameparser/config/regexes.py
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import re

# emoji regex from https://stackoverflow.com/questions/26568722/remove-unicode-emoji-using-re-in-python
try:
    # Wide UCS-4 build
    re_emoji = re.compile('['
        '\U0001F300-\U0001F64F'
        '\U0001F680-\U0001F6FF'
        '\u2600-\u26FF\u2700-\u27BF]+',
        re.UNICODE)
except re.error:
    # Narrow UCS-2 build
    re_emoji = re.compile('('
        '\ud83c[\udf00-\udfff]|'
        '\ud83d[\udc00-\ude4f\ude80-\udeff]|'
        '[\u2600-\u26FF\u2700-\u27BF])+',
        re.UNICODE)

REGEXES = set([
    ("spaces", re.compile(r"\s+", re.U)),
    ("word", re.compile(r"(\w|\.)+", re.U)),
    ("mac", re.compile(r'^(ma?c)(\w{2,})', re.I | re.U)),
    ("initial", re.compile(r'^(\w\.|[A-Z])?$', re.U)),
    ("nickname", re.compile(r'\s*?[\("](.+?)[\)"]', re.U)),
    ("roman_numeral", re.compile(r'^(X|IX|IV|V?I{0,3})$', re.I | re.U)),
    ("no_vowels", re.compile(r'^[^aeyiuo]+$', re.I | re.U)),
    ("period_not_at_end", re.compile(r'.*\..+$', re.I | re.U)),
    ("emoji", re_emoji),
])
"""
All regular expressions used by the parser are precompiled and stored in the config.
"""

nameparser-0.5.6/nameparser/config/capitalization.py
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

CAPITALIZATION_EXCEPTIONS = (
    ('ii' ,'II'),
    ('iii','III'),
    ('iv' ,'IV'),
    ('md' ,'M.D.'),
    ('phd','Ph.D.'),
)
"""
Pieces whose correct capitalization is not just an upper-case first letter,
mapped from their lower-case form to the correct form.
"""nameparser-0.5.6/nameparser/util.py0000644000076500000240000000154613227234112017563 0ustar derekstaff00000000000000import logging # http://code.google.com/p/python-nameparser/issues/detail?id=10 log = logging.getLogger('HumanName') try: log.addHandler(logging.NullHandler()) except AttributeError: class NullHandler(logging.Handler): def emit(self, record): pass log.addHandler(NullHandler()) log.setLevel(logging.ERROR) import sys if sys.version_info[0] < 3: text_type = unicode binary_type = str def u(x, encoding=None): if encoding: return unicode(x, encoding) else: return unicode(x) else: text_type = str binary_type = bytes def u(x, encoding=None): return text_type(x) text_types = (text_type, binary_type) def lc(value): """Lower-case and strip leading and trailing periods to normalize for comparison.""" if not value: return '' return value.lower().strip('.') nameparser-0.5.6/nameparser/__init__.py0000644000076500000240000000037513227234323020350 0ustar derekstaff00000000000000VERSION = (0, 5, 6) __version__ = '.'.join(map(str, VERSION)) __author__ = "Derek Gulbranson" __author_email__ = 'derek73@gmail.com' __license__ = "LGPL" __url__ = "https://github.com/derek73/python-nameparser" from nameparser.parser import HumanName nameparser-0.5.6/nameparser/parser.py0000644000076500000240000007235513227234112020110 0ustar derekstaff00000000000000# -*- coding: utf-8 -*- from __future__ import unicode_literals import sys from operator import itemgetter from itertools import groupby from nameparser.util import u from nameparser.util import text_types, binary_type from nameparser.util import lc from nameparser.util import log from nameparser.config import CONSTANTS from nameparser.config import Constants from nameparser.config import DEFAULT_ENCODING ENCODING = 'utf-8' def group_contiguous_integers(data): """ Return a list of (first, last) tuples, one for each run of two or more consecutive integers in a series of index positions. """ ranges = [] for key, group in groupby(enumerate(data), lambda i: i[0] - i[1]):
group = list(map(itemgetter(1), group)) if len(group) > 1: ranges.append((group[0], group[-1])) return ranges class HumanName(object): """ Parse a person's name into individual components. Instantiation assigns to ``full_name``, and assignment to :py:attr:`full_name` triggers :py:func:`parse_full_name`. After parsing the name, these instance attributes are available. **HumanName Instance Attributes** * :py:attr:`title` * :py:attr:`first` * :py:attr:`middle` * :py:attr:`last` * :py:attr:`suffix` * :py:attr:`nickname` :param str full_name: The name string to be parsed. :param constants constants: a :py:class:`~nameparser.config.Constants` instance. Pass ``None`` for `per-instance config `_. :param str encoding: string representing the encoding of your input :param str string_format: python string formatting """ C = CONSTANTS """ A reference to the configuration for this instance, which may or may not be a reference to the shared, module-wide instance at :py:mod:`~nameparser.config.CONSTANTS`. See `Customizing the Parser `_. """ original = '' """ The original string, untouched by the parser. """ _count = 0 _members = ['title','first','middle','last','suffix','nickname'] unparsable = True _full_name = '' def __init__(self, full_name="", constants=CONSTANTS, encoding=DEFAULT_ENCODING, string_format=None): self.C = constants if type(self.C) is not type(CONSTANTS): self.C = Constants() self.encoding = encoding self.string_format = string_format or self.C.string_format # full_name setter triggers the parse self.full_name = full_name def __iter__(self): return self def __len__(self): l = 0 for x in self: l += 1 return l def __eq__(self, other): """ HumanName instances are equal to other objects whose lower case unicode representation is the same. 
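The comparison rule described here (equality by lower-cased text form) can be sketched with a small stand-in class. `CaseInsensitiveName` is hypothetical and not part of nameparser. HumanName compares `u(self)` and `u(other)`, which is `str` on Python 3, so plain `str()` is used below as an assumption for a Python 3 setting.

```python
class CaseInsensitiveName(object):
    # Hypothetical stand-in illustrating the HumanName comparison rule.

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text

    def __eq__(self, other):
        # Compare lower-cased string forms; works against any object
        # that can be converted with str().
        return str(self).lower() == str(other).lower()

    def __ne__(self, other):
        return not self == other


name = CaseInsensitiveName("Dr. John P. Doe-Ray")
print(name == "dr. john p. doe-ray")  # True
print(name == 0)                       # False
```

Because the other side is converted to text first, comparing against integers, lists or dicts simply returns False instead of raising.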
""" return (u(self)).lower() == (u(other)).lower() def __ne__(self, other): return not (u(self)).lower() == (u(other)).lower() def __getitem__(self, key): if isinstance(key, slice): return [getattr(self, x) for x in self._members[key]] else: return getattr(self, key) def __setitem__(self, key, value): if key in self._members: self._set_list(key, value) else: raise KeyError("Not a valid HumanName attribute", key) def next(self): return self.__next__() def __next__(self): if self._count >= len(self._members): self._count = 0 raise StopIteration else: c = self._count self._count = c + 1 return getattr(self, self._members[c]) or next(self) def __unicode__(self): if self.string_format: # string_format = "{title} {first} {middle} {last} {suffix} ({nickname})" _s = self.string_format.format(**self.as_dict()) # remove trailing punctuation from missing nicknames _s = _s.replace(str(self.C.empty_attribute_default),'').replace(" ()","").replace(" ''","").replace(' ""',"") return self.collapse_whitespace(_s).strip(', ') return " ".join(self) def __str__(self): if sys.version_info[0] >= 3: return self.__unicode__() return self.__unicode__().encode(self.encoding) def __repr__(self): if self.unparsable: _string = "<%(class)s : [ Unparsable ] >" % {'class': self.__class__.__name__,} else: _string = "<%(class)s : [\n\ttitle: '%(title)s' \n\tfirst: '%(first)s' \n\tmiddle: '%(middle)s' \n\tlast: '%(last)s' \n\tsuffix: '%(suffix)s'\n\tnickname: '%(nickname)s'\n]>" % { 'class': self.__class__.__name__, 'title': self.title or '', 'first': self.first or '', 'middle': self.middle or '', 'last': self.last or '', 'suffix': self.suffix or '', 'nickname': self.nickname or '', } if sys.version_info[0] >= 3: return _string return _string.encode(self.encoding) def as_dict(self, include_empty=True): """ Return the parsed name as a dictionary of its attributes. :param bool include_empty: Include keys in the dictionary for empty name attributes. :rtype: dict .. 
doctest:: >>> name = HumanName("Bob Dole") >>> name.as_dict() {'last': 'Dole', 'suffix': '', 'title': '', 'middle': '', 'nickname': '', 'first': 'Bob'} >>> name.as_dict(False) {'last': 'Dole', 'first': 'Bob'} """ d = {} for m in self._members: if include_empty: d[m] = getattr(self, m) else: val = getattr(self, m) if val: d[m] = val return d @property def has_own_config(self): """ True if this instance is not using the shared module-level configuration. """ return self.C is not CONSTANTS ### attributes @property def title(self): """ The person's titles. Any string of consecutive pieces in :py:mod:`~nameparser.config.titles` or :py:mod:`~nameparser.config.conjunctions` at the beginning of :py:attr:`full_name`. """ return " ".join(self.title_list) or self.C.empty_attribute_default @property def first(self): """ The person's first name. The first name piece after any known :py:attr:`title` pieces parsed from :py:attr:`full_name`. """ return " ".join(self.first_list) or self.C.empty_attribute_default @property def middle(self): """ The person's middle names. All name pieces after the first name and before the last name parsed from :py:attr:`full_name`. """ return " ".join(self.middle_list) or self.C.empty_attribute_default @property def last(self): """ The person's last name. The last name piece parsed from :py:attr:`full_name`. """ return " ".join(self.last_list) or self.C.empty_attribute_default @property def suffix(self): """ The person's suffixes. Pieces at the end of the name that are found in :py:mod:`~nameparser.config.suffixes`, or pieces that are at the end of comma-separated formats, e.g. "Lastname, Title Firstname Middle[,] Suffix [, Suffix]" parsed from :py:attr:`full_name`. """ return ", ".join(self.suffix_list) or self.C.empty_attribute_default @property def nickname(self): """ The person's nicknames.
Any text found inside double quotes (``""``) or parentheses (``()``). """ return " ".join(self.nickname_list) or self.C.empty_attribute_default ### setter methods def _set_list(self, attr, value): if isinstance(value, list): val = value elif isinstance(value, text_types): val = [value] elif value is None: val = [] else: raise TypeError( "Can only assign strings, lists or None to name attributes." " Got {0}".format(type(value))) setattr(self, attr+"_list", self.parse_pieces(val)) @title.setter def title(self, value): self._set_list('title', value) @first.setter def first(self, value): self._set_list('first', value) @middle.setter def middle(self, value): self._set_list('middle', value) @last.setter def last(self, value): self._set_list('last', value) @suffix.setter def suffix(self, value): self._set_list('suffix', value) @nickname.setter def nickname(self, value): self._set_list('nickname', value) ### Parse helpers def is_title(self, value): """Is in the :py:data:`~nameparser.config.titles.TITLES` set.""" return lc(value) in self.C.titles def is_conjunction(self, piece): """Is in the conjunctions set and not :py:func:`is_an_initial()`.""" return piece.lower() in self.C.conjunctions and not self.is_an_initial(piece) def is_prefix(self, piece): """ Lowercase, period-stripped version of piece is in the :py:data:`~nameparser.config.prefixes.PREFIXES` set. """ return lc(piece) in self.C.prefixes def is_roman_numeral(self, value): """ Matches the ``roman_numeral`` regular expression in :py:data:`~nameparser.config.regexes.REGEXES`. """ return bool(self.C.regexes.roman_numeral.match(value)) def is_suffix(self, piece): """ Is in the suffixes set and not :py:func:`is_an_initial()`. Some suffixes may be acronyms (M.B.A.) while some are not (Jr.), so we remove the periods from `piece` when testing against `C.suffix_acronyms`. """ # suffixes may have periods inside them like "M.D."
return ((lc(piece).replace('.','') in self.C.suffix_acronyms) \ or (lc(piece) in self.C.suffix_not_acronyms)) \ and not self.is_an_initial(piece) def are_suffixes(self, pieces): """Return True if all pieces are suffixes.""" for piece in pieces: if not self.is_suffix(piece): return False return True def is_rootname(self, piece): """ Is not a known title, suffix or prefix. Just first, middle, last names. """ return lc(piece) not in self.C.suffixes_prefixes_titles \ and not self.is_an_initial(piece) def is_an_initial(self, value): """ A single word character followed by a period (``J.``), or a single uppercase letter (``J``). Matches the ``initial`` regular expression in :py:data:`~nameparser.config.regexes.REGEXES`. """ return bool(self.C.regexes.initial.match(value)) ### full_name parser @property def full_name(self): """The name string to be parsed.""" return self._full_name @full_name.setter def full_name(self, value): self.original = value self._full_name = value if isinstance(value, binary_type): self._full_name = value.decode(self.encoding) self.parse_full_name() def collapse_whitespace(self, string): # collapse runs of whitespace into a single space return self.C.regexes.spaces.sub(" ", string.strip()) def pre_process(self): """ This method happens at the beginning of the :py:func:`parse_full_name` before any other processing of the string aside from unicode normalization, so it's a good place to do any custom handling in a subclass. Runs :py:func:`parse_nicknames` and :py:func:`squash_emoji`. """ self.parse_nicknames() self.squash_emoji() def post_process(self): """ This happens at the end of the :py:func:`parse_full_name` after all other processing has taken place. Runs :py:func:`handle_firstnames`. """ self.handle_firstnames() def parse_nicknames(self): """ The content of parentheses or double quotes in the name will be treated as nicknames. This happens before any other processing of the name.
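Nickname extraction boils down to a regular-expression search followed by removal of each match from the name string. The sketch below uses a simplified, backslash-free approximation of the ``nickname`` entry in REGEXES. It is an assumption, not the library's exact pattern (that pattern uses a lazy ``\s*?`` prefix). The helper name `split_nicknames` is hypothetical.

```python
import re

# Approximation of the ``nickname`` pattern: optional leading spaces,
# an opening "(" or '"', a lazy capture, then a closing ")" or '"'.
NICKNAME = re.compile(' *[("](.+?)[)"]', re.UNICODE)


def split_nicknames(full_name):
    # Collect every nickname first, then delete them from the string.
    nicknames = NICKNAME.findall(full_name)
    remainder = NICKNAME.sub('', full_name)
    return remainder, nicknames


print(split_nicknames('Dr. Juan Q. Xavier de la Vega III (Doc Vega)'))
# ('Dr. Juan Q. Xavier de la Vega III', ['Doc Vega'])
print(split_nicknames('John "Jack" Kennedy'))
# ('John Kennedy', ['Jack'])
```

Because the leading spaces belong to the match, removing a nickname does not leave a double space behind. The parser also collapses whitespace afterwards in any case.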
""" # https://code.google.com/p/python-nameparser/issues/detail?id=33 re_nickname = self.C.regexes.nickname if re_nickname.search(self._full_name): self.nickname_list = re_nickname.findall(self._full_name) self._full_name = re_nickname.sub('', self._full_name) def squash_emoji(self): """ Remove emoji from the input string. """ re_emoji = self.C.regexes.emoji if re_emoji and re_emoji.search(self._full_name): self._full_name = re_emoji.sub('', self._full_name) def handle_firstnames(self): """ If there are only two parts and one is a title, assume it's a last name instead of a first name. e.g. Mr. Johnson. Unless it's a special title like "Sir", then when it's followed by a single name that name is always a first name. """ if self.title \ and len(self) == 2 \ and not lc(self.title) in self.C.first_name_titles: self.last, self.first = self.first, self.last def parse_full_name(self): """ The main parse method for the parser. This method is run upon assignment to the :py:attr:`full_name` attribute or instantiation. Basic flow is to hand off to :py:func:`pre_process` to handle nicknames. It then splits on commas and chooses a code path depending on the number of commas. :py:func:`parse_pieces` then splits those parts on spaces and :py:func:`join_on_conjunctions` joins any pieces next to conjunctions. 
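The comma dispatch can be summarized as a three-way decision, sketched below. `SUFFIXES`, `is_suffix` and `comma_format` are hypothetical helpers for this illustration, and the suffix set is a tiny assumed subset of the real configuration. The labels come from the "suffix comma" and "lastname comma" comments in the method body.

```python
# Hypothetical suffix vocabulary; nameparser's real sets are much larger.
SUFFIXES = set(['jr', 'sr', 'ii', 'iii', 'md', 'phd'])


def is_suffix(piece):
    # Drop periods so "M.D." and "MD" both match "md".
    return piece.lower().replace('.', '') in SUFFIXES


def comma_format(full_name):
    # Split on commas the same way parse_full_name does.
    parts = [x.strip() for x in full_name.split(',')]
    if len(parts) == 1:
        return 'no comma'        # Title First Middle Last Suffix
    rest_are_suffixes = all(is_suffix(p) for p in parts[1].split(' '))
    if rest_are_suffixes and len(parts[0].split(' ')) > 1:
        return 'suffix comma'    # Title First Middle Last, Suffix[, Suffix]
    return 'lastname comma'      # Last [Suffix], Title First Middle[, Suffix]


for s in ['John Doe', 'John Doe, Jr.', 'Doe, John', 'Andrews, M.D.']:
    print(s, '->', comma_format(s))
```

"Andrews, M.D." falls into the lastname-comma branch because the part before the comma is a single word. That is the case the expected-failure test for a suffix after a lone last name describes.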
""" self.title_list = [] self.first_list = [] self.middle_list = [] self.last_list = [] self.suffix_list = [] self.nickname_list = [] self.unparsable = True self.pre_process() self._full_name = self.collapse_whitespace(self._full_name) # break up full_name by commas parts = [x.strip() for x in self._full_name.split(",")] log.debug("full_name: {0}".format(self._full_name)) log.debug("parts: {0}".format(parts)) if len(parts) == 1: # no commas, title first middle middle middle last suffix # part[0] pieces = self.parse_pieces(parts) p_len = len(pieces) for i, piece in enumerate(pieces): try: nxt = pieces[i + 1] except IndexError: nxt = None # title must have a next piece, unless it's just a title if self.is_title(piece) \ and (nxt or p_len == 1) \ and not self.first: self.title_list.append(piece) continue if not self.first: self.first_list.append(piece) continue if self.are_suffixes(pieces[i+1:]) or \ ( # if the next piece is the last piece and a roman # numeral but this piece is not an initial self.is_roman_numeral(nxt) and i == p_len - 2 and not self.is_an_initial(piece) ): self.last_list.append(piece) self.suffix_list += pieces[i+1:] break if not nxt: self.last_list.append(piece) continue self.middle_list.append(piece) else: # if all the end parts are suffixes and there is more than one piece # in the first part. (Suffixes will never appear after last names # only, and allows potential first names to be in suffixes, e.g. # "Johnson, Bart" if self.are_suffixes(parts[1].split(' ')) \ and len(parts[0].split(' ')) > 1: # suffix comma: # title first middle last [suffix], suffix [suffix] [, suffix] # parts[0], parts[1:...] 
self.suffix_list += parts[1:] pieces = self.parse_pieces(parts[0].split(' ')) log.debug("pieces: {0}".format(u(pieces))) for i, piece in enumerate(pieces): try: nxt = pieces[i + 1] except IndexError: nxt = None if self.is_title(piece) \ and (nxt or len(pieces) == 1) \ and not self.first: self.title_list.append(piece) continue if not self.first: self.first_list.append(piece) continue if self.are_suffixes(pieces[i+1:]): self.last_list.append(piece) self.suffix_list = pieces[i+1:] + self.suffix_list break if not nxt: self.last_list.append(piece) continue self.middle_list.append(piece) else: # lastname comma: # last [suffix], title first middles[,] suffix [,suffix] # parts[0], parts[1], parts[2:...] pieces = self.parse_pieces(parts[1].split(' '), 1) log.debug("pieces: {0}".format(u(pieces))) # lastname part may have suffixes in it lastname_pieces = self.parse_pieces(parts[0].split(' '), 1) for piece in lastname_pieces: # the first one is always a last name, even if it looks like # a suffix if self.is_suffix(piece) and len(self.last_list) > 0: self.suffix_list.append(piece) else: self.last_list.append(piece) for i, piece in enumerate(pieces): try: nxt = pieces[i + 1] except IndexError: nxt = None if self.is_title(piece) \ and (nxt or len(pieces) == 1) \ and not self.first: self.title_list.append(piece) continue if not self.first: self.first_list.append(piece) continue if self.is_suffix(piece): self.suffix_list.append(piece) continue self.middle_list.append(piece) try: if parts[2]: self.suffix_list += parts[2:] except IndexError: pass if len(self) < 1: log.info("Unparsable: \"{0}\" ".format(self.original)) else: self.unparsable = False self.post_process() def parse_pieces(self, parts, additional_parts_count=0): """ Split parts on spaces and remove commas, join on conjunctions and lastname prefixes. If parts have periods in the middle, try splitting on periods and check if the parts are titles or suffixes. If they are, add them to the constants so they will be found.
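The period-splitting step can be sketched as follows. A piece such as "Lt.Gov." is split on periods, and if any chunk is a known title (or suffix), the whole piece is remembered so later lookups match it directly. `TITLES`, `SUFFIXES` and `learn_period_joined` are hypothetical names for this illustration. Unlike the parser, which adds the raw part to `self.C.titles` or `self.C.suffix_not_acronyms`, this sketch stores the `lc`-normalized form so its own `lc` lookup succeeds.

```python
# Hypothetical vocabularies standing in for the configured sets.
TITLES = set(['lt', 'gov', 'dr'])
SUFFIXES = set(['jr', 'md'])


def lc(value):
    # Same normalization as nameparser.util.lc.
    if not value:
        return ''
    return value.lower().strip('.')


def learn_period_joined(piece, titles, suffixes):
    # Same test as the period_not_at_end regex: a "." somewhere before
    # the final character.
    if '.' not in piece[:-1]:
        return None
    chunks = piece.split('.')
    # If any chunk is known, remember the whole piece for later lookups.
    if any(lc(c) in titles for c in chunks):
        titles.add(lc(piece))
        return 'title'
    if any(lc(c) in suffixes for c in chunks):
        suffixes.add(lc(piece))
        return 'suffix'
    return None


t, s = set(TITLES), set(SUFFIXES)
print(learn_period_joined('Lt.Gov.', t, s))  # title
print(lc('Lt.Gov.') in t)                    # True
print(learn_period_joined('J.', t, s))       # None
```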
:param list parts: name part strings from the comma split :param int additional_parts_count: if the comma format contains other parts, we need to know how many there are to decide if things should be considered a conjunction. :return: pieces split on spaces and joined on conjunctions :rtype: list """ output = [] for part in parts: if not isinstance(part, text_types): raise TypeError("Name parts must be strings. " "Got {0}".format(type(part))) output += [x.strip(' ,') for x in part.split(' ')] # If part contains periods, check if it's multiple titles or suffixes # together without spaces. If so, add the new part with periods to the # constants so they get parsed correctly later for part in output: # if this part has a period that is not at the end if self.C.regexes.period_not_at_end.match(part): # split on periods, any of the split pieces titles or suffixes? # ("Lt.Gov.") period_chunks = part.split(".") titles = list(filter(self.is_title, period_chunks)) suffixes = list(filter(self.is_suffix, period_chunks)) # add the part to the constant so it will be found if titles: self.C.titles.add(part) continue if suffixes: self.C.suffix_not_acronyms.add(part) continue return self.join_on_conjunctions(output, additional_parts_count) def join_on_conjunctions(self, pieces, additional_parts_count=0): """ Join conjunctions to surrounding pieces. Title- and prefix-aware. e.g.: ['Mr.', 'and', 'Mrs.', 'John', 'Doe'] ==> ['Mr. and Mrs.', 'John', 'Doe'] ['The', 'Secretary', 'of', 'State', 'Hillary', 'Clinton'] ==> ['The Secretary of State', 'Hillary', 'Clinton'] When joining titles, saves the newly formed piece to the instance's titles constant so it will be parsed correctly later. E.g. after parsing the example names above, 'The Secretary of State' and 'Mr. and Mrs.' would be present in the titles constant set.
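The core joining idea can be sketched as a single left-to-right pass. Each conjunction is glued to the piece before it and the piece after it, and a leading conjunction is glued only to the piece after it. `CONJUNCTIONS` is an assumed small subset of the configured set, and `join_conjunctions_sketch` is a hypothetical standalone function, not the method itself. The real method also merges runs of adjacent conjunctions, treats a single-letter conjunction as an initial when there are few name pieces, records joined titles, and joins last-name prefixes, none of which this sketch does.

```python
# Hypothetical subset of the conjunction vocabulary.
CONJUNCTIONS = set(['&', 'and', 'of', 'the', 'y'])


def join_conjunctions_sketch(pieces, conjunctions=CONJUNCTIONS):
    # Mirrors the ``length < 3`` check in join_on_conjunctions
    # (ignoring additional_parts_count).
    if len(pieces) < 3:
        return list(pieces)
    out = []
    i = 0
    while i < len(pieces):
        piece = pieces[i]
        has_next = i + 1 < len(pieces)
        if piece.lower() in conjunctions and has_next:
            if out:
                # glue previous piece + conjunction + next piece
                out[-1] = ' '.join([out[-1], piece, pieces[i + 1]])
            else:
                # a leading conjunction only joins the piece after it
                out.append(' '.join([piece, pieces[i + 1]]))
            i += 2
        else:
            out.append(piece)
            i += 1
    return out


print(join_conjunctions_sketch(['Mr.', 'and', 'Mrs.', 'John', 'Doe']))
# ['Mr. and Mrs.', 'John', 'Doe']
```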
:param list pieces: name pieces strings after split on spaces :param int additional_parts_count: :return: new list with piece next to conjunctions merged into one piece with spaces in it. :rtype: list """ length = len(pieces) + additional_parts_count # don't join on conjunctions if there's only 2 parts if length < 3: return pieces rootname_pieces = [p for p in pieces if self.is_rootname(p)] total_length = len(rootname_pieces) + additional_parts_count # find all the conjunctions, join any conjunctions that are next to each # other, then join those newly joined conjunctions and any single # conjunctions to the piece before and after it conj_index = [i for i, piece in enumerate(pieces) if self.is_conjunction(piece)] contiguous_conj_i = [] for i, val in enumerate(conj_index): try: if conj_index[i+1] == val+1: contiguous_conj_i += [val] except IndexError: pass contiguous_conj_i = group_contiguous_integers(conj_index) delete_i = [] for i in contiguous_conj_i: if type(i) == tuple: new_piece = " ".join(pieces[ i[0] : i[1]+1] ) delete_i += list(range( i[0]+1, i[1]+1 )) pieces[i[0]] = new_piece else: new_piece = " ".join(pieces[ i : i+2 ]) delete_i += [i+1] pieces[i] = new_piece #add newly joined conjunctions to constants to be found later self.C.conjunctions.add(new_piece) for i in reversed(delete_i): # delete pieces in reverse order or the index changes on each delete del pieces[i] if len(pieces) == 1: # if there's only one piece left, nothing left to do return pieces # refresh conjunction index locations conj_index = [i for i, piece in enumerate(pieces) if self.is_conjunction(piece)] for i in conj_index: if len(pieces[i]) == 1 and total_length < 4: # if there are only 3 total parts (minus known titles, suffixes # and prefixes) and this conjunction is a single letter, prefer # treating it as an initial rather than a conjunction. 
# http://code.google.com/p/python-nameparser/issues/detail?id=11 continue if i == 0: new_piece = " ".join(pieces[i:i+2]) if self.is_title(pieces[i+1]): # when joining to a title, make new_piece a title too self.C.titles.add(new_piece) pieces[i] = new_piece pieces.pop(i+1) # subtract 1 from the index of all the remaining conjunctions for j,val in enumerate(conj_index): if val > i: conj_index[j]=val-1 else: new_piece = " ".join(pieces[i-1:i+2]) if self.is_title(pieces[i-1]): # when joining to a title, make new_piece a title too self.C.titles.add(new_piece) pieces[i-1] = new_piece pieces.pop(i) rm_count = 2 try: pieces.pop(i) except IndexError: rm_count = 1 # subtract the number of removed pieces from the index # of all the remaining conjunctions for j,val in enumerate(conj_index): if val > i: conj_index[j] = val - rm_count # join prefixes to following lastnames: ['de la Vega'], ['van Buren'] prefixes = list(filter(self.is_prefix, pieces)) if prefixes: i = pieces.index(prefixes[0]) # join everything after the prefix until the next suffix next_suffix = list(filter(self.is_suffix, pieces[i:])) if next_suffix: j = pieces.index(next_suffix[0]) new_piece = ' '.join(pieces[i:j]) pieces = pieces[:i] + [new_piece] + pieces[j:] else: new_piece = ' '.join(pieces[i:]) pieces = pieces[:i] + [new_piece] log.debug("pieces: {0}".format(pieces)) return pieces ### Capitalization Support def cap_word(self, word): if self.is_prefix(word) or self.is_conjunction(word): return word.lower() exceptions = self.C.capitalization_exceptions if lc(word) in exceptions: return exceptions[lc(word)] mac_match = self.C.regexes.mac.match(word) if mac_match: def cap_after_mac(m): return m.group(1).capitalize() + m.group(2).capitalize() return self.C.regexes.mac.sub(cap_after_mac, word) else: return word.capitalize() def cap_piece(self, piece): if not piece: return "" replacement = lambda m: self.cap_word(m.group(0)) return self.C.regexes.word.sub(replacement, piece) def capitalize(self, force=False):
""" The HumanName class can try to guess the correct capitalization of names entered in all upper or lower case. By default, it will not adjust the case of names entered in mixed case. To run capitalization on all names, pass the parameter `force=True`. :param bool force: force capitalization of strings that include mixed case **Usage** .. doctest:: capitalize >>> name = HumanName('bob v. de la macdole-eisenhower phd') >>> name.capitalize() >>> str(name) 'Bob V. de la MacDole-Eisenhower Ph.D.' >>> # Don't touch good names >>> name = HumanName('Shirley Maclaine') >>> name.capitalize() >>> str(name) 'Shirley Maclaine' >>> name.capitalize(force=True) >>> str(name) 'Shirley MacLaine' """ name = u(self) if not force and not (name == name.upper() or name == name.lower()): return self.title_list = self.cap_piece(self.title ).split(' ') self.first_list = self.cap_piece(self.first ).split(' ') self.middle_list = self.cap_piece(self.middle).split(' ') self.last_list = self.cap_piece(self.last ).split(' ') self.suffix_list = self.cap_piece(self.suffix).split(', ') nameparser-0.5.6/setup.cfg0000644000076500000240000000010313227234650015713 0ustar derekstaff00000000000000[bdist_wheel] universal = 1 [egg_info] tag_build = tag_date = 0 nameparser-0.5.6/README.rst0000644000076500000240000001101013016171547015561 0ustar derekstaff00000000000000Name Parser =========== .. image:: https://travis-ci.org/derek73/python-nameparser.svg?branch=master :target: https://travis-ci.org/derek73/python-nameparser .. image:: https://badge.fury.io/py/nameparser.svg :target: http://badge.fury.io/py/nameparser A simple Python (3.2+ & 2.6+) module for parsing human names into their individual components. * hn.title * hn.first * hn.middle * hn.last * hn.suffix * hn.nickname Supported Name Structures ~~~~~~~~~~~~~~~~~~~~~~~~~ The supported name structure is generally "Title First Middle Last Suffix", where all pieces are optional. Comma-separated format like "Last, First" is also supported. 1.
Title Firstname "Nickname" Middle Middle Lastname Suffix 2. Lastname [Suffix], Title Firstname (Nickname) Middle Middle[,] Suffix [, Suffix] 3. Title Firstname M Lastname [Suffix], Suffix [Suffix] [, Suffix] Instantiating the `HumanName` class with a string splits on commas and then spaces, classifying name parts based on placement in the string and matches against known name pieces like titles and suffixes. It correctly handles some common conjunctions and special prefixes to last names like "del". Titles and conjunctions can be chained together to handle complex titles like "Asst Secretary of State". It can also try to correct capitalization of names that are all upper- or lowercase names. It attempts the best guess that can be made with a simple, rule-based approach. Its main use case is English and it is not likely to be useful for languages that do not conform to the supported name structure. It's not perfect, but it gets you pretty far. Installation ------------ :: pip install nameparser If you want to try out the latest code from GitHub you can install with pip using the command below. ``pip install -e git+git://github.com/derek73/python-nameparser.git#egg=nameparser`` If you're looking for a web service, check out `eyeseast's nameparse service `_, a simple Heroku-friendly Flask wrapper for this module. Quick Start Example ------------------- :: >>> from nameparser import HumanName >>> name = HumanName("Dr. Juan Q. Xavier de la Vega III (Doc Vega)") >>> name >>> name.last 'de la Vega' >>> name.as_dict() {'last': 'de la Vega', 'suffix': 'III', 'title': 'Dr.', 'middle': 'Q. Xavier', 'nickname': 'Doc Vega', 'first': 'Juan'} >>> str(name) 'Dr. Juan Q. Xavier de la Vega III (Doc Vega)' >>> name.string_format = "{first} {last}" >>> str(name) 'Juan de la Vega' The parser does not attempt to correct mistakes in the input. It mostly just splits on white space and puts things in buckets based on their position in the string. 
This also means the difference between 'title' and 'suffix' is positional, not semantic. "Dr" is a title when it comes before the name and a suffix when it comes after. ("Pre-nominal" and "post-nominal" would probably be better names.) :: >>> name = HumanName("1 & 2, 3 4 5, Mr.") >>> name Customization ------------- Your project may need some adjustment for your dataset. You can do this in your own pre- or post-processing, by `customizing the configured pre-defined sets`_ of titles, prefixes, etc., or by subclassing the `HumanName` class. See the `full documentation`_ for more information. `Full documentation`_ ~~~~~~~~~~~~~~~~~~~~~ .. _customizing the configured pre-defined sets: http://nameparser.readthedocs.org/en/latest/customize.html .. _Full documentation: http://nameparser.readthedocs.org/en/latest/ Contributing ------------ If you come across a name piece that you think should be in the default config, you're probably right. `Start a New Issue`_ and we can get it added. Please let me know if there are ways this library could be structured to make it easier for you to use in your projects. Read CONTRIBUTING.md_ for more info on running the tests and contributing to the project. **GitHub Project** https://github.com/derek73/python-nameparser .. _CONTRIBUTING.md: https://github.com/derek73/python-nameparser/tree/master/CONTRIBUTING.md .. _Start a New Issue: https://github.com/derek73/python-nameparser/issues .. _click here to propose changes to the titles: https://github.com/derek73/python-nameparser/edit/master/nameparser/config/titles.py nameparser-0.5.6/tests.py0000644000076500000240000023061713227234112015616 0ustar derekstaff00000000000000# -*- coding: utf-8 -*- from __future__ import unicode_literals """ Run this file to run the tests. ``python tests.py`` Or install nose and run nosetests. ``pip install nose`` then: ``nosetests`` Post a ticket and/or clone and fix it. Pull requests with tests gladly accepted.
https://github.com/derek73/python-nameparser/issues https://github.com/derek73/python-nameparser/pulls """ import logging try: import dill except ImportError: dill = False from nameparser import HumanName from nameparser.util import u from nameparser.config import Constants log = logging.getLogger('HumanName') import unittest try: unittest.expectedFailure except AttributeError: # Python 2.6 backport import unittest2 as unittest class HumanNameTestBase(unittest.TestCase): def m(self, actual, expected, hn): """assertEqual with a better message and awareness of hn.C.empty_attribute_default""" expected = expected or hn.C.empty_attribute_default try: self.assertEqual(actual, expected, "'%s' != '%s' for '%s'\n%r" % ( actual, expected, hn.full_name, hn )) except UnicodeDecodeError: self.assertEqual(actual, expected) class HumanNamePythonTests(HumanNameTestBase): def test_utf8(self): hn = HumanName("de la Véña, Jüan") self.m(hn.first, "Jüan", hn) self.m(hn.last, "de la Véña", hn) def test_string_output(self): hn = HumanName("de la Véña, Jüan") print(hn) print(repr(hn)) def test_escaped_utf8_bytes(self): hn = HumanName(b'B\xc3\xb6ck, Gerald') self.m(hn.first, "Gerald", hn) self.m(hn.last, "Böck", hn) def test_len(self): hn = HumanName("Doe-Ray, Dr. John P., CLU, CFP, LUTC") self.m(len(hn), 5, hn) hn = HumanName("John Doe") self.m(len(hn), 2, hn) @unittest.skipUnless(dill,"requires python-dill module to test pickling") def test_config_pickle(self): C = Constants() self.assertTrue(dill.pickles(C)) @unittest.skipUnless(dill,"requires python-dill module to test pickling") def test_name_instance_pickle(self): hn = HumanName("Title First Middle Middle Last, Jr.") self.assertTrue(dill.pickles(hn)) def test_comparison(self): hn1 = HumanName("Doe-Ray, Dr. John P., CLU, CFP, LUTC") hn2 = HumanName("Dr. John P. Doe-Ray, CLU, CFP, LUTC") self.assertTrue(hn1 == hn2) self.assertTrue(not hn1 is hn2) self.assertTrue(hn1 == "Dr. John P. Doe-Ray CLU, CFP, LUTC") hn1 = HumanName("Doe, Dr.
John P., CLU, CFP, LUTC")
        hn2 = HumanName("Dr. John P. Doe-Ray, CLU, CFP, LUTC")
        self.assertTrue(not hn1 == hn2)
        self.assertTrue(not hn1 == 0)
        self.assertTrue(not hn1 == "test")
        self.assertTrue(not hn1 == ["test"])
        self.assertTrue(not hn1 == {"test": hn2})

    def test_assignment_to_full_name(self):
        hn = HumanName("John A. Kenneth Doe, Jr.")
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.middle, "A. Kenneth", hn)
        self.m(hn.suffix, "Jr.", hn)
        hn.full_name = "Juan Velasquez y Garcia III"
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "Velasquez y Garcia", hn)
        self.m(hn.suffix, "III", hn)

    def test_assignment_to_attribute(self):
        hn = HumanName("John A. Kenneth Doe, Jr.")
        hn.last = "de la Vega"
        self.m(hn.last, "de la Vega", hn)
        hn.title = "test"
        self.m(hn.title, "test", hn)
        hn.first = "test"
        self.m(hn.first, "test", hn)
        hn.middle = "test"
        self.m(hn.middle, "test", hn)
        hn.suffix = "test"
        self.m(hn.suffix, "test", hn)
        with self.assertRaises(TypeError):
            hn.suffix = [['test']]
        with self.assertRaises(TypeError):
            hn.suffix = {"test": "test"}

    def test_assign_list_to_attribute(self):
        hn = HumanName("John A. Kenneth Doe, Jr.")
        hn.title = ["test1", "test2"]
        self.m(hn.title, "test1 test2", hn)
        hn.first = ["test3", "test4"]
        self.m(hn.first, "test3 test4", hn)
        hn.middle = ["test5", "test6", "test7"]
        self.m(hn.middle, "test5 test6 test7", hn)
        hn.last = ["test8", "test9", "test10"]
        self.m(hn.last, "test8 test9 test10", hn)
        hn.suffix = ['test']
        self.m(hn.suffix, "test", hn)

    def test_comparison_case_insensitive(self):
        hn1 = HumanName("Doe-Ray, Dr. John P., CLU, CFP, LUTC")
        hn2 = HumanName("dr. john p. doe-Ray, CLU, CFP, LUTC")
        self.assertTrue(hn1 == hn2)
        self.assertTrue(not hn1 is hn2)
        self.assertTrue(hn1 == "Dr. John P. Doe-ray clu, CFP, LUTC")

    def test_slice(self):
        hn = HumanName("Doe-Ray, Dr. John P., CLU, CFP, LUTC")
        self.m(list(hn), ['Dr.', 'John', 'P.', 'Doe-Ray', 'CLU, CFP, LUTC'], hn)
        self.m(hn[1:], ['John', 'P.', 'Doe-Ray', 'CLU, CFP, LUTC', hn.C.empty_attribute_default], hn)
        self.m(hn[1:-2], ['John', 'P.', 'Doe-Ray'], hn)

    def test_getitem(self):
        hn = HumanName("Dr. John A. Kenneth Doe, Jr.")
        self.m(hn['title'], "Dr.", hn)
        self.m(hn['first'], "John", hn)
        self.m(hn['last'], "Doe", hn)
        self.m(hn['middle'], "A. Kenneth", hn)
        self.m(hn['suffix'], "Jr.", hn)

    def test_setitem(self):
        hn = HumanName("Dr. John A. Kenneth Doe, Jr.")
        hn['title'] = 'test'
        self.m(hn['title'], "test", hn)
        hn['last'] = ['test', 'test2']
        self.m(hn['last'], "test test2", hn)
        with self.assertRaises(TypeError):
            hn["suffix"] = [['test']]
        with self.assertRaises(TypeError):
            hn["suffix"] = {"test": "test"}

    def test_conjunction_names(self):
        hn = HumanName("johnny y")
        self.m(hn.first, "johnny", hn)
        self.m(hn.last, "y", hn)

    def test_prefix_names(self):
        hn = HumanName("vai la")
        self.m(hn.first, "vai", hn)
        self.m(hn.last, "la", hn)

    def test_blank_name(self):
        hn = HumanName()
        self.m(hn.first, "", hn)
        self.m(hn.last, "", hn)


class FirstNameHandlingTests(HumanNameTestBase):

    def test_first_name(self):
        hn = HumanName("Andrew")
        self.m(hn.first, "Andrew", hn)

    def test_assume_title_and_one_other_name_is_last_name(self):
        hn = HumanName("Rev Andrews")
        self.m(hn.title, "Rev", hn)
        self.m(hn.last, "Andrews", hn)

    # TODO: Seems "Andrews, M.D.", Andrews should be treated as a last name
    # but other suffixes like "George Jr." should be first names. Might be
    # related to https://github.com/derek73/python-nameparser/issues/2
    @unittest.expectedFailure
    def test_assume_suffix_title_and_one_other_name_is_last_name(self):
        hn = HumanName("Andrews, M.D.")
        self.m(hn.suffix, "M.D.", hn)
        self.m(hn.last, "Andrews", hn)

    def test_suffix_in_lastname_part_of_lastname_comma_format(self):
        hn = HumanName("Smith Jr., John")
        self.m(hn.last, "Smith", hn)
        self.m(hn.first, "John", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test_sir_exception_to_first_name_rule(self):
        hn = HumanName("Sir Gerald")
        self.m(hn.title, "Sir", hn)
        self.m(hn.first, "Gerald", hn)

    def test_king_exception_to_first_name_rule(self):
        hn = HumanName("King Henry")
        self.m(hn.title, "King", hn)
        self.m(hn.first, "Henry", hn)

    def test_queen_exception_to_first_name_rule(self):
        hn = HumanName("Queen Elizabeth")
        self.m(hn.title, "Queen", hn)
        self.m(hn.first, "Elizabeth", hn)

    def test_dame_exception_to_first_name_rule(self):
        hn = HumanName("Dame Mary")
        self.m(hn.title, "Dame", hn)
        self.m(hn.first, "Mary", hn)

    def test_first_name_is_not_prefix_if_only_two_parts(self):
        """When there are only two parts, don't join prefixes or conjunctions"""
        hn = HumanName("Van Nguyen")
        self.m(hn.first, "Van", hn)
        self.m(hn.last, "Nguyen", hn)

    def test_first_name_is_not_prefix_if_only_two_parts_comma(self):
        hn = HumanName("Nguyen, Van")
        self.m(hn.first, "Van", hn)
        self.m(hn.last, "Nguyen", hn)

    @unittest.expectedFailure
    def test_first_name_is_prefix_if_three_parts(self):
        """Not sure how to fix this without breaking Mr and Mrs"""
        hn = HumanName("Mr. Van Nguyen")
        self.m(hn.first, "Van", hn)
        self.m(hn.last, "Nguyen", hn)


class HumanNameBruteForceTests(HumanNameTestBase):

    def test1(self):
        hn = HumanName("John Doe")
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)

    def test2(self):
        hn = HumanName("John Doe, Jr.")
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test3(self):
        hn = HumanName("John Doe III")
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.suffix, "III", hn)

    def test4(self):
        hn = HumanName("Doe, John")
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)

    def test5(self):
        hn = HumanName("Doe, John, Jr.")
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test6(self):
        hn = HumanName("Doe, John III")
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.suffix, "III", hn)

    def test7(self):
        hn = HumanName("John A. Doe")
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.middle, "A.", hn)

    def test8(self):
        hn = HumanName("John A. Doe, Jr")
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.middle, "A.", hn)
        self.m(hn.suffix, "Jr", hn)

    def test9(self):
        hn = HumanName("John A. Doe III")
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.middle, "A.", hn)
        self.m(hn.suffix, "III", hn)

    def test10(self):
        hn = HumanName("Doe, John A.")
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.middle, "A.", hn)

    def test11(self):
        hn = HumanName("Doe, John A., Jr.")
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.middle, "A.", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test12(self):
        hn = HumanName("Doe, John A., III")
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.middle, "A.", hn)
        self.m(hn.suffix, "III", hn)

    def test13(self):
        hn = HumanName("John A. Kenneth Doe")
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.middle, "A. Kenneth", hn)

    def test14(self):
        hn = HumanName("John A. Kenneth Doe, Jr.")
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.middle, "A. Kenneth", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test15(self):
        hn = HumanName("John A. Kenneth Doe III")
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.middle, "A. Kenneth", hn)
        self.m(hn.suffix, "III", hn)

    def test16(self):
        hn = HumanName("Doe, John. A. Kenneth")
        self.m(hn.first, "John.", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.middle, "A. Kenneth", hn)

    def test17(self):
        hn = HumanName("Doe, John. A. Kenneth, Jr.")
        self.m(hn.first, "John.", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.middle, "A. Kenneth", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test18(self):
        hn = HumanName("Doe, John. A. Kenneth III")
        self.m(hn.first, "John.", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.middle, "A. Kenneth", hn)
        self.m(hn.suffix, "III", hn)

    def test19(self):
        hn = HumanName("Dr. John Doe")
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.title, "Dr.", hn)

    def test20(self):
        hn = HumanName("Dr. John Doe, Jr.")
        self.m(hn.title, "Dr.", hn)
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test21(self):
        hn = HumanName("Dr. John Doe III")
        self.m(hn.title, "Dr.", hn)
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.suffix, "III", hn)

    def test22(self):
        hn = HumanName("Doe, Dr. John")
        self.m(hn.title, "Dr.", hn)
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)

    def test23(self):
        hn = HumanName("Doe, Dr. John, Jr.")
        self.m(hn.title, "Dr.", hn)
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test24(self):
        hn = HumanName("Doe, Dr. John III")
        self.m(hn.title, "Dr.", hn)
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.suffix, "III", hn)

    def test25(self):
        hn = HumanName("Dr. John A. Doe")
        self.m(hn.title, "Dr.", hn)
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.middle, "A.", hn)

    def test26(self):
        hn = HumanName("Dr. John A. Doe, Jr.")
        self.m(hn.title, "Dr.", hn)
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.middle, "A.", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test27(self):
        hn = HumanName("Dr. John A. Doe III")
        self.m(hn.title, "Dr.", hn)
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.middle, "A.", hn)
        self.m(hn.suffix, "III", hn)

    def test28(self):
        hn = HumanName("Doe, Dr. John A.")
        self.m(hn.title, "Dr.", hn)
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.middle, "A.", hn)

    def test29(self):
        hn = HumanName("Doe, Dr. John A. Jr.")
        self.m(hn.title, "Dr.", hn)
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.middle, "A.", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test30(self):
        hn = HumanName("Doe, Dr. John A. III")
        self.m(hn.title, "Dr.", hn)
        self.m(hn.middle, "A.", hn)
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.suffix, "III", hn)

    def test31(self):
        hn = HumanName("Dr. John A. Kenneth Doe")
        self.m(hn.title, "Dr.", hn)
        self.m(hn.middle, "A. Kenneth", hn)
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)

    def test32(self):
        hn = HumanName("Dr. John A. Kenneth Doe, Jr.")
        self.m(hn.title, "Dr.", hn)
        self.m(hn.middle, "A. Kenneth", hn)
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test33(self):
        hn = HumanName("Al Arnold Gore, Jr.")
        self.m(hn.middle, "Arnold", hn)
        self.m(hn.first, "Al", hn)
        self.m(hn.last, "Gore", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test34(self):
        hn = HumanName("Dr. John A. Kenneth Doe III")
        self.m(hn.title, "Dr.", hn)
        self.m(hn.middle, "A. Kenneth", hn)
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.suffix, "III", hn)

    def test35(self):
        hn = HumanName("Doe, Dr. John A. Kenneth")
        self.m(hn.title, "Dr.", hn)
        self.m(hn.middle, "A. Kenneth", hn)
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)

    def test36(self):
        hn = HumanName("Doe, Dr. John A. Kenneth Jr.")
        self.m(hn.title, "Dr.", hn)
        self.m(hn.middle, "A. Kenneth", hn)
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test37(self):
        hn = HumanName("Doe, Dr. John A. Kenneth III")
        self.m(hn.title, "Dr.", hn)
        self.m(hn.middle, "A. Kenneth", hn)
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.suffix, "III", hn)

    def test38(self):
        hn = HumanName("Juan de la Vega")
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "de la Vega", hn)

    def test39(self):
        hn = HumanName("Juan de la Vega, Jr.")
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "de la Vega", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test40(self):
        hn = HumanName("Juan de la Vega III")
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "de la Vega", hn)
        self.m(hn.suffix, "III", hn)

    def test41(self):
        hn = HumanName("de la Vega, Juan")
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "de la Vega", hn)

    def test42(self):
        hn = HumanName("de la Vega, Juan, Jr.")
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "de la Vega", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test43(self):
        hn = HumanName("de la Vega, Juan III")
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "de la Vega", hn)
        self.m(hn.suffix, "III", hn)

    def test44(self):
        hn = HumanName("Juan Velasquez y Garcia")
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "Velasquez y Garcia", hn)

    def test45(self):
        hn = HumanName("Juan Velasquez y Garcia, Jr.")
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "Velasquez y Garcia", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test46(self):
        hn = HumanName("Juan Velasquez y Garcia III")
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "Velasquez y Garcia", hn)
        self.m(hn.suffix, "III", hn)

    def test47(self):
        hn = HumanName("Velasquez y Garcia, Juan")
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "Velasquez y Garcia", hn)

    def test48(self):
        hn = HumanName("Velasquez y Garcia, Juan, Jr.")
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "Velasquez y Garcia", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test49(self):
        hn = HumanName("Velasquez y Garcia, Juan III")
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "Velasquez y Garcia", hn)
        self.m(hn.suffix, "III", hn)

    def test50(self):
        hn = HumanName("Dr. Juan de la Vega")
        self.m(hn.title, "Dr.", hn)
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "de la Vega", hn)

    def test51(self):
        hn = HumanName("Dr. Juan de la Vega, Jr.")
        self.m(hn.title, "Dr.", hn)
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "de la Vega", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test52(self):
        hn = HumanName("Dr. Juan de la Vega III")
        self.m(hn.title, "Dr.", hn)
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "de la Vega", hn)
        self.m(hn.suffix, "III", hn)

    def test53(self):
        hn = HumanName("de la Vega, Dr. Juan")
        self.m(hn.title, "Dr.", hn)
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "de la Vega", hn)

    def test54(self):
        hn = HumanName("de la Vega, Dr. Juan, Jr.")
        self.m(hn.title, "Dr.", hn)
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "de la Vega", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test55(self):
        hn = HumanName("de la Vega, Dr. Juan III")
        self.m(hn.title, "Dr.", hn)
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "de la Vega", hn)
        self.m(hn.suffix, "III", hn)

    def test56(self):
        hn = HumanName("Dr. Juan Velasquez y Garcia")
        self.m(hn.title, "Dr.", hn)
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "Velasquez y Garcia", hn)

    def test57(self):
        hn = HumanName("Dr. Juan Velasquez y Garcia, Jr.")
        self.m(hn.title, "Dr.", hn)
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "Velasquez y Garcia", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test58(self):
        hn = HumanName("Dr. Juan Velasquez y Garcia III")
        self.m(hn.title, "Dr.", hn)
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "Velasquez y Garcia", hn)
        self.m(hn.suffix, "III", hn)

    def test59(self):
        hn = HumanName("Velasquez y Garcia, Dr. Juan")
        self.m(hn.title, "Dr.", hn)
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "Velasquez y Garcia", hn)

    def test60(self):
        hn = HumanName("Velasquez y Garcia, Dr. Juan, Jr.")
        self.m(hn.title, "Dr.", hn)
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "Velasquez y Garcia", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test61(self):
        hn = HumanName("Velasquez y Garcia, Dr. Juan III")
        self.m(hn.title, "Dr.", hn)
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "Velasquez y Garcia", hn)
        self.m(hn.suffix, "III", hn)

    def test62(self):
        hn = HumanName("Juan Q. de la Vega")
        self.m(hn.first, "Juan", hn)
        self.m(hn.middle, "Q.", hn)
        self.m(hn.last, "de la Vega", hn)

    def test63(self):
        hn = HumanName("Juan Q. de la Vega, Jr.")
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "de la Vega", hn)
        self.m(hn.middle, "Q.", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test64(self):
        hn = HumanName("Juan Q. de la Vega III")
        self.m(hn.first, "Juan", hn)
        self.m(hn.middle, "Q.", hn)
        self.m(hn.last, "de la Vega", hn)
        self.m(hn.suffix, "III", hn)

    def test65(self):
        hn = HumanName("de la Vega, Juan Q.")
        self.m(hn.first, "Juan", hn)
        self.m(hn.middle, "Q.", hn)
        self.m(hn.last, "de la Vega", hn)

    def test66(self):
        hn = HumanName("de la Vega, Juan Q., Jr.")
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "de la Vega", hn)
        self.m(hn.middle, "Q.", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test67(self):
        hn = HumanName("de la Vega, Juan Q. III")
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "de la Vega", hn)
        self.m(hn.middle, "Q.", hn)
        self.m(hn.suffix, "III", hn)

    def test68(self):
        hn = HumanName("Juan Q. Velasquez y Garcia")
        self.m(hn.middle, "Q.", hn)
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "Velasquez y Garcia", hn)

    def test69(self):
        hn = HumanName("Juan Q. Velasquez y Garcia, Jr.")
        self.m(hn.middle, "Q.", hn)
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "Velasquez y Garcia", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test70(self):
        hn = HumanName("Juan Q. Velasquez y Garcia III")
        self.m(hn.middle, "Q.", hn)
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "Velasquez y Garcia", hn)
        self.m(hn.suffix, "III", hn)

    def test71(self):
        hn = HumanName("Velasquez y Garcia, Juan Q.")
        self.m(hn.middle, "Q.", hn)
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "Velasquez y Garcia", hn)

    def test72(self):
        hn = HumanName("Velasquez y Garcia, Juan Q., Jr.")
        self.m(hn.middle, "Q.", hn)
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "Velasquez y Garcia", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test73(self):
        hn = HumanName("Velasquez y Garcia, Juan Q. III")
        self.m(hn.middle, "Q.", hn)
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "Velasquez y Garcia", hn)
        self.m(hn.suffix, "III", hn)

    def test74(self):
        hn = HumanName("Dr. Juan Q. de la Vega")
        self.m(hn.title, "Dr.", hn)
        self.m(hn.first, "Juan", hn)
        self.m(hn.middle, "Q.", hn)
        self.m(hn.last, "de la Vega", hn)

    def test75(self):
        hn = HumanName("Dr. Juan Q. de la Vega, Jr.")
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "de la Vega", hn)
        self.m(hn.middle, "Q.", hn)
        self.m(hn.title, "Dr.", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test76(self):
        hn = HumanName("Dr. Juan Q. de la Vega III")
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "de la Vega", hn)
        self.m(hn.middle, "Q.", hn)
        self.m(hn.title, "Dr.", hn)
        self.m(hn.suffix, "III", hn)

    def test77(self):
        hn = HumanName("de la Vega, Dr. Juan Q.")
        self.m(hn.first, "Juan", hn)
        self.m(hn.middle, "Q.", hn)
        self.m(hn.last, "de la Vega", hn)
        self.m(hn.title, "Dr.", hn)

    def test78(self):
        hn = HumanName("de la Vega, Dr. Juan Q., Jr.")
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "de la Vega", hn)
        self.m(hn.middle, "Q.", hn)
        self.m(hn.suffix, "Jr.", hn)
        self.m(hn.title, "Dr.", hn)

    def test79(self):
        hn = HumanName("de la Vega, Dr. Juan Q. III")
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "de la Vega", hn)
        self.m(hn.middle, "Q.", hn)
        self.m(hn.suffix, "III", hn)
        self.m(hn.title, "Dr.", hn)

    def test80(self):
        hn = HumanName("Dr. Juan Q. Velasquez y Garcia")
        self.m(hn.title, "Dr.", hn)
        self.m(hn.middle, "Q.", hn)
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "Velasquez y Garcia", hn)

    def test81(self):
        hn = HumanName("Dr. Juan Q. Velasquez y Garcia, Jr.")
        self.m(hn.title, "Dr.", hn)
        self.m(hn.middle, "Q.", hn)
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "Velasquez y Garcia", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test82(self):
        hn = HumanName("Dr. Juan Q. Velasquez y Garcia III")
        self.m(hn.middle, "Q.", hn)
        self.m(hn.title, "Dr.", hn)
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "Velasquez y Garcia", hn)
        self.m(hn.suffix, "III", hn)

    def test83(self):
        hn = HumanName("Velasquez y Garcia, Dr. Juan Q.")
        self.m(hn.title, "Dr.", hn)
        self.m(hn.middle, "Q.", hn)
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "Velasquez y Garcia", hn)

    def test84(self):
        hn = HumanName("Velasquez y Garcia, Dr. Juan Q., Jr.")
        self.m(hn.middle, "Q.", hn)
        self.m(hn.first, "Juan", hn)
        self.m(hn.title, "Dr.", hn)
        self.m(hn.last, "Velasquez y Garcia", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test85(self):
        hn = HumanName("Velasquez y Garcia, Dr. Juan Q. III")
        self.m(hn.middle, "Q.", hn)
        self.m(hn.first, "Juan", hn)
        self.m(hn.title, "Dr.", hn)
        self.m(hn.last, "Velasquez y Garcia", hn)
        self.m(hn.suffix, "III", hn)

    def test86(self):
        hn = HumanName("Juan Q. Xavier de la Vega")
        self.m(hn.first, "Juan", hn)
        self.m(hn.middle, "Q. Xavier", hn)
        self.m(hn.last, "de la Vega", hn)

    def test87(self):
        hn = HumanName("Juan Q. Xavier de la Vega, Jr.")
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "de la Vega", hn)
        self.m(hn.middle, "Q. Xavier", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test88(self):
        hn = HumanName("Juan Q. Xavier de la Vega III")
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "de la Vega", hn)
        self.m(hn.middle, "Q. Xavier", hn)
        self.m(hn.suffix, "III", hn)

    def test89(self):
        hn = HumanName("de la Vega, Juan Q. Xavier")
        self.m(hn.first, "Juan", hn)
        self.m(hn.middle, "Q. Xavier", hn)
        self.m(hn.last, "de la Vega", hn)

    def test90(self):
        hn = HumanName("de la Vega, Juan Q. Xavier, Jr.")
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "de la Vega", hn)
        self.m(hn.middle, "Q. Xavier", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test91(self):
        hn = HumanName("de la Vega, Juan Q. Xavier III")
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "de la Vega", hn)
        self.m(hn.middle, "Q. Xavier", hn)
        self.m(hn.suffix, "III", hn)

    def test92(self):
        hn = HumanName("Dr. Juan Q. Xavier de la Vega")
        self.m(hn.first, "Juan", hn)
        self.m(hn.middle, "Q. Xavier", hn)
        self.m(hn.title, "Dr.", hn)
        self.m(hn.last, "de la Vega", hn)

    def test93(self):
        hn = HumanName("Dr. Juan Q. Xavier de la Vega, Jr.")
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "de la Vega", hn)
        self.m(hn.title, "Dr.", hn)
        self.m(hn.middle, "Q. Xavier", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test94(self):
        hn = HumanName("Dr. Juan Q. Xavier de la Vega III")
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "de la Vega", hn)
        self.m(hn.title, "Dr.", hn)
        self.m(hn.middle, "Q. Xavier", hn)
        self.m(hn.suffix, "III", hn)

    def test95(self):
        hn = HumanName("de la Vega, Dr. Juan Q. Xavier")
        self.m(hn.first, "Juan", hn)
        self.m(hn.title, "Dr.", hn)
        self.m(hn.middle, "Q. Xavier", hn)
        self.m(hn.last, "de la Vega", hn)

    def test96(self):
        hn = HumanName("de la Vega, Dr. Juan Q. Xavier, Jr.")
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "de la Vega", hn)
        self.m(hn.title, "Dr.", hn)
        self.m(hn.middle, "Q. Xavier", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test97(self):
        hn = HumanName("de la Vega, Dr. Juan Q. Xavier III")
        self.m(hn.first, "Juan", hn)
        self.m(hn.title, "Dr.", hn)
        self.m(hn.last, "de la Vega", hn)
        self.m(hn.middle, "Q. Xavier", hn)
        self.m(hn.suffix, "III", hn)

    def test98(self):
        hn = HumanName("Juan Q. Xavier Velasquez y Garcia")
        self.m(hn.middle, "Q. Xavier", hn)
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "Velasquez y Garcia", hn)

    def test99(self):
        hn = HumanName("Juan Q. Xavier Velasquez y Garcia, Jr.")
        self.m(hn.middle, "Q. Xavier", hn)
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "Velasquez y Garcia", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test100(self):
        hn = HumanName("Juan Q. Xavier Velasquez y Garcia III")
        self.m(hn.middle, "Q. Xavier", hn)
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "Velasquez y Garcia", hn)
        self.m(hn.suffix, "III", hn)

    def test101(self):
        hn = HumanName("Velasquez y Garcia, Juan Q. Xavier")
        self.m(hn.middle, "Q. Xavier", hn)
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "Velasquez y Garcia", hn)

    def test102(self):
        hn = HumanName("Velasquez y Garcia, Juan Q. Xavier, Jr.")
        self.m(hn.middle, "Q. Xavier", hn)
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "Velasquez y Garcia", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test103(self):
        hn = HumanName("Velasquez y Garcia, Juan Q. Xavier III")
        self.m(hn.middle, "Q. Xavier", hn)
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "Velasquez y Garcia", hn)
        self.m(hn.suffix, "III", hn)

    def test104(self):
        hn = HumanName("Dr. Juan Q. Xavier Velasquez y Garcia")
        self.m(hn.title, "Dr.", hn)
        self.m(hn.middle, "Q. Xavier", hn)
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "Velasquez y Garcia", hn)

    def test105(self):
        hn = HumanName("Dr. Juan Q. Xavier Velasquez y Garcia, Jr.")
        self.m(hn.middle, "Q. Xavier", hn)
        self.m(hn.first, "Juan", hn)
        self.m(hn.title, "Dr.", hn)
        self.m(hn.last, "Velasquez y Garcia", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test106(self):
        hn = HumanName("Dr. Juan Q. Xavier Velasquez y Garcia III")
        self.m(hn.middle, "Q. Xavier", hn)
        self.m(hn.first, "Juan", hn)
        self.m(hn.title, "Dr.", hn)
        self.m(hn.last, "Velasquez y Garcia", hn)
        self.m(hn.suffix, "III", hn)

    def test107(self):
        hn = HumanName("Velasquez y Garcia, Dr. Juan Q. Xavier")
        self.m(hn.title, "Dr.", hn)
        self.m(hn.middle, "Q. Xavier", hn)
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "Velasquez y Garcia", hn)

    def test108(self):
        hn = HumanName("Velasquez y Garcia, Dr. Juan Q. Xavier, Jr.")
        self.m(hn.middle, "Q. Xavier", hn)
        self.m(hn.first, "Juan", hn)
        self.m(hn.title, "Dr.", hn)
        self.m(hn.last, "Velasquez y Garcia", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test109(self):
        hn = HumanName("Velasquez y Garcia, Dr. Juan Q. Xavier III")
        self.m(hn.middle, "Q. Xavier", hn)
        self.m(hn.first, "Juan", hn)
        self.m(hn.title, "Dr.", hn)
        self.m(hn.last, "Velasquez y Garcia", hn)
        self.m(hn.suffix, "III", hn)

    def test110(self):
        hn = HumanName("John Doe, CLU, CFP, LUTC")
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.suffix, "CLU, CFP, LUTC", hn)

    def test111(self):
        hn = HumanName("John P. Doe, CLU, CFP, LUTC")
        self.m(hn.first, "John", hn)
        self.m(hn.middle, "P.", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.suffix, "CLU, CFP, LUTC", hn)

    def test112(self):
        hn = HumanName("Dr. John P. Doe-Ray, CLU, CFP, LUTC")
        self.m(hn.first, "John", hn)
        self.m(hn.middle, "P.", hn)
        self.m(hn.last, "Doe-Ray", hn)
        self.m(hn.title, "Dr.", hn)
        self.m(hn.suffix, "CLU, CFP, LUTC", hn)

    def test113(self):
        hn = HumanName("Doe-Ray, Dr. John P., CLU, CFP, LUTC")
        self.m(hn.title, "Dr.", hn)
        self.m(hn.middle, "P.", hn)
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe-Ray", hn)
        self.m(hn.suffix, "CLU, CFP, LUTC", hn)

    def test115(self):
        hn = HumanName("Hon. Barrington P. Doe-Ray, Jr.")
        self.m(hn.title, "Hon.", hn)
        self.m(hn.middle, "P.", hn)
        self.m(hn.first, "Barrington", hn)
        self.m(hn.last, "Doe-Ray", hn)

    def test116(self):
        hn = HumanName("Doe-Ray, Hon. Barrington P. Jr., CFP, LUTC")
        self.m(hn.title, "Hon.", hn)
        self.m(hn.middle, "P.", hn)
        self.m(hn.first, "Barrington", hn)
        self.m(hn.last, "Doe-Ray", hn)
        self.m(hn.suffix, "Jr., CFP, LUTC", hn)

    def test117(self):
        hn = HumanName("Rt. Hon. Paul E. Mary")
        self.m(hn.title, "Rt. Hon.", hn)
        self.m(hn.first, "Paul", hn)
        self.m(hn.middle, "E.", hn)
        self.m(hn.last, "Mary", hn)

    def test119(self):
        hn = HumanName("Lord God Almighty")
        self.m(hn.title, "Lord", hn)
        self.m(hn.first, "God", hn)
        self.m(hn.last, "Almighty", hn)


class HumanNameConjunctionTestCase(HumanNameTestBase):
    # Last name with conjunction
    def test_last_name_with_conjunction(self):
        hn = HumanName('Jose Aznar y Lopez')
        self.m(hn.first, "Jose", hn)
        self.m(hn.last, "Aznar y Lopez", hn)

    def test_multiple_conjunctions(self):
        hn = HumanName("part1 of The part2 of the part3 and part4")
        self.m(hn.first, "part1 of The part2 of the part3 and part4", hn)

    def test_multiple_conjunctions2(self):
        hn = HumanName("part1 of and The part2 of the part3 And part4")
        self.m(hn.first, "part1 of and The part2 of the part3 And part4", hn)

    def test_ends_with_conjunction(self):
        hn = HumanName("Jon Dough and")
        self.m(hn.first, "Jon", hn)
        self.m(hn.last, "Dough and", hn)

    def test_ends_with_two_conjunctions(self):
        hn = HumanName("Jon Dough and of")
        self.m(hn.first, "Jon", hn)
        self.m(hn.last, "Dough and of", hn)

    def test_starts_with_conjunction(self):
        hn = HumanName("and Jon Dough")
        self.m(hn.first, "and Jon", hn)
        self.m(hn.last, "Dough", hn)

    def test_starts_with_two_conjunctions(self):
        hn = HumanName("the and Jon Dough")
        self.m(hn.first, "the and Jon", hn)
        self.m(hn.last, "Dough", hn)

    # Potential conjunction/prefix treated as initial (because uppercase)
    def test_uppercase_middle_initial_conflict_with_conjunction(self):
        hn = HumanName('John E Smith')
        self.m(hn.first, "John", hn)
        self.m(hn.middle, "E", hn)
        self.m(hn.last, "Smith", hn)

    def test_lowercase_middle_initial_with_period_conflict_with_conjunction(self):
        hn = HumanName('john e. smith')
        self.m(hn.first, "john", hn)
        self.m(hn.middle, "e.", hn)
        self.m(hn.last, "smith", hn)

    # The conjunction "e" can also be an initial
    def test_lowercase_first_initial_conflict_with_conjunction(self):
        hn = HumanName('e j smith')
        self.m(hn.first, "e", hn)
        self.m(hn.middle, "j", hn)
        self.m(hn.last, "smith", hn)

    def test_lowercase_middle_initial_conflict_with_conjunction(self):
        hn = HumanName('John e Smith')
        self.m(hn.first, "John", hn)
        self.m(hn.middle, "e", hn)
        self.m(hn.last, "Smith", hn)

    def test_lowercase_middle_initial_and_suffix_conflict_with_conjunction(self):
        hn = HumanName('John e Smith, III')
        self.m(hn.first, "John", hn)
        self.m(hn.middle, "e", hn)
        self.m(hn.last, "Smith", hn)
        self.m(hn.suffix, "III", hn)

    def test_lowercase_middle_initial_and_nocomma_suffix_conflict_with_conjunction(self):
        hn = HumanName('John e Smith III')
        self.m(hn.first, "John", hn)
        self.m(hn.middle, "e", hn)
        self.m(hn.last, "Smith", hn)
        self.m(hn.suffix, "III", hn)

    def test_lowercase_middle_initial_comma_lastname_and_suffix_conflict_with_conjunction(self):
        hn = HumanName('Smith, John e, III, Jr')
        self.m(hn.first, "John", hn)
        self.m(hn.middle, "e", hn)
        self.m(hn.last, "Smith", hn)
        self.m(hn.suffix, "III, Jr", hn)

    @unittest.expectedFailure
    def test_two_initials_conflict_with_conjunction(self):
        # Supporting this seems to screw up titles with periods in them like M.B.A.
        hn = HumanName('E.T. Smith')
        self.m(hn.first, "E.", hn)
        self.m(hn.middle, "T.", hn)
        self.m(hn.last, "Smith", hn)

    def test_couples_names(self):
        hn = HumanName('John and Jane Smith')
        self.m(hn.first, "John and Jane", hn)
        self.m(hn.last, "Smith", hn)

    def test_couples_names_with_conjunction_lastname(self):
        hn = HumanName('John and Jane Aznar y Lopez')
        self.m(hn.first, "John and Jane", hn)
        self.m(hn.last, "Aznar y Lopez", hn)

    def test_couple_titles(self):
        hn = HumanName('Mr. and Mrs. John and Jane Smith')
        self.m(hn.title, "Mr. and Mrs.", hn)
        self.m(hn.first, "John and Jane", hn)
        self.m(hn.last, "Smith", hn)

    def test_title_with_three_part_name_last_initial_is_suffix_uppercase_no_period(self):
        hn = HumanName("King John Alexander V")
        self.m(hn.title, "King", hn)
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Alexander", hn)
        self.m(hn.suffix, "V", hn)

    def test_four_name_parts_with_suffix_that_could_be_initial_lowercase_no_period(self):
        hn = HumanName("larry james edward johnson v")
        self.m(hn.first, "larry", hn)
        self.m(hn.middle, "james edward", hn)
        self.m(hn.last, "johnson", hn)
        self.m(hn.suffix, "v", hn)

    def test_four_name_parts_with_suffix_that_could_be_initial_uppercase_no_period(self):
        hn = HumanName("Larry James Johnson I")
        self.m(hn.first, "Larry", hn)
        self.m(hn.middle, "James", hn)
        self.m(hn.last, "Johnson", hn)
        self.m(hn.suffix, "I", hn)

    def test_roman_numeral_initials(self):
        hn = HumanName("Larry V I")
        self.m(hn.first, "Larry", hn)
        self.m(hn.middle, "V", hn)
        self.m(hn.last, "I", hn)
        self.m(hn.suffix, "", hn)

    # tests for Rev. title (Reverend)
    def test124(self):
        hn = HumanName("Rev. John A. Kenneth Doe")
        self.m(hn.title, "Rev.", hn)
        self.m(hn.middle, "A. Kenneth", hn)
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)

    def test125(self):
        hn = HumanName("Rev John A. Kenneth Doe")
        self.m(hn.title, "Rev", hn)
        self.m(hn.middle, "A. Kenneth", hn)
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)

    def test126(self):
        hn = HumanName("Doe, Rev. John A. Jr.")
        self.m(hn.title, "Rev.", hn)
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.middle, "A.", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test127(self):
        hn = HumanName("Buca di Beppo")
        self.m(hn.first, "Buca", hn)
        self.m(hn.last, "di Beppo", hn)

    def test_le_as_last_name(self):
        hn = HumanName("Yin Le")
        self.m(hn.first, "Yin", hn)
        self.m(hn.last, "Le", hn)

    def test_le_as_last_name_with_middle_initial(self):
        hn = HumanName("Yin a Le")
        self.m(hn.first, "Yin", hn)
        self.m(hn.middle, "a", hn)
        self.m(hn.last, "Le", hn)

    def test_conjunction_in_an_address_with_a_title(self):
        hn = HumanName("His Excellency Lord Duncan")
        self.m(hn.title, "His Excellency Lord", hn)
        self.m(hn.last, "Duncan", hn)

    @unittest.expectedFailure
    def test_conjunction_in_an_address_with_a_first_name_title(self):
        hn = HumanName("Her Majesty Queen Elizabeth")
        self.m(hn.title, "Her Majesty Queen", hn)
        # if you want to be technical, Queen is in FIRST_NAME_TITLES
        self.m(hn.first, "Elizabeth", hn)

    def test_name_is_conjunctions(self):
        hn = HumanName("e and e")
        self.m(hn.first, "e and e", hn)


class ConstantsCustomization(HumanNameTestBase):

    def test_add_title(self):
        hn = HumanName("Te Awanui-a-Rangi Black", constants=None)
        hn.C.titles.add('te')
        hn.parse_full_name()
        self.m(hn.title, "Te", hn)
        self.m(hn.first, "Awanui-a-Rangi", hn)
        self.m(hn.last, "Black", hn)

    def test_remove_title(self):
        hn = HumanName("Hon Solo", constants=None)
        hn.C.titles.remove('hon')
        hn.parse_full_name()
        self.m(hn.first, "Hon", hn)
        self.m(hn.last, "Solo", hn)

    def test_add_multiple_arguments(self):
        hn = HumanName("Assoc Dean of Chemistry Robert Johns", constants=None)
        hn.C.titles.add('dean', 'Chemistry')
        hn.parse_full_name()
        self.m(hn.title, "Assoc Dean of Chemistry", hn)
        self.m(hn.first, "Robert", hn)
        self.m(hn.last, "Johns", hn)

    def test_instances_can_have_own_constants(self):
        hn = HumanName("", None)
        hn2 = HumanName("")
        hn.C.titles.remove('hon')
        self.assertEqual('hon' in hn.C.titles, False)
        self.assertEqual(hn.has_own_config, True)
        self.assertEqual('hon' in hn2.C.titles, True)
        self.assertEqual(hn2.has_own_config, False)

    def test_can_change_global_constants(self):
        hn = HumanName("")
        hn2 = HumanName("")
        hn.C.titles.remove('hon')
        self.assertEqual('hon' in hn.C.titles, False)
        self.assertEqual('hon' in hn2.C.titles, False)
        self.assertEqual(hn.has_own_config, False)
        self.assertEqual(hn2.has_own_config, False)
        # clean up so we don't mess up other tests
        hn.C.titles.add('hon')

    def test_remove_multiple_arguments(self):
        hn = HumanName("Ms Hon Solo", constants=None)
        hn.C.titles.remove('hon', 'ms')
        hn.parse_full_name()
        self.m(hn.first, "Ms", hn)
        self.m(hn.middle, "Hon", hn)
        self.m(hn.last, "Solo", hn)

    def test_chain_multiple_arguments(self):
        hn = HumanName("Dean Ms Hon Solo", constants=None)
        hn.C.titles.remove('hon', 'ms').add('dean')
        hn.parse_full_name()
        self.m(hn.title, "Dean", hn)
        self.m(hn.first, "Ms", hn)
        self.m(hn.middle, "Hon", hn)
        self.m(hn.last, "Solo", hn)

    def test_empty_attribute_default(self):
        from nameparser.config import CONSTANTS
        _orig = CONSTANTS.empty_attribute_default
        CONSTANTS.empty_attribute_default = None
        hn = HumanName("")
        self.m(hn.title, None, hn)
        self.m(hn.first, None, hn)
        self.m(hn.middle, None, hn)
        self.m(hn.last, None, hn)
        self.m(hn.suffix, None, hn)
        self.m(hn.nickname, None, hn)
        CONSTANTS.empty_attribute_default = _orig

    def test_empty_attribute_on_instance(self):
        hn = HumanName("", None)
        hn.C.empty_attribute_default = None
        self.m(hn.title, None, hn)
        self.m(hn.first, None, hn)
        self.m(hn.middle, None, hn)
        self.m(hn.last, None, hn)
        self.m(hn.suffix, None, hn)
        self.m(hn.nickname, None, hn)

    def test_none_empty_attribute_string_formatting(self):
        hn = HumanName("", None)
        hn.C.empty_attribute_default = None
        self.assertEqual('', str(hn), hn)

    def test_add_constant_with_explicit_encoding(self):
        c = Constants()
        c.titles.add_with_encoding(b'b\351ck', encoding='latin_1')
        self.assertIn('béck', c.titles)


class HumanNameNicknameTestCase(HumanNameTestBase):
    # https://code.google.com/p/python-nameparser/issues/detail?id=33
    def test_nickname_in_parenthesis(self):
        hn = HumanName("Benjamin (Ben) Franklin")
        self.m(hn.first, "Benjamin", hn)
        self.m(hn.middle, "", hn)
        self.m(hn.last, "Franklin", hn)
        self.m(hn.nickname, "Ben", hn)

    def test_nickname_in_parenthesis_with_comma(self):
        hn = HumanName("Franklin, Benjamin (Ben)")
        self.m(hn.first, "Benjamin", hn)
        self.m(hn.middle, "", hn)
        self.m(hn.last, "Franklin", hn)
        self.m(hn.nickname, "Ben", hn)

    def test_nickname_in_parenthesis_with_comma_and_suffix(self):
        hn = HumanName("Franklin, Benjamin (Ben), Jr.")
        self.m(hn.first, "Benjamin", hn)
        self.m(hn.middle, "", hn)
        self.m(hn.last, "Franklin", hn)
        self.m(hn.suffix, "Jr.", hn)
        self.m(hn.nickname, "Ben", hn)

    # it would be hard to support this without breaking some of the
    # other examples with single quotes in the names.
    @unittest.expectedFailure
    def test_nickname_in_single_quotes(self):
        hn = HumanName("Benjamin 'Ben' Franklin")
        self.m(hn.first, "Benjamin", hn)
        self.m(hn.middle, "", hn)
        self.m(hn.last, "Franklin", hn)
        self.m(hn.nickname, "Ben", hn)

    def test_nickname_in_double_quotes(self):
        hn = HumanName("Benjamin \"Ben\" Franklin")
        self.m(hn.first, "Benjamin", hn)
        self.m(hn.middle, "", hn)
        self.m(hn.last, "Franklin", hn)
        self.m(hn.nickname, "Ben", hn)

    def test_single_quotes_on_first_name_not_treated_as_nickname(self):
        hn = HumanName("Brian O'connor")
        self.m(hn.first, "Brian", hn)
        self.m(hn.middle, "", hn)
        self.m(hn.last, "O'connor", hn)
        self.m(hn.nickname, "", hn)

    def test_single_quotes_on_both_name_not_treated_as_nickname(self):
        hn = HumanName("La'tanya O'connor")
        self.m(hn.first, "La'tanya", hn)
        self.m(hn.middle, "", hn)
        self.m(hn.last, "O'connor", hn)
        self.m(hn.nickname, "", hn)

    def test_single_quotes_on_end_of_last_name_not_treated_as_nickname(self):
        hn = HumanName("Mari' Aube'")
        self.m(hn.first, "Mari'", hn)
        self.m(hn.middle, "", hn)
        self.m(hn.last, "Aube'", hn)
        self.m(hn.nickname, "", hn)

    # http://code.google.com/p/python-nameparser/issues/detail?id=17
    def test_parenthesis_are_removed(self):
        hn = HumanName("John Jones (Google Docs)")
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Jones", hn)
        # not testing the nicknames because we don't actually care
        # about Google Docs.

    def test_parenthesis_are_removed2(self):
        hn = HumanName("John Jones (Google Docs), Jr. (Unknown)")
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Jones", hn)
        self.m(hn.suffix, "Jr.", hn)


class PrefixesTestCase(HumanNameTestBase):
    def test_prefix(self):
        hn = HumanName("Juan del Sur")
        self.m(hn.first, "Juan", hn)
        self.m(hn.last, "del Sur", hn)

    def test_prefix_with_period(self):
        hn = HumanName("Jill St. John")
        self.m(hn.first, "Jill", hn)
        self.m(hn.last, "St. John", hn)

    def test_prefix_before_two_part_last_name(self):
        hn = HumanName("pennie von bergen wessels")
        self.m(hn.first, "pennie", hn)
        self.m(hn.last, "von bergen wessels", hn)

    def test_prefix_before_two_part_last_name_with_suffix(self):
        hn = HumanName("pennie von bergen wessels III")
        self.m(hn.first, "pennie", hn)
        self.m(hn.last, "von bergen wessels", hn)
        self.m(hn.suffix, "III", hn)

    def test_two_part_last_name_with_suffix_comma(self):
        hn = HumanName("pennie von bergen wessels, III")
        self.m(hn.first, "pennie", hn)
        self.m(hn.last, "von bergen wessels", hn)
        self.m(hn.suffix, "III", hn)

    def test_two_part_last_name_with_suffix(self):
        hn = HumanName("von bergen wessels, pennie III")
        self.m(hn.first, "pennie", hn)
        self.m(hn.last, "von bergen wessels", hn)
        self.m(hn.suffix, "III", hn)


class SuffixesTestCase(HumanNameTestBase):
    def test_suffix(self):
        hn = HumanName("Joe Franklin Jr")
        self.m(hn.first, "Joe", hn)
        self.m(hn.last, "Franklin", hn)
        self.m(hn.suffix, "Jr", hn)

    def test_suffix_with_periods(self):
        hn = HumanName("Joe Dentist D.D.S.")
        self.m(hn.first, "Joe", hn)
        self.m(hn.last, "Dentist", hn)
        self.m(hn.suffix, "D.D.S.", hn)

    def test_two_suffixes(self):
        hn = HumanName("Kenneth Clarke QC MP")
        self.m(hn.first, "Kenneth", hn)
        self.m(hn.last, "Clarke", hn)
        # NOTE: this adds a comma when the original format did not have one.
        # not ideal but at least it's in the right bucket
        self.m(hn.suffix, "QC, MP", hn)

    def test_two_suffixes_lastname_comma_format(self):
        hn = HumanName("Washington Jr. MD, Franklin")
        self.m(hn.first, "Franklin", hn)
        self.m(hn.last, "Washington", hn)
        # NOTE: this adds a comma when the original format did not have one.
        self.m(hn.suffix, "Jr., MD", hn)

    def test_two_suffixes_suffix_comma_format(self):
        hn = HumanName("Franklin Washington, Jr. MD")
        self.m(hn.first, "Franklin", hn)
        self.m(hn.last, "Washington", hn)
        self.m(hn.suffix, "Jr. MD", hn)

    def test_suffix_containing_periods(self):
        hn = HumanName("Kenneth Clarke Q.C.")
        self.m(hn.first, "Kenneth", hn)
        self.m(hn.last, "Clarke", hn)
        self.m(hn.suffix, "Q.C.", hn)

    def test_suffix_containing_periods_lastname_comma_format(self):
        hn = HumanName("Clarke, Kenneth, Q.C. M.P.")
        self.m(hn.first, "Kenneth", hn)
        self.m(hn.last, "Clarke", hn)
        self.m(hn.suffix, "Q.C. M.P.", hn)

    def test_suffix_containing_periods_suffix_comma_format(self):
        hn = HumanName("Kenneth Clarke Q.C., M.P.")
        self.m(hn.first, "Kenneth", hn)
        self.m(hn.last, "Clarke", hn)
        self.m(hn.suffix, "Q.C., M.P.", hn)

    def test_suffix_with_single_comma_format(self):
        hn = HumanName("John Doe jr., MD")
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.suffix, "jr., MD", hn)

    def test_suffix_with_double_comma_format(self):
        hn = HumanName("Doe, John jr., MD")
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.suffix, "jr., MD", hn)

    @unittest.expectedFailure
    def test_phd_with_erroneous_space(self):
        hn = HumanName("John Smith, Ph. D.")
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Smith", hn)
        self.m(hn.suffix, "Ph. D.", hn)

    # http://en.wikipedia.org/wiki/Ma_(surname)
    def test_potential_suffix_that_is_also_last_name(self):
        hn = HumanName("Jack Ma")
        self.m(hn.first, "Jack", hn)
        self.m(hn.last, "Ma", hn)

    def test_potential_suffix_that_is_also_last_name_comma(self):
        hn = HumanName("Ma, Jack")
        self.m(hn.first, "Jack", hn)
        self.m(hn.last, "Ma", hn)

    def test_potential_suffix_that_is_also_first_name_comma(self):
        hn = HumanName("Johnson, Bart")
        self.m(hn.first, "Bart", hn)
        self.m(hn.last, "Johnson", hn)

    # TODO: handle conjunctions in last names followed by first names clashing with suffixes
    @unittest.expectedFailure
    def test_potential_suffix_that_is_also_first_name_comma_with_conjunction(self):
        hn = HumanName("De la Vina, Bart")
        self.m(hn.first, "Bart", hn)
        self.m(hn.last, "De la Vina", hn)

    def test_potential_suffix_that_is_also_last_name_with_suffix(self):
        hn = HumanName("Jack Ma Jr")
        self.m(hn.first, "Jack", hn)
        self.m(hn.last, "Ma", hn)
        self.m(hn.suffix, "Jr", hn)

    def test_potential_suffix_that_is_also_last_name_with_suffix_comma(self):
        hn = HumanName("Ma III, Jack Jr")
        self.m(hn.first, "Jack", hn)
        self.m(hn.last, "Ma", hn)
        self.m(hn.suffix, "III, Jr", hn)

    # https://github.com/derek73/python-nameparser/issues/27
    @unittest.expectedFailure
    def test_king(self):
        hn = HumanName("Dr King Jr")
        self.m(hn.title, "Dr", hn)
        self.m(hn.last, "King", hn)
        self.m(hn.suffix, "Jr", hn)

    def test_suffix_with_periods(self):
        hn = HumanName("John Doe Msc.Ed.")
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.suffix, "Msc.Ed.", hn)

    def test_suffix_with_periods_with_comma(self):
        hn = HumanName("John Doe, Msc.Ed.")
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.suffix, "Msc.Ed.", hn)

    def test_suffix_with_periods_with_lastname_comma(self):
        hn = HumanName("Doe, John Msc.Ed.")
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.suffix, "Msc.Ed.", hn)


class TitleTestCase(HumanNameTestBase):
    def test_last_name_is_also_title(self):
        hn = HumanName("Amy E Maid")
        self.m(hn.first, "Amy", hn)
        self.m(hn.middle, "E", hn)
        self.m(hn.last, "Maid", hn)

    def test_last_name_is_also_title_no_comma(self):
        hn = HumanName("Dr. Martin Luther King Jr.")
        self.m(hn.title, "Dr.", hn)
        self.m(hn.first, "Martin", hn)
        self.m(hn.middle, "Luther", hn)
        self.m(hn.last, "King", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test_last_name_is_also_title_with_comma(self):
        hn = HumanName("Duke Martin Luther King, Jr.")
        self.m(hn.title, "Duke", hn)
        self.m(hn.first, "Martin", hn)
        self.m(hn.middle, "Luther", hn)
        self.m(hn.last, "King", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test_last_name_is_also_title3(self):
        hn = HumanName("John King")
        self.m(hn.first, "John", hn)
        self.m(hn.last, "King", hn)

    def test_title_with_conjunction(self):
        hn = HumanName("Secretary of State Hillary Clinton")
        self.m(hn.title, "Secretary of State", hn)
        self.m(hn.first, "Hillary", hn)
        self.m(hn.last, "Clinton", hn)

    def test_compound_title_with_conjunction(self):
        hn = HumanName("Cardinal Secretary of State Hillary Clinton")
        self.m(hn.title, "Cardinal Secretary of State", hn)
        self.m(hn.first, "Hillary", hn)
        self.m(hn.last, "Clinton", hn)

    def test_title_is_title(self):
        hn = HumanName("Coach")
        self.m(hn.title, "Coach", hn)

    # TODO: fix handling of U.S.
    @unittest.expectedFailure
    def test_chained_title_first_name_initial(self):
        hn = HumanName("U.S. District Judge Marc Thomas Treadwell")
        self.m(hn.title, "U.S. District Judge", hn)
        self.m(hn.first, "Marc", hn)
        self.m(hn.middle, "Thomas", hn)
        self.m(hn.last, "Treadwell", hn)

    def test_conflict_with_chained_title_first_name_initial(self):
        hn = HumanName("U. S. Grant")
        self.m(hn.first, "U.", hn)
        self.m(hn.middle, "S.", hn)
        self.m(hn.last, "Grant", hn)

    def test_chained_title_first_name_initial(self):
        hn = HumanName("US Magistrate Judge T Michael Putnam")
        self.m(hn.title, "US Magistrate Judge", hn)
        self.m(hn.first, "T", hn)
        self.m(hn.middle, "Michael", hn)
        self.m(hn.last, "Putnam", hn)

    def test_chained_hyphenated_title(self):
        hn = HumanName("US Magistrate-Judge Elizabeth E Campbell")
        self.m(hn.title, "US Magistrate-Judge", hn)
        self.m(hn.first, "Elizabeth", hn)
        self.m(hn.middle, "E", hn)
        self.m(hn.last, "Campbell", hn)

    def test_chained_hyphenated_title_with_comma_suffix(self):
        hn = HumanName("Mag-Judge Harwell G Davis, III")
        self.m(hn.title, "Mag-Judge", hn)
        self.m(hn.first, "Harwell", hn)
        self.m(hn.middle, "G", hn)
        self.m(hn.last, "Davis", hn)
        self.m(hn.suffix, "III", hn)

    @unittest.expectedFailure
    def test_title_multiple_titles_with_apostrophe_s(self):
        hn = HumanName("The Right Hon. the President of the Queen's Bench Division")
        self.m(hn.title, "The Right Hon. the President of the Queen's Bench Division", hn)

    def test_title_starts_with_conjunction(self):
        hn = HumanName("The Rt Hon John Jones")
        self.m(hn.title, "The Rt Hon", hn)
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Jones", hn)

    def test_conjunction_before_title(self):
        hn = HumanName('The Lord of the Universe')
        self.m(hn.title, "The Lord of the Universe", hn)

    def test_double_conjunction_on_title(self):
        hn = HumanName('Lord of the Universe')
        self.m(hn.title, "Lord of the Universe", hn)

    def test_triple_conjunction_on_title(self):
        hn = HumanName('Lord and of the Universe')
        self.m(hn.title, "Lord and of the Universe", hn)

    def test_multiple_conjunctions_on_multiple_titles(self):
        hn = HumanName('Lord of the Universe and Associate Supreme Queen of the World Lisa Simpson')
        self.m(hn.title, "Lord of the Universe and Associate Supreme Queen of the World", hn)
        self.m(hn.first, "Lisa", hn)
        self.m(hn.last, "Simpson", hn)

    def test_title_with_last_initial_is_suffix(self):
        hn = HumanName("King John V.")
        self.m(hn.title, "King", hn)
        self.m(hn.first, "John", hn)
        self.m(hn.last, "V.", hn)

    def test_initials_also_suffix(self):
        hn = HumanName("Smith, J.R.")
        self.m(hn.first, "J.R.", hn)
        # self.m(hn.middle, "R.", hn)
        self.m(hn.last, "Smith", hn)

    def test_two_title_parts_separated_by_periods(self):
        hn = HumanName("Lt.Gen. John A. Kenneth Doe IV")
        self.m(hn.title, "Lt.Gen.", hn)
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.middle, "A. Kenneth", hn)
        self.m(hn.suffix, "IV", hn)

    def test_two_part_title(self):
        hn = HumanName("Lt. Gen. John A. Kenneth Doe IV")
        self.m(hn.title, "Lt. Gen.", hn)
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.middle, "A. Kenneth", hn)
        self.m(hn.suffix, "IV", hn)

    def test_two_part_title_with_lastname_comma(self):
        hn = HumanName("Doe, Lt. Gen. John A. Kenneth IV")
        self.m(hn.title, "Lt. Gen.", hn)
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.middle, "A. Kenneth", hn)
        self.m(hn.suffix, "IV", hn)

    def test_two_part_title_with_suffix_comma(self):
        hn = HumanName("Lt. Gen. John A. Kenneth Doe, Jr.")
        self.m(hn.title, "Lt. Gen.", hn)
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.middle, "A. Kenneth", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test_possible_conflict_with_middle_initial_that_could_be_suffix(self):
        hn = HumanName("Doe, Rev. John V, Jr.")
        self.m(hn.title, "Rev.", hn)
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.middle, "V", hn)
        self.m(hn.suffix, "Jr.", hn)

    def test_possible_conflict_with_suffix_that_could_be_initial(self):
        hn = HumanName("Doe, Rev. John A., V, Jr.")
        self.m(hn.title, "Rev.", hn)
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)
        self.m(hn.middle, "A.", hn)
        self.m(hn.suffix, "V, Jr.", hn)

    # 'ben' is removed from PREFIXES in v0.2.5
    # re-enable this test if we decide to support 'ben' as a prefix
    @unittest.expectedFailure
    def test_ben_as_conjunction(self):
        hn = HumanName("Ahmad ben Husain")
        self.m(hn.first, "Ahmad", hn)
        self.m(hn.last, "ben Husain", hn)

    def test_ben_as_first_name(self):
        hn = HumanName("Ben Johnson")
        self.m(hn.first, "Ben", hn)
        self.m(hn.last, "Johnson", hn)

    def test_ben_as_first_name_with_middle_name(self):
        hn = HumanName("Ben Alex Johnson")
        self.m(hn.first, "Ben", hn)
        self.m(hn.middle, "Alex", hn)
        self.m(hn.last, "Johnson", hn)

    def test_ben_as_middle_name(self):
        hn = HumanName("Alex Ben Johnson")
        self.m(hn.first, "Alex", hn)
        self.m(hn.middle, "Ben", hn)
        self.m(hn.last, "Johnson", hn)

    # http://code.google.com/p/python-nameparser/issues/detail?id=13
    def test_last_name_also_prefix(self):
        hn = HumanName("Jane Doctor")
        self.m(hn.first, "Jane", hn)
        self.m(hn.last, "Doctor", hn)

    def test_title_with_periods(self):
        hn = HumanName("Lt.Gov. John Doe")
        self.m(hn.title, "Lt.Gov.", hn)
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)

    def test_title_with_periods_lastname_comma(self):
        hn = HumanName("Doe, Lt.Gov. John")
        self.m(hn.title, "Lt.Gov.", hn)
        self.m(hn.first, "John", hn)
        self.m(hn.last, "Doe", hn)


class HumanNameCapitalizationTestCase(HumanNameTestBase):
    def test_capitalization_exception_for_III(self):
        hn = HumanName('juan q. xavier velasquez y garcia iii')
        hn.capitalize()
        self.m(str(hn), 'Juan Q. Xavier Velasquez y Garcia III', hn)

    # FIXME: this test does not pass due to a known issue
    # http://code.google.com/p/python-nameparser/issues/detail?id=22
    @unittest.expectedFailure
    def test_capitalization_exception_for_already_capitalized_III_KNOWN_FAILURE(self):
        hn = HumanName('juan garcia III')
        hn.capitalize()
        self.m(str(hn), 'Juan Garcia III', hn)

    def test_capitalize_title(self):
        hn = HumanName('lt. gen. john a. kenneth doe iv')
        hn.capitalize()
        self.m(str(hn), 'Lt. Gen. John A. Kenneth Doe IV', hn)

    def test_capitalize_title_to_lower(self):
        hn = HumanName('LT. GEN. JOHN A. KENNETH DOE IV')
        hn.capitalize()
        self.m(str(hn), 'Lt. Gen. John A. Kenneth Doe IV', hn)

    # Capitalization with M(a)c and hyphenated names
    def test_capitalization_with_Mac_as_hyphenated_names(self):
        hn = HumanName('donovan mcnabb-smith')
        hn.capitalize()
        self.m(str(hn), 'Donovan McNabb-Smith', hn)

    def test_capitization_middle_initial_is_also_a_conjunction(self):
        hn = HumanName('scott e. werner')
        hn.capitalize()
        self.m(str(hn), 'Scott E. Werner', hn)

    # Leaving already-capitalized names alone
    def test_no_change_to_mixed_chase(self):
        hn = HumanName('Shirley Maclaine')
        hn.capitalize()
        self.m(str(hn), 'Shirley Maclaine', hn)

    def test_force_capitalization(self):
        hn = HumanName('Shirley Maclaine')
        hn.capitalize(force=True)
        self.m(str(hn), 'Shirley MacLaine', hn)

    def test_capitalize_diacritics(self):
        hn = HumanName('matthëus schmidt')
        hn.capitalize()
        self.m(u(hn), 'Matthëus Schmidt', hn)

    # http://code.google.com/p/python-nameparser/issues/detail?id=15
    def test_downcasing_mac(self):
        hn = HumanName('RONALD MACDONALD')
        hn.capitalize()
        self.m(str(hn), 'Ronald MacDonald', hn)

    # http://code.google.com/p/python-nameparser/issues/detail?id=23
    def test_downcasing_mc(self):
        hn = HumanName('RONALD MCDONALD')
        hn.capitalize()
        self.m(str(hn), 'Ronald McDonald', hn)

    def test_short_names_with_mac(self):
        hn = HumanName('mack johnson')
        hn.capitalize()
        self.m(str(hn), 'Mack Johnson', hn)


class HumanNameOutputFormatTests(HumanNameTestBase):
    def test_formatting_init_argument(self):
        hn = HumanName("Rev John A. Kenneth Doe III (Kenny)", string_format="TEST1")
        self.assertEqual(u(hn), "TEST1")

    def test_formatting_constants_attribute(self):
        from nameparser.config import CONSTANTS
        _orig = CONSTANTS.string_format
        CONSTANTS.string_format = "TEST2"
        hn = HumanName("Rev John A. Kenneth Doe III (Kenny)")
        self.assertEqual(u(hn), "TEST2")
        CONSTANTS.string_format = _orig

    def test_quote_nickname_formating(self):
        hn = HumanName("Rev John A. Kenneth Doe III (Kenny)")
        hn.string_format = "{title} {first} {middle} {last} {suffix} '{nickname}'"
        self.assertEqual(u(hn), "Rev John A. Kenneth Doe III 'Kenny'")
        hn.string_format = "{last}, {title} {first} {middle}, {suffix} '{nickname}'"
        self.assertEqual(u(hn), "Doe, Rev John A. Kenneth, III 'Kenny'")

    def test_formating_removing_keys_from_format_string(self):
        hn = HumanName("Rev John A. Kenneth Doe III (Kenny)")
        hn.string_format = "{title} {first} {middle} {last} {suffix} '{nickname}'"
        self.assertEqual(u(hn), "Rev John A. Kenneth Doe III 'Kenny'")
        hn.string_format = "{last}, {title} {first} {middle}, {suffix}"
        self.assertEqual(u(hn), "Doe, Rev John A. Kenneth, III")
        hn.string_format = "{last}, {title} {first} {middle}"
        self.assertEqual(u(hn), "Doe, Rev John A. Kenneth")
        hn.string_format = "{last}, {first} {middle}"
        self.assertEqual(u(hn), "Doe, John A. Kenneth")
        hn.string_format = "{last}, {first}"
        self.assertEqual(u(hn), "Doe, John")
        hn.string_format = "{first} {last}"
        self.assertEqual(u(hn), "John Doe")

    def test_formating_removing_pieces_from_name_buckets(self):
        hn = HumanName("Rev John A. Kenneth Doe III (Kenny)")
        hn.string_format = "{title} {first} {middle} {last} {suffix} '{nickname}'"
        self.assertEqual(u(hn), "Rev John A. Kenneth Doe III 'Kenny'")
        hn.string_format = "{title} {first} {middle} {last} {suffix}"
        self.assertEqual(u(hn), "Rev John A. Kenneth Doe III")
        hn.middle = ''
        self.assertEqual(u(hn), "Rev John Doe III")
        hn.suffix = ''
        self.assertEqual(u(hn), "Rev John Doe")
        hn.title = ''
        self.assertEqual(u(hn), "John Doe")

    def test_formating_of_nicknames_with_parenthesis(self):
        hn = HumanName("Rev John A. Kenneth Doe III (Kenny)")
        hn.string_format = "{title} {first} {middle} {last} {suffix} ({nickname})"
        self.assertEqual(u(hn), "Rev John A. Kenneth Doe III (Kenny)")
        hn.nickname = ''
        self.assertEqual(u(hn), "Rev John A. Kenneth Doe III")

    def test_formating_of_nicknames_with_single_quotes(self):
        hn = HumanName("Rev John A. Kenneth Doe III (Kenny)")
        hn.string_format = "{title} {first} {middle} {last} {suffix} '{nickname}'"
        self.assertEqual(u(hn), "Rev John A. Kenneth Doe III 'Kenny'")
        hn.nickname = ''
        self.assertEqual(u(hn), "Rev John A. Kenneth Doe III")

    def test_formating_of_nicknames_with_double_quotes(self):
        hn = HumanName("Rev John A. Kenneth Doe III (Kenny)")
        hn.string_format = "{title} {first} {middle} {last} {suffix} \"{nickname}\""
        self.assertEqual(u(hn), "Rev John A. Kenneth Doe III \"Kenny\"")
        hn.nickname = ''
        self.assertEqual(u(hn), "Rev John A. Kenneth Doe III")

    def test_formating_of_nicknames_in_middle(self):
        hn = HumanName("Rev John A. Kenneth Doe III (Kenny)")
        hn.string_format = "{title} {first} ({nickname}) {middle} {last} {suffix}"
        self.assertEqual(u(hn), "Rev John (Kenny) A. Kenneth Doe III")
        hn.nickname = ''
        self.assertEqual(u(hn), "Rev John A. Kenneth Doe III")

    def test_remove_emojis(self):
        hn = HumanName("Sam Smith 😊")
        self.m(hn.first, "Sam", hn)
        self.m(hn.last, "Smith", hn)
        self.assertEqual(u(hn), "Sam Smith")

    def test_keep_non_emojis(self):
        hn = HumanName("∫≜⩕ Smith 😊")
        self.m(hn.first, "∫≜⩕", hn)
        self.m(hn.last, "Smith", hn)
        self.assertEqual(u(hn), "∫≜⩕ Smith")

    def test_keep_emojis(self):
        from nameparser.config import Constants
        constants = Constants()
        constants.regexes.emoji = False
        hn = HumanName("∫≜⩕ Smith😊", constants)
        self.m(hn.first, "∫≜⩕", hn)
        self.m(hn.last, "Smith😊", hn)
        self.assertEqual(u(hn), "∫≜⩕ Smith😊")
        # test cleanup


TEST_NAMES = (
    "John Doe",
    "John Doe, Jr.",
    "John Doe III",
    "Doe, John",
    "Doe, John, Jr.",
    "Doe, John III",
    "John A. Doe",
    "John A. Doe, Jr.",
    "John A. Doe III",
    "Doe, John A.",
    "Doe, John A., Jr.",
    "Doe, John A. III",
    "John A. Kenneth Doe",
    "John A. Kenneth Doe, Jr.",
    "John A. Kenneth Doe III",
    "Doe, John A. Kenneth",
    "Doe, John A. Kenneth, Jr.",
    "Doe, John A. Kenneth III",
    "Dr. John Doe",
    "Dr. John Doe, Jr.",
    "Dr. John Doe III",
    "Doe, Dr. John",
    "Doe, Dr. John, Jr.",
    "Doe, Dr. John III",
    "Dr. John A. Doe",
    "Dr. John A. Doe, Jr.",
    "Dr. John A. Doe III",
    "Doe, Dr. John A.",
    "Doe, Dr. John A. Jr.",
    "Doe, Dr. John A. III",
    "Dr. John A. Kenneth Doe",
    "Dr. John A. Kenneth Doe, Jr.",
    "Dr. John A. Kenneth Doe III",
    "Doe, Dr. John A. Kenneth",
    "Doe, Dr. John A. Kenneth Jr.",
    "Doe, Dr. John A. Kenneth III",
    "Juan de la Vega",
    "Juan de la Vega, Jr.",
    "Juan de la Vega III",
    "de la Vega, Juan",
    "de la Vega, Juan, Jr.",
    "de la Vega, Juan III",
    "Juan Velasquez y Garcia",
    "Juan Velasquez y Garcia, Jr.",
    "Juan Velasquez y Garcia III",
    "Velasquez y Garcia, Juan",
    "Velasquez y Garcia, Juan, Jr.",
    "Velasquez y Garcia, Juan III",
    "Dr. Juan de la Vega",
    "Dr. Juan de la Vega, Jr.",
    "Dr. Juan de la Vega III",
    "de la Vega, Dr. Juan",
    "de la Vega, Dr. Juan, Jr.",
    "de la Vega, Dr. Juan III",
    "Dr. Juan Velasquez y Garcia",
    "Dr. Juan Velasquez y Garcia, Jr.",
    "Dr. Juan Velasquez y Garcia III",
    "Velasquez y Garcia, Dr. Juan",
    "Velasquez y Garcia, Dr. Juan, Jr.",
    "Velasquez y Garcia, Dr. Juan III",
    "Juan Q. de la Vega",
    "Juan Q. de la Vega, Jr.",
    "Juan Q. de la Vega III",
    "de la Vega, Juan Q.",
    "de la Vega, Juan Q., Jr.",
    "de la Vega, Juan Q. III",
    "Juan Q. Velasquez y Garcia",
    "Juan Q. Velasquez y Garcia, Jr.",
    "Juan Q. Velasquez y Garcia III",
    "Velasquez y Garcia, Juan Q.",
    "Velasquez y Garcia, Juan Q., Jr.",
    "Velasquez y Garcia, Juan Q. III",
    "Dr. Juan Q. de la Vega",
    "Dr. Juan Q. de la Vega, Jr.",
    "Dr. Juan Q. de la Vega III",
    "de la Vega, Dr. Juan Q.",
    "de la Vega, Dr. Juan Q., Jr.",
    "de la Vega, Dr. Juan Q. III",
    "Dr. Juan Q. Velasquez y Garcia",
    "Dr. Juan Q. Velasquez y Garcia, Jr.",
    "Dr. Juan Q. Velasquez y Garcia III",
    "Velasquez y Garcia, Dr. Juan Q.",
    "Velasquez y Garcia, Dr. Juan Q., Jr.",
    "Velasquez y Garcia, Dr. Juan Q. III",
    "Juan Q. Xavier de la Vega",
    "Juan Q. Xavier de la Vega, Jr.",
    "Juan Q. Xavier de la Vega III",
    "de la Vega, Juan Q. Xavier",
    "de la Vega, Juan Q. Xavier, Jr.",
    "de la Vega, Juan Q. Xavier III",
    "Juan Q. Xavier Velasquez y Garcia",
    "Juan Q. Xavier Velasquez y Garcia, Jr.",
    "Juan Q. Xavier Velasquez y Garcia III",
    "Velasquez y Garcia, Juan Q. Xavier",
    "Velasquez y Garcia, Juan Q. Xavier, Jr.",
    "Velasquez y Garcia, Juan Q. Xavier III",
    "Dr. Juan Q. Xavier de la Vega",
    "Dr. Juan Q. Xavier de la Vega, Jr.",
    "Dr. Juan Q. Xavier de la Vega III",
    "de la Vega, Dr. Juan Q. Xavier",
    "de la Vega, Dr. Juan Q. Xavier, Jr.",
    "de la Vega, Dr. Juan Q. Xavier III",
    "Dr. Juan Q. Xavier Velasquez y Garcia",
    "Dr. Juan Q. Xavier Velasquez y Garcia, Jr.",
    "Dr. Juan Q. Xavier Velasquez y Garcia III",
    "Velasquez y Garcia, Dr. Juan Q. Xavier",
    "Velasquez y Garcia, Dr. Juan Q. Xavier, Jr.",
    "Velasquez y Garcia, Dr. Juan Q. Xavier III",
    "John Doe, CLU, CFP, LUTC",
    "John P. Doe, CLU, CFP, LUTC",
    "Dr. John P. Doe-Ray, CLU, CFP, LUTC",
    "Doe-Ray, Dr. John P., CLU, CFP, LUTC",
    "Hon. Barrington P. Doe-Ray, Jr.",
    "Doe-Ray, Hon. Barrington P. Jr.",
    "Doe-Ray, Hon. Barrington P. Jr., CFP, LUTC",
    "Jose Aznar y Lopez",
    "John E Smith",
    "John e Smith",
    "John and Jane Smith",
    "Rev. John A. Kenneth Doe",
    "Donovan McNabb-Smith",
    "Rev John A. Kenneth Doe",
    "Doe, Rev. John A. Jr.",
    "Buca di Beppo",
    "Lt. Gen. John A. Kenneth Doe, Jr.",
    "Doe, Lt. Gen. John A. Kenneth IV",
    "Lt. Gen. John A. Kenneth Doe IV",
    'Mr. and Mrs. John Smith',
    'John Jones (Google Docs)',
    'john e jones',
    'john e jones, III',
    'jones, john e',
    'E.T. Smith',
    'E.T. Smith, II',
    'Smith, E.T., Jr.',
    'A.B. Vajpayee',
    'Rt. Hon. Paul E. Mary',
    'Maid Marion',
    'Amy E. Maid',
    'Jane Doctor',
    'Doctor, Jane E.',
    'dr. ben alex johnson III',
    'Lord of the Universe and Supreme King of the World Lisa Simpson',
    'Benjamin (Ben) Franklin',
    'Benjamin "Ben" Franklin',
    "Brian O'connor",
    "Sir Gerald",
    "Magistrate Judge John F. Forster, Jr",
    # "Magistrate Judge Joaquin V.E. Manibusan, Jr", Initials seem to mess this up
    "Magistrate-Judge Elizabeth Todd Campbell",
    "Mag-Judge Harwell G Davis, III",
    "Mag. Judge Byron G. Cudmore",
    "Chief Judge J. Leon Holmes",
    "Chief Judge Sharon Lovelace Blackburn",
    "Judge James M. Moody",
    "Judge G. Thomas Eisele",
    # "Judge Callie V. S. Granade",
    "Judge C Lynwood Smith, Jr",
    "Senior Judge Charles R. Butler, Jr",
    "Senior Judge Harold D. Vietor",
    "Senior Judge Virgil Pittman",
    "Honorable Terry F. Moorer",
    "Honorable W. Harold Albritton, III",
    "Honorable Judge W. Harold Albritton, III",
    "Honorable Judge Terry F. Moorer",
    "Honorable Judge Susan Russ Walker",
    "Hon. Marian W. Payson",
    "Hon. Charles J. Siragusa",
    "US Magistrate Judge T Michael Putnam",
    "Designated Judge David A. Ezra",
    "Sr US District Judge Richard G Kopf",
    "U.S. District Judge Marc Thomas Treadwell",
)


class HumanNameVariationTests(HumanNameTestBase):
    # test automated variations of names in TEST_NAMES.
    # Helps test that the 3 code trees work the same

    TEST_NAMES = TEST_NAMES

    def test_variations_of_TEST_NAMES(self):
        for name in self.TEST_NAMES:
            hn = HumanName(name)
            if len(hn.suffix_list) > 1:
                hn = HumanName("{title} {first} {middle} {last} {suffix}".format(**hn.as_dict()).split(',')[0])
            hn.C.empty_attribute_default = ''  # format strings below require empty string
            hn_dict = hn.as_dict()
            attrs = [
                'title',
                'first',
                'middle',
                'last',
                'suffix',
                'nickname',
            ]
            nocomma = HumanName("{title} {first} {middle} {last} {suffix}".format(**hn_dict))
            lastnamecomma = HumanName("{last}, {title} {first} {middle} {suffix}".format(**hn_dict))
            if hn.suffix:
                suffixcomma = HumanName("{title} {first} {middle} {last}, {suffix}".format(**hn_dict))
            if hn.nickname:
                nocomma = HumanName("{title} {first} {middle} {last} {suffix} ({nickname})".format(**hn_dict))
                lastnamecomma = HumanName("{last}, {title} {first} {middle} {suffix} ({nickname})".format(**hn_dict))
                if hn.suffix:
                    suffixcomma = HumanName("{title} {first} {middle} {last}, {suffix} ({nickname})".format(**hn_dict))
            for attr in hn._members:
                self.m(getattr(hn, attr), getattr(nocomma, attr), hn)
                self.m(getattr(hn, attr), getattr(lastnamecomma, attr), hn)
                if hn.suffix:
                    self.m(getattr(hn, attr), getattr(suffixcomma, attr), hn)


if __name__ == '__main__':
    import sys
    if len(sys.argv) > 1:
        log.setLevel(logging.ERROR)
        log.addHandler(logging.StreamHandler())
        name = sys.argv[1]
        hn = HumanName(name, encoding=sys.stdout.encoding)
        print(repr(hn))
        hn.capitalize()
        print(repr(hn))
    else:
        print("-" * 80)
        print("Running tests")
        unittest.main(exit=False)
        print("-" * 80)
        print("Running tests with empty_attribute_default = None")
        from nameparser.config import CONSTANTS
        CONSTANTS.empty_attribute_default = None
        unittest.main()