intervaltree-2.1.0/CHANGELOG.md

Change log
==========

Version 2.1.0
-------------
- Added:
    - `merge_overlaps()` method and tests
    - `merge_equals()` method and tests
    - `range()` method
    - `span()` method, for returning the difference between `end()` and `begin()`
- Fixes:
    - Development version numbering is changing to be compliant with PEP440. Version numbering now contains major, minor and micro release numbers, plus the number of builds following the stable release version, e.g. 2.0.4b34
    - Speed improvement: `begin()` and `end()` methods used iterative `min()` and `max()` builtins instead of the more efficient `iloc` member available to `SortedDict`
    - `overlaps()` method used to return `True` even if provided null test interval
- Maintainers:
    - Added coverage test (`make coverage`) with html report (`htmlcov/index.html`)
    - Tests run slightly faster

Version 2.0.4
-------------
- Fix: Issue #27: README incorrectly showed using a comma instead of a colon when querying the `IntervalTree`: it showed `tree[begin, end]` instead of `tree[begin:end]`

Version 2.0.3
-------------
- Fix: README showed using + operator for setlike union instead of the correct | operator
- Removed tests from release package to speed up installation; to get the tests, download from GitHub

Version 2.0.2
-------------
- Fix: Issue #20: performance enhancement for large trees. `IntervalTree.search()` made a copy of the entire `boundary_table` resulting in linear search time.
The `sortedcontainers` package is now the sole install dependency

Version 2.0.1
-------------
- Fix: Issue #26: failed to prune empty `Node` after a rotation promoted contents of `s_center`

Version 2.0.0
-------------
- `IntervalTree` now supports the full `collections.MutableSet` API
- Added:
    - `__delitem__` to `IntervalTree`
    - Comparison methods `lt()`, `gt()`, `le()` and `ge()` to `Interval`, as an alternative to the comparison operators, which are designed for sorting
    - `IntervalTree.from_tuples(iterable)`
    - `IntervalTree.clear()`
    - `IntervalTree.difference(iterable)`
    - `IntervalTree.difference_update(iterable)`
    - `IntervalTree.union(iterable)`
    - `IntervalTree.intersection(iterable)`
    - `IntervalTree.intersection_update(iterable)`
    - `IntervalTree.symmetric_difference(iterable)`
    - `IntervalTree.symmetric_difference_update(iterable)`
    - `IntervalTree.chop(a, b)`
    - `IntervalTree.slice(point)`
- Deprecated `IntervalTree.extend()` -- use `update()` instead
- Internal improvements:
    - More verbose tests with progress bars
    - More tests for comparison and sorting behavior
    - Code in the README is included in the unit tests
- Fixes:
    - BACKWARD INCOMPATIBLE: On ranged queries where `begin >= end`, the query operated on the overlaps of `begin`. This behavior was documented as expected in 1.x; it is now changed to be more consistent with the definition of `Interval`s, which are half-open.
    - Issue #25: pruning empty Nodes with staggered descendants could result in invalid trees
    - Sorting `Interval`s and numbers in the same list gathered all the numbers at the beginning and the `Interval`s at the end
    - `IntervalTree.overlaps()` and friends returned `None` instead of `False`
    - Maintainers: `make install-testpypi` failed because the `pip` command was missing a `--pre` flag

Version 1.1.1
-------------
- Removed requirement for pyandoc in order to run functionality tests.
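
As a rough illustration of the half-open convention behind the backward-incompatible ranged-query change listed under 2.0.0 above, the standalone sketch below uses plain `(begin, end)` tuples instead of the library's `Interval` class. The helper name `overlaps_range` is hypothetical and is not part of the package; it only assumes that a query with `begin >= end` is a null range that matches nothing.

```python
# Hypothetical helper for illustration only; not part of intervaltree.
def overlaps_range(iv, begin, end):
    """Return True if the half-open interval iv = (iv_begin, iv_end)
    shares at least one point with the half-open range [begin, end)."""
    iv_begin, iv_end = iv
    if begin >= end:
        # A null query range covers no points, so it overlaps nothing.
        return False
    return iv_begin < end and begin < iv_end


intervals = [(0, 5), (5, 10)]

# Null range [5, 5): no hits under the half-open reading.
hits_null = [iv for iv in intervals if overlaps_range(iv, 5, 5)]

# Range [5, 6): only (5, 10) is hit, because (0, 5) excludes its end point.
hits_touch = [iv for iv in intervals if overlaps_range(iv, 5, 6)]
```

Under this reading, two intervals that merely touch at an end point do not overlap, which matches the description of `Interval`s as half-open.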

Version 1.1.0
-------------
- Added ability to use `Interval.distance_to()` with points, not just `Intervals`
- Added documentation on return types to `IntervalTree` and `Interval`
- `Interval.__cmp__()` works with points too
- Fix: `IntervalTree.score()` returned maximum score of 0.5 instead of 1.0. Now returns max of subscores instead of avg
- Internal improvements:
    - Development version numbering scheme, based on `git describe`: the "building towards" release is appended after a hyphen, e.g. 1.0.2-37-g2da2ef0-1.10. The previous tagged release is 1.0.2, and there have been 37 commits since then, current tag is g2da2ef0, and we are getting ready for a 1.1.0 release
    - Optimality tests added
    - `Interval` overlap tests for ranges, `Interval`s and points added

Version 1.0.2
-------------
- Bug fixes:
    - `Node.depth_score_helper()` raised `AttributeError`
    - README formatting

Version 1.0.1
-------------
- Fix: pip install failure because of failure to generate README.rst

Version 1.0.0
-------------
- Renamed from PyIntervalTree to intervaltree
- Speed improvements for adding and removing Intervals (~70% faster than 0.4)
- Bug fixes:
    - BACKWARD INCOMPATIBLE: `len()` of an `Interval` is always 3, reverting to default behavior for `namedtuples`. In Python 3, `len` returning a non-integer raises an exception. Instead, use `Interval.length()`, which returns 0 for null intervals and `end - begin` otherwise. Also, if `len() == 0`, then `not iv` is `True`.
    - When inserting an `Interval` via `__setitem__` and improper parameters given, all errors were transformed to `IndexError`
    - `split_overlaps` did not update the `boundary_table` counts
- Internal improvements:
    - More robust local testing tools
    - Long series of interdependent tests have been separated into sections

Version 0.4
-------------
- Faster balancing (~80% faster)
- Bug fixes:
    - Double rotations were performed in place of a single rotation when presented an unbalanced Node with a balanced child.
    - During single rotation, kept referencing an unrotated Node instead of the new, rotated one

Version 0.3.3
-------------
- Made IntervalTree crash if initialized with a null Interval (end <= begin)
- IntervalTree raises ValueError instead of AssertionError when a null Interval is inserted

Version 0.3.2
-------------
- Support for Python 3.2+ and 2.6+
- Changed license from LGPL to more permissive Apache license
- Merged changes from https://github.com/konstantint/PyIntervalTree to https://github.com/chaimleib/PyIntervalTree
- Interval now inherits from a namedtuple. Benefits: should be faster. Drawbacks: slight behavioural change (Intervals not mutable anymore).
- Added float tests
- Use setup.py for tests
- Automatic testing via travis-ci
- Removed dependency on six
- Interval improvements:
    - Intervals without data have a cleaner string representation
    - Intervals without data are pickled more compactly
    - Better hashing
    - Intervals are ordered by begin, then end, then by data. If data is not orderable, sorts by type(data)
- Bug fixes:
    - Fixed crash when querying empty tree
    - Fixed missing close parenthesis in examples
    - Made IntervalTree crash earlier if a null Interval is added
- Internals:
    - New test directory
    - Nicer display of data structures for debugging, using custom test/pprint.py (Python 2.6, 2.7)
    - More sensitive exception handling
    - Local script to test in all supported versions of Python
    - Added IntervalTree.score() to measure how optimally a tree is structured

Version 0.2.3
-------------
- Slight changes for inclusion in PyPI.
- Some documentation changes
- Added tests
- Bug fix: interval addition via [] was broken in Python 2.7 (see http://bugs.python.org/issue21785)
- Added intervaltree.bio subpackage, adding some utilities for use in bioinformatics

Version 0.2.2b
--------------
- Forked from https://github.com/MusashiAharon/PyIntervalTree

intervaltree-2.1.0/intervaltree/__init__.py

"""
intervaltree: A mutable, self-balancing interval tree for Python 2 and 3.
Queries may be by point, by range overlap, or by range envelopment.

Root package.

Copyright 2013-2015 Chaim-Leib Halbert

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
"""
from .interval import Interval
from .intervaltree import IntervalTree

intervaltree-2.1.0/intervaltree/interval.py

"""
intervaltree: A mutable, self-balancing interval tree for Python 2 and 3.
Queries may be by point, by range overlap, or by range envelopment.

Interval class

Copyright 2013-2015 Chaim-Leib Halbert
Modifications copyright 2014 Konstantin Tretyakov

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
"""
from numbers import Number
from collections import namedtuple


# noinspection PyBroadException
class Interval(namedtuple('IntervalBase', ['begin', 'end', 'data'])):
    __slots__ = ()  # Saves memory, avoiding the need to create __dict__ for each interval

    def __new__(cls, begin, end, data=None):
        return super(Interval, cls).__new__(cls, begin, end, data)

    def overlaps(self, begin, end=None):
        """
        Whether the interval overlaps the given point, range or Interval.
        :param begin: beginning point of the range, or the point, or an Interval
        :param end: end point of the range. Optional if not testing ranges.
        :return: True or False
        :rtype: bool
        """
        if end is not None:
            return (
                (begin <= self.begin < end) or
                (begin < self.end <= end) or
                (self.begin <= begin < self.end) or
                (self.begin < end <= self.end)
            )
        try:
            return self.overlaps(begin.begin, begin.end)
        except:
            return self.contains_point(begin)

    def contains_point(self, p):
        """
        Whether the Interval contains p.
        :param p: a point
        :return: True or False
        :rtype: bool
        """
        return self.begin <= p < self.end

    def range_matches(self, other):
        """
        Whether the begins equal and the ends equal. Compare __eq__().
        :param other: Interval
        :return: True or False
        :rtype: bool
        """
        return (
            self.begin == other.begin and
            self.end == other.end
        )

    def contains_interval(self, other):
        """
        Whether other is contained in this Interval.
        :param other: Interval
        :return: True or False
        :rtype: bool
        """
        return (
            self.begin <= other.begin and
            self.end >= other.end
        )

    def distance_to(self, other):
        """
        Returns the size of the gap between intervals, or 0
        if they touch or overlap.
        :param other: Interval or point
        :return: distance
        :rtype: Number
        """
        if self.overlaps(other):
            return 0
        try:
            if self.begin < other.begin:
                return other.begin - self.end
            else:
                return self.begin - other.end
        except:
            if self.end < other:
                return other - self.end
            else:
                return self.begin - other

    def is_null(self):
        """
        Whether this equals the null interval.
        :return: True if end <= begin else False
        :rtype: bool
        """
        return self.begin >= self.end

    def length(self):
        """
        The distance covered by this Interval.
        :return: length
        :rtype: Number
        """
        if self.is_null():
            return 0
        return self.end - self.begin

    def __hash__(self):
        """
        Depends on begin and end only.
        :return: hash
        :rtype: Number
        """
        return hash((self.begin, self.end))

    def __eq__(self, other):
        """
        Whether the begins equal, the ends equal, and the data fields
        equal. Compare range_matches().
        :param other: Interval
        :return: True or False
        :rtype: bool
        """
        return (
            self.begin == other.begin and
            self.end == other.end and
            self.data == other.data
        )

    def __cmp__(self, other):
        """
        Tells whether other sorts before, after or equal to this
        Interval.

        Sorting is by begins, then by ends, then by data fields.

        If data fields are not both sortable types, data fields are
        compared alphabetically by type name.
        :param other: Interval
        :return: -1, 0, 1
        :rtype: int
        """
        s = self[0:2]
        try:
            o = other[0:2]
        except:
            o = (other,)
        if s != o:
            return -1 if s < o else 1
        try:
            if self.data == other.data:
                return 0
            return -1 if self.data < other.data else 1
        except TypeError:
            s = type(self.data).__name__
            o = type(other.data).__name__
            if s == o:
                return 0
            return -1 if s < o else 1

    def __lt__(self, other):
        """
        Less than operator. Parrots __cmp__()
        :param other: Interval or point
        :return: True or False
        :rtype: bool
        """
        return self.__cmp__(other) < 0

    def __gt__(self, other):
        """
        Greater than operator.
        Parrots __cmp__()
        :param other: Interval or point
        :return: True or False
        :rtype: bool
        """
        return self.__cmp__(other) > 0

    def _raise_if_null(self, other):
        """
        :raises ValueError: if either self or other is a null Interval
        """
        if self.is_null():
            raise ValueError("Cannot compare null Intervals!")
        if hasattr(other, 'is_null') and other.is_null():
            raise ValueError("Cannot compare null Intervals!")

    def lt(self, other):
        """
        Strictly less than. Returns True if no part of this Interval
        extends higher than or into other.
        :raises ValueError: if either self or other is a null Interval
        :param other: Interval or point
        :return: True or False
        :rtype: bool
        """
        self._raise_if_null(other)
        return self.end <= getattr(other, 'begin', other)

    def le(self, other):
        """
        Less than or overlaps. Returns True if no part of this Interval
        extends higher than other.
        :raises ValueError: if either self or other is a null Interval
        :param other: Interval or point
        :return: True or False
        :rtype: bool
        """
        self._raise_if_null(other)
        return self.end <= getattr(other, 'end', other)

    def gt(self, other):
        """
        Strictly greater than. Returns True if no part of this Interval
        extends lower than or into other.
        :raises ValueError: if either self or other is a null Interval
        :param other: Interval or point
        :return: True or False
        :rtype: bool
        """
        self._raise_if_null(other)
        if hasattr(other, 'end'):
            return self.begin >= other.end
        else:
            return self.begin > other

    def ge(self, other):
        """
        Greater than or overlaps. Returns True if no part of this Interval
        extends lower than other.
        :raises ValueError: if either self or other is a null Interval
        :param other: Interval or point
        :return: True or False
        :rtype: bool
        """
        self._raise_if_null(other)
        return self.begin >= getattr(other, 'begin', other)

    def _get_fields(self):
        """
        Used by str, unicode, repr and __reduce__.

        Returns only the fields necessary to reconstruct the Interval.
        :return: reconstruction info
        :rtype: tuple
        """
        if self.data is not None:
            return self.begin, self.end, self.data
        else:
            return self.begin, self.end

    def __repr__(self):
        """
        Executable string representation of this Interval.
        :return: string representation
        :rtype: str
        """
        if isinstance(self.begin, Number):
            s_begin = str(self.begin)
            s_end = str(self.end)
        else:
            s_begin = repr(self.begin)
            s_end = repr(self.end)
        if self.data is None:
            return "Interval({0}, {1})".format(s_begin, s_end)
        else:
            return "Interval({0}, {1}, {2})".format(s_begin, s_end, repr(self.data))

    __str__ = __repr__

    def copy(self):
        """
        Shallow copy.
        :return: copy of self
        :rtype: Interval
        """
        return Interval(self.begin, self.end, self.data)

    def __reduce__(self):
        """
        For pickle-ing.
        :return: pickle data
        :rtype: tuple
        """
        return Interval, self._get_fields()

intervaltree-2.1.0/intervaltree/intervaltree.py

"""
intervaltree: A mutable, self-balancing interval tree for Python 2 and 3.
Queries may be by point, by range overlap, or by range envelopment.

Core logic.

Copyright 2013-2015 Chaim-Leib Halbert
Modifications Copyright 2014 Konstantin Tretyakov

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
"""
from .interval import Interval
from .node import Node
from numbers import Number
import collections
from sortedcontainers import SortedDict
from copy import copy
from warnings import warn

try:
    xrange  # Python 2?
except NameError:  # pragma: no cover
    xrange = range


# noinspection PyBroadException
class IntervalTree(collections.MutableSet):
    """
    A binary lookup tree of intervals.
    The intervals contained in the tree are represented using ``Interval(a, b, data)`` objects.
    Each such object represents a half-open interval ``[a, b)`` with optional data.

    Examples:
    ---------

    Initialize a blank tree::

        >>> tree = IntervalTree()
        >>> tree
        IntervalTree()

    Initialize a tree from an iterable set of Intervals in O(n * log n)::

        >>> tree = IntervalTree([Interval(-10, 10), Interval(-20.0, -10.0)])
        >>> tree
        IntervalTree([Interval(-20.0, -10.0), Interval(-10, 10)])
        >>> len(tree)
        2

    Note that this is a set, i.e. repeated intervals are ignored. However,
    Intervals with different data fields are regarded as different::

        >>> tree = IntervalTree([Interval(-10, 10), Interval(-10, 10), Interval(-10, 10, "x")])
        >>> tree
        IntervalTree([Interval(-10, 10), Interval(-10, 10, 'x')])
        >>> len(tree)
        2

    Insertions::

        >>> tree = IntervalTree()
        >>> tree[0:1] = "data"
        >>> tree.add(Interval(10, 20))
        >>> tree.addi(19.9, 20)
        >>> tree
        IntervalTree([Interval(0, 1, 'data'), Interval(10, 20), Interval(19.9, 20)])
        >>> tree.update([Interval(19.9, 20.1), Interval(20.1, 30)])
        >>> len(tree)
        5

    Inserting the same Interval twice does nothing::

        >>> tree = IntervalTree()
        >>> tree[-10:20] = "arbitrary data"
        >>> tree[-10:20] = None  # Note that this is also an insertion
        >>> tree
        IntervalTree([Interval(-10, 20), Interval(-10, 20, 'arbitrary data')])
        >>> tree[-10:20] = None  # This won't change anything
        >>> tree[-10:20] = "arbitrary data"  # Neither will this
        >>> len(tree)
        2

    Deletions::

        >>> tree = IntervalTree(Interval(b, e) for b, e in [(-10, 10), (-20, -10), (10, 20)])
        >>> tree
        IntervalTree([Interval(-20, -10), Interval(-10, 10), Interval(10, 20)])
        >>> tree.remove(Interval(-10, 10))
        >>> tree
        IntervalTree([Interval(-20, -10), Interval(10, 20)])
        >>> tree.remove(Interval(-10, 10))
        Traceback (most recent call last):
        ...
        ValueError
        >>> tree.discard(Interval(-10, 10))  # Same as remove, but no exception on failure
        >>> tree
        IntervalTree([Interval(-20, -10), Interval(10, 20)])

    Delete intervals, overlapping a given point::

        >>> tree = IntervalTree([Interval(-1.1, 1.1), Interval(-0.5, 1.5), Interval(0.5, 1.7)])
        >>> tree.remove_overlap(1.1)
        >>> tree
        IntervalTree([Interval(-1.1, 1.1)])

    Delete intervals, overlapping an interval::

        >>> tree = IntervalTree([Interval(-1.1, 1.1), Interval(-0.5, 1.5), Interval(0.5, 1.7)])
        >>> tree.remove_overlap(0, 0.5)
        >>> tree
        IntervalTree([Interval(0.5, 1.7)])
        >>> tree.remove_overlap(1.7, 1.8)
        >>> tree
        IntervalTree([Interval(0.5, 1.7)])
        >>> tree.remove_overlap(1.6, 1.6)  # Null interval does nothing
        >>> tree
        IntervalTree([Interval(0.5, 1.7)])
        >>> tree.remove_overlap(1.6, 1.5)  # Ditto
        >>> tree
        IntervalTree([Interval(0.5, 1.7)])

    Delete intervals, enveloped in the range::

        >>> tree = IntervalTree([Interval(-1.1, 1.1), Interval(-0.5, 1.5), Interval(0.5, 1.7)])
        >>> tree.remove_envelop(-1.0, 1.5)
        >>> tree
        IntervalTree([Interval(-1.1, 1.1), Interval(0.5, 1.7)])
        >>> tree.remove_envelop(-1.1, 1.5)
        >>> tree
        IntervalTree([Interval(0.5, 1.7)])
        >>> tree.remove_envelop(0.5, 1.5)
        >>> tree
        IntervalTree([Interval(0.5, 1.7)])
        >>> tree.remove_envelop(0.5, 1.7)
        >>> tree
        IntervalTree()

    Point/interval overlap queries::

        >>> tree = IntervalTree([Interval(-1.1, 1.1), Interval(-0.5, 1.5), Interval(0.5, 1.7)])
        >>> assert tree[-1.1] == set([Interval(-1.1, 1.1)])
        >>> assert tree.search(1.1) == set([Interval(-0.5, 1.5), Interval(0.5, 1.7)])  # Same as tree[1.1]
        >>> assert tree[-0.5:0.5] == set([Interval(-0.5, 1.5), Interval(-1.1, 1.1)])  # Interval overlap query
        >>> assert tree.search(1.5, 1.5) == set()  # Same as tree[1.5:1.5]
        >>> assert tree.search(1.5) == set([Interval(0.5, 1.7)])  # Same as tree[1.5]
        >>> assert tree.search(1.7, 1.8) == set()

    Envelop queries::

        >>> assert tree.search(-0.5, 0.5, strict=True) == set()
        >>> assert tree.search(-0.4, 1.7, strict=True) == set([Interval(0.5, 1.7)])

    Membership queries::

        >>> tree = IntervalTree([Interval(-1.1, 1.1), Interval(-0.5, 1.5), Interval(0.5, 1.7)])
        >>> Interval(-0.5, 0.5) in tree
        False
        >>> Interval(-1.1, 1.1) in tree
        True
        >>> Interval(-1.1, 1.1, "x") in tree
        False
        >>> tree.overlaps(-1.1)
        True
        >>> tree.overlaps(1.7)
        False
        >>> tree.overlaps(1.7, 1.8)
        False
        >>> tree.overlaps(-1.2, -1.1)
        False
        >>> tree.overlaps(-1.2, -1.0)
        True

    Sizing::

        >>> tree = IntervalTree([Interval(-1.1, 1.1), Interval(-0.5, 1.5), Interval(0.5, 1.7)])
        >>> len(tree)
        3
        >>> tree.is_empty()
        False
        >>> IntervalTree().is_empty()
        True
        >>> not tree
        False
        >>> not IntervalTree()
        True
        >>> print(tree.begin())  # using print() because of floats in Python 2.6
        -1.1
        >>> print(tree.end())  # ditto
        1.7

    Iteration::

        >>> tree = IntervalTree([Interval(-11, 11), Interval(-5, 15), Interval(5, 17)])
        >>> [iv.begin for iv in sorted(tree)]
        [-11, -5, 5]
        >>> assert tree.items() == set([Interval(-5, 15), Interval(-11, 11), Interval(5, 17)])

    Copy- and typecasting, pickling::

        >>> tree0 = IntervalTree([Interval(0, 1, "x"), Interval(1, 2, ["x"])])
        >>> tree1 = IntervalTree(tree0)  # Shares Interval objects
        >>> tree2 = tree0.copy()  # Shallow copy (same as above, as Intervals are singletons)
        >>> import pickle
        >>> tree3 = pickle.loads(pickle.dumps(tree0))  # Deep copy
        >>> list(tree0[1])[0].data[0] = "y"  # affects shallow copies, but not deep copies
        >>> tree0
        IntervalTree([Interval(0, 1, 'x'), Interval(1, 2, ['y'])])
        >>> tree1
        IntervalTree([Interval(0, 1, 'x'), Interval(1, 2, ['y'])])
        >>> tree2
        IntervalTree([Interval(0, 1, 'x'), Interval(1, 2, ['y'])])
        >>> tree3
        IntervalTree([Interval(0, 1, 'x'), Interval(1, 2, ['x'])])

    Equality testing::

        >>> IntervalTree([Interval(0, 1)]) == IntervalTree([Interval(0, 1)])
        True
        >>> IntervalTree([Interval(0, 1)]) == IntervalTree([Interval(0, 1, "x")])
        False
    """
    @classmethod
    def from_tuples(cls, tups):
        """
        Create a new IntervalTree from an iterable of 2- or 3-tuples,
        where the tuple lists begin, end, and optionally data.
        """
        ivs = [Interval(*t) for t in tups]
        return IntervalTree(ivs)

    def __init__(self, intervals=None):
        """
        Set up a tree. If intervals is provided, add all the intervals
        to the tree.

        Completes in O(n*log n) time.
        """
        intervals = set(intervals) if intervals is not None else set()
        for iv in intervals:
            if iv.is_null():
                raise ValueError(
                    "IntervalTree: Null Interval objects not allowed in IntervalTree:"
                    " {0}".format(iv)
                )
        self.all_intervals = intervals
        self.top_node = Node.from_intervals(self.all_intervals)
        self.boundary_table = SortedDict()
        for iv in self.all_intervals:
            self._add_boundaries(iv)

    def copy(self):
        """
        Construct a new IntervalTree using shallow copies of the
        intervals in the source tree.

        Completes in O(n*log n) time.
        :rtype: IntervalTree
        """
        return IntervalTree(iv.copy() for iv in self)

    def _add_boundaries(self, interval):
        """
        Records the boundaries of the interval in the boundary table.
        """
        begin = interval.begin
        end = interval.end
        if begin in self.boundary_table:
            self.boundary_table[begin] += 1
        else:
            self.boundary_table[begin] = 1

        if end in self.boundary_table:
            self.boundary_table[end] += 1
        else:
            self.boundary_table[end] = 1

    def _remove_boundaries(self, interval):
        """
        Removes the boundaries of the interval from the boundary table.
        """
        begin = interval.begin
        end = interval.end
        if self.boundary_table[begin] == 1:
            del self.boundary_table[begin]
        else:
            self.boundary_table[begin] -= 1

        if self.boundary_table[end] == 1:
            del self.boundary_table[end]
        else:
            self.boundary_table[end] -= 1

    def add(self, interval):
        """
        Adds an interval to the tree, if not already present.

        Completes in O(log n) time.
        """
        if interval in self:
            return

        if interval.is_null():
            raise ValueError(
                "IntervalTree: Null Interval objects not allowed in IntervalTree:"
                " {0}".format(interval)
            )

        if not self.top_node:
            self.top_node = Node.from_interval(interval)
        else:
            self.top_node = self.top_node.add(interval)
        self.all_intervals.add(interval)
        self._add_boundaries(interval)
    append = add

    def addi(self, begin, end, data=None):
        """
        Shortcut for add(Interval(begin, end, data)).

        Completes in O(log n) time.
        """
        return self.add(Interval(begin, end, data))
    appendi = addi

    def update(self, intervals):
        """
        Given an iterable of intervals, add them to the tree.

        Completes in O(m*log(n+m)) time, where m = number of intervals to
        add.
        """
        for iv in intervals:
            self.add(iv)

    def extend(self, intervals):
        """
        Deprecated: Replaced by update().
        """
        warn("IntervalTree.extend() has been deprecated. Consider using update() instead", DeprecationWarning)
        self.update(intervals)

    def remove(self, interval):
        """
        Removes an interval from the tree, if present. If not, raises
        ValueError.

        Completes in O(log n) time.
        """
        #self.verify()
        if interval not in self:
            #print(self.all_intervals)
            raise ValueError
        self.top_node = self.top_node.remove(interval)
        self.all_intervals.remove(interval)
        self._remove_boundaries(interval)
        #self.verify()

    def removei(self, begin, end, data=None):
        """
        Shortcut for remove(Interval(begin, end, data)).

        Completes in O(log n) time.
        """
        return self.remove(Interval(begin, end, data))

    def discard(self, interval):
        """
        Removes an interval from the tree, if present. If not, does
        nothing.

        Completes in O(log n) time.
        """
        if interval not in self:
            return
        self.all_intervals.discard(interval)
        self.top_node = self.top_node.discard(interval)
        self._remove_boundaries(interval)

    def discardi(self, begin, end, data=None):
        """
        Shortcut for discard(Interval(begin, end, data)).

        Completes in O(log n) time.
        """
        return self.discard(Interval(begin, end, data))

    def difference(self, other):
        """
        Returns a new tree, comprising all intervals in self but not
        in other.
        """
        ivs = set()
        for iv in self:
            if iv not in other:
                ivs.add(iv)
        return IntervalTree(ivs)

    def difference_update(self, other):
        """
        Removes all intervals in other from self.
        """
        for iv in other:
            self.discard(iv)

    def union(self, other):
        """
        Returns a new tree, comprising all intervals from self
        and other.
        """
        return IntervalTree(set(self).union(other))

    def intersection(self, other):
        """
        Returns a new tree of all intervals common to both self and
        other.
        """
        ivs = set()
        shorter, longer = sorted([self, other], key=len)
        for iv in shorter:
            if iv in longer:
                ivs.add(iv)
        return IntervalTree(ivs)

    def intersection_update(self, other):
        """
        Removes intervals from self unless they also exist in other.
        """
        # Iterate over a snapshot: removing from the tree while iterating
        # over it would mutate the underlying set mid-iteration.
        for iv in list(self):
            if iv not in other:
                self.remove(iv)

    def symmetric_difference(self, other):
        """
        Return a tree with elements only in self or other but not
        both.
        """
        if not isinstance(other, set):
            other = set(other)
        me = set(self)
        # Sets do not support "+"; combine the two one-sided differences with "|".
        ivs = (me - other) | (other - me)
        return IntervalTree(ivs)

    def symmetric_difference_update(self, other):
        """
        Throws out all intervals except those only in self or other,
        not both.
        """
        other = set(other)
        # Iterate over a snapshot so that remove() does not mutate the
        # collection being iterated.
        for iv in list(self):
            if iv in other:
                self.remove(iv)
                other.remove(iv)
        self.update(other)

    def remove_overlap(self, begin, end=None):
        """
        Removes all intervals overlapping the given point or range.

        Completes in O((r+m)*log n) time, where:
          * n = size of the tree
          * m = number of matches
          * r = size of the search range (this is 1 for a point)
        """
        hitlist = self.search(begin, end)
        for iv in hitlist:
            self.remove(iv)

    def remove_envelop(self, begin, end):
        """
        Removes all intervals completely enveloped in the given range.

        Completes in O((r+m)*log n) time, where:
          * n = size of the tree
          * m = number of matches
          * r = size of the search range (this is 1 for a point)
        """
        hitlist = self.search(begin, end, strict=True)
        for iv in hitlist:
            self.remove(iv)

    def chop(self, begin, end, datafunc=None):
        """
        Like remove_envelop(), but trims back Intervals hanging into
        the chopped area so that nothing overlaps.
        """
        insertions = set()
        begin_hits = [iv for iv in self[begin] if iv.begin < begin]
        end_hits = [iv for iv in self[end] if iv.end > end]

        if datafunc:
            for iv in begin_hits:
                insertions.add(Interval(iv.begin, begin, datafunc(iv, True)))
            for iv in end_hits:
                insertions.add(Interval(end, iv.end, datafunc(iv, False)))
        else:
            for iv in begin_hits:
                insertions.add(Interval(iv.begin, begin, iv.data))
            for iv in end_hits:
                insertions.add(Interval(end, iv.end, iv.data))

        self.remove_envelop(begin, end)
        self.difference_update(begin_hits)
        self.difference_update(end_hits)
        self.update(insertions)

    def slice(self, point, datafunc=None):
        """
        Split Intervals that overlap point into two new Intervals. If
        specified, uses datafunc(interval, islower=True/False) to
        set the data field of the new Intervals.
        :param point: where to slice
        :param datafunc(interval, islower): callable returning a new
        value for the interval's data field
        """
        hitlist = set(iv for iv in self[point] if iv.begin < point)
        insertions = set()
        if datafunc:
            for iv in hitlist:
                insertions.add(Interval(iv.begin, point, datafunc(iv, True)))
                insertions.add(Interval(point, iv.end, datafunc(iv, False)))
        else:
            for iv in hitlist:
                insertions.add(Interval(iv.begin, point, iv.data))
                insertions.add(Interval(point, iv.end, iv.data))
        self.difference_update(hitlist)
        self.update(insertions)

    def clear(self):
        """
        Empties the tree.

        Completes in O(1) time.
        """
        self.__init__()

    def find_nested(self):
        """
        Returns a dictionary mapping parent intervals to sets of
        intervals overlapped by and contained in the parent.

        Completes in O(n^2) time.
:rtype: dict of [Interval, set of Interval] """ result = {} def add_if_nested(): if parent.contains_interval(child): if parent not in result: result[parent] = set() result[parent].add(child) long_ivs = sorted(self.all_intervals, key=Interval.length, reverse=True) for i, parent in enumerate(long_ivs): for child in long_ivs[i + 1:]: add_if_nested() return result def overlaps(self, begin, end=None): """ Returns whether some interval in the tree overlaps the given point or range. Completes in O(r*log n) time, where r is the size of the search range. :rtype: bool """ if end is not None: return self.overlaps_range(begin, end) elif isinstance(begin, Number): return self.overlaps_point(begin) else: return self.overlaps_range(begin.begin, begin.end) def overlaps_point(self, p): """ Returns whether some interval in the tree overlaps p. Completes in O(log n) time. :rtype: bool """ if self.is_empty(): return False return bool(self.top_node.contains_point(p)) def overlaps_range(self, begin, end): """ Returns whether some interval in the tree overlaps the given range. Returns False if given a null interval over which to test. Completes in O(r*log n) time, where r is the range length and n is the table size. :rtype: bool """ if self.is_empty(): return False elif begin >= end: return False elif self.overlaps_point(begin): return True return any( self.overlaps_point(bound) for bound in self.boundary_table if begin < bound < end ) def split_overlaps(self): """ Finds all intervals with overlapping ranges and splits them along the range boundaries. Completes in worst-case O(n^2*log n) time (many interval boundaries are inside many intervals), best-case O(n*log n) time (small number of overlaps << n per interval). 
""" if not self: return if len(self.boundary_table) == 2: return bounds = sorted(self.boundary_table) # get bound locations new_ivs = set() for lbound, ubound in zip(bounds[:-1], bounds[1:]): for iv in self[lbound]: new_ivs.add(Interval(lbound, ubound, iv.data)) self.__init__(new_ivs) def merge_overlaps(self, data_reducer=None, data_initializer=None): """ Finds all intervals with overlapping ranges and merges them into a single interval. If provided, uses data_reducer and data_initializer with similar semantics to Python's built-in reduce(reducer_func[, initializer]), as follows: If data_reducer is set to a function, combines the data fields of the Intervals with current_reduced_data = data_reducer(current_reduced_data, new_data) If data_reducer is None, the merged Interval's data field will be set to None, ignoring all the data fields of the merged Intervals. On encountering the first Interval to merge, if data_initializer is None (default), uses the first Interval's data field as the first value for current_reduced_data. If data_initializer is not None, current_reduced_data is set to a shallow copy of data_initiazer created with copy.copy(data_initializer). Completes in O(n*logn). 
        """
        if not self:
            return

        sorted_intervals = sorted(self.all_intervals)  # get sorted intervals
        merged = []
        # use mutable object to allow new_series() to modify it
        current_reduced = [None]
        higher = None  # iterating variable, which new_series() needs access to

        def new_series():
            if data_initializer is None:
                current_reduced[0] = higher.data
                merged.append(higher)
                return
            else:  # data_initializer is not None
                current_reduced[0] = copy(data_initializer)
                current_reduced[0] = data_reducer(current_reduced[0], higher.data)
                merged.append(Interval(higher.begin, higher.end, current_reduced[0]))

        for higher in sorted_intervals:
            if merged:  # series already begun
                lower = merged[-1]
                if higher.begin <= lower.end:  # should merge
                    upper_bound = max(lower.end, higher.end)
                    if data_reducer is not None:
                        current_reduced[0] = data_reducer(current_reduced[0], higher.data)
                    else:  # annihilate the data, since we don't know how to merge it
                        current_reduced[0] = None
                    merged[-1] = Interval(lower.begin, upper_bound, current_reduced[0])
                else:
                    new_series()
            else:  # not merged; is first of Intervals to merge
                new_series()

        self.__init__(merged)

    def merge_equals(self, data_reducer=None, data_initializer=None):
        """
        Finds all intervals with equal ranges and merges them
        into a single interval. If provided, uses data_reducer and
        data_initializer with similar semantics to Python's built-in
        reduce(reducer_func[, initializer]), as follows:

        If data_reducer is set to a function, combines the data
        fields of the Intervals with
            current_reduced_data = data_reducer(current_reduced_data, new_data)
        If data_reducer is None, the merged Interval's data
        field will be set to None, ignoring all the data fields
        of the merged Intervals.

        On encountering the first Interval to merge, if
        data_initializer is None (default), uses the first
        Interval's data field as the first value for
        current_reduced_data. If data_initializer is not None,
        current_reduced_data is set to a shallow copy of
        data_initializer created with copy.copy(data_initializer).
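
As a rough illustration of merging by equal range, the sketch below groups plain (begin, end, data) tuples by their (begin, end) key. It is an assumption-laden stand-in rather than this method's implementation: merge_equal_ranges is a hypothetical name, data_initializer is left out, and data is reduced in input order rather than in sorted Interval order.

```python
def merge_equal_ranges(intervals, data_reducer=None):
    # group the data fields by identical (begin, end) range
    groups = {}
    for begin, end, data in intervals:
        groups.setdefault((begin, end), []).append(data)

    merged = []
    for (begin, end), datas in sorted(groups.items()):
        if len(datas) == 1:
            acc = datas[0]   # nothing to merge, keep the data as-is
        elif data_reducer is None:
            acc = None       # no way to combine, so drop the data
        else:
            acc = datas[0]
            for d in datas[1:]:
                acc = data_reducer(acc, d)
        merged.append((begin, end, acc))
    return merged

summed = merge_equal_ranges(
    [(1, 3, 2), (1, 3, 5), (2, 4, 7)],
    data_reducer=lambda acc, new: acc + new,
)
```

Unlike merging overlaps, the interval (2, 4) stays separate here even though it overlaps (1, 3), because only identical ranges are combined.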
Completes in O(n*logn). """ if not self: return sorted_intervals = sorted(self.all_intervals) # get sorted intervals merged = [] # use mutable object to allow new_series() to modify it current_reduced = [None] higher = None # iterating variable, which new_series() needs access to def new_series(): if data_initializer is None: current_reduced[0] = higher.data merged.append(higher) return else: # data_initializer is not None current_reduced[0] = copy(data_initializer) current_reduced[0] = data_reducer(current_reduced[0], higher.data) merged.append(Interval(higher.begin, higher.end, current_reduced[0])) for higher in sorted_intervals: if merged: # series already begun lower = merged[-1] if higher.range_matches(lower): # should merge upper_bound = max(lower.end, higher.end) if data_reducer is not None: current_reduced[0] = data_reducer(current_reduced[0], higher.data) else: # annihilate the data, since we don't know how to merge it current_reduced[0] = None merged[-1] = Interval(lower.begin, upper_bound, current_reduced[0]) else: new_series() else: # not merged; is first of Intervals to merge new_series() self.__init__(merged) def items(self): """ Constructs and returns a set of all intervals in the tree. Completes in O(n) time. :rtype: set of Interval """ return set(self.all_intervals) def is_empty(self): """ Returns whether the tree is empty. Completes in O(1) time. :rtype: bool """ return 0 == len(self) def search(self, begin, end=None, strict=False): """ Returns a set of all intervals overlapping the given range. Or, if strict is True, returns the set of all intervals fully contained in the range [begin, end]. 
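
The difference between an overlap query and a strict (envelop) query can be sketched with a linear scan over plain (begin, end) tuples. search_tuples is a hypothetical stand-in for illustration only; it ignores the tree structure and therefore does not have the complexity stated for this method.

```python
def search_tuples(intervals, begin, end, strict=False):
    # Linear-scan stand-in for the tree query; ranges are half-open.
    if begin >= end:
        return set()  # a null query range matches nothing
    hits = set(iv for iv in intervals if iv[0] < end and iv[1] > begin)
    if strict:
        # keep only intervals that lie entirely inside the query range
        hits = set(iv for iv in hits if iv[0] >= begin and iv[1] <= end)
    return hits

ivs = [(0, 10), (10, 20), (20, 30), (30, 40)]
overlapping = search_tuples(ivs, 5, 25)
enveloped = search_tuples(ivs, 5, 25, strict=True)
```

Here the overlap query returns three intervals, while the strict query keeps only (10, 20), the one interval that fits entirely within the range.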
Completes in O(m + k*log n) time, where: * n = size of the tree * m = number of matches * k = size of the search range (this is 1 for a point) :rtype: set of Interval """ root = self.top_node if not root: return set() if end is None: try: iv = begin return self.search(iv.begin, iv.end, strict=strict) except: return root.search_point(begin, set()) elif begin >= end: return set() else: result = root.search_point(begin, set()) boundary_table = self.boundary_table bound_begin = boundary_table.bisect_left(begin) bound_end = boundary_table.bisect_left(end) # exclude final end bound result.update(root.search_overlap( # slice notation is slightly slower boundary_table.iloc[index] for index in xrange(bound_begin, bound_end) )) # TODO: improve strict search to use node info instead of less-efficient filtering if strict: result = set( iv for iv in result if iv.begin >= begin and iv.end <= end ) return result def begin(self): """ Returns the lower bound of the first interval in the tree. Completes in O(n) time. """ if not self.boundary_table: return 0 return self.boundary_table.iloc[0] def end(self): """ Returns the upper bound of the last interval in the tree. Completes in O(n) time. """ if not self.boundary_table: return 0 return self.boundary_table.iloc[-1] def range(self): """ Returns a minimum-spanning Interval that encloses all the members of this IntervalTree. If the tree is empty, returns null Interval. :rtype: Interval """ return Interval(self.begin(), self.end()) def span(self): """ Returns the length of the minimum-spanning Interval that encloses all the members of this IntervalTree. If the tree is empty, return 0. """ if not self: return 0 return self.end() - self.begin() def print_structure(self, tostring=False): """ ## FOR DEBUGGING ONLY ## Pretty-prints the structure of the tree. If tostring is true, prints nothing and returns a string. 
:rtype: None or str """ if self.top_node: return self.top_node.print_structure(tostring=tostring) else: result = "" if not tostring: print(result) else: return result def verify(self): """ ## FOR DEBUGGING ONLY ## Checks the table to ensure that the invariants are held. """ if self.all_intervals: ## top_node.all_children() == self.all_intervals try: assert self.top_node.all_children() == self.all_intervals except AssertionError as e: print( 'Error: the tree and the membership set are out of sync!' ) tivs = set(self.top_node.all_children()) print('top_node.all_children() - all_intervals:') try: pprint except NameError: from pprint import pprint pprint(tivs - self.all_intervals) print('all_intervals - top_node.all_children():') pprint(self.all_intervals - tivs) raise e ## All members are Intervals for iv in self: assert isinstance(iv, Interval), ( "Error: Only Interval objects allowed in IntervalTree:" " {0}".format(iv) ) ## No null intervals for iv in self: assert not iv.is_null(), ( "Error: Null Interval objects not allowed in IntervalTree:" " {0}".format(iv) ) ## Reconstruct boundary_table bound_check = {} for iv in self: if iv.begin in bound_check: bound_check[iv.begin] += 1 else: bound_check[iv.begin] = 1 if iv.end in bound_check: bound_check[iv.end] += 1 else: bound_check[iv.end] = 1 ## Reconstructed boundary table (bound_check) ==? boundary_table assert set(self.boundary_table.keys()) == set(bound_check.keys()),\ 'Error: boundary_table is out of sync with ' \ 'the intervals in the tree!' # For efficiency reasons this should be iteritems in Py2, but we # don't care much for efficiency in debug methods anyway. for key, val in self.boundary_table.items(): assert bound_check[key] == val, \ 'Error: boundary_table[{0}] should be {1},' \ ' but is {2}!'.format( key, bound_check[key], val) ## Internal tree structure self.top_node.verify(set()) else: ## Verify empty tree assert not self.boundary_table, \ "Error: boundary table should be empty!" 
assert self.top_node is None, \ "Error: top_node isn't None!" def score(self, full_report=False): """ Returns a number between 0 and 1, indicating how suboptimal the tree is. The lower, the better. Roughly, this number represents the fraction of flawed Intervals in the tree. :rtype: float """ if len(self) <= 2: return 0.0 n = len(self) m = self.top_node.count_nodes() def s_center_score(): """ Returns a normalized score, indicating roughly how many times intervals share s_center with other intervals. Output is full-scale from 0 to 1. :rtype: float """ raw = n - m maximum = n - 1 return raw / float(maximum) report = { "depth": self.top_node.depth_score(n, m), "s_center": s_center_score(), } cumulative = max(report.values()) report["_cumulative"] = cumulative if full_report: return report return cumulative def __getitem__(self, index): """ Returns a set of all intervals overlapping the given index or slice. Completes in O(k * log(n) + m) time, where: * n = size of the tree * m = number of matches * k = size of the search range (this is 1 for a point) :rtype: set of Interval """ try: start, stop = index.start, index.stop if start is None: start = self.begin() if stop is None: return set(self) if stop is None: stop = self.end() return self.search(start, stop) except AttributeError: return self.search(index) def __setitem__(self, index, value): """ Adds a new interval to the tree. A shortcut for add(Interval(index.start, index.stop, value)). If an identical Interval object with equal range and data already exists, does nothing. Completes in O(log n) time. """ self.addi(index.start, index.stop, value) def __delitem__(self, point): """ Delete all items overlapping point. """ self.remove_overlap(point) def __contains__(self, item): """ Returns whether item exists as an Interval in the tree. This method only returns True for exact matches; for overlaps, see the overlaps() method. Completes in O(1) time. 
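
To illustrate the distinction between exact membership and overlap, here is a small sketch on a plain set of (begin, end, data) tuples. overlaps_point is a hypothetical helper written for this example, not the tree method of similar name.

```python
intervals = {(4, 7, 'x'), (5, 9, 'y')}

def overlaps_point(ivs, p):
    # half-open test: begin is included, end is excluded
    return any(begin <= p < end for begin, end, _ in ivs)

exact = (4, 7, 'x') in intervals      # exact match on range and data
near_miss = (4, 7, 'z') in intervals  # same range, different data
touching = overlaps_point(intervals, 6)
```

Membership is a hash lookup on the whole interval, so a matching range with different data does not count, whereas the overlap test only looks at the range.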
        :rtype: bool
        """
        # Removed point-checking code; it might trick the user into
        # thinking that this is O(1), which point-checking isn't.
        #if isinstance(item, Interval):
        return item in self.all_intervals
        #else:
        #    return self.contains_point(item)

    def containsi(self, begin, end, data=None):
        """
        Shortcut for (Interval(begin, end, data) in tree).

        Completes in O(1) time.
        :rtype: bool
        """
        return Interval(begin, end, data) in self

    def __iter__(self):
        """
        Returns an iterator over all the intervals in the tree.

        Completes in O(1) time.
        :rtype: collections.Iterable[Interval]
        """
        return self.all_intervals.__iter__()
    iter = __iter__

    def __len__(self):
        """
        Returns how many intervals are in the tree.

        Completes in O(1) time.
        :rtype: int
        """
        return len(self.all_intervals)

    def __eq__(self, other):
        """
        Whether two IntervalTrees are equal.

        Completes in O(n) time if sizes are equal; O(1) time otherwise.
        :rtype: bool
        """
        return (
            isinstance(other, IntervalTree) and
            self.all_intervals == other.all_intervals
        )

    def __repr__(self):
        """
        :rtype: str
        """
        ivs = sorted(self)
        if not ivs:
            return "IntervalTree()"
        else:
            return "IntervalTree({0})".format(ivs)

    __str__ = __repr__

    def __reduce__(self):
        """
        For pickle-ing.
        :rtype: tuple
        """
        return IntervalTree, (sorted(self.all_intervals),)

intervaltree-2.1.0/intervaltree/node.py

"""
intervaltree: A mutable, self-balancing interval tree for Python 2 and 3.
Queries may be by point, by range overlap, or by range envelopment.

Core logic: internal tree nodes.

Copyright 2013-2015 Chaim-Leib Halbert
Modifications Copyright 2014 Konstantin Tretyakov

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. """ from operator import attrgetter from math import floor, log def l2(num): """ log base 2 :rtype real """ return log(num, 2) class Node(object): def __init__(self, x_center=None, s_center=set(), left_node=None, right_node=None): self.x_center = x_center self.s_center = set(s_center) self.left_node = left_node self.right_node = right_node self.depth = 0 # will be set when rotated self.balance = 0 # ditto self.rotate() @classmethod def from_interval(cls, interval): """ :rtype : Node """ center = interval.begin return Node(center, [interval]) @classmethod def from_intervals(cls, intervals): """ :rtype : Node """ if not intervals: return None node = Node() node = node.init_from_sorted(sorted(intervals)) return node def init_from_sorted(self, intervals): if not intervals: return None center_iv = intervals[len(intervals) // 2] self.x_center = center_iv.begin self.s_center = set() s_left = [] s_right = [] for k in intervals: if k.end <= self.x_center: s_left.append(k) elif k.begin > self.x_center: s_right.append(k) else: self.s_center.add(k) self.left_node = Node.from_intervals(s_left) self.right_node = Node.from_intervals(s_right) return self.rotate() def center_hit(self, interval): """Returns whether interval overlaps self.x_center.""" return interval.contains_point(self.x_center) def hit_branch(self, interval): """ Assuming not center_hit(interval), return which branch (left=0, right=1) interval is in. """ return interval.begin > self.x_center def refresh_balance(self): """ Recalculate self.balance and self.depth based on child node values. 
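
A minimal sketch of this bookkeeping, using a stripped-down stand-in class. MiniNode is hypothetical and carries no intervals; it only mirrors the depth and balance arithmetic, with depth counted as 1 for a leaf and balance as right depth minus left depth.

```python
class MiniNode(object):
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right
        self.refresh_balance()

    def refresh_balance(self):
        # a missing child contributes depth 0
        left_depth = self.left.depth if self.left else 0
        right_depth = self.right.depth if self.right else 0
        self.depth = 1 + max(left_depth, right_depth)
        self.balance = right_depth - left_depth

leaf = MiniNode()
chain = MiniNode(right=MiniNode(right=MiniNode()))
```

A right-leaning chain of three nodes reaches a balance of 2 at its top, which is the condition (abs(balance) >= 2) under which rotate() goes on to rebalance.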
""" left_depth = self.left_node.depth if self.left_node else 0 right_depth = self.right_node.depth if self.right_node else 0 self.depth = 1 + max(left_depth, right_depth) self.balance = right_depth - left_depth def compute_depth(self): """ Recursively computes true depth of the subtree. Should only be needed for debugging. Unless something is wrong, the depth field should reflect the correct depth of the subtree. """ left_depth = self.left_node.compute_depth() if self.left_node else 0 right_depth = self.right_node.compute_depth() if self.right_node else 0 return 1 + max(left_depth, right_depth) def rotate(self): """ Does rotating, if necessary, to balance this node, and returns the new top node. """ self.refresh_balance() if abs(self.balance) < 2: return self # balance > 0 is the heavy side my_heavy = self.balance > 0 child_heavy = self[my_heavy].balance > 0 if my_heavy == child_heavy or self[my_heavy].balance == 0: ## Heavy sides same # self save # save -> 1 self # 1 # ## Heavy side balanced # self save save # save -> 1 self -> 1 self.rot() # 1 2 2 return self.srotate() else: return self.drotate() def srotate(self): """Single rotation. Assumes that balance is +-2.""" # self save save # save 3 -> 1 self -> 1 self.rot() # 1 2 2 3 # # self save save # 3 save -> self 1 -> self.rot() 1 # 2 1 3 2 #assert(self.balance != 0) heavy = self.balance > 0 light = not heavy save = self[heavy] #print("srotate: bal={},{}".format(self.balance, save.balance)) #self.print_structure() self[heavy] = save[light] # 2 #assert(save[light]) save[light] = self.rotate() # Needed to ensure the 2 and 3 are balanced under new subnode # Some intervals may overlap both self.x_center and save.x_center # Promote those to the new tip of the tree promotees = [iv for iv in save[light].s_center if save.center_hit(iv)] if promotees: for iv in promotees: save[light] = save[light].remove(iv) # may trigger pruning # TODO: Use Node.add() here, to simplify future balancing improvements. 
# For now, this is the same as augmenting save.s_center, but that may # change. save.s_center.update(promotees) save.refresh_balance() return save def drotate(self): # First rotation my_heavy = self.balance > 0 self[my_heavy] = self[my_heavy].srotate() self.refresh_balance() # Second rotation result = self.srotate() return result def add(self, interval): """ Returns self after adding the interval and balancing. """ if self.center_hit(interval): self.s_center.add(interval) return self else: direction = self.hit_branch(interval) if not self[direction]: self[direction] = Node.from_interval(interval) self.refresh_balance() return self else: self[direction] = self[direction].add(interval) return self.rotate() def remove(self, interval): """ Returns self after removing the interval and balancing. If interval is not present, raise ValueError. """ # since this is a list, called methods can set this to [1], # making it true done = [] return self.remove_interval_helper(interval, done, should_raise_error=True) def discard(self, interval): """ Returns self after removing interval and balancing. If interval is not present, do nothing. """ done = [] return self.remove_interval_helper(interval, done, should_raise_error=False) def remove_interval_helper(self, interval, done, should_raise_error): """ Returns self after removing interval and balancing. If interval doesn't exist, raise ValueError. This method may set done to [1] to tell all callers that rebalancing has completed. See Eternally Confuzzled's jsw_remove_r function (lines 1-32) in his AVL tree article for reference. 
""" #trace = interval.begin == 347 and interval.end == 353 #if trace: print('\nRemoving from {} interval {}'.format( # self.x_center, interval)) if self.center_hit(interval): #if trace: print('Hit at {}'.format(self.x_center)) if not should_raise_error and interval not in self.s_center: done.append(1) #if trace: print('Doing nothing.') return self try: # raises error if interval not present - this is # desired. self.s_center.remove(interval) except: self.print_structure() raise KeyError(interval) if self.s_center: # keep this node done.append(1) # no rebalancing necessary #if trace: print('Removed, no rebalancing.') return self # If we reach here, no intervals are left in self.s_center. # So, prune self. return self.prune() else: # interval not in s_center direction = self.hit_branch(interval) if not self[direction]: if should_raise_error: raise ValueError done.append(1) return self #if trace: # print('Descending to {} branch'.format( # ['left', 'right'][direction] # )) self[direction] = self[direction].remove_interval_helper(interval, done, should_raise_error) # Clean up if not done: #if trace: # print('Rotating {}'.format(self.x_center)) # self.print_structure() return self.rotate() return self def search_overlap(self, point_list): """ Returns all intervals that overlap the point_list. """ result = set() for j in point_list: self.search_point(j, result) return result def search_point(self, point, result): """ Returns all intervals that contain point. """ for k in self.s_center: if k.begin <= point < k.end: result.add(k) if point < self.x_center and self[0]: return self[0].search_point(point, result) elif point > self.x_center and self[1]: return self[1].search_point(point, result) return result def prune(self): """ On a subtree where the root node's s_center is empty, return a new subtree with no empty s_centers. 
""" if not self[0] or not self[1]: # if I have an empty branch direction = not self[0] # graft the other branch here #if trace: # print('Grafting {} branch'.format( # 'right' if direction else 'left')) result = self[direction] #if result: result.verify() return result else: # Replace the root node with the greatest predecessor. heir, self[0] = self[0].pop_greatest_child() #if trace: # print('Replacing {} with {}.'.format( # self.x_center, heir.x_center # )) # print('Removed greatest predecessor:') # self.print_structure() #if self[0]: self[0].verify() #if self[1]: self[1].verify() # Set up the heir as the new root node (heir[0], heir[1]) = (self[0], self[1]) #if trace: print('Setting up the heir:') #if trace: heir.print_structure() # popping the predecessor may have unbalanced this node; # fix it heir.refresh_balance() heir = heir.rotate() #heir.verify() #if trace: print('Rotated the heir:') #if trace: heir.print_structure() return heir def pop_greatest_child(self): """ Used when pruning a node with both a left and a right branch. Returns (greatest_child, node), where: * greatest_child is a new node to replace the removed node. * node is the subtree after: - removing the greatest child - balancing - moving overlapping nodes into greatest_child Assumes that self.s_center is not empty. See Eternally Confuzzled's jsw_remove_r function (lines 34-54) in his AVL tree article for reference. """ #print('Popping from {}'.format(self.x_center)) if not self.right_node: # This node is the greatest child. # To reduce the chances of an overlap with a parent, return # a child node containing the smallest possible number of # intervals, as close as possible to the maximum bound. 
ivs = sorted(self.s_center, key=attrgetter('end', 'begin')) max_iv = ivs.pop() new_x_center = self.x_center while ivs: next_max_iv = ivs.pop() if next_max_iv.end == max_iv.end: continue new_x_center = max(new_x_center, next_max_iv.end) def get_new_s_center(): for iv in self.s_center: if iv.contains_point(new_x_center): yield iv # Create a new node with the largest x_center possible. child = Node.from_intervals(get_new_s_center()) # [iv for iv in self.s_center if iv.contains_point(child_x_center)] # ) child.x_center = new_x_center self.s_center -= child.s_center #print('Pop hit! Returning child = {}'.format( # child.print_structure(tostring=True) # )) #assert not child[0] #assert not child[1] if self.s_center: #print(' and returning newnode = {}'.format( self )) #self.verify() return child, self else: #print(' and returning newnode = {}'.format( self[0] )) #if self[0]: self[0].verify() return child, self[0] # Rotate left child up else: #print('Pop descent to {}'.format(self[1].x_center)) (greatest_child, self[1]) = self[1].pop_greatest_child() self.refresh_balance() new_self = self.rotate() # Move any overlaps into greatest_child for iv in set(new_self.s_center): if iv.contains_point(greatest_child.x_center): new_self.s_center.remove(iv) greatest_child.add(iv) #print('Pop Returning child = {}'.format( # greatest_child.print_structure(tostring=True) # )) if new_self.s_center: #print('and returning newnode = {}'.format( # new_self.print_structure(tostring=True) # )) #new_self.verify() return greatest_child, new_self else: new_self = new_self.prune() #print('and returning prune = {}'.format( # new_self.print_structure(tostring=True) # )) #if new_self: new_self.verify() return greatest_child, new_self def contains_point(self, p): """ Returns whether this node or a child overlaps p. 
""" for iv in self.s_center: if iv.contains_point(p): return True branch = self[p > self.x_center] return branch and branch.contains_point(p) def all_children(self): return self.all_children_helper(set()) def all_children_helper(self, result): result.update(self.s_center) if self[0]: self[0].all_children_helper(result) if self[1]: self[1].all_children_helper(result) return result def verify(self, parents=set()): """ ## DEBUG ONLY ## Recursively ensures that the invariants of an interval subtree hold. """ assert(isinstance(self.s_center, set)) bal = self.balance assert abs(bal) < 2, \ "Error: Rotation should have happened, but didn't! \n{}".format( self.print_structure(tostring=True) ) self.refresh_balance() assert bal == self.balance, \ "Error: self.balance not set correctly! \n{}".format( self.print_structure(tostring=True) ) assert self.s_center, \ "Error: s_center is empty! \n{}".format( self.print_structure(tostring=True) ) for iv in self.s_center: assert hasattr(iv, 'begin') assert hasattr(iv, 'end') assert iv.begin < iv.end assert iv.overlaps(self.x_center) for parent in sorted(parents): assert not iv.contains_point(parent), \ "Error: Overlaps ancestor ({})! \n{}\n\n{}".format( parent, iv, self.print_structure(tostring=True) ) if self[0]: assert self[0].x_center < self.x_center, \ "Error: Out-of-order left child! {}".format(self.x_center) self[0].verify(parents.union([self.x_center])) if self[1]: assert self[1].x_center > self.x_center, \ "Error: Out-of-order right child! {}".format(self.x_center) self[1].verify(parents.union([self.x_center])) def __getitem__(self, index): """ Returns the left child if input is equivalent to False, or the right side otherwise. """ if index: return self.right_node else: return self.left_node def __setitem__(self, key, value): """Sets the left (0) or right (1) child.""" if key: self.right_node = value else: self.left_node = value def __str__(self): """ Shows info about this node. 
        Since Nodes are internal data structures not revealed to the
        user, I'm not bothering to make this copy-paste-executable as a
        constructor.
        """
        return "Node<{0}, depth={1}, balance={2}>".format(
            self.x_center,
            self.depth,
            self.balance
        )
        #fieldcount = 'c_count,has_l,has_r = <{}, {}, {}>'.format(
        #    len(self.s_center),
        #    bool(self.left_node),
        #    bool(self.right_node)
        #)
        #fields = [self.x_center, self.balance, fieldcount]
        #return "Node({}, b={}, {})".format(*fields)

    def count_nodes(self):
        """
        Count the number of Nodes in this subtree.
        :rtype: int
        """
        count = 1
        if self.left_node:
            count += self.left_node.count_nodes()
        if self.right_node:
            count += self.right_node.count_nodes()
        return count

    def depth_score(self, n, m):
        """
        Calculates flaws in balancing the tree.
        :param n: size of tree
        :param m: number of Nodes in tree
        :rtype: real
        """
        if n == 0:
            return 0.0

        # dopt is the optimal maximum depth of the tree
        dopt = 1 + int(floor(l2(m)))
        f = 1 / float(1 + n - dopt)
        return f * self.depth_score_helper(1, dopt)

    def depth_score_helper(self, d, dopt):
        """
        Gets a weighted count of the number of Intervals deeper than dopt.
        :param d: current depth, starting from 1 at the root
        :param dopt: optimal maximum depth of a leaf Node
        :rtype: real
        """
        # di is how many levels deeper than optimal d is
        di = d - dopt
        if di > 0:
            count = di * len(self.s_center)
        else:
            count = 0
        if self.right_node:
            count += self.right_node.depth_score_helper(d + 1, dopt)
        if self.left_node:
            count += self.left_node.depth_score_helper(d + 1, dopt)
        return count

    def print_structure(self, indent=0, tostring=False):
        """
        For debugging.
""" nl = '\n' sp = indent * ' ' rlist = [str(self) + nl] if self.s_center: for iv in sorted(self.s_center): rlist.append(sp + ' ' + repr(iv) + nl) if self.left_node: rlist.append(sp + '<: ') # no CR rlist.append(self.left_node.print_structure(indent + 1, True)) if self.right_node: rlist.append(sp + '>: ') # no CR rlist.append(self.right_node.print_structure(indent + 1, True)) result = ''.join(rlist) if tostring: return result else: print(result) intervaltree-2.1.0/intervaltree.egg-info/0000755000076500000240000000000012522725110021451 5ustar chaimleibstaff00000000000000intervaltree-2.1.0/intervaltree.egg-info/dependency_links.txt0000644000076500000240000000000112522725110025517 0ustar chaimleibstaff00000000000000 intervaltree-2.1.0/intervaltree.egg-info/PKG-INFO0000644000076500000240000005432512522725110022557 0ustar chaimleibstaff00000000000000Metadata-Version: 1.1 Name: intervaltree Version: 2.1.0 Summary: Editable interval tree data structure for Python 2 and 3 Home-page: https://github.com/chaimleib/intervaltree Author: Chaim-Leib Halbert, Konstantin Tretyakov Author-email: chaim.leib.halbert@gmail.com License: Apache License, Version 2.0 Download-URL: https://github.com/chaimleib/intervaltree/tarball/2.1.0 Description: .. This file is automatically generated by setup.py from README.md and CHANGELOG.md. intervaltree ============ A mutable, self-balancing interval tree for Python 2 and 3. Queries may be by point, by range overlap, or by range envelopment. This library was designed to allow tagging text and time intervals, where the intervals include the lower bound but not the upper bound. Installing ---------- .. 
code:: sh pip install intervaltree Features -------- - Supports Python 2.6+ and Python 3.2+ - Initializing - blank ``tree = IntervalTree()`` - from an iterable of ``Interval`` objects (``tree = IntervalTree(intervals)``) - from an iterable of tuples (``tree = IntervalTree.from_tuples(interval_tuples)``) - Insertions - ``tree[begin:end] = data`` - ``tree.add(interval)`` - ``tree.addi(begin, end, data)`` - Deletions - ``tree.remove(interval)`` (raises ``ValueError`` if not present) - ``tree.discard(interval)`` (quiet if not present) - ``tree.removei(begin, end, data)`` (short for ``tree.remove(Interval(begin, end, data))``) - ``tree.discardi(begin, end, data)`` (short for ``tree.discard(Interval(begin, end, data))``) - ``tree.remove_overlap(point)`` - ``tree.remove_overlap(begin, end)`` (removes all overlapping the range) - ``tree.remove_envelop(begin, end)`` (removes all enveloped in the range) - Overlap queries - ``tree[point]`` - ``tree[begin:end]`` - ``tree.search(point)`` - ``tree.search(begin, end)`` - Envelop queries - ``tree.search(begin, end, strict=True)`` - Membership queries - ``interval_obj in tree`` (this is fastest, O(1)) - ``tree.containsi(begin, end, data)`` - ``tree.overlaps(point)`` - ``tree.overlaps(begin, end)`` - Iterable - ``for interval_obj in tree:`` - ``tree.items()`` - Sizing - ``len(tree)`` - ``tree.is_empty()`` - ``not tree`` - ``tree.begin()`` (the ``begin`` coordinate of the leftmost interval) - ``tree.end()`` (the ``end`` coordinate of the rightmost interval) - Set-like operations - union - ``result_tree = tree.union(iterable)`` - ``result_tree = tree1 | tree2`` - ``tree.update(iterable)`` - ``tree |= other_tree`` - difference - ``result_tree = tree.difference(iterable)`` - ``result_tree = tree1 - tree2`` - ``tree.difference_update(iterable)`` - ``tree -= other_tree`` - intersection - ``result_tree = tree.intersection(iterable)`` - ``result_tree = tree1 & tree2`` - ``tree.intersection_update(iterable)`` - ``tree &= other_tree`` - 
symmetric difference - ``result_tree = tree.symmetric_difference(iterable)`` - ``result_tree = tree1 ^ tree2`` - ``tree.symmetric_difference_update(iterable)`` - ``tree ^= other_tree`` - comparison - ``tree1.issubset(tree2)`` or ``tree1 <= tree2`` - ``tree1 < tree2`` - ``tree1.issuperset(tree2)`` or ``tree1 >= tree2`` - ``tree1 > tree2`` - ``tree1 == tree2`` - Restructuring - ``chop(begin, end)`` (slice intervals and remove everything between ``begin`` and ``end``) - ``slice(point)`` (slice intervals at ``point``) - ``split_overlaps()`` (slice at all interval boundaries) - Copying and typecasting - ``IntervalTree(tree)`` (``Interval`` objects are same as those in tree) - ``tree.copy()`` (``Interval`` objects are shallow copies of those in tree) - ``set(tree)`` (can later be fed into ``IntervalTree()``) - ``list(tree)`` (ditto) - Pickle-friendly - Automatic AVL balancing Examples -------- - Getting started .. code:: python >>> from intervaltree import Interval, IntervalTree >>> t = IntervalTree() >>> t IntervalTree() - Adding intervals - any object works! .. code:: python >>> t[1:2] = "1-2" >>> t[4:7] = (4, 7) >>> t[5:9] = {5: 9} - Query by point | The result of a query is a ``set`` object, so if ordering is important, | you must sort it first. .. code:: python >>> sorted(t[6]) [Interval(4, 7, (4, 7)), Interval(5, 9, {5: 9})] >>> sorted(t[6])[0] Interval(4, 7, (4, 7)) - Query by range Note that ranges are inclusive of the lower limit, but non-inclusive of the upper limit. So: .. code:: python >>> sorted(t[2:4]) [] But: .. code:: python >>> sorted(t[1:5]) [Interval(1, 2, '1-2'), Interval(4, 7, (4, 7))] - Accessing an ``Interval`` object .. code:: python >>> iv = Interval(4, 7, (4, 7)) >>> iv.begin 4 >>> iv.end 7 >>> iv.data (4, 7) >>> begin, end, data = iv >>> begin 4 >>> end 7 >>> data (4, 7) - Constructing from lists of intervals We could have made a similar tree this way: .. code:: python >>> ivs = [(1, 2), (4, 7), (5, 9)] >>> t = IntervalTree( ... 
Interval(begin, end, "%d-%d" % (begin, end)) for begin, end in ivs ... ) Or, if we don't need the data fields: .. code:: python >>> t2 = IntervalTree(Interval(*iv) for iv in ivs) - Removing intervals .. code:: python >>> t.remove( Interval(1, 2, "1-2") ) >>> sorted(t) [Interval(4, 7, '4-7'), Interval(5, 9, '5-9')] >>> t.remove( Interval(500, 1000, "Doesn't exist")) # raises ValueError Traceback (most recent call last): ValueError >>> t.discard(Interval(500, 1000, "Doesn't exist")) # quietly does nothing >>> del t[5] # same as t.remove_overlap(5) >>> t IntervalTree() We could also empty a tree entirely: .. code:: python >>> t2.clear() >>> t2 IntervalTree() Or remove intervals that overlap a range: .. code:: python >>> t = IntervalTree([ ... Interval(0, 10), ... Interval(10, 20), ... Interval(20, 30), ... Interval(30, 40)]) >>> t.remove_overlap(25, 35) >>> sorted(t) [Interval(0, 10), Interval(10, 20)] We can also remove only those intervals completely enveloped in a range: .. code:: python >>> t.remove_envelop(5, 20) >>> sorted(t) [Interval(0, 10)] - Chopping We could also chop out parts of the tree: .. code:: python >>> t = IntervalTree([Interval(0, 10)]) >>> t.chop(3, 7) >>> sorted(t) [Interval(0, 3), Interval(7, 10)] To modify the new intervals' data fields based on which side of the interval is being chopped: .. code:: python >>> def datafunc(iv, islower): ... oldlimit = iv[islower] ... return "oldlimit: {0}, islower: {1}".format(oldlimit, islower) >>> t = IntervalTree([Interval(0, 10)]) >>> t.chop(3, 7, datafunc) >>> sorted(t)[0] Interval(0, 3, 'oldlimit: 10, islower: True') >>> sorted(t)[1] Interval(7, 10, 'oldlimit: 0, islower: False') - Slicing You can also slice intervals in the tree without removing them: .. code:: python >>> t = IntervalTree([Interval(0, 10), Interval(5, 15)]) >>> t.slice(3) >>> sorted(t) [Interval(0, 3), Interval(3, 10), Interval(5, 15)] You can also set the data fields, for example, re-using ``datafunc()`` from above: .. 
code:: python >>> t = IntervalTree([Interval(5, 15)]) >>> t.slice(10, datafunc) >>> sorted(t)[0] Interval(5, 10, 'oldlimit: 15, islower: True') >>> sorted(t)[1] Interval(10, 15, 'oldlimit: 5, islower: False') Future improvements ------------------- See the issue tracker on GitHub. Based on -------- - Eternally Confuzzled's AVL tree - Wikipedia's Interval Tree - Heavily modified from Tyler Kahn's Interval Tree implementation in Python (GitHub project) - Incorporates contributions from: - konstantint/Konstantin Tretyakov of the University of Tartu (Estonia) - siniG/Avi Gabay - lmcarril/Luis M. Carril of the Karlsruhe Institute for Technology (Germany) Copyright --------- - Chaim-Leib Halbert, 2013-2015 - Modifications, Konstantin Tretyakov, 2014 Licensed under the Apache License, version 2.0. The source code for this project is at https://github.com/chaimleib/intervaltree Change log ========== Version 2.1.0 ------------- - Added: - ``merge_overlaps()`` method and tests - ``merge_equals()`` method and tests - ``range()`` method - ``span()`` method, for returning the difference between ``end()`` and ``begin()`` - Fixes: - Development version numbering is changing to be compliant with PEP440. Version numbering now contains major, minor and micro release numbers, plus the number of builds following the stable release version, e.g. 
2.0.4b34 - Speed improvement: ``begin()`` and ``end()`` methods used iterative ``min()`` and ``max()`` builtins instead of the more efficient ``iloc`` member available to ``SortedDict`` - ``overlaps()`` method used to return ``True`` even if provided null test interval - Maintainers: - Added coverage test (``make coverage``) with html report (``htmlcov/index.html``) - Tests run slightly faster Version 2.0.4 ------------- - Fix: Issue #27: README incorrectly showed using a comma instead of a colon when querying the ``IntervalTree``: it showed ``tree[begin, end]`` instead of ``tree[begin:end]`` Version 2.0.3 ------------- - Fix: README showed using + operator for setlike union instead of the correct \| operator - Removed tests from release package to speed up installation; to get the tests, download from GitHub Version 2.0.2 ------------- - Fix: Issue #20: performance enhancement for large trees. ``IntervalTree.search()`` made a copy of the entire ``boundary_table`` resulting in linear search time. 
The ``sortedcontainers`` package is now the sole install dependency Version 2.0.1 ------------- - Fix: Issue #26: failed to prune empty ``Node`` after a rotation promoted contents of ``s_center`` Version 2.0.0 ------------- - ``IntervalTree`` now supports the full ``collections.MutableSet`` API - Added: - ``__delitem__`` to ``IntervalTree`` - Comparison methods ``lt()``, ``gt()``, ``le()`` and ``ge()`` to ``Interval``, as an alternative to the comparison operators, which are designed for sorting - ``IntervalTree.from_tuples(iterable)`` - ``IntervalTree.clear()`` - ``IntervalTree.difference(iterable)`` - ``IntervalTree.difference_update(iterable)`` - ``IntervalTree.union(iterable)`` - ``IntervalTree.intersection(iterable)`` - ``IntervalTree.intersection_update(iterable)`` - ``IntervalTree.symmetric_difference(iterable)`` - ``IntervalTree.symmetric_difference_update(iterable)`` - ``IntervalTree.chop(a, b)`` - ``IntervalTree.slice(point)`` - Deprecated ``IntervalTree.extend()`` -- use ``update()`` instead - Internal improvements: - More verbose tests with progress bars - More tests for comparison and sorting behavior - Code in the README is included in the unit tests - Fixes: - BACKWARD INCOMPATIBLE: On ranged queries where ``begin >= end``, the query operated on the overlaps of ``begin``. This behavior was documented as expected in 1.x; it is now changed to be more consistent with the definition of ``Interval``\ s, which are half-open. - Issue #25: pruning empty Nodes with staggered descendants could result in invalid trees - Sorting ``Interval``\ s and numbers in the same list gathered all the numbers at the beginning and the ``Interval``\ s at the end - ``IntervalTree.overlaps()`` and friends returned ``None`` instead of ``False`` - Maintainers: ``make install-testpypi`` failed because the ``pip`` command was missing a ``--pre`` flag Version 1.1.1 ------------- - Removed requirement for pyandoc in order to run functionality tests.
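The half-open range semantics referred to under Version 2.0.0 above can be sketched in plain Python. This is only an illustration on ``(begin, end)`` tuples, not the library's implementation; the helper name ``overlaps`` is hypothetical, and treating a query with ``begin >= end`` as matching nothing is an assumption based on the change log entry.

```python
# Illustrative sketch only: half-open overlap on plain (begin, end) tuples.
# `overlaps` is a hypothetical helper, not part of the intervaltree API.
def overlaps(iv, begin, end):
    """Return True if the half-open interval iv overlaps [begin, end)."""
    if begin >= end:
        # Assumed 2.x behavior: a null query range matches nothing.
        return False
    b, e = iv
    # Lower bounds are inclusive, upper bounds exclusive.
    return b < end and begin < e


ivs = [(1, 2), (4, 7), (5, 9)]
hits_2_4 = [iv for iv in ivs if overlaps(iv, 2, 4)]
hits_1_5 = [iv for iv in ivs if overlaps(iv, 1, 5)]
print(hits_2_4)  # []
print(hits_1_5)  # [(1, 2), (4, 7)]
```

The two results mirror the README's ``t[2:4]`` and ``t[1:5]`` queries: an interval ending at 2 does not overlap a range starting at 2, and an interval starting at 5 does not overlap a range ending at 5.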
Version 1.1.0 ------------- - Added ability to use ``Interval.distance_to()`` with points, not just ``Interval``\ s - Added documentation on return types to ``IntervalTree`` and ``Interval`` - ``Interval.__cmp__()`` works with points too - Fix: ``IntervalTree.score()`` returned maximum score of 0.5 instead of 1.0. It now returns the maximum of the subscores instead of their average - Internal improvements: - Development version numbering scheme, based on ``git describe``: the "building towards" release is appended after a hyphen, e.g. 1.0.2-37-g2da2ef0-1.1.0. Here, the previous tagged release is 1.0.2, there have been 37 commits since then, the current commit is g2da2ef0, and we are getting ready for a 1.1.0 release - Optimality tests added - ``Interval`` overlap tests for ranges, ``Interval``\ s and points added Version 1.0.2 ------------- - Bug fixes: - ``Node.depth_score_helper()`` raised ``AttributeError`` - README formatting Version 1.0.1 ------------- - Fix: pip install failure because of failure to generate README.rst Version 1.0.0 ------------- - Renamed from PyIntervalTree to intervaltree - Speed improvements for adding and removing Intervals (~70% faster than 0.4) - Bug fixes: - BACKWARD INCOMPATIBLE: ``len()`` of an ``Interval`` is always 3, reverting to default behavior for ``namedtuples``. In Python 3, ``len`` returning a non-integer raises an exception. Instead, use ``Interval.length()``, which returns 0 for null intervals and ``end - begin`` otherwise. Also, whenever ``len(iv) == 0``, ``not iv`` is ``True``.
- When inserting an ``Interval`` via ``__setitem__`` with improper parameters, all errors were transformed to ``IndexError`` - ``split_overlaps`` did not update the ``boundary_table`` counts - Internal improvements: - More robust local testing tools - Long series of interdependent tests have been separated into sections Version 0.4 ----------- - Faster balancing (~80% faster) - Bug fixes: - Double rotations were performed in place of a single rotation when presented with an unbalanced Node with a balanced child. - During a single rotation, the code kept referencing an unrotated Node instead of the new, rotated one Version 0.3.3 ------------- - Made IntervalTree crash if initialized with a null Interval (end <= begin) - IntervalTree raises ValueError instead of AssertionError when a null Interval is inserted Version 0.3.2 ------------- - Support for Python 3.2+ and 2.6+ - Changed license from LGPL to more permissive Apache license - Merged changes from https://github.com/konstantint/PyIntervalTree to https://github.com/chaimleib/PyIntervalTree - Interval now inherits from a namedtuple. Benefits: should be faster. Drawbacks: slight behavioural change (Intervals not mutable anymore). - Added float tests - Use setup.py for tests - Automatic testing via travis-ci - Removed dependency on six - Interval improvements: - Intervals without data have a cleaner string representation - Intervals without data are pickled more compactly - Better hashing - Intervals are ordered by begin, then end, then by data.
If data is not orderable, sorts by type(data) - Bug fixes: - Fixed crash when querying empty tree - Fixed missing close parenthesis in examples - Made IntervalTree crash earlier if a null Interval is added - Internals: - New test directory - Nicer display of data structures for debugging, using custom test/pprint.py (Python 2.6, 2.7) - More sensitive exception handling - Local script to test in all supported versions of Python - Added IntervalTree.score() to measure how optimally a tree is structured Version 0.2.3 ------------- - Slight changes for inclusion in PyPI. - Some documentation changes - Added tests - Bug fix: interval addition via [] was broken in Python 2.7 (see http://bugs.python.org/issue21785) - Added intervaltree.bio subpackage, adding some utilities for use in bioinformatics Version 0.2.2b -------------- - Forked from https://github.com/MusashiAharon/PyIntervalTree Keywords: interval-tree data-structure intervals tree Platform: UNKNOWN Classifier: Development Status :: 4 - Beta Classifier: Intended Audience :: Developers Classifier: Intended Audience :: Information Technology Classifier: Intended Audience :: Science/Research Classifier: Programming Language :: Python Classifier: Programming Language :: Python :: 2 Classifier: Programming Language :: Python :: 2.6 Classifier: Programming Language :: Python :: 2.7 Classifier: Programming Language :: Python :: 3 Classifier: Programming Language :: Python :: 3.2 Classifier: Programming Language :: Python :: 3.3 Classifier: Programming Language :: Python :: 3.4 Classifier: License :: OSI Approved :: Apache Software License Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence Classifier: Topic :: Scientific/Engineering :: Bio-Informatics Classifier: Topic :: Scientific/Engineering :: Information Analysis Classifier: Topic :: Software Development :: Libraries Classifier: Topic :: Text Processing :: General Classifier: Topic :: Text Processing :: Linguistic Classifier: Topic :: Text 
Processing :: Markup intervaltree-2.1.0/intervaltree.egg-info/requires.txt0000644000076500000240000000002112522725110024042 0ustar chaimleibstaff00000000000000sortedcontainers intervaltree-2.1.0/intervaltree.egg-info/SOURCES.txt0000644000076500000240000000060212522725110023333 0ustar chaimleibstaff00000000000000CHANGELOG.md LICENSE.txt MANIFEST.in README.md README.rst setup.cfg setup.py intervaltree/__init__.py intervaltree/interval.py intervaltree/intervaltree.py intervaltree/node.py intervaltree.egg-info/PKG-INFO intervaltree.egg-info/SOURCES.txt intervaltree.egg-info/dependency_links.txt intervaltree.egg-info/requires.txt intervaltree.egg-info/top_level.txt intervaltree.egg-info/zip-safeintervaltree-2.1.0/intervaltree.egg-info/top_level.txt0000644000076500000240000000001512522725110024177 0ustar chaimleibstaff00000000000000intervaltree intervaltree-2.1.0/intervaltree.egg-info/zip-safe0000644000076500000240000000000112522725110023101 0ustar chaimleibstaff00000000000000 intervaltree-2.1.0/LICENSE.txt0000644000076500000240000002613612456261575017126 0ustar chaimleibstaff00000000000000 Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 1. Definitions. "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document. "Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License. "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. 
"You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License. "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. "Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below). "Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof. "Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. 
For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution." "Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work. 2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form. 3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. 
If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed. 4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions: (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and (b) You must cause any modified files to carry prominent notices stating that You changed the files; and (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and (d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. 
You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License. You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License. 5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions. 6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file. 7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License. 8. Limitation of Liability. 
In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages. 9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability. END OF TERMS AND CONDITIONS APPENDIX: How to apply the Apache License to your work. To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives. 
Copyright [yyyy] [name of copyright owner] Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. intervaltree-2.1.0/MANIFEST.in0000644000076500000240000000005212456261575017026 0ustar chaimleibstaff00000000000000include README.md CHANGELOG.md LICENSE.txtintervaltree-2.1.0/PKG-INFO0000644000076500000240000005432512522725110016361 0ustar chaimleibstaff00000000000000Metadata-Version: 1.1 Name: intervaltree Version: 2.1.0 Summary: Editable interval tree data structure for Python 2 and 3 Home-page: https://github.com/chaimleib/intervaltree Author: Chaim-Leib Halbert, Konstantin Tretyakov Author-email: chaim.leib.halbert@gmail.com License: Apache License, Version 2.0 Download-URL: https://github.com/chaimleib/intervaltree/tarball/2.1.0 Description: .. This file is automatically generated by setup.py from README.md and CHANGELOG.md. intervaltree ============ A mutable, self-balancing interval tree for Python 2 and 3. Queries may be by point, by range overlap, or by range envelopment. This library was designed to allow tagging text and time intervals, where the intervals include the lower bound but not the upper bound. Installing ---------- .. 
code:: sh pip install intervaltree Features -------- - Supports Python 2.6+ and Python 3.2+ - Initializing - blank ``tree = IntervalTree()`` - from an iterable of ``Interval`` objects (``tree = IntervalTree(intervals)``) - from an iterable of tuples (``tree = IntervalTree.from_tuples(interval_tuples)``) - Insertions - ``tree[begin:end] = data`` - ``tree.add(interval)`` - ``tree.addi(begin, end, data)`` - Deletions - ``tree.remove(interval)`` (raises ``ValueError`` if not present) - ``tree.discard(interval)`` (quiet if not present) - ``tree.removei(begin, end, data)`` (short for ``tree.remove(Interval(begin, end, data))``) - ``tree.discardi(begin, end, data)`` (short for ``tree.discard(Interval(begin, end, data))``) - ``tree.remove_overlap(point)`` - ``tree.remove_overlap(begin, end)`` (removes all overlapping the range) - ``tree.remove_envelop(begin, end)`` (removes all enveloped in the range) - Overlap queries - ``tree[point]`` - ``tree[begin:end]`` - ``tree.search(point)`` - ``tree.search(begin, end)`` - Envelop queries - ``tree.search(begin, end, strict=True)`` - Membership queries - ``interval_obj in tree`` (this is fastest, O(1)) - ``tree.containsi(begin, end, data)`` - ``tree.overlaps(point)`` - ``tree.overlaps(begin, end)`` - Iterable - ``for interval_obj in tree:`` - ``tree.items()`` - Sizing - ``len(tree)`` - ``tree.is_empty()`` - ``not tree`` - ``tree.begin()`` (the ``begin`` coordinate of the leftmost interval) - ``tree.end()`` (the ``end`` coordinate of the rightmost interval) - Set-like operations - union - ``result_tree = tree.union(iterable)`` - ``result_tree = tree1 | tree2`` - ``tree.update(iterable)`` - ``tree |= other_tree`` - difference - ``result_tree = tree.difference(iterable)`` - ``result_tree = tree1 - tree2`` - ``tree.difference_update(iterable)`` - ``tree -= other_tree`` - intersection - ``result_tree = tree.intersection(iterable)`` - ``result_tree = tree1 & tree2`` - ``tree.intersection_update(iterable)`` - ``tree &= other_tree`` - 
symmetric difference - ``result_tree = tree.symmetric_difference(iterable)`` - ``result_tree = tree1 ^ tree2`` - ``tree.symmetric_difference_update(iterable)`` - ``tree ^= other_tree`` - comparison - ``tree1.issubset(tree2)`` or ``tree1 <= tree2`` - ``tree1 < tree2`` - ``tree1.issuperset(tree2)`` or ``tree1 >= tree2`` - ``tree1 > tree2`` - ``tree1 == tree2`` - Restructuring - ``chop(begin, end)`` (slice intervals and remove everything between ``begin`` and ``end``) - ``slice(point)`` (slice intervals at ``point``) - ``split_overlaps()`` (slice at all interval boundaries) - Copying and typecasting - ``IntervalTree(tree)`` (``Interval`` objects are the same as those in tree) - ``tree.copy()`` (``Interval`` objects are shallow copies of those in tree) - ``set(tree)`` (can later be fed into ``IntervalTree()``) - ``list(tree)`` (ditto) - Pickle-friendly - Automatic AVL balancing Examples -------- - Getting started .. code:: python >>> from intervaltree import Interval, IntervalTree >>> t = IntervalTree() >>> t IntervalTree() - Adding intervals - any object works! .. code:: python >>> t[1:2] = "1-2" >>> t[4:7] = (4, 7) >>> t[5:9] = {5: 9} - Query by point | The result of a query is a ``set`` object, so if ordering is important, | you must sort it first. .. code:: python >>> sorted(t[6]) [Interval(4, 7, (4, 7)), Interval(5, 9, {5: 9})] >>> sorted(t[6])[0] Interval(4, 7, (4, 7)) - Query by range Note that ranges are inclusive of the lower limit, but non-inclusive of the upper limit. So: .. code:: python >>> sorted(t[2:4]) [] But: .. code:: python >>> sorted(t[1:5]) [Interval(1, 2, '1-2'), Interval(4, 7, (4, 7))] - Accessing an ``Interval`` object .. code:: python >>> iv = Interval(4, 7, (4, 7)) >>> iv.begin 4 >>> iv.end 7 >>> iv.data (4, 7) >>> begin, end, data = iv >>> begin 4 >>> end 7 >>> data (4, 7) - Constructing from lists of intervals We could have made a similar tree this way: .. code:: python >>> ivs = [(1, 2), (4, 7), (5, 9)] >>> t = IntervalTree( ...
Interval(begin, end, "%d-%d" % (begin, end)) for begin, end in ivs ... ) Or, if we don't need the data fields: .. code:: python >>> t2 = IntervalTree(Interval(*iv) for iv in ivs) - Removing intervals .. code:: python >>> t.remove( Interval(1, 2, "1-2") ) >>> sorted(t) [Interval(4, 7, '4-7'), Interval(5, 9, '5-9')] >>> t.remove( Interval(500, 1000, "Doesn't exist")) # raises ValueError Traceback (most recent call last): ValueError >>> t.discard(Interval(500, 1000, "Doesn't exist")) # quietly does nothing >>> del t[5] # same as t.remove_overlap(5) >>> t IntervalTree() We could also empty a tree entirely: .. code:: python >>> t2.clear() >>> t2 IntervalTree() Or remove intervals that overlap a range: .. code:: python >>> t = IntervalTree([ ... Interval(0, 10), ... Interval(10, 20), ... Interval(20, 30), ... Interval(30, 40)]) >>> t.remove_overlap(25, 35) >>> sorted(t) [Interval(0, 10), Interval(10, 20)] We can also remove only those intervals completely enveloped in a range: .. code:: python >>> t.remove_envelop(5, 20) >>> sorted(t) [Interval(0, 10)] - Chopping We could also chop out parts of the tree: .. code:: python >>> t = IntervalTree([Interval(0, 10)]) >>> t.chop(3, 7) >>> sorted(t) [Interval(0, 3), Interval(7, 10)] To modify the new intervals' data fields based on which side of the interval is being chopped: .. code:: python >>> def datafunc(iv, islower): ... oldlimit = iv[islower] ... return "oldlimit: {0}, islower: {1}".format(oldlimit, islower) >>> t = IntervalTree([Interval(0, 10)]) >>> t.chop(3, 7, datafunc) >>> sorted(t)[0] Interval(0, 3, 'oldlimit: 10, islower: True') >>> sorted(t)[1] Interval(7, 10, 'oldlimit: 0, islower: False') - Slicing You can also slice intervals in the tree without removing them: .. code:: python >>> t = IntervalTree([Interval(0, 10), Interval(5, 15)]) >>> t.slice(3) >>> sorted(t) [Interval(0, 3), Interval(3, 10), Interval(5, 15)] You can also set the data fields, for example, re-using ``datafunc()`` from above: .. 
code:: python >>> t = IntervalTree([Interval(5, 15)]) >>> t.slice(10, datafunc) >>> sorted(t)[0] Interval(5, 10, 'oldlimit: 15, islower: True') >>> sorted(t)[1] Interval(10, 15, 'oldlimit: 5, islower: False') Future improvements ------------------- See the issue tracker on GitHub. Based on -------- - Eternally Confuzzled's AVL tree - Wikipedia's Interval Tree - Heavily modified from Tyler Kahn's Interval Tree implementation in Python (GitHub project) - Incorporates contributions from: - konstantint/Konstantin Tretyakov of the University of Tartu (Estonia) - siniG/Avi Gabay - lmcarril/Luis M. Carril of the Karlsruhe Institute for Technology (Germany) Copyright --------- - Chaim-Leib Halbert, 2013-2015 - Modifications, Konstantin Tretyakov, 2014 Licensed under the Apache License, version 2.0. The source code for this project is at https://github.com/chaimleib/intervaltree Change log ========== Version 2.1.0 ------------- - Added: - ``merge_overlaps()`` method and tests - ``merge_equals()`` method and tests - ``range()`` method - ``span()`` method, for returning the difference between ``end()`` and ``begin()`` - Fixes: - Development version numbering is changing to be compliant with PEP440. Version numbering now contains major, minor and micro release numbers, plus the number of builds following the stable release version, e.g. 
2.0.4b34 - Speed improvement: ``begin()`` and ``end()`` methods used iterative ``min()`` and ``max()`` builtins instead of the more efficient ``iloc`` member available to ``SortedDict`` - ``overlaps()`` method used to return ``True`` even if provided null test interval - Maintainers: - Added coverage test (``make coverage``) with html report (``htmlcov/index.html``) - Tests run slightly faster Version 2.0.4 ------------- - Fix: Issue #27: README incorrectly showed using a comma instead of a colon when querying the ``IntervalTree``: it showed ``tree[begin, end]`` instead of ``tree[begin:end]`` Version 2.0.3 ------------- - Fix: README showed using + operator for setlike union instead of the correct \| operator - Removed tests from release package to speed up installation; to get the tests, download from GitHub Version 2.0.2 ------------- - Fix: Issue #20: performance enhancement for large trees. ``IntervalTree.search()`` made a copy of the entire ``boundary_table`` resulting in linear search time. 
The ``sortedcontainers`` package is now the sole install dependency Version 2.0.1 ------------- - Fix: Issue #26: failed to prune empty ``Node`` after a rotation promoted contents of ``s_center`` Version 2.0.0 ------------- - ``IntervalTree`` now supports the full ``collections.MutableSet`` API - Added: - ``__delitem__`` to ``IntervalTree`` - Comparison methods ``lt()``, ``gt()``, ``le()`` and ``ge()`` to ``Interval``, as an alternative to the comparison operators, which are designed for sorting - ``IntervalTree.from_tuples(iterable)`` - ``IntervalTree.clear()`` - ``IntervalTree.difference(iterable)`` - ``IntervalTree.difference_update(iterable)`` - ``IntervalTree.union(iterable)`` - ``IntervalTree.intersection(iterable)`` - ``IntervalTree.intersection_update(iterable)`` - ``IntervalTree.symmetric_difference(iterable)`` - ``IntervalTree.symmetric_difference_update(iterable)`` - ``IntervalTree.chop(a, b)`` - ``IntervalTree.slice(point)`` - Deprecated ``IntervalTree.extend()`` -- use ``update()`` instead - Internal improvements: - More verbose tests with progress bars - More tests for comparison and sorting behavior - Code in the README is included in the unit tests - Fixes: - BACKWARD INCOMPATIBLE: On ranged queries where ``begin >= end``, the query operated on the overlaps of ``begin``. This behavior was documented as expected in 1.x; it is now changed to be more consistent with the definition of ``Interval``\ s, which are half-open. - Issue #25: pruning empty Nodes with staggered descendants could result in invalid trees - Sorting ``Interval``\ s and numbers in the same list gathered all the numbers at the beginning and the ``Interval``\ s at the end - ``IntervalTree.overlaps()`` and friends returned ``None`` instead of ``False`` - Maintainers: ``make install-testpypi`` failed because the ``pip`` command was missing a ``--pre`` flag Version 1.1.1 ------------- - Removed requirement for pyandoc in order to run functionality tests.
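The effect of ``chop(a, b)``, listed under Version 2.0.0 above, can be illustrated without the library. The following is a rough sketch on plain ``(begin, end)`` tuples, assuming half-open intervals; the function below is hypothetical, is not the library's code, and ignores data fields.

```python
# Illustrative sketch only: chopping [begin, end) out of half-open tuples.
# This `chop` is a hypothetical stand-in, not IntervalTree.chop itself.
def chop(ivs, begin, end):
    """Remove [begin, end) from each (b, e) tuple, keeping the leftovers."""
    result = []
    for b, e in ivs:
        if e <= begin or b >= end:
            # No overlap with the chopped range: keep the interval unchanged.
            result.append((b, e))
            continue
        if b < begin:
            # Piece remaining to the left of the chopped range.
            result.append((b, begin))
        if e > end:
            # Piece remaining to the right of the chopped range.
            result.append((end, e))
    return sorted(result)


chopped = chop([(0, 10)], 3, 7)
print(chopped)  # [(0, 3), (7, 10)]
```

This matches the README's chopping example, where ``Interval(0, 10)`` becomes ``Interval(0, 3)`` and ``Interval(7, 10)``; intervals lying entirely inside the chopped range disappear.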
Version 1.1.0 ------------- - Added ability to use ``Interval.distance_to()`` with points, not just ``Intervals`` - Added documentation on return types to ``IntervalTree`` and ``Interval`` - ``Interval.__cmp__()`` works with points too - Fix: ``IntervalTree.score()`` returned maximum score of 0.5 instead of 1.0. Now returns max of subscores instead of avg - Internal improvements: - Development version numbering scheme based on ``git describe``: the "building towards" release is appended after a hyphen, e.g. 1.0.2-37-g2da2ef0-1.1.0. Here the previous tagged release is 1.0.2, there have been 37 commits since then, the current commit is g2da2ef0, and we are getting ready for a 1.1.0 release - Optimality tests added - ``Interval`` overlap tests for ranges, ``Interval``\ s and points added Version 1.0.2 ------------- - Bug fixes: - ``Node.depth_score_helper()`` raised ``AttributeError`` - README formatting Version 1.0.1 ------------- - Fix: pip install failure because of failure to generate README.rst Version 1.0.0 ------------- - Renamed from PyIntervalTree to intervaltree - Speed improvements for adding and removing Intervals (~70% faster than 0.4) - Bug fixes: - BACKWARD INCOMPATIBLE: ``len()`` of an ``Interval`` is always 3, reverting to default behavior for ``namedtuples``. In Python 3, ``len`` returning a non-integer raises an exception. Instead, use ``Interval.length()``, which returns 0 for null intervals and ``end - begin`` otherwise. Also, if ``len() == 0``, then ``not iv`` is ``True``.
- When inserting an ``Interval`` via ``__setitem__`` and improper parameters given, all errors were transformed to ``IndexError`` - ``split_overlaps`` did not update the ``boundary_table`` counts - Internal improvements: - More robust local testing tools - Long series of interdependent tests have been separated into sections Version 0.4 ----------- - Faster balancing (~80% faster) - Bug fixes: - Double rotations were performed in place of a single rotation when presented an unbalanced Node with a balanced child. - During single rotation, kept referencing an unrotated Node instead of the new, rotated one Version 0.3.3 ------------- - Made IntervalTree crash if inited with a null Interval (end <= begin) - IntervalTree raises ValueError instead of AssertionError when a null Interval is inserted Version 0.3.2 ------------- - Support for Python 3.2+ and 2.6+ - Changed license from LGPL to more permissive Apache license - Merged changes from https://github.com/konstantint/PyIntervalTree to https://github.com/chaimleib/PyIntervalTree - Interval now inherits from a namedtuple. Benefits: should be faster. Drawbacks: slight behavioural change (Intervals not mutable anymore). - Added float tests - Use setup.py for tests - Automatic testing via travis-ci - Removed dependency on six - Interval improvements: - Intervals without data have a cleaner string representation - Intervals without data are pickled more compactly - Better hashing - Intervals are ordered by begin, then end, then by data. 
If data is not orderable, sorts by type(data) - Bug fixes: - Fixed crash when querying empty tree - Fixed missing close parenthesis in examples - Made IntervalTree crash earlier if a null Interval is added - Internals: - New test directory - Nicer display of data structures for debugging, using custom test/pprint.py (Python 2.6, 2.7) - More sensitive exception handling - Local script to test in all supported versions of Python - Added IntervalTree.score() to measure how optimally a tree is structured Version 0.2.3 ------------- - Slight changes for inclusion in PyPI. - Some documentation changes - Added tests - Bug fix: interval addition via [] was broken in Python 2.7 (see http://bugs.python.org/issue21785) - Added intervaltree.bio subpackage, adding some utilities for use in bioinformatics Version 0.2.2b -------------- - Forked from https://github.com/MusashiAharon/PyIntervalTree Keywords: interval-tree data-structure intervals tree Platform: UNKNOWN Classifier: Development Status :: 4 - Beta Classifier: Intended Audience :: Developers Classifier: Intended Audience :: Information Technology Classifier: Intended Audience :: Science/Research Classifier: Programming Language :: Python Classifier: Programming Language :: Python :: 2 Classifier: Programming Language :: Python :: 2.6 Classifier: Programming Language :: Python :: 2.7 Classifier: Programming Language :: Python :: 3 Classifier: Programming Language :: Python :: 3.2 Classifier: Programming Language :: Python :: 3.3 Classifier: Programming Language :: Python :: 3.4 Classifier: License :: OSI Approved :: Apache Software License Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence Classifier: Topic :: Scientific/Engineering :: Bio-Informatics Classifier: Topic :: Scientific/Engineering :: Information Analysis Classifier: Topic :: Software Development :: Libraries Classifier: Topic :: Text Processing :: General Classifier: Topic :: Text Processing :: Linguistic Classifier: Topic :: Text 
Processing :: Markup intervaltree-2.1.0/README.md0000644000076500000240000002221012522724400016530 0ustar chaimleibstaff00000000000000[![Build status badge][]][build status] intervaltree ============ A mutable, self-balancing interval tree for Python 2 and 3. Queries may be by point, by range overlap, or by range envelopment. This library was designed to allow tagging text and time intervals, where the intervals include the lower bound but not the upper bound. Installing ---------- ```sh pip install intervaltree ``` Features -------- * Supports Python 2.6+ and Python 3.2+ * Initializing * blank `tree = IntervalTree()` * from an iterable of `Interval` objects (`tree = IntervalTree(intervals)`) * from an iterable of tuples (`tree = IntervalTree.from_tuples(interval_tuples)`) * Insertions * `tree[begin:end] = data` * `tree.add(interval)` * `tree.addi(begin, end, data)` * Deletions * `tree.remove(interval)` (raises `ValueError` if not present) * `tree.discard(interval)` (quiet if not present) * `tree.removei(begin, end, data)` (short for `tree.remove(Interval(begin, end, data))`) * `tree.discardi(begin, end, data)` (short for `tree.discard(Interval(begin, end, data))`) * `tree.remove_overlap(point)` * `tree.remove_overlap(begin, end)` (removes all overlapping the range) * `tree.remove_envelop(begin, end)` (removes all enveloped in the range) * Overlap queries * `tree[point]` * `tree[begin:end]` * `tree.search(point)` * `tree.search(begin, end)` * Envelop queries * `tree.search(begin, end, strict=True)` * Membership queries * `interval_obj in tree` (this is fastest, O(1)) * `tree.containsi(begin, end, data)` * `tree.overlaps(point)` * `tree.overlaps(begin, end)` * Iterable * `for interval_obj in tree:` * `tree.items()` * Sizing * `len(tree)` * `tree.is_empty()` * `not tree` * `tree.begin()` (the `begin` coordinate of the leftmost interval) * `tree.end()` (the `end` coordinate of the rightmost interval) * Set-like operations * union * `result_tree = tree.union(iterable)` 
* `result_tree = tree1 | tree2` * `tree.update(iterable)` * `tree |= other_tree` * difference * `result_tree = tree.difference(iterable)` * `result_tree = tree1 - tree2` * `tree.difference_update(iterable)` * `tree -= other_tree` * intersection * `result_tree = tree.intersection(iterable)` * `result_tree = tree1 & tree2` * `tree.intersection_update(iterable)` * `tree &= other_tree` * symmetric difference * `result_tree = tree.symmetric_difference(iterable)` * `result_tree = tree1 ^ tree2` * `tree.symmetric_difference_update(iterable)` * `tree ^= other_tree` * comparison * `tree1.issubset(tree2)` or `tree1 <= tree2` * `tree1 < tree2` * `tree1.issuperset(tree2)` or `tree1 >= tree2` * `tree1 > tree2` * `tree1 == tree2` * Restructuring * `chop(begin, end)` (slice intervals and remove everything between `begin` and `end`) * `slice(point)` (slice intervals at `point`) * `split_overlaps()` (slice at all interval boundaries) * Copying and typecasting * `IntervalTree(tree)` (`Interval` objects are the same as those in the tree) * `tree.copy()` (`Interval` objects are shallow copies of those in the tree) * `set(tree)` (can later be fed into `IntervalTree()`) * `list(tree)` (ditto) * Pickle-friendly * Automatic AVL balancing Examples -------- * Getting started ``` python >>> from intervaltree import Interval, IntervalTree >>> t = IntervalTree() >>> t IntervalTree() ``` * Adding intervals - any object works! ``` python >>> t[1:2] = "1-2" >>> t[4:7] = (4, 7) >>> t[5:9] = {5: 9} ``` * Query by point The result of a query is a `set` object, so if ordering is important, you must sort it first. ``` python >>> sorted(t[6]) [Interval(4, 7, (4, 7)), Interval(5, 9, {5: 9})] >>> sorted(t[6])[0] Interval(4, 7, (4, 7)) ``` * Query by range Note that ranges are inclusive of the lower limit, but non-inclusive of the upper limit.
So: ``` python >>> sorted(t[2:4]) [] ``` But: ``` python >>> sorted(t[1:5]) [Interval(1, 2, '1-2'), Interval(4, 7, (4, 7))] ``` * Accessing an `Interval` object ``` python >>> iv = Interval(4, 7, (4, 7)) >>> iv.begin 4 >>> iv.end 7 >>> iv.data (4, 7) >>> begin, end, data = iv >>> begin 4 >>> end 7 >>> data (4, 7) ``` * Constructing from lists of intervals We could have made a similar tree this way: ``` python >>> ivs = [(1, 2), (4, 7), (5, 9)] >>> t = IntervalTree( ... Interval(begin, end, "%d-%d" % (begin, end)) for begin, end in ivs ... ) ``` Or, if we don't need the data fields: ``` python >>> t2 = IntervalTree(Interval(*iv) for iv in ivs) ``` * Removing intervals ``` python >>> t.remove( Interval(1, 2, "1-2") ) >>> sorted(t) [Interval(4, 7, '4-7'), Interval(5, 9, '5-9')] >>> t.remove( Interval(500, 1000, "Doesn't exist")) # raises ValueError Traceback (most recent call last): ValueError >>> t.discard(Interval(500, 1000, "Doesn't exist")) # quietly does nothing >>> del t[5] # same as t.remove_overlap(5) >>> t IntervalTree() ``` We could also empty a tree entirely: ``` python >>> t2.clear() >>> t2 IntervalTree() ``` Or remove intervals that overlap a range: ``` python >>> t = IntervalTree([ ... Interval(0, 10), ... Interval(10, 20), ... Interval(20, 30), ... Interval(30, 40)]) >>> t.remove_overlap(25, 35) >>> sorted(t) [Interval(0, 10), Interval(10, 20)] ``` We can also remove only those intervals completely enveloped in a range: ``` python >>> t.remove_envelop(5, 20) >>> sorted(t) [Interval(0, 10)] ``` * Chopping We could also chop out parts of the tree: ``` python >>> t = IntervalTree([Interval(0, 10)]) >>> t.chop(3, 7) >>> sorted(t) [Interval(0, 3), Interval(7, 10)] ``` To modify the new intervals' data fields based on which side of the interval is being chopped: ``` python >>> def datafunc(iv, islower): ... oldlimit = iv[islower] ... 
return "oldlimit: {0}, islower: {1}".format(oldlimit, islower) >>> t = IntervalTree([Interval(0, 10)]) >>> t.chop(3, 7, datafunc) >>> sorted(t)[0] Interval(0, 3, 'oldlimit: 10, islower: True') >>> sorted(t)[1] Interval(7, 10, 'oldlimit: 0, islower: False') ``` * Slicing You can also slice intervals in the tree without removing them: ``` python >>> t = IntervalTree([Interval(0, 10), Interval(5, 15)]) >>> t.slice(3) >>> sorted(t) [Interval(0, 3), Interval(3, 10), Interval(5, 15)] ``` You can also set the data fields, for example, re-using `datafunc()` from above: ``` python >>> t = IntervalTree([Interval(5, 15)]) >>> t.slice(10, datafunc) >>> sorted(t)[0] Interval(5, 10, 'oldlimit: 15, islower: True') >>> sorted(t)[1] Interval(10, 15, 'oldlimit: 5, islower: False') ``` Future improvements ------------------- See the [issue tracker][] on GitHub. Based on -------- * Eternally Confuzzled's [AVL tree][Confuzzled AVL tree] * Wikipedia's [Interval Tree][Wiki intervaltree] * Heavily modified from Tyler Kahn's [Interval Tree implementation in Python][Kahn intervaltree] ([GitHub project][Kahn intervaltree GH]) * Incorporates contributions from: * [konstantint/Konstantin Tretyakov][Konstantin intervaltree] of the University of Tartu (Estonia) * [siniG/Avi Gabay][siniG intervaltree] * [lmcarril/Luis M. Carril][lmcarril intervaltree] of the Karlsruhe Institute for Technology (Germany) Copyright --------- * [Chaim-Leib Halbert][GH], 2013-2015 * Modifications, [Konstantin Tretyakov][Konstantin intervaltree], 2014 Licensed under the [Apache License, version 2.0][Apache]. 
The source code for this project is at https://github.com/chaimleib/intervaltree [build status badge]: https://travis-ci.org/chaimleib/intervaltree.svg?branch=master [build status]: https://travis-ci.org/chaimleib/intervaltree [GH]: https://github.com/chaimleib/intervaltree [issue tracker]: https://github.com/chaimleib/intervaltree/issues [Konstantin intervaltree]: https://github.com/konstantint/PyIntervalTree [siniG intervaltree]: https://github.com/siniG/intervaltree [lmcarril intervaltree]: https://github.com/lmcarril/intervaltree [Confuzzled AVL tree]: http://www.eternallyconfuzzled.com/tuts/datastructures/jsw_tut_avl.aspx [Wiki intervaltree]: http://en.wikipedia.org/wiki/Interval_tree [Kahn intervaltree]: http://zurb.com/forrst/posts/Interval_Tree_implementation_in_python-e0K [Kahn intervaltree GH]: https://github.com/tylerkahn/intervaltree-python [Apache]: http://www.apache.org/licenses/LICENSE-2.0 intervaltree-2.1.0/README.rst0000644000076500000240000004011112522725110016737 0ustar chaimleibstaff00000000000000.. This file is automatically generated by setup.py from README.md and CHANGELOG.md. intervaltree ============ A mutable, self-balancing interval tree for Python 2 and 3. Queries may be by point, by range overlap, or by range envelopment. This library was designed to allow tagging text and time intervals, where the intervals include the lower bound but not the upper bound. Installing ---------- .. 
code:: sh pip install intervaltree Features -------- - Supports Python 2.6+ and Python 3.2+ - Initializing - blank ``tree = IntervalTree()`` - from an iterable of ``Interval`` objects (``tree = IntervalTree(intervals)``) - from an iterable of tuples (``tree = IntervalTree.from_tuples(interval_tuples)``) - Insertions - ``tree[begin:end] = data`` - ``tree.add(interval)`` - ``tree.addi(begin, end, data)`` - Deletions - ``tree.remove(interval)`` (raises ``ValueError`` if not present) - ``tree.discard(interval)`` (quiet if not present) - ``tree.removei(begin, end, data)`` (short for ``tree.remove(Interval(begin, end, data))``) - ``tree.discardi(begin, end, data)`` (short for ``tree.discard(Interval(begin, end, data))``) - ``tree.remove_overlap(point)`` - ``tree.remove_overlap(begin, end)`` (removes all overlapping the range) - ``tree.remove_envelop(begin, end)`` (removes all enveloped in the range) - Overlap queries - ``tree[point]`` - ``tree[begin:end]`` - ``tree.search(point)`` - ``tree.search(begin, end)`` - Envelop queries - ``tree.search(begin, end, strict=True)`` - Membership queries - ``interval_obj in tree`` (this is fastest, O(1)) - ``tree.containsi(begin, end, data)`` - ``tree.overlaps(point)`` - ``tree.overlaps(begin, end)`` - Iterable - ``for interval_obj in tree:`` - ``tree.items()`` - Sizing - ``len(tree)`` - ``tree.is_empty()`` - ``not tree`` - ``tree.begin()`` (the ``begin`` coordinate of the leftmost interval) - ``tree.end()`` (the ``end`` coordinate of the rightmost interval) - Set-like operations - union - ``result_tree = tree.union(iterable)`` - ``result_tree = tree1 | tree2`` - ``tree.update(iterable)`` - ``tree |= other_tree`` - difference - ``result_tree = tree.difference(iterable)`` - ``result_tree = tree1 - tree2`` - ``tree.difference_update(iterable)`` - ``tree -= other_tree`` - intersection - ``result_tree = tree.intersection(iterable)`` - ``result_tree = tree1 & tree2`` - ``tree.intersection_update(iterable)`` - ``tree &= other_tree`` - 
symmetric difference - ``result_tree = tree.symmetric_difference(iterable)`` - ``result_tree = tree1 ^ tree2`` - ``tree.symmetric_difference_update(iterable)`` - ``tree ^= other_tree`` - comparison - ``tree1.issubset(tree2)`` or ``tree1 <= tree2`` - ``tree1 < tree2`` - ``tree1.issuperset(tree2)`` or ``tree1 >= tree2`` - ``tree1 > tree2`` - ``tree1 == tree2`` - Restructuring - ``chop(begin, end)`` (slice intervals and remove everything between ``begin`` and ``end``) - ``slice(point)`` (slice intervals at ``point``) - ``split_overlaps()`` (slice at all interval boundaries) - Copying and typecasting - ``IntervalTree(tree)`` (``Interval`` objects are the same as those in the tree) - ``tree.copy()`` (``Interval`` objects are shallow copies of those in the tree) - ``set(tree)`` (can later be fed into ``IntervalTree()``) - ``list(tree)`` (ditto) - Pickle-friendly - Automatic AVL balancing Examples -------- - Getting started .. code:: python >>> from intervaltree import Interval, IntervalTree >>> t = IntervalTree() >>> t IntervalTree() - Adding intervals - any object works! .. code:: python >>> t[1:2] = "1-2" >>> t[4:7] = (4, 7) >>> t[5:9] = {5: 9} - Query by point The result of a query is a ``set`` object, so if ordering is important, you must sort it first. .. code:: python >>> sorted(t[6]) [Interval(4, 7, (4, 7)), Interval(5, 9, {5: 9})] >>> sorted(t[6])[0] Interval(4, 7, (4, 7)) - Query by range Note that ranges are inclusive of the lower limit, but non-inclusive of the upper limit. So: .. code:: python >>> sorted(t[2:4]) [] But: .. code:: python >>> sorted(t[1:5]) [Interval(1, 2, '1-2'), Interval(4, 7, (4, 7))] - Accessing an ``Interval`` object .. code:: python >>> iv = Interval(4, 7, (4, 7)) >>> iv.begin 4 >>> iv.end 7 >>> iv.data (4, 7) >>> begin, end, data = iv >>> begin 4 >>> end 7 >>> data (4, 7) - Constructing from lists of intervals We could have made a similar tree this way: .. code:: python >>> ivs = [(1, 2), (4, 7), (5, 9)] >>> t = IntervalTree( ...
Interval(begin, end, "%d-%d" % (begin, end)) for begin, end in ivs ... ) Or, if we don't need the data fields: .. code:: python >>> t2 = IntervalTree(Interval(*iv) for iv in ivs) - Removing intervals .. code:: python >>> t.remove( Interval(1, 2, "1-2") ) >>> sorted(t) [Interval(4, 7, '4-7'), Interval(5, 9, '5-9')] >>> t.remove( Interval(500, 1000, "Doesn't exist")) # raises ValueError Traceback (most recent call last): ValueError >>> t.discard(Interval(500, 1000, "Doesn't exist")) # quietly does nothing >>> del t[5] # same as t.remove_overlap(5) >>> t IntervalTree() We could also empty a tree entirely: .. code:: python >>> t2.clear() >>> t2 IntervalTree() Or remove intervals that overlap a range: .. code:: python >>> t = IntervalTree([ ... Interval(0, 10), ... Interval(10, 20), ... Interval(20, 30), ... Interval(30, 40)]) >>> t.remove_overlap(25, 35) >>> sorted(t) [Interval(0, 10), Interval(10, 20)] We can also remove only those intervals completely enveloped in a range: .. code:: python >>> t.remove_envelop(5, 20) >>> sorted(t) [Interval(0, 10)] - Chopping We could also chop out parts of the tree: .. code:: python >>> t = IntervalTree([Interval(0, 10)]) >>> t.chop(3, 7) >>> sorted(t) [Interval(0, 3), Interval(7, 10)] To modify the new intervals' data fields based on which side of the interval is being chopped: .. code:: python >>> def datafunc(iv, islower): ... oldlimit = iv[islower] ... return "oldlimit: {0}, islower: {1}".format(oldlimit, islower) >>> t = IntervalTree([Interval(0, 10)]) >>> t.chop(3, 7, datafunc) >>> sorted(t)[0] Interval(0, 3, 'oldlimit: 10, islower: True') >>> sorted(t)[1] Interval(7, 10, 'oldlimit: 0, islower: False') - Slicing You can also slice intervals in the tree without removing them: .. code:: python >>> t = IntervalTree([Interval(0, 10), Interval(5, 15)]) >>> t.slice(3) >>> sorted(t) [Interval(0, 3), Interval(3, 10), Interval(5, 15)] You can also set the data fields, for example, re-using ``datafunc()`` from above: .. 
code:: python >>> t = IntervalTree([Interval(5, 15)]) >>> t.slice(10, datafunc) >>> sorted(t)[0] Interval(5, 10, 'oldlimit: 15, islower: True') >>> sorted(t)[1] Interval(10, 15, 'oldlimit: 5, islower: False') Future improvements ------------------- See the issue tracker on GitHub. Based on -------- - Eternally Confuzzled's AVL tree - Wikipedia's Interval Tree - Heavily modified from Tyler Kahn's Interval Tree implementation in Python (GitHub project) - Incorporates contributions from: - konstantint/Konstantin Tretyakov of the University of Tartu (Estonia) - siniG/Avi Gabay - lmcarril/Luis M. Carril of the Karlsruhe Institute for Technology (Germany) Copyright --------- - Chaim-Leib Halbert, 2013-2015 - Modifications, Konstantin Tretyakov, 2014 Licensed under the Apache License, version 2.0. The source code for this project is at https://github.com/chaimleib/intervaltree Change log ========== Version 2.1.0 ------------- - Added: - ``merge_overlaps()`` method and tests - ``merge_equals()`` method and tests - ``range()`` method - ``span()`` method, for returning the difference between ``end()`` and ``begin()`` - Fixes: - Development version numbering is changing to be compliant with PEP440. Version numbering now contains major, minor and micro release numbers, plus the number of builds following the stable release version, e.g. 
2.0.4b34 - Speed improvement: ``begin()`` and ``end()`` methods used iterative ``min()`` and ``max()`` builtins instead of the more efficient ``iloc`` member available to ``SortedDict`` - ``overlaps()`` method used to return ``True`` even if provided null test interval - Maintainers: - Added coverage test (``make coverage``) with html report (``htmlcov/index.html``) - Tests run slightly faster Version 2.0.4 ------------- - Fix: Issue #27: README incorrectly showed using a comma instead of a colon when querying the ``IntervalTree``: it showed ``tree[begin, end]`` instead of ``tree[begin:end]`` Version 2.0.3 ------------- - Fix: README showed using + operator for setlike union instead of the correct \| operator - Removed tests from release package to speed up installation; to get the tests, download from GitHub Version 2.0.2 ------------- - Fix: Issue #20: performance enhancement for large trees. ``IntervalTree.search()`` made a copy of the entire ``boundary_table`` resulting in linear search time. 
The ``sortedcontainers`` package is now the sole install dependency Version 2.0.1 ------------- - Fix: Issue #26: failed to prune empty ``Node`` after a rotation promoted contents of ``s_center`` Version 2.0.0 ------------- - ``IntervalTree`` now supports the full ``collections.MutableSet`` API - Added: - ``__delitem__`` to ``IntervalTree`` - ``Interval`` comparison methods ``lt()``, ``gt()``, ``le()`` and ``ge()`` to ``Interval``, as an alternative to the comparison operators, which are designed for sorting - ``IntervalTree.from_tuples(iterable)`` - ``IntervalTree.clear()`` - ``IntervalTree.difference(iterable)`` - ``IntervalTree.difference_update(iterable)`` - ``IntervalTree.union(iterable)`` - ``IntervalTree.intersection(iterable)`` - ``IntervalTree.intersection_update(iterable)`` - ``IntervalTree.symmetric_difference(iterable)`` - ``IntervalTree.symmetric_difference_update(iterable)`` - ``IntervalTree.chop(a, b)`` - ``IntervalTree.slice(point)`` - Deprecated ``IntervalTree.extend()`` -- use ``update()`` instead - Internal improvements: - More verbose tests with progress bars - More tests for comparison and sorting behavior - Code in the README is included in the unit tests - Fixes: - BACKWARD INCOMPATIBLE: On ranged queries where ``begin >= end``, the query operated on the overlaps of ``begin``. This behavior was documented as expected in 1.x; it is now changed to be more consistent with the definition of ``Interval``\ s, which are half-open. - Issue #25: pruning empty Nodes with staggered descendants could result in invalid trees - Sorting ``Interval``\ s and numbers in the same list gathered all the numbers at the beginning and the ``Interval``\ s at the end - ``IntervalTree.overlaps()`` and friends returned ``None`` instead of ``False`` - Maintainers: ``make install-testpypi`` failed because the ``pip`` command was missing a ``--pre`` flag Version 1.1.1 ------------- - Removed requirement for pyandoc in order to run functionality tests.
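The half-open semantics behind the Version 2.0.0 and 2.1.0 fixes (ranged queries with ``begin >= end``, and ``overlaps()`` with a null test interval) can be sketched without the library. The function ``overlaps`` below is a hypothetical stand-alone helper on plain numbers, written under the assumption that a null range (``begin >= end``) contains no points; it is not the library's code.

```python
def overlaps(begin_a, end_a, begin_b, end_b):
    """Return True if the half-open ranges [begin_a, end_a) and
    [begin_b, end_b) share at least one point.

    Hypothetical helper for illustration only; not the intervaltree code.
    """
    # A null range (begin >= end) is empty, so it overlaps nothing.
    if begin_a >= end_a or begin_b >= end_b:
        return False
    # Each range must start before the other one ends.
    return begin_a < end_b and begin_b < end_a


print(overlaps(1, 2, 2, 4))  # False: 2 is excluded from [1, 2)
print(overlaps(4, 7, 1, 5))  # True: both contain 4
print(overlaps(4, 7, 6, 6))  # False: [6, 6) is a null range
```

This matches the README examples, where querying ``t[2:4]`` returns neither the interval ending at 2 nor the one beginning at 4.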
Version 1.1.0 ------------- - Added ability to use ``Interval.distance_to()`` with points, not just ``Intervals`` - Added documentation on return types to ``IntervalTree`` and ``Interval`` - ``Interval.__cmp__()`` works with points too - Fix: ``IntervalTree.score()`` returned maximum score of 0.5 instead of 1.0. Now returns max of subscores instead of avg - Internal improvements: - Development version numbering scheme based on ``git describe``: the "building towards" release is appended after a hyphen, e.g. 1.0.2-37-g2da2ef0-1.1.0. Here the previous tagged release is 1.0.2, there have been 37 commits since then, the current commit is g2da2ef0, and we are getting ready for a 1.1.0 release - Optimality tests added - ``Interval`` overlap tests for ranges, ``Interval``\ s and points added Version 1.0.2 ------------- - Bug fixes: - ``Node.depth_score_helper()`` raised ``AttributeError`` - README formatting Version 1.0.1 ------------- - Fix: pip install failure because of failure to generate README.rst Version 1.0.0 ------------- - Renamed from PyIntervalTree to intervaltree - Speed improvements for adding and removing Intervals (~70% faster than 0.4) - Bug fixes: - BACKWARD INCOMPATIBLE: ``len()`` of an ``Interval`` is always 3, reverting to default behavior for ``namedtuples``. In Python 3, ``len`` returning a non-integer raises an exception. Instead, use ``Interval.length()``, which returns 0 for null intervals and ``end - begin`` otherwise. Also, if ``len() == 0``, then ``not iv`` is ``True``.
- When inserting an ``Interval`` via ``__setitem__`` and improper parameters given, all errors were transformed to ``IndexError`` - ``split_overlaps`` did not update the ``boundary_table`` counts - Internal improvements: - More robust local testing tools - Long series of interdependent tests have been separated into sections Version 0.4 ----------- - Faster balancing (~80% faster) - Bug fixes: - Double rotations were performed in place of a single rotation when presented an unbalanced Node with a balanced child. - During single rotation, kept referencing an unrotated Node instead of the new, rotated one Version 0.3.3 ------------- - Made IntervalTree crash if inited with a null Interval (end <= begin) - IntervalTree raises ValueError instead of AssertionError when a null Interval is inserted Version 0.3.2 ------------- - Support for Python 3.2+ and 2.6+ - Changed license from LGPL to more permissive Apache license - Merged changes from https://github.com/konstantint/PyIntervalTree to https://github.com/chaimleib/PyIntervalTree - Interval now inherits from a namedtuple. Benefits: should be faster. Drawbacks: slight behavioural change (Intervals not mutable anymore). - Added float tests - Use setup.py for tests - Automatic testing via travis-ci - Removed dependency on six - Interval improvements: - Intervals without data have a cleaner string representation - Intervals without data are pickled more compactly - Better hashing - Intervals are ordered by begin, then end, then by data. 
If data is not orderable, sorts by type(data) - Bug fixes: - Fixed crash when querying empty tree - Fixed missing close parenthesis in examples - Made IntervalTree crash earlier if a null Interval is added - Internals: - New test directory - Nicer display of data structures for debugging, using custom test/pprint.py (Python 2.6, 2.7) - More sensitive exception handling - Local script to test in all supported versions of Python - Added IntervalTree.score() to measure how optimally a tree is structured Version 0.2.3 ------------- - Slight changes for inclusion in PyPI. - Some documentation changes - Added tests - Bug fix: interval addition via [] was broken in Python 2.7 (see http://bugs.python.org/issue21785) - Added intervaltree.bio subpackage, adding some utilities for use in bioinformatics Version 0.2.2b -------------- - Forked from https://github.com/MusashiAharon/PyIntervalTree intervaltree-2.1.0/setup.cfg0000644000076500000240000000034312522725110017074 0ustar chaimleibstaff00000000000000[egg_info] tag_build = tag_svn_revision = 0 tag_date = 0 [pytest] addopts = --doctest-modules --doctest-glob='README.md' --ignore=setup.py --ignore=*.pyc norecursedirs = *.egg* *doc* .* _* htmlcov scripts dist bin test/data intervaltree-2.1.0/setup.py0000644000076500000240000001707112522724400016774 0ustar chaimleibstaff00000000000000""" intervaltree: A mutable, self-balancing interval tree for Python 2 and 3. Queries may be by point, by range overlap, or by range envelopment. Distribution logic Note that "python setup.py test" invokes pytest on the package. With appropriately configured setup.cfg, this will check both xxx_test modules and docstrings. Copyright 2013-2015 Chaim-Leib Halbert Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. 
You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. """ import os import errno import sys import subprocess from warnings import warn from setuptools import setup from setuptools.command.test import test as TestCommand import re ## CONFIG target_version = '2.1.0' create_rst = True def development_version_number(): p = subprocess.Popen('git describe'.split(), stdout=subprocess.PIPE) git_describe = p.communicate()[0].decode('utf-8').strip() release, build, commitish = git_describe.split('-') result = "{0}b{1}".format(release, build) return result is_dev_version = 'PYPI' in os.environ and os.environ['PYPI'] == 'pypitest' if is_dev_version: version = development_version_number() else: # This is a RELEASE version version = target_version print("Version: " + version) if is_dev_version: print("This is a DEV version.") print("Target: " + target_version) ## Filesystem utilities def read_file(path): """Reads file into string.""" with open(path, 'r') as f: data = f.read() return data def mkdir_p(path): """Like `mkdir -p` in unix""" if not path.strip(): return try: os.makedirs(path) except OSError as e: if e.errno == errno.EEXIST and os.path.isdir(path): pass else: raise def rm_f(path): """Like `rm -f` in unix""" try: os.unlink(path) except OSError as e: if e.errno == errno.ENOENT: pass else: raise def update_file(path, data): """Writes data to path, creating path if it doesn't exist""" # delete file if already exists rm_f(path) # create parent dirs if needed parent_dir = os.path.dirname(path) if not os.path.isdir(parent_dir): mkdir_p(parent_dir) # write file with open(path, 'w') as f: f.write(data) ## PyTest # This is a plug-in for setuptools
that will invoke py.test # when you run python setup.py test class PyTest(TestCommand): def finalize_options(self): TestCommand.finalize_options(self) self.test_args = [] self.test_suite = True def run_tests(self): import pytest # import here, because outside the required eggs aren't loaded yet sys.exit(pytest.main(self.test_args)) def get_rst(): if os.path.isdir('pyandoc/pandoc') and os.path.islink('pandoc'): print("Generating README.rst from README.md and CHANGELOG.md") return generate_rst() elif os.path.isfile('README.rst'): print("Reading README.rst") return read_file('README.rst') else: warn("No README.rst found!") print("Reading README.md") data = ''.join([ read_file('README.md'), '\n', read_file('CHANGELOG.md'), ]) return data ## Convert README to rst for PyPI def generate_rst(): """Converts Markdown to RST for PyPI""" md = read_file("README.md") md = pypi_sanitize_markdown(md) rst = markdown2rst(md) rst = pypi_prepare_rst(rst) changes_md = pypi_sanitize_markdown(read_file("CHANGELOG.md")) changes_rst = markdown2rst(changes_md) rst += "\n" + changes_rst # Write it if create_rst: update_file('README.rst', rst) else: rm_f('README.rst') return rst def markdown2rst(md): """Convert markdown to rst format using pandoc. No other processing.""" # import here, because outside it might not be used try: import pandoc except ImportError as e: raise else: pandoc.PANDOC_PATH = 'pandoc' # until pyandoc gets updated doc = pandoc.Document() doc.markdown_github = md rst = doc.rst return rst ## Sanitizers def pypi_sanitize_markdown(md): """Prepare markdown for conversion to PyPI rst""" md = chop_markdown_header(md) md = remove_markdown_links(md) return md def pypi_prepare_rst(rst): """Add a notice that the rst was auto-generated""" head = """\ .. This file is automatically generated by setup.py from README.md and CHANGELOG.md. """ rst = head + rst return rst def chop_markdown_header(md): """ Remove empty lines and travis-ci header from markdown string.
    :param md: input markdown string
    :type md: str
    :return: simplified markdown string data
    :rtype: str
    """
    md = md.splitlines()
    # stop when no lines remain, so all-blank input cannot raise IndexError
    while md and (not md[0].strip() or md[0].startswith('[![')):
        md = md[1:]
    md = '\n'.join(md)
    return md


def remove_markdown_links(md):
    """PyPI doesn't like links, so we remove them."""
    # named links, e.g. [hello][url to hello] or [hello][]
    md = re.sub(
        r'\[((?:[^\]]|\\\])+)\]'    # link text
        r'\[((?:[^\]]|\\\])*)\]',   # link name
        '\\1',
        md
    )

    # url links, e.g. [example.com](http://www.example.com)
    # The url part stops at the first unescaped ')', so parenthesized
    # text later on the same line is left alone.
    md = re.sub(
        r'\[((?:[^\]]|\\\])+)\]'    # link text
        r'\(((?:[^)]|\\\))*)\)',    # link url
        '\\1',
        md
    )

    return md


## Run setuptools
setup(
    name='intervaltree',
    version=version,
    install_requires=['sortedcontainers'],
    description='Editable interval tree data structure for Python 2 and 3',
    long_description=get_rst(),
    classifiers=[  # Get strings from http://pypi.python.org/pypi?%3Aaction=list_classifiers
        'Development Status :: 4 - Beta',
        'Intended Audience :: Developers',
        'Intended Audience :: Information Technology',
        'Intended Audience :: Science/Research',
        'Programming Language :: Python',
        'Programming Language :: Python :: 2',
        'Programming Language :: Python :: 2.6',
        'Programming Language :: Python :: 2.7',
        'Programming Language :: Python :: 3',
        'Programming Language :: Python :: 3.2',
        'Programming Language :: Python :: 3.3',
        'Programming Language :: Python :: 3.4',
        'License :: OSI Approved :: Apache Software License',
        'Topic :: Scientific/Engineering :: Artificial Intelligence',
        'Topic :: Scientific/Engineering :: Bio-Informatics',
        'Topic :: Scientific/Engineering :: Information Analysis',
        'Topic :: Software Development :: Libraries',
        'Topic :: Text Processing :: General',
        'Topic :: Text Processing :: Linguistic',
        'Topic :: Text Processing :: Markup',
    ],
    keywords="interval-tree data-structure intervals tree",  # Separate with spaces
    author='Chaim-Leib Halbert, Konstantin Tretyakov',
    author_email='chaim.leib.halbert@gmail.com',
    url='https://github.com/chaimleib/intervaltree',
    download_url='https://github.com/chaimleib/intervaltree/tarball/' + version,
    license="Apache License, Version 2.0",
    packages=["intervaltree"],
    include_package_data=True,
    zip_safe=True,
    tests_require=['pytest'],
    cmdclass={'test': PyTest},
    entry_points={}
)