cigar-0.1.3/

cigar-0.1.3/setup.cfg
[egg_info]
tag_build =
tag_date = 0
tag_svn_revision = 0

cigar-0.1.3/MANIFEST.in
include README.md
include LICENSE
include ez_setup.py

cigar-0.1.3/LICENSE
The MIT License (MIT)

Copyright (c) 2013 Brent Pedersen - Bioinformatics

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

cigar-0.1.3/PKG-INFO
Metadata-Version: 1.1
Name: cigar
Version: 0.1.3
Summary: manipulate SAM cigar strings
Home-page: https://github.com/brentp/cigar
Author: Brent Pedersen
Author-email: bpederse@gmail.com
License: MIT
Description: Cigar
        =====

        cigar is a simple library for dealing with cigar strings.
        The most useful feature now is soft-masking from the left or right.
        This allows one to adjust a SAM record by changing only the cigar string
        to soft-mask a number of bases, so that the rest of the SAM record
        (pos, tlen, etc.) remains valid, but downstream tools will not consider
        the soft-masked bases in further analysis.

        ```Python
        >>> from cigar import Cigar
        >>> c = Cigar('100M')
        >>> len(c)
        100
        >>> str(c)
        '100M'
        >>> list(c.items())
        [(100, 'M')]

        >>> c = Cigar('20H20M20S')
        >>> len(c)
        40
        >>> str(c)
        '20H20M20S'
        >>> list(c.items())
        [(20, 'H'), (20, 'M'), (20, 'S')]

        >>> c.mask_left(29).cigar, c.cigar
        ('20H9S11M20S', '20H20M20S')

        >>> c = Cigar('10M20S10M')
        >>> c.mask_left(10).cigar
        '30S10M'
        >>> c.mask_left(9).cigar
        '9S1M20S10M'
        >>> Cigar('10S').mask_left(10).cigar
        '10S'
        >>> Cigar('10H').mask_left(10).cigar
        '10H'
        >>> Cigar('10H').mask_left(11).cigar
        '10H'
        >>> Cigar('10H').mask_left(9).cigar
        '10H'
        >>> Cigar('1M10H').mask_left(9).cigar
        '1S10H'
        >>> Cigar('5M10H').mask_left(9).cigar
        '5S10H'

        >>> c = Cigar('1S1H1S5H1S5M10H')
        >>> c.mask_left(9).cigar == c.cigar
        True

        >>> c = Cigar('1S1H1S5H1S5M10H')
        >>> c.mask_right(9).cigar == c.cigar
        True
        >>> c.mask_right(11).cigar
        '1S1H1S5H1S4M1S10H'
        ```

        Installation
        ============

            pip install cigar

Platform: UNKNOWN
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 3

cigar-0.1.3/README.md
Cigar
=====

cigar is a simple library for dealing with cigar strings.
The most useful feature now is soft-masking from the left or right.
This allows one to adjust a SAM record by changing only the cigar string
to soft-mask a number of bases, so that the rest of the SAM record
(pos, tlen, etc.) remains valid, but downstream tools will not consider
the soft-masked bases in further analysis.
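
As a rough illustration of what "changing only the cigar string" means, here is a minimal sketch that does not use this library's implementation. The helpers `parse_cigar`, `query_length` and `replace_cigar` are hypothetical names introduced for this example, and the SAM record is made up. The only assumptions are that the cigar is the sixth tab-separated SAM field and that M, I, S, = and X are the operations that consume read bases.

```python
import re

# Hypothetical helpers for illustration only; they are NOT part of the cigar API.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
READ_CONSUMING = set("MIS=X")  # ops that consume read (query) bases


def parse_cigar(cigar):
    """Split a cigar string into (length, op) pairs; '*' means unavailable."""
    if cigar == "*":
        return []
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def query_length(cigar):
    """Number of read bases covered by the cigar (sum of M, I, S, =, X)."""
    return sum(n for n, op in parse_cigar(cigar) if op in READ_CONSUMING)


def replace_cigar(sam_line, new_cigar):
    """Return the SAM line with only its CIGAR column (field 6) replaced."""
    fields = sam_line.rstrip("\n").split("\t")
    if query_length(new_cigar) != query_length(fields[5]):
        raise ValueError("new cigar must cover the same number of read bases")
    fields[5] = new_cigar
    return "\t".join(fields)


# A made-up 20-base record: QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
record = "r1\t0\tchr1\t100\t60\t20M\t*\t0\t0\t" + "A" * 20 + "\t" + "I" * 20
# Soft-mask the last 5 bases by hand; every other column is left untouched.
masked = replace_cigar(record, "15M5S")
print(masked.split("\t")[5])
```

With this library, the replacement string would come from a call such as `Cigar('20M').mask_right(5).cigar` rather than being written by hand.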

```Python
>>> from cigar import Cigar
>>> c = Cigar('100M')
>>> len(c)
100
>>> str(c)
'100M'
>>> list(c.items())
[(100, 'M')]

>>> c = Cigar('20H20M20S')
>>> len(c)
40
>>> str(c)
'20H20M20S'
>>> list(c.items())
[(20, 'H'), (20, 'M'), (20, 'S')]

>>> c.mask_left(29).cigar, c.cigar
('20H9S11M20S', '20H20M20S')

>>> c = Cigar('10M20S10M')
>>> c.mask_left(10).cigar
'30S10M'
>>> c.mask_left(9).cigar
'9S1M20S10M'
>>> Cigar('10S').mask_left(10).cigar
'10S'
>>> Cigar('10H').mask_left(10).cigar
'10H'
>>> Cigar('10H').mask_left(11).cigar
'10H'
>>> Cigar('10H').mask_left(9).cigar
'10H'
>>> Cigar('1M10H').mask_left(9).cigar
'1S10H'
>>> Cigar('5M10H').mask_left(9).cigar
'5S10H'

>>> c = Cigar('1S1H1S5H1S5M10H')
>>> c.mask_left(9).cigar == c.cigar
True

>>> c = Cigar('1S1H1S5H1S5M10H')
>>> c.mask_right(9).cigar == c.cigar
True
>>> c.mask_right(11).cigar
'1S1H1S5H1S4M1S10H'
```

Installation
============

    pip install cigar

cigar-0.1.3/cigar.py
"""
cigar is a simple library for dealing with cigar strings.
The most useful feature now is soft-masking from the left or right.
This allows one to adjust a SAM record by changing only the cigar string
to soft-mask a number of bases, so that the rest of the SAM record
(pos, tlen, etc.) remains valid, but downstream tools will not consider
the soft-masked bases in further analysis.

>>> c = Cigar('100M')
>>> len(c)
100
>>> str(c)
'100M'
>>> list(c.items())
[(100, 'M')]

>>> c = Cigar('20H20M20S')
>>> len(c)
40
>>> str(c)
'20H20M20S'
>>> list(c.items())
[(20, 'H'), (20, 'M'), (20, 'S')]

>>> c.mask_left(29).cigar, c.cigar
('20H9S11M20S', '20H20M20S')

>>> c = Cigar('10M20S10M')
>>> c.mask_left(10).cigar
'30S10M'
>>> c.mask_left(9).cigar
'9S1M20S10M'
>>> Cigar('10S').mask_left(10).cigar
'10S'
>>> Cigar('10H').mask_left(10).cigar
'10H'
>>> Cigar('10H').mask_left(11).cigar
'10H'
>>> Cigar('10H').mask_left(9).cigar
'10H'
>>> Cigar('1M10H').mask_left(9).cigar
'1S10H'
>>> Cigar('5M10H').mask_left(9).cigar
'5S10H'

>>> c = Cigar('1S1H1S5H1S5M10H')
>>> c.mask_left(9).cigar == c.cigar
True

>>> c = Cigar('1S1H1S5H1S5M10H')
>>> c.mask_right(9).cigar == c.cigar
True
>>> c.mask_right(11).cigar
'1S1H1S5H1S4M1S10H'
"""
from __future__ import print_function
from itertools import groupby
from operator import itemgetter

__version__ = "0.1.3"


class Cigar(object):
    # Operations that consume bases of the read and of the reference.
    read_consuming_ops = ("M", "I", "S", "=", "X")
    ref_consuming_ops = ("M", "D", "N", "=", "X")

    def __init__(self, cigar_string):
        self.cigar = cigar_string

    def items(self):
        if self.cigar == "*":
            yield (0, None)
            # A bare return ends the generator; `raise StopIteration` inside a
            # generator becomes a RuntimeError on Python 3.7+ (PEP 479).
            return
        # groupby alternates between runs of digits and runs of op characters.
        cig_iter = groupby(self.cigar, lambda c: c.isdigit())
        for g, n in cig_iter:
            yield int("".join(n)), "".join(next(cig_iter)[1])

    def __str__(self):
        return self.cigar

    def __repr__(self):
        return "Cigar('%s')" % self

    def __len__(self):
        """
        sum of MIS=X ops shall equal the sequence length.
        """
        return sum(l for l, op in self.items()
                   if op in Cigar.read_consuming_ops)

    def reference_length(self):
        return sum(l for l, op in self.items()
                   if op in Cigar.ref_consuming_ops)

    def mask_left(self, n_seq_bases, mask="S"):
        """
        Return a new cigar with cigar string where the first `n_seq_bases` are
        soft-masked unless they are already hard-masked.
        """
        cigs = list(self.items())
        new_cigs = []

        cum_len = 0
        for i, (l, op) in enumerate(cigs):
            if op in Cigar.read_consuming_ops:
                cum_len += l
            if op == "H":
                cum_len += l
                new_cigs.append(cigs[i])
            elif cum_len < n_seq_bases:
                new_cigs.append(cigs[i])
            else:
                # the current cigar element is split by the masking.
                right_extra = cum_len - n_seq_bases
                new_cigs.append((l - right_extra, 'S'))
                if right_extra != 0:
                    new_cigs.append((right_extra, cigs[i][1]))
            if cum_len >= n_seq_bases:
                break

        # everything before the split element becomes soft-masked,
        # except ops that are already H or S.
        new_cigs[:i] = [(l, op if op in "HS" else "S")
                        for l, op in new_cigs[:i]]
        new_cigs.extend(cigs[i + 1:])
        return Cigar(Cigar.string_from_elements(new_cigs)).merge_like_ops()

    @classmethod
    def string_from_elements(cls, elements):
        # zero-length elements are dropped from the output string.
        return "".join("%i%s" % (l, op) for l, op in elements if l != 0)

    def mask_right(self, n_seq_bases, mask="S"):
        """
        Return a new cigar with cigar string where the last `n_seq_bases` are
        soft-masked unless they are already hard-masked.
        """
        # reverse, mask from the left, then reverse back.
        return Cigar(Cigar(self._reverse_cigar()).mask_left(n_seq_bases,
                                                            mask)._reverse_cigar())

    def _reverse_cigar(self):
        return Cigar.string_from_elements(list(self.items())[::-1])

    def merge_like_ops(self):
        """
        >>> Cigar("1S20M").merge_like_ops()
        Cigar('1S20M')
        >>> Cigar("1S1S20M").merge_like_ops()
        Cigar('2S20M')
        >>> Cigar("1S1S1S20M").merge_like_ops()
        Cigar('3S20M')
        >>> Cigar("1S1S1S20M1S1S").merge_like_ops()
        Cigar('3S20M2S')
        """
        cigs = []
        for op, grps in groupby(self.items(), itemgetter(1)):
            cigs.append((sum(g[0] for g in grps), op))
        return Cigar(self.string_from_elements(cigs))


if __name__ == "__main__":
    import doctest
    doctest.testmod()

cigar-0.1.3/ez_setup.py
#!python
"""Bootstrap setuptools installation

To use setuptools in your package's setup.py, include this
file in the same directory and add this to the top of your setup.py::

    from ez_setup import use_setuptools
    use_setuptools()

To require a specific version of setuptools, set a
download mirror, or use an alternate download directory, simply supply
the appropriate options to ``use_setuptools()``.

This file can also be run as a script to install or upgrade setuptools.
"""
import os
import shutil
import sys
import tempfile
import tarfile
import optparse
import subprocess
import platform
import textwrap

from distutils import log

try:
    from site import USER_SITE
except ImportError:
    USER_SITE = None

DEFAULT_VERSION = "2.0.1"
DEFAULT_URL = "https://pypi.python.org/packages/source/s/setuptools/"

def _python_cmd(*args):
    args = (sys.executable,) + args
    return subprocess.call(args) == 0

def _install(tarball, install_args=()):
    # extracting the tarball
    tmpdir = tempfile.mkdtemp()
    log.warn('Extracting in %s', tmpdir)
    old_wd = os.getcwd()
    try:
        os.chdir(tmpdir)
        tar = tarfile.open(tarball)
        _extractall(tar)
        tar.close()

        # going in the directory
        subdir = os.path.join(tmpdir, os.listdir(tmpdir)[0])
        os.chdir(subdir)
        log.warn('Now working in %s', subdir)

        # installing
        log.warn('Installing Setuptools')
        if not _python_cmd('setup.py', 'install', *install_args):
            log.warn('Something went wrong during the installation.')
            log.warn('See the error message above.')
            # exitcode will be 2
            return 2
    finally:
        os.chdir(old_wd)
        shutil.rmtree(tmpdir)


def _build_egg(egg, tarball, to_dir):
    # extracting the tarball
    tmpdir = tempfile.mkdtemp()
    log.warn('Extracting in %s', tmpdir)
    old_wd = os.getcwd()
    try:
        os.chdir(tmpdir)
        tar = tarfile.open(tarball)
        _extractall(tar)
        tar.close()

        # going in the directory
        subdir = os.path.join(tmpdir, os.listdir(tmpdir)[0])
        os.chdir(subdir)
        log.warn('Now working in %s', subdir)

        # building an egg
        log.warn('Building a Setuptools egg in %s', to_dir)
        _python_cmd('setup.py', '-q', 'bdist_egg', '--dist-dir', to_dir)

    finally:
        os.chdir(old_wd)
        shutil.rmtree(tmpdir)
    # returning the result
    log.warn(egg)
    if not os.path.exists(egg):
        raise IOError('Could not build the egg.')


def _do_download(version, download_base, to_dir, download_delay):
    egg = os.path.join(to_dir,
                       'setuptools-%s-py%d.%d.egg'
                       % (version, sys.version_info[0], sys.version_info[1]))
    if not os.path.exists(egg):
        tarball = download_setuptools(version, download_base,
                                      to_dir, download_delay)
        _build_egg(egg, tarball, to_dir)
    sys.path.insert(0, egg)

    # Remove previously-imported pkg_resources if present (see
    # https://bitbucket.org/pypa/setuptools/pull-request/7/ for details).
    if 'pkg_resources' in sys.modules:
        del sys.modules['pkg_resources']

    import setuptools
    setuptools.bootstrap_install_from = egg


def use_setuptools(version=DEFAULT_VERSION, download_base=DEFAULT_URL,
                   to_dir=os.curdir, download_delay=15):
    to_dir = os.path.abspath(to_dir)
    rep_modules = 'pkg_resources', 'setuptools'
    imported = set(sys.modules).intersection(rep_modules)
    try:
        import pkg_resources
    except ImportError:
        return _do_download(version, download_base, to_dir, download_delay)
    try:
        pkg_resources.require("setuptools>=" + version)
        return
    except pkg_resources.DistributionNotFound:
        return _do_download(version, download_base, to_dir, download_delay)
    except pkg_resources.VersionConflict as VC_err:
        if imported:
            msg = textwrap.dedent("""
                The required version of setuptools (>={version}) is not available,
                and can't be installed while this script is running. Please
                install a more recent version first, using
                'easy_install -U setuptools'.

                (Currently using {VC_err.args[0]!r})
                """).format(VC_err=VC_err, version=version)
            sys.stderr.write(msg)
            sys.exit(2)

        # otherwise, reload ok
        del pkg_resources, sys.modules['pkg_resources']
        return _do_download(version, download_base, to_dir, download_delay)

def _clean_check(cmd, target):
    """
    Run the command to download target. If the command fails, clean up before
    re-raising the error.
    """
    try:
        subprocess.check_call(cmd)
    except subprocess.CalledProcessError:
        if os.access(target, os.F_OK):
            os.unlink(target)
        raise

def download_file_powershell(url, target):
    """
    Download the file at url to target using Powershell (which will validate
    trust).
    Raise an exception if the command cannot complete.
    """
    target = os.path.abspath(target)
    cmd = [
        'powershell',
        '-Command',
        "(new-object System.Net.WebClient).DownloadFile(%(url)r, %(target)r)" % vars(),
    ]
    _clean_check(cmd, target)

def has_powershell():
    if platform.system() != 'Windows':
        return False
    cmd = ['powershell', '-Command', 'echo test']
    devnull = open(os.path.devnull, 'wb')
    try:
        try:
            subprocess.check_call(cmd, stdout=devnull, stderr=devnull)
        except:
            return False
    finally:
        devnull.close()
    return True

download_file_powershell.viable = has_powershell

def download_file_curl(url, target):
    cmd = ['curl', url, '--silent', '--output', target]
    _clean_check(cmd, target)

def has_curl():
    cmd = ['curl', '--version']
    devnull = open(os.path.devnull, 'wb')
    try:
        try:
            subprocess.check_call(cmd, stdout=devnull, stderr=devnull)
        except:
            return False
    finally:
        devnull.close()
    return True

download_file_curl.viable = has_curl

def download_file_wget(url, target):
    cmd = ['wget', url, '--quiet', '--output-document', target]
    _clean_check(cmd, target)

def has_wget():
    cmd = ['wget', '--version']
    devnull = open(os.path.devnull, 'wb')
    try:
        try:
            subprocess.check_call(cmd, stdout=devnull, stderr=devnull)
        except:
            return False
    finally:
        devnull.close()
    return True

download_file_wget.viable = has_wget

def download_file_insecure(url, target):
    """
    Use Python to download the file, even though it cannot authenticate the
    connection.
    """
    try:
        from urllib.request import urlopen
    except ImportError:
        from urllib2 import urlopen
    src = dst = None
    try:
        src = urlopen(url)
        # Read/write all in one block, so we don't create a corrupt file
        # if the download is interrupted.
        data = src.read()
        dst = open(target, "wb")
        dst.write(data)
    finally:
        if src:
            src.close()
        if dst:
            dst.close()

download_file_insecure.viable = lambda: True

def get_best_downloader():
    downloaders = [
        download_file_powershell,
        download_file_curl,
        download_file_wget,
        download_file_insecure,
    ]

    for dl in downloaders:
        if dl.viable():
            return dl

def download_setuptools(version=DEFAULT_VERSION, download_base=DEFAULT_URL,
                        to_dir=os.curdir, delay=15,
                        downloader_factory=get_best_downloader):
    """Download setuptools from a specified location and return its filename

    `version` should be a valid setuptools version number that is available
    as an egg for download under the `download_base` URL (which should end
    with a '/'). `to_dir` is the directory where the egg will be downloaded.
    `delay` is the number of seconds to pause before an actual download
    attempt.

    ``downloader_factory`` should be a function taking no arguments and
    returning a function for downloading a URL to a target.
    """
    # making sure we use the absolute path
    to_dir = os.path.abspath(to_dir)
    tgz_name = "setuptools-%s.tar.gz" % version
    url = download_base + tgz_name
    saveto = os.path.join(to_dir, tgz_name)
    if not os.path.exists(saveto):  # Avoid repeated downloads
        log.warn("Downloading %s", url)
        downloader = downloader_factory()
        downloader(url, saveto)
    return os.path.realpath(saveto)


def _extractall(self, path=".", members=None):
    """Extract all members from the archive to the current working
       directory and set owner, modification time and permissions on
       directories afterwards. `path' specifies a different directory
       to extract to. `members' is optional and must be a subset of the
       list returned by getmembers().
    """
    import copy
    import operator
    from tarfile import ExtractError
    directories = []

    if members is None:
        members = self

    for tarinfo in members:
        if tarinfo.isdir():
            # Extract directories with a safe mode.
            directories.append(tarinfo)
            tarinfo = copy.copy(tarinfo)
            tarinfo.mode = 448  # decimal for oct 0700
        self.extract(tarinfo, path)

    # Reverse sort directories.
    directories.sort(key=operator.attrgetter('name'), reverse=True)

    # Set correct owner, mtime and filemode on directories.
    for tarinfo in directories:
        dirpath = os.path.join(path, tarinfo.name)
        try:
            self.chown(tarinfo, dirpath)
            self.utime(tarinfo, dirpath)
            self.chmod(tarinfo, dirpath)
        except ExtractError as e:
            if self.errorlevel > 1:
                raise
            else:
                self._dbg(1, "tarfile: %s" % e)


def _build_install_args(options):
    """
    Build the arguments to 'python setup.py install' on the setuptools package
    """
    return ['--user'] if options.user_install else []

def _parse_args():
    """
    Parse the command line for options
    """
    parser = optparse.OptionParser()
    parser.add_option(
        '--user', dest='user_install', action='store_true', default=False,
        help='install in user site package (requires Python 2.6 or later)')
    parser.add_option(
        '--download-base', dest='download_base', metavar="URL",
        default=DEFAULT_URL,
        help='alternative URL from where to download the setuptools package')
    parser.add_option(
        '--insecure', dest='downloader_factory', action='store_const',
        const=lambda: download_file_insecure, default=get_best_downloader,
        help='Use internal, non-validating downloader'
    )
    options, args = parser.parse_args()
    # positional arguments are ignored
    return options

def main(version=DEFAULT_VERSION):
    """Install or upgrade setuptools and EasyInstall"""
    options = _parse_args()
    tarball = download_setuptools(download_base=options.download_base,
                                  downloader_factory=options.downloader_factory)
    return _install(tarball, _build_install_args(options))

if __name__ == '__main__':
    sys.exit(main())

cigar-0.1.3/setup.py
import ez_setup
import sys
ez_setup.use_setuptools()

from setuptools import setup

# from mpld3
def get_version(path):
    """Get the version info from package without importing
    it"""
    import ast

    with open(path) as init_file:
        module = ast.parse(init_file.read())

    version = (ast.literal_eval(node.value) for node in ast.walk(module)
               if isinstance(node, ast.Assign)
               and node.targets[0].id == "__version__")
    try:
        return next(version)
    except StopIteration:
        raise ValueError("version could not be located")

setup(name='cigar',
      version=get_version("cigar.py"),
      description="manipulate SAM cigar strings",
      py_modules=['cigar'],
      author="Brent Pedersen",
      author_email="bpederse@gmail.com",
      url="https://github.com/brentp/cigar",
      license="MIT",
      long_description=open('README.md').read(),
      classifiers=[
          'Topic :: Scientific/Engineering :: Bio-Informatics',
          'Programming Language :: Python :: 2',
          'Programming Language :: Python :: 3'
      ],
      )

cigar-0.1.3/cigar.egg-info/

cigar-0.1.3/cigar.egg-info/PKG-INFO
Metadata-Version: 1.1
Name: cigar
Version: 0.1.3
Summary: manipulate SAM cigar strings
Home-page: https://github.com/brentp/cigar
Author: Brent Pedersen
Author-email: bpederse@gmail.com
License: MIT
Description: Cigar
        =====

        cigar is a simple library for dealing with cigar strings.
        The most useful feature now is soft-masking from the left or right.
        This allows one to adjust a SAM record by changing only the cigar string
        to soft-mask a number of bases, so that the rest of the SAM record
        (pos, tlen, etc.) remains valid, but downstream tools will not consider
        the soft-masked bases in further analysis.

        ```Python
        >>> from cigar import Cigar
        >>> c = Cigar('100M')
        >>> len(c)
        100
        >>> str(c)
        '100M'
        >>> list(c.items())
        [(100, 'M')]

        >>> c = Cigar('20H20M20S')
        >>> len(c)
        40
        >>> str(c)
        '20H20M20S'
        >>> list(c.items())
        [(20, 'H'), (20, 'M'), (20, 'S')]

        >>> c.mask_left(29).cigar, c.cigar
        ('20H9S11M20S', '20H20M20S')

        >>> c = Cigar('10M20S10M')
        >>> c.mask_left(10).cigar
        '30S10M'
        >>> c.mask_left(9).cigar
        '9S1M20S10M'
        >>> Cigar('10S').mask_left(10).cigar
        '10S'
        >>> Cigar('10H').mask_left(10).cigar
        '10H'
        >>> Cigar('10H').mask_left(11).cigar
        '10H'
        >>> Cigar('10H').mask_left(9).cigar
        '10H'
        >>> Cigar('1M10H').mask_left(9).cigar
        '1S10H'
        >>> Cigar('5M10H').mask_left(9).cigar
        '5S10H'

        >>> c = Cigar('1S1H1S5H1S5M10H')
        >>> c.mask_left(9).cigar == c.cigar
        True

        >>> c = Cigar('1S1H1S5H1S5M10H')
        >>> c.mask_right(9).cigar == c.cigar
        True
        >>> c.mask_right(11).cigar
        '1S1H1S5H1S4M1S10H'
        ```

        Installation
        ============

            pip install cigar

Platform: UNKNOWN
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 3

cigar-0.1.3/cigar.egg-info/SOURCES.txt
LICENSE
MANIFEST.in
README.md
cigar.py
ez_setup.py
setup.py
cigar.egg-info/PKG-INFO
cigar.egg-info/SOURCES.txt
cigar.egg-info/dependency_links.txt
cigar.egg-info/top_level.txt

cigar-0.1.3/cigar.egg-info/dependency_links.txt

cigar-0.1.3/cigar.egg-info/top_level.txt
cigar